"We really believe that streaming is the way the world is going. Instead looking at data from two months or two years ago, the data you really care about is happening right now," said Tom Kershaw, director of product management for the Google Cloud Platform.
To this end, Google has launched a real-time data processing engine called Google Cloud Dataflow, first announced a year ago. It has also added new features to its BigQuery analysis tool, introduced in 2010. The two cloud services can be used together to facilitate the real-time processing of large amounts of data, Kershaw said.
Now available as a beta, Google Cloud Dataflow provides the ability to analyze data as it comes from a live stream of updates. Google takes care of all the hardware provisioning and software configuration, allowing users to ramp up the service without worrying about the underlying infrastructure. The service can also analyze data already stored on disk, in batch mode, allowing an organization to mix historical and current analysis in the same workflow.
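To make the batch-plus-streaming idea concrete, here is a minimal sketch in plain Python. It is not the Cloud Dataflow API; the names `count_per_window` and `live_feed`, the 60-second window, and the sample events are all hypothetical. It only illustrates how one piece of logic can consume stored data, a live feed, or both in sequence, assuming events arrive in timestamp order.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event key)


def count_per_window(
    events: Iterable[Event], window_seconds: float = 60.0
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_index, counts) each time a fixed window closes.

    Assumes events arrive in timestamp order (an assumption of this sketch).
    """
    current = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window = int(timestamp // window_seconds)
        if current is not None and window != current:
            # The previous window is complete, so emit it and start fresh.
            yield current, counts
            counts = Counter()
        current = window
        counts[key] += 1
    if current is not None:
        # Emit the final, still-open window when the input ends.
        yield current, counts


def live_feed() -> Iterator[Event]:
    """Hypothetical stand-in for a live stream of updates."""
    yield (125.0, "click")
    yield (130.0, "view")
    yield (190.0, "click")


# Hypothetical historical data already "stored on disk".
historical = [(5.0, "click"), (20.0, "click"), (70.0, "view")]

batch_result = list(count_per_window(historical))
stream_result = list(count_per_window(live_feed()))
# Historical and current data in the same workflow.
combined_result = list(count_per_window(chain(historical, live_feed())))
```

The function never checks whether its input is a list or a generator, which is the property the article describes: the same analysis code runs over historical and current data.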
The service provides a way "for any Java or Python programmer to write applications using big data," Kershaw said. "It makes it easy to run end-to-end jobs across very complex data sets."
In addition to moving Cloud Dataflow into an open beta program, Google has updated its BigQuery service.
BigQuery provides a SQL (Structured Query Language) interface for large unstructured datasets. SQL is commonly used for traditional relational databases, so it is almost universally understood by database administrators. With this update, Google has improved the service so it can now ingest up to 100,000 rows per second per table.
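For a sense of scale, the quoted ceiling of 100,000 rows per second per table can be turned into per-minute and per-day figures. The short calculation below assumes, purely for illustration, that the maximum rate is sustained without interruption; the helper name `rows_over` is hypothetical.

```python
ROWS_PER_SECOND = 100_000  # per-table ingest ceiling quoted in the article


def rows_over(seconds: int, rate: int = ROWS_PER_SECOND) -> int:
    """Rows ingested in `seconds` at a constant `rate` (an idealized assumption)."""
    return rate * seconds


per_minute = rows_over(60)           # 6,000,000 rows
per_day = rows_over(24 * 60 * 60)    # 8,640,000,000 rows

print(f"{per_minute:,} rows per minute, {per_day:,} rows per day")
```

At that sustained rate, a single table could in principle take in more than 8.6 billion rows in a day.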
The company has expanded the footprint of BigQuery so European customers can now use the service. BigQuery data can be stored in Google's European data centers, which will help organizations that need to meet the European Union's data sovereignty regulations.
The company has also added row-level permissions to BigQuery, which can limit access to information based on the user's credentials. This allows organizations to protect portions of the data, such as names and addresses, while allowing wider access to other portions, such as anonymous purchase history, to be used for research or other purposes.
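The general idea behind row-level permissions can be sketched in a few lines of Python. This is an illustration of the concept only, not how BigQuery implements or exposes the feature; the `Row` type, the `allowed_groups` field, the group names, and the sample records are all hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Row(NamedTuple):
    data: Dict[str, str]
    allowed_groups: FrozenSet[str]  # hypothetical per-row access list


def visible_rows(rows: List[Row], user_groups: FrozenSet[str]) -> List[Dict[str, str]]:
    """Return only the rows that share at least one group with the user."""
    return [row.data for row in rows if row.allowed_groups & user_groups]


# Hypothetical table: one sensitive row and one anonymized row.
table = [
    Row({"name": "A. Customer", "address": "1 Example St"}, frozenset({"support"})),
    Row({"customer": "anon-17", "purchase": "book"}, frozenset({"support", "research"})),
]

researcher_view = visible_rows(table, frozenset({"research"}))
support_view = visible_rows(table, frozenset({"support"}))
```

In this sketch a user in the hypothetical research group sees only the anonymized purchase record, while a user in the support group sees both rows.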
BigQuery and Dataflow can be used in conjunction with each other, Kershaw said. "The two are very much aligned. You can use Cloud Dataflow for processing and BigQuery to analyze," he said.