Many businesses are finding that their enterprise data sets have grown too voluminous to fit into even the largest data warehouses. Now, companies running these overstuffed data stores have an on-ramp to newfangled, big-data-style processing through a joint effort by analytics systems supplier Teradata and Apache Hadoop distribution provider MapR.
The partnership is aimed at letting users of systems based on the Teradata Unified Data Architecture seamlessly use MapR's distribution of the Hadoop open-source software framework for the distributed processing of big data.
Combining the control tools and support of Teradata -- a long-familiar name to enterprises -- with a commercially refined Hadoop distribution such as MapR's offers organizations a potentially easy way to incorporate big data analysis in their operations, without the administrative headaches of setting up and running Hadoop from scratch, Teradata says.
The partnership will also benefit the dozens of enterprises already using both MapR and Teradata.
Teradata software such as Teradata QueryGrid and Teradata Loom, designed to orchestrate work processes, will work with the MapR software. For the integration work, Teradata has prepared a connector to MapR. This allows organizations that have QueryGrid to use MapR to process data from Teradata databases and other sources.
The two companies have also reconciled their roadmaps so their respective products can continue to be integrated with each other.
Pairing the technologies from the two companies could be beneficial in a number of ways, said Jack Norris, MapR chief marketing officer.
A Hadoop distribution can store massive amounts of data that could be analyzed on the fly, either through a file process or through a NoSQL database query. If an organization finds a subset or aggregation of the data to be particularly useful, it could then routinely copy that material into a data warehouse for faster and more structured analysis.
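The pattern described above, reducing a large raw data set to a useful aggregate and copying only that aggregate into a warehouse, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Teradata's or MapR's actual tooling: the log format, the `aggregate_hits` and `load_into_warehouse` helpers, and the `daily_hits` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical raw web-log lines in the form "<date> <path> <status>".
RAW_LOGS = [
    "2014-11-01 /home 200",
    "2014-11-01 /cart 500",
    "2014-11-02 /home 200",
    "2014-11-02 /home 200",
]

def aggregate_hits(lines):
    """Reduce raw log lines to a (date, path) -> hit count mapping."""
    counts = Counter()
    for line in lines:
        day, path, _status = line.split()
        counts[(day, path)] += 1
    return counts

def load_into_warehouse(conn, counts):
    """Copy the aggregate into a warehouse-style table (SQLite stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_hits (day TEXT, path TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_hits VALUES (?, ?, ?)",
        [(day, path, n) for (day, path), n in sorted(counts.items())],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_into_warehouse(conn, aggregate_hits(RAW_LOGS))
rows = conn.execute(
    "SELECT day, path, hits FROM daily_hits ORDER BY day, path"
).fetchall()
print(rows)
```

The raw lines stay in the large store; only the much smaller per-day summary lands in the structured table, where it can be queried quickly.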
"Teradata systems typically have a density of high-value data," Norris said. "MapR and Hadoop fits in where the density is of a different construct: It is typically an unknown or unproven density of data. We're collecting all of the Web logs or six years of data, and you can selectively do the transformations and upload that into a data warehouse."
Conversely, data in a data warehouse that is no longer consulted as frequently could be moved to lower-cost commodity storage servers running the Hadoop Distributed File System (HDFS), Norris said.
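A rough sketch of how such a tiering decision might be made, assuming the last access date of each warehouse table is known. The `split_by_last_access` function, the table names, and the 180-day threshold are hypothetical choices for illustration; the article does not describe how either company selects data to offload.

```python
from datetime import date, timedelta

def split_by_last_access(last_access, today, max_idle_days=180):
    """Split table names into (keep, archive) by last access date.

    Tables not read within max_idle_days are candidates for moving
    to lower-cost storage. The threshold is an assumed example value.
    """
    cutoff = today - timedelta(days=max_idle_days)
    keep, archive = [], []
    for name, last_read in last_access.items():
        if last_read < cutoff:
            archive.append(name)
        else:
            keep.append(name)
    return sorted(keep), sorted(archive)

# Hypothetical warehouse tables and the date each was last queried.
tables = {
    "orders_2014": date(2014, 11, 1),
    "orders_2009": date(2010, 1, 15),
}
keep, archive = split_by_last_access(tables, today=date(2014, 11, 20))
print("keep:", keep, "archive:", archive)
```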
A leader in the field of high-performance data warehouse products, Teradata has been expanding the scope of its technology to include sources outside of data warehouses. Data warehouses are used to collect data from databases, to be scrutinized with complex analysis. The Apache Hadoop data processing platform, often packaged in commercial releases by companies such as MapR, can hold vast reams of data, typically more than can be stored in a data warehouse.
Extending data warehouse and other database tools to incorporate the relatively new Hadoop technologies is becoming an increasingly common strategy for introducing Hadoop to the enterprise. Earlier this week, Hewlett-Packard announced that it had integrated its Vertica column-oriented database with Hadoop, allowing users to query data stored in Hadoop with the widely used SQL (Structured Query Language).
Although not a Teradata customer, security services company Solutionary has used MapR to expand its analysis capabilities beyond traditional database analysis tools, while cutting hardware and software costs.
The company has two sets of customer data: It stores all the security and event logs from its customers' networks on a set of file servers, and keeps another set of metadata about these events on a data mart running on Oracle Real Application Clusters. The company used MapR to merge these two sets of data. MapR also provided a way to use commodity storage to keep hardware and software licensing costs down, while giving the company more computational power to do predictive modeling. Using Hadoop also allows Solutionary to offer additional features for its customers, such as a log search.
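Merging raw event logs with separately stored metadata amounts to a join on a shared key. The sketch below shows the idea in miniature. It assumes each log line begins with an event ID and that metadata is keyed by the same ID; the `merge_events` function, the field names, and the sample records are hypothetical and do not reflect Solutionary's actual schema.

```python
def merge_events(log_lines, metadata):
    """Join raw log lines with per-event metadata on the event ID.

    Each log line is assumed to look like "<event_id> <message>".
    Events with no metadata entry are kept with only the log fields.
    """
    merged = []
    for line in log_lines:
        event_id, message = line.split(" ", 1)
        record = {"event_id": event_id, "message": message}
        record.update(metadata.get(event_id, {}))
        merged.append(record)
    return merged

# Hypothetical inputs: raw logs from file servers, metadata from a data mart.
logs = ["e1 failed login from 10.0.0.5", "e2 port scan detected"]
meta = {"e1": {"customer": "acme", "severity": "high"}}

events = merge_events(logs, meta)
for event in events:
    print(event)
```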
"It was a real natural fit for where we were at and where we wanted to go," said Scott Russmann, Solutionary's director of software engineering.
Teradata will provide full technical support for its customers using MapR. The company will also offer consulting services to help customers set up their Hadoop distributions, prepare the data for Hadoop analysis, and develop a set of analysis tools.
Image: Barbara Piuma, Flickr