Despite some lingering technology issues, Hadoop is ready for enterprise use, IT executives said at the Hadoop World conference.
Larry Feinsmith, managing director at JP Morgan Chase, told a keynote audience that the financial services firm has been using the open source storage and data analysis framework for close to three years and now uses it for fraud detection, IT risk management, self-service and other applications.
Chase still relies heavily on core relational database technologies for transaction processing, but uses Hadoop-based products for a growing number of tasks, Feinsmith said. Five out of seven Chase business units use Hadoop in some way, he added.
Hadoop's ability to store vast volumes of unstructured data has allowed Chase to collect and store weblogs, transaction data and social media data, Feinsmith said. The company is aggregating the data into a common platform, and runs a range of customer-focused data mining and data analytics applications to utilise it, he said.
With over 150 petabytes of online storage, 30,000 databases and 3.5 billion logins to Chase user accounts, data is the lifeblood of the company, Feinsmith said.
For the moment at least, relational database technologies appear to be better suited to running transaction applications, he said.
Transactions are the key
The big debate among technologists at the bank right now is whether incumbent relational database technologies will evolve to meet the bank's emerging big data needs, or whether Hadoop-based technology can become adept at transaction processing, Feinsmith said.
Hugh Williams, vice president of experience, search and platforms at eBay, said that the auction site is revamping its core search engine technology using Hadoop and HBase, a technology that enables real-time analysis of data in Hadoop environments.
The new eBay search engine, codenamed Cassini, will replace the Voyager technology that has been used since the early 2000s. The update is needed in part because of the surging volumes of data that need to be managed, Williams said.
Williams said that eBay currently has more than 97 million active buyers and sellers and over 200 million items for sale across 50,000 categories. The auction site handles close to 2 billion page views, 250 million search queries and tens of billions of database calls each day, he said. The company has 9 petabytes of data stored on Hadoop and Teradata clusters, and the amount of data is growing quickly.
Hadoop and HBase allow eBay to build a far more sophisticated search engine than Voyager. Cassini will deliver more accurate and more context-based results to user search queries, he said.
With more than 100 engineers assigned to Project Cassini full time, the development effort is one of the largest ever at eBay. Cassini is expected to go live next year.
Data centre star
Hadoop allows companies to store and manage far bigger volumes of structured and unstructured data than can be managed affordably by today's relational database management systems.
Large web companies like Yahoo and Google have been using Hadoop for several years, but the open source technology has only recently started to attract the attention of enterprise IT executives.
"Hadoop is no longer ancillary in the data centre. It is the place where data goes first" in a growing number of instances, said Mike Olson, CEO of Cloudera, the host of Hadoop World. Increasingly, companies are using Hadoop to collect, aggregate and share very large volumes of data from multiple, disparate sources, he said.
"We have spent three years talking speeds and feeds. We have spent three years saying Hadoop is happening," Olson said. "What we will see going forward is much more innovation around business focused solutions."
According to Feinsmith, there are several considerations that enterprises need to keep in mind when deploying Hadoop. The marketplace for the technology is still "very confusing" with oft-changing Hadoop vendors, products and standards.
In addition, companies considering Hadoop must be sure that it can integrate with their existing IT investments, he said. The relative lack of skilled Hadoop engineers is also a concern, Feinsmith said.
And the massive data aggregation enabled by Hadoop can raise concerns related to security, data access, data entitlement, monitoring, high availability and business continuity, he said. Related technologies such as HBase are just starting to emerge, which raises stability questions, Williams said.