Data management vendor Syncsort will announce on Wednesday its entry into the Apache Hadoop community, with plans to enable high-performance data sorts to be used with the open source distributed computing platform.
Instead of using Hadoop's default sort, users could swap in another sorting system via an external sort plug-in capability Syncsort is contributing to the Hadoop open source community. Syncsort also will offer a Hadoop edition of its DMExpress data acceleration software, providing an alternative to the default Hadoop sort.
DMExpress Hadoop Edition features Hadoop Distributed File System connectivity. Users can create jobs via the DMExpress graphical user interface and run them in MapReduce, the Hadoop programming model and software framework for writing applications that process large amounts of data in parallel across clusters.
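To illustrate where a sort sits in the MapReduce model, and what swapping it out might look like, here is a minimal single-process sketch. It is a toy for illustration only: it does not use Hadoop's API or Syncsort's plug-in interface, and every name in it (`run_job`, `word_mapper`, `count_reducer`, `custom_sorter`) is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer, sorter=sorted):
    """Toy MapReduce: map, sort by key, then reduce each key group.

    `sorter` is the pluggable piece. It must accept (pairs, key=...) and
    return the pairs ordered by key, as the built-in `sorted` does.
    """
    # Map phase: each input record yields zero or more (key, value) pairs.
    pairs = [kv for record in records for kv in mapper(record)]
    # Sort phase: order the pairs by key so equal keys become adjacent.
    ordered = sorter(pairs, key=itemgetter(0))
    # Reduce phase: hand each key and its list of values to the reducer.
    return [
        reducer(k, [v for _, v in group])
        for k, group in groupby(ordered, key=itemgetter(0))
    ]


def word_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    # Sum the counts collected for one word.
    return (key, sum(values))


def custom_sorter(pairs, key):
    # Stand-in for an alternative sort implementation (assumption: any
    # routine that returns the pairs ordered by key could be used here).
    out = list(pairs)
    out.sort(key=key)
    return out


lines = ["a b a", "b c"]
default_result = run_job(lines, word_mapper, count_reducer)
custom_result = run_job(lines, word_mapper, count_reducer, sorter=custom_sorter)
```

Because the reduce phase only depends on keys arriving in order, the job produces the same result with either sorter; only the cost of the sort step changes.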
Hadoop is generally associated with "big data", a term for workloads in which users need to analyse terabytes of data. "The interest in Hadoop is growing dramatically and not just in web-based companies," said Keith Kohl, Syncsort director of product management for data integration. Companies in the financial services and telecommunications sectors are also using it, he said.
An early user of DMExpress Hadoop Edition said it offered a performance boost. "[Syncsort is] very adept at providing highly efficient and scalable sort," said Mike Brown, CTO at comScore, an Internet ratings service. "It is much faster than what you would get out of the box with Hadoop."
The plug-in has been tested with the Cloudera Hadoop distribution. DMExpress Hadoop Edition will be available in a beta release this June, with general availability planned for later this year.