To help scientific supercomputing take advantage of emerging big-data technologies, high-performance computing manufacturer Cray is releasing a set of packages that promise to optimize running Hadoop on the company's XC30 machines.
The Cray Framework for Hadoop, along with the Cray Performance Pack for Hadoop, provides a set of tools and best practices for configuring and optimizing an XC30 to run Hadoop for scientific big-data-style projects, according to the company.
Hadoop's Java-based MapReduce model of data analysis could bring a number of benefits to supercomputing, but it has not yet found widespread acceptance in that community, even though both Hadoop and supercomputing rely on parallel processing and extremely large data sets.
Cray has seen some interest in Hadoop from its users, though the open-source data processing platform was not set up to meet most scientific supercomputing use cases, said Bill Blake, chief technical architect of Cray, in a statement.
Hadoop's approach of bringing the computation to the data differs from the traditional supercomputing approach of moving the data to the processors.
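As a rough illustration of that compute-to-data idea, the short Python sketch below runs a word count in the MapReduce style: each map step works on the block of data held by one node, and only the small partial results are combined afterward. It does not use Hadoop's actual APIs; the node names, sample text, and function names are hypothetical, chosen only to show the pattern.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for data blocks stored on separate nodes.
partitions = {
    "node-a": ["hadoop cray hadoop"],
    "node-b": ["cray lustre"],
    "node-c": ["hadoop aries lustre"],
}

def map_local(lines):
    """Map step: count words in one block, as if run on the node holding it."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

# Only the compact per-node counts are gathered, not the raw data.
partials = [map_local(lines) for lines in partitions.values()]
total = reduce(reduce_counts, partials, Counter())
print(dict(total))
```

In a traditional supercomputing job, by contrast, the raw contents of all three blocks would typically be read from a shared parallel file system to the compute nodes before any counting began.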
Traditional scientific number-crunching on supercomputers tends to rely on large hierarchical file formats and on libraries that boost I/O (input/output) rates, neither of which Hadoop was designed to handle well. Scientific computing also relies on parallel file systems and fast interconnects that are typically not found in Hadoop deployments.
Scientific workloads tend to have more complex workflows as well, incorporating both scientific compute and analytics workloads. In scientific computing, data models are commingled with math models, which is likewise not the norm for Hadoop.
The Cray Framework for Hadoop and the Cray Performance Pack for Hadoop will address these issues, allowing users to get the most computational power out of the XC30s for Hadoop jobs, according to the company.
An update to the performance pack, to be made available in early 2014, will also include additional system code to optimize the XC30's use of the Lustre file system library and the Aries system interconnect used on Cray machines.
The XC30 is Cray's premier supercomputer, featuring integrated servers and switches, the Lustre parallel file system, Aries high-speed interconnects, an innovative cooling system, and the Dragonfly network topology for minimizing locality constraints.
Cray announced the packages at the SC2013 supercomputing conference, being held this week in Denver.
Cray also announced that it is upgrading the University of Stuttgart's XC30, nicknamed "Hornet," so that it will offer more than seven petaflops (quadrillions of floating-point operations per second) of processing power.