Big Blue has assigned 3,500 researchers and developers to help with the upkeep and further development of Spark as it prepares to offer the platform as a service.
It is also contributing SystemML, a programming language for machine learning tasks, to the Apache project, and will work with Databricks, the company that has largely shepherded the development of Spark to date. In machine learning, computer systems can refine their performance on given tasks as they acquire new information.
"Spark represents for us a whole new way of working with data," said Joel Horowitz, director of marketing for IBM analytics. "It is a very powerful in-memory compute engine with a very easy-to-use interface for data scientists and developers."
Spark, which many view as a successor to the Hadoop big data processing platform, is well suited for machine learning tasks, which typically require large clusters of computers to execute.
The latest version of the platform, released last week, extends it to run machine-learning algorithms.
"Machine learning is a very powerful technique of extracting the essence of value from data," Horowitz said. Machine learning algorithms are especially good at tasks such as automated classification and helping devices sense their surroundings with greater sophistication, he said. Such tasks were previously considered to be too compute-intensive to be carried out on a single server. Spark can coordinate multiple computers to work in tandem.
IBM already offers a number of platform services based on machine learning algorithms, such as language translation and data visualisation. The Spark service, which will be available by the end of this month, will allow developers to build and run their own machine learning algorithms, Horowitz said.
Spark will be available on IBM Bluemix, a set of platform services for developers. The Spark service will provide an easy way to load data, examine the data, and pass the results back to another application, all without the work of setting up the supporting infrastructure.
In the past year, Spark has grown in popularity as more organisations have incorporated big-data-level analysis into their operations. Companies such as eBay, NASA, Opentable and Yahoo have all used Spark to make sense of large collections of data. About 17 percent of 3,000 Java professionals said they were running Spark in their operations, according to a December 2014 survey conducted by Java tool provider TypeSafe.