IBM wants to help IT managers apply company policies to their big data analysis projects.
The company will be introducing new products and features to help organizations manage their new big data systems with the same rigor they apply to other IT operations, said Bob Picciano, general manager of IBM information management.
With traditional data analysis systems, "there's been years of focus on the disciplines over enterprise data management, whether it is governance, security, or lifecycle management. [But] the big data space is still like the Wild West, for the most part," Picciano said.
IBM will add new features to its InfoSphere line of information integration and management software. It has also announced the general release of PureData System for Hadoop, a system configured for running Hadoop workloads.
IBM announced and demonstrated these products at an event called Building Confidence in Big Data, held in New York City Wednesday.
Applying proper IT governance to big data projects will assure organizations that the data they use is accurate and secure, Picciano said.
"It comes back to confidence in the data. If [the data] does not come through an authoritative source and it hasn't gone through the controls, it is not trusted," said Heather Wilson, chief data officer for insurance company AIG, in a user panel discussion at the event.
One planned feature for InfoSphere, called Data Click, will allow managers to easily fetch data from repositories, including those that run on Hadoop and NoSQL data stores. Another feature, Big Insights, can be used to compile a metadata catalogue of an organization's data, so that the data can be more easily discovered.
InfoSphere will offer a dashboard that shows whether big data sources are being maintained in compliance with company policies for security and data integrity.
Versions of InfoSphere can also be equipped with a feature, known as Big Master, that aggregates all the accessible information known about an individual customer, from both company and public sources, such as the customer's purchase history and messages from his or her Twitter accounts.
Customer service agents and marketing and public relations managers could use this information to help customers resolve issues or to head off potential ones, said Inhi Cho Suh, IBM vice president for big data integration and governance.
IBM will be expanding InfoSphere Guardium, which monitors how data gets used and enforces data access policies, so that it can handle Hadoop and other sources of big data.
IBM first launched PureData System for Hadoop in April for a select number of customers. PureData is a line of IBM clusters and associated software configured to execute specialized tasks. Since April, IBM has refined the system based on the early use cases and developed a number of modules that can handle common tasks.
One feature can identify rarely consulted data, so it can be moved to an archiving system. IBM has also developed a number of algorithms that will help customers analyze their data, freeing them from the need to write their own algorithms.
IBM has also announced a consulting service, called Big Data Stampede, designed to help potential customers devise a big data strategy.