"Big data" has arrived as a big business initiative. But the hip, experimental, ad hoc veneer of blending data streams to surface bold discoveries belies a massive cultural and technological undertaking not every organization is ready for.
Without a strategic plan that includes coherent goals, strong data governance, rigorous processes for ensuring data accuracy, and the right mentality and people, big data initiatives can easily end up being a big-time liability rather than a valuable asset.
Following are five strategic tips for avoiding big data failure. In many cases, the advice pertains to any data management project, regardless of the size of the data set. But the advent of massive data stores has brought with it a particular set of pitfalls. Here's how to increase the chances that your organization's effort to mix large data pools from disparate sources succeeds.
Big data success tip No. 1: Make big data a central business tenet
Rearden Commerce CTO Phil Steitz succinctly sums up the single most important driver of big data success: You must integrate analytics and data-driven decision making into the core of your business strategy.
"If 'big data' is just a buzzword internally, it becomes a solution looking for a problem," Steitz says.
For Rearden Commerce, whose e-commerce platform leverages big data and other resources to optimize the exchange of goods, services, and information between buyers and sellers, the concept of "absolute relevance" -- putting the right commercial opportunity in front of the right economic agent at the right time -- is key.
"It is an example of this kind of thinking originating and centrally driving strategy at the top of the house," Steitz says.
Part of this approach includes developing a small, high-powered team of data scientists, semantic analysts, and big data engineers, then opening a sustained, two-way dialog between that team and forward-thinking decision makers in the business, Steitz says.
"The biggest challenge in really getting value out of contemporary analytics and semantic analysis technologies is that the technologists who can really bring out what is possible need to be deeply engaged with business leaders who 'get it' and can help winnow out what is really valuable," Steitz says.
Another key success factor in making big data a part of the overall business strategy is effective management of data partnerships.
"Really optimizing customer experience and economic value in today's world inevitably requires sharing data across enterprises," Steitz says. "Naïve approaches to this -- 'just send us your full transaction file nightly' -- fail miserably for operational as well as privacy and security reasons."
Big data success tip No. 2: Data governance is essential
Big data projects bring with them significant concerns over security, privacy, and regulatory compliance. Nowhere is this a more sensitive issue than in the health care industry.
Health care provider Beth Israel Deaconess Medical Center is one organization becoming increasingly involved in big data, as it works with electronic medical records, new health care reimbursement models, and the vast amounts of clinical and claims data that have been collected over the years. Data governance will play a key role.
"There will be a lot of pressure put on health IT organizations to turn the data around rapidly," says Bill Gillis, CIO of Beth Israel Deaconess.
Having strong governance in place enables organizations to make sure the data is accurate and tells the clinical story they need in order to provide quality and improved care.
"It's critical that the 'tyranny of the urgent' not win over," Gillis says. "Having governance in place up front can help avoid that pitfall and keep things on track."
Of course, security and privacy are a big part of this.
"Given the uncertainties that surround new big data, for the important brands the privacy and security bar is so high that the protections afforded for this new data are higher than most other traditional external decision data," says Charles Stryker, chairman and CEO of Venture Development Center, a consulting firm that has provided big data advice for companies such as AOL, Cisco, First Data, and Yahoo. "No major brand wants to test the limits of where the privacy and security line falls," Stryker says.
From a project's outset, companies need to consider data provenance (the metadata that describes the source of the data) and make appropriate pedigree decisions (confidence in the data) when using this data in any big data solution, says Louis Chabot, senior technical adviser and big data lead at technology and management consulting firm DRC, which has helped government agencies implement big data projects.
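A provenance record of the kind Chabot describes can be stored with each data set and made tamper-evident. The following is a minimal sketch, not a definitive implementation: the field names, the key, and the helper functions are all hypothetical, and it uses a keyed hash (HMAC-SHA256 from the Python standard library) as a simpler stand-in for a full digital signature scheme.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for illustration only; a real deployment would
# use managed keys or an asymmetric signature scheme instead.
SECRET_KEY = b"example-key-not-for-production"

def sign_provenance(record: dict, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_provenance(record: dict, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Check that the record still matches the tag computed at capture time."""
    return hmac.compare_digest(sign_provenance(record, key), tag)

# Hypothetical provenance metadata: where the data came from and how much
# confidence (pedigree) is placed in it.
provenance = {"source": "claims_feed", "pedigree": "high"}
tag = sign_provenance(provenance)

# Any later change to the metadata invalidates the stored tag.
tampered = dict(provenance, pedigree="low")
print(verify_provenance(provenance, tag))  # True
print(verify_provenance(tampered, tag))    # False
```

Serializing with sorted keys keeps the tag stable regardless of the order in which fields were added to the record.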
"Maintaining data provenance metadata and pedigree-based decision making is not something you 'bolt on' after the fact," Chabot says. "It is an integral part of the initiative that must be designed and included from the outset." When appropriate, Chabot says, specialized techniques such as digital signatures should be used to protect provenance from accidental and/or malicious tampering.
Organizations also need to respect data privacy laws and regulations. "Various techniques such as anonymization of the data, stripping out elements of the data, and restricting distribution [and] usage of the data can be used" so that organizations are in compliance with security and privacy regulations, Chabot says.
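Two of the techniques Chabot mentions, stripping out elements of the data and anonymizing what remains, can be sketched in a few lines. This is an illustration under stated assumptions: the field names, the salt, and the `scrub` helper are hypothetical, and replacing an identifier with a salted hash is pseudonymization, which on its own may not satisfy any particular privacy regulation.

```python
import hashlib

# All field names and the salt below are hypothetical, chosen for illustration.
SALT = b"replace-with-a-secret-salt"
DROP_FIELDS = {"name", "street_address"}  # stripped outright
PSEUDONYM_FIELDS = {"patient_id"}         # replaced with a salted hash

def scrub(record: dict) -> dict:
    """Return a copy with direct identifiers dropped or pseudonymized."""
    cleaned = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # strip the element entirely
        if field in PSEUDONYM_FIELDS:
            digest = hashlib.sha256(SALT + str(value).encode("utf-8")).hexdigest()
            cleaned[field] = digest[:16]  # shortened token; still joinable across records
        else:
            cleaned[field] = value
    return cleaned

raw = {
    "patient_id": 1001,
    "name": "Jane Doe",
    "street_address": "1 Main St",
    "diagnosis_code": "E11",
}
shared = scrub(raw)
print(sorted(shared))  # ['diagnosis_code', 'patient_id']
```

Because the same input always yields the same token, records for one individual can still be linked for analysis without exposing the original identifier.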
According to the report, best-in-class companies (as determined by Aberdeen metrics) reported that 94 percent data accuracy was their organizational goal and 1 percent improvement was needed to meet this goal. But industry-average companies reported a data accuracy goal of 91 percent, and needed 18 percent improvement in their data management methodologies to achieve this, while "laggards" reported a data accuracy goal of 80 percent and needed 40 percent improvement in their current performance to reach that.
Here, data cleansing and mastering are critical to big data success. "Contrary to some beliefs, this requirement does not go away," says Joe Caserta, founder and CEO of Caserta Concepts, a data management and big data consulting firm. "If the big data paradigm is to become the new corporate analytics platform, it must be able to align customers, products, employees, locations, etc., regardless of the data source."
In addition, known data quality issues that have long jeopardized credibility of data analyses will have the same impact on big data analytics if not properly addressed, he says.
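The alignment Caserta describes, recognizing the same customer across different data sources, usually starts with cleansing a field into a common match key. Below is a deliberately crude sketch of that idea: the source names, record fields, and the `match_key` and `align` helpers are hypothetical, and real master data management relies on far richer matching rules than a normalized email address.

```python
from collections import defaultdict

def match_key(record: dict) -> str:
    """Build a crude match key: the email, trimmed and lowercased."""
    return record["email"].strip().lower()

def align(*sources: list) -> dict:
    """Group records from every source under their shared match key."""
    merged = defaultdict(list)
    for source in sources:
        for record in source:
            merged[match_key(record)].append(record)
    return dict(merged)

# Hypothetical records from two systems that format the same email differently.
crm = [{"email": "Ann@Example.com ", "name": "Ann Lee"}]
web = [
    {"email": "ann@example.com", "last_visit": "mobile"},
    {"email": "bob@example.com", "last_visit": "desktop"},
]

aligned = align(crm, web)
print(len(aligned))                     # 2
print(len(aligned["ann@example.com"]))  # 2
```

Without the cleansing step in `match_key`, the two records for the first customer would be treated as different people, which is the kind of data quality problem that undermines the analysis built on top.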
On a typical big data project, data management is often "deprioritized" by development staff and can go unresolved, DRC's Chabot notes. Effective data management involves ensuring mature techniques -- process and automation -- are put in place to address model management, metadata management, reference data management, master data management, vocabulary management, data quality management, and data inventory management, he says.
Big data success tip No. 4: Pool best practices for best results
People are discovering what works and what doesn't when it comes to managing big data and analytics. When they are employed by the same organization, why not share this knowledge?
One way to do this is by creating a big data COE (center of excellence), a shared entity that provides leadership, best practices, and in some cases support and training.
Typically, COEs have a dedicated budget and are designed to analyze issues; define initiatives, future state, and standards; train users; execute plans; and maintain progress, says Eliot Arnold, co-founder of Massive Data Insight, a consulting firm that specializes in big data and analytics programs. Getting a COE started requires an audit of available resources and a senior executive sponsor, he says.
While a big data COE is a good idea on paper, its effectiveness will be determined by how well it's implemented in practice, DRC's Chabot says.
There are many basic challenges with a COE covering the entire data lifecycle, Chabot says, including authoring and identifying the best practices; vetting them in a nonbiased fashion; properly documenting their applicability; overseeing their adoption; and modernizing them over time.
DRC has defined a big data maturity level similar to the CMMI (Capability Maturity Model Integration), a process improvement framework used by organizations. The big data maturity-level models map out relevant best practices. These are divided into four groups (planning/management, project execution, architecture, and deployment/runtime/execution) that organizations can adopt incrementally over time. This avoids the pitfalls of trying to be too sophisticated too quickly, Chabot says.
Big data success tip No. 5: Expertise and collaboration are key
Big data is a business initiative, not just a technology project, so it's vital that business and IT leaders are on the same page with planning, execution, and maintenance.
"One of the biggest pitfalls for a program is disconnect between IT and the business on who controls strategy and initiatives," Arnold says. "In less mature organizations there is no documented strategy, a hodgepodge of tools are in production, and decision makers favor intuition for charting strategic direction. These types of firms are mostly unaware of the asset value of data."
Business leaders and IT professionals can ensure their big data project is successful by carefully identifying objectives, needs, and requirements; calculating a return on their investment; mapping analytical capabilities to business/mission needs; and installing a mechanism for continuous feedback, DRC's Chabot says. "A big data project should be divided into multiple phases, incrementally adding value to the organization," he says.
But getting IT and business leaders to agree, as well as getting departments to work together on data initiatives, is not always easy.
"In my experience, for the major companies this is becoming a real corporate challenge," Venture Development Center's Stryker says. "Does the job responsibility associated with chief data officer rest within the IT department, the marketing department, the risk management department, or do each of these departments have their own big data initiatives and coordinate with each other?"
Companies also need to bring in the necessary expertise to exploit big data technologies such as Hadoop, which has enabled low-cost, computationally efficient management of very large data sets and analysis tasks.
"The paradigm shift to big data introduces a new role in the corporate organization, the data scientist," Caserta says. "This role requires deep understanding of advanced mathematics, system engineering, data engineering, and [business] expertise." In practice, it's common to use a data science team, where statisticians, technologists, and business subject matter experts collectively solve problems and provide solutions, he says.
Many of the people already working in data analytics will need to prepare for culture shock, Caserta says.
"Before a big data project is launched, a strategic readiness test should be performed to assess the adoption of the new paradigm," he says. Business analysts will need to be retrained or repurposed. The goal of shifting to a big data platform may include changing from reactive analysis (for example, how well a campaign worked), to predictive (what should the next campaign offer), he says, "because now we can proactively influence nonbuyers to follow behavior patterns of loyal customers; or restimulate active customers when their behavior pattern begins to look like a lost customer."
What are the risks of not building a strong, cohesive big data strategy? Launching expensive endeavors that fail to deliver on their promise.
"Typically big data projects are multidimensional and complex initiatives," Chabot says. "They require significant upfront planning." Before embarking on a big data project, he says, organizational leaders should ensure alignment between strategic, functionality, data, analytics, and technology road maps. These road maps need to be reflected in a business, system, software, data, and technology architecture.
"Misalignment between any of these road maps can cause the entire project to derail," Chabot says. "The risks of not having a strong, cohesive big data strategy with the proper road maps and architectures are likely to be excessive costs, expectation mismatch, lack of value, and ultimately program failure."
This story, "5 strategic tips for avoiding a big data bust," was originally published at InfoWorld.com.