When you're trying to learn more about the universe with the Large Hadron Collider (LHC), which generated 30 petabytes of data this year, Big Data technology is vital for analysing the information, according to CTO Sverre Jarp.
Speaking at the Big Data Warehousing and Business Intelligence 2012 conference in Sydney this week, Jarp, of the European Organization for Nuclear Research (CERN) Openlab, told delegates that physics researchers need to measure electrons and other elementary particles inside the LHC at Geneva, Switzerland.
"These particles fly at practically the speed of light in the LHC so you need several metres in order to study them," he said. "When these collide, they give tremendous energy to the secondary particles that come out."
CERN Openlab uses a mix of tape and disk technology to store this large amount of research data.
"Today, the evolution of Big Data has been such that we can put one terabyte of data on one physical disk or tape cartridge," he said.
"We are safely in the domain of petabytes and we are moving to exabytes in the future."
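The jump from terabytes to petabytes and exabytes is easier to grasp with a back-of-the-envelope calculation. The sketch below assumes the one-terabyte-per-unit figure quoted above; the unit counts are illustrative arithmetic, not CERN's actual inventory.

```python
# How many one-terabyte disks or tape cartridges does a petabyte- or
# exabyte-scale archive imply? (Decimal units: 1 PB = 1,000 TB.)

CAPACITY_TB = 1  # capacity per disk/cartridge, as quoted above

def units_needed(dataset_tb, capacity_tb=CAPACITY_TB):
    """Number of storage units needed to hold a dataset (ceiling division)."""
    return -(-dataset_tb // capacity_tb)

petabyte_tb = 1_000
exabyte_tb = 1_000_000

print(units_needed(30 * petabyte_tb))  # a 30 PB year -> 30,000 units
print(units_needed(exabyte_tb))        # one exabyte -> 1,000,000 units
```

At one terabyte per unit, an exabyte means a million physical disks or cartridges, which is why Jarp frames exabyte scale as a management problem, not just a capacity one.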
When asked why the LHC generates so much data, he explained that each particle detector has millions of sensors but the data they sense is "very unstructured."
"A particle may have passed by a sensor in the LHC and this happens at the incredible speed of 40 megahertz or 40 million times per second."
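A rough sketch shows why that rate makes filtering unavoidable. Only the 40 MHz crossing rate comes from the quote above; the sensor count and bytes-per-reading are illustrative assumptions.

```python
# Illustrative raw-data-rate arithmetic for a detector read out at 40 MHz.

crossing_rate_hz = 40_000_000    # 40 MHz, as quoted above
sensors = 100_000_000            # assumed round figure for "millions of sensors"
bytes_per_reading = 1            # assumed: one byte per sensor per crossing

raw_bytes_per_second = crossing_rate_hz * sensors * bytes_per_reading
print(raw_bytes_per_second / 1e15)  # -> 4.0, i.e. ~4 PB of raw signal per second
```

Under these assumptions the raw signal is petabytes per second, so the vast majority of readings must be discarded in hardware before anything reaches storage.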
Even with that mix of disk and tape storage, CERN Openlab experiences disk failures every day.
"We have a team walking around the centre exchanging bad disks for good ones and hoping the storage technology we use is good enough for keeping everything alive," he said.
Jarp advised CIOs and IT managers to get their unstructured data into a structured form as quickly as possible.
"Big Data management and analytics require a solid organisational structure at all levels," he said.
"A change in corporate culture is also required. Our community started preparing for Big Data more than a decade before real physics data arrived."
Jarp added that he estimates the LHC will run for another 15 to 20 years, generating and storing exabytes of data.