Aster Data has stepped up its efforts to bring MapReduce functionality to the enterprise by launching a new library of code. MapReduce is the programming framework developed by Google for processing and analysing very large data sets in parallel across clusters of machines.
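In outline, a MapReduce job has three phases: a map step that turns each input record into key/value pairs, a shuffle that groups the values by key, and a reduce step that combines each group into a result. The sketch below runs those phases sequentially in one process purely for illustration. It is not Google's or Aster's implementation, and the `map_reduce` helper and sample log records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Hashable, Any]]],
    reducer: Callable[[Hashable, List[Any]], Any],
) -> Dict[Hashable, Any]:
    """Hypothetical single-process MapReduce; real systems spread each phase over many machines."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect every value that shares a key.
            groups[key].append(value)
    # Reduce phase: combine each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical log records: (user_id, page) pairs.
log = [("u1", "/home"), ("u2", "/home"), ("u1", "/profile"), ("u1", "/home")]

# Count page views per page: emit (page, 1), then sum the ones.
views = map_reduce(log, lambda rec: [(rec[1], 1)], lambda key, vals: sum(vals))
print(views)  # {'/home': 3, '/profile': 1}
```

Because the mapper and reducer only ever see one record or one key's values at a time, a real framework can run many copies of each in parallel. That independence is what lets the model scale to very large data sets.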
The library includes more than 30 ready-to-use advanced analytics packages and more than 1,000 MapReduce-ready functions, which the company says will let organisations run advanced analytics inside large data warehousing environments.
Aster has worked on combining SQL with MapReduce, pairing the ease of use of SQL with the performance of MapReduce. The company claimed that the new suite of functions will significantly extend that capability.
One of Aster Data's customers is the social networking site MySpace. Its chief data architect, Don Watters, has not only used the Aster software but has also helped develop some of its functionality. "Part of the functions we built internally are being released as part of Aster," said Watters.
Watters and his team have also challenged one of the shibboleths of the Internet: the importance of page views. "Some of the work that's been incorporated in the Aster release is some of the stuff we've done on time frames. There's this view that page views is the king. But it's not for us; one user may be on the same page for a long time and the key thing for us is to work out how long someone is on that page," he said.
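One common way to estimate how long someone stays on a page is to treat the gap between a user's page view and their next page view as time spent on the first page. The sketch below assumes that approach. The event format, the `time_on_page` function and the 30-minute session timeout are all hypothetical, and this is not a description of the MySpace or Aster code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event format: (user_id, page, timestamp_in_seconds).
Event = Tuple[str, str, int]


def time_on_page(events: List[Event], session_timeout: int = 1800) -> Dict[str, int]:
    """Sum estimated seconds spent on each page across all users.

    Assumes time on a page = gap until the same user's next page view.
    Gaps longer than session_timeout, and each user's final view, are
    skipped because the real leaving time is unknown.
    """
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, page, ts in events:
        by_user[user].append((ts, page))

    totals: Dict[str, int] = defaultdict(int)
    for views in by_user.values():
        views.sort()  # order each user's views by timestamp
        # Pair every view with the same user's next view.
        for (ts, page), (next_ts, _) in zip(views, views[1:]):
            gap = next_ts - ts
            if gap <= session_timeout:
                totals[page] += gap
    return dict(totals)


events = [
    ("u1", "/home", 0), ("u1", "/video", 10), ("u1", "/home", 610),
    ("u2", "/home", 0), ("u2", "/profile", 5),
]
print(time_on_page(events))  # {'/home': 15, '/video': 600}
```

In this toy data, /home gets three page views but only 15 seconds of estimated viewing time, while /video gets a single view and 600 seconds. A page-view count alone would hide that difference.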
MySpace has plenty of experience to bring to the table. Watters said that the company handles a whopping six billion records a day, amounting to between two and three terabytes of data. With the site expected to grow to 10TB of data a day, the volume of queries will grow with it. He added that Aster's parallelised data loading, and the way it scales analytics as well as processing, made it a formidable tool.
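As a rough check on those figures, assuming decimal terabytes (1TB = 10^12 bytes), six billion records a day at two to three terabytes works out to an average of roughly 333 to 500 bytes per record. If that average record size stayed the same, which is an assumption, 10TB a day would correspond to roughly 20 to 30 billion records a day:

$$
\frac{2\times10^{12}\ \text{bytes}}{6\times10^{9}\ \text{records}} \approx 333\ \text{bytes/record},
\qquad
\frac{3\times10^{12}\ \text{bytes}}{6\times10^{9}\ \text{records}} = 500\ \text{bytes/record}
$$

$$
\frac{10^{13}\ \text{bytes}}{500\ \text{bytes/record}} = 2\times10^{10}\ \text{records},
\qquad
\frac{10^{13}\ \text{bytes}}{333\ \text{bytes/record}} \approx 3\times10^{10}\ \text{records}
$$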
Among the new tools from Aster are Text Analysis, which lets customers count word occurrences and track the positions of words and multi-word phrases, and Cluster Analysis, which groups data into naturally occurring clusters.
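The kind of work the Text Analysis tool is described as doing can be illustrated with a minimal sketch that counts word occurrences and records where each multi-word phrase (n-gram) starts. The tokenising rule and the `tokenize` and `ngram_positions` helpers are hypothetical simplifications and do not reflect Aster's actual functions or API.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical rule: lower-case, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_positions(tokens: List[str], n: int) -> Dict[Tuple[str, ...], List[int]]:
    # Map each n-word phrase to the token offsets where it starts.
    positions: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    return dict(positions)


text = "Page views matter, but time on page matters more than page views."
tokens = tokenize(text)
word_counts = Counter(tokens)          # occurrences of each single word
bigrams = ngram_positions(tokens, 2)   # positions of each two-word phrase

print(word_counts["page"])             # 3
print(bigrams[("page", "views")])      # [0, 10]
```

Setting `n=1` gives the positions of single words. At warehouse scale, the same counting would typically be split into map and reduce steps so that each document can be processed independently.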