High-performance database provider Vertica Systems has dismissed a rival's claims over data loading speeds, boasting that its own figure is properly benchmarked and therefore transparent and open.
Earlier this month, Greenplum released new technology which it said could speed the loading of data into large-scale databases without compromising overall performance. Indeed, Greenplum pointed to one of its customers, who said he was achieving production loading speeds of over 4TB per hour.
"This is definitely the fastest in the industry," said Greenplum's Ben Werther, director of product marketing, at the time. "Netezza for example quotes 500GB an hour, and we have not seen anyone doing more than 1TB an hour."
But rival outfit Vertica has taken exception to this. It points to a benchmark figure it set in collaboration with HP in December last year, in which Syncsort's data integration product, DMExpress v4.8, extracted, transformed, cleansed and loaded 5.4TB of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds. The data was generated using the data generation tool of the TPC-H benchmark.
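To put the benchmark figure on the same footing as the per-hour rates quoted by Greenplum, the elapsed time can be converted into an hourly load rate. The short Python sketch below does only that arithmetic; the function name `tb_per_hour` is an illustrative choice, not something either vendor published, and the result says nothing about how comparable the underlying workloads are.

```python
def tb_per_hour(terabytes, minutes, seconds=0.0):
    """Convert a volume loaded in a given elapsed time to terabytes per hour."""
    elapsed_hours = (minutes * 60 + seconds) / 3600
    return terabytes / elapsed_hours


# Figures as reported in the article: 5.4TB in 57 minutes 21.51 seconds.
vertica_rate = tb_per_hour(5.4, 57, 21.51)

# Rates quoted by Greenplum, in TB per hour. The Greenplum figure was
# stated as "over 4TB", so 4.0 is a floor rather than a measured value.
quoted_rates = {
    "Greenplum customer (claimed, at least)": 4.0,
    "Netezza (as quoted by Greenplum)": 0.5,
}

print(f"Vertica/Syncsort benchmark: {vertica_rate:.2f} TB/hour")
for name, rate in quoted_rates.items():
    print(f"{name}: {rate:.2f} TB/hour")
```

On these numbers the benchmark works out to roughly 5.65TB per hour, against the "over 4TB per hour" cited for the Greenplum customer.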
"Fundamentally, we are a relational DBMS (database management system)," said Dave Menninger, VP of marketing and product management for Vertica. "But under the covers we do things differently with the data, to improve performance."
"It is possible he [Greenplum's Ben Werther] was ignoring our benchmark when he made that claim, but I suspect he probably knew about it," said Menninger. "We ran a test, published the results and let everyone know the specifications of the test itself."
"Greenplum claims are incomplete," Menninger added, citing the lack of knowledge about the specifications of the machines involved in Greenplum's claims and pointing to the full disclosure of Vertica's benchmark.
But Greenplum soon hit back. "I was aware of their [Vertica's] benchmark, but I was referring to real world usage," said Werther, responding to Vertica's comments. "A number of people have been fairly amused by this. Our focus is on the customer doing something real, not targeting high loading speeds. Ours is a real world system. Vertica's numbers are devoid of any customer references."
Werther said that Vertica's figures were a classic benchmark, in which the database is assumed to hold 7TB to 8TB but starts empty, with clean data to be loaded. "There is not even redundant RAID for storage [in their benchmark]," said Werther. "We are loading real data, data that is messy; it is not clean but has to be worked on." He said that with Vertica's benchmark, they know the sort order of the data, which ensures they get clean figures. "It is a benchmark, and they tuned and tweaked it, to get a good number," he said.
So how does Vertica respond to charges that benchmarks are artificial and do not reflect real world scenarios? "That is a potentially valid criticism, that is why disclosure is so important," said Menninger. "People can understand what has been done, any special tweaks etc, so it is transparent and open and that has value. Benchmarks are important as they provide direction and give some sense of what is possible."
But Werther disagrees, arguing that real-world customers quoting figures are of more use than artificial benchmarks. "In our case, it was a customer putting their name on the line and saying those figures. With the Vertica disclosure, yes it published the way it was set up, but it took me a couple of hours to decipher what they were doing." He said he could not imagine a user being so patient.
"We have great customers and we love to use them to showcase what they are doing, which speaks far more than artificial numbers someone else can generate," concluded Werther.
Werther pointed to a paper by Professor Joe Hellerstein (University of California, Berkeley) which provides a much more detailed analysis of the work that Greenplum has done with Fox Media, the customer concerned.