Techworld readers will probably already know about the Sun/Greenplum announcement, which described the launch of a data warehouse appliance built from open source software and Sun's X4500 server, code-named Thumper.
Sean Jackson is the recently appointed marketing manager for Kognitio, a data warehousing product vendor. He is concerned that all was not quite what it seemed in the announcement.
He said he "would like to take this opportunity to let you ‘see’ what has in fact been left out of the press release."
What does he mean?
"Because of the way it is worded, and especially for non-technical readers, the numbers in this announcement might look pretty amazing," he said. In fact, though, we need "to read between the lines."
Jackson says the Sun/Greenplum announcement compares apples and pears by mixing two sets of information: Greenplum’s Bizgress database solution and Sun's Thumper platform.
In more detail he said: "The text says that Greenplum can scan data at a rate of 1TB/minute; this is probably true in a lab test environment but Greenplum certainly cannot scan at this rate on the 24TB Thumper system for $70k. Greenplum should substantiate this claim with real-world examples."
We described the combined system as being able to scan data at the rate of 1TB/minute. Not so, according to Jackson, not at all.
Here is what the Sun/Greenplum release actually stated: "The data warehouse appliance capitalises on the incredibly high data throughput and storage density of the Sun Fire X4500 x64 server with AMD Opteron processors and Greenplum's Scale-Everything parallel database architecture, to move processing near the storage, dramatically boosting performance 10 to 50 times over existing systems.
"With its unique Query-In-Storage design, the solution is capable of scanning 1 terabyte of data in 60 seconds..."
That's pretty clear-cut. The 'solution', that is, the Sun X4500 and the Greenplum database, can scan 1TB of data in 60 seconds.
Jackson says simply that it cannot, and quotes Sun's own X4500 read rate of 'approximately 2GBps from disk to memory'.
Jackson said: "This (X4500) system has two dual-core processors; this is the equivalent of one blade server with that blade having to deal with 24TB of data. Sun’s website states that the Thumper system can read data from disk at a rate of 2GB/second. This equates to over 3 hours to perform a single table scan, always assuming the processors could cope with that data rate. Remember, most analytical queries require multiple table scans and the generation of numerous intermediate result sets."
At that rate, the X4500 would take about 8.3 minutes to scan 1TB of data, more than eight times longer than the release stated.
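The arithmetic behind these figures can be checked with a short back-of-envelope sketch. It assumes decimal units (1TB = 1,000GB) and a sustained 2GB/s disk-to-memory rate, the figure Jackson attributes to Sun, and it ignores CPU limits, caching and compression, so it is a rough check rather than a benchmark.

```python
# Back-of-envelope scan times for the X4500, assuming decimal units
# (1 TB = 1,000 GB) and a sustained disk-to-memory rate of 2 GB/s.
READ_RATE_GB_PER_S = 2.0

def scan_seconds(data_tb: float, rate_gb_per_s: float = READ_RATE_GB_PER_S) -> float:
    """Seconds needed to read data_tb terabytes once at the given rate."""
    return data_tb * 1000 / rate_gb_per_s

full_scan_hours = scan_seconds(24) / 3600   # whole 24TB system, one pass
one_tb_minutes = scan_seconds(1) / 60       # the 1TB quoted in the release
slowdown = one_tb_minutes / 1.0             # release claimed 1 minute per TB

print(f"24 TB scan: {full_scan_hours:.2f} hours")
print(f"1 TB scan: {one_tb_minutes:.2f} minutes")
print(f"vs. claimed rate: {slowdown:.1f}x slower")
```

With binary units (1TB = 1,024GB) the times rise by about 2 per cent, which does not change the conclusion.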
It gets worse, though: "However, that is academic, as a two-processor node could never process database rows at anything like that data rate. Processing database rows as they are read from the disk - to satisfy an analytical query - is CPU-intensive.
"On average Kognitio’s WX2 has a CPU core for every 36GB of data, while this (Sun/Greenplum) solution has a CPU core for every 6TB of data; thus the Sun/Greenplum solution is slower than Kognitio’s WX2."
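The 6TB-per-core figure follows directly from the X4500's two dual-core processors sharing 24TB. The sketch below reproduces it and sets it against the 36GB-per-core average Jackson quotes for WX2; it assumes decimal units and that all four cores serve the full 24TB, and it compares data per core only, not measured query performance.

```python
# Data per CPU core, assuming the X4500's two dual-core Opterons
# (4 cores in total) serve its full 24 TB, using decimal units.
SYSTEM_TB = 24
CORES = 2 * 2  # two processors, two cores each

tb_per_core = SYSTEM_TB / CORES        # 6.0 TB per core
gb_per_core = tb_per_core * 1000       # 6,000 GB per core
wx2_gb_per_core = 36                   # Jackson's quoted WX2 average

ratio = gb_per_core / wx2_gb_per_core  # how much more data each core must handle
print(f"{tb_per_core:.0f} TB per core, {ratio:.0f}x more data per core than WX2")
```

On these figures each X4500 core is responsible for roughly 167 times as much data as a WX2 core.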
Jackson thinks that the Greenplum/X4500 system simply doesn't have enough CPU horsepower: "If an organisation requires performance they need processing power. The Greenplum/Sun announcement is all about how cheaply you can store the data, not how you can effectively use it.
"These are big searchable storage systems with lots of disk but only a minimum of processing power. They are 42-ton juggernauts with Fiat Panda engines. The logic is simple; if you don't have the processing power, you can't do the analytics. There is no way around this."
He says more processing power could be added to the Greenplum system: "Now, Greenplum’s Bizgress is a massively parallel processing (MPP) solution, so you can scale the number of servers and get more processing power. But this seems a very expensive platform to do this on.
"A 12TB system costs $32,000 and the drives account for about $5,000, so the server costs $27,000. Compare that to the equivalent dual-processor, dual-core blade servers that Kognitio WX2 would use, priced at $8,000, and you can see the price difference."
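Jackson's price comparison reduces to two subtractions and a ratio. The sketch below uses only the dollar figures he quotes; they are his estimates, not list prices confirmed by Sun, and the comparison assumes, as he does, that one X4500 server and one dual-processor, dual-core blade offer equivalent processing power.

```python
# Jackson's price breakdown, using only the figures he quotes (US dollars).
system_price = 32_000   # 12TB X4500 configuration
drive_cost = 5_000      # his estimate for the drives
blade_price = 8_000     # dual-processor, dual-core blade for WX2

server_price = system_price - drive_cost  # 27,000 for the server alone
premium = server_price - blade_price      # 19,000 extra per server
ratio = server_price / blade_price        # about 3.4 times the blade price

print(f"server ${server_price:,}, ${premium:,} more than a blade ({ratio:.1f}x)")
```

On his numbers, each unit of X4500 processing costs roughly three and a half times as much as the blade he proposes.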
His summing up is: "It would appear that the Sun/Greenplum announcement is a pricey solution for not much performance. Something users should be aware of if considering this solution."
Sun was asked to comment on the points described above but was unable to do so.