I received a note from Robert Gaddis of ORtera. It said: "Your recent (piece) regarding HP's AppIQ technology appropriately applauds their initiative to simplify the data center professional's job by collecting information under a single pane of glass. Of course, there are necessary compromises in this implementation. I think this is revealed most vividly when one compares the information presented in SRM products like AppIQ with the information and heuristics in ORtera Atlas, a new product offered by my company."
Rob Gaddis is the CEO and chairman of ORtera, a privately held startup founded last year. Dave Fisk is the chief scientist and vice chairman. He was previously chief knowledge officer and scientist at Imperial Technology, and a principal scientist at Sun Microsystems' Network Storage unit.
ORtera's Atlas product monitors I/O activity on a system, then analyses the results and applies intelligence (heuristics) to the data. A great description of what it does, and what you can do with the data, comes from a web log maintained by Adrian Cockcroft, a Solaris tuner. I've simply copied his web log entry.
The web log
Monday, August 15, 2005
I wrote a while ago about Dave Fisk's ORtera Atlas tool for storage analysis. I recently had a chance to use a beta release of Atlas on a real problem, and they are about to do a GA release, so it's ready for prime time.
Like most tools, it can produce masses of numbers and graphs, but compared to other storage analysis tools I've seen it goes further in three ways:
1) It collects critically important data that is not provided by the O/S
2) It processes the data to tell you exactly what is wrong
3) It runs heuristics to tell you how to fix the problem
I wish more tools spent this much effort on solving the actual problem rather than making pretty graphs that only an expert would understand.
What we actually did was run the tool on a pre-production Oracle system using Veritas Filesystem and Volume Manager with Solaris on a SAN connected to a Hitachi storage array. Atlas starts off by looking at all the active processes on the system, and ignoring any that are not doing any I/O. It collects data on which files are being read or written by which process, and what the pattern and sizes are at the system call, file system and device level. You can also set the tool to focus on a set of devices, and gather information on the processes that actually talk to those devices.
Atlas immediately pointed out that two volumes had been concatenated to form a filesystem, and that 98 percent of the accesses were to one of the volumes. It recommended that the volumes be striped together for better overall performance.
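To illustrate the kind of imbalance described here, the following is a minimal sketch and not Atlas's actual method. It computes each volume's share of total accesses and flags any volume above a threshold. The volume names, the counts and the 0.9 threshold are hypothetical, chosen only to mirror the 98 percent figure.

```python
def access_shares(io_counts):
    """Return each volume's fraction of the total I/O operations."""
    total = sum(io_counts.values())
    if total == 0:
        return {name: 0.0 for name in io_counts}
    return {name: count / total for name, count in io_counts.items()}


def skewed_volumes(io_counts, threshold=0.9):
    """List volumes that receive more than `threshold` of all accesses."""
    shares = access_shares(io_counts)
    return [name for name, share in shares.items() if share > threshold]


# Hypothetical per-volume operation counts (assumed, not measured data).
counts = {"vol01": 9800, "vol02": 200}
shares = access_shares(counts)
hot = skewed_volumes(counts)
print(shares)
print(hot)
```

Per-device operation counts of this sort can be gathered from iostat, which is why this first problem is the one that could have been found without Atlas.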
It also pointed out that some of the I/O accesses were taking two seconds to complete at the filesystem level, but only two milliseconds at the device level. I guessed this was CPU starvation caused by fsflush running flat out on this machine which had over 50GB of RAM. Adding set autoup=600 to /etc/system and rebooting made the problem go away. We also saw this effect in the terminal window, where our typing would stop echoing for a few seconds every now and again. I've been told by Sun that the very latest patches finally fix fsflush so that it can't use a lot of CPU time, so large memory machines will finally work properly without needing this tweak.
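The gap between the two layers is what gives this problem away. As a rough sketch, assuming latency figures for both layers are already to hand, the comparison could look like the following. The function names and the 100x threshold are hypothetical and are not taken from Atlas.

```python
def latency_ratio(fs_latency_ms, dev_latency_ms):
    """Ratio of filesystem-level latency to device-level latency."""
    if dev_latency_ms <= 0:
        raise ValueError("device latency must be positive")
    return fs_latency_ms / dev_latency_ms


def stalled_above_device(fs_latency_ms, dev_latency_ms, ratio_threshold=100.0):
    """True when the delay is mostly added above the device layer."""
    return latency_ratio(fs_latency_ms, dev_latency_ms) >= ratio_threshold


# Figures from the text: two seconds at the filesystem, two milliseconds at the device.
ratio = latency_ratio(2000.0, 2.0)
print(ratio, stalled_above_device(2000.0, 2.0))
```

A large ratio says only that the time is being lost somewhere above the device. Tracing it to fsflush still took an experienced guess.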
Finally Atlas showed that the filesystem block size was set too small and Oracle was doing large reads that were being chopped into smaller reads by the filesystem layer before being sent to the device. It gave a specific recommendation for the block size that should be used. Reconfiguring the disks takes a long time to do, but we'll fix it before it goes into production.
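The cost of a small block size can be estimated with simple arithmetic. The sketch below uses a simplified model in which every application read is split into block-sized device reads, with no coalescing or read-ahead. The 1MB request size and 8KB block size are hypothetical and are not the values from this system.

```python
def device_reads(request_bytes, block_bytes):
    """Number of block-sized device reads needed to satisfy one request."""
    if request_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division using integers only.
    return -(-request_bytes // block_bytes)


# Hypothetical sizes: a 1MB application read on a filesystem with 8KB blocks.
small_block = device_reads(1024 * 1024, 8 * 1024)
# The same read when the block size matches the request size.
matched_block = device_reads(1024 * 1024, 1024 * 1024)
print(small_block, matched_block)
```

Under these assumptions, one large read becomes 128 device reads with the small block size and a single read once the sizes match.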
We could have figured out the concatenation problem using iostat data, but the other two problems are normally invisible, and the topic of what filesystem block size to use can generate masses of discussion and confusion, so having "virtual Dave Fisk" tell you what blocksize to use can save a lot of time :-)
You can't do this with Storage Essentials
HP's clever Storage Essentials product doesn't provide the level of tuning analysis Atlas delivers. Atlas seems to have put into code the capabilities of a hugely experienced storage I/O specialist.
There is a 30-day demonstration available from ORtera. Atlas is intended for use by IT specialists, vendors, value-added resellers, and IT consultants. It looks like a beautifully capable product that opens windows into storage I/O systems in a way no other tool can. If you need to delve deep into storage I/O then try it out. It could be the best storage I/O analysis tool available.