Michael Glenn was wasting storage, and he knew it.
A document scanning project had created a single, 1.4TB LUN of old court records. Glenn, senior IT manager for a US court, knew that only 6% of the files had been accessed in the past year and that the rest shouldn't be on expensive Fibre Channel disk.
His challenge was determining which 94% he could move at any given time to slower, less expensive Serial ATA (SATA) drives. It turned out he already had the software he needed: Dynamic Storage Technology (DST), part of his Novell Open Enterprise Server 2 implementation, which let him create and automatically execute file-movement policies based on when files were last accessed.
After spending a week tweaking the configurations last spring, Glenn says, "I just let it alone. It's been working great," freeing up at least a dozen Fibre Channel drives. By reducing the number of active files, he also cut his daily backup time from 14 hours to 47 minutes.
Installation was simple. Configuration required just migrating the old LUN to the SATA drives, renaming it, creating a smaller LUN on Fibre Channel to replace it, and designating the new LUN as the primary volume and the old one as the shadow. "Then I started setting up the migration rules," Glenn says. There was no extra cost for DST, he adds, and he estimates he saved about $140,000 through reduced demand for disk drives and power.
Glenn is one of the early beneficiaries of a new technology called automated data tiering, which automates not just the movement of data, but also the task of monitoring how data is being used and determining which data should be on which type of storage. Such automated tiering isn't yet in the mainstream because few vendors offer the technology and it hasn't been proved to work in very high-end, transaction-intensive environments. Also, it's typically used only within a single vendor's arrays or file system or supports only a limited number of storage protocols or topologies. But for organisations with simpler needs, the automated tiering tools available today are more than good enough.
How Tiering Became Automated
"Tiering" means moving data among various types of storage media as demand for it rises or falls. Moving older or less frequently accessed data to slower, less expensive storage such as SATA drives or even tape can reduce hardware costs, while putting the most frequently accessed or most important data on faster, more expensive Fibre Channel drives or even solid-state drives (SSD) boosts performance. Finally, automating the entire process prevents it from getting bogged down in the data classification and policy-setting that hampered earlier "tiering" efforts such as information lifecycle management (ILM).
Ready, Set, Implement?
Think your organisation is ready to tap into the benefits of automated data-tiering technologies? Consider these issues first:
- Does it provide the mix of file- and block-level tiering you require?
- Can you override the automatic tiering for performance or data-retrieval reasons?
- Does it support features such as thin provisioning or deduplication if you're using them?
- Does it, or will it, support sub-LUN tiering?
- Does the vendor provide a growth path for further automation?
Storage administrators have long been able to move data between tiers, but they had to manually initiate the process, or at least classify their data and create tiering policies upfront. While some policy creation is still required, the latest crop of automation products is designed to reduce or eliminate the need for staffers to monitor storage systems, identify the specific files, volumes or blocks that need retiering, and move them manually.
IT managers must first look at which criteria the software can consider (such as how often data is accessed) and whether it can evaluate and move individual blocks or files rather than just larger volumes or LUNs. Since as little as 10% of the blocks in a volume may be active enough to justify a move to faster, more expensive storage, you'll save money if you can move just those, especially if you're moving to expensive SSDs.
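The economics of block-level (sub-LUN) tiering can be sketched as follows. This is a hedged illustration of the general idea, not any vendor's algorithm: track per-block access counts and promote only the blocks whose activity crosses a threshold, instead of moving the whole volume to SSD.

```python
from collections import Counter

class BlockHeatMap:
    """Tracks per-block access counts to pick promotion candidates."""

    def __init__(self):
        self.hits = Counter()

    def record_read(self, block: int) -> None:
        self.hits[block] += 1

    def promotion_candidates(self, min_hits: int) -> list[int]:
        """Return the blocks busy enough to justify a move to faster storage."""
        return sorted(b for b, n in self.hits.items() if n >= min_hits)

# Example: a volume where only blocks 2 and 7 are busy.
heat = BlockHeatMap()
for _ in range(50):
    heat.record_read(2)
for _ in range(30):
    heat.record_read(7)
heat.record_read(9)  # a single touch: not worth SSD space
```

With a threshold of 10 hits, only blocks 2 and 7 qualify for promotion; the rest of the volume stays on cheaper disk, which is the saving the 10% figure above points to.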
Other factors to consider include how quickly the software can detect and react to changes in data usage, and whether administrators can override the automated tiering if it interferes with application performance. Administrators can also use the software to pre-stage data they know will be needed (such as accounting files for the quarterly close), so it's already on faster storage before demand spikes. Finally, administrators need to decide how comfortable they are ceding control to an automated tool.
While IT shops have struggled for years to implement ILM, several users of automated data tiering say they're realising significant benefits with software that's currently available.
Sandee Sprang, director of IT for a US attorney general's office, set up a storage-area network with automated tiering using Compellent's Data Progression about five years ago, because she didn't have the staff "to determine what type of records needed to be on the most efficient storage for fast access." Determining the policies for the Compellent system took about four hours, and "the benefits have just been phenomenal," she says, noting that storage management time dropped from as much as 24 hours a week to two hours.
Compellent's block-level tiering also helps maximise disk usage, she says, and it "doesn't mean the entire case file is migrating up and down the tiers", just "the one brief you're accessing or one transcript from 15 years ago."
Brian Nielsen, technology systems architect at the Salk Institute's Computational Neurobiology Laboratory, works in a scientific computing environment with highly variable workloads and therefore prizes the real-time analysis and retiering provided by Avere's network-attached storage appliances. Before he tested, and eventually purchased, the appliances, he says, it was a challenge to move data and to identify which data to move.
Unlike earlier ILM products, which retiered data only sporadically and did so based only on when it was last accessed, the Avere system can "account for many different file I/O attributes and dynamically tier [data]" as application demands change, says Nielsen.
Brian Bosserman, network and systems operations manager at Foster Pepper PLLC, is experimenting with EMC's Fully Automated Storage Tiering (FAST) technology on the EMC Celerra NS-480s he runs in his firm's offices. He estimates that it will save 10% of the time he now spends monitoring his servers' storage demands and then planning and executing the retiering of virtual machines among them. With FAST, he says, he hopes to let EMC's Rainfinity File Management Appliance do the monitoring and moving "based on policies I give it."
Installing FAST "was very straightforward," says Bosserman. "It comes as a VMware [virtual] appliance. I just imported the FAST appliance, started it up as a Unix box, then got into it through the web interface and managed it and set it up from there."
However, automated data tiering does require some upfront effort classifying data and setting the policies that determine when certain types of data need to be moved (based on age of the data, application performance, or legal and regulatory requirements). Conventional wisdom says all that work crippled earlier "tiering" approaches such as ILM. But at least one major user, Intel CIO Diane Bryant, is putting a formal ILM process in place before looking into automated tiering. Bryant began an ILM effort last year to cut Intel's 35% compound annual growth in storage needs, and so far 40% of the company's structured data and 30% of its unstructured data is governed by ILM.
Sanford Coker, Unix clinical team lead and senior Unix administrator at Weill Cornell Medical College, is starting to use 3Par's Policy Advisor in his development and test environment. Installation was easy, he says, and creating each policy takes only about 30 minutes, although tweaking them for optimum performance takes another week or so. He says his "very conservative" estimate is that he can cut his use of Fibre Channel disk by about 25% by moving data onto less-expensive, higher-capacity SATA disk.
As it matures, automated data tiering could help drive adoption of SSD, because it will let administrators tune their tiering finely enough to get the maximum benefit from the highest-performing, and most expensive, storage medium. But for now, according to storage administrators, vendors and analysts, SSD is too expensive for most mainstream users.
Reichman says it's still more cost-effective to trade space for performance by "short-stroking" disk drives, intentionally using only part of their capacity to improve performance. Pricing for tiering capabilities ranges from free, for software bundled with existing products, to more than $50,000 for systems such as Avere's 2300 FCN. Users must, of course, also factor in the cost of classifying data and creating tiering policies.
Major vendors such as EMC are also working to make automated data tiering more "application-aware," meaning that the software will understand the I/O demands and other usage patterns of popular applications and automatically retier to meet those needs. Such interoperability will require standards for the information about the data being retiered. One such metadata standard is being developed by the Storage Networking Industry Association.
Those standards could pave the way for easier tiering across devices or file systems made by different vendors. They could also make it possible to tier data between an in-house data center and storage in the cloud.
Until then, the early wave of automated data-tiering products is already taking some of the work out of putting the right data on the right storage medium at the right time.