Tiered storage is getting a lot of press of late as CIOs struggle to cope with the increasing cost of data storage infrastructure and the ever-growing volumes of bits to store. Depending on the analyst report one reads, storage gear represents between 33 cents and 75 cents of every dollar spent on IT hardware today. Analyst projections regarding capacity demand growth trends are downright frightening.
In 2011, IDC said that 21 exabytes of external storage were deployed worldwide and that this capacity would grow by 30% to 40% per year through the middle of the decade. Last year, citing the impact of server virtualization, the company revised its estimate to suggest that growth rates would be closer to 350%. At about the same time, Gartner Inc. argued that the increasing virtualization of servers would bring a 600% spike in storage capacity demand.
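To see why even the lower projection worries CIOs, compound the IDC baseline forward. This is a quick sketch that assumes growth compounds annually from the 2011 figure and treats 2015 as "the middle of the decade"; the function name is illustrative.

```python
def project_capacity(base_eb: float, annual_growth: float, years: int) -> float:
    """Compound a starting capacity (in exabytes) at a fixed annual growth rate."""
    return base_eb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # 21 EB of external storage deployed in 2011, per the IDC figure above.
    for rate in (0.30, 0.40):
        for year in range(2011, 2016):
            eb = project_capacity(21.0, rate, year - 2011)
            print(f"{rate:.0%} growth, {year}: {eb:.1f} EB")
```

At 30% per year the installed base nearly triples to about 60 EB by 2015; at 40% it reaches roughly 81 EB.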
While the analysts debate their projections, CIOs are left with a more practical challenge -- how to accommodate growing data with a flat or declining budget. The simplest answer combines storage tiering and storage reclamation. The former is garnering considerable attention in vendor circles because of its potential to help sell more gear; the latter, nobody wants to talk about.
Tiering refers both to an architecture for storage (traditionally one in which storage technologies are grouped by performance, capacity and cost characteristics) that provides a "laddered" platform for storing data over time, and to the process used to move data between tiers. That process assumes that re-reference rates for data decline over time, which is usually true.
Typically, a new file is accessed concurrently and frequently in the hours or days after an application or end user first writes and saves it. By the end of the week or month, however, accesses may drop to zero. Hence, early on, the file may be placed on expensive, fast storage (typically with limited capacity) so it can be accessed and used efficiently. As access frequency diminishes, the data is moved to less expensive, slower and more capacious storage that is better suited to its usage characteristics.
Ultimately, "cold data" -- that is, data that no longer sees any access requests but that nonetheless must be retained for legal, historical or business reasons -- is moved to extremely inexpensive, very high capacity storage with very slow access speeds. That, traditionally, was the role of tape or optical disk technology.
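The age-based placement described above can be expressed as a small policy function. This is a hypothetical sketch: the tier names and the 7- and 90-day thresholds are illustrative assumptions, not figures from any product or from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: float  # data idle longer than this falls through to the next tier


# Hypothetical ladder: fast/expensive, then capacity disk, then cold archive (tape/optical).
TIERS = [
    Tier("fast", 7),
    Tier("capacity", 90),
    Tier("archive", float("inf")),
]


def choose_tier(days_since_last_access: float, tiers: list[Tier] = TIERS) -> Tier:
    """Return the first tier whose idle threshold the data has not yet exceeded."""
    for tier in tiers:
        if days_since_last_access <= tier.max_idle_days:
            return tier
    return tiers[-1]
```

For example, a file last touched two days ago maps to `fast`, one idle for a month maps to `capacity`, and one untouched for a year lands in `archive`.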
Data roach motels: Data checks in, but it can't check out
Moving data between tiers can be accomplished manually by running periodic reports against a file system to identify files that haven't been accessed in 30, 60 or 90 days (file metadata usually contains DATE LAST ACCESSED and DATE LAST MODIFIED information), then moving those files to the appropriate storage tier. Alternatively, hierarchical storage management (HSM) software, which is widely available, can automate this process. For the most granular control over data movement, archiving software may be used. One potential benefit of using archive software (besides setting business criteria for data movement and retention) is that data movement can be made transparent to the user: Some archive software leaves behind a stub for a file that is shifting tiers, so users believe they are still accessing the file in the original location where it was saved.
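The manual approach can be sketched in a few lines. This assumes a POSIX file system where DATE LAST ACCESSED corresponds to `st_atime`; note that access times may be stale on volumes mounted with `noatime` or `relatime`. The stub is modeled here as a plain symbolic link, a stand-in assumption for the vendor-specific stubs archive products create. Function names are hypothetical.

```python
from __future__ import annotations

import os
import shutil
import time

DAY = 86400
BUCKETS = (90, 60, 30)  # report thresholds in days, largest first


def stale_file_report(root: str, now: float | None = None) -> dict[int, list[str]]:
    """Group files under root by the largest idle threshold their last access exceeds."""
    now = time.time() if now is None else now
    report: dict[int, list[str]] = {days: [] for days in BUCKETS}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip stubs left by earlier migrations
            idle_days = (now - os.stat(path).st_atime) / DAY
            for days in BUCKETS:
                if idle_days >= days:
                    report[days].append(path)
                    break
    return report


def migrate_with_stub(path: str, target_dir: str) -> str:
    """Move a file to a slower tier and leave a symlink 'stub' at the old location."""
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, os.path.basename(path))
    shutil.move(path, dest)
    os.symlink(dest, path)  # users keep opening the original path
    return dest
```

Applications that open the original path follow the link transparently; a production HSM or archive product would also track the move in a catalog and handle recall.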
Architecturally and procedurally, challenges to tiering include proprietary elements in some storage arrays that allow data to be written to the array but restrict the migration of data out of the array. These "data roach motels" (the data checks in, but it can't check out) are designed by their vendors to encode data into a proprietary storage scheme (using encryption or deduplication or content indexing) that cannot be undone without the purchase of additional software.
With many recent products, vendors have sought to co-opt the traditional storage tiering model by combining flash memory, fast disk and slow disk in the same kit and building HSM software into the array controller to automate data movement among tiers. While such a "one-stop-shop" approach may have appeal, scaling the capacity of any tier -- or scaling beyond the confines of the physical cabinet -- can become technically difficult. Moreover, placing all of the different flavors of storage media in one array usually means that each kind of media costs significantly more than it would if acquired separately.
Storage reclamation gets no respect
For a simple implementation of tiered storage architecture, storage virtualization is a good approach. Storage virtualization technology (DataCore Software's SANsymphony-V and IBM's SAN Volume Controller are two examples) enables heterogeneous arrays to be consolidated under a common controller that can slice and dice them into tiered storage pools from which virtual volumes can be created. HSM software deployed at the virtual storage controller layer can automate data migration in a highly scalable, vendor-agnostic manner.
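The pooling idea can be modeled as a toy data structure. This is a hypothetical sketch: `Array`, `VirtualController` and `migrate` are illustrative names, not the API of SANsymphony-V, SAN Volume Controller or any other product, and the actual data copy is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Array:
    """A physical array from any vendor, assigned to a tier by the administrator."""
    name: str
    tier: str
    free_gb: int


@dataclass
class VirtualVolume:
    """A volume presented to hosts; its extents may span several physical arrays."""
    name: str
    tier: str
    extents: list[tuple[Array, int]] = field(default_factory=list)


class VirtualController:
    def __init__(self, arrays: list[Array]) -> None:
        # Group heterogeneous arrays into one pool per tier.
        self.pools: dict[str, list[Array]] = {}
        for array in arrays:
            self.pools.setdefault(array.tier, []).append(array)

    def pool_free_gb(self, tier: str) -> int:
        return sum(a.free_gb for a in self.pools.get(tier, []))

    def _allocate(self, tier: str, size_gb: int) -> list[tuple[Array, int]]:
        # Carve extents from the pool's arrays in order until the request is met.
        if self.pool_free_gb(tier) < size_gb:
            raise ValueError(f"pool {tier!r} has less than {size_gb} GB free")
        extents: list[tuple[Array, int]] = []
        remaining = size_gb
        for array in self.pools.get(tier, []):
            take = min(array.free_gb, remaining)
            if take:
                array.free_gb -= take
                extents.append((array, take))
                remaining -= take
            if remaining == 0:
                break
        return extents

    def create_volume(self, name: str, tier: str, size_gb: int) -> VirtualVolume:
        return VirtualVolume(name, tier, self._allocate(tier, size_gb))

    def migrate(self, vol: VirtualVolume, new_tier: str) -> None:
        # HSM-style move at the controller layer: hosts keep addressing the same
        # virtual volume while its backing extents change pools.
        size_gb = sum(gb for _, gb in vol.extents)
        new_extents = self._allocate(new_tier, size_gb)  # fails before freeing anything
        for array, gb in vol.extents:
            array.free_gb += gb  # release capacity in the old tier
        vol.extents, vol.tier = new_extents, new_tier
```

Because hosts see only the virtual volume, adding another vendor's array to a pool or moving a volume between pools requires no change on the host side, which is the vendor-agnostic property the paragraph describes.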
Tiering enables greater efficiency in storage, if implemented correctly. It can help bend the storage cost curve by keeping data that does not require high performance off expensive, limited-capacity media.
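A back-of-the-envelope calculation shows how the cost curve bends. The per-GB prices and the 10/30/60 capacity split below are hypothetical, chosen only to illustrate the arithmetic; they are not market data.

```python
import math


def blended_cost(capacity_gb: float, mix: dict[str, float], price_per_gb: dict[str, float]) -> float:
    """Total media cost when capacity_gb is spread across tiers in the given proportions."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("tier shares must sum to 1")
    return sum(capacity_gb * share * price_per_gb[tier] for tier, share in mix.items())


# Hypothetical relative prices per GB (illustrative cost units, not real quotes).
PRICES = {"fast": 10.0, "capacity": 1.0, "archive": 0.1}

if __name__ == "__main__":
    all_fast = blended_cost(10_000, {"fast": 1.0}, PRICES)
    tiered = blended_cost(10_000, {"fast": 0.1, "capacity": 0.3, "archive": 0.6}, PRICES)
    print(f"all on fast tier: {all_fast:,.0f}  tiered: {tiered:,.0f}")
```

Under these assumed numbers, 10,000 GB costs about 100,000 units on the fast tier alone versus about 13,600 units when most of the data sits on cheaper tiers.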
The real gains, however, are realized by leveraging the same process used to identify candidate files for movement in a tiered storage environment to identify duplicates and dreck that can be deleted to free up capacity. Such reclamation strategies get little attention in the press or at storage conferences despite research showing that up to 70% of the capacity of every disk currently deployed could be reclaimed through the proper application of data hygiene and archiving to tape.
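The same file-system scan used for tiering can flag duplicates. A minimal sketch, assuming "duplicate" means byte-for-byte identical content: it groups files by size, then by SHA-256 digest, and estimates the space freed by keeping one copy per group. It deletes nothing; deciding what counts as dreck remains a policy decision. Function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of identical files under root (only sizes seen twice get hashed)."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash: dict[str, list[str]] = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def reclaimable_bytes(groups: list[list[str]]) -> int:
    """Bytes freed if all but one copy in each group were removed."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

Checking sizes first avoids hashing the many files whose size is unique, which keeps the scan cheap on large volumes.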
Bottom line: Storage tiering and storage reclamation could reduce the volume of tears shed by CIOs at budget time.
About the author:
Jon Toigo is CEO and Managing Principal of Toigo Partners International, and Chairman of the Data Management Institute.