The inability to efficiently classify data and the mismatch between structured and unstructured data make managing data lifecycles difficult. As a result, many large firms are compensating by adding storage instead of simply organizing their data better.
But who can point a finger?
Information lifecycle management (ILM), the intelligent storage, filing, retention, updating and final disposition of data, is not the nirvana some would have us believe. There is, however, hope.
The true value of ILM lies in its treatment of data cycling through a company in a manner that meets business needs. That means data is available to employees quickly when they need it and it's deleted according to an equally well-tailored process. Less important data is consigned to cheaper types of storage such as RAID.
The problem, experts say, is the difficulty large organizations have in classifying data. In other words, what goes where and for how long. Unfortunately, too many organizations think the answer is adding storage.
"At most sites, data retention policies are still driven by the [storage] backup schedule," said Barry Runyon, managing director at Gartner Inc. in Stamford, Conn. There are no sophisticated policies in place tying storage to data types to retention policies at most companies.
The importance of ILM and data storage grew sharply with the passage in 2006 of the e-discovery laws. The statutes, which are amendments to the Federal Rules of Civil Procedure, establish that digital documentation and records shall be treated the same as paper documentation.
The Holy Grail of ILM
But with millions of megabytes of data in play, getting structured data (data that resides in applications such as ERP, customer relationship management, financial and point-of-sale systems) and unstructured data (data that's held in Word documents and emails) to ride in tandem has become a major stumbling block.
Unstructured data may be sitting on workstations, PDAs, phones, iPods, personal laptops, blogs, Facebook, voicemail, email, in PowerPoint presentations and even in video. The relationships between bits of unstructured data are currently impossible to consistently classify across data types. In contrast, traditional or "relational" data can be stored in tables, rows and columns.
The second obstacle is the classification conundrum -- and that's a deeper, more intractable problem than the capability to mix and match the two data types, Runyon said.
All of this unstructured data has to be classified, meaning it has to be tagged with descriptors that identify what the material is and why it's important. There are tools to classify certain types of data, but they don't work together. And you can't tell what you need to retain if you don't know what you're storing in the first place. Never mind searching across multiple types of data to locate, for example, all the material in storage related to a particular patient or case, Runyon emphasized.
There has to be a common method for classifying data intelligently so retention policies are in line with the organization's business goals or compliance requirements.
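As a rough illustration of what such a common method could look like, the sketch below tags each record with a classification descriptor and looks up its retention period in a single policy table. The tag names, the retention periods and the `Record` and `is_expired` names are hypothetical and not taken from any product or regulation; the seven-year figure simply echoes the kind of policy Runyon describes.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy table: classification tag -> retention period in years.
RETENTION_YEARS = {
    "patient_record": 7,
    "general_email": 2,
}


@dataclass
class Record:
    name: str
    tag: str        # classification descriptor attached to the data
    created: date


def is_expired(record: Record, today: date) -> bool:
    """Return True once the record has outlived its retention period."""
    years = RETENTION_YEARS.get(record.tag)
    if years is None:
        # Unclassified data has no policy, so it is never flagged for deletion.
        return False
    try:
        cutoff = record.created.replace(year=record.created.year + years)
    except ValueError:
        # Created on Feb. 29 and the cutoff year is not a leap year.
        cutoff = record.created.replace(year=record.created.year + years, day=28)
    return today >= cutoff
```

The point of the sketch is the dependency it exposes: a record without a tag can never be safely deleted, which is why unclassified data tends to pile up.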
Despite the struggles in data classification, sales of ILM-related products continue to rise. Still, most of those sales are driven by storage products such as RAID and optical disk. In fact, Forrester Research Inc. in Cambridge, Mass., predicts that the ILM market will rise from $280 million in 2006 to $1.3 billion by the end of this year, fueled by e-discovery law compliance.
Under the e-discovery law, companies have a 90-day window to retrieve and deliver data and paper documents pertinent to a case.
"In health care, we worry a lot about e-discovery," said Runyon, who focuses on the industry. "If some lawyer comes in who's asking for patient records that go back 20 years and you've deleted them, that could be a problem. But if you've got good, clearly documented retention policies in place that say 'If this person is healthy, we delete the records after seven years,' then you're in the clear.
"Every year, health care organizations go out and buy terabytes of storage, and that just makes the problem worse," he added. The data is stored, but no one can find it.
Mainly manual migration still
At IT headquarters for Gwinnett County in Georgia, there's an appreciation of the ideas behind ILM. But for now, efforts to manage the data lifecycle have been concentrated on storage solutions and tools. Gwinnett County comprises 437 square miles and has nearly 800,000 inhabitants. The county is 30 miles northeast of Atlanta.
"At Gwinnett County, we have approximately 170 TB of data on tape backups, and an additional 150 TB of data, which is immediately accessible via storage arrays, direct-attached storage and network-attached storage," John Matelski, CIO/IT director for the county, wrote in an email. He noted that the IT department works hard to ensure that data is stored efficiently according to retention needs.
But classifying that data based on its business value and then storing it on the appropriate storage tier is still a manual process, Matelski said. "The bottom line is that there are tools to help facilitate ILM, but I've yet to find a tool that provides a way to bridge -- collect and manipulate -- structured and unstructured data," he explained.
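To make the manual step concrete, here is a minimal sketch of the kind of rule an administrator applies by hand when matching data to a storage tier. The value labels, the 30-day and 365-day thresholds and the `choose_tier` name are assumptions for illustration only; the tier names loosely mirror the storage arrays, network-attached storage and tape that Matelski mentions.

```python
def choose_tier(business_value: str, days_since_access: int) -> str:
    """Pick a storage tier from a value label and access recency (assumed rule)."""
    if business_value == "high" or days_since_access <= 30:
        return "primary_array"   # fastest, most expensive tier
    if business_value == "medium" and days_since_access <= 365:
        return "nas"             # mid-priced, still online
    return "tape"                # cheapest, offline backup tier
```

The rule itself is trivial. The hard part, as the article notes, is supplying a trustworthy `business_value` label for every piece of data.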
For giant health care providers, the ability to store and retrieve both types of data is becoming critical. Gartner's Runyon estimated that 80% to 90% of health care data is unstructured. At this point, health care companies both large and small are still buying a lot of storage devices to handle both types of data.
Health care corporations and facilities have a particularly acute case of unstructured data woes, agreed Kirk Mahlen, a former regional CIO at one of the top five ($8 billion per year) religion-affiliated health care giants.
"Document imaging systems are still the main technology in health care. Faxes and paper reports are also still being used," Mahlen said. At his former employer, the next step in its ILM strategy was to take as much unstructured data as possible and make it structured so it could be put into the Oracle database system.
Point solutions toil alone
It's a very fragmented, application-specific approach right now. For example, there are products that link email and file systems. Attachments are stored in emails and on the server. The problem? You end up with several copies of the data.
Deduplication software products have helped quite a bit, but eliminating copies of unstructured data is much more difficult, added Nick Semple, managing director of the knowledge management service at New York-based PA Consulting Group.
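A simplified sketch of the exact-copy case, such as the same attachment sitting in an email store and on a file server, is shown below. It groups items by a SHA-256 hash of their content; the `find_duplicates` name and the sample paths are hypothetical, and commercial deduplication products are not claimed to work this way.

```python
import hashlib
from collections import defaultdict


def find_duplicates(items: dict[str, bytes]) -> list[list[str]]:
    """Group item names whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in items.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

A hash match catches only identical bytes. A lightly edited copy of the same document produces a different hash and goes unnoticed, which is one reason unstructured data is harder to deduplicate.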
Data classification and storage resource management (SRM) tools are useful to a certain extent, Runyon said. "All of the storage vendors have their own SRM tools, but they don't work together," he said.
So from where, and when, will a solution to these issues come?
From either of two camps: data classification tool vendors or SRM providers, Runyon said. These two camps will decide long term whether or not ILM becomes real, he added. In the future, the storage method won't matter. It's likely that data classification tools will feed smoothly into SRM tools down the line.
If the ILM problems aren't resolved, the health care sector faces a looming crisis that will become all too real in three to five years, Runyon said. By that time, compliance requirements will bury health care organizations in data. But, hey, there's no doubt that ILM will succeed eventually, Runyon said. "There's been too much money put into it for it not to succeed."