Data classification vital part of data management solutions

The cost of data management solutions in the cloud and on-premise is still a bit steep for SMBs, but a lack of data classification will cost you more.

Chris Brady, CIO of Dealer Services Corp. in Carmel, Ind., is on the hunt for a data management solution that will simplify -- and speed up -- data retrieval for the car dealership finance organization.

Most of the company's data sits in an ERP system. Brady said she's looking at cloud services and advanced data processing capabilities such as columnar databases and parallel processing to speed up data replication and data transfers, and make copies of data more easily accessible.

The catch? The cost of data transactions to a cloud provider can be high. "The cloud sounds great, and I'd love to get there to take advantage of their processing power, but moving massive amounts of data across bandwidth is a challenge, and [cloud providers] charge for bandwidth in and out of the cloud," Brady said.

She's done the math and found that it's still less expensive to copy and store data in-house.

Yet midmarket companies are gravitating to the cloud for data storage and replication because it offers a quicker way to get functionality they couldn't afford on their own, said Arun Taneja, consultant and founder of storage consulting firm Taneja Group in Hopkinton, Mass. More importantly, it's a cheaper way to establish a data protection and disaster recovery plan, he said.

A cloud provider may give you peace of mind: You get backups every day, and the provider has measures in place to protect and retrieve your data. The caveat is that once your data is on the cloud's platform, you can't run a search on it or classify it, he said.

"You can't apply or set policies for [data] classification or indexing on your data in cloud," Taneja said. "It's the very early stages for solutions like that in the cloud."

But some products are starting to lean in that direction. One is Iron Mountain Inc.'s Virtual File Store (VFS), in which the gateway remains on-premise and points at your network-attached storage (NAS) boxes. A policy engine lets you tell the VFS product what type of information to pull from the NAS boxes into the VFS. "You can set a policy to pull out folders once they are three months old, move it into the VFS box to be cached and moved as a copy back to Iron Mountain's cloud," he said.
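The age-based rule Taneja describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Iron Mountain's policy engine: the `Folder` record, the `folders_to_archive` function, the 90-day reading of "three months" and the sample paths and dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Folder:
    """Hypothetical record for a folder on a NAS box."""
    path: str
    last_modified: date


def folders_to_archive(folders, today, max_age_days=90):
    """Return the folders last modified at least max_age_days before today.

    "Three months" is approximated as 90 days; that is an assumption.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [f for f in folders if f.last_modified <= cutoff]


# Made-up sample data for illustration only.
folders = [
    Folder("/nas/finance/q1-reports", date(2024, 3, 31)),
    Folder("/nas/marketing/current", date(2024, 9, 20)),
]

selected = folders_to_archive(folders, today=date(2024, 10, 1))
print([f.path for f in selected])  # only the folder older than 90 days
```

In a real gateway, the selected folders would then be cached on the appliance and copied to the provider's cloud; that step is left out here because the article gives no detail on how it works.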

The price for VFS is unclear, but it's based on a one-time appliance installation, setup and configuration services engagement; a recurring appliance rental fee and 24/7 support; and a fee based on the user's capacity, according to, a sister site of

Cirtas Systems Inc. offers a similar solution with Bluejet. The Bluejet Cloud Storage Controller, also an on-premise appliance, lets you move data from your storage arrays to Amazon's or Iron Mountain's cloud. The price tag is $70,000 per appliance and includes secure data encryption and automated tiered caching.

On-premise data management solutions that include such features as search, classification, indexing and litigation holds will run you about $100,000, Taneja said. Some vendors in this category include StoredIQ Inc., Kazeon Systems Inc. (which was acquired by EMC Corp. last year), and Autonomy Corp.

"Any of the advanced products, the vendors say that you can buy a clipped version to get started for $50,000, but realistically you're looking at $100,000 as a starting point, and a few hundred thousand to enable searchability, classification and legal holds," he said.

Data classification should be the first class

Before laying down a pile of cash on a data management solution, Taneja advised that companies of any size classify their data first.

"What money and systems can't solve is data classification," he said. "You have to figure out what is the most mission-critical data that you're going to give the most TLC to … the Oracle financial data will be in a different class than some images of your building for a brochure."

The most mission-critical information will reside on the best systems, with the most data protection and the most stringent recovery time objective (RTO) and recovery point objective (RPO).

"You have a weak foundation if you don't start with this basic data classification," he said. "Then you can move to the next level: What data is sensitive, how should I keep litigation in mind, what data has private employee or customer information."
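As a rough sketch of what that two-step classification might look like when written down, the Python below assigns each data set a criticality tier first and sensitivity flags second. The tier names, the flag names and the RPO/RTO hours are placeholders invented for the example, not figures from Taneja or any vendor; only the two sample data sets come from his quote.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical criticality tiers; lower number means more critical."""
    MISSION_CRITICAL = 1
    BUSINESS_IMPORTANT = 2
    LOW_PRIORITY = 3


# Placeholder protection targets per tier, in hours (illustrative only).
PROTECTION = {
    Tier.MISSION_CRITICAL: {"rpo_hours": 1, "rto_hours": 4},
    Tier.BUSINESS_IMPORTANT: {"rpo_hours": 24, "rto_hours": 48},
    Tier.LOW_PRIORITY: {"rpo_hours": 168, "rto_hours": 336},
}


@dataclass
class DataSet:
    name: str
    tier: Tier                               # step 1: how critical is it?
    flags: set = field(default_factory=set)  # step 2: sensitivity, legal hold


catalog = [
    DataSet("brochure_images", Tier.LOW_PRIORITY),
    DataSet("oracle_financials", Tier.MISSION_CRITICAL, {"customer_private"}),
]


def protection_plan(datasets):
    """Order data sets from most to least critical."""
    return sorted(datasets, key=lambda d: d.tier)


for d in protection_plan(catalog):
    print(d.name, d.tier.name, PROTECTION[d.tier], sorted(d.flags))
```

Keeping the tier separate from the flags mirrors the order Taneja recommends: decide what gets the most protection first, then layer on questions of sensitivity and litigation.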

Data classification isn't necessarily easier for smaller companies, many of which are data-intensive despite their size. What will be easier is identifying data stakeholders, a prerequisite to data classification, said Gwen Thomas, founder of The Data Governance Institute LLC in Orlando, Fla.

Stakeholders should be broken out by how the data is used, typically into three categories: operations, analytics and compliance/regulations, she said. And keep in mind that stakeholders may have different policies and might not even be aware of how data is used outside their silos.

"That's why metadata is so important," she said. "The perception is high that assigning the appropriate levels of metadata is a burden, but if you are working in a team you should do it, because the cost of re-creating data is much higher than the aggregate cost of [putting in] metadata."

Let us know what you think about the story; email Christina Torode, News Director.
