Are you dissatisfied with your traditional relational database management system (RDBMS) for business intelligence (BI)?
You’re not alone.
According to Forrester Research Inc., an RDBMS has always been an awkward fit for BI. When you need to find relationships that require analyzing many-to-many correspondences; when the variables themselves aren’t all of the same kind; or when you don’t know, going in, exactly which relationships you’re looking for, traditional spreadsheets and their more sophisticated relational-database progeny come up short. Even if you know what you’re looking for, a traditional RDBMS requires time-consuming tuning to get the job done. That’s just not practical in the modern business landscape. When the questions are changing faster than the BI answers can be provided, it’s time for something new.
In fact, in the search for BI agility, most companies will jettison their current RDBMS over the next decade for BI needs, Forrester BI expert Boris Evelson predicted.
Last week, I spoke to David Gallaher, IT services manager at the National Snow and Ice Data Center, who went to an object-oriented database because a traditional RDBMS was of no use.
“We have tried to shoehorn all kinds of data into these constructs, and now Big Data is where we have really run into the limitation of what you can do with these old constructs, where everything has to fit into a table,” Gallaher told me. “Well, what if my data doesn’t really fit into a table?”
In a report published in May, Evelson discussed several new strategies for extracting relationships out of ever-more-complex data sets, and reviewed four relevant BI database management system technologies that have already arrived or at least are on their way. Here’s the skinny:
Columnar DBMS: Although traditional spreadsheets — still the most popular BI tool — can always analyze a row or a column, the emphasis in some new DBMSes is shifting to the power and flexibility of columnar analysis. Evelson believes there are distinct advantages with a columnar DBMS. It compresses data better than a row-based RDBMS, because everything in a column is of the same type. Indexing is an easier task than it would be in a row-based RDBMS because each column “already represents its own index,” he said. “It can keep the database size roughly equal to that of the raw data set — or sometimes cut it in half,” he added.
Many DBMS vendors already offer columnar or hybrid row-based and columnar systems. They range from such mainstream vendors as IBM (Netezza), Microsoft (PowerPivot), and EMC Corp. (Greenplum) to such pure-play columnar RDBMS vendors as Hewlett-Packard Co. (Vertica), SAP AG, Sybase Inc. (IQ), Infobright Inc., and 1010data Inc.
In-memory index DBMS: This is the most agile and flexible of the four technologies because the entire relational database is either in memory or can be swapped rapidly into memory. That flexibility and agility, however, add risk. One risk is that business users could arrive at a wrong answer because they’re no longer constrained by the rigid data models typical of an RDBMS.
It should also be kept in mind that the functions offered by in-memory vendors vary widely, Evelson warned. Among other questions, business pros should inquire whether an in-memory DBMS can be accessed by their other BI tools. Another issue is that if Big Data is being used, the entire data model might not fit into a single memory space.
When sizing applications for a single memory space, users should consider the size of the raw data set, compression ratios and the number of concurrent users, he advised. If the total exceeds a few hundred gigabytes, he suggested picking a vendor that can “dynamically swap chunks of your model in and out of [random-access memory],” or one of the hybrid in-memory databases. The vendor list includes Tibco Software Inc. (Spotfire), Tableau Software Inc., SAP (HANA), and MicroStrategy Inc., among others.
Inverted-index DBMS: According to Evelson, this is a useful database technology when data is complex, content is unstructured and the user’s hypothesis is vague. By building indexes, an inverted-index BI DBMS upends the RDBMS practice of putting the database first and worrying about tuning it later. “This approach builds one big index, but instead of just pointing to data sources — as traditional search engines like Google and Yahoo do — it embeds data in the index itself,” he explained.
The inverted index works well for applications that use data from a variety of sources and that incorporate structured as well as unstructured content. BI pros should consider an inverted index when a project requires numerous data marts to get around the limitations of traditional and even multidimensional DBMSes. An RDBMS assumes you know what you’re looking for, “but BI end users often don’t,” he noted. This searching allows BI users to navigate through the data in order to zero in on what they want by subtracting what they know they don’t want. Attivio Inc. and Endeca Technologies Inc. offer an inverted-index DBMS.
Associative DBMS: It’s tough to make predictions, especially about the future, as Yogi Berra is said to have noted. That’s why some business users are insisting that everything gets filed away in the data warehouse because who knows when it might come in handy. An associative DBMS attempts to link everything together, allowing any trend to be pulled out at any time. “Imagine a data warehouse that can store all-to-all relationships — associations or vectors — between every entity and every attribute in your domain, with counters, aggregates and indexes for every intersection,” Evelson said. Oh, my! But it will cost you. The factor used to calculate the size of an associative DBMS as a multiple of the raw data set is as high as 10 in the associative databases used in academia, he said.
An associative DBMS also requires purpose-built graphical user interfaces, and is not easily accessed by queries based on the Structured Query Language and the Multidimensional eXpressions language. Rather than think in traditional “where clauses,” associative DBMSes let their imaginations run wild, finding connections and analogies that — you guessed it — don’t necessarily line up neatly by rows. Saffron Technology Inc., Ingres Corp. (VectorWise), Illuminate Solutions Inc. (iLuminate), LazySoft Ltd. (Sentences), and Splunk Inc. (a variation on an associative DBMS) are in the vanguard.