Hortonworks Inc., a major distributor of Apache Hadoop, the open source distributed computing framework used in big data, announced last week that it was going public. The initial public offering comes just three years after the company was spun out of Yahoo. It's an ambitious -- if not unexpected -- move for Hortonworks, analysts said. It may spur more initial public offerings from close competitors MapR and Cloudera, but what does it say about the status of big data?
Maybe less than expected, analysts told The Data Mill. According to the IPO filing, Hortonworks generated $33.4 million in revenue from January through September of this year. That's more than double the $15.9 million in revenue for the same period last year, but more modest than the hype about Hadoop might have suggested, said Nick Heudecker, analyst at Gartner Inc.
"If anything, the revenues reported in the S-1 should reinforce how early the entire Hadoop space is," Heudecker said. Hortonworks also reported $86.7 million in losses for the first nine months of 2014, up from $48.4 million for the same period in 2013.
The results are a "good proxy for enterprise maturity relative to Hadoop," agreed Jeff Kelly, analyst for The Wikibon Project in Marlborough, Mass.
"Hortonworks' support subscription becomes appealing to enterprise practitioners only when they are ready to scale-out pilot Hadoop projects to production deployments. That Hortonworks has generated just $30-plus million in revenue in the first nine months of 2014 indicates there are somewhat fewer production Hadoop deployments in the enterprise than many (myself included) thought," Kelly said.
Indeed, for the time being, the IPO is a neutral event for CIOs, Heudecker said. "If they're currently doing an evaluation of Hortonworks, they should continue doing so. They shouldn't gravitate to or shy away from Hortonworks based on this filing. They should evaluate the company based on technical merits."
If the IPO signals anything about enterprise technology, it is that open source continues to gain momentum. "People are starting to understand what the open source model is, what the community is, and they are investing," said Howard Dresner, chief research officer of Dresner Advisory Services LLC in Nashua, N.H. A few years ago, participants in Dresner's surveys reported that Hadoop was dead last in terms of priorities. That isn't the case anymore. "For better or worse, it's taking hold," he said. "And it's something CIOs have to pay attention to. There's going to be too much pressure for them not to."
How to find the right people
Naysayers told Brett Goldstein, former chief data officer/CIO for the City of Chicago, that he'd never attract quality IT professionals to work for city government. "I took it as a direct challenge. Instead of sitting in my CIO office and letting HR handle all of this, I went out and hit the street," Goldstein told me.
He attended Meetups. He'd introduce himself to people, telling them about his work at OpenTable when it was just a young start-up, about his five years as an officer at the Chicago Police Department, and about the vision his boss -- Mayor Rahm Emanuel -- had for a more open and transparent government.
"I didn't say, 'Drop and give me 20.' I said, 'Drop and give me two years,'" Goldstein said. He held out another bonus to prospective IT hotshots. "I said, 'Come work for me. I'm going to help make you smarter; you're going to help make the city we live in better,'" he said.
Goldstein's quest to find data scientists wasn't just about bringing new talent in. He also spent time encouraging and training the talent he already had to tackle hard problems and use new tools.
"You need to give people the opportunity to learn new things," he said.
Secrets to a data team's success
Last week, Nicholas Arcolano, senior data scientist at FitnessKeeper Inc., offered a prescription for building a successful data team to the crowd at the second annual Boston Data Festival. The first ingredient is good communication. "There's so much you can gain for data analysis from talking to everyone else in your company," he said. "We also have a lot to teach people." Here are three more essentials.
1. Move quickly but carefully: "I was skeptical but surprised that data science can work well within an Agile framework," Arcolano told Data Fest attendees. That being said, a two-week iteration cycle also requires data scientists to make certain assumptions. "The secret is understanding what those assumptions are and understanding how they affect your results," he said. He also said that if something looks too good to be true, it probably is. The advice might sound like common sense, but "it's easy to lose sight of when there's this sense of urgency," he said.
2. Keep it simple: "Simplicity has huge advantages," he said. Simple code and processes can be instrumental in finding and debugging problems or translating a prototype into an actual feature for a mobile app. "The simpler you can make something, the easier it is to explain to developers, the easier it is for them to code up and test," he said.
3. Use the right tools: "You need to be comfortable using a variety of tools," Arcolano said. "You need to make time to learn new ones." One way to branch out is to experiment with open source tools and leverage the open source community. Sometimes, the right tool means building it in-house. What's the litmus test for FitnessKeeper? When not having a dedicated tool starts to become inconvenient, it's time to build, Arcolano said.