When Brett Goldstein was appointed as Chicago's first chief data officer (CDO) in May 2011, he found himself in the middle of a classic IT struggle. The city's data was spread across the municipality and mired in silos, making it difficult to get a holistic view. Telephone calls and interoffice envelopes were the city's data integration tools of choice, Goldstein said, only half joking.
That needed to change -- in a hurry. The city was set to host the North Atlantic Treaty Organization (NATO) Summit in May 2012. The event would bring in heads of state -- and throngs of protesters -- to Chicago. Goldstein wanted to provide public safety officials with better "situational awareness," or the ability to understand what was happening in any given place at any given time. To do so, Goldstein, who became Chicago's CDO/CIO in 2012, needed to break data out of silos in a cost-effective manner that didn't require overhauling the city's infrastructure.
"We're one of the biggest cities out there. We have an enormous IT footprint. We cannot always replace things, but we shouldn't use that as an excuse for not making ourselves smarter," he said.
Goldstein, now a senior fellow in urban science at the University of Chicago, embarked on a course that would make the city smarter without breaking the bank. Rather than eradicate the silos, he planned to build connectors so that data from one department could be integrated with data from another. The end goal was to make IT a strategic player in the public sector, a cultural shift that ultimately required radical changes to technology, data, process and people. Naysayers told him it couldn't be done, he said, but then they didn't know Goldstein.
From OpenTable to the CPD
Goldstein arrived at Chicago City Hall with an unusual portfolio of experience. A graduate of Connecticut College with a master's degree in computer science from the University of Chicago, he was busy making OpenTable the restaurant reservation kingpin it is today when 9/11 intervened. Stuck on the tarmac in Chicago as the terrible news broke, he vowed that when his work was done at OpenTable he was "going to do something different -- something meaningful in a different way," he said. In 2006 he joined the Chicago Police Department (CPD).
"One week, I was at OpenTable in my office; the next week I was doing pushups at the police academy," he said. A year into his tenure on the force, he was transferred to police headquarters and asked to put his technology skills to work.
There, he secured a $200,000 grant from the National Institute of Justice and built the CPD's predictive analytics group, which analyzed 911 call data to predict hotspots. While the project has been met with some criticism, Goldstein caught the attention of the newly elected mayor, Rahm Emanuel, who asked the computer scientist-turned-cop to become the city's CDO.
"In 2011, we kicked off in Chicago what proved to be one of the biggest open data projects in municipal government," Goldstein recounted at Strata + Hadoop World in October. "What we found was that by releasing data, we solved other problems."
One example should resonate with any city dweller who owns a car: When Chicagoan and Web developer Scott Robbin expressed interest in building a street sweeper application that would remind residents when to move their cars, Goldstein supported the idea. But the street sweeping schedule, retrieved from the Department of Streets & Sanitation, was "as unusable data as possible," he said. An Excel spreadsheet had been made to look like an Outlook calendar and then transformed into a PDF, a format that isn't easily consumed by machines. Goldstein had a different approach, which involved interns and manual data entry. The data was eventually released via the open data portal, which enabled Robbin to build the Sweep Around Us app.
"Cities have been doing things a certain way for a long time," Goldstein said at the Strata conference. Departments function independently rather than recognizing the relationships that, when uncovered, potentially make life better for the taxpayers who support them. "As different pieces of data, different agencies, are tied together, we need to figure out what those relationships are," Goldstein said.
Spatial index provides common language
In fact, that was the perspective Goldstein brought to discussions on how the city should prepare for the 2012 NATO Summit. After playing around with MongoDB, the open source, NoSQL distributed database software embraced by the developer community, Goldstein thought he had found a technology that could help break down those silos.
In four months, he put together a prototype that would establish MongoDB as a central database for the city, pulling in data from the 911 system, 311 system (the city's customer service phone line), parks and recreation department, and permitting department -- all systems that had a spatial component and therefore could be mapped.
"The majority of the city data is spatially enabled, so your common denominator ends up being coordinates such as latitude and longitude," Goldstein said. "You convert that into a single spatial index." Reports of crime, GPS coordinates for city vehicles, even location information from external data sources such as Twitter, could be gathered together.
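The idea of a shared spatial index can be illustrated with a minimal sketch. It is not the city's implementation: the record fields, the sample coordinates and the 0.01-degree cell size are all assumptions made for illustration. Records from different systems are filed under the grid cell that contains their latitude and longitude, so one lookup returns everything reported near a point, whatever department it came from.

```python
import math
from collections import defaultdict


def cell_key(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_index(records, cell_deg=0.01):
    """Group records from any source system by grid cell."""
    index = defaultdict(list)
    for rec in records:
        index[cell_key(rec["lat"], rec["lon"], cell_deg)].append(rec)
    return index


# Hypothetical records from three separate systems.
records = [
    {"source": "911", "lat": 41.8781, "lon": -87.6298, "note": "disturbance"},
    {"source": "311", "lat": 41.8789, "lon": -87.6291, "note": "pothole"},
    {"source": "permits", "lat": 41.9484, "lon": -87.6553, "note": "street closure"},
]

index = build_index(records)

# Everything filed in the same cell as a point of interest.
nearby = index[cell_key(41.8785, -87.6295)]
```

A production system would more likely rely on the database's own geospatial indexing than on a hand-rolled grid; the grid simply makes the "common denominator" visible.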
Once the prototype, which came to be known as the WindyGrid application, was given the green light, he and his team built a large extract, transform and load (ETL) architecture, hooking into legacy IT systems and pulling real-time data into MongoDB. On the front end, Goldstein used technology the city already had: a municipal mapping and data visualization tool from Esri. That provided a user-friendly "WindyGrid interface." Users simply "would outline or create a polygon for an area of operation," he said. The user then would be informed of any 911 calls, for example, that happened within that particular area.
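The polygon lookup described above boils down to a point-in-polygon test. The sketch below uses the standard ray-casting method in plain Python; in WindyGrid that work was handled by MongoDB and the Esri front end, so this is an illustration of the concept rather than the city's code. The call IDs, coordinates and area outline are hypothetical, and coordinates are treated as flat, which is a reasonable approximation over a few city blocks.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


# Hypothetical area of operation, given as (lat, lon) corners.
area = [(41.87, -87.64), (41.87, -87.62), (41.89, -87.62), (41.89, -87.64)]

# Hypothetical 911 calls.
calls = [
    {"id": "A1", "lat": 41.8781, "lon": -87.6298},
    {"id": "B2", "lat": 41.9484, "lon": -87.6553},
]

calls_in_area = [c for c in calls if point_in_polygon(c["lat"], c["lon"], area)]
```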
Another important feature? The city spent less than $100,000 building WindyGrid, Goldstein said. "It proved the case that you can build out systems like this at an enormously low cost," he said, estimating that requests for proposals from traditional technology vendors would have come in at over $20 million.
"Whether you're in government, in a startup or in corporate, you often have this mentality that you can't do a big thing without a big investment. That was the notion we challenged with WindyGrid, and we challenged it quite successfully," he said.
From WindyGrid to Plenario
After the NATO Summit, Goldstein wanted to turn situational awareness into predictive analytics. The idea grew out of research he conducted as a graduate student and work he started at the CPD with 911 data. If a resident calls 911 to report that someone is selling drugs, the call data is logged and a car is dispatched. But if the officers arrive and there is no crime to investigate, the call is categorized as an "unfounded" 911 for drug sales. "In a traditional approach, the data is discounted," Goldstein said.
By most definitions, unfounded call data is considered dirty, or prone to containing errors. But aggregating millions of calls can dampen those errors, yielding information that can be used as a predictive signal. "It is an extraordinarily invaluable sensor input," Goldstein said.
The data, albeit not pristine, can offer a path to action. "This bifurcation between something you can publish and something that gives you an operational edge is typically not distinguished, and yet, I feel we need to go in that direction to make businesses better," he said.
One project involved analyzing data from 311 calls. "I posed the hypothesis that 311 instances would have an interrelationship with smaller spatial units, which would allow for prevention" of crime and other problems, Goldstein said at Strata. He and his team began to look for patterns within the city's 26,000-plus blocks. And they found one.
City blocks that called 311 to report a broken garbage cart also reported an outbreak of rats. More specifically, Goldstein's team discovered that the 311 call acted as a predictive signal, giving the city about a seven-day window to respond to a broken garbage cart complaint before reports of a rodent outbreak would surface. "Why is this interesting? And why isn't it obvious? Because it only happened in a subset of blocks in Chicago," Goldstein explained. "We make certain assumptions based on the totality of the system, but when you analyze the data in subsets and within small contiguous areas, you find these interrelationships," he said.
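The kind of block-level check described here can be sketched in a few lines. This is not the team's analysis, whose method the article does not detail: the block IDs, report types and dates are invented, and the function simply flags blocks where a rodent report followed a garbage cart complaint within a chosen window (seven days by default, echoing the figure above).

```python
from datetime import date, timedelta


def blocks_with_followup(reports, lead="garbage_cart", follow="rodent", window_days=7):
    """Return blocks where a `follow` report came within the window after a `lead` report."""
    by_block = {}
    for r in reports:
        by_block.setdefault(r["block"], []).append(r)

    window = timedelta(days=window_days)
    flagged = set()
    for block, items in by_block.items():
        lead_dates = [r["date"] for r in items if r["type"] == lead]
        follow_dates = [r["date"] for r in items if r["type"] == follow]
        # Flag the block if any follow-up report falls after a lead report, inside the window.
        if any(timedelta(0) < f - l <= window for l in lead_dates for f in follow_dates):
            flagged.add(block)
    return flagged


# Hypothetical 311 reports.
reports = [
    {"block": "B1", "type": "garbage_cart", "date": date(2013, 6, 1)},
    {"block": "B1", "type": "rodent", "date": date(2013, 6, 6)},
    {"block": "B2", "type": "garbage_cart", "date": date(2013, 6, 1)},
    {"block": "B2", "type": "rodent", "date": date(2013, 6, 20)},
    {"block": "B3", "type": "rodent", "date": date(2013, 6, 3)},
]

flagged = blocks_with_followup(reports)
```

Running the check per block, rather than across the city as a whole, mirrors Goldstein's point that the relationship showed up only in a subset of blocks.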
Proactive responses to rat outbreaks might not be a big-ticket cost-savings initiative or the key to lowering the city's murder rate, but, "small changes to big systems can have a huge impact," Goldstein said.
After leaving city government last year for academia, Goldstein has continued to seek ways to make data more accessible and to challenge assumptions. One focus of his work as a senior fellow in urban science at the University of Chicago Harris School of Public Policy is how to make data easier to use. He has his hands in another big project that promises to do just that.
In September, the Urban Center for Computation and Data (UrbanCCD), where Goldstein also works, launched an alpha version of Plenario, which takes the technological concepts behind the WindyGrid application to the national level. Goldstein is focused on transforming open data sets, what he sometimes refers to as "spreadsheets on the Web," into a machine-ready format that's easy to consume, to combine with other data sets and to visualize.
According to the open source tool's website, Plenario is "an automated ETL tool" that can extract open data sets from city, county, state and federal governments; transform and standardize them; and load them into a database. The more data sets government agencies open up, the more ETL hooks Plenario will build. By taking care of the work on the back end, Goldstein and the UrbanCCD are freeing up users to find new patterns and new predictive signals of their own.
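The extract, transform and load steps can be pictured with a small, self-contained sketch. It does not reflect Plenario's actual code or schema: the sample CSV, the column names and the `events` table are assumptions, and an in-memory SQLite database stands in for whatever store a real pipeline would use.

```python
import csv
import io
import sqlite3

# Hypothetical open data export, a "spreadsheet on the Web."
RAW_CSV = """Latitude,Longitude,Request Type,Created
41.8781,-87.6298,Pothole,2014-09-01
,,Graffiti,2014-09-02
41.9484,-87.6553,Rodent Baiting,2014-09-03
"""


def extract(text):
    """Read the raw CSV into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, source):
    """Standardize field names and types; skip rows with no coordinates."""
    out = []
    for row in rows:
        if not row["Latitude"] or not row["Longitude"]:
            continue
        out.append((
            source,
            row["Request Type"].strip().lower(),
            row["Created"],
            float(row["Latitude"]),
            float(row["Longitude"]),
        ))
    return out


def load(conn, rows):
    """Write standardized rows into one shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(source TEXT, kind TEXT, observed TEXT, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV), source="311"))
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Once every data set lands in the same shape, with a source, a type, a date and coordinates, combining them becomes a query rather than a cleanup project.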
"With these types of projects, people often focus on what is a sexy analytic," Goldstein said. But sexy analytics first require a solid foundation -- the behind-the-scenes data wrangling and data cleansing IT typically provides to the enterprise. "That's the foundational problem here, and that's why we continued with Plenario as a way of pushing this vision forward," he said.