Governance -- the kind that solves problems rather than creating them -- seems in rather short supply these days. But don't tell that to IT leaders Michael Restuccia and Brian Wells of Penn Medicine. They are putting the University of Pennsylvania's medical center on the map in big data, data mining and high-performance computing -- due in large part, they argue, to an unusual top-down governance structure. Read more about this CIO team's breakthrough work and the big data advantage reaped through good governance.
In a few weeks, Michael Restuccia, CIO at Penn Medicine, and Brian Wells, associate vice president of health technology and academic computing, will make available the first multi-terabyte block of data from an initiative known as PennOmics. One of only a handful of such repositories in the nation, the PennOmics research data warehouse combines patient care information from the medical center's clinical trial management systems with cancer genomics data from its research labs. Come Thanksgiving, researchers will be able to access the data from their own labs via an Internet connection. They will be able to ask questions of the data, test hypotheses and, it is hoped, discover a trove of new funding opportunities -- from clinical trials with industry partners to obtaining more federal grant money.
"It's a pretty unusual initiative," Wells said. Not many academic medical institutions have the clinical systems and research heft to launch a data project this size. The University of Pennsylvania Health System comprises four hospitals, 2,000-plus doctors and 6,000 clinicians spread across southeast Pennsylvania and parts of southern New Jersey. The Perelman School of Medicine -- founded in 1765 as the nation's first medical school -- boasts 1,000 faculty and 800 students. PennOmics, however, has something other than size going for it, explained Restuccia and Wells.
Penn's medical school and healthcare system both report to a single entity, the office of the dean of medicine. What goes on in the research labs is integrated into patient care -- part of Penn Medicine's commitment to deliver "precision medicine," Restuccia said. Behind this mission is a powerful group known as the Senior IT Governance Council. The council meets at least monthly to discuss technology requirements, plans, status of projects, challenges and goals.
"The governance at the highest level is made up of leaders of the School of Medicine and the Health System," said Restuccia, an ex-officio member of the group, along with Wells. "Twelve individuals have the overview and oversight and provide me and Brian and others with clear direction on what we need to deliver."
So, with the PennOmics project, before Restuccia and Wells ferreted out the half-dozen disparate collections of genetics data across the Penn system; before Wells went from lab to lab selling cardiologists, oncologists, neurologists and other specialists on the value of sharing their data; before IT partnered with Oracle Corp. to fine-tune the vendor's genetic data model; indeed, before they purchased the Oracle Translational Research Center hardware and software suite, the two IT leaders leaned heavily on the council's "integrated approach to leadership."
One of the pressing questions before the council in the PennOmics initiative, for example, was whether to buy or build the software for the research data warehouse. The council deemed the project crucial to the Penn Medicine mission and handed down a buy decision. From contract negotiations to implementation took nine months, Wells said, as opposed to the years it might have taken to build the systems internally.
The governance council's input also proved critical for other aspects of the project. One innovation of the PennOmics database is that it pulls data from disparate sources. About two-thirds of the data comes from an existing patient care data warehouse the IT team had already worked on for six years. The rest comes from environments IT knew little about -- the decentralized world of the Penn research labs, where the data tends to be disorganized and the approach more "me-centric" than collaborative. "They had a lot of systems that were good at collecting information and storing information but didn't have a way to share information and link it back to the health system data," Wells said.
Restuccia said Wells' ability to get up to speed on the research systems and culture is unusual and a "huge differentiator" for Penn Medicine. A project with this many egos, however, required more than a quick study.
"The reason it requires governance is because we're pulling data from both the health system and the research side," Restuccia said. "We had to get resources from leaders to help fund it, as well as to contribute data and to endorse the whole decision."
Far from being a drag on IT, governance not only has given Penn Medicine a big data advantage, according to this CIO team, but also resulted in a more effective IT organization. "IT is more rapidly able to define our direction, more efficiently utilize our limited dollars, and we are more inclusive versus exclusive in our decision making," Restuccia said. "There is less wasted time. The group keeps us focused on what's important."
Centralized approach to high-performance computing
In fact, the relationship is a two-way street. The trust Restuccia and Wells have established with their healthcare system and medical school constituents makes it possible for IT to push the envelope.
Penn Medicine's High Performance Computing Cluster by the numbers:
-- 4,700 virtual processing cores (2,368 physical cores)
-- 31 terabytes of RAM
-- 1.8 petabytes (PB) of disk storage
-- 1.9 PB of mirrored archive tape storage
-- Provides approximately 37 million computational hours per year
Source: Penn Medicine
Case in point is another high-profile initiative closely related to the PennOmics research data warehouse: the Penn Medicine High Performance Computing Cluster (HPCC). The cancer genomics data integral to the PennOmics initiative is big -- roughly a terabyte per patient genome. "If you have 400 patients in your clinical trial and you're sequencing each of them -- well, 400 terabytes of data is something that was very foreign to us to manage," Restuccia said. Nor is the size of the data easily managed by the desktop computer sitting under the researcher's lab bench. The bottleneck prompted an IT push for a centralized high-performance computing center on campus that could be utilized by the medical center's hundreds of researchers to run their algorithms. (See sidebar on the new HPCC capacity.)
The computing cluster was purchased in the spring of 2012 and was operational in January 2013. "We were able to deliver the solution way faster and certainly more economically than if they had tried to do it on their own. It was one of the first instances where people saw a common good," Wells said.
The centralized approach also helps safeguard the sensitive protected health information (PHI) contained in the data. Processing is done behind the firewall for the most part. Researchers have a "cloud overflow option" for scale-out, "but we don't allow any identifiable PHI to go to the cloud," Wells said. The tough work of getting cloud vendors to sign service-level agreements that meet Penn Medicine's stringent security requirements for protecting this type of data is ongoing. "We're not there yet," he said.
On the ground level, the biggest challenge is moving genetics results data from the sequencers to the HPCC and into the large research warehouse. "The fastest speed you're going to get from the wall plate in your office (or wherever your sequencer is) to the data center is a gigabit per second. That means it takes about three hours to move a terabyte [TB] -- that's a lot of time," Wells said. IT is working with network providers on getting 10- or even 40-Gb networks, as well as analyzing portable storage arrays that can be loaded up with 10 to 30 TB at a time.
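Wells' back-of-the-envelope figure checks out: a terabyte is about 8 trillion bits, so at a nominal 1 Gb/s -- and allowing for real-world protocol overhead -- the transfer lands at roughly three hours. A minimal sketch of the arithmetic (the `efficiency` factor is an illustrative assumption, not a Penn Medicine figure):

```python
def transfer_hours(terabytes, link_gbps, efficiency=0.75):
    """Estimate hours to move `terabytes` of data over a link rated
    at `link_gbps` gigabits/s, assuming a fraction of the rated
    bandwidth is lost to protocol overhead and contention."""
    bits = terabytes * 8e12                      # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# 1 TB over a 1 Gb/s office drop: Wells' "about three hours"
print(round(transfer_hours(1, 1), 1))    # ~3.0 hours
# The same terabyte over a 40 Gb/s research network
print(round(transfer_hours(1, 40), 2))   # ~0.07 hours (about 4 minutes)
```

The same arithmetic shows why a 10- or 40-Gb network, or a portable array carrying 10 to 30 TB, changes the calculus: at 40 Gb/s the per-terabyte cost drops from hours to minutes.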
The project again illustrates the value of the IT leadership governance council, both in securing buy-in across the enterprise for a centralized high-performance computing cluster and in bringing the members' domain expertise to bear on IT projects. "I would not have understood the sequencing needed for these projects, but the leaders of our council did and took us from step one to step two," Restuccia said.
Continue to part two of this SearchCIO Innovator profile on Restuccia and Wells to read about the pain and promise of using data analytics to improve performance -- from the finance department to the hospital wards.