In some ways, I feel bad for the team that launched the Health Insurance Marketplace of the Affordable Care Act. I hate to see any IT team not deliver on a mission-critical application. At the same time, I am shocked that the marketplace's rollout has been so problematic. We really can and should do much better when we launch a product -- particularly a system costing a reported $400 million.
Over the years, we have learned how to scale and test high-volume websites. As the Health Insurance Marketplace website goes back and forth, from offline for repairs to online, I thought I would share the lessons I have learned and applied when I've been tasked with delivering a high-volume, mission-critical application.
1. Always plan for more than the most hopeful demand. I recently delivered a Web application that supported an aggressive marketing campaign. We had a sense of the Web-visit demand, but were clueless as to the actual numbers. Relying on a formula I invented years ago, I polled the team for their most optimistic number of visitors per hour and per day. I then took the highest number and multiplied it by 2.7. Why multiply by 2.7? Because a number like that gives others the sense that I have thought it through (even though I simply invented that factor). We then designed the front end and back end of the application to elastically handle such loads.
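The arithmetic behind that rule of thumb is simple enough to sketch in a few lines. The snippet below is only an illustration: the function name and the sample estimates are hypothetical, and the 2.7 multiplier is the admittedly invented factor described above, not an industry standard.

```python
from fractions import Fraction
import math

# The invented padding factor from the text, kept as an exact
# fraction so the result is not affected by float rounding.
PADDING_FACTOR = Fraction(27, 10)


def design_target(estimates, factor=PADDING_FACTOR):
    """Return the load to design for: the highest estimate times the factor.

    `estimates` holds the team's most optimistic visitor counts
    (per hour or per day). The result is rounded up to a whole visitor.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    return math.ceil(max(estimates) * factor)


# Hypothetical answers from polling the team (visitors per hour).
hourly_estimates = [4_000, 7_500, 10_000]
print(design_target(hourly_estimates))  # prints 27000
```

The point is not the precision of the factor but the habit: take the most hopeful number anyone offers, then pad it generously before sizing anything.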
2. Plan on elasticity. We now had a target for site visits, but we did not want to build to the peak demand. Instead, we built the entire infrastructure for a more reasonable number, but then planned on high levels of elasticity. This required us to be very creative with our designs and service providers. Not only did we need elasticity in planning for demand, we also needed flexibility as to the duration of the campaign. If the campaign was successful, it would continue indefinitely. If it did not work, we would close it down in just a couple of months. To deal with our uncertain peak demand, we engaged a cloud service provider to build out the elastic infrastructure. That provider required a one-year contract for its services. I told them we could not do that. Instead, we needed a month-to-month term. They told me I was crazy. I told them that might be, but we still needed some flexibility. We settled on a three-month, renewable term.
3. Pilot, pilot, pilot. When deploying a mission-critical application, we need to test and validate both the functionality of the application and the capabilities of the infrastructure. The best way I have found to do this is to run a series of expanding pilots. I conduct these pilots in two dimensions. In the first dimension, I pilot the functionality in phases. I learned a long time ago that no one knows what they want until they see it, and that once they see it, they will want to change it. As a result, I develop a portion of the functionality, then deploy it to the pilot group. They give me feedback on the design and features as early as possible. In the second dimension, I test the scalability of the application. In practice, these two dimensions look like this:
- We build a portion of the application and give it to a small group of pilot users.
- We get feedback, revise the application and give it to a larger group of pilot users.
- In parallel, we build more features and pilot them with the original group.
As we add features, we add more to the pilot group. At the same time, we use testing tools to measure the load the pilot groups are putting on the application and the infrastructure. We extrapolate the loads to decide if we need to change the application design or beef up the infrastructure. Along the way, we build confidence that when we turn on the application, it will perform in both dimensions. In our case, the initial pilot-group loads helped us identify ways we could streamline the application to reduce data integration bottlenecks.
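As a rough illustration of that extrapolation step, here is a minimal sketch. It assumes load grows linearly with the number of users, which is a simplification of my own for the example and not something real systems guarantee; the function names, the 20 percent headroom and all of the numbers are hypothetical.

```python
def extrapolate_peak(measured_rps, pilot_users, target_users):
    """Scale a load measured on the pilot group up to the target population.

    Assumes requests per second grow linearly with user count,
    which is a simplifying assumption for this sketch.
    """
    if pilot_users <= 0:
        raise ValueError("pilot_users must be positive")
    return measured_rps * target_users / pilot_users


def needs_more_capacity(projected_rps, capacity_rps, headroom=0.2):
    """Return True if the projected load eats into the safety headroom."""
    return projected_rps > capacity_rps * (1 - headroom)


# Hypothetical pilot: 200 users produced a peak of 15 requests per second.
projected = extrapolate_peak(measured_rps=15, pilot_users=200, target_users=27_000)
print(projected)  # prints 2025.0

# Hypothetical infrastructure rated for 2,400 requests per second.
print(needs_more_capacity(projected, capacity_rps=2_400))  # prints True
```

Rerunning a check like this after each, larger pilot shows whether the projection holds up or whether the design or the infrastructure needs attention before launch.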
Because we took this approach with our marketing campaign, the launch was a non-event. No hiccups, no bumps in the road, no bad press about customers who spent fruitless hours on a website. You see, we really can do better.
About the author:
Niel Nickolaisen is CIO at Western Governors University in Salt Lake City. He is a frequent speaker, presenter and writer on IT's dual role enabling strategy and delivering operational excellence. Write to him at firstname.lastname@example.org.