Not long ago, high-performance computing use cases came solely from the rarefied realm of government agencies, research institutions and the largest corporations. The ability of high-performance computing systems to process large data sets and perform complex calculations at high speeds -- on machines that cost hundreds of millions of dollars -- was not a critical business need for many enterprises. The closest most senior executives came to an HPC machine was to read a news item about the world's fastest computer, a label that might last only weeks or months before the arrival of a still faster machine.
No more. With the proliferation of data, the need to extract insights from big data and escalating customer demands for near-real-time service, HPC is becoming relevant to a broad range of mainstream businesses -- even as those high-performance computing systems morph into new shapes, both on premises and in the cloud. Increasingly, use of high-performance computing is no longer limited by high costs and esoteric skill sets, but only by the user's imagination.
Industry data shows HPC's growing appeal. According to Hyperion Research, the high-performance computing market for businesses will grow at a compound annual growth rate of 9.8% from 2017 to 2022. The growth is fueled in part by falling hardware costs. "Prices start at well under $50,000 today. That's affordable for many types of organizations," said Steve Conway, senior research vice president at Hyperion Research.
More than the falling cost, the need for systems that can handle AI technologies such as deep learning is making high-performance computing a must-have for the enterprise market. As Conway put it: "The biggest thing by far is massive interest in AI."
Across diverse industries, a gold-rush mentality has emerged, as corporate leaders can't bear the thought of being left behind while competitors discover game-changing business models. While the number of high-performance computing use cases grows and reveals the benefits for business, vendors are seeking to capitalize on the feverish activity surrounding high-performance computing in the enterprise by coming up with hardware, software, storage and network innovations.
For IT and business leaders, it's time to get up to speed on HPC. At stake is the ability of their companies to make the right decisions: They must enable their organizations to grab hold of transformative insights, while avoiding foolish forays into unexplored territory with uncertain returns.
IT professionals who are new to HPC are in for an education. For starters, they'll need to learn how to deal with very large quantities of data. "Twenty to thirty petabytes are not uncommon," Conway said. They'll also be working with high-performing CPUs and GPUs, fabric architectures, high-bandwidth networking and new software. IT workers who don't go with a cloud-based option might be looking at a major data center overhaul. And they probably will need to hire at least one expert, such as a data scientist, to extract those transformative insights they are after.
One way to get a grip on the scope, range and relevance of high-performance computing systems is to look at some of the industries where high-performance computing has gained a foothold -- and delve into the new ways in which it is being applied. These range from on-premises HPC configurations to cloud-based services far removed from end users.
HPC use cases: Deep learning insights from oil and gas exploration
The energy sector has long been a user of the most powerful computers for seismic analysis in their search for accessible oil and gas deposits. But technology innovations are making ambitious new exploration initiatives possible.
The massive data quantities required to gain an understanding of geologic formations far below the earth's surface led Devon Energy Corp. to move a seismic HPC application to the cloud to get faster results at reasonable cost. At the same time, Devon is implementing deep learning to gain both higher speed and greater accuracy.
"Applications are being rewritten and rethought in a shift to deep learning statistical analysis," said Stephen Taylor, an independent consultant in the oil and gas industry and formerly global head of analytics at Devon.
The sheer size of new seismic data -- 10 TB to 15 TB in a single file -- makes running an application on a single machine difficult if not impossible, he said. Previously, Devon used Windows applications on Intel-based HPE Superdome symmetric multiprocessing servers, according to Taylor.
To overcome these limitations, Taylor developed a new approach at Devon that parcels out work so that it can be performed simultaneously across many servers on Microsoft's Azure cloud. "Our new approach saves time and money. Instead of processing all the data one job at a time in sequence, we now have thousands of different jobs working on data at the same time. In addition, we only pay for the resources we use and don't have to pay for idle infrastructure," Taylor said.
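The fan-out approach Taylor describes can be sketched in miniature with Python's standard library. The chunking scheme and the `process_chunk` step below are hypothetical stand-ins for Devon's actual seismic jobs, which run across thousands of cloud servers rather than local threads; the point is the pattern of splitting one oversized job into many independent pieces that run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Hypothetical stand-in for one seismic-analysis job.
    Here it just sums amplitudes; the real work is far heavier and
    would run on a separate cloud server, not a local thread."""
    return sum(chunk)

def fan_out(samples, n_jobs=4):
    """Split one large dataset into independent chunks and process
    them simultaneously, instead of in one sequential pass."""
    size = max(1, len(samples) // n_jobs)
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # combine the partial results

if __name__ == "__main__":
    data = list(range(1_000))
    print(fan_out(data))  # → 499500, same answer as a sequential sum
```

The cost point Taylor makes falls out of the same structure: because each chunk is an independent job, cloud capacity can be rented only for the minutes each job actually runs, rather than kept idle between sequential runs.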
Devon's deep learning implementation uses advanced statistical analysis methods to generate faster and more precise results than earlier applications that used so-called scientific analysis, according to Taylor.
The new application uses the Databricks Unified Analytics Platform and Apache Spark, a real-time data analytics engine designed to work with Nvidia-based GPU acceleration and CUDA, a set of GPU-accelerated libraries. The application also uses the Snowflake Computing cloud-based data warehouse service on Azure.
For others embarking on a similar path, finding the talent for these projects could prove the biggest obstacle. "Don't expect to find an individual who can do this work. They are in very high demand and are very difficult to find," Taylor cautioned. Instead of seeking a single expert -- a "unicorn" -- he said a better approach is to put together a team of individuals with complementary scientific and technical knowledge.
Also, when it comes to high-performance computing, don't expect to see results overnight. "Patience is needed," said Taylor, voicing a theme expressed by many early adopters. "You have to give people the leeway to do the work and research before you throw a production case at it."
New to the cloud: Deep learning as a service
The role of the cloud in transforming the uses of high-performance computing in the enterprise cannot be overstated. For Ziff Inc., the cloud has proven the catalyst in creating an entirely new business category: deep learning as a service. With a data center running Dell EMC PowerEdge C4140 servers augmented by Nvidia GPUs, Ziff caters to organizations that seek deep learning insights but would rather not hire a data scientist or build their own HPC infrastructure.
Several Ziff customers are using the service to process images and correlate them with related data.
"Academics might be comfortable with 10,000 images, but business might have millions or even billions of images," said Ben Taylor, chief AI officer and co-founder at Ziff.
One Ziff customer is running a house price prediction application that uses Ziff processing power to examine pictures of properties, relating those images to other relevant data to generate suggested market prices. "Our platform can consume [the images and data] and build competitive models. You don't need a data scientist, just an engineer," Taylor said.
Another Ziff customer, Chatbooks, an online scrapbook-creation company, uses the AI capabilities of the Ziff platform to automate photo deduplication and the identification of high-quality photos, saving users time in creating their scrapbooks. Still another customer, HireVue, is developing AI-based human resources applications that assess job candidates through online video interviews.
Medical research HPC use cases focus on advanced imaging
For NYU Langone Medical Center in New York, powerful GPU co-processors supplied the foundation on which the institution built an innovative medical research application. NYU Langone embarked on a transmission electron cryomicroscopy (commonly known as cryo-EM) initiative, in which molecules are studied at extremely cold temperatures (minus 200 degrees Celsius). The purpose is to create detailed images of individual biomolecules to understand their function in the body.
"We want to understand how proteins move through the body," said John Speakman, senior director of research IT at NYU Langone. Cryo-EM stems allows researchers to observe molecules in their native configuration. "It's a young technique."
The Relion open-source application runs on a Cray CS-Storm accelerated GPU cluster with CS500 cluster supercomputer nodes and storage. In addition to cryo-EM, the implementation involves AI, modeling and genomics. "It's a Swiss Army knife solution developed by a diverse group of researchers," Speakman said.
GPUs were needed for AI, CPUs for workflow modeling and high-memory nodes for handling the very large volumes of data. Those large data quantities proved a challenge, requiring increased network throughput to move them from one data center to another without undue latency. Every day, the institution moves 1 TB of data from Manhattan to a data center in New Jersey, according to Philip Karp, vice president of IT architecture and infrastructure at NYU Langone. For disaster recovery purposes, the data is replicated to the West Coast.
The organization faced challenges on the staffing front as well, another recurring theme among the current generation of HPC adopters. It needed experts in data science who were also familiar with biology and genomics -- no easy task in a hot labor market like New York City.
5 pointers for HPC newbies
As high-performance computing begins to take root in mainstream IT, many decision-makers will need to master some multifaceted issues. Here are some words of wisdom.
Fully understand the business problem being resolved. You'll be making a commitment in time, money and staff, so select a problem whose answer has a good chance of generating a return. Beware of embarking on AI or HPC because everyone else is doing it.
Keep your antennae up. The world of HPC and the related realm of AI are constantly evolving to solve new problems. New co-processor designs are emerging; keep an eye on them. You may need to hire or train experts in new deep learning tools as well.
Hire (the right) new people. This is not rocket science, but it is data science. And that means hiring a species of specialist that is in short supply: the data scientist. Early adopters say data scientists who understand not just data but knowledge in their particular industry and business model are essential.
Don't be commitment-shy. Do more than just get your feet wet. AI technologies require large amounts of data and plenty of processing power. Getting a taste of what HPC can deliver will probably mean reevaluating your estimates in terms of compute power, storage, staff and, yes, money.
HPC: It's a physical thing. Organizations that build and maintain their own high-performance computing systems should be aware of the physical needs. Ultra-dense servers call for more power, reinforced flooring and possibly liquid cooling.
Putting HPC to work to assess financial risks
One major New York-based financial services firm, which requested anonymity, is using HPC to increase the accuracy of its risk analysis and replace manual tasks with AI capabilities. Finding people with the right mix of skills proved challenging as the institution discovered that successful HPC implementation depends as much on business acumen as it does on powerful technology.
Banks are required by law to retain capital against the potential losses from operational risk incidents, but capital that is not invested earns no profit. A highly exact understanding of risk factors can enable a bank to set aside just the amount required, freeing up remaining capital for investment purposes. The New York firm is building its risk-analysis HPC implementation on the IBM Power9 server and GPU co-processor.
"It's all on premise. Cloud is a bit taboo because data is confidential and cannot leave the premises," said the executive director in charge of the project at the bank, who also requested anonymity. The bank has hired a data scientist and is using TensorFlow, an open-source machine learning library developed by Google.
The application reads natural language text to identify operational risk incidents. It was new territory for the bank and required new skills. "You need to have smart people who can be resourceful. The skill set is not only the nerdy guy, the hacker; it's more than that. It's people who can solve problems and be resourceful," the executive director said.
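The bank's TensorFlow model is not public, but the underlying task -- reading text and flagging likely operational risk incidents -- can be illustrated with a deliberately simple keyword-scoring toy in plain Python. The phrase list, weights and threshold below are invented for the sketch; they stand in for what a trained model would learn automatically from labeled incident reports.

```python
# Toy stand-in for a trained text classifier: score a report by the
# risk-related phrases it contains. A real model (such as the bank's
# TensorFlow one) would learn its features from labeled incidents
# rather than use a hand-written list like this.
RISK_PHRASES = {
    "unauthorized trade": 3.0,
    "system outage": 2.0,
    "data breach": 3.0,
    "settlement failure": 2.0,
    "manual override": 1.0,
}

def risk_score(text):
    """Sum the weights of risk phrases found in the lowercased text."""
    t = text.lower()
    return sum(w for phrase, w in RISK_PHRASES.items() if phrase in t)

def is_incident(text, threshold=2.0):
    """Flag the report as a likely operational risk incident."""
    return risk_score(text) >= threshold

report = "Overnight system outage delayed settlement; manual override used."
print(is_incident(report))  # → True (score 3.0: outage 2.0 + override 1.0)
```

The gap between this toy and a production system is exactly where the "smart, resourceful people" the executive director describes come in: deciding what counts as an incident, labeling the training data and judging the model's errors are business problems before they are engineering ones.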
In the financial services field, the use of AI models has to go through a process of regulatory review. The level of accuracy for identifying operational risk incidents is something that regulators need to be aware of and understand. As a result, regulators will become conversant in the use of AI by banks. "Open standards for AI are going to become very important," the executive director said.
Patience while awaiting ROI is also required. For the financial institution, the use of AI to replace manual tasks in risk analysis will eventually pay off, according to the executive director. "Right now, it's hard to tell. It will be proven over time, not just one year."
Emerging AI markets for HPC systems
Financial services fraud detection
Personalized interest rates
Internet of things
Time is of the essence: Photorealistic 3D rendering in the cloud
In the quest to meet customer demands in a tech-savvy world, a bit of imagination can come in handy, as companies cobble together applications, tools and services from several different providers. And as these new applications are deployed, some companies may find themselves using HPC without knowing it. A high-performance computing use case involving Timex Group is a good example. The company built an application to enable its watch designers and business partners to visualize new watch designs quickly.
"We were looking for a tool to mix and match various components, such as the dial and attachments -- a configurator. There was no system that enabled mix and match to create a new style," said Krishna Mohan, director of supply chain services for Timex.
To find answers, the company sorted through some 15 vendors in a global search, ultimately finding Swedish company Tacton Systems, which has software that enables the combining of components, and an Australian company, Migenius, that makes 3D photorealistic rendering software called RealityServer. The two together provide the timepiece customization system.
Fast performance is essential to the Timex mix-and-match application because latency in rendering new designs would quickly erode the productivity of designers and discourage business partners. The Migenius rendering software runs on Nimbix cloud services, which host Lenovo servers equipped with Nvidia GPUs. The Nimbix platform is not virtualized but employs containers for workloads on bare metal.
"Customers are looking at minutes, even seconds. We can't talk about waiting five minutes to spin up interactive applications. That's not good enough," said Paul Arden, CEO of Migenius, who examined a number of cloud services before selecting Nimbix for performance reasons.
Exascale system to arrive by 2021
Building the fastest computers in the world today means creating exascale systems, capable of performing one quintillion calculations per second. The current fastest computers are petascale systems, which can perform a quadrillion calculations per second. On March 18, 2019, Intel Corp. and the U.S. Department of Energy announced Aurora, the first U.S.-built exascale system, to be completed by 2021 by Intel and Cray for Argonne National Laboratory near Chicago.
Under stress: Structural engineering looks to HPC-generated models
Engineering firms are no strangers to the use of powerful computers to create and validate structural designs. As with companies in other industries, non-virtualized, cloud-based HPC implementations using containers are opening up new horizons. Arup, an international engineering firm, is using Penguin Computing On Demand (POD), a cloud-based HPC service, to run LS-DYNA, an application for seismic engineering and evaluation of building structures developed by the Livermore Software Technology Corp.
The ability to withstand extreme stress events such as earthquakes is particularly important for critical public facilities like airports and hospitals. Such buildings often incorporate seismic protection measures such as base isolation and braces to resist buckling. LS-DYNA running on an HPC cluster verifies the ability of these structures to endure unusually strong forces.
"Penguin provides a non-virtualized environment," wrote Kermin Chok, physical engineer with Arup, in an email exchange. That's important because virtualization typically creates some latency due to the layer of virtual machine software on which applications run. Instead, the POD service provides what Penguin calls a bare-metal, InfiniBand on-demand HPC compute cluster for containerized applications, an approach that is designed to deliver superior performance. The implementation enables Arup engineers to run multiple large-scale, complex design projects simultaneously and reduce the time spent on each LS-DYNA run-through.
Arup's implementation also includes high-speed storage thanks to Lustre, a parallel distributed file system for Linux environments. The result, according to Chok, is the kind of performance needed for highly complex structural calculations. "This provides us a highly elastic compute and storage environment served by high core-count compute nodes," he explained.
As a cloud-based service, POD took a little getting used to. "Engineers had to get comfortable with running models and interrogating results remotely. This is a shift from having local data access," Chok said. Others proceeding down a similar path must be hyperaware of the sheer size of the data sets that will be used, Chok advised. This is important with regard to storage planning, he pointed out, as well as providing sufficient bandwidth to move large amounts of data to and from the cloud.
HPC use cases illustrate challenges and opportunities
High-performance computing use cases show clearly that, as organizations explore new territory with innovative applications powered by HPC, early adopters are up against several emerging challenges. One such challenge is overfitting: the failure of a deep learning system, once trained, to generalize what it has learned to new data. Such a system can "discover" patterns that don't actually exist, according to Brian Hopkins, vice president and principal analyst at Forrester Research. "Data scientists have tried to develop models that do not overpredict," Hopkins said.
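The overfitting problem Hopkins describes is easy to demonstrate in miniature: a "model" that simply memorizes its training examples scores perfectly on data it has seen and fails on data it has not. The tiny synthetic dataset below is invented for the sketch; it serves only to make the gap between training accuracy and held-out accuracy concrete.

```python
import random

random.seed(0)
# Synthetic task: the true rule is "label 1 when the value exceeds 0.5".
points = [(random.random(),) for _ in range(40)]
labeled = [(x, 1 if x[0] > 0.5 else 0) for x in points]
train, test = labeled[:20], labeled[20:]

# Extreme "overfit" model: a lookup table that memorizes the training
# points exactly and guesses 0 for anything it has never seen.
memorized = {x: y for x, y in train}
def overfit_model(x):
    return memorized.get(x, 0)

# Simple model that actually captures the underlying pattern.
def threshold_model(x):
    return 1 if x[0] > 0.5 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

print(accuracy(overfit_model, train))   # 1.0: perfect on data it has seen
print(accuracy(threshold_model, test))  # 1.0: the real pattern generalizes
print(accuracy(overfit_model, test))    # poor: memorization doesn't transfer
```

Real deep learning models overfit in subtler ways than a lookup table, but the diagnosis is the same: a wide gap between training and held-out performance signals that the model has latched onto patterns that don't exist outside its training set.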
Another hurdle for enterprises presented by high-performance computing use cases is more down to earth: people issues. "The skills gap is probably the greatest factor in the market," said Addison Snell, CEO of Intersect360 Research. The challenge is to find that special data scientist with knowledge of a specific industry as well as of AI, machine learning and deep learning. It's an indispensable combination when it comes to formulating questions and interpreting results. "You have to know the question you want to answer," Snell said.
Those specialists might be just too rare for some businesses to find. In those cases, a cloud-based service, such as what Ziff offers, might be the answer. Indeed, Ziff's Taylor places little faith in data scientists. "Data scientists are bottlenecks. They don't have business acumen. They will chase a complex problem for months."
Whether in the cloud or on premises, high-performance computing systems and AI technologies will go hand in hand for some time, most agree.
"Commercial markets have been the big growth engine for HPC over the last five years and will be for the next five," Snell said, adding, "It's a long-term stable market. After all, we're not likely to get to the end of science after the next five years."