The Challenge:
Overcoming Storage Bottlenecks in Complex Simulations
Exxact Corporation benchmarks multiple High Performance Computing (HPC) software applications to characterize how specific applications will perform on its systems, particularly systems that include multiple NVIDIA Tesla Graphics Processing Units (GPUs). Recently, Exxact engineers have been characterizing the performance of life-science applications such as RELION, GROMACS, NAMD and Amber, all of which are molecular dynamics simulation applications that model biochemical processes for life science research. These applications run best on NVIDIA’s CUDA-enabled GPUs, and in most cases processing is divided across multiple NVIDIA Tesla GPUs in a single system. The test system originally had a single Samsung solid state drive (SSD), but test results showed that scaling from one to two GPUs and from four to eight GPUs was nowhere near linear. Furthermore, performance with four or eight GPUs fell far short of the corresponding multiple of single-GPU performance. In particular, gains from scaling from four to eight GPUs were incremental at best.
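To make "nowhere near linear" concrete, scaling is commonly expressed as speedup over the single-GPU result divided by the ideal (linear) speedup. The following is a minimal sketch of that arithmetic. The function name and the throughput figures are hypothetical placeholders chosen for illustration; they are not Exxact's benchmark results.

```python
def scaling_efficiency(throughput_by_gpus):
    """Map each GPU count to (speedup, efficiency) relative to the smallest count.

    throughput_by_gpus: dict of {gpu_count: throughput}, where higher
    throughput is better (for example, nanoseconds simulated per day).
    """
    base_gpus = min(throughput_by_gpus)
    base_throughput = throughput_by_gpus[base_gpus]
    result = {}
    for gpus, throughput in sorted(throughput_by_gpus.items()):
        speedup = throughput / base_throughput   # measured gain
        ideal = gpus / base_gpus                 # gain under perfect linear scaling
        result[gpus] = (speedup, speedup / ideal)
    return result


# Hypothetical numbers for illustration only (NOT measured results).
example = {1: 10.0, 2: 15.0, 4: 20.0, 8: 22.0}
for gpus, (speedup, efficiency) in scaling_efficiency(example).items():
    print(f"{gpus} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

An efficiency close to 100% indicates near-linear scaling. A sharp drop between four and eight GPUs, as in the made-up figures above, is the kind of pattern that points to a shared resource such as storage becoming the limit.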
Exxact sells custom systems to labs and universities doing research into life sciences, real-time modeling of biological processes, deep learning, Big Data and more. Being able to sell these systems successfully requires expertise in setting up and optimizing both the hardware systems and software used.
Paul Del Vecchio, a Sr. Sales Engineer, was given the job of creating a demo system that could run various molecular dynamics applications utilizing CUDA-enabled GPUs.
Much as a cluster distributes work across networked nodes, CUDA software distributes simulations of biological processes across multiple NVIDIA Tesla GPUs in a single system. Dedicated, specialized motherboards and PCIe bus expansion systems allow for eight or more x16 PCIe slots in a single system. Because communication between the GPUs, which act as the nodes, runs over the PCIe bus, some of the usual challenges of clustered systems involving the network that connects the nodes are eliminated.
RELION, GROMACS, NAMD and Amber all simulate different biological and chemical processes. These simulations are so complex that one standard performance measurement is days per nanosecond (days/ns): the number of days it takes to simulate one billionth of a second of a biological system in operation.
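Because days/ns is a time-per-unit figure, lower is better, and it is simply the reciprocal of the ns/day throughput that some tools report instead. The short sketch below shows the conversion and how it translates into wall-clock time for a longer run. The function names and the sample values are illustrative assumptions, not figures from Exxact's tests.

```python
def days_per_ns(ns_per_day):
    """Convert throughput (ns simulated per day) to days needed per simulated ns."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return 1.0 / ns_per_day


def days_to_simulate(target_ns, days_per_ns_value):
    """Wall-clock days needed to simulate target_ns nanoseconds."""
    return target_ns * days_per_ns_value


# Hypothetical example: a system that simulates 50 ns per day.
cost = days_per_ns(50.0)                # days required for each simulated ns
total = days_to_simulate(1000.0, cost)  # 1000 ns is one simulated microsecond
print(f"{cost:.3f} days/ns, {total:.1f} days for 1000 ns")
```

Seen this way, even a modest improvement in days/ns compounds over the microsecond-scale simulations researchers often need.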
Isolating bottlenecks is an ongoing process: once one bottleneck is found and resolved, the next lowest-performing component becomes the new bottleneck. Particularly with complex systems like HPC software, optimizing performance becomes a lengthy process that often must be tailored not only to the clustered operating system but also to the specific application that runs on top of it.