The quest for a cure
Five trillion calculations of biological research? It's all in a moment's work for the Dell-Intel-Linux cluster at the Buffalo Center of Excellence in Bioinformatics
Challenge: Design and install a large-scale, high-performance computing (HPC) cluster for bioinformatics research that requires trillions of complex calculations per second and at least 10 TB of storage
Solution: A 2,000-node HPC cluster comprising Dell™ PowerEdge™ 1650 and PowerEdge 2650 servers using Intel® Xeon™ and Intel Pentium® processors running the Red Hat® Linux® operating system; a Dell storage area network
Benefit: High levels of processing power at better price/performance than supercomputers; stable backup solution
The mysteries of the human body have eluded scientists for centuries. Through the years, scientists have performed research and developed theories—with the help of supercomputers—about the human body and the deadly diseases that attack our very existence. In some cases, their research has resulted in effective treatments and powerful drugs that have all but annihilated specific diseases. But the quest for a cure—or at least some relief—for threats such as cancer, AIDS, and Alzheimer's continues.
Today, researchers and scientists in the field of bioinformatics focus on identifying, sequencing, and understanding the human genome—and developing molecular models of even the tiniest proteins in biological agents. This analysis requires high-end computing and visualization technology to help process resource-intensive research. Traditionally, the discipline's computational needs were met by expensive supercomputers. In recent years, however, high-performance computing (HPC) clusters built with commodity components have become a viable alternative, offering a cost-effective way to obtain massive processing power.
The newly formed Buffalo Center of Excellence in Bioinformatics at the University at Buffalo (UB), a campus of The State University of New York, combines such high-end technologies as supercomputing and visualization with scientific expertise in disciplines such as genomics, proteomics, and bioimaging. Faced with the steep costs of supercomputers, the center sought a cost-effective alternative for its new facility. As director of the Buffalo Center of Excellence in Bioinformatics, Dr. Jeffrey Skolnick wanted a fast, reliable, and scalable HPC cluster to support his research in computational biology. The cluster would run proprietary software that performs protein-folding simulations and calculations, producing a huge amount of data that would require at least 10 TB of storage and a stable backup solution.
Dell preps around the clock
The importance of this research, combined with the need for cost-effective processing, set the stage for a partnership of corporate, government, and non-profit organizations, all working to serve the bioscience community. Dell provided the computing powerhouse: an HPC cluster.
Dell faced a tight deadline: it had a little more than five weeks to build a 2,000-node Intel® processor-based cluster and a storage area network (SAN), implement a backup solution, and perform acceptance testing. Dell created two teams that worked in three shifts. At the Long Island facility, one team racked and stacked equipment, configured software, and tested the configurations. This team then disassembled each rack as it was completed and shipped the hardware to the UB location, where the second team performed final configuration, testing, and verification of the cluster. Once installed at UB, the cluster passed all acceptance tests and met the goals set at the project's inception.
Dell servers provide the power to perform
A system of this magnitude requires maximum computing power in the smallest form factor possible to conserve space, power, and ultimately cost. Dell™ PowerEdge™ 1650 servers had the density required to support this type of system, allowing 41 dual Intel Pentium® III processor-based servers to fit in each 42U rack. Based on Dr. Skolnick's requirements for a cost-effective, scalable, and reliable development environment, Dell also installed 100 dual Intel Xeon™ processor-based PowerEdge 2650 servers. All servers run the Red Hat® Linux® operating system.
Designed for performance and reliability, the SAN incorporated ninety 181 GB Fibre Channel disk drives in a Dell|EMC FC4700 storage array and eight disk array enclosures. Two Dell PowerVault™ 136T tape libraries provided tape backup facilities. All SAN devices connected to two PowerVault 57F1 Fibre Channel switches.
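The rack density quoted above translates directly into floor space. The short Python sketch below estimates how many 42U racks a given number of PowerEdge 1650 servers would occupy at 41 servers per rack. The server count of 1,900 is a hypothetical figure chosen only for illustration, since the article does not say how the 2,000 nodes are split between server models; the function name `racks_needed` is likewise an assumption.

```python
import math

# From the text: 41 PowerEdge 1650 servers fit in each 42U rack.
SERVERS_PER_RACK = 41


def racks_needed(server_count: int, servers_per_rack: int = SERVERS_PER_RACK) -> int:
    """Return the number of racks required, rounding up to whole racks."""
    if server_count < 0 or servers_per_rack <= 0:
        raise ValueError("server_count must be >= 0 and servers_per_rack > 0")
    return math.ceil(server_count / servers_per_rack)


# Hypothetical count for illustration only; the article does not give
# the exact number of PowerEdge 1650 servers in the cluster.
ASSUMED_1650_COUNT = 1900

racks = racks_needed(ASSUMED_1650_COUNT)
print(f"{ASSUMED_1650_COUNT} servers at {SERVERS_PER_RACK} per rack: {racks} racks")
```

Rounding up matters here because a partly filled rack still takes a full rack position on the machine-room floor.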
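As a quick check of the storage figures, the following Python sketch multiplies out the drive count and drive size given above and compares the result with the 10 TB minimum stated in the challenge. It computes raw capacity only and uses decimal units (1 TB = 1,000 GB). Usable capacity would be lower once RAID protection and spare drives are accounted for, and the article does not describe that configuration, so it is left out.

```python
# Figures taken from the text.
DRIVE_COUNT = 90       # ninety Fibre Channel disk drives
DRIVE_SIZE_GB = 181    # 181 GB per drive
REQUIRED_TB = 10       # minimum storage stated in the challenge


def raw_capacity_tb(drive_count: int, drive_size_gb: int) -> float:
    """Return raw capacity in decimal terabytes (1 TB = 1,000 GB)."""
    return drive_count * drive_size_gb / 1000


raw_tb = raw_capacity_tb(DRIVE_COUNT, DRIVE_SIZE_GB)
headroom_tb = raw_tb - REQUIRED_TB

print(f"Raw capacity: {raw_tb:.2f} TB")
print(f"Headroom over the {REQUIRED_TB} TB requirement: {headroom_tb:.2f} TB")
```

On these numbers the array holds a little over 16 TB raw, which leaves room for redundancy overhead while still meeting the stated requirement.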
Software enhances cluster manageability
With more than 2,000 pieces of hardware in the cluster, the bioinformatics center needed a simple, scalable solution for monitoring vital statistics and maintaining the overall health of its investment. To provide this functionality, the team installed Dell OpenManage™ Server Administrator and Dell OpenManage IT Assistant. Dell OpenManage is based on the industry-standard Simple Network Management Protocol (SNMP) for seamless integration into existing enterprise management platforms.
For backup and recovery, Dell used a combination of EMC® SnapView™ and VERITAS NetBackup DataCenter™ software.
Massive processing delivers maximum value
The bioinformatics research performed by Dr. Skolnick and his team requires massive computing power, historically the domain of multimillion-dollar supercomputers. Weighing in at 80,000 pounds, the Intel-based Dell cluster provides the necessary computing power—more than 5 trillion calculations per second—at a fraction of the cost.
"Dell's exceptional price/performance allowed us to acquire low-cost servers that will give us extremely high levels of computing power," Dr. Skolnick says. "Deploying industry-standard technology in the form of a server cluster enables us to process the massive amount of data that is critical when doing this type of research."
Success is contagious
Based on the success of this cluster, the Center for Computational Research (CCR) at UB decided to deploy a 300-node Dell HPC cluster to assist general UB scientific research efforts, such as tracking pollution in the Great Lakes. This cluster of 300 Dell PowerEdge 2650 servers, each with dual Intel Xeon processors, has become the highest-ranking Dell system on the TOP500 Supercomputer Sites list.²
The new cluster will be a success if it speeds research by even a fraction of the time that the 2,000-node cluster has saved. According to Dr. Skolnick, the data to be analyzed by the first cluster would take approximately 2,000 years to analyze on a single computer with one processor. By using the cluster, he expects to complete his initial data analysis in just six months, a time savings that brings treatments and cures for diseases such as cancer, Alzheimer's, and AIDS that much closer.
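The time savings Dr. Skolnick describes can be expressed as a speedup factor. The Python sketch below works it out from the two estimates in the text (about 2,000 years on one processor, about six months on the cluster). The processor count is an assumption: it treats every one of the 2,000 nodes as a dual-processor server, as the servers described in this article are. Both inputs are round estimates, so the result is an order-of-magnitude illustration rather than a measured benchmark.

```python
# Estimates quoted in the text.
SINGLE_CPU_YEARS = 2000.0   # time on a single one-processor computer
CLUSTER_YEARS = 0.5         # expected time on the cluster (six months)

# Assumption: all 2,000 nodes are dual-processor servers.
NODES = 2000
CPUS_PER_NODE = 2


def speedup(serial_time: float, parallel_time: float) -> float:
    """Return how many times faster the parallel run is than the serial run."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


observed_speedup = speedup(SINGLE_CPU_YEARS, CLUSTER_YEARS)
total_cpus = NODES * CPUS_PER_NODE
# Parallel efficiency: speedup achieved per processor (1.0 = perfect scaling).
efficiency = observed_speedup / total_cpus

print(f"Speedup: {observed_speedup:,.0f}x on {total_cpus:,} processors")
print(f"Implied parallel efficiency: {efficiency:.2f}")
```

Under these assumptions the speedup roughly equals the processor count, which suggests the quoted estimates assume near-linear scaling. That is plausible for workloads made up of many independent simulations, though the article does not say how the software distributes its work.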