General description: our analysis shows that the core computations in bioinformatics are memory-intensive and storage-intensive. Drawing on years of experience, we offer targeted, professional solutions to our clients.
1. Demands and analysis
Life science studies the nature, characteristics, occurrence and evolution of life phenomena and activities, as well as the interrelations among life forms and between life forms and their environment. With advances in science and technology, researchers can now investigate how life develops through a variety of means, including bioinformatics.
Bioinformatics focuses on the storage, retrieval and analysis of biological information with the aid of computers. It is one of the major frontiers of life science and natural science today and is expected to become one of the core fields of natural science in the 21st century. Its research centers on genomics and proteomics; specifically, it analyzes the structural and functional biological information expressed in nucleic acid and protein sequences.
Several specialties have emerged within bioinformatics over the last decade. The research areas that rely on high-performance computing include:
Sequence alignment
The basic problem of sequence alignment is to compare the similarity or dissimilarity of two or more symbol sequences. From a biological perspective, this covers several tasks: reconstructing the complete DNA sequence from overlapping sequence fragments; determining physical and genetic maps from probe data; storing, searching and comparing DNA sequences in databases; comparing the similarity of two or more sequences; searching databases for related sequences and subsequences; finding recurring patterns of nucleotides; and identifying informative elements in protein and DNA sequences. Sequence comparison generates massive amounts of data, which places considerable demands on the storage system.
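As an illustration of pairwise sequence comparison, the sketch below computes a global alignment score with the classic Needleman-Wunsch dynamic programming recurrence. It is a minimal example rather than any production tool: the function name and the scoring values (match, mismatch, gap) are assumptions chosen for readability. Only two rows of the score table are kept at a time, which hints at why memory becomes the limiting factor as sequences grow.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score of sequences a and b.

    The scoring values are illustrative assumptions. Only the previous
    and current rows of the DP table are stored, so memory use is
    proportional to len(b) instead of len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b (all gaps).
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    return prev[-1]
```

A higher score means the two sequences are more similar under the chosen scoring scheme; real search tools add heuristics and indexing on top of this kind of recurrence.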
Hard disk capacity doubles every 14 months, while the volume of gene sequence data doubles every 5 months. For instance, Celera Genomics, the Sanger Centre and other major gene research institutes manage trillions of bytes (terabytes) of data; their databases already exceed the book collection of the US Library of Congress, as well as all the data previously accumulated over the history of biological research.
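The effect of the two doubling periods quoted above can be worked out with a few lines of arithmetic. The sketch below assumes both rates stay constant over time, which is a simplification; the function and constant names are illustrative only.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


# Doubling periods quoted in the text (in months).
DISK_DOUBLING = 14
SEQUENCE_DATA_DOUBLING = 5


def capacity_gap(months: float) -> float:
    """How many times faster sequence data has grown than disk capacity."""
    data = growth_factor(months, SEQUENCE_DATA_DOUBLING)
    disk = growth_factor(months, DISK_DOUBLING)
    return data / disk


if __name__ == "__main__":
    for months in (12, 36, 70):
        print(months, "months:", round(capacity_gap(months), 1))
```

Under these assumptions, after 70 months disk capacity has grown 32-fold (five doublings) while sequence data has grown 16,384-fold (fourteen doublings), a gap of 512 times.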
Sequence assembly
Sequence assembly means assembling the read fragments produced by sequencing to restore the original sequence. It is the most fundamental task of sequence analysis and critical to the success of genome research, because the assembly result directly affects sequence annotation, gene prediction, genome comparison and other downstream tasks. It is also the most important problem that genome research must solve. The difficulty lies in the massive data volume (for the human genome, recovering a 100M original sequence from 10M fragments) and in the high degree of sequence repetition.
From a computing perspective, the massive initial data set is loaded into memory at the start of assembly and then processed. Sequence assembly therefore places very high demands on the memory capacity and computing power of the machine.
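To make the memory demand concrete, the following toy sketch assembles reads with a greedy overlap-merge strategy. It is not the method of any particular assembler, and the function names and the minimum overlap length are assumptions. Note that every read is held in memory at once and every pair of reads is compared, which is why real genome-scale assembly needs large-memory nodes and far more sophisticated data structures.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Stops when no pair overlaps by at least min_len and returns the
    remaining contigs. All reads stay in memory for the whole run.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the two reads, writing the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
```

Greedy merging can be misled by repeated regions, which is the repetition problem described above; it is shown here only because it is the simplest approach to follow.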
Structure-based drug design
One of the purposes of human genetic engineering is to understand the structure, function and interactions of about 100,000 different types of proteins in the human body and their relations to human diseases, and then to find methods and medication for treating and preventing those diseases. Drug design based on the structures of biological macromolecules and small molecules is a very important research field in bioinformatics. To inhibit the activity of a particular enzyme or protein whose tertiary structure is known, inhibitor molecules can be designed on a computer using molecular alignment algorithms and treated as drug candidates. The candidates are then compared against a database to identify favorable structures, and finally molecular simulation is used to design the drug molecule.
2. Inspur life science high-performance solution
Our analysis shows that the core computations in bioinformatics are memory-intensive and storage-intensive. Drawing on years of experience, we offer targeted, professional solutions to our clients.
The Inspur high-performance application cluster addresses the four primary issues of bioinformatics computing:
High computing performance
For a high-performance computer, what matters most is floating-point capability and overall CPU performance. Given the characteristics of bioinformatics workloads, Intel processors are recommended: they offer high processing capacity along with significant advantages in energy efficiency, memory support and CPU architecture.
Large memory capacity
In bioinformatics applications, loading historical data places ever higher demands on memory. Inspur uses high-memory servers and four-way or eight-way fat nodes; a single node can be configured with up to 2 TB of memory to satisfy actual application needs.
Massive storage
A massive storage system is a prerequisite for bioinformatics computation. Inspur provides not only professional direct-attached storage but also Fibre Channel storage systems with 8 Gb/s interfaces. Dedicated storage nodes are used to build a Lustre parallel file system, which is accessed over Ethernet or even 40 Gb/s InfiniBand and offers PB-level total capacity. User data security and data backup are also taken into account, fundamentally resolving the data storage difficulties of bioinformatics.
High system stability
A highly stable system makes bioinformatics applications faster and more convenient to run, processes data efficiently and prevents interruptions to operations. Through unified cluster monitoring and management, job scheduling and high-performance servers, Inspur safeguards system stability at every level, substantially improves the stability of user operations, reduces failure rates and provides continuous support for user productivity.
3. Advantages of Inspur solution
Inspur has a professional HPC application analysis team that uses its own test tools to determine the hardware platform requirements of client applications, which helps Inspur provide clients with the HPC solution offering the best price-performance ratio.
Inspur not only provides hardware products and solutions but also conducts HPC technology research, carries out forward-looking research in heterogeneous parallel computing, and has built a world-leading application development team to meet clients' customization needs and special requirements.
In CPU technology, the team has worked with BGP to develop and optimize multi-core CPU parallel algorithms for petroleum exploration interpretation and processing, including single-frequency attribute extraction, multi-window dip scanning, volume curvature extraction, structure-oriented filtering, eigenvalue coherence, strain attribute extraction and data separation.
In GPU technology, Inspur already possesses development capabilities at the algorithm, desktop and cluster levels. We have cooperated with the Beijing Genomics Institute, CAS, Northwestern Polytechnical University and BGP on Blastn, LES-LBM, PSTM, RNA and other software development projects, improving the performance of the original software by a factor of several tens.
In MIC technology, Inspur and Intel established the China Parallel Computing Joint Laboratory on August 24, 2011. The laboratory has carried out MIC research in life science, computational fluid dynamics, meteorology, petroleum and computational finance, and the results were presented at SC11 and IDF12.