Saudi Arabia's gene data processing platform was established primarily to support the ABI SOLiD system and the 454 Corporation's GS FLX sequencing system, running FASTA, BLAST, Genscan, RepeatMasker, NAMD, Phrap and other life science software for the associated research work.
There are many types of sequencing devices today. Saudi Arabia's gene data processing system comprises the SOLiD sequencing system and the 454 Corporation's sequencing system. The SOLiD sequencing system is an end-to-end genome analysis and research solution that encompasses the sequencing unit, the chemistry, the computing cluster and the data storage components. The platform performs sequencing by oligonucleotide ligation and detection. Unlike polymerase-based sequencing methods, the SOLiD system uses stepwise ligation to generate high-quality data. It is suitable for whole-genome sequencing and targeted resequencing, transcriptome analysis, small RNA discovery, gene expression profiling, chromatin immunoprecipitation (ChIP), microbial and eukaryotic resequencing, digital karyotyping, medical sequencing and genotyping, among others.
The GS FLX sequencing system of the 454 Corporation, meanwhile, relies on chemiluminescence for DNA sequence analysis. Under the combined action of DNA polymerase, ATP sulfurylase, luciferase and apyrase, the incorporation of each dNTP is coupled to the release of a light signal, so DNA can be sequenced in real time by measuring the presence and strength of these chemiluminescent signals.
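The signal-based readout described above can be illustrated with a minimal sketch. It assumes that nucleotides are flowed in a fixed repeating order (taken here as T, A, C, G) and that the light intensity of each flow is roughly proportional to the number of bases incorporated. The function name `call_bases`, the rounding rule and the sample intensities are hypothetical and are not taken from the GS FLX software.

```python
def call_bases(flow_values, flow_order="TACG"):
    """Translate per-flow light intensities into a base sequence.

    Assumption: each intensity is close to the number of identical
    bases incorporated during that flow (0 = no incorporation).
    """
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporated bases,
        # treating noise below zero as "no incorporation".
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical intensities for two rounds of the T, A, C, G flow cycle.
example_flows = [1.02, 0.05, 2.10, 0.97, 0.03, 1.05, 0.00, 0.10]
print(call_bases(example_flows))  # TCCGA
```

A real base caller also has to correct for signal decay and for uncertainty in long runs of the same base, both of which this sketch ignores.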
Analysis of Application Features
Genetic research is an important sector of the life sciences. The computing resources it needs (including computing speed, memory and disk capacity, and network communication capacity) will exceed those required for the moon landing project by a factor of millions. This highlights the enormous demand of the life sciences for high-performance computing. To save researchers' valuable time and to satisfy the system's computing and storage requirements, Inspur recommends adopting a mature cluster solution.
In the life sciences, many applications place heavy demands on a system's computing capacity, memory capacity and I/O capacity, so Inspur's system must meet these requirements. The researchers who use the system are typically experts in a particular field but may not have a thorough understanding of computer clusters. The system should therefore be as easy to use and manage as possible, so that researchers can devote more time and energy to their research. Because the computational load and storage needs of the life sciences keep growing, the completed system must be highly scalable to satisfy future demands. In addition, some applications may run for several days or even several months, so power consumption is an unavoidable consideration.
In response to the high performance requirements of Saudi Arabia's gene data processing platform, and drawing on its deep understanding of high-performance applications in scientific computing, Inspur built the Inspur Tiansuo TS10000 cluster for this project, based on Intel’s latest 45nm quad-core processor. The cluster is characterized by outstanding computing performance, leading power control, a convenient monitoring and management system, an open and scalable architecture and a well-established service system. In addition, Inspur's widely acclaimed high-performance cluster training service is an asset to this project because it addresses the users' concerns.
The cluster comprises 72 computational nodes, 1 management node/generic computational node, 1 head node/generic computational node, 2 source data processing nodes/generic computational nodes, 4 storage nodes, one full line-rate 20Gb InfiniBand computational network, one 1Gb management network, and the software system and video switching system deployed in the cluster.
1. High performance: high-performance computational nodes are key to raising the cluster’s computing capacity. The Inspur NX7100DB supports two Intel Xeon quad-core processors. Each CPU has 12MB of L2 cache and a clock frequency of 3.0GHz.
2. High reliability: the management node server and the disk array are both equipped with redundant RAID disks to ensure data safety and reliability. The Inspur cabinet system, rack server nodes and blade server nodes all have optimized heat dissipation and redundant designs to ensure stable operation and reduce power consumption.
3. Standard and open cluster architecture: the system design conforms to common industry standards for parallel computer clusters and can be interconnected with standard components from any other manufacturer.
4. Easy management: cluster system software is a key part of a cluster server. Inspur’s self-developed cluster management system (TSMM) is used so that the cluster's resources can be managed as if they were those of a single server.
5. Scalability: the performance requirements of a life sciences computation platform keep growing without a practical upper limit, so the system should be expandable without changing the existing equipment architecture.
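As a rough illustration of what this configuration implies, the sketch below estimates the theoretical peak performance of the 72 dedicated computational nodes from the figures given above (two quad-core 3.0GHz processors per node). The value of 4 double-precision floating-point operations per core per cycle is an assumption typical of Intel 45nm Xeon processors of that generation and is not stated in the source; the function name `peak_gflops` is likewise hypothetical.

```python
def peak_gflops(nodes, cpus_per_node, cores_per_cpu, ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: total cores x clock rate x FLOPs per cycle."""
    return nodes * cpus_per_node * cores_per_cpu * ghz * flops_per_cycle


# 72 nodes, 2 CPUs per node, 4 cores per CPU and 3.0 GHz come from the text.
# flops_per_cycle=4 is an assumed value, not a figure from the source.
compute_peak = peak_gflops(nodes=72, cpus_per_node=2, cores_per_cpu=4,
                           ghz=3.0, flops_per_cycle=4)
print(f"{compute_peak:.0f} GFLOPS")  # 6912 GFLOPS
```

This counts only the dedicated computational nodes. The management, head and source data processing nodes can also serve as generic computational nodes, and sustained application performance is normally well below the theoretical peak.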
King Abdulaziz City for Science and Technology (KACST), Saudi Arabia's national scientific research administration and national research organization, launched the DPGP. With this supercomputer system, the full infrastructure for the DPGP is in place, and an influential life sciences and biotechnology center has been established to promote the development of Saudi Arabia's biotechnology industry.