General description: Inspur has the most systematic and professional HPC support team in the domestic numerical forecast sector, providing professional performance optimization, process optimization and service system monitoring for user applications.
I. Demand and analysis
Atmospheric environment application needs
Severe air pollution has occurred frequently in mainland China in recent years. In particular, haze lingered over North China, Huanghuai, Jianghuai, South China and central-eastern China for several days at a time in October 2011, January 2013 and February 2014, making haze a focus of public concern. These recurring pollution events seriously affect public health, production and daily life, and have drawn the attention of the public, the media and the central government. They occurred after the promulgation of the Environmental Air Quality Standards (GB3095-2012) and the associated evaluation methods, and therefore posed severe challenges to China's urban air quality and air pollution prevention and control work. In September 2013, the State Council issued the “Atmospheric Pollution Prevention and Treatment Action Plan” (hereinafter referred to as the Action Plan), which aims to “improve air quality nationwide in general, substantially reduce the number of heavily polluted days and significantly improve air quality in the Beijing-Tianjin-Hebei region, the Yangtze River Delta and the Pearl River Delta after five years of effort; and gradually eliminate heavily polluted weather and significantly improve air quality nationwide over another five years or longer”. The Action Plan specifically requires a monitoring and early-warning system: environmental protection authorities are to strengthen cooperation with meteorological agencies, establish a monitoring and early-warning system for heavily polluted weather, analyze the trend of heavily polluted weather, improve the joint consultation and analysis mechanism, improve the accuracy of monitoring and early warning, and release monitoring and early-warning information in a timely manner.
The root cause of China's severe air pollution is a fundamental change in the nature of that pollution. With rapid socio-economic development and urbanization, single-source pollution in major cities is gradually being replaced by composite pollution, and regional composite air pollution characterized by PM2.5 and O3 (ozone) is becoming increasingly serious. Pollution sources and their impacts cross the administrative boundaries of cities and even provinces, so regional pollution and secondary pollution are the distinctive features. Secondary pollution represented by PM2.5 and O3 has gradually become the main bottleneck in improving China's urban and regional air quality. Its complexity and the severity of its harm make it very difficult to control, and the conventional pollution control approach can hardly meet current prevention and control needs. It is therefore urgent to analyze the causes of air pollution in cities and their surrounding regions, to forecast and warn of pollution events, and to formulate control measures against them.
Many countries and regions issue daily air quality forecasts. Statistical forecasting and numerical forecasting are the two most widely used methods, and numerical forecasting has become the trend in air pollution forecast research because of its objectivity, timeliness, accuracy and efficiency. However, numerical forecasting involves a multitude of programs and models and demands a massive computation load, frequent communication and heavy I/O, all of which exceed the capacity of a PC or workstation. A high-performance, highly available and highly reliable computing system is therefore critical to the research and development of numerical air quality forecasting.
While rising in the rankings of the world's leading high-performance computers, Inspur has also been delivering efficient and stable solutions to enterprises, public institutions and research institutes. In numerical forecasting for the meteorological, marine and environmental sectors, Inspur has accumulated considerable software debugging and service experience and has established sound long-term partnerships with several research institutes over more than ten years of development. Drawing on this practical experience, we have launched dedicated integrated hardware and software solutions for these industries, which support the smooth running of users' research and operational systems. Moreover, Inspur has the most systematic and professional HPC support team in the domestic numerical forecast sector, which provides professional performance optimization, process optimization and service system monitoring for user applications.
Inspur has many successful domestic deployments of high-performance clusters in the numerical forecast sector. Deployments in the last three years with a system scale of over 30 trillion (operations per second) include: Shenzhen Meteorological Bureau (34 trillion), Shanghai Typhoon Bureau (32 trillion), Zhejiang Meteorological Bureau (3.5 trillion) and Beijing Meteorological Bureau (90 trillion). While accumulating this experience, Inspur has also established sound partnerships with most domestic numerical forecast research institutes to jointly promote the adoption and development of numerical forecasting in China's environmental and meteorological industries.
Common air quality models
In the environmental protection industry, research institutes and monitoring agencies rely on high-performance computing to simulate and forecast air pollution numerically, and use source analysis to guide pollution prevention and control and policy formulation.
Relevant entities include:
Environmental monitoring and forecasting entities under the Ministry of Environmental Protection (e.g. environmental monitoring stations and the environmental monitoring centers of the provinces and municipalities)
Environmental research entities (e.g. the Institute of Atmospheric Physics of the CAS and the environmental research institutes in Shenzhen, Guangzhou, Shanghai, Beijing and Xinjiang)
Environmental schools of colleges and universities (e.g. Peking University, Tsinghua University, Lanzhou University and South China University of Technology)
Mature air quality forecast models in common use include the NAQPMS model of the Institute of Atmospheric Physics of the CAS, the Models-3/CMAQ model of the EPA, the CAMx model of the US company Environ and WRF-Chem of NCAR.
Model computation features
Massive calculation load
Mesoscale meteorological forecast models (WRF, GRAPES, etc.) and atmospheric chemistry models (e.g. CMAQ) all carry enormous computation loads, most of which are floating-point operations. In theory, doubling the forecast resolution increases the computation load roughly 16-fold, because the grid is refined by a factor of two in each of the three spatial dimensions and the time step has to be halved as well. A single CPU or an ordinary computer cannot complete such a numerical forecast within the time in which the forecast is still valid, so parallel computation must be used. On the one hand, the forecast software has to be parallelized through message passing, shared memory or another communication scheme. On the other hand, high-performance computers have to be purchased to meet the growing demand for computation.
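The 16-fold figure can be reproduced with a short calculation. The sketch below assumes, as is common for explicit grid-point models, that the cost grows with the number of grid points in each of three spatial dimensions and with the number of time steps; the function name and its parameters are illustrative and do not come from any forecast model.

```python
def relative_cost(refinement: float, spatial_dims: int = 3) -> float:
    """Relative computation load after refining the grid by `refinement`.

    Assumption: cost is proportional to the number of grid points in each
    spatial dimension and to the number of time steps, and the time step
    shrinks in proportion to the grid spacing.
    """
    return refinement ** (spatial_dims + 1)


if __name__ == "__main__":
    for factor in (2, 3, 4):
        print(f"{factor}x finer grid -> {relative_cost(factor):.0f}x computation")
```

Under the same assumption, refining only the horizontal grid (two spatial dimensions) would give an 8-fold increase for a doubling of resolution.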
Most of the forecast models mentioned here have been parallelized. For example, the mesoscale forecast model WRF and the air quality model CAMx both support MPI message-passing parallelism, OpenMP shared-memory parallelism and hybrid MPI+OpenMP operation, whereas CMAQ and NAQPMS support MPI parallelism only, not OpenMP.
Because these models are all parallel software and normally use a finite-difference grid scheme for parallel computation, a large volume of data is communicated between CPUs while they run, and the models place very high demands on communication performance. For instance, communication in the mesoscale meteorological forecast model WRF includes the exchange between a parent domain and its nested domains and the exchange between the different data partitions within each domain. High-performance computers for this workload therefore need a high-performance communication network.
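The exchange between neighbouring data partitions of a domain can be illustrated with a small halo (ghost cell) example. This is a minimal pure-Python sketch that simulates the ranks in a single process on a one-dimensional grid; it is not code from WRF or any other model, and a real implementation would use MPI send and receive calls where this sketch simply copies values.

```python
def stencil_serial(values):
    """Three-point average over the whole grid; the two end points stay fixed."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out


def stencil_decomposed(values, n_ranks):
    """Same update with the grid split into contiguous patches.

    Each simulated rank owns one patch and needs one halo value from each
    neighbouring patch before it can update its own edge points.
    Returns the updated grid and the number of halo values exchanged.
    """
    n = len(values)
    bounds = [(r * n // n_ranks, (r + 1) * n // n_ranks) for r in range(n_ranks)]
    out = list(values)
    exchanged = 0
    for start, end in bounds:
        # Halo exchange: copying a neighbour's edge value stands in for MPI.
        left = values[start - 1] if start > 0 else None
        right = values[end] if end < n else None
        exchanged += (left is not None) + (right is not None)
        local = [left] + list(values[start:end]) + [right]
        for i in range(start, end):
            if i == 0 or i == n - 1:
                continue  # physical boundary of the domain, kept fixed
            k = i - start + 1  # position inside the halo-padded patch
            out[i] = (local[k - 1] + local[k] + local[k + 1]) / 3.0
    return out, exchanged


if __name__ == "__main__":
    grid = [float(i * i) for i in range(12)]
    updated, halo_values = stencil_decomposed(grid, n_ranks=4)
    print("matches serial result:", updated == stencil_serial(grid))
    print("halo values exchanged per step:", halo_values)
```

The halo traffic is repeated at every time step and grows with the number of partitions, which is why network latency and bandwidth matter as much as raw floating-point performance.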
High I/O requirements
Because many users and large numbers of small-file reads and writes are involved, meteorological model programs mostly place high demands on the IOPS performance of the system and generally require distributed I/O or a parallel file system. The stability and availability of the storage system are also critical to the running of the whole operational system. Storage designs therefore normally call for a high-availability solution and a storage system that supports self-healing from faults.
Moreover, policy-based tiered storage should preferably be provided, given that meteorological data is accessed periodically.
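A policy for tiered storage can be as simple as placing data by the time since its last access. The sketch below is only an illustration: the tier names and the age limits are hypothetical and would be set by the site's own data retention policy, not by any Inspur product.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and age limits, chosen only for illustration.
TIER_POLICY = [
    ("fast_parallel_storage", timedelta(days=7)),   # data from the current forecast cycles
    ("capacity_storage", timedelta(days=90)),       # recent data, read occasionally
]
ARCHIVE_TIER = "archive"                            # everything older


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier a file belongs to, based on the time since last access."""
    age = now - last_access
    for tier, max_age in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    now = datetime(2014, 3, 1)
    for days in (1, 30, 365):
        print(days, "days old ->", choose_tier(now - timedelta(days=days), now))
```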
Heavy computation load of the main model
The software processing flow normally consists of preprocessing, the main model and post-processing. Preprocessing includes data transfer and download, data assimilation, etc. Post-processing essentially means graphical rendering of the generated products. Preprocessing and post-processing usually place low demands on floating-point capacity but high demands on the I/O capacity of the nodes. The main model is the core of the whole system and is where most of the double-precision floating-point computation takes place, so it places very high demands on the double-precision floating-point performance of the computer.
Given these features, forecast models place the following requirements on the computing environment:
A high-density cluster system with high processing capacity
Tiered storage that can hold both periodic operational data (hot data) and archived data (big data)
Linux or Unix operating system
C and Fortran 77/90 compiler environment
MPI and OpenMP parallel environment
Graphics libraries and display systems, e.g. NCL, MICAPS, GrADS, VIS5D and RIP4
Guaranteed access to background field data, e.g. NCEP and T213
The following points are therefore important when selecting the basic environment for a numerical forecast platform and an operational forecast system:
High performance, particularly the double-precision floating-point capacity of the computing system and the overall processing capability of the preprocessing and post-processing system
A high-performance network environment
High system stability
A high-performance, high-availability parallel storage system with archiving support
A mature and stable job scheduling system that provides priority scheduling and supports preemption and resumption of jobs
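Priority scheduling with preemption and resumption can be sketched as follows. This is a simplified, hypothetical model rather than the behaviour of any particular scheduler: jobs only request a node count, a lower priority number means a more urgent job, and a suspended job simply waits in the queue until enough nodes are free again.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                       # lower number = more urgent
    name: str = field(compare=False)
    nodes: int = field(compare=False)


class PreemptiveScheduler:
    def __init__(self, total_nodes: int):
        self.free = total_nodes
        self.running = {}               # job name -> Job
        self.waiting = []               # heap of queued or suspended jobs
        self.events = []                # trace of scheduling decisions

    def _run(self, job: Job) -> None:
        self.running[job.name] = job
        self.free -= job.nodes
        self.events.append(f"run {job.name}")

    def submit(self, job: Job) -> None:
        if job.nodes > self.free:
            # Candidates for preemption: less urgent running jobs, least urgent first.
            victims = sorted(
                (j for j in self.running.values() if j.priority > job.priority),
                reverse=True,
            )
            if self.free + sum(v.nodes for v in victims) >= job.nodes:
                for victim in victims:
                    if job.nodes <= self.free:
                        break
                    del self.running[victim.name]
                    self.free += victim.nodes
                    heapq.heappush(self.waiting, victim)
                    self.events.append(f"suspend {victim.name}")
        if job.nodes <= self.free:
            self._run(job)
        else:
            heapq.heappush(self.waiting, job)
            self.events.append(f"queue {job.name}")

    def finish(self, name: str) -> None:
        job = self.running.pop(name)
        self.free += job.nodes
        self.events.append(f"finish {name}")
        # Start or resume waiting jobs, most urgent first, while they fit.
        while self.waiting and self.waiting[0].nodes <= self.free:
            self._run(heapq.heappop(self.waiting))


if __name__ == "__main__":
    sched = PreemptiveScheduler(total_nodes=8)
    sched.submit(Job(priority=5, name="research_rerun", nodes=6))
    sched.submit(Job(priority=1, name="operational_forecast", nodes=4))
    sched.finish("operational_forecast")
    print(sched.events)
```

In this example the operational forecast takes nodes from the research job as soon as it is submitted, and the research job runs again once the forecast has finished.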
Inspur has a professional HPC application analysis team that uses its own test tool to determine the hardware platform requirements of the client's application, which helps us provide the customer with the HPC solution offering the best price-performance ratio. The following are the application features of the WRF meteorological software.
II. Inspur high-performance meteorological solution
Inspur has proposed solutions according to the high performance requirements of the meteorological industry. Our products have the following distinctive advantages:
High performance, particularly floating-point processing capacity
Meteorological software places very strict demands on computing capacity. The solution therefore includes multiple dual-socket nodes with very high floating-point performance, suited to MPI distributed-memory computation.
Network bandwidth problem
Meteorological applications place very high demands on network latency and bandwidth. We configure a 40 Gb/s or 56 Gb/s high-speed InfiniBand network to meet the computation and data exchange needs of all nodes and to reduce network latency.
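To put the two link speeds in context, the usable data rate of a link can be estimated from its signalling rate and line encoding. The sketch below uses the standard figures for 4x QDR InfiniBand (the 40 Gb/s option) and 4x FDR InfiniBand (the 56 Gb/s option); these figures come from the InfiniBand specifications rather than from this document, and protocol overhead above the link layer is ignored.

```python
# Per-lane signalling rate (Gb/s) and line encoding for each InfiniBand generation.
LINKS = {
    "QDR": {"lane_gbps": 10.0, "payload_bits": 8, "coded_bits": 10},      # 8b/10b
    "FDR": {"lane_gbps": 14.0625, "payload_bits": 64, "coded_bits": 66},  # 64b/66b
}


def effective_gbps(link: str, lanes: int = 4) -> float:
    """Data rate after line-encoding overhead, in Gb/s."""
    spec = LINKS[link]
    return lanes * spec["lane_gbps"] * spec["payload_bits"] / spec["coded_bits"]


def transfer_seconds(gigabytes: float, link: str) -> float:
    """Idealized time to move `gigabytes` of data over one link."""
    return gigabytes * 8 / effective_gbps(link)


if __name__ == "__main__":
    for name in LINKS:
        print(f"{name}: {effective_gbps(name):.1f} Gb/s effective, "
              f"{transfer_seconds(10, name):.2f} s for 10 GB")
```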
Storage bandwidth problem
Large volumes of data are exchanged at domain boundaries during meteorological computation, so a capable storage system is needed to meet the software's bandwidth requirements. We configure a Fibre Channel storage system with 8 Gb interfaces, connected to the 40 Gb/s or 56 Gb/s high-speed InfiniBand network through dedicated I/O nodes. This keeps CPUs from idling while they wait for data and significantly increases computational efficiency.
A highly stable system makes meteorological applications more convenient and faster to run. Inspur's design is highly integrated and simple to configure, which not only reduces the probability of faults but also increases equipment utilization, and hence guarantees high availability, high stability and the best return on investment.
III. Advantages and value of Inspur HPC solution
The solution configuration meets user requirements with a reasonable balance of computing, storage and network components. It matches the characteristics of the user's applications and has no performance or functional shortcomings.
The system has robust computing capacity and rich computing resources. Node selection and configuration match the user's applications, with thin nodes, fat nodes and GPU nodes combined in one system.
The storage system uses the Inspur TSExaStor distributed storage architecture, which provides sufficient aggregate I/O bandwidth. The storage system is stable, reliable and highly scalable.
The industry-leading 56 Gb/s FDR InfiniBand high-speed network serves as both the computing network and the storage network. As the most advanced network technology in the industry, 56 Gb/s FDR delivers roughly 1.7 times the effective data rate of the previous-generation QDR network, so the computing efficiency of parallel applications can be substantially improved and the aggregate I/O bandwidth and IOPS performance of the parallel storage system can be significantly increased.
The cluster monitoring and management network uses 1G switches with 10G uplinks, which guarantees network performance and simplifies cable management.
System stability, reliability and availability are fully considered in the solution. For example, the main products all have a redundant design; the management nodes of the system are configured as a redundant pair; the storage system has an active-active redundant design; and a cluster monitoring and alarm system is used to identify potential risks to the system.
The Inspur ClusterEngine cluster monitoring and management system provides a simple, user-friendly interface for cluster management and operation, covering cluster deployment, monitoring, alarms, management, statistics, reporting and job scheduling. It supports accounting configuration; billing by CPU, memory and storage usage or by user-defined billing policies; export of statistical reports; online user payment and balance management; and job checkpointing with restart from the checkpoint.
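Charging by CPU, memory and storage use amounts to multiplying each metered quantity by a unit price. The sketch below is purely illustrative: the metering units, the unit prices and all names are hypothetical, and it does not represent the ClusterEngine interface or its actual charging rules.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cpu_core_hours: float
    memory_gb_hours: float
    storage_gb_days: float


# Hypothetical unit prices in an arbitrary currency, for illustration only.
DEFAULT_RATES = {
    "cpu_core_hours": 0.10,
    "memory_gb_hours": 0.01,
    "storage_gb_days": 0.002,
}


def charge(usage: Usage, rates: dict = DEFAULT_RATES) -> float:
    """Total charge for one user, rounded to two decimal places."""
    total = sum(getattr(usage, metric) * price for metric, price in rates.items())
    return round(total, 2)


if __name__ == "__main__":
    monthly = Usage(cpu_core_hours=1000, memory_gb_hours=2000, storage_gb_days=5000)
    print("charge:", charge(monthly))
    # A user-defined policy could pass its own rate table, e.g. CPU time only.
    print("cpu-only charge:", charge(monthly, {"cpu_core_hours": 0.10}))
```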
Inspur provides a well-established basic software environment for high-performance computing, including compilers, function libraries, common tool libraries and a parallel environment. The system is also tuned to meet the development and runtime needs of high-performance computing programs.