Interpreting the Supreme Computing Power of the Huawei Atlas 900 AI Cluster

BrandPost By IDG Contributing Editor
Oct 01, 2019
IT Leadership

On September 18, at HUAWEI CONNECT 2019, Mr. Ken Hu, Deputy Chairman and Rotating CEO of Huawei, released the Atlas 900 AI cluster. Built from AI training servers that run on Huawei’s in-house chips, the cluster is designed to give AI services superior computing power. As AI drives the transformation from digital to intelligent, the AI industry itself faces the challenge of upgrading and evolving. Huawei aims to accelerate this transformation by contributing AI computing power in the form of large-scale, distributed AI training clusters.

Figure 1 Huawei Atlas 900 AI cluster

Introduction to Atlas 900

Neural network models trained on large data sets power image recognition, natural language processing (NLP), real-time video analysis, intelligent recommendation, and other functions. Training these models requires large volumes of floating-point computing resources. In recent years, substantial progress has been made in the computing power and training methods of individual AI processors. However, training on a single machine still takes too long to be practical. Building large-scale, distributed AI clusters that raise the floating-point computing power of neural network training systems has therefore become a top priority.

The Atlas 900 AI cluster consists of thousands of Ascend 910 AI processors and, according to Huawei, is the fastest AI training cluster in the world. It delivers 256 to 1024 PFLOPS @FP16, a performance equivalent to 500,000 PCs, allowing users to train models on large datasets for a wide range of needs.
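The relationship between the per-processor figure and the cluster figure can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that cluster peak is the per-processor peak (256 TFLOPS @FP16, as quoted in this article) multiplied by the processor count, with decimal prefixes (1 PFLOPS = 1000 TFLOPS) and no allowance for communication overhead. The function names are hypothetical.

```python
# Peak FP16 throughput of one Ascend 910, as quoted in the article.
TFLOPS_PER_PROCESSOR = 256
# Assumption: decimal SI prefixes, so 1 PFLOPS = 1000 TFLOPS.
TFLOPS_PER_PFLOPS = 1000


def cluster_peak_pflops(num_processors: int) -> float:
    """Theoretical peak of a cluster: per-processor peak times processor count."""
    return num_processors * TFLOPS_PER_PROCESSOR / TFLOPS_PER_PFLOPS


def processors_for(target_pflops: float) -> float:
    """Number of processors needed to reach a given theoretical peak."""
    return target_pflops * TFLOPS_PER_PFLOPS / TFLOPS_PER_PROCESSOR


if __name__ == "__main__":
    for n in (1024, 4096):
        print(f"{n} processors -> {cluster_peak_pflops(n):.1f} PFLOPS peak")
    for target in (256, 1024):
        print(f"{target} PFLOPS -> about {processors_for(target):.0f} processors")
```

Under these assumptions, the quoted range of 256 to 1024 PFLOPS corresponds to roughly 1,000 to 4,000 processors, which is consistent with "thousands of Ascend 910 AI processors".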

Technological Advantages of Atlas 900

Leading AI Computing Power

The Atlas 900 AI cluster uses Ascend 910, which Huawei describes as the world’s most powerful AI processor. Each processor has 32 built-in DaVinci AI Cores and delivers 256 TFLOPS @FP16, twice the computing power of its industry counterparts. The Atlas 900 AI cluster interconnects thousands of Ascend 910 AI processors to build the industry’s fastest computing cluster.

The Ascend 910 AI processor adopts a system-on-chip (SoC) design that supports both AI and general-purpose computing, along with high-speed, large-bandwidth I/O. The processor takes over much of the data preprocessing from the host CPU, which improves training efficiency.

Optimal Cluster Network

To reduce network latency, the Atlas 900 AI cluster uses three high-speed interconnection modes (HCCS, PCIe 4.0, and 100G Ethernet) together with a dedicated 100 TB full-mesh, non-blocking synchronization network. This reduces gradient synchronization latency by 10% to 70%.

Within the AI server, the Ascend 910 AI processors are interconnected through the HCCS high-speed bus. The processors connect to the CPU over PCIe 4.0, which runs at 16 GT/s per lane, twice the rate of the mainstream PCIe 3.0 (8 GT/s per lane). This makes data transmission between the CPU and the AI processors faster and more efficient. At the cluster layer, the CloudEngine 8800 series data center switches provide a switching rate of 100 Gbit/s on a single port and connect all AI servers in the cluster to the high-speed switching network.
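To see what the doubling of the PCIe signaling rate means in usable bandwidth, the following sketch converts the per-lane rate into one-direction throughput. It relies on the 128b/130b line encoding used by both PCIe 3.0 and PCIe 4.0. The x16 link width is an assumption made for illustration only, since the article does not state how wide the links in Atlas 900 are, and protocol overhead above the line encoding is ignored.

```python
def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a PCIe 3.0/4.0 link.

    gt_per_s: per-lane signaling rate in gigatransfers per second.
    lanes:    link width (x1, x4, x8, x16, ...).
    """
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_byte = 8
    return gt_per_s * encoding_efficiency * lanes / bits_per_byte


if __name__ == "__main__":
    lanes = 16  # assumption: x16 link, not stated in the article
    gen3 = pcie_bandwidth_gbytes(8.0, lanes)
    gen4 = pcie_bandwidth_gbytes(16.0, lanes)
    print(f"PCIe 3.0 x{lanes}: {gen3:.2f} GB/s per direction")
    print(f"PCIe 4.0 x{lanes}: {gen4:.2f} GB/s per direction")
    print(f"Ratio: {gen4 / gen3:.1f}x")
```

Because both generations share the same encoding, the bandwidth ratio stays at 2x for any link width.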

iLossless, an intelligent lossless switching algorithm, learns from network traffic in the cluster in real time, achieving zero packet loss and end-to-end latency at the microsecond level.

System-Level Tuning

Leveraging the Huawei cluster communication library and job scheduling platform, Atlas 900 also integrates HCCS, PCIe 4.0, and 100G RoCE interfaces to fully unlock the computing power of Ascend 910.

The Huawei cluster communication library provides a distributed parallel library for training networks. System-level tuning is carried out across the communication library, network topology, and training algorithms, delivering over 80% cluster linearity and greatly improving job scheduling efficiency.
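The article does not define cluster linearity. A common reading, assumed here, is scaling efficiency: the measured throughput of the whole cluster divided by the throughput that perfect linear scaling would give. The sketch below shows that calculation. The function name is hypothetical and the throughput numbers are made up purely to illustrate the formula; they are not Atlas 900 measurements.

```python
def cluster_linearity(cluster_throughput: float,
                      single_throughput: float,
                      num_processors: int) -> float:
    """Scaling efficiency: measured cluster throughput over ideal linear scaling."""
    ideal_throughput = single_throughput * num_processors
    return cluster_throughput / ideal_throughput


if __name__ == "__main__":
    # Illustrative, made-up figures (images per second), not measured data.
    single = 1_000.0        # one processor on its own
    processors = 1024
    measured = 850_000.0    # the whole cluster working together

    linearity = cluster_linearity(measured, single, processors)
    print(f"Linearity: {linearity:.1%}")
    print(f"Effective speedup: {linearity * processors:.0f}x on {processors} processors")
```

With this definition, a linearity above 80% means that adding processors keeps paying off: a 1024-processor cluster behaves like more than 819 perfectly scaled processors.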

Extreme Heat Dissipation

Most traditional data centers use air cooling to dissipate heat from devices. However, this method is insufficient in the era of AI, because components such as CPUs and AI chipsets consume large amounts of power. Liquid cooling has been introduced to meet the need for more efficient cooling.

The Atlas 900 AI cluster adopts a full liquid cooling solution, comprising innovative cabinet-level heat insulation technology that supports a liquid cooling ratio greater than 95%. A single cabinet supports heat dissipation of up to 50 kW, achieving extreme energy efficiency for data centers with a PUE of less than 1.1.
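Power usage effectiveness (PUE) is the ratio of total facility power to the power drawn by the IT equipment, so a PUE of 1.1 leaves at most 10% on top of the IT load for cooling and other overhead. The sketch below works this out for a single cabinet, under the assumption that the 50 kW heat dissipation figure can be treated as the cabinet's IT load. The function names are hypothetical.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value."""
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power (cooling, power distribution, and so on) implied by the PUE."""
    return it_load_kw * (pue - 1.0)


if __name__ == "__main__":
    it_load = 50.0  # kW; assumption: cabinet IT load equals the 50 kW quoted
    pue = 1.1       # upper bound quoted in the article
    print(f"Facility power: {facility_power_kw(it_load, pue):.1f} kW")
    print(f"Overhead:       {overhead_kw(it_load, pue):.1f} kW")
```

Under these assumptions, each fully loaded cabinet needs no more than about 5 kW of additional power for cooling and other facility overhead.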

Another benefit of this technology is that it slashes the necessary equipment room space by 79% compared with the use of 8 kW air-cooled cabinets. Liquid cooling meets the requirements for high-power, high-density device deployment and low PUE, greatly reducing the TCO.

Leading Benchmark Index

To make extraordinary computing power more broadly accessible to its customers across different industries, Huawei deployed an Atlas 900 AI cluster with 1024 Ascend 910 AI processors on HUAWEI CLOUD. On ResNet-50 v1.5 with the ImageNet-1k dataset, currently the most typical training benchmark, the Atlas 900 AI cluster completes training in just 59.8 seconds, ranking first in the world.

The ImageNet-1k dataset contains 1.28 million images, and the benchmark trains the model to a precision of 75.9%. To put that into perspective, the other two mainstream manufacturers took 70.2 seconds and 76.8 seconds to reach the same precision. The Atlas 900 AI cluster is about 15% faster than the next fastest rival.
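The 15% figure can be reproduced from the three training times quoted above. The sketch below assumes that "faster" means the relative reduction in training time; it also prints the throughput ratio, which is a different and slightly larger number. The function names are hypothetical.

```python
ATLAS_900_SECONDS = 59.8
RIVAL_SECONDS = (70.2, 76.8)  # the two other results quoted in the article


def time_reduction(ours: float, theirs: float) -> float:
    """Fraction of the rival's training time that is saved."""
    return (theirs - ours) / theirs


def throughput_ratio(ours: float, theirs: float) -> float:
    """How many times more work per second is done for the same job."""
    return theirs / ours


if __name__ == "__main__":
    for rival in RIVAL_SECONDS:
        saved = time_reduction(ATLAS_900_SECONDS, rival)
        ratio = throughput_ratio(ATLAS_900_SECONDS, rival)
        print(f"vs {rival:.1f} s: {saved:.1%} less time, {ratio:.2f}x throughput")
```

Against the 70.2-second result, the time reduction is 14.8%, which rounds to the 15% stated above. Against the 76.8-second result it is about 22%.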

Because Atlas 900 provides supreme computing power for training neural networks on large-scale datasets, it is well suited to scientific research and business projects. Researchers can quickly train AI models on image, video, or voice data and apply them to exploring the universe, predicting the weather, finding oil, and bringing automated driving to cities.

Atlas 900 can also provide abundant, economical computing resources on the cloud, together with an efficient, full-process AI platform. It delivers an inclusive AI experience that is accessible, affordable, and easy to use.