NVIDIA H100 HOPPER: THE NEW GPU WILL CROSS THE 80 BILLION TRANSISTOR MARK
Nvidia took advantage of GTC to unveil Hopper, its new GPU architecture designed for professionals who need hardware acceleration. It takes physical form in the H100, a huge GPU with 80 billion transistors built on a 4 nm process.
To the general public, Nvidia is first and foremost the world's leading graphics chip maker, with its GeForce series reigning over the market for more than a decade. But driven by its CEO, Jensen Huang, the company has always looked well beyond the gaming market and very quickly positioned itself as a major player in hardware acceleration. Demand in this field is growing strongly, even exponentially, because the resource requirements are enormous in areas such as artificial intelligence and simulation models (climate, road traffic, etc.), among others.
For this specific sector, Nvidia has historically taken two different approaches. The first consists of adapting a single graphics architecture into two variants, one for the general public and one for professionals, as was the case with Ampere. The second consists of creating two distinct architectures, each targeting a specific market, as was the case with Volta, which was developed specifically for acceleration.
Hopper follows this second approach. The architecture was designed for acceleration, so as to meet the needs of AI and even Omniverse. And the least we can say is that, about two years after the GA100 chip, Nvidia is coming up with an H100 chip that is fairly impressive on paper. Comprising 80 billion transistors spread over an area of 814 mm², it stands out quite clearly from its predecessor, which was limited to 54.2 billion transistors over 828 mm². The figures speak for themselves: for the H100, Nvidia has abandoned the 7 nm process in favor of TSMC's 4 nm (N4 node). The chip also consumes up to 700 W, much more than the 500 W maximum of the previous generation.
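To put the die figures in perspective, here is a minimal sketch that derives transistor density and the power ratio from the numbers quoted above. The densities are simple divisions of the article's figures, not official Nvidia or TSMC values, and the helper name `density_mtr_per_mm2` is purely illustrative.

```python
# Figures as quoted in the article; the derived values are illustrative only.
H100_TRANSISTORS = 80e9
H100_AREA_MM2 = 814
GA100_TRANSISTORS = 54.2e9
GA100_AREA_MM2 = 828
H100_MAX_WATTS = 700
GA100_MAX_WATTS = 500


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


h100_density = density_mtr_per_mm2(H100_TRANSISTORS, H100_AREA_MM2)
ga100_density = density_mtr_per_mm2(GA100_TRANSISTORS, GA100_AREA_MM2)
density_gain = h100_density / ga100_density
power_ratio = H100_MAX_WATTS / GA100_MAX_WATTS

print(f"H100 density:  {h100_density:.1f} MTr/mm^2")
print(f"GA100 density: {ga100_density:.1f} MTr/mm^2")
print(f"Density gain:  {density_gain:.2f}x")
print(f"Power ratio:   {power_ratio:.2f}x")
```

On these figures, density rises by roughly half while the die area stays almost unchanged, which is consistent with the move from 7 nm to 4 nm described above.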
Equipped with a PCIe 5.0 interface, the chip is paired with up to 80 GB of dedicated HBM3 memory, enough to deliver 3 TB/s of bandwidth. The specialized compute units, which Nvidia calls accelerators, have been revised, notably with a fourth generation of Tensor Cores dedicated to AI, which are announced as six times faster than those of the GA100 chip. The number of CUDA cores explodes, going from 6,912 to 16,896. The result is raw performance three times higher than on the previous-generation accelerator, and this, it should be remembered, in just two years.
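The relationship between the core-count jump and the claimed threefold performance gain can be sketched with a little arithmetic. This is only a rough decomposition: it assumes performance scales linearly with core count and attributes whatever remains to other factors (clock speed, per-core throughput), which the article does not break down.

```python
# Core counts and speedup as quoted in the article.
A100_CUDA_CORES = 6912
H100_CUDA_CORES = 16896
CLAIMED_RAW_SPEEDUP = 3.0

# Gain explained by having more cores, assuming linear scaling (an assumption).
core_ratio = H100_CUDA_CORES / A100_CUDA_CORES

# Remaining factor that would have to come from elsewhere (clocks, per-core gains).
residual_factor = CLAIMED_RAW_SPEEDUP / core_ratio

print(f"Core count ratio: {core_ratio:.2f}x")
print(f"Residual factor:  {residual_factor:.2f}x")
```

Under that assumption, the extra cores alone account for about 2.4x, leaving roughly 1.2x to be explained by other improvements.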
Nvidia has also designed a new acceleration engine called the Transformer Engine. It is intended to speed up the processing of AI models used for real-time translation, query interpretation, image analysis, and even health and climate applications. Neural network training that used to take days can now be done in just hours. This will interest many in the industry, especially Google, whose BERT model is a transformer that helps it better understand user queries and answer them more and more precisely. As an example, Nvidia indicated that a job that took 7 days on 8,000 GPUs would now take only 20 hours with Hopper chips.
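The training example quoted above implies a specific speedup, which the short sketch below works out. It assumes the same number of GPUs in both cases, which the article suggests but does not state explicitly.

```python
# Durations as quoted in the article.
PREVIOUS_DAYS = 7
HOPPER_HOURS = 20
HOURS_PER_DAY = 24

previous_hours = PREVIOUS_DAYS * HOURS_PER_DAY
speedup = previous_hours / HOPPER_HOURS
time_saved_hours = previous_hours - HOPPER_HOURS

print(f"Previous run: {previous_hours} h")
print(f"Hopper run:   {HOPPER_HOURS} h")
print(f"Speedup:      {speedup:.1f}x")
print(f"Time saved:   {time_saved_hours} h")
```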
This new GPU will be offered to Nvidia partners starting in the third quarter. It can be purchased individually (PCIe format), but also in the form of DGX systems integrating 8 modules or SuperPOD cabinets containing 32 modules. Up to 256 modules can be interconnected using an NVLink switch capable of 70.4 TB/s of communication between modules. Finally, supercomputers are already planned, in particular Nvidia's Eos, a machine that the company will use itself and offer to its partners, which will include 576 DGX systems, or 4,608 GPUs. Eos will offer 275 petaFLOPS of FP64 computing power, which would position it as the second most powerful supercomputer in the world, behind Fugaku (442 petaFLOPS). It now remains to wait for Nvidia's consumer announcements: the firm will most likely announce the successor to Ampere in the coming months.
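The Eos and NVLink figures above can be cross-checked with a few divisions. The sketch below uses only the numbers quoted in the article. The per-GPU and per-module values are naive averages derived here for illustration, not figures published by Nvidia.

```python
# Figures as quoted in the article.
EOS_DGX_UNITS = 576
GPUS_PER_DGX = 8
EOS_FP64_PFLOPS = 275
FUGAKU_PFLOPS = 442
NVLINK_MAX_MODULES = 256
NVLINK_TOTAL_TBPS = 70.4  # TB/s across the NVLink switch fabric

eos_gpus = EOS_DGX_UNITS * GPUS_PER_DGX

# Average FP64 throughput each GPU would have to contribute (1 PFLOPS = 1000 TFLOPS).
per_gpu_tflops = EOS_FP64_PFLOPS * 1000 / eos_gpus

# Eos as a fraction of Fugaku's quoted performance.
eos_vs_fugaku = EOS_FP64_PFLOPS / FUGAKU_PFLOPS

# Naive even share of the quoted NVLink bandwidth per module, in GB/s.
nvlink_share_gbps = NVLINK_TOTAL_TBPS * 1000 / NVLINK_MAX_MODULES

print(f"Eos GPUs:            {eos_gpus}")
print(f"FP64 per GPU:        {per_gpu_tflops:.1f} TFLOPS")
print(f"Eos / Fugaku:        {eos_vs_fugaku:.0%}")
print(f"NVLink share/module: {nvlink_share_gbps:.0f} GB/s")
```

The GPU count matches the 4,608 stated in the text, and the implied contribution of each GPU comes out just under 60 TFLOPS.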
Earlier speculation pointed to an even larger multi-die GPU, one that would be the first to cross the one hundred billion transistor mark. That design showed 42 GPCs on each die for a total of 84, each containing 4 SMs with 128 CUDA cores each. This translates to a whopping total of 43,008 CUDA cores. If NVIDIA could hit 1,700 MHz on such a chip, it would deliver 146 TFLOPS of raw compute power.
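The arithmetic behind the figures in the paragraph above can be reproduced as follows. The two-die count is inferred from "42 GPCs on each die for a total of 84", and the factor of 2 floating-point operations per core per clock (one fused multiply-add) is a common convention assumed here rather than something the text states.

```python
# Configuration as described in the paragraph above.
DIES = 2                      # inferred: 42 GPCs per die, 84 in total
GPCS_PER_DIE = 42
SMS_PER_GPC = 4
CUDA_CORES_PER_SM = 128
CLOCK_HZ = 1.7e9              # 1700 MHz
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock

total_gpcs = DIES * GPCS_PER_DIE
total_sms = total_gpcs * SMS_PER_GPC
total_cores = total_sms * CUDA_CORES_PER_SM

# Peak throughput = cores x operations per clock x clock frequency.
peak_tflops = total_cores * FLOPS_PER_CORE_PER_CLOCK * CLOCK_HZ / 1e12

print(f"GPCs:        {total_gpcs}")
print(f"SMs:         {total_sms}")
print(f"CUDA cores:  {total_cores}")
print(f"Peak FP32:   {peak_tflops:.1f} TFLOPS")
```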
NVIDIA Hopper H100 GPU Specs at a Glance
As for the specs, the NVIDIA Hopper GH100 GPU consists of a massive configuration of 144 SMs (Streaming Multiprocessors) spread across a total of 8 GPCs. Each GPC contains 9 TPCs, which are in turn composed of 2 SM units each. This gives us 18 SMs per GPC and 144 in the full 8-GPC configuration. Each SM is made up of up to 128 FP32 units, which should give us a total of 18,432 CUDA cores. Here are some of the configurations you can expect from the H100 chip:
The full GH100 GPU implementation includes the following units:
- 8 GPC, 72 TPC (9 TPC/GPC), 2 SM/TPC, 144 SM per full GPU
- 128 FP32 CUDA cores per SM, 18,432 FP32 CUDA cores per full GPU
- 4 fourth-generation Tensor cores per SM, 576 per full GPU
- 6 HBM3 or HBM2e stacks, 12 512-bit memory controllers
- 60 MB L2 cache
- Gen 4 NVLink and PCIe Gen 5
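The unit counts in the list above follow from multiplying down the GPC, TPC, and SM hierarchy. Here is a minimal sketch of that calculation. The `GpuConfig` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical container for the unit hierarchy described in the list."""
    tpcs: int
    sms_per_tpc: int = 2
    fp32_cores_per_sm: int = 128
    tensor_cores_per_sm: int = 4

    @property
    def sms(self) -> int:
        return self.tpcs * self.sms_per_tpc

    @property
    def fp32_cores(self) -> int:
        return self.sms * self.fp32_cores_per_sm

    @property
    def tensor_cores(self) -> int:
        return self.sms * self.tensor_cores_per_sm


# Full GH100: 8 GPCs with 9 TPCs each.
full_gh100 = GpuConfig(tpcs=8 * 9)

print(f"TPCs:         {full_gh100.tpcs}")
print(f"SMs:          {full_gh100.sms}")
print(f"FP32 cores:   {full_gh100.fp32_cores}")
print(f"Tensor cores: {full_gh100.tensor_cores}")
```

The same class can be reused for a cut-down part by passing a smaller TPC count.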
The NVIDIA H100 GPU with SXM5 card form factor includes the following units:
- 8 GPC, 66 TPC, 2 SM/TPC, 132 SM per GPU
- 128 FP32 CUDA cores per SM, 16,896 FP32 CUDA cores per GPU
- 4 fourth-generation Tensor cores per SM, 528 per GPU
- 80 GB HBM3, 5 HBM3 stacks, 10 512-bit memory controllers
- 50 MB L2 cache
- Gen 4 NVLink and PCIe Gen 5
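The memory figures in the two lists can be turned into bus widths, and the SXM5 part can be compared against the full chip. The sketch below is a rough estimate: it assumes the 3 TB/s quoted earlier means 3e12 bytes per second and that every bit of the bus carries data, so the per-pin rate is an implied value, not an official specification.

```python
CONTROLLER_WIDTH_BITS = 512


def bus_width_bits(controllers: int) -> int:
    """Total memory bus width for a given number of 512-bit controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


full_bus = bus_width_bits(12)  # full GH100: 12 controllers
sxm5_bus = bus_width_bits(10)  # H100 SXM5: 10 controllers

# Implied per-pin data rate for the quoted 3 TB/s (assumption: 3e12 bytes/s).
QUOTED_BANDWIDTH_BYTES = 3e12
per_pin_gbps = QUOTED_BANDWIDTH_BYTES * 8 / sxm5_bus / 1e9

# Fraction of the full chip's resources enabled on the SXM5 part.
sm_fraction = 132 / 144
stack_fraction = 5 / 6
l2_fraction = 50 / 60

print(f"Full GH100 bus: {full_bus} bits")
print(f"H100 SXM5 bus:  {sxm5_bus} bits")
print(f"Implied rate:   {per_pin_gbps:.2f} Gbit/s per pin")
print(f"SMs enabled:    {sm_fraction:.1%}")
print(f"HBM stacks:     {stack_fraction:.1%}")
print(f"L2 cache:       {l2_fraction:.1%}")
```

On these numbers the SXM5 part keeps about 92% of the SMs and five sixths of the memory stacks and L2 cache of the full chip.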
The 18,432 FP32 CUDA cores of the full GH100 represent a 2.25x increase over the 8,192 cores of the full GA100 GPU configuration. NVIDIA is also adding more FP64, FP16, and Tensor cores to its Hopper GPU, which should boost performance immensely. That will be necessary to compete with Intel's Ponte Vecchio, which is also expected to feature 1:1 FP64.