What is the Blackwell Chip?

The Blackwell processor contains 208 billion transistors and is manufactured on a custom-built TSMC 4NP process. Every NVIDIA Blackwell product includes two dies connected through a 10 TB/s chip-to-chip interconnect, delivering the capability of two GPUs in a single, fully cache-coherent, full-stack-ready chip.

What are the features of NVIDIA Blackwell?

The NVIDIA Blackwell architecture is powered by six technologies that enable AI training and real-time LLM inference for models scaling up to 10 trillion parameters:
World’s Most Powerful Chip: The Blackwell processor packs 208 billion transistors and is manufactured using a custom-built TSMC 4NP process. All NVIDIA Blackwell products feature two dies linked via a 10 TB/s chip-to-chip interconnect, providing the power of two GPUs in one fully cache-coherent, full-stack-ready chip.
Generative AI Engine: Custom Tensor Core technology, combined with NVIDIA TensorRT-LLM and NeMo framework innovations, accelerates inference and training for LLMs, including mixture-of-experts models.
Secure AI: Advanced confidential computing capabilities protect AI models and customer data without compromising performance, with support for new native interface encryption protocols, which are critical for privacy-sensitive industries like healthcare and financial services.
Fifth-Generation NVLink: To accelerate performance for multitrillion-parameter AI models, NVLink’s latest iteration delivers 1.8 TB/s of throughput per GPU, enabling high-speed communication among up to 576 GPUs for today’s most complex large language models.
Decompression Engines: Dedicated decompression engines support the latest formats, accelerating database queries for data analytics and data science.
RAS Engine: Blackwell-powered GPUs include a dedicated engine for reliability, availability and serviceability.

What is the NVIDIA Blackwell power consumption?

NVIDIA Blackwell power consumption varies by product:
NVIDIA GB200 NVL72: Up to 1200W
NVIDIA HGX B100: Up to 700W
NVIDIA DGX B200: Up to 1000W

Is the Blackwell GPU better than Hopper?

Yes. NVIDIA Blackwell delivers significantly higher AI throughput, improved memory bandwidth, and more efficiency than Hopper. It’s designed for next-gen LLM training and massive inference workloads, making it a major leap over H100/H200 GPUs.

How fast is NVIDIA Blackwell?

NVIDIA Blackwell can deliver up to 20 PFLOPS of FP4 Tensor Core compute per GPU, offering a roughly 3–4× performance uplift for AI workloads compared to Hopper. It enables faster training, higher inference throughput, and improved efficiency for large-scale generative AI systems.

Damanpreet Kaur Vohra

Updated on 4 May 2026

All About the NVIDIA Blackwell GPUs

Q: What is the NVIDIA Blackwell?

NVIDIA Blackwell is NVIDIA's latest GPU architecture, announced on 18 March 2024 by NVIDIA CEO Jensen Huang at GTC 2024 (“The #1 AI Conference for Developers”) in San Jose.


NVIDIA's Blackwell architecture, announced at GTC 2024 by CEO Jensen Huang, is designed for generative AI and large-scale AI workloads. Featuring a new Blackwell GPU, the architecture cuts cost and energy usage for large-scale LLM inference by up to 25 times compared with Hopper. Blackwell innovations such as fifth-generation NVLink, custom Tensor Cores and a RAS engine pave the way for real-time AI models with up to 10 trillion parameters. Blackwell also accelerates data processing, electronic design automation and quantum computing.

The NVIDIA Blackwell GPUs represent the biggest architectural shift since Hopper, built for trillion-parameter models and high performance per watt. This article explains what is new: faster training, improved inference efficiency and AI-native capabilities. We break down the architecture, expected performance gains and real-world implications.

Highlights

Announced on 18 March 2024 by NVIDIA CEO Jensen Huang at GTC 2024

NVLink Enables Trillion-Parameter-Scale AI Models

Latest Blackwell Tensor Cores and TensorRT-LLM Compiler Reduce LLM Inference Operating Cost and Energy by up to 25 times

New Accelerators Enable Innovation in Data Processing, Electronic Design Automation, Computer-Aided Engineering and Quantum Computing

After months of anticipation, NVIDIA CEO Jensen Huang took the stage on 18 March 2024 at “The #1 AI Conference for Developers”, GTC 2024. His highly anticipated keynote opened the event and revealed the innovations that will power the new era of generative AI. Read on to learn about Blackwell specs, the Blackwell GPU architecture and more.

Blackwell GPU Architecture

Blackwell is NVIDIA's next-generation GPU architecture, featuring fifth-generation NVLink, confidential computing and the NVIDIA Grace Blackwell Superchip for massive AI workloads.

The keynote revealed Blackwell as the successor to the NVIDIA Hopper architecture, launched on September 20, 2022. The architecture is named after David Harold Blackwell, a mathematician and statistician who specialised in game theory and statistics. According to NVIDIA, Blackwell will enable organisations across the globe to build and run real-time inference on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor. The NVIDIA Blackwell architecture features six transformative technologies for generative AI and accelerated computing, which will support breakthroughs in data processing, electronic design automation, computer-aided engineering and quantum computing.

NVIDIA Blackwell Features to Power Generative AI

NVIDIA Blackwell GPUs are powered by six technologies that enable AI training and real-time LLM inference for models scaling up to 10 trillion parameters. The GPU includes the following features:

Blackwell chip: The Blackwell processor packs 208 billion transistors and is manufactured using a custom-built TSMC 4NP process. All NVIDIA Blackwell products feature two dies that are linked via a 10 TB/s chip-to-chip interconnect, providing the power of two GPUs in one fully cache-coherent, full-stack-ready Blackwell chip.
Generative AI Engine: Custom Tensor Core technology, combined with NVIDIA TensorRT-LLM and NeMo framework innovations, accelerates inference and training for LLMs, including mixture-of-experts models. Enterprises can optimise these models with the latest expert parallelism and quantisation techniques and deploy them using new precision formats.
Secure AI: Advanced confidential computing capabilities protect AI models and customer data with uncompromised performance, with support for new native interface encryption protocols, which are critical for data-sensitive industries like healthcare and financial services.
Fifth-Generation NVLink: To accelerate performance for multitrillion-parameter AI models, NVLink’s latest iteration delivers 1.8 TB/s of throughput per GPU, enabling high-speed communication among up to 576 GPUs for today’s most complex large language models.
Decompression Engines: Dedicated decompression engines support the latest formats, accelerating database queries to deliver the highest performance in data analytics and data science.
RAS Engine: Only Blackwell-powered GPUs include a dedicated engine for reliability, availability and serviceability.

The architecture also adds chip-level capabilities that use AI-based preventative maintenance to run diagnostics and forecast reliability issues. This maximises system uptime and improves resiliency, so massive-scale AI deployments can run uninterrupted for weeks or even months at a time, which reduces operating costs.

About NVIDIA Blackwell GPUs: B100 and B200

Based on the NVIDIA Blackwell architecture, the NVIDIA B200 Tensor Core GPU delivers a large step forward in inference speed, making real-time performance possible for resource-intensive, multitrillion-parameter language models.

Two B200 GPUs are combined in Blackwell’s flagship accelerator, the NVIDIA GB200 Grace Blackwell Superchip, which also includes an NVIDIA Grace CPU. The GB200 provides up to a 30x performance increase compared to the NVIDIA H100 Tensor Core GPU for LLM inference workloads and reduces cost and energy consumption by up to 25x.

For the highest AI performance, GB200 supports the NVIDIA Quantum-X800 InfiniBand and Spectrum™-X800 Ethernet platforms which deliver advanced networking options at speeds up to 800 Gb/s. The GB200 NVL72 also includes NVIDIA BlueField®-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds.

The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale platform for the most compute-intensive workloads. It combines 36 Grace Blackwell Superchips, which together include 72 B200 GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink.

To help accelerate the development of Blackwell-based servers from its partner network, NVIDIA announced NVIDIA HGX B200, a server board that links eight B200 GPUs through high-speed interconnects to support x86-based generative AI platforms. HGX B200 supports networking speeds up to 400 Gb/s through the Quantum-2 InfiniBand and Spectrum-X Ethernet platforms, along with support for BlueField-3 DPUs.
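
The rack composition described above can be checked with simple arithmetic. The following is a minimal sketch, not an official sizing tool: it assumes two GPUs and one Grace CPU per superchip (as stated earlier in this article) and uses the per-GPU peak figures from the spec table in this article. All variable names are hypothetical, and the totals are theoretical peaks rather than sustained performance.

```python
# Illustrative arithmetic for one GB200 NVL72 rack, using figures quoted in this article.
SUPERCHIPS_PER_RACK = 36
GPUS_PER_SUPERCHIP = 2    # two Blackwell GPUs per GB200 superchip
CPUS_PER_SUPERCHIP = 1    # one Grace CPU per GB200 superchip
FP4_PFLOPS_PER_GPU = 20   # peak FP4 Tensor Core figure from the spec table
HBM_GB_PER_GPU = 192      # "up to" HBM3e capacity from the spec table

gpus = SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP
cpus = SUPERCHIPS_PER_RACK * CPUS_PER_SUPERCHIP
rack_fp4_pflops = gpus * FP4_PFLOPS_PER_GPU   # theoretical peak, not sustained
rack_hbm_gb = gpus * HBM_GB_PER_GPU           # upper bound on total GPU memory

print(f"GPUs per rack: {gpus}")
print(f"Grace CPUs per rack: {cpus}")
print(f"Peak FP4 per rack: {rack_fp4_pflops} PFLOPS")
print(f"HBM3e per rack: up to {rack_hbm_gb} GB")
```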

NVIDIA Blackwell GPU Specs

Check out the latest NVIDIA Blackwell specs for NVIDIA Blackwell GB200, B200 and B100:

Per GPU Specifications	NVIDIA Blackwell GB200 NVL72	NVIDIA Blackwell B200	NVIDIA Blackwell B100
FP4 Tensor Core	20 petaFLOPS	18 petaFLOPS	14 petaFLOPS
FP8/FP6 Tensor Core	10 petaFLOPS	9 petaFLOPS	7 petaFLOPS
INT8 Tensor Core	10 petaOPS	9 petaOPS	7 petaOPS
FP16/BF16 Tensor Core	5 petaFLOPS	4.5 petaFLOPS	3.5 petaFLOPS
TF32 Tensor Core	2.5 petaFLOPS	2.2 petaFLOPS	1.8 petaFLOPS
FP64 Tensor Core	45 teraFLOPS	40 teraFLOPS	30 teraFLOPS
GPU memory	Up to 192 GB HBM3e	Up to 192 GB HBM3e	Up to 192 GB HBM3e
Memory bandwidth	Up to 8 TB/s	Up to 8 TB/s	Up to 8 TB/s
Multi-Instance GPU (MIG)	7	7	7
Decompression Engine	Yes	Yes	Yes
Decoders	2x 7 NVDEC, 2x 7 NVJPEG	2x 7 NVDEC, 2x 7 NVJPEG	2x 7 NVDEC, 2x 7 NVJPEG
Power	Up to 1200W	Up to 1000W	Up to 700W
Interconnect	5th Generation NVLink: 1.8TB/s, PCIe Gen6: 256GB/s	5th Generation NVLink: 1.8TB/s, PCIe Gen6: 256GB/s	5th Generation NVLink: 1.8TB/s, PCIe Gen6: 256GB/s
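
One way to read the table is to relate peak throughput to maximum power. The sketch below is illustrative only: it divides the peak FP4 figure by the "up to" power figure for each column of the table above. The function and dictionary names are hypothetical, and because both inputs are upper bounds, the result is a rough indicator rather than a measured efficiency.

```python
# Peak FP4 throughput and maximum power per GPU, copied from the spec table above.
SPECS = {
    "GB200 NVL72": {"fp4_pflops": 20, "power_w": 1200},
    "B200": {"fp4_pflops": 18, "power_w": 1000},
    "B100": {"fp4_pflops": 14, "power_w": 700},
}


def fp4_tflops_per_watt(fp4_pflops, power_w):
    """Convert petaFLOPS to teraFLOPS, then divide by the maximum power draw."""
    return fp4_pflops * 1000 / power_w


efficiency = {name: fp4_tflops_per_watt(**spec) for name, spec in SPECS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} peak FP4 TFLOPS per watt")
```

On these paper figures the lower-power B100 has the highest peak throughput per watt, which shows why peak ratios alone should not drive a purchasing decision; sustained throughput on the actual workload matters more.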

FAQs

What is the NVIDIA Blackwell architecture?

Blackwell is NVIDIA's GPU architecture announced March 2024, succeeding Hopper. It is built on a dual-die chip (208 billion transistors, 10 TB/s chip-to-chip interconnect), fifth-generation NVLink at 1.8 TB/s per GPU, and new Tensor Core precision formats including FP4 and FP6 — designed for trillion-parameter model training and real-time LLM inference.

Should I choose NVIDIA B200 or NVIDIA H100 for my workload?

Choose B200 for inference on models above 30B parameters, large-scale training, or full-parameter fine-tuning of 70B+ models. The 192 GB HBM3e and 8 TB/s bandwidth are decisive at that scale. For smaller models or cost-sensitive batch jobs already running well on H100, the H100 SXM remains a strong option on Hyperstack. The team can run a workload assessment if you are unsure.
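
To see why memory capacity matters at this scale, a rough weight-memory estimate helps. The sketch below is a simplification under stated assumptions: it counts model weights only and ignores KV cache, activations and optimizer state, so real requirements are higher. The helper names are hypothetical, and the 80 GB value used for the H100 comparison is an assumption that does not come from this article.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

B200_HBM_GB = 192  # "up to" figure from the spec table in this article
H100_HBM_GB = 80   # assumption for comparison, not taken from this article


def weight_memory_gb(params_billion, precision):
    """Approximate weight footprint in GB: 1e9 params * bytes per param / 1e9."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billion, precision, hbm_gb=B200_HBM_GB):
    """True if the weights alone fit in a single GPU's memory."""
    return weight_memory_gb(params_billion, precision) <= hbm_gb


for precision in BYTES_PER_PARAM:
    size = weight_memory_gb(70, precision)
    print(
        f"70B at {precision}: {size:.0f} GB, "
        f"fits B200: {fits_on_one_gpu(70, precision)}, "
        f"fits 80 GB GPU: {fits_on_one_gpu(70, precision, H100_HBM_GB)}"
    )
```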

Are B200 and GB200 available on demand or only by reservation on Hyperstack?

Both B200 and GB200 NVL72 are available for reservation on Hyperstack.

What is the chip-to-chip interconnect speed?

All NVIDIA Blackwell products feature two dies connected via a 10 TB/s chip-to-chip interconnect, presenting as a single fully cache-coherent GPU to the software stack.

Is Blackwell better than Hopper for training, not just inference?

Yes. For training, the key gains are higher HBM bandwidth, support for low-precision formats such as FP8 and FP6 that allow larger effective batch sizes, and fifth-generation NVLink, which reduces all-reduce overhead at scale. The GB200 NVL72's 72-GPU NVLink domain is particularly effective for tensor-parallel training of dense models above 100B parameters.
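
The effect of interconnect bandwidth on all-reduce time can be approximated with the standard ring all-reduce cost model, in which each GPU transfers about 2(N-1)/N times the message size. The sketch below is a bandwidth-only estimate under stated assumptions: it ignores latency, protocol overhead and topology, it is not a description of how NVIDIA's collective libraries actually schedule traffic, and it treats the 1.8 TB/s NVLink figure from this article as fully usable, which is optimistic. The function name is hypothetical.

```python
def ring_allreduce_seconds(message_bytes, num_gpus, bandwidth_bytes_per_s):
    """Bandwidth-only ring all-reduce estimate: 2 * (N - 1) / N * S / B."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    traffic_factor = 2 * (num_gpus - 1) / num_gpus
    return traffic_factor * message_bytes / bandwidth_bytes_per_s


GRADIENT_BYTES = 1e9        # hypothetical 1 GB gradient bucket
NVLINK_BYTES_PER_S = 1.8e12 # 1.8 TB/s per GPU, as quoted in this article

for n in (8, 72):
    t = ring_allreduce_seconds(GRADIENT_BYTES, n, NVLINK_BYTES_PER_S)
    print(f"{n} GPUs: about {t * 1e3:.2f} ms per all-reduce (bandwidth term only)")
```

In this model the time is inversely proportional to bandwidth, so doubling per-GPU bandwidth halves the bandwidth term, while adding GPUs changes it only slightly.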
