Key Takeaways
- NVIDIA H100 GPUs represent a major leap forward for AI and high-performance computing workloads.
- The H100 is built on the Hopper architecture and is optimised specifically for large-scale AI training and inference.
- Significant performance gains over previous generations make H100 suitable for massive models and data-intensive tasks.
- Advanced Tensor Core capabilities improve efficiency for modern AI workloads, including transformer-based models.
- High-bandwidth memory and fast interconnects enable better scaling in multi-GPU environments.
- The article positions H100 as a foundational GPU for next-generation AI infrastructure and enterprise deployments.
When NVIDIA CEO Jensen Huang introduced the H100 at the GTC 2022 keynote, calling it the engine of the world's AI infrastructure, he was showing us the future of AI at scale. Released as part of the Hopper architecture, the NVIDIA H100 has become the go-to solution for enterprises looking to accelerate their AI-driven businesses. Leading companies like Meta and xAI are using NVIDIA H100 GPUs, and we can already see how they are shaping the future of AI technology.
NVIDIA H100 GPU Specs at a Glance
The NVIDIA H100 GPU specs show that it offers cutting-edge performance for AI and HPC.
| Specification | Value |
| --- | --- |
| CUDA Cores | 16,896 |
| VRAM | 80 GB HBM3 (SXM5) / 80 GB HBM2e (PCIe) / 94 GB HBM3 (NVL) |
| Tensor Cores | 528 fourth-generation |
| Interface | SXM5 or PCIe Gen 5 |
| Interconnect | 18x fourth-gen NVLink (900 GB/s) |
| Max FP8 Performance | 1,979 TFLOPS (SXM, with sparsity) |
| Power Consumption | 700 W (SXM) / 350-400 W (PCIe) |
These H100 GPU specifications make it ideal for large-scale inference and training tasks, offering unmatched performance across enterprise and research workloads.
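If you want to sanity-check some of these numbers on a machine you have provisioned, PyTorch's device query exposes the SM count and memory directly. Here is a minimal sketch, assuming a CUDA-enabled PyTorch install; note that the 128-cores-per-SM multiplier for Hopper is our assumption, not something the API reports:

```python
# Sanity-check the headline specs from Python (assumes a CUDA-enabled
# PyTorch install on the target machine).
import torch

props = torch.cuda.get_device_properties(0)

print(f"Device:     {props.name}")                           # e.g. NVIDIA H100 80GB HBM3
print(f"SM count:   {props.multi_processor_count}")          # 132 on H100 SXM5
print(f"Total VRAM: {props.total_memory / 1024**3:.1f} GiB")
# Hopper packs 128 FP32 CUDA cores per SM (our assumption, not an API
# field): 132 SMs x 128 = 16,896 cores on the SXM5 part.
print(f"CUDA cores: {props.multi_processor_count * 128}")
```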
1. Meta and xAI Leading the Race
Meta and xAI are among the largest consumers of NVIDIA H100 GPUs. Meta announced plans to expand its generative AI infrastructure with 350,000 H100 GPUs, plus additional systems collectively delivering compute power comparable to nearly 600,000 H100s.
Meanwhile, xAI's Colossus supercomputer, based in Memphis, operates with 100,000 NVIDIA H100 GPUs using NVIDIA Spectrum-X Ethernet networking for high-performance AI workflows. Colossus supports training xAI's Grok large language models, including chatbots for X Premium subscribers. xAI has announced plans to expand Colossus to 200,000 H100 GPUs, aiming to build one of the world's largest AI training clusters.
2. Built for Secure AI
NVIDIA is not just leading AI innovation -- it is also building secure AI systems. The NVIDIA H100 is the first GPU to feature confidential computing capabilities, a significant advancement that ensures sensitive data stays protected even during active processing.
During AI training or inference, input data frequently contains personally identifiable information (PII) or critical enterprise secrets, and trained models represent valuable intellectual property. The H100's confidential computing feature secures both data and code using hardware-based isolation and trusted execution environments (TEEs), protecting them from unauthorised access or modification during use.
This makes H100 GPUs the go-to choice for secure AI operations in regulated fields like finance, healthcare, and defence.
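No single API call proves a workload is protected end to end, but you can at least verify that a GPU is running in confidential computing mode. Below is a minimal sketch, assuming a recent nvidia-ml-py (pynvml) build that exposes NVML's confidential-computing query and a `ccFeature` field on the returned state; both are assumptions about your bindings version, and older drivers or non-Hopper GPUs will report the feature as unsupported:

```python
# Minimal sketch: check whether this system is running in confidential
# computing (CC) mode. Assumes a recent nvidia-ml-py build exposing the
# NVML CC API; field names may differ across binding versions.
import pynvml

pynvml.nvmlInit()
try:
    state = pynvml.nvmlSystemGetConfComputeState()
    # ccFeature is non-zero when the hardware TEE is active (assumed field).
    print("CC feature enabled:", bool(state.ccFeature))
except pynvml.NVMLError as err:
    # Non-Hopper GPUs or older drivers typically raise Not Supported here.
    print("Confidential computing query failed:", err)
finally:
    pynvml.nvmlShutdown()
```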
3. Performance That Scales
The NVIDIA H100 is built for cluster-level scalability. With 18 fourth-generation NVLink connections and 16,896 CUDA cores, this GPU is designed for high-throughput distributed workloads. Whether you are evaluating the H100 for large language models or scientific research, it delivers low-latency, high-throughput operations across multiple GPUs.
Enterprises can deploy up to 256 GPUs in a single cluster, achieving extraordinary compute power for workloads like weather simulation or training multi-trillion parameter AI models. The 18 fourth-generation NVLink connections facilitate high-bandwidth communication between GPUs, ensuring that even large clusters operate with minimal latency.
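In practice, most teams exploit this interconnect through NCCL-backed data parallelism rather than programming NVLink directly. Here is a minimal sketch using PyTorch DistributedDataParallel, where gradient all-reduce traffic rides NVLink within a node; the toy model and the `train_ddp.py` filename are placeholders:

```python
# Minimal data-parallel sketch with PyTorch DistributedDataParallel.
# NCCL routes intra-node GPU-to-GPU traffic over NVLink, which is where
# the H100's 900 GB/s interconnect pays off during gradient all-reduce.
# Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    dist.init_process_group(backend="nccl")        # NVLink-aware collectives
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()     # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                            # toy training loop
        x = torch.randn(64, 4096, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                            # gradients all-reduced over NVLink
        opt.step()

    if dist.get_rank() == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```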
4. Powering Next-Generation AI Infrastructure
The NVIDIA DGX H100 system is the first AI platform designed specifically around NVIDIA H100 Tensor Core GPUs. Each system includes eight H100 GPUs connected via NVLink, delivering 32 petaflops of AI performance at FP8 precision -- six times the output of its predecessor, the DGX A100.
"AI has fundamentally changed what software can do and how it is produced. Companies revolutionising their industries with AI realise the importance of their AI infrastructure. Our new DGX H100 systems will power enterprise AI factories to refine data into our most valuable resource -- intelligence."
- Jensen Huang, founder and CEO of NVIDIA (GTC 2022 keynote).
NVIDIA's Eos supercomputer, built with 576 DGX H100 systems and 4,608 H100 GPUs, was one of the most powerful AI systems at its launch in 2023. The broader ecosystem of DGX H100 deployments has since formed the backbone of enterprise AI infrastructure worldwide, with cloud providers, research institutions, and hyperscalers all building on top of this platform.
5. Multi-Instance GPU (MIG): One NVIDIA H100, Multiple Workloads
One of the most operationally important features of the H100 is Multi-Instance GPU (MIG). MIG allows a single H100 to be partitioned into up to seven independent GPU instances, each with its own dedicated VRAM, compute, and memory bandwidth -- completely isolated from the others.
This means a single H100 can simultaneously serve seven separate inference workloads, or a mix of training and inference jobs, without any cross-contamination of resources. For cloud providers and enterprises running multiple tenants or models, MIG dramatically improves utilisation and cost efficiency.
Each MIG instance on an H100 can be as small as 10 GB VRAM with ~200 TFLOPS of FP8 throughput, making it viable for production inference of models up to around 7B parameters on a single partition. MIG also supports confidential computing, meaning each instance can operate in its own trusted execution environment.
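Partitions themselves are created by an administrator (typically via nvidia-smi); from application code you can then discover which MIG instances exist and what resources each one carries. A minimal sketch using nvidia-ml-py (pynvml), assuming MIG mode is already enabled and partitions already configured:

```python
# Sketch: enumerate MIG instances on an H100 with nvidia-ml-py (pynvml).
# Assumes MIG mode is enabled and partitions have been created already.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetMigMode(handle)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

max_migs = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)  # up to 7 on H100
for i in range(max_migs):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
    except pynvml.NVMLError:
        continue                      # this slot has no instance configured
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    name = pynvml.nvmlDeviceGetName(mig)
    print(f"MIG {i}: {name}, {mem.total / 1024**3:.1f} GiB")

pynvml.nvmlShutdown()
```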
6. How the NVIDIA H100 Compares to the NVIDIA A100
The NVIDIA A100 introduced tensor float-32 (TF32) precision and scalable multi-GPU setups, but the NVIDIA H100 takes a significant step forward. The NVIDIA H100 delivers up to 3x faster AI training and up to 6x faster inference compared to the NVIDIA A100, driven by the Hopper architecture's Transformer Engine optimisations.
The Transformer Engine dynamically selects between FP8 and FP16 precision at the layer level during training, maintaining accuracy while maximising throughput for large language models like Llama 3 and Mistral. Combined with higher memory bandwidth and 18 fourth-generation NVLink connections, the NVIDIA H100 offers substantially better scalability for large AI clusters.
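NVIDIA's open-source Transformer Engine library is how PyTorch code opts into this behaviour. Here is a minimal sketch using its `fp8_autocast` context with a delayed-scaling recipe; the layer sizes are arbitrary placeholders:

```python
# Sketch of layer-level FP8 execution with NVIDIA's Transformer Engine
# library. Inside fp8_autocast, supported layers run their matmuls on
# the H100's FP8 Tensor Cores using the given scaling recipe.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)          # forward pass executes in FP8

y.sum().backward()        # backward also follows the FP8 recipe
```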
| Capability | NVIDIA A100 | NVIDIA H100 |
| --- | --- | --- |
| Architecture | Ampere | Hopper |
| FP8 Performance | Not supported | 1,979 TFLOPS (with sparsity) |
| VRAM (SXM) | 80 GB HBM2e | 80 GB HBM3 |
| NVLink Generation | 3rd gen (600 GB/s) | 4th gen (900 GB/s) |
| Transformer Engine | No | Yes |
| Confidential Computing | No | Yes |
| MIG Instances (max) | 7 | 7 (with per-instance TEE) |
| AI Training Speedup | Baseline | Up to 3x faster |
Try NVIDIA H100 GPU VMs on Hyperstack
Hyperstack provides on-demand access to NVIDIA H100 PCIe and H100 SXM GPUs, starting at $1.90/hour. No long-term commitments or infrastructure management. Just spin up a VM in minutes and start running your workloads immediately.
FAQs
What makes NVIDIA H100 special?
The NVIDIA H100 is designed for AI, ML and HPC workloads, featuring the Hopper architecture, which excels in transformer model performance. It also includes advanced features like NVLink, FP8 precision and exceptional scalability for large-scale computing.
What are the NVIDIA H100 CUDA cores?
The NVIDIA H100 GPU includes 16,896 CUDA cores, making it ideal for massive parallel workloads like model training and scientific simulations. These cores are a key part of the H100's impressive GPU specs.
How powerful is the H100?
The NVIDIA H100 (SXM) delivers up to 67 teraflops of FP64 Tensor Core performance and up to 1,979 teraflops of FP8 AI performance with sparsity, making it one of the most powerful GPUs for AI and HPC applications.
What is the VRAM size of the NVIDIA H100 GPU?
The H100 GPU's VRAM varies by variant: both the SXM and PCIe versions ship with 80 GB of high-bandwidth memory (HBM3 on SXM5), while the H100 NVL offers 94 GB. This large, high-bandwidth memory enables in-memory training of massive AI models.
What is the cost of NVIDIA H100 GPUs at Hyperstack?
The cost of NVIDIA H100 GPUs at Hyperstack starts at $1.90/hour on-demand, with both PCIe and SXM options available. Check out our cloud GPU pricing here!
Can I get NVIDIA H100 GPUs on Hyperstack?
Yes, you can access NVIDIA H100 PCIe and NVIDIA H100 SXM GPUs on Hyperstack for high-performance AI workloads. Sign up to access NVIDIA H100 GPUs now.
How do I deploy NVIDIA H100 GPUs on Hyperstack?
Hyperstack offers simple 1-click deployment for NVIDIA H100 GPUs, making it easy to get started with AI model training and inference. Watch the Hyperstack GPU Cloud Platform Quick Tour to get started.