
Published on 20 May 2025

LLM Inference Benchmark: Comparing NVIDIA A100 NVLink vs NVIDIA H100 SXM


Updated: 20 May 2025

Summary

In our latest blog, we compare the NVIDIA A100 NVLink and NVIDIA H100 SXM5 GPUs for large language model inference on Hyperstack. We ran the benchmark using the Llama 3.1 70B model for efficient inference workloads. Get full details below!

Is inference slowing you down or costing more than it should?

As models grow larger, inference becomes harder to optimise. It’s where many teams hit their biggest bottlenecks. Whether you're deploying in production or fine-tuning in research, delays and inefficiencies can lead to high latency, rising costs and a poor user experience.

The right GPU can change that.

In this blog, we compare two of the most popular GPUs for LLM workloads: the NVIDIA A100 NVLink and the NVIDIA H100 SXM5. We ran benchmarks with vLLM, a high-performance inference engine built for throughput and low latency, on Hyperstack’s GPU cloud.

Benchmark Setup

We ran an in-house benchmark on Hyperstack using vLLM’s official benchmarking suite, simulating real-world inference workloads. Here’s what the setup looked like:

  • Model: Meta Llama 3.1 70B

  • Batch Size: 64

  • Max Model Length: 4096 tokens

  • Deployment Environment: Hyperstack VMs (NVIDIA A100 NVLink and NVIDIA H100 SXM5)

The focus was on token throughput, which directly affects response time and user experience.
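As a rough sketch, a comparable run with vLLM's benchmarking scripts might look like the following. The exact model path, script arguments and tensor-parallel size here are assumptions for illustration, not the precise commands used in this benchmark:

```shell
# Install vLLM and fetch its benchmark scripts from the project repo
pip install vllm
git clone https://github.com/vllm-project/vllm.git

# Offline throughput benchmark: Llama 3.1 70B with a 4096-token context.
# --tensor-parallel-size 4 is an assumption for fitting a 70B model;
# set it to the number of GPUs available on your VM.
python vllm/benchmarks/benchmark_throughput.py \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --max-model-len 4096 \
  --num-prompts 64 \
  --tensor-parallel-size 4
```

The script reports generated tokens per second, which is the metric compared below.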

Inference Throughput Comparison

GPU | Throughput (tokens/sec)
NVIDIA A100 NVLink | 1148
NVIDIA H100 SXM5 | 3311

Which GPU Delivered Higher Throughput?

[Bar chart: NVIDIA A100 80GB NVLink vs NVIDIA H100 80GB SXM5 inference throughput]

The NVIDIA H100 SXM5 outperformed the NVIDIA A100 NVLink by 2.8x in tokens generated per second (3311 vs 1148 tokens/sec). Although the NVIDIA H100 SXM5 costs only about 1.7x more per hour, it works out significantly more cost-efficient per token for inference tasks.

That means you get:

  • Lower latency for interactive applications

  • Higher throughput for batch inference pipelines

  • Better ROI for AI teams deploying LLMs at scale
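The cost-efficiency claim can be checked with a little arithmetic. The throughput figures (1148 and 3311 tokens/sec) come from the benchmark above and the H100 SXM on-demand rate ($2.40/hr) is Hyperstack's listed price; the A100 rate of $1.40/hr used here is an assumption inferred from the stated ~1.7x price ratio:

```python
# Cost per million generated tokens, given an hourly price and measured throughput.
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Throughput figures from this benchmark; the A100 hourly rate is an assumed
# figure derived from the ~1.7x price ratio, not a quoted price.
a100 = cost_per_million_tokens(1.40, 1148)
h100 = cost_per_million_tokens(2.40, 3311)

print(f"A100 NVLink: ${a100:.2f} per 1M tokens")  # ≈ $0.34
print(f"H100 SXM5:   ${h100:.2f} per 1M tokens")  # ≈ $0.20
print(f"H100 is {a100 / h100:.1f}x cheaper per generated token")
```

Under these assumptions, the H100 SXM5 comes out roughly 1.7x cheaper per generated token despite the higher hourly rate, which is the "better ROI" point above in concrete terms.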

Conclusion

If you’re looking to accelerate inference while keeping costs under control, the NVIDIA H100 SXM5 is the clear choice. With 2.8x the performance at only 1.7x the cost, it delivers more value per token than the NVIDIA A100 NVLink when deployed on our platform optimised for LLM workloads at scale.

Accelerate LLM Inference on Hyperstack

Run your LLM workloads with NVIDIA H100 SXM GPUs on Hyperstack, starting at $2.40/hr.


FAQs

What model and settings were used for this benchmark?

The benchmark was run using the Llama 3.1 70B model with a batch size of 64 and a maximum model length of 4096 tokens. vLLM’s official benchmarking suite was run in-house for consistency.

How much faster is the NVIDIA H100 SXM5 compared to the NVIDIA A100 NVLink?

The NVIDIA H100 SXM5 delivers approximately 2.8 times more inference throughput, generating 3311 tokens per second compared to 1148 tokens per second on the NVIDIA A100 NVLink.

Is the performance of the NVIDIA H100 SXM5 worth the cost?

Yes, the NVIDIA H100 SXM5 is only about 1.7 times more expensive but provides 2.8 times the throughput, making it more cost-effective for LLM inference workloads.

How can I run my inference workloads on Hyperstack using the NVIDIA H100 SXM5?

You can easily deploy and run inference workloads on Hyperstack’s platform using NVIDIA H100 SXM5 GPUs on-demand. Visit our console to log in and get started with our high-performance cloud GPU platform.

What is the cost of NVIDIA H100 SXM on Hyperstack?

You can deploy the powerful NVIDIA H100 SXM5 GPU on-demand for $2.40/hr on Hyperstack.
