
Published on 20 May 2025

LLM Inference Benchmark: Comparing NVIDIA A100 NVLink vs NVIDIA H100 SXM


Updated: 20 May 2025

Summary

In our latest blog, we compare the NVIDIA A100 NVLink and NVIDIA H100 SXM5 GPUs for large language model inference on Hyperstack. We ran the benchmark using the Llama 3.1 70B model for efficient inference workloads. Get full details below!

Is inference slowing you down or costing more than it should?

As models grow larger, inference becomes harder to optimise. It’s where many teams hit their biggest bottlenecks. Whether you're deploying in production or fine-tuning in research, delays and inefficiencies can lead to high latency, rising costs and a poor user experience.

The right GPU can change that.

In this blog, we compare two of the most popular GPUs for LLM workloads: the NVIDIA A100 NVLink and the NVIDIA H100 SXM5. We ran benchmarks with vLLM, a high-performance inference engine built for throughput and low latency, on Hyperstack’s GPU cloud.

Benchmark Setup

We ran an in-house benchmark on Hyperstack using vLLM’s official benchmarking suite, simulating real-world inference workloads. Here’s what the setup looked like:

  • Model: Meta Llama 3.1 70B

  • Batch Size: 64

  • Max Model Length: 4096 tokens

  • Deployment Environment: Hyperstack VMs (NVIDIA A100 NVLink and NVIDIA H100 SXM5)

The focus was on token throughput, which directly affects response time and user experience.
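As a rough sketch, a comparable run with vLLM's benchmarking scripts might look like the following. The exact model path, script arguments and tensor-parallel size here are assumptions for illustration, not the precise commands used in this benchmark:

```shell
# Install vLLM and fetch its benchmark scripts from the project repo
pip install vllm
git clone https://github.com/vllm-project/vllm.git

# Offline throughput benchmark: Llama 3.1 70B with a 4096-token context.
# --tensor-parallel-size 4 is an assumption for fitting a 70B model;
# set it to the number of GPUs available on your VM.
python vllm/benchmarks/benchmark_throughput.py \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --max-model-len 4096 \
  --num-prompts 64 \
  --tensor-parallel-size 4
```

The script reports generated tokens per second, which is the metric compared below.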

Inference Throughput Comparison

GPU | Throughput (tokens/sec)
NVIDIA A100 NVLink | 1148
NVIDIA H100 SXM5 | 3311

Which GPU Delivered Higher Throughput?

[Bar chart: NVIDIA A100 80GB NVLink vs NVIDIA H100 80GB SXM5 inference throughput]

The NVIDIA H100 SXM5 outperformed the NVIDIA A100 NVLink by 2.8x in tokens generated per second (3311 vs 1148 tokens/sec). Although the NVIDIA H100 SXM5 costs only about 1.7x more per hour, it works out significantly more cost-efficient per token for inference tasks.

That means you get:

  • Lower latency for interactive applications

  • Higher throughput for batch inference pipelines

  • Better ROI for AI teams deploying LLMs at scale
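The cost-efficiency claim can be checked with a little arithmetic. The throughput figures (1148 and 3311 tokens/sec) come from the benchmark above and the H100 SXM on-demand rate ($2.40/hr) is Hyperstack's listed price; the A100 rate of $1.40/hr used here is an assumption inferred from the stated ~1.7x price ratio:

```python
# Cost per million generated tokens, given an hourly price and measured throughput.
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Throughput figures from this benchmark; the A100 hourly rate is an assumed
# figure derived from the ~1.7x price ratio, not a quoted price.
a100 = cost_per_million_tokens(1.40, 1148)
h100 = cost_per_million_tokens(2.40, 3311)

print(f"A100 NVLink: ${a100:.2f} per 1M tokens")  # ≈ $0.34
print(f"H100 SXM5:   ${h100:.2f} per 1M tokens")  # ≈ $0.20
print(f"H100 is {a100 / h100:.1f}x cheaper per generated token")
```

Under these assumptions, the H100 SXM5 comes out roughly 1.7x cheaper per generated token despite the higher hourly rate, which is the "better ROI" point above in concrete terms.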

Conclusion

If you’re looking to accelerate inference while keeping costs under control, the NVIDIA H100 SXM5 is the clear choice. With 2.8x the performance at only 1.7x the cost, it delivers more value per token than the NVIDIA A100 NVLink when deployed on our platform optimised for LLM workloads at scale.

Accelerate LLM Inference on Hyperstack

Run your LLM workloads with NVIDIA H100 SXM GPUs on Hyperstack, starting at $2.40/hr.


FAQs

What model and settings were used for this benchmark?

The benchmark was run using the Llama 3.1 70B model with a batch size of 64 and a maximum model length of 4096 tokens. vLLM’s official benchmarking suite was run in-house for consistency.

How much faster is the NVIDIA H100 SXM5 compared to the NVIDIA A100 NVLink?

The NVIDIA H100 SXM5 delivers approximately 2.8 times more inference throughput, generating 3311 tokens per second compared to 1148 tokens per second on the NVIDIA A100 NVLink.

Is the performance of the NVIDIA H100 SXM5 worth the cost?

Yes, the NVIDIA H100 SXM5 is only about 1.7 times more expensive but provides 2.8 times the throughput, making it more cost-effective for LLM inference workloads.

How can I run my inference workloads on Hyperstack using the NVIDIA H100 SXM5?

You can easily deploy and run inference workloads on Hyperstack’s platform using NVIDIA H100 SXM5 GPUs on-demand. Visit our console to log in and get started with our high-performance cloud GPU platform.

What is the cost of NVIDIA H100 SXM on Hyperstack?

You can deploy the powerful NVIDIA H100 SXM5 GPU on-demand for $2.40/hr on Hyperstack.
