Hyperstack - Thought Leadership

OpenAI's GPT-OSS 20B and 120B: Here’s All You Need to Know

Written by Damanpreet Kaur Vohra | Aug 7, 2025 7:38:06 AM

For the first time since GPT‑2, OpenAI has dropped fully open‑weight language models that you can download, run and own. The long‑awaited gpt‑oss‑20B and gpt‑oss‑120B are released under the Apache 2.0 license, putting state‑of‑the‑art AI directly in your hands.

This release is a turning point for AI infrastructure. Organisations can now deploy, fine‑tune and scale frontier‑level models on‑premises or in private clouds, with full control over performance, cost and data privacy.

In this blog, we discuss GPT-OSS and show how Hyperstack makes running these open models effortless at enterprise scale.

What is GPT‑OSS?

OpenAI recently released gpt‑oss‑120B and gpt‑oss‑20B, its first open‑weight language models since GPT‑2 in 2019. These are distributed under the Apache 2.0 license, meaning anyone can download, inspect, deploy, fine‑tune or redistribute the models freely for commercial or research use. These models give organisations the power to run state‑of‑the‑art AI models without relying on a proprietary API.

GPT‑OSS Models

GPT‑OSS models are Transformers leveraging Mixture‑of‑Experts (MoE) to optimise efficiency. Instead of activating all parameters for every token, MoE enables the model to use only a fraction of the parameters, reducing memory and computation costs while preserving performance.
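The routing idea behind MoE can be sketched in a few lines. The snippet below is a toy illustration of top‑k expert routing, not GPT‑OSS's actual implementation; the expert count, hidden size and gating details are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x: (hidden,) activation for a single token
    gate_w: (num_experts, hidden) router weights
    experts: list of callables, one per expert
    """
    logits = gate_w @ x                   # router score for each expert
    top = np.argsort(logits)[-top_k:]     # indices of the k highest scores
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts only
    # Only top_k experts run for this token; the rest are skipped,
    # which is where the memory/compute savings come from.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
hidden, num_experts = 8, 4
gate_w = rng.standard_normal((num_experts, hidden))
# Each "expert" is just a small linear map in this sketch.
expert_mats = [rng.standard_normal((hidden, hidden)) for _ in range(num_experts)]
experts = [lambda x, m=m: m @ x for m in expert_mats]

out = moe_forward(rng.standard_normal(hidden), gate_w, experts)
print(out.shape)  # (8,)
```

Even though all four toy experts hold parameters, only two of them execute per token; GPT‑OSS applies the same principle at scale, which is why its active parameter count is much smaller than its total parameter count.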

GPT‑OSS is available in two sizes:

  1. gpt‑oss‑20B (21B parameters)
  2. gpt‑oss‑120B (117B parameters)

GPT‑OSS Performance

OpenAI benchmarked the models across reasoning, coding, health and competition mathematics, comparing them against its proprietary models.

  • gpt‑oss‑120B: Matches or exceeds o4‑mini on reasoning, coding and health (HealthBench). It runs inference on a single 80 GB H100 GPU, enabling enterprise‑scale use without multi‑GPU complexity.
  • gpt‑oss‑20B: Performs on par with o3‑mini, sometimes exceeding it in competition mathematics. It runs on consumer‑grade hardware with 16 GB of memory.
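A back‑of‑envelope calculation shows why these footprints are plausible. The sketch below assumes the roughly 4‑bit MXFP4 weight format OpenAI describes for the MoE layers (about 4.25 bits per parameter once block scales are counted); real usage is higher once activations, KV cache and runtime overhead are added.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Rough weight-only memory footprint in decimal GB.

    Ignores activations, KV cache and framework overhead, so treat
    the result as a lower bound on real VRAM usage.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Assumed ~4.25 effective bits per parameter for MXFP4-quantized weights.
for name, params in [("gpt-oss-20b", 21), ("gpt-oss-120b", 117)]:
    print(f"{name}: ~{weight_memory_gb(params, 4.25):.1f} GB of weights")
```

Under these assumptions the 20B model's weights land comfortably under 16 GB and the 120B model's under 80 GB, which is consistent with the single‑GPU claims above.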

Running GPT‑OSS on Hyperstack

While GPT‑OSS is open and flexible, it still requires high‑performance compute to run efficiently. 

1. Open Models, Hosted Your Way

OpenAI’s release of gpt‑oss‑20B and 120B under Apache 2.0 licensing gives you complete freedom to self‑host:

  • Run the models independently on your own infrastructure or cloud environment.
  • Avoid vendor lock‑in. There are no forced proprietary APIs or usage caps.

Hyperstack is the first European‑owned GPU cloud to enable this at enterprise scale. We provide on‑demand access to NVIDIA H100 GPUs, high-speed networking and ultra‑fast NVMe storage, ensuring your GPT‑OSS workloads run smoothly and efficiently.
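As one illustration of self‑hosting (an illustrative sketch, not an official Hyperstack recipe), an OpenAI‑compatible endpoint can be brought up on a GPU VM with vLLM; the model ID follows the Hugging Face release, and the port is an arbitrary choice.

```shell
# Install vLLM, then serve the open weights behind an OpenAI-compatible API.
pip install vllm

# Downloads the weights from Hugging Face on first run, then starts serving.
vllm serve openai/gpt-oss-120b --port 8000

# Query the standard OpenAI-style chat completions endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-120b",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the endpoint speaks the OpenAI API format, existing client code can typically be pointed at your own server just by changing the base URL, with no proprietary API in the loop.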

2. Deploy Across Regions  

Deployment location matters most for organisations handling sensitive or regulated data. With Hyperstack, you can:

  • Run GPT‑OSS models in Europe, the US or Canada.
  • Ensure data processing stays in the region you choose, without routing through US‑only API endpoints.

This means compliance, latency control and data sovereignty are built into your AI workflow, giving you complete confidence in where your models and data reside.

3. No Lock‑In, No Black Box

Unlike SaaS‑wrapped AI platforms, GPT‑OSS models are fully open‑weight:

  • Download the weights, configure your own inference environment and own the entire stack.
  • Hyperstack provides a high‑performance cloud environment to build with these models.

Why Choose Hyperstack to Run GPT-OSS

Thinking of trying the latest GPT‑OSS models? Here’s why Hyperstack is your ideal platform to run them at scale.

1. Enterprise‑Grade Performance

Hyperstack offers on-demand access to high-performance GPUs to run the latest GPT-OSS-20B and 120B models. You can easily run the 20B model on smaller GPUs, ideal for frequent inference, local fine‑tuning and edge‑ready workloads.

For the larger model, you can deploy the 120B model on H100 GPUs for high‑throughput reasoning and long‑context applications. The H100 GPUs on Hyperstack support high-speed networking of up to 350 Gbps. This ensures fast data transfer and minimal bottlenecks for large‑scale inference.

2. European Data Compliance

Organisations can run GPT‑OSS entirely within Europe, keeping their data sovereign and compliant. We are also SOC 2 Type 1 certified, ensuring your workloads meet enterprise‑grade security and operational standards.

3. Scale Without Worry

AI workloads can be spiky and unpredictable, but Hyperstack helps you save costs without compromising performance:

  • Per‑minute billing means you only pay for actual usage.
  • Hibernation options let you pause instances without losing progress, preventing wasted GPU hours.

4. Future‑Proof Infrastructure

Hyperstack supports the full AI lifecycle:

  • Training of smaller open‑weight models.
  • Fine‑tuning GPT‑OSS models on your datasets.
  • Inference with high‑performance, low‑latency GPU clusters.

Run GPT‑OSS Your Way

Spin up RTX A6000 and NVIDIA H100 GPUs to run your choice of gpt‑oss‑20B or gpt‑oss‑120B with ease. Deploy in minutes and keep your data fully in your control with Hyperstack’s enterprise‑grade GPU cloud.

FAQs

What is GPT‑OSS?

GPT‑OSS is OpenAI’s new open‑weight language model series, freely downloadable under Apache 2.0, allowing full control over hosting and usage.

What models are available under GPT‑OSS?

Two models are available: gpt‑oss‑20B with 21B parameters and gpt‑oss‑120B with 117B parameters for advanced AI tasks.

Which GPU is best for gpt‑oss‑120B?

gpt‑oss‑120B runs on a single NVIDIA H100 80 GB GPU, delivering high throughput for enterprise‑scale reasoning, long‑context inference and model fine‑tuning.

What is the price of the NVIDIA H100 on Hyperstack?

Hyperstack offers NVIDIA H100 GPUs for $1.90 per hour on-demand with per‑minute billing and hibernation cost‑saving options.