<img alt="" src="https://secure.insightful-enterprise-intelligence.com/783141.png" style="display:none;">

NVIDIA H100 SXMs On-Demand at $2.40/hour - Reserve from just $1.90/hour. Reserve here

Deploy 8 to 16,384 NVIDIA H100 SXM GPUs on the AI Supercloud. Learn More

alert

We’ve been made aware of a fraudulent website impersonating Hyperstack at hyperstack.my.
This domain is not affiliated with Hyperstack or NexGen Cloud.

If you’ve been approached or interacted with this site, please contact our team immediately at support@hyperstack.cloud.

close
|

Published on 6 Aug 2025

How to Deploy OpenAI’s GPT-OSS-20B Model: A Step-by-Step Guide


Updated: 7 Aug 2025

In this tutorial, we explore GPT-OSS-20B, OpenAI's open-weight language model designed for efficient local deployment and advanced reasoning, and walk through deploying it on Hyperstack step by step.

What is GPT‑OSS‑20B?

GPT-OSS-20B is the smaller model in OpenAI's newly released open-weight GPT-OSS family. This release marks OpenAI's first open-weight models since GPT-2 in 2019.

The model contains approximately 21 billion parameters and uses a mixture-of-experts (MoE) architecture. This design allows it to activate only a portion of its parameters for each token, delivering strong performance while keeping compute requirements manageable.

The best part about GPT-OSS-20B is that it can run on hardware with as little as 16 GB of GPU memory, and you can easily run it on an NVIDIA H100 GPU.

Key Features of GPT‑OSS‑20B

Here are the key features of the GPT-OSS-20B model:

  • Efficient Mixture‑of‑Experts Architecture

    GPT‑OSS‑20B uses a 21B‑parameter Transformer with a mixture‑of‑experts (MoE) design, activating only 3.6B parameters per token. This allows the model to deliver high reasoning performance while staying lightweight enough for consumer GPUs with ~16 GB VRAM.

  • Large 128k Context Window

    It supports up to 128,000 tokens of context, making it suitable for long‑document understanding, multi‑step reasoning and agentic workflows without frequent truncation or context loss.

  • Strong Reasoning and Tool Use

    GPT‑OSS‑20B matches or exceeds OpenAI o3‑mini on benchmarks like AIME (competition mathematics), MMLU and HealthBench. It also shows strong chain‑of‑thought reasoning, few‑shot function calling and tool usage such as Python execution or web search.

  • Optimised for Local and Edge Deployment

    Designed for on‑device inference, GPT‑OSS‑20B can run on edge devices, local servers or consumer GPUs for cost‑effective private deployments and rapid iteration without heavy cloud infrastructure.

  • Open‑Weight, Fully Customisable

    The model is released under the Apache 2.0 license, so anyone can fine‑tune, modify and deploy GPT‑OSS‑20B freely. It also exposes full chain‑of‑thought outputs and supports structured outputs for integration into agentic and production workflows.

If you're planning to try the latest GPT-OSS-20B model, you're in the right place. Check out our guide below to get started.

Steps to Deploy GPT-OSS-20B

Now, let's walk through the step-by-step process of deploying GPT-OSS-20B on Hyperstack.

Step 1: Accessing Hyperstack

  1. Go to the Hyperstack website and log in to your account.
  2. If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
  3. Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.

Step 2: Deploying a New Virtual Machine

Initiate Deployment

  1. Look for the "Deploy New Virtual Machine" button on the dashboard.
  2. Click it to start the deployment process.

Select Hardware Configuration

  1. For GPT-OSS-20B GPU requirements, go to the hardware options and choose the "1xNVIDIA H100 PCIe" flavour. 

Choose the Operating System

  1. Select the "Ubuntu Server 24.04 LTS R570 CUDA 12.8 with Docker". 
image-png-Aug-06-2025-01-40-37-3526-PM

Select a keypair

  1. Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.
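
If you still need a keypair and want to generate one locally first, a standard OpenSSH command works; the file name and comment below are just examples:

# Generate a new ed25519 keypair (the file name is an example)
ssh-keygen -t ed25519 -f ~/.ssh/hyperstack_key -C "hyperstack"

# Print the public key, which you can then import into Hyperstack
cat ~/.ssh/hyperstack_key.pub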

Network Configuration

  1. Ensure you assign a Public IP to your virtual machine.
  2. This allows you to access your VM from the internet, which is crucial for remote management and API access.

Enable SSH Access

  1. Make sure to enable an SSH connection.
  2. You'll need this to securely connect and manage your VM.

Review and Deploy

  1. Double-check all your settings.
  2. Click the "Deploy" button to launch your virtual machine.

Step 3: Accessing Your VM

Once the initialisation is complete, you can access your VM:

Locate SSH Details

  1. In the Hyperstack dashboard, find your VM's details.
  2. Look for the public IP address, which you will need to connect to your VM with SSH.

Connect via SSH

  1. Open a terminal on your local machine.
  2. Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] to connect.
  3. Replace [path_to_ssh_key], [os_username] and [vm_ip_address] with your keypair path and the details shown in the Hyperstack dashboard, as in the example below.
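
For example, with a keypair at ~/.ssh/hyperstack_key, the default ubuntu user and a placeholder IP address (all three are illustrative values), connecting and verifying the GPU could look like this:

# Connect to the VM (replace the key path and IP with your own details)
ssh -i ~/.ssh/hyperstack_key ubuntu@203.0.113.10

# Once connected, confirm the H100 is visible to the driver
nvidia-smi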

Step 4: Setting up GPT-OSS-20B with Open WebUI

  1. To set up GPT-OSS-20B, SSH into your machine. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, use the script below to set up GPT-OSS-20B with Open WebUI.
  2. Execute the commands below to launch Open WebUI on port 3000.

# Create a docker network so the two containers can reach each other by name
sudo docker network create ollama-net

# Start the Ollama runtime with GPU access, exposing its API on port 11434
sudo docker run -d --gpus=all --network ollama-net -p 11434:11434 -v /home/ubuntu/ollama:/root/.ollama --name ollama --restart always ollama/ollama:latest

# Pull the GPT-OSS-20B model into the Ollama container
sudo docker exec -it ollama ollama pull gpt-oss:20b

# Start Open WebUI on port 3000, pointing it at the Ollama runtime
sudo docker run -d --network ollama-net -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --restart always -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main

The above script downloads and hosts gpt-oss-20b. For more information, see the model card: https://huggingface.co/openai/gpt-oss-20b. Before exposing anything to the internet, it is worth verifying the setup, as shown below.
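
To confirm everything is running, you can check the containers and query Ollama's REST API from inside the VM; /api/tags is Ollama's standard endpoint for listing pulled models:

# Confirm both containers are up
sudo docker ps --filter name=ollama --filter name=open-webui

# List the models Ollama has pulled; gpt-oss:20b should appear
curl http://localhost:11434/api/tags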

Interacting with GPT-OSS-20B

  1. Open your VM's firewall settings.

  2. Allow port 3000 for your IP address (or leave it open to all IPs, though this is less secure and not recommended). For instructions, see the Hyperstack documentation on firewall rules.

  3. Visit http://[public-ip]:3000 in your browser. For example: http://198.145.126.7:3000

  4. Set up an admin account for Open WebUI and save your username and password for future logins.


And voila, you can start talking to your self-hosted GPT-OSS-20B! If you would rather script your requests than use the browser, see the example below.

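You can also talk to the model through the Ollama runtime directly. The command below uses Ollama's standard /api/generate endpoint; run it from the VM itself, since port 11434 is not exposed publicly in this setup:

# Send a single prompt to gpt-oss:20b via Ollama's generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Explain mixture-of-experts in two sentences.",
  "stream": false
}'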

When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:

  1. In the Hyperstack dashboard, locate your Virtual machine.
  2. Look for a "Hibernate" option.
  3. Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.
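
If you manage resources programmatically, Hyperstack also offers an API, and a hibernate call might look roughly like the sketch below. The endpoint path, the api_key header and the VM ID 12345 are assumptions here, so confirm them against the Hyperstack API documentation before relying on this.

# Hypothetical sketch: hibernate a VM via the Hyperstack API
# (endpoint path, api_key header and VM ID 12345 are assumptions; check the docs)
curl -X POST "https://infrahub-api.nexgencloud.com/v1/core/virtual-machines/12345/hibernate" \
  -H "api_key: YOUR_API_KEY"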

Why Deploy GPT-OSS-20B on Hyperstack?

Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying GPT-OSS-20B:

  • Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA H100 on-demand, specifically designed to handle large language models. 
  • Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform. 
  • Scalability: You can easily scale your resources up or down based on your computational needs.
  • Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
  • Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.

FAQs

What is GPT‑OSS‑20B?

GPT‑OSS‑20B is OpenAI’s 21B‑parameter open‑weight language model offering strong reasoning, local deployment and full customisation under Apache 2.0.

What is the context length of GPT‑OSS‑20B?

It supports up to 128,000 tokens, ideal for long‑document analysis, multi‑step reasoning and agentic workflows requiring extended memory.

How does GPT‑OSS‑20B compare to GPT‑OSS‑120B?

GPT‑OSS‑20B is smaller, lighter, and edge‑friendly, while GPT‑OSS‑120B offers higher performance for enterprise‑level reasoning and large‑scale deployments.

Does GPT‑OSS‑20B support chain‑of‑thought reasoning?

Yes, it supports configurable chain‑of‑thought reasoning with low, medium, and high effort modes for faster or deeper analysis.
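
With the Ollama setup from this tutorial, the effort level can be steered through the system message. The "Reasoning: high" directive follows the convention described in OpenAI's gpt-oss documentation; verify the exact wording against the model card:

# Request deeper reasoning by setting the effort level in the system message
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:20b",
  "messages": [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "How many prime numbers are there below 100?"}
  ],
  "stream": false
}'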

Is GPT‑OSS‑20B suitable for tool use and function calling?

Absolutely, GPT‑OSS‑20B excels at few‑shot function calling, tool use like Python execution, and structured output for automation workflows.

What license is GPT‑OSS‑20B released under?

It’s released under Apache 2.0, allowing commercial use, modification, redistribution, and fine‑tuning without vendor lock‑in.

What are ideal use cases for GPT‑OSS‑20B?

It’s perfect for local AI applications, long‑document reasoning, coding tasks, private inference, and cost‑effective on‑device deployments.
