What is Qwen3-VL-30B-A3B-Instruct-FP8?
Qwen3-VL-30B-A3B-Instruct-FP8 is an FP8-quantised version of the instruction-tuned Qwen3-VL-30B-A3B-Instruct model from the Qwen3 series. This vision-language model processes text, images, and videos and generates text, excelling in tasks that require reasoning across multiple modalities. The FP8 quantisation reduces memory usage and computational requirements, making the model suitable for deployment in cloud environments as well as on edge devices.
Features of Qwen3-VL-30B-A3B-Instruct-FP8
Here are the key features of Qwen3-VL-30B-A3B-Instruct-FP8:
- Visual Agent & Coding Boost: Operates PC/mobile GUIs and generates Draw.io, HTML, CSS, and JS from images or videos.
- Advanced Spatial & Video Understanding: Judges object positions, viewpoints, occlusions, and handles long videos with full recall and second-level indexing.
- Enhanced Multimodal Reasoning: Excels in STEM/math tasks, providing logical, evidence-based answers and causal analysis.
- Upgraded Visual Recognition & OCR: Recognises a wide range of visuals and supports OCR in 32 languages, including rare characters and complex documents.
- Extended Text-Vision Comprehension: Seamless text-vision fusion ensures unified understanding across modalities, on par with pure LLMs.
How to Deploy Qwen3-VL-30B-A3B-Instruct-FP8
Now, let's walk through the step-by-step process of deploying Qwen3-VL-30B-A3B-Instruct-FP8 on Hyperstack.
Step 1: Accessing Hyperstack
- Go to the Hyperstack website and log in to your account.
- If you're new to Hyperstack, you'll need to create an account and set up your billing information. Check our documentation to get started with Hyperstack.
- Once logged in, you'll be greeted by the Hyperstack dashboard, which provides an overview of your resources and deployments.
Step 2: Deploying a New Virtual Machine
Initiate Deployment
- Look for the "Deploy New Virtual Machine" button on the dashboard.
- Click it to start the deployment process.
Select Hardware Configuration
- In the hardware options, choose the "1xH100 PCIe" flavour.
Choose the Operating System
- Select the "Ubuntu Server 24.04 LTS R570 CUDA 12.8 with Docker" image.
Select a Keypair
- Select one of the keypairs in your account. Don't have a keypair yet? See our Getting Started tutorial for creating one.
Network Configuration
- Ensure you assign a Public IP to your Virtual machine.
- This allows you to access your VM from the internet, which is crucial for remote management and API access.
Enable SSH Access
- Make sure to enable an SSH connection.
- You'll need this to securely connect and manage your VM.
Review and Deploy
- Double-check all your settings.
- Click the "Deploy" button to launch your virtual machine.
Step 3: Accessing Your VM
Once the initialisation is complete, you can access your VM:
Locate SSH Details
- In the Hyperstack dashboard, find your VM's details.
- Look for the public IP address, which you will need to connect to your VM with SSH.
Connect via SSH
- Open a terminal on your local machine.
- Use the command ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address] (e.g. ssh -i /users/username/downloads/keypair_hyperstack ubuntu@198.145.126.7).
- Replace [path_to_ssh_key], [os_username], and [vm_ip_address] with the details provided by Hyperstack.
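If SSH rejects your key with an "unprotected private key file" warning, tighten its permissions first. A minimal sketch, using /tmp/keypair_hyperstack as a stand-in for your actual key path:

```shell
# SSH refuses private keys that other users can read.
# /tmp/keypair_hyperstack is a stand-in path; point KEY at your real key.
KEY=/tmp/keypair_hyperstack
touch "$KEY"          # stands in for the downloaded keypair file
chmod 600 "$KEY"      # owner read/write only
stat -c '%a' "$KEY"   # prints 600
```

After that, the ssh command above should accept the key.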
Step 4: Setting up Qwen3-VL-30B-A3B-Instruct-FP8 with Open WebUI
To access and experiment with the model, SSH into your machine after completing the setup. If you are having trouble connecting with SSH, watch our recent platform tour video (at 4:08) for a demo. Once connected, run the following commands on your machine to start serving Qwen3-VL-30B-A3B-Instruct-FP8:
# 1) Create a docker network
docker network create qwen-net
# 2) Ensure Hugging Face cache directory exists (shared with the container)
sudo mkdir -p /ephemeral/hug && sudo chmod 0777 /ephemeral/hug
# 3) Start vLLM
sudo docker run -d --gpus=all --network qwen-net --ipc=host -p 8000:8000 -v /ephemeral/hug:/hug:rw --name vllm --restart always -e HF_HOME=/hug vllm/vllm-openai:nightly --model Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 --host 0.0.0.0 --port 8000 --async-scheduling --gpu-memory-utilization=0.95
# 4) Start Open WebUI (points to vLLM's API)
sudo docker run -d --network qwen-net -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --restart always -e OPENAI_API_BASE_URL=http://vllm:8000/v1 ghcr.io/open-webui/open-webui:main
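Before opening the WebUI, you can check whether vLLM has finished loading the model. A minimal sketch, assuming you run it on the VM itself (the first start can take around 10 minutes while the weights download):

```shell
# Query the OpenAI-compatible model list served by vLLM.
# -sf makes curl fail quietly if the server is not up yet.
VLLM_URL="http://localhost:8000/v1/models"
curl -sf "$VLLM_URL" || echo "vLLM not ready yet at $VLLM_URL"
```

Once the response lists Qwen/Qwen3-VL-30B-A3B-Instruct-FP8, the API is ready.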
If the API is not working after ~10 minutes, please refer to our 'Troubleshooting Qwen3-VL-30B-A3B-Instruct-FP8' section below.
Interacting with Qwen3-VL-30B-A3B-Instruct-FP8
- Open your VM's firewall settings.
- Allow port 3000 for your IP address (or leave it open to all IPs, though this is less secure and not recommended). For instructions, see here.
- Visit http://[public-ip]:3000 in your browser. For example: http://198.145.126.7:3000
- Set up an admin account for Open WebUI and save your username and password for future logins. See the attached screenshot.
And voila, you can start talking to your self-hosted Qwen3-VL-30B-A3B-Instruct-FP8! See an example below.
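Because vLLM exposes an OpenAI-compatible API on port 8000, you can also query the model directly instead of going through Open WebUI. A minimal sketch, assuming it runs on the VM itself and using a placeholder image URL:

```shell
# Build a multimodal chat request (placeholder image URL; swap in your own).
cat > /tmp/qwen_chat.json <<'EOF'
{
  "model": "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8",
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
    ]}
  ]
}
EOF
# Send it to the vLLM endpoint (use your public IP if calling remotely).
curl -sf http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/qwen_chat.json || echo "API unreachable (is the vllm container running?)"
```

Remember to open port 8000 in the firewall as well if you call the API from outside the VM.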
Troubleshooting Qwen3-VL-30B-A3B-Instruct-FP8
If you are having any issues, you might need to restart your machine before running the commands again:
- Run sudo reboot inside your VM
- Wait 5-10 minutes for the VM to reboot
- SSH into your VM
- Wait ~3 minutes for the LLM API to boot up
- Run the commands above again
If you are still having issues, try:
- Run docker ps and find the container_id of your API container
- Run docker logs [container_id] to see the logs of your container
- Use the logs to debug any issues
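The steps above can be shortened when you already know the container name: the run command in Step 4 named the vLLM container "vllm", so you can tail its logs directly. A minimal sketch:

```shell
# "vllm" is the --name given in the docker run command in Step 4.
CONTAINER=vllm
docker logs --tail 50 "$CONTAINER" 2>&1 || echo "no container named $CONTAINER (try with sudo)"
```

Look for CUDA out-of-memory errors or download failures in the output; these are the most common causes of a stalled startup.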
Step 5: Hibernating Your VM
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
- In the Hyperstack dashboard, locate your Virtual machine.
- Look for a "Hibernate" option.
- Click to hibernate the VM, which will stop billing for compute resources while preserving your setup.
To continue your work without repeating the setup process:
- Return to the Hyperstack dashboard and find your hibernated VM.
- Select the "Resume" or "Start" option.
- Wait a few moments for the VM to become active.
- Reconnect via SSH using the same credentials as before.
Why Deploy on Hyperstack?
Hyperstack is a cloud platform designed to accelerate AI and machine learning workloads. Here's why it's an excellent choice for deploying Qwen3-VL-30B-A3B-Instruct-FP8:
- Availability: Hyperstack provides access to the latest and most powerful GPUs such as the NVIDIA A100 and the NVIDIA H100 SXM on-demand, specifically designed to handle large language models.
- Ease of Deployment: With pre-configured environments and one-click deployments, setting up complex AI models becomes significantly simpler on our platform.
- Scalability: You can easily scale your resources up or down based on your computational needs.
- Cost-Effectiveness: You pay only for the resources you use with our cost-effective cloud GPU pricing.
- Integration Capabilities: Hyperstack provides easy integration with popular AI frameworks and tools.
New to Hyperstack? Sign up on Hyperstack Today to Get Started.
FAQs
What is Qwen3-VL-30B-A3B-Instruct-FP8?
It is an FP8-quantised vision-language model from the Qwen3 series that processes text, images, and videos, delivering advanced multimodal understanding and reasoning capabilities.
What tasks can Qwen3-VL-30B-A3B-Instruct-FP8 handle?
The model can perform GUI automation, visual-to-code generation, spatial reasoning, long-form document and video comprehension, OCR in multiple languages, STEM/math problem-solving and multimodal content generation.
What are the new features in this version?
- Visual Agent & Coding Boost
- Advanced Spatial & Video Understanding
- Enhanced Multimodal Reasoning
- Upgraded Visual Recognition & OCR
- Extended Text-Vision Comprehension
How long is the model’s context window?
It has a native 256K token context, expandable up to 1 million tokens, allowing it to process books, long documents, and hours-long videos with full recall.
Which types of inputs does it support?
The model supports text, images, and videos. It can also handle multi-modal inputs simultaneously for tasks like visual reasoning, diagram generation, and video analysis.
Which GPU is recommended for running Qwen3-VL-30B-A3B-Instruct-FP8?
For optimal performance, we recommend using the NVIDIA H100 PCIe GPU VM on Hyperstack.