Welcome to Hyperstack Weekly Rundown
This week on Hyperstack, we are introducing a new way to interact with infrastructure. With the launch of the Hyperstack MCP Server, you can now manage cloud resources using natural language. We’re also sharing new content, from a tutorial on running Qwen 3.5 on Hyperstack to a benchmark exploring how KV cache compression impacts inference performance.
Take a few minutes to catch up on what’s new and what you can try next on Hyperstack!
New on Hyperstack
Check out what we released on Hyperstack this week:
Hyperstack MCP Server
We’re excited to introduce the Hyperstack MCP (Model Context Protocol) Server, a new way to manage your infrastructure using natural language. With MCP support, you can interact with your cloud resources through compatible AI clients such as Claude Desktop and Open WebUI.
The MCP Server translates plain-English instructions into secure, authenticated Hyperstack API actions. This means you can create, manage and monitor infrastructure without manually writing API calls.
Once connected, you can perform tasks such as:
- Creating and managing Virtual Machines
- Provisioning and scaling Kubernetes clusters
- Creating and attaching storage volumes
- Retrieving billing and usage information
- Managing environments
- Executing multi-step infrastructure workflows
Quickstart: Follow our quickstart guide for Open WebUI, a simple browser-based AI client that lets you quickly connect to the Hyperstack MCP Server and start managing infrastructure through natural language.
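For a sense of what connecting an MCP client looks like, many MCP-compatible desktop clients (including Claude Desktop) register servers through a JSON configuration file. The sketch below is illustrative only: the server name, endpoint URL and environment variable are placeholders, not official Hyperstack values, so follow the quickstart guide above for the actual setup.

```json
{
  "mcpServers": {
    "hyperstack": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp.example-hyperstack-endpoint.cloud"],
      "env": {
        "HYPERSTACK_API_KEY": "<your-api-key>"
      }
    }
  }
}
```

Once the client loads this configuration, prompts like “create a new VM in my default environment” are routed through the MCP Server, which handles authentication and issues the corresponding Hyperstack API calls on your behalf.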
New on our Blog
Check out the latest blogs on Hyperstack:
Optimising Long-Context LLMs with KVPress Compression on Hyperstack
KV cache size is becoming a major bottleneck for LLM inference speed and memory. In this benchmark on Hyperstack’s H100 infrastructure, we compare KnormPress and NVIDIA’s DMS to see how much KV cache can be compressed without impacting reasoning performance on the Qwen-3-8B model.
How to Deploy Qwen 3.5 on Hyperstack: A Step-by-Step Guide
Qwen 3.5 is a powerful open-weight AI model built for advanced assistant workflows across text, code, images, and video. In this guide, we show how to run Qwen 3.5 on Hyperstack infrastructure to get high-performance inference for large, multimodal workloads.
UI, API and Now MCP: A New Way to Interact with Your GPU Cloud
For decades, we’ve interacted with software through UIs or APIs. But Model Context Protocol (MCP) introduces a third way, letting AI clients interact with systems through natural language. This blog explains what MCP is, how it works and why it’s changing the way we build and use software.
Help Shape the Future of Hyperstack
Great products are built with the people who use them. If there’s something you would like to see on Hyperstack, whether it is a new feature, a workflow improvement or an integration that would make your work easier, we would love to hear about it.
Your feedback helps us prioritise what matters most and build a platform that works better for the community.
That's it for this week's Hyperstack Rundown! Stay tuned for more updates next week and subscribe to our newsletter below for exclusive AI and GPU insights delivered to your inbox!
Missed the Previous Editions?
Catch up on everything you need to know from Hyperstack updates below: