Key Takeaways
• Training generative AI for 3D models starts with collecting and preprocessing high-quality 3D data such as meshes, point clouds, or multi-view images.
• Different model architectures are used depending on representation, including voxel-based, point-based, and neural implicit methods.
• Large GPU memory and compute resources are required to handle complex 3D structures and high-resolution outputs.
• Data augmentation and normalisation are critical for improving model generalisation and stability.
• Training workflows often combine 2D supervision with 3D consistency constraints to improve realism.
• Evaluation focuses on geometry accuracy, visual quality, and consistency across multiple viewpoints.
Training generative AI for 3D models has changed dramatically in the past two years. GANs and VAEs, once the standard approach, have largely been superseded by diffusion models, Neural Radiance Fields (NeRF), and 3D Gaussian Splatting. This tutorial reflects that shift. You will find working environment setup commands, a current architecture comparison and a complete walkthrough of provisioning a Hyperstack GPU VM and running a real training script using Nerfstudio.
By the end, you will have a functioning training pipeline running on Hyperstack.
Prerequisites
Before starting, make sure you have the following:
- A Hyperstack account with billing set up (log in on the Hyperstack website; if you are new, our documentation can guide you through the initial setup)
- An SSH keypair added to your Hyperstack profile
- Basic familiarity with Python and the Linux command line
- A Hugging Face account (for accessing pretrained checkpoints)
- Your training data: images, a video, or an existing 3D dataset (ShapeNet, Objaverse, or your own captures)
Tools and libraries used in this tutorial:
- Nerfstudio: modular NeRF and Gaussian Splatting training framework
- 3D Gaussian Splatting (Kerbl et al.): fast, high-quality scene reconstruction
- Shap-E (OpenAI): text/image-to-3D diffusion model
- PyTorch 2.x with CUDA 12.2
- COLMAP: for structure-from-motion preprocessing
Choosing the Right Architecture for 3D Generation
The right architecture depends on your input data, output format and quality requirements. Here is a current comparison of the main approaches:
| Architecture | Best for | Output format | GPU requirement | Training time |
|---|---|---|---|---|
| NeRF (Neural Radiance Field) | Photorealistic scene reconstruction from images/video | Implicit volumetric representation | 1-4x A100 / H100 | Minutes to hours (Instant-NGP, Nerfacto) |
| 3D Gaussian Splatting | Real-time rendering, fast scene reconstruction | Point cloud with Gaussian splats | 1-2x A100 / H100 | 20-45 minutes per scene |
| Diffusion-based (Shap-E, Zero123, One-2-3-45) | Text-to-3D or image-to-3D generation | Mesh, point cloud, NeRF | 2-8x A100 / H100 | Hours to days for full training |
| 3D VAE / Point-E | Fast low-res prototyping from text | Point cloud | 1x A100 | Minutes for inference; days for training |
| GAN (GET3D, EG3D) | Category-specific object generation (cars, chairs) | Mesh + texture | 4-8x A100 | Days to weeks |
Recommendation for most use cases: Start with 3D Gaussian Splatting via Nerfstudio if you have image/video input and need fast, high-quality results. Use Shap-E or Zero123++ for text-to-3D workflows. Reserve GAN-based methods for category-specific generation tasks where you have a large, labelled dataset.
Step 1: Provision Your Hyperstack VM
Log in to the Hyperstack console and click Deploy New Virtual Machine.
Recommended configuration for 3D generative AI training:
- GPU: NVIDIA A100-80GB for training; NVIDIA A6000 for inference and prototyping
- OS image: Ubuntu 22.04 LTS with CUDA 12.2 and Docker
- Storage: At least 200 GB on the ephemeral disk for datasets and checkpoints
- Networking: Assign a public IP and open ports 22 (SSH) and 7007 (Nerfstudio viewer)
Once deployed, connect via SSH:
ssh -i /path/to/your/key ubuntu@YOUR_VM_PUBLIC_IP
Step 2: Set Up the Environment
Update the system and verify the GPU is visible:
sudo apt-get update && sudo apt-get upgrade -y
nvidia-smi
You should see your GPU listed with the driver and CUDA version. Now, create a Python virtual environment and install PyTorch:
python3 -m venv ~/3d-train-env
source ~/3d-train-env/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python -c "import torch; print(torch.cuda.get_device_name(0))"
Install system dependencies required by Nerfstudio and COLMAP:
sudo apt-get install -y git cmake build-essential libboost-all-dev \
libfreeimage-dev libgflags-dev libgoogle-glog-dev \
libsuitesparse-dev colmap ffmpeg
Step 3: Install Nerfstudio
Nerfstudio is the recommended framework for NeRF and Gaussian Splatting training. It provides a unified interface for multiple 3D representation methods and includes a real-time web viewer.
pip install nerfstudio
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
ns-train --help
If you see the Nerfstudio CLI help output, the installation is successful.
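You can also confirm that the tiny-cuda-nn bindings built correctly against your CUDA toolkit with a quick import check (the package installs as the tinycudann module):
python -c "import tinycudann as tcnn; print('tiny-cuda-nn bindings OK')"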
Step 4: Prepare Your Dataset
Nerfstudio expects a set of images with known camera poses. If you have a video or a collection of photos of an object or scene, the ns-process-data command handles COLMAP preprocessing automatically.
Option A: From a video file:
ns-process-data video \
--data /path/to/your/video.mp4 \
--output-dir /ephemeral/dataset/my-scene
Option B: From a folder of images:
ns-process-data images \
--data /path/to/images/ \
--output-dir /ephemeral/dataset/my-scene
This runs COLMAP structure-from-motion under the hood, estimating camera intrinsics and extrinsics for every frame. Expect this to take 5-20 minutes, depending on image count and resolution.
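When preprocessing finishes, the output directory should contain the processed images alongside a transforms.json file holding the recovered camera poses. A minimal sketch to confirm how many frames were registered (path per the commands above):
python <<'EOF'
import json
# ns-process-data writes camera poses for each registered frame to transforms.json
with open('/ephemeral/dataset/my-scene/transforms.json') as f:
    transforms = json.load(f)
print(len(transforms['frames']), 'frames registered with camera poses')
EOF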
Option C: Use an existing dataset (ShapeNet / Objaverse):
pip install objaverse
python -c "
import objaverse
uids = objaverse.load_uids()
objects = objaverse.load_objects(uids[:100])
print('Downloaded', len(objects), 'objects')
Step 5: Train a 3D Gaussian Splatting Model
3D Gaussian Splatting is currently the fastest method for producing photorealistic scene reconstructions. It trains in under an hour on a single A100 and produces real-time renderable outputs.
ns-train splatfacto \
--data /ephemeral/dataset/my-scene \
--output-dir /ephemeral/outputs/my-scene-gaussian \
--max-num-iterations 30000 \
--pipeline.model.num-downscales 0 \
--vis tensorboard
To monitor training metrics in real time, point TensorBoard at the output directory (event files are written there when training runs with --vis tensorboard), then tunnel port 6006 over SSH to view it locally:
tensorboard --logdir /ephemeral/outputs/my-scene-gaussian --port 6006
Checkpoints are saved automatically every 2,000 iterations. When training completes, render output frames. Note that ns-render camera-path expects a camera path JSON exported from the web viewer; if you trained headless, use ns-render interpolate, which builds a trajectory from the training views instead:
ns-render interpolate \
--load-config /ephemeral/outputs/my-scene-gaussian/splatfacto/*/config.yml \
--output-path /ephemeral/outputs/render.mp4
Step 6: Train a NeRF Model (Nerfacto)
If you need an implicit volumetric representation rather than splats, for example, for downstream editing or novel view synthesis with fine detail, use Nerfacto, Nerfstudio's default high-quality NeRF method:
ns-train nerfacto \
--data /ephemeral/dataset/my-scene \
--output-dir /ephemeral/outputs/my-scene-nerf \
--max-num-iterations 50000 \
--pipeline.model.disable-scene-contraction True \
--vis tensorboard
Nerfacto trains in roughly 20-40 minutes on an A100 for a typical indoor or object-centric scene.
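If a run is interrupted, for example by hibernating the VM, you can resume from the latest checkpoint rather than retraining from scratch. A sketch, assuming the default layout Nerfstudio writes under the experiment's timestamped directory (substitute your actual run timestamp for TIMESTAMP):
ns-train nerfacto \
--data /ephemeral/dataset/my-scene \
--output-dir /ephemeral/outputs/my-scene-nerf \
--load-dir /ephemeral/outputs/my-scene-nerf/nerfacto/TIMESTAMP/nerfstudio_models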
Step 7: Text-to-3D with Shap-E
If your use case is generating 3D assets from text prompts rather than reconstructing from images, use OpenAI's Shap-E diffusion model:
pip install git+https://github.com/openai/shap-e.git
python <<'EOF'
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh
import trimesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 'transmitter' decodes latents into 3D representations;
# 'text300M' is the text-conditioned diffusion model
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

prompt = 'a wooden chair with four legs'

# Sample a latent 3D representation conditioned on the text prompt
latents = sample_latents(
    batch_size=1, model=model, diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt]),
    progress=True, clip_denoised=True, use_fp16=True,
    use_karras=True, karras_steps=64,
    sigma_min=1e-3, sigma_max=160, s_churn=0,
)

# Decode each latent into a triangle mesh and export as .obj
for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    t = trimesh.Trimesh(vertices=mesh.verts, faces=mesh.faces)
    t.export(f'/ephemeral/outputs/output_{i}.obj')
    print(f'Saved output_{i}.obj')
EOF
The generated .obj files can be downloaded from the VM and opened directly in Blender, Maya or any standard 3D software.
Step 8: Evaluate Your Model
Evaluate reconstruction quality using standard metrics:
ns-eval \
--load-config /ephemeral/outputs/my-scene-nerf/nerfacto/*/config.yml \
--output-path /ephemeral/outputs/eval_results.json
cat /ephemeral/outputs/eval_results.json
Key metrics to target:
- PSNR > 28 dB: acceptable reconstruction quality for most applications
- PSNR > 32 dB: high-quality reconstruction
- SSIM > 0.85: strong structural similarity to ground truth
- LPIPS < 0.15: perceptually close to reference images
If metrics are below target, the most common fixes are: more training iterations, more input images with better coverage, or switching from Nerfacto to Gaussian Splatting for scenes with strong view-dependent effects.
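For intuition, PSNR is computed directly from the mean squared error between the rendered and ground-truth images. A minimal sketch for images normalised to [0, 1], using synthetic arrays to illustrate the scale of the thresholds above:
python <<'EOF'
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    # PSNR = 10 * log10(MAX^2 / MSE); higher is better
    mse = np.mean((rendered - reference) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

# Example: a render whose pixels deviate by ~0.02 on average scores around 32 dB
ref = np.random.rand(256, 256, 3)
render = np.clip(ref + np.random.normal(0, 0.025, ref.shape), 0, 1)
print(f'PSNR: {psnr(render, ref):.1f} dB')
EOF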
Step 9: Export and Download Your Model
Export the trained model to a standard format for use in downstream tools:
ns-export gaussian-splat \
--load-config /ephemeral/outputs/my-scene-gaussian/splatfacto/*/config.yml \
--output-dir /ephemeral/exports/splat
ns-export marching-cubes \
--load-config /ephemeral/outputs/my-scene-nerf/nerfacto/*/config.yml \
--output-dir /ephemeral/exports/mesh \
--resolution 1024
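Before downloading, you can sanity-check the exported mesh on the VM. The exact filenames Nerfstudio writes vary by version, so this sketch globs for common mesh formats and prints basic statistics:
python <<'EOF'
import glob
import trimesh

# Exported filenames vary by Nerfstudio version; search for common mesh formats
for path in glob.glob('/ephemeral/exports/mesh/*.obj') + glob.glob('/ephemeral/exports/mesh/*.ply'):
    m = trimesh.load(path)
    if isinstance(m, trimesh.Trimesh):
        print(f'{path}: {len(m.vertices)} vertices, {len(m.faces)} faces')
    else:
        print(f'{path}: loaded as {type(m).__name__}')
EOF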
Download the exports to your local machine:
scp -i /path/to/your/key -r ubuntu@YOUR_VM_PUBLIC_IP:/ephemeral/exports/ ./local-exports/
Step 10: Hibernate Your VM When Not Training
3D training jobs can be long, but you should not leave the VM running between sessions. In the Hyperstack dashboard, use Hibernate to pause compute billing while keeping your disk state intact. When you resume, your environment, datasets and checkpoints will all be exactly as you left them.
Current Limitations to Be Aware Of
- Reconstruction vs. generation: NeRF and Gaussian Splatting reconstruct scenes from images -- they do not generate novel objects from scratch. For pure generation from text or image prompts, use diffusion-based methods like Shap-E or Zero123++.
- Geometric precision: Diffusion-based 3D generation still produces meshes with artefacts, thin surfaces, and topology errors that require manual cleanup before use in engineering or manufacturing workflows.
- Data coverage: NeRF and Gaussian Splatting quality degrades sharply with sparse image coverage. Aim for 100+ images with at least 60% overlap between adjacent views for reliable reconstruction.
- Generalisation: Models trained on a single scene do not generalise. For a generalisable model across object categories, you need category-level training on large datasets like Objaverse, which requires significantly more compute.
Recommended GPU Selection on Hyperstack
- NVIDIA A6000 (48 GB): Ideal for prototyping, single-scene NeRF/Gaussian Splatting training, and inference. Best value for iterative work.
- NVIDIA A100-80GB: Recommended for Shap-E and diffusion-based 3D training, multi-scene batched training, and large-dataset fine-tuning runs.
- NVIDIA H100 SXM: Required for training large-scale generalised 3D diffusion models from scratch on datasets like Objaverse-XL.
Start training your 3D models on Hyperstack today. Provision an A100 or A6000 VM in minutes and follow the steps above to go from raw images or text prompts to a renderable 3D scene. Sign up and get started now.
FAQs
What is the best GPU for 3D generative AI training on Hyperstack?
The NVIDIA A6000 is the best starting point for single-scene NeRF and Gaussian Splatting training. For diffusion-based 3D generation or large-dataset training, use the A100-80GB. Multi-GPU H100 configurations are appropriate for training generalised models from scratch.
What is the difference between NeRF and 3D Gaussian Splatting?
NeRF represents a scene as a continuous implicit function that maps 3D coordinates to colour and density. Gaussian Splatting represents a scene as millions of 3D Gaussians with position, colour, opacity, and covariance. Gaussian Splatting trains faster (20-45 minutes vs. hours), renders in real time, and often produces sharper results. NeRF is more flexible for downstream editing and relighting tasks.
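Conceptually, the NeRF mapping is just a small neural network. A toy sketch of the core function in PyTorch (real implementations add positional encoding and condition colour on the viewing direction):
python <<'EOF'
import torch
import torch.nn as nn

# Toy NeRF field: maps a 3D coordinate to RGB colour and volume density.
class ToyNeRF(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, density)
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        rgb = torch.sigmoid(out[..., :3])    # colour in [0, 1]
        density = torch.relu(out[..., 3:])   # non-negative volume density
        return rgb, density

rgb, density = ToyNeRF()(torch.rand(1024, 3))
print(rgb.shape, density.shape)
EOF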
Can I use this pipeline for text-to-3D generation without input images?
Yes. Use Shap-E (Step 7 above) or Zero123++ for text-to-3D or single-image-to-3D generation. These are diffusion-based models that do not require multi-view image input. The trade-off is lower geometric precision and more manual cleanup required compared to reconstruction-based methods.
How long does 3D model training take on Hyperstack?
3D Gaussian Splatting trains in 20-45 minutes per scene on an A100. Nerfacto trains in 20-40 minutes. Diffusion model training takes hours to days depending on dataset size and number of GPU nodes. Full training of a generalised 3D model from scratch on Objaverse-scale data takes several days on a multi-GPU cluster.
What 3D file formats can I export from Nerfstudio?
Nerfstudio supports export to .ply (Gaussian splats and meshes), .obj (mesh via marching cubes), and .glb/.gltf for web and game engine use. All formats are compatible with Blender, Unity, Unreal Engine, and standard 3D pipelines.
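If you need to convert between formats outside Nerfstudio, trimesh (already installed for Step 7) handles the common cases. A sketch converting the Step 7 output to glTF binary:
python <<'EOF'
import trimesh
# Convert the Step 7 .obj output to .glb for web and game-engine use
mesh = trimesh.load('/ephemeral/outputs/output_0.obj')
mesh.export('/ephemeral/outputs/output_0.glb')
print('Exported output_0.glb')
EOF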
Do I need a Hugging Face account?
Only if you are downloading gated pretrained checkpoints. Nerfstudio, 3D Gaussian Splatting, and Shap-E can all be used without one.