case study Jun 18, 2026 · 10 min read · Cloud GPU

The $10 Ollama Experiment:
Benchmarking Thunder Compute’s Neo-Cloud

Hyperscaler GPU bills can kill a project before it even starts. We funded a Thunder Compute account with a single prepaid $10 credit to see whether a "neo-cloud" built on virtualized GPU-over-TCP could serve large LLMs through Ollama, handle quantization benchmarks, and hot-swap hardware on the fly.

The Challenge: Hosting Ollama Under $10

Running large models like Llama 3.3 (70B) or DeepSeek-R1-Distill-Llama-70B takes a lot of VRAM: roughly 43 GB even at 4-bit quantization. If you do not own a dual-RTX 3090/4090 desktop or a high-end Mac Studio, you are pushed into the cloud. However, the major cloud providers (AWS, GCP, Azure) involve complex VPC networking, GPU quota requests, and expensive hourly rates or long-term commitments.

Thunder Compute represents a new class of "neo-cloud" providers targeting AI developers. Rather than binding each GPU to a single host, they use a virtualized, network-attached GPU layer (GPU-over-TCP) that decouples the GPUs from the CPU nodes running your workspace. This lets them offer on-demand instances at a fraction of traditional hyperscaler costs.

We set up a strict budget constraint: exactly $10.00 pre-paid. Here is how that budget divides across their active GPU offerings:

| GPU Model | VRAM | Hourly Rate | Max Runtime per $10 (rounded down) | Suggested Workload |
|---|---|---|---|---|
| NVIDIA RTX A6000 | 48 GB | $0.35/hr | 28.57 hours | Llama-3.1-70B-Q4_K_M (43 GB) |
| NVIDIA A100 80GB | 80 GB | $0.78/hr | 12.82 hours | DeepSeek-R1-Distill-70B-Q8_0 (75 GB) |
| NVIDIA L40 | 48 GB | $0.89/hr | 11.23 hours | Stable Diffusion XL / vLLM batches |
| NVIDIA L40S | 48 GB | $0.99/hr | 10.10 hours | Fine-tuning smaller 8B/14B models |
| NVIDIA H100 80GB | 80 GB | $1.38/hr | 7.24 hours | Llama-3.3-70B-Q8_0 (high throughput) |
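The runtime column is simply budget divided by hourly rate, rounded down to two decimals. As a small sketch, the bash functions below reproduce it; the rates are copied from the table and may have changed since publication, and the function names are our own.

```bash
#!/usr/bin/env bash
# Reproduce the "Max Runtime per $10" column: budget / hourly rate,
# floored to two decimals. Rates come from the table above and are
# assumed current only as of this post.
set -euo pipefail

# runtime_hours BUDGET_USD RATE_USD_PER_HR -> hours (floored to 0.01)
runtime_hours() {
  awk -v b="$1" -v r="$2" 'BEGIN { printf "%.2f\n", int(b / r * 100) / 100 }'
}

# print_budget_table BUDGET_USD -> one line per GPU with its runtime
print_budget_table() {
  local budget="$1" gpu rate
  while read -r gpu rate; do
    printf '%-6s $%s/hr  %s hours\n' "$gpu" "$rate" "$(runtime_hours "$budget" "$rate")"
  done <<'EOF'
A6000 0.35
A100 0.78
L40 0.89
L40S 0.99
H100 1.38
EOF
}

print_budget_table 10
```

Flooring rather than rounding is the conservative choice for a prepaid balance: 10 / 0.89 is about 11.236, and reporting 11.24 hours would overstate what the credit actually buys.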

Step-by-Step Execution Workflow

The installation and configuration process on Thunder Compute is built to be developer-friendly. Below is the exact command sequence we executed to get our test environment live.

1. CLI Installation and Login

First, install the `tnr` CLI tool locally. This tool handles authentication, instance provisioning, and port-forwarding tunnels:

curl -fsSL https://raw.githubusercontent.com/Thunder-Compute/thunder-cli/main/scripts/install.sh | bash
tnr login

2. Provisioning with the Ollama Template

We started our run on a low-cost NVIDIA RTX A6000 (48GB). Instead of setting up CUDA and downloading Ollama manually, we used Thunder Compute's pre-built template, which handles the workspace preparation automatically:

tnr create --gpu "a6000" --template "ollama"

This provisions the node in under 30 seconds. To verify the instance is running and grab its unique ID, check the status:

tnr status

3. Connecting and Launching the Service

Establish an SSH link to the container. Thunder Compute manages your SSH keys automatically if you've added them with `tnr ssh-keys add`:

tnr connect <instance_id>

Once inside the shell, launch the Ollama server and the template's bundled web interface. The preconfigured template includes a helper command that starts both in the background:

start-ollama

Ollama is now listening inside the instance on port `11434`.
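Because `start-ollama` launches its servers in the background, a script that pulls or queries models right afterwards may hit the port before it is ready. A minimal readiness check, assuming `curl` is available on the instance, polls Ollama's `/api/tags` endpoint until it answers; `wait_for_ollama` is a hypothetical helper name, not part of the template.

```bash
#!/usr/bin/env bash
# Poll Ollama until its HTTP API answers, or give up after N attempts.
# Assumes curl is installed; /api/tags is Ollama's "list local models" endpoint.
set -euo pipefail

# wait_for_ollama [BASE_URL] [ATTEMPTS] -> exit status 0 once the API responds
wait_for_ollama() {
  local url="${1:-http://localhost:11434}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS -o /dev/null "$url/api/tags"; then
      return 0
    fi
    # Only sleep if another attempt is coming.
    if ((i < attempts)); then
      sleep 1
    fi
  done
  return 1
}
```

For example, `wait_for_ollama && ollama list` blocks for up to about 30 seconds and then lists the models already on the persistent disk.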

4. Exposing Ollama to Your Local Machine

To route traffic from your local terminal or codebases directly to the cloud GPU instance, tunnel the Ollama port to your local environment. Run this command on your local machine:

tnr connect <instance_id> -t 11434

Now, you can query the remote GPU as if it were running on your local machine:

curl http://localhost:11434/api/tags
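Once the tunnel is up, the same local port can drive a quick throughput benchmark. The sketch below assumes `curl` and `jq` are installed locally; it sends one non-streaming request to Ollama's `/api/generate` endpoint and divides `eval_count` by `eval_duration` (reported in nanoseconds), the tokens-per-second formula given in Ollama's API docs. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Measure generation speed of a model served through the tnr tunnel.
# Assumes curl and jq are installed locally and the tunnel from step 4 is up.
set -euo pipefail

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# eval_duration excludes model load and prompt processing,
# so this measures generation throughput only.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}

# bench_model MODEL PROMPT -> prints generated tokens per second
bench_model() {
  local model="$1" prompt="$2" body resp
  body=$(jq -n --arg m "$model" --arg p "$prompt" \
    '{model: $m, prompt: $p, stream: false}')
  resp=$(curl -fsS "$OLLAMA_URL/api/generate" -d "$body")
  tokens_per_second "$(jq -r '.eval_count' <<<"$resp")" \
    "$(jq -r '.eval_duration' <<<"$resp")"
}
```

Usage: `bench_model llama3.3:70b "Explain KV caching in two sentences."`. Running the same prompt against each quantization, and again after a GPU change, gives directly comparable numbers.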

The Secret Weapon: Hot-Swapping GPUs

Normally, scaling resources in the cloud requires stopping your VM, detaching volumes, creating a new VM, and re-attaching disks. On Thunder Compute, you can reconfigure the hardware without rebuilding the workspace.

Suppose you began your session benchmarking a 70B model quantized at Q4_K_M (about 43 GB, which fits inside the 48 GB of VRAM on the $0.35/hr RTX A6000). Now you want to test the Q8_0 weights of the same model (about 75 GB), which require an 80GB GPU. Full unquantized FP16 weights, at roughly 140 GB, would not fit on a single 80GB card either.

Instead of destroying your environment, you can **hot-swap the GPU** using the modify command:

# Stop the instance first to safely release the GPU allocation
tnr stop <instance_id>

# Modify the instance configuration to upgrade the GPU to an A100 80GB
tnr modify <instance_id> --gpu "a100"

# Spin the instance back up
tnr connect <instance_id>

Your home folder, downloaded model files, and custom Modelfiles are preserved because the persistent disk storage is independent of the GPU assignment. This matters with large model weights: re-downloading a 50 GB model every time you change hardware wastes both time and budget.
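If you swap GPUs often, the stop, modify, and reconnect sequence above can be wrapped in one function. This sketch uses only the `tnr` subcommands and `--gpu` flag shown in this post; the GPU names (`a6000`, `a100`) follow the earlier examples, and `swap_gpu`, `run`, and `DRY_RUN` are hypothetical names of our own.

```bash
#!/usr/bin/env bash
# Wrap the stop -> modify -> reconnect sequence in one function.
# Only tnr subcommands and flags shown in this post are used.
set -euo pipefail

# run: echo a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# swap_gpu INSTANCE_ID GPU -> move an existing instance to another GPU type
swap_gpu() {
  local id="$1" gpu="$2"
  run tnr stop "$id"
  run tnr modify "$id" --gpu "$gpu"
  run tnr connect "$id"
}
```

`DRY_RUN=1 swap_gpu <instance_id> a100` prints the three commands without running them, which is a cheap way to double-check the instance ID before stopping a billed machine.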

Development Mode vs. Production Mode

When customizing your instance, you will encounter the option to run in either Development Mode or Production Mode. Selecting the correct mode is critical for maximizing your $10 budget:

Model Fit & Quant Guide for 48GB vs 80GB

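If you are running Ollama on these instances, here is how to size your models for the best performance without Out-Of-Memory (OOM) crashes. The figures cover the weights only; leave a few GB of extra VRAM for the KV cache, which grows with context length: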

| Model Name | Size | Quantization | VRAM Required (weights) | Best Match GPU |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Llama-70B | 70B | Q4_K_M | ~43 GB | RTX A6000 (48GB) |
| Llama 3.3 | 70B | Q4_K_M | ~43 GB | RTX A6000 (48GB) |
| DeepSeek-R1-Distill-Llama-70B | 70B | Q8_0 | ~75 GB | A100 80GB / H100 80GB |
| Llama 3.3 | 70B | Q8_0 | ~75 GB | A100 80GB / H100 80GB |
| Command R+ | 104B | Q4_K_M | ~65 GB | A100 80GB |
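The sizes above follow from parameters × bits per weight ÷ 8. As a back-of-the-envelope sketch, the functions below apply that rule; the bits-per-weight values are approximations (Q8_0 stores 32 int8 weights plus one fp16 scale per block, 8.5 bits; Q4_K_M averages roughly 4.85), the 70.6B parameter count for Llama 3.3 70B is taken as an assumption, and the 4 GB `HEADROOM_GB` for KV cache and runtime overhead is an arbitrary placeholder, not a measured value.

```bash
#!/usr/bin/env bash
# Rough VRAM check for quantized weights served by Ollama.
# Bits per weight are approximations; HEADROOM_GB is a placeholder
# for KV cache and CUDA overhead and should be tuned to your context size.
set -euo pipefail

HEADROOM_GB="${HEADROOM_GB:-4}"

# weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT -> approximate size in GB
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# fits VRAM_GB WEIGHTS_GB -> "yes" if weights plus headroom fit in VRAM
fits() {
  awk -v vram="$1" -v w="$2" -v h="$HEADROOM_GB" \
    'BEGIN { print ((w + h <= vram) ? "yes" : "no") }'
}

# Llama 3.3 70B, assumed ~70.6B parameters, at three precisions.
for quant in "Q4_K_M 4.85" "Q8_0 8.5" "F16 16"; do
  read -r name bpw <<<"$quant"
  size=$(weights_gb 70.6 "$bpw")
  echo "$name: ${size} GB  48GB: $(fits 48 "$size")  80GB: $(fits 80 "$size")"
done
```

The loop lands at about 42.8 GB for Q4_K_M (fits 48 GB), 75.0 GB for Q8_0 (needs 80 GB), and 141.2 GB for F16 (fits neither), which lines up with the table and with why the hot-swap to an 80GB card was needed for Q8_0.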

The Verdict: Is It Worth It?

For a $10 prepaid budget, Thunder Compute offers exceptional utility. We ran a complete benchmark suite across multiple quantized 70B models, upgraded our GPU mid-session to test Q8_0 quants, and still had credit left over.

What We Liked:

What to Keep in Mind: