LLM Hosting Research — Session 2026-06-25

Goal: First cost comparison research pass — MiniMax-M3-class models on $10,000 AUD/year budget with >10 tok/s throughput.


Chunk 2026-06-25T01:24 (autopilot tick)

What was done:

  • Gathered current GPU cloud pricing across major providers (RunPod, Vast.ai, Lambda, AWS, GCP, Spheron, JarvisLabs)
  • Researched MiniMax-M3 model specifications and hardware requirements
  • Compiled cost comparison for GPUs that can run M3

Key Findings

MiniMax-M3 Requirements

  • Size: ~428B parameters (Mixture of Experts, 23B active params)
  • Quantized sizes: UD-IQ1_M (128GB), UD-IQ3_XXS (159GB), UD-IQ4_XS (208GB), UD-Q4_K_XL (265GB)
  • Minimum VRAM: ~80GB for the smallest quantization (UD-IQ1_M), with ~133GB total memory (VRAM plus system RAM) needed
  • Inference throughput: depends on GPU; target is >10 tok/s

GPUs that fit the M3 model:

GPU | VRAM | Can Run M3? | Min Quantization
RTX 4090 | 24GB | No (too small, even quantized) | -
A100 PCIe 40GB | 40GB | No | -
A100 SXM 80GB | 80GB | Marginal (UD-IQ1_M only) | UD-IQ1_M
H100 SXM 80GB | 80GB | Yes (UD-IQ3_XXS likely, tight for UD-IQ4_XS) | UD-IQ3_XXS / UD-IQ4_XS
H200 SXM 141GB | 141GB | Yes comfortably | UD-Q4_K_XL+
B200 SXM 192GB | 192GB | Yes, plenty of headroom | Any quantization
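
As a cross-check on the table, the sketch below compares the quantized sizes listed under "MiniMax-M3 Requirements" with each GPU's VRAM. It is a weights-only estimate: the helper names and the overhead_gb parameter are illustrative assumptions, and KV cache and runtime overhead are not modelled. Any quantization that does not fit would presumably need the remainder offloaded to system RAM, which is consistent with the ~133GB total memory figure noted above.

```python
# Quantized weight sizes (GB) and VRAM (GB), copied from the notes above.
QUANT_SIZE_GB = {
    "UD-IQ1_M": 128,
    "UD-IQ3_XXS": 159,
    "UD-IQ4_XS": 208,
    "UD-Q4_K_XL": 265,
}
GPU_VRAM_GB = {
    "RTX 4090": 24,
    "A100 PCIe 40GB": 40,
    "A100 SXM 80GB": 80,
    "H100 SXM 80GB": 80,
    "H200 SXM 141GB": 141,
    "B200 SXM 192GB": 192,
}


def offload_needed_gb(quant, gpu, overhead_gb=0.0):
    """GB of weights (plus assumed overhead) that do not fit in VRAM; 0 if all fit."""
    return max(0.0, QUANT_SIZE_GB[quant] + overhead_gb - GPU_VRAM_GB[gpu])


def quants_fitting_in_vram(gpu, overhead_gb=0.0):
    """Quantizations whose weights fit entirely in a single GPU's VRAM."""
    return [q for q in QUANT_SIZE_GB if offload_needed_gb(q, gpu, overhead_gb) == 0.0]


if __name__ == "__main__":
    for gpu in GPU_VRAM_GB:
        fits = quants_fitting_in_vram(gpu) or ["none"]
        spill = offload_needed_gb("UD-IQ1_M", gpu)
        print(f"{gpu}: fits {', '.join(fits)}; UD-IQ1_M spill {spill:.0f}GB")
```

On a single card this check puts only UD-IQ1_M fully in VRAM on the H200 and UD-IQ1_M plus UD-IQ3_XXS on the B200, so the "Can Run M3?" verdicts for the 80GB cards depend on how well offloaded inference performs.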

GPU Cloud Pricing (USD/hour, on-demand unless noted):

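Provider | A100 80GB | H100 SXM 80GB | H200 SXM | B200 SXM | RTX 4090
Lambda | $1.99 | $3.99 | - | $6.69 | -
RunPod (Secure Cloud) | $1.64 | $3.89 | $4.39 | - | $0.69
Vast.ai (marketplace) | $2.00 | $4.00 | - | - | $0.55
Spheron | … ($0.60) | … (spot $1.03) | $4.54 | $6.02 | $0.55
JarvisLabs | $1.49 | $2.69 | $3.80 | - | -
AWS p5 | ~$3.43 | ~$6.88 | - | - | N/A
GCP A3 | ~$5.78 | … (~$3.69) | - | - | N/A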

Cost Analysis for $10,000 AUD/year budget (~$6,400 USD at 0.64 rate)

Scenario 1: H100 SXM 80GB — Minimum viable for M3 inference

  • Lambda Labs: $3.29/hr × 24h = $78.96/day → $5,400/yr ✅ (within budget at ~62% utilisation)
  • RunPod Secure: $3.29/hr × 24h = same as Lambda above
  • Spheron spot: $1.03/hr × 24h = $24.72/day → $1,800/yr ✅✅ (excellent value, but spot risk)
  • Vast.ai marketplace: $2.27/hr → $4,050/yr ✅
  • AWS p5: $6.88/hr × 24h = $165.12/day → $60,269/yr ❌ (way over budget)
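
The yearly figures can be re-derived from an hourly rate with a few lines of arithmetic. The sketch below is a minimal helper, assuming a 365-day year and the 0.64 AUD to USD rate used in this session; the function names are illustrative. Run around the clock, 78.96/day works out to roughly 28,820 USD/yr and 24.72/day to roughly 9,023 USD/yr, so the lower yearly figures quoted for Lambda, Spheron spot and Vast.ai would correspond to part-time use and are worth re-checking against the stated utilisation.

```python
HOURS_PER_YEAR = 24 * 365          # 8,760 hours, assuming a 365-day year
BUDGET_AUD = 10_000                # yearly budget from the session goal
AUD_TO_USD = 0.64                  # conversion rate used in these notes


def annual_cost_usd(rate_per_hour, utilisation=1.0):
    """Yearly cost in USD for a GPU billed hourly at the given utilisation (0..1)."""
    return rate_per_hour * HOURS_PER_YEAR * utilisation


def max_utilisation(rate_per_hour, budget_usd):
    """Largest fraction of the year the GPU can run without exceeding the budget."""
    return min(1.0, budget_usd / (rate_per_hour * HOURS_PER_YEAR))


if __name__ == "__main__":
    budget_usd = BUDGET_AUD * AUD_TO_USD
    for name, rate in [("H100 @ 3.29/hr", 3.29),
                       ("Spheron spot H100 @ 1.03/hr", 1.03),
                       ("AWS p5 @ 6.88/hr", 6.88)]:
        full = annual_cost_usd(rate)
        util = max_utilisation(rate, budget_usd)
        print(f"{name}: {full:,.0f} USD/yr at 24/7, "
              f"fits budget up to {util:.0%} utilisation")
```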

Scenario 2: H200 SXM — Comfortable for M3 with headroom

  • JarvisLabs: $3.80/hr × 24h = $91.20/day → $33,288/yr
  • Spheron: $4.54/hr × 24h = $108.96/day → $39,770/yr

Scenario 3: B200 — Overkill but available

  • Lambda Labs: $6.69/hr → way over budget
  • Spheron spot: $4,450/yr ✅ (surprisingly affordable)

Ranked options:

Rank | Option | GPU | Est. Cost/yr (USD) | Pros | Cons
1 | Lambda Labs | H100 SXM 80GB | ~$7,368 (~$11,600 AUD) | Reliable, no spot risk, good support | Price pushes budget slightly
2 | Spheron (spot) | H100 SXM 80GB | ~$1,800 (~$2,800 AUD) | Cheapest viable option | Spot interruptions possible
3 | Vast.ai | H100 marketplace | ~$4,050 (~$6,300 AUD) | Flexible, cheap | Variable reliability, no SLA
4 | Spheron (spot) | B200 SXM 192GB | ~$4,450 (~$7,000 AUD) | Huge headroom, fast inference | Spot risk, newer architecture
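
To check the ranking against the budget, the sketch below converts each option's estimated yearly USD cost to AUD at the 0.64 rate and keeps only those at or under 10,000 AUD. The USD inputs are the yearly estimates quoted in this section (the Spheron figures come from the scenario lines); the dictionary keys and function names are illustrative.

```python
AUD_TO_USD = 0.64      # conversion rate used in these notes
BUDGET_AUD = 10_000    # yearly budget from the session goal

# Estimated yearly cost in USD, as quoted in this section.
OPTIONS_USD = {
    "Lambda Labs, H100 SXM 80GB": 7368,
    "Spheron spot, H100 SXM 80GB": 1800,
    "Vast.ai, H100 marketplace": 4050,
    "Spheron spot, B200 SXM 192GB": 4450,
}


def to_aud(usd):
    """Convert USD to AUD using the fixed session rate."""
    return usd / AUD_TO_USD


def within_budget(options_usd, budget_aud=BUDGET_AUD):
    """Options at or under budget as (name, usd, aud), cheapest first."""
    rows = [(name, usd, to_aud(usd)) for name, usd in options_usd.items()]
    return sorted((r for r in rows if r[2] <= budget_aud), key=lambda r: r[1])


if __name__ == "__main__":
    for name, usd, aud in within_budget(OPTIONS_USD):
        print(f"{name}: {usd:,} USD/yr = {aud:,.0f} AUD/yr")
```

With these inputs the Lambda Labs estimate converts to about 11,500 AUD and drops out of the budget filter, which matches the "price pushes budget" caveat in the table.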

Next chunk picks up:

  • Validate throughput claims (>10 tok/s on H100 with M3 UD-IQ4_XS quantization)
  • Check if multi-GPU RTX 4090 setups could work (dual 4090 = 48GB VRAM total, likely insufficient for any reasonable quant)
  • Investigate serverless inference options (RunPod Serverless, Together AI API pricing at $1.20 per 1M tokens for M3)
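
For the serverless comparison, a rough break-even estimate may help frame the next chunk. The sketch below assumes the 1.20 USD per 1M tokens figure is a flat blended rate (real APIs usually price input and output tokens separately) and models a single stream generating at the 10 tok/s target; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 seconds, assuming a 365-day year


def tokens_per_year(tok_per_s, utilisation=1.0):
    """Tokens generated in a year by one stream at a sustained rate."""
    return tok_per_s * SECONDS_PER_YEAR * utilisation


def api_cost_usd(tokens, usd_per_million=1.20):
    """Cost of that many tokens at an assumed flat per-million-token price."""
    return tokens / 1e6 * usd_per_million


def break_even_tokens(dedicated_cost_usd, usd_per_million=1.20):
    """Yearly token volume at which the API bill equals a dedicated GPU's cost."""
    return dedicated_cost_usd / usd_per_million * 1e6


if __name__ == "__main__":
    yearly = tokens_per_year(10)
    print(f"10 tok/s for a year: {yearly / 1e6:,.1f}M tokens, "
          f"API cost about {api_cost_usd(yearly):,.0f} USD")
    print(f"Break-even vs a 1,800 USD/yr GPU: "
          f"{break_even_tokens(1800) / 1e9:.2f}B tokens/yr")
```

Under these assumptions a single continuous 10 tok/s stream produces about 315M tokens a year, so a dedicated GPU would only pay off at much higher aggregate volume (for example batched or multi-user traffic).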

Sources: