Hardware Matcher
Run Gemma 4 Locally
Auto-detects your GPU — get the right model and command for your hardware
What this matcher helps you decide
This tool eliminates the guesswork:
- → Which Gemma 4 tier fits your hardware — E2B for phones, 26B MoE for consumer desktops, 31B Dense for workstations.
- → What framework to use — Ollama for CLI users, LM Studio for a GUI, Google AI Edge Gallery for mobile.
- → What to expect in speed and stability — realistic token-per-second estimates and known caveats for your specific configuration.
Common Gemma 4 local setups
| Hardware | Recommended Setup | Best For | Notes |
|---|---|---|---|
| MacBook Pro 16 GB | 26B-A4B MoE via Ollama (GGUF) | General chat | Keep context under 8k; avoid MLX (known bugs) |
| RTX 4060 8 GB | E4B via Ollama / LM Studio | Chat, lightweight local use | 26B MoE needs 12 GB+ VRAM |
| RTX 4070 Ti 16 GB | 26B-A4B MoE via Ollama | Chat, multimodal, coding | Sweet spot: comfortable headroom for MoE |
| RTX 4090 24 GB | 31B Dense via Ollama | Coding, deep reasoning | Use -np 1; long context (10k+) is tight even on 24 GB |
| iPhone 15 Pro | E2B via AI Edge Gallery | Offline chat, translation | E4B crashes on <10 GB RAM; stick to E2B |
| Android flagship 8 GB | E2B via AI Edge Gallery | Offline assistant | E4B needs 10 GB+ RAM; E2B is the safe choice |
How the hardware matcher works
No installs, no sign-ups. Open the page and get a personalized setup in seconds.
Auto-detect your GPU
The moment you load this page, the tool reads your GPU via browser-native WebGPU and WebGL APIs. On Mac, it identifies your exact Apple Silicon chip and unified memory size. On PC, it reads your NVIDIA / AMD model and VRAM. No data leaves your browser.
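As a rough sketch, the vendor can be inferred from the renderer string that WebGL exposes. The classification rules below are illustrative, not the tool's actual logic:

```javascript
// Illustrative sketch: classify a GPU from the unmasked WebGL renderer string.
// In a browser, the string comes from the WEBGL_debug_renderer_info extension:
//   const gl = document.createElement('canvas').getContext('webgl');
//   const ext = gl.getExtension('WEBGL_debug_renderer_info');
//   const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
function classifyRenderer(renderer) {
  const s = renderer.toLowerCase();
  if (s.includes('apple')) return { vendor: 'Apple', memory: 'unified' };
  if (s.includes('nvidia') || s.includes('geforce') || s.includes('rtx')) {
    return { vendor: 'NVIDIA', memory: 'discrete VRAM' };
  }
  if (s.includes('amd') || s.includes('radeon')) {
    return { vendor: 'AMD', memory: 'discrete VRAM' };
  }
  if (s.includes('intel')) return { vendor: 'Intel', memory: 'shared' };
  return { vendor: 'Unknown', memory: 'unknown' };
}
```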
Match to the best model
Your detected GPU, VRAM (or unified memory), and OS are cross-referenced against Gemma 4's model tiers — Edge for phones, 26B MoE for consumer hardware, 31B Dense for workstations. The matcher picks the largest model your hardware can run comfortably and flags any known caveats.
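The matching rule can be sketched in a few lines of JavaScript. The VRAM thresholds below are assumptions inferred from the hardware table earlier on this page, not the matcher's exact cutoffs:

```javascript
// Illustrative tier matcher. Thresholds follow the table above: 26B MoE
// wants 12 GB+ VRAM, 31B Dense wants a 24 GB workstation card, and E4B
// needs 10 GB+ RAM on phones.
function pickTier({ memoryGB, isMobile }) {
  if (isMobile) return memoryGB >= 10 ? 'E4B' : 'E2B'; // Edge tiers for phones
  if (memoryGB >= 24) return '31B Dense';              // workstation GPUs
  if (memoryGB >= 12) return '26B-A4B MoE';            // consumer desktops
  return 'E4B';                                        // 8 GB-class GPUs
}
```

The same function doubles as an upgrade planner: change memoryGB and see which tier unlocks.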
Copy your run command
You get a ready-to-paste terminal command for Ollama, llama.cpp, or Transformers — with the right model tag, context flags, and performance tweaks pre-filled. On mobile, the tool links directly to Google AI Edge Gallery for one-tap install.
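For the llama.cpp path, command assembly is simple string building. This is a sketch: the model filename is a placeholder, while --ctx-size and -np are real llama.cpp flags (also referenced in the table above):

```javascript
// Illustrative: assemble a llama.cpp run command from matched settings.
// MODEL_PATH below is a placeholder, not a real distribution filename.
function buildCommand({ modelPath, ctxSize, parallel }) {
  const parts = ['llama-cli', '-m', modelPath];
  if (ctxSize) parts.push('--ctx-size', String(ctxSize));
  if (parallel) parts.push('-np', String(parallel));
  return parts.join(' ');
}
```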
Everything runs in your browser. The GPU detection uses the same APIs that games and 3D apps use to render graphics. No data sent to any server, no cookies, no analytics, no user tracking. You can even use the tool offline after the page loads.
How to run Gemma 4 locally
Three steps from zero to a working local setup.
Pick a runtime
Ollama
CLI-first. One command to download and run. Best for developers comfortable with the terminal.
LM Studio
GUI app for Mac and Windows. Visual model browser, real-time VRAM monitoring, no terminal needed.
Mobile (AI Edge Gallery)
Run E-series models offline on iOS and Android. No cloud, no API key.
Download the right model
Use the matcher above to find your recommended tier, then pull the model. For Ollama:
ollama run gemma4:26b
Run and verify
Start a conversation and watch memory usage. If you hit OOM errors or slowdowns, reduce context length via --ctx-size 4096 or try a more aggressively quantized variant.
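To see why a smaller context helps, note that KV-cache memory grows linearly with context length. A back-of-the-envelope estimate follows; the layer and head counts are placeholders for illustration, not Gemma 4's actual architecture numbers:

```javascript
// Rough KV-cache estimate: 2 tensors (K and V) per layer, one headDim-sized
// vector per token per KV head. Architecture numbers are placeholders.
function kvCacheGB(ctxLen, { layers, kvHeads, headDim, bytesPerElem }) {
  return (2 * layers * kvHeads * headDim * bytesPerElem * ctxLen) / 1024 ** 3;
}
```

Halving the context halves this figure, which is why --ctx-size is the first knob to turn when you run out of memory.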
See it in action
Instant GPU detection
Open the page and your GPU and memory are identified automatically — no input needed.
Personalized recommendation
Get the right model tier, expected speed, and a copy-paste terminal command.
Full manual control
Switch to manual mode to set OS, RAM, VRAM by hand — or compare upgrade scenarios.
Auto detection vs manual selection
The matcher supports two modes. Most users never need to leave auto mode — but manual mode is there when you need full control.
Auto mode
Default. Opens instantly when the tool detects your GPU. Shows your hardware, recommended model, and a ready-to-paste run command — no input required.
Best for: first-time users, quick lookups, anyone who just wants the answer fast.
Manual mode
Click "Switch to manual selection" to set OS, RAM, and VRAM by hand. Useful when auto-detection doesn't match your actual setup.
Best for: laptops with dual GPUs (integrated + discrete), privacy browsers that block GPU fingerprinting, or comparing "what if I upgrade to 32 GB?"
When should you switch to manual?
- → Dual-GPU laptop — the browser often reports the weaker integrated GPU (Intel UHD) instead of your discrete NVIDIA/AMD card.
- → New GPU not in our database — very recent models may show as "Unknown GPU". Manual mode lets you enter specs directly.
- → Planning a purchase — switch to manual and try different RAM/VRAM combos to see which upgrade unlocks a bigger model tier.
Setup guides by platform
Detailed guides with device-specific tips, model recommendations, and step-by-step instructions.
Frequently asked questions
How does the automatic GPU detection work?
Why does the tool detect my GPU but not my RAM?
The navigator.deviceMemory API exists but is limited to Chromium browsers and caps at 8 GB — not useful for local AI workloads. GPU model names, on the other hand, are exposed through WebGPU and WebGL for rendering purposes, which is why we can detect your GPU but still need you to select your RAM manually.
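A minimal illustration of the limitation:

```javascript
// navigator.deviceMemory (Chromium only) is deliberately clamped: it never
// reports more than 8, so a 64 GB workstation looks identical to an
// 8 GB laptop.
function approxRamGB() {
  if (typeof navigator !== 'undefined' && 'deviceMemory' in navigator) {
    return navigator.deviceMemory; // 0.25 … 8, capped
  }
  return null; // Firefox, Safari, and non-browser runtimes expose nothing
}
```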