Gemma 4 Local Hardware Matcher

Run Gemma 4 Locally

Auto-detects your GPU — get the right model and command for your hardware


What this matcher helps you decide

This tool eliminates the guesswork:

  • Which Gemma 4 tier fits your hardware — E2B for phones, 26B MoE for consumer desktops, 31B Dense for workstations.
  • What framework to use — Ollama for CLI users, LM Studio for a GUI, Google AI Edge Gallery for mobile.
  • What to expect in speed and stability — realistic token-per-second estimates and known caveats for your specific configuration.

Common Gemma 4 local setups

| Hardware | Recommended Setup | Best For | Notes |
|---|---|---|---|
| MacBook Pro 16 GB | 26B-A4B MoE via Ollama (GGUF) | General chat | Keep context under 8k; avoid MLX (known bugs) |
| RTX 4060 8 GB | E4B via Ollama / LM Studio | Chat, lightweight local use | 26B MoE needs 12 GB+ VRAM |
| RTX 4070 Ti 16 GB | 26B-A4B MoE via Ollama | Chat, multimodal, coding | Sweet spot; comfortable headroom for MoE |
| RTX 4090 24 GB | 31B Dense via Ollama | Coding, deep reasoning | Use -np 1; long context (10k+) is tight even on 24 GB |
| iPhone 15 Pro | E2B via AI Edge Gallery | Offline chat, translation | E4B crashes on <10 GB RAM; stick to E2B |
| Android flagship 8 GB | E2B via AI Edge Gallery | Offline assistant | E4B needs 10 GB+ RAM; E2B is the safe choice |

How the hardware matcher works

No installs, no sign-ups. Open the page and get a personalized setup in seconds.

1. Auto-detect your GPU

The moment you load this page, the tool reads your GPU via browser-native WebGPU and WebGL APIs. On Mac, it identifies your exact Apple Silicon chip and unified memory size. On PC, it reads your NVIDIA / AMD model and VRAM. No data leaves your browser.
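
For the curious, here is a minimal sketch of that detection flow in TypeScript. It is illustrative rather than the tool's actual code: the detectGpu name is made up, and adapter.info is the current WebGPU spelling (older builds exposed a requestAdapterInfo() method instead).

```ts
// Illustrative sketch of browser-side GPU detection: try WebGPU first,
// then fall back to the unmasked WebGL renderer string.
async function detectGpu(): Promise<string> {
  const gpu = (navigator as any).gpu; // WebGPU (Chrome, Edge, recent Safari)
  if (gpu) {
    const adapter = await gpu.requestAdapter();
    const info = adapter?.info; // vendor / architecture / device strings
    if (info) {
      return [info.vendor, info.architecture, info.description]
        .filter(Boolean)
        .join(" ");
    }
  }

  // Fallback: WebGL exposes a renderer string such as
  // "Apple M1 Pro" or "NVIDIA GeForce RTX 4070 Ti".
  const gl = document.createElement("canvas").getContext("webgl");
  const ext = gl?.getExtension("WEBGL_debug_renderer_info");
  if (gl && ext) {
    return String(gl.getParameter(ext.UNMASKED_RENDERER_WEBGL));
  }
  return "Unknown GPU";
}
```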

2. Match to the best model

Your detected GPU, VRAM (or unified memory), and OS are cross-referenced against Gemma 4's model tiers — Edge for phones, 26B MoE for consumer hardware, 31B Dense for workstations. The matcher picks the largest model your hardware can run comfortably and flags any known caveats.
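
As a rough sketch of that matching step, the thresholds below are inferred from the setups table above and may not match the matcher's exact cutoffs:

```ts
type Tier = "E2B" | "E4B" | "26B-A4B MoE" | "31B Dense";

// Rough tier matching by available memory; thresholds are inferred from
// the setups table above, not taken from the matcher's actual rules.
function matchTier(memoryGB: number, isMobile: boolean): Tier {
  if (isMobile) {
    // E4B reportedly needs 10 GB+ RAM; E2B is the safe default below that.
    return memoryGB >= 10 ? "E4B" : "E2B";
  }
  if (memoryGB >= 24) return "31B Dense";   // workstation-class VRAM
  if (memoryGB >= 12) return "26B-A4B MoE"; // consumer sweet spot
  return "E4B";                             // 8 GB cards: lightweight local use
}
```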

3. Copy your run command

You get a ready-to-paste terminal command for Ollama, llama.cpp, or Transformers — with the right model tag, context flags, and performance tweaks pre-filled. On mobile, the tool links directly to Google AI Edge Gallery for one-tap install.
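
Building on the Tier type from the previous sketch, a hypothetical command builder might look like the following. Only the gemma4:26b tag appears elsewhere on this page; the gemma4:31b tag and the runCommand name are assumptions.

```ts
// Hypothetical mapping from tier to a run command. Only gemma4:26b is
// shown elsewhere on this page; gemma4:31b is an assumed tag.
function runCommand(tier: Tier): string {
  switch (tier) {
    case "31B Dense":
      return "ollama run gemma4:31b";
    case "26B-A4B MoE":
      return "ollama run gemma4:26b";
    default:
      // E-series models install through Google AI Edge Gallery instead.
      return "Install via Google AI Edge Gallery";
  }
}
```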

💡 Everything runs in your browser. The GPU detection uses the same APIs that games and 3D apps use to render graphics. No data sent to any server, no cookies, no analytics, no user tracking. You can even use the tool offline after the page loads.

How to run Gemma 4 locally

Three steps from zero to a working local setup.

1. Pick a runtime

Ollama

CLI-first. One command to download and run. Best for developers comfortable with the terminal.

LM Studio

GUI app for Mac and Windows. Visual model browser, real-time VRAM monitoring, no terminal needed.

Mobile (AI Edge Gallery)

Run E-series models offline on iOS and Android. No cloud, no API key.

2. Download the right model

Use the matcher above to find your recommended tier, then pull the model. For Ollama:

ollama run gemma4:26b

3. Run and verify

Start a conversation and watch memory usage. If you hit OOM errors or slowdowns, reduce the context length (for example, --ctx-size 4096 in llama.cpp) or try a more aggressively quantized variant.

See it in action

[Screenshot: Hardware matcher auto-detection showing Apple M1 Pro with 16 GB unified memory and ~100 GB/s bandwidth]

Instant GPU detection

Open the page and your GPU and memory are identified automatically; no input needed.

[Screenshot: Matched result showing 26B MoE recommendation with Ollama run command and 17-22 tok/s speed estimate]

Personalized recommendation

Get the right model tier, expected speed, and a copy-paste terminal command.

[Screenshot: Animated demo of switching from auto-detection to manual mode, selecting OS, RAM, and VRAM]

Full manual control

Switch to manual mode to set OS, RAM, and VRAM by hand, or compare upgrade scenarios.

Auto-detection vs. manual selection

The matcher supports two modes. Most users never need to leave auto mode — but manual mode is there when you need full control.

Auto mode (default)

Opens instantly when the tool detects your GPU. Shows your hardware, recommended model, and a ready-to-paste run command — no input required.

Best for: first-time users, quick lookups, anyone who just wants the answer fast.

🎛 Manual mode

Click "Switch to manual selection" to set OS, RAM, and VRAM by hand. Useful when auto-detection doesn't match your actual setup.

Best for: laptops with dual GPUs (integrated + discrete), privacy browsers that block GPU fingerprinting, or comparing "what if I upgrade to 32 GB?"

When should you switch to manual?

  • Dual-GPU laptop — the browser often reports the weaker integrated GPU (Intel UHD) instead of your discrete NVIDIA/AMD card.
  • New GPU not in our database — very recent models may show as "Unknown GPU". Manual mode lets you enter specs directly.
  • Planning a purchase — switch to manual and try different RAM/VRAM combos to see which upgrade unlocks a bigger model tier (see the sketch below).
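
Using the illustrative matchTier() sketch from earlier, an upgrade comparison is just a few calls; the memory values here are examples, not recommendations:

```ts
// Comparing upgrade scenarios with the illustrative matchTier() above.
console.log(matchTier(8, false));  // "E4B"          (today's 8 GB card)
console.log(matchTier(16, false)); // "26B-A4B MoE"  (after a 16 GB upgrade)
console.log(matchTier(24, false)); // "31B Dense"    (after a 24 GB upgrade)
```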

Frequently asked questions

How does the automatic GPU detection work?
This tool uses browser-native APIs to identify your GPU — no downloads or extensions required. It first tries the WebGPU API (supported in Chrome, Edge, and recent Safari), which reports your GPU model and architecture directly. If WebGPU is unavailable, it falls back to WebGL renderer detection. For Apple Silicon, it identifies your exact chip (M1, M2 Pro, M4 Max, etc.) and unified memory size. Everything runs entirely in your browser — no data is sent to any server.
Why does the tool detect my GPU but not my RAM?
Browser security restrictions prevent websites from reading your exact system RAM. The navigator.deviceMemory API exists but is limited to Chromium browsers and caps at 8 GB — not useful for local AI workloads. GPU model names, on the other hand, are exposed through WebGPU and WebGL for rendering purposes, which is why we can detect your GPU but still need you to select your RAM manually.
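
For reference, this is roughly what that API exposes in a Chromium browser; the clamping is why it is not useful for sizing local models:

```ts
// navigator.deviceMemory is Chromium-only and clamped to at most 8,
// so a 32 GB or 64 GB machine still reports 8.
const approxRamGB: number | undefined = (navigator as any).deviceMemory;
console.log(approxRamGB ?? "not supported"); // e.g. 8 on most desktops

```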
The detection shows the wrong GPU or VRAM — what should I do?
Click "Switch to manual selection" below the detected hardware card to enter manual mode, where you can set your OS, RAM, and VRAM by hand. Common reasons for inaccurate detection include: laptops with dual GPUs (the browser may report the integrated GPU instead of the discrete one), privacy-focused browsers that block GPU fingerprinting, or newer GPU models not yet in our lookup database.