← All models
Gemma 4 4B
Phone / tinyGoogle · 4B parameters · released 2026-03 · Gemma · runs on iOS & Android
✓ Free to run locally☁ Free on OpenRouter
Text + image input, 128K context, 140+ languages. Clean writing.
Best used for
WritingVision / image inputMultilingual
TextVision in
Memory needed (GB)
More compression (Q4) and a smaller context window both lower the RAM this model needs. A bigger context window is not free — watch the numbers climb to the right.
| Quant | 4K ctx | 8K ctx | 32K ctx | 128K ctx |
|---|---|---|---|---|
| Q4 | 2.3 | 2.4 | 2.9 | 4 |
| Q8 | 4.6 | 4.8 | 5.8 | 8 |
| FP16 | 9.2 | 9.6 | 11.6 | 16 |
Ways to run it
✓
On your own machine — free & private
LM Studio (search)
gemma-4-4bOllama (local)
ollama run gemma4:4b☁
Or as a hosted API — optional, for when you're away
Reachable on OpenRouter with a free tier — no per-token bill on the free model id. Same one key also works with Ollama's cloud option and most chat apps.
OpenRouter model id
google/gemma-4-4b:freeNew to this? Local vs. cloud, and what's actually free →
Source: https://huggingface.co/google · verified 2026-06-15