← All models
Llama 4 3B
Phone / tinyMeta · 3B parameters · released 2026-02 · Llama-4-Community · runs on iOS & Android
✓ Free to run locally☁ Free + paid API
Broad ecosystem support. Safe default for small hardware.
Best used for
GeneralCoding
Text
Memory needed (GB)
More compression (Q4) and a smaller context window both lower the RAM this model needs. A bigger context window is not free — watch the numbers climb to the right.
| Quant | 4K ctx | 8K ctx | 32K ctx | 128K ctx |
|---|---|---|---|---|
| Q4 | 1.7 | 1.8 | 2.2 | 3 |
| Q8 | 3.5 | 3.6 | 4.4 | 6 |
| FP16 | 6.9 | 7.2 | 8.7 | 12 |
Ways to run it
✓
On your own machine — free & private
LM Studio (search)
llama-4-3bOllama (local)
ollama run llama4:3b☁
Or as a hosted API — optional, for when you're away
Reachable on OpenRouter with a free tier — no per-token bill on the free model id. Same one key also works with Ollama's cloud option and most chat apps.
OpenRouter model id
meta-llama/llama-4-3b:freeNew to this? Local vs. cloud, and what's actually free →
Source: https://huggingface.co/meta-llama · verified 2026-06-15