← All models
Qwen3.5 4B
Phone / tinyAlibaba · 4B parameters · released 2026-04 · Apache-2.0 · runs on iOS & Android
✓ Free to run locally☁ Free + paid API
Best small multilingual model. Vision-capable. The mobile sweet spot.
Best used for
MultilingualGeneralWriting
TextVision in
Memory needed (GB)
More compression (Q4) and a smaller context window both lower the RAM this model needs. A bigger context window is not free — watch the numbers climb to the right.
| Quant | 4K ctx | 8K ctx | 32K ctx | 128K ctx |
|---|---|---|---|---|
| Q4 | 2.3 | 2.4 | 2.9 | 4 |
| Q8 | 4.6 | 4.8 | 5.8 | 8 |
| FP16 | 9.2 | 9.6 | 11.6 | 16 |
Ways to run it
✓
On your own machine — free & private
LM Studio (search)
qwen3.5-4bOllama (local)
ollama run qwen3.5:4b☁
Or as a hosted API — optional, for when you're away
Reachable on OpenRouter with a free tier — no per-token bill on the free model id. Same one key also works with Ollama's cloud option and most chat apps.
OpenRouter model id
qwen/qwen3.5-4b:freeNew to this? Local vs. cloud, and what's actually free →
Source: https://huggingface.co/Qwen · verified 2026-06-15