Qwen3.5 4B

Phone / tiny

Alibaba · 4B parameters · released 2026-04 · Apache-2.0 · runs on iOS & Android

✓ Free to run locally☁ Free + paid API

Best small multilingual model. Vision-capable. The mobile sweet spot.

Best used for

MultilingualGeneralWriting

TextVision in

Memory needed (GB)

More compression (Q4) and a smaller context window both lower the RAM this model needs. A bigger context window is not free — watch the numbers climb to the right.

Quant	4K ctx	8K ctx	32K ctx	128K ctx
Q4	2.3	2.4	2.9	4
Q8	4.6	4.8	5.8	8
FP16	9.2	9.6	11.6	16

Ways to run it

✓

On your own machine — free & private

LM Studio (search)

qwen3.5-4b

Ollama (local)

ollama run qwen3.5:4b

☁

Or as a hosted API — optional, for when you're away

Reachable on OpenRouter with a free tier — no per-token bill on the free model id. Same one key also works with Ollama's cloud option and most chat apps.

OpenRouter model id

qwen/qwen3.5-4b:free

New to this? Local vs. cloud, and what's actually free →

Source: https://huggingface.co/Qwen · verified 2026-06-15