Local vs. cloud, and what's actually free
Every open model on this site can run two ways: on your own machine (free, private, offline) or as a hosted API you reach over the internet. This page explains the difference in plain language — and exactly where money does and doesn't change hands.
The whole point of this app is the local path. But the same open models are also available as a cloud API, and that confuses people about cost. So here is the one thing to remember:
Running it locally is free, forever
Open-weight models are free to download and run on hardware you already own. No account, no API key, no per-message bill. Once it's on your disk it works offline and nobody can take it away or change the price.
The cloud is optional — and often also free
The same model hosted on someone else's server, reachable as an API for when you're away from your machine or on a phone. Many have a free tier; bigger usage is cheap pay-per-token. You never have to use it.
These are two separate facts about a model. That's why each model on this site shows two badges: a green ✓ Free to run locally chip (always true for open weights) and a separate cloud chip telling you whether the hosted API is free, paid, or both.
Way 1 — On your own machine (the free one)
This is what the rest of this site is about. You download the model file once and run it with a local app. Nothing leaves your computer.
- LM Studio — a friendly desktop app with a search box. Find the model id shown on its page, click download, chat. Best starting point for most people.
- Ollama (local) — a one-line command tool:
ollama run llama4downloads and starts the model. Great for wiring into other tools.
Cost: $0 per use, forever. The only “price” is having enough RAM — which is exactly what the model list and hardware guide help you check.
Way 2 — As a hosted API (Ollama Cloud & OpenRouter)
Sometimes you're on a phone, a weak laptop, or away from your main machine and still want the model. That's what the cloud is for. The model is the same; it just runs on someone else's hardware and you reach it over the internet with an API key.
Ollama Cloud
The makers of Ollama also offer Ollama Cloud — the exact same ollama run experience, but the model runs on their servers instead of your machine. You use it when a model is too big for your hardware or you're on a device that can't host it. It has a free allowance, then paid usage above that. Your local Ollama and Ollama Cloud share one tool, so you can flip between “runs here” and “runs in the cloud” without changing how you work.
OpenRouter
OpenRouter is a single gateway to hundreds of models — open and proprietary — behind one API key. It matters here for two reasons:
- You don't need to pay for an API. Many open models are listed with a free model id (they end in
:free). You can call them with no per-token charge — handy for trying a model before you download it, or for light use on the go. - When you do pay, it's cheap. Heavier models, or higher rate limits, are pay-per-token at a few cents per million tokens — far below a typical chatbot subscription, and you only pay for what you send.
On each model's page you'll see its OpenRouter model id when one exists, so you can copy it straight into any app that speaks the OpenAI/OpenRouter API.
So which should I use?
| If you want… | Pick |
|---|---|
| Total privacy, offline, zero cost | Local — LM Studio or Ollama |
| To try a model before downloading it | OpenRouter free model id |
| The model on your phone or a weak laptop | Ollama Cloud or OpenRouter |
| A model too big for your RAM, occasionally | Cloud, pay-per-token (cheap) |
The bottom line
You never have to pay for an API to use these models — running them locally is free and always will be. The cloud is a convenience for when local isn't practical, and even then a free tier usually covers light use. Pay only when you choose to, only for what you use. Browse the models →