This open-source model runs entirely on your phone — and it’s shockingly capable

A 3-billion-parameter open model called Wren runs offline on a mid-range phone, and after a week of using it I’m not sure I needed the cloud at all.

MIRA KAUR

JUN 24, 2026 · 4 MIN READ

Close-up of a person holding and using an illuminated smartphone in a dark setting. — PHOTO: OVERWORLD

For the past five years, the deal with AI assistants has been simple and slightly uncomfortable: you type, your words leave your phone, a data center somewhere does the thinking, and the answer comes back. It works. It also means your half-formed questions, your medical worries, and your draft breakup texts all take a round trip through someone else’s servers.

A small open-source project called Wren is quietly arguing that the round trip was never necessary. Wren is a 3-billion-parameter model released under a permissive license, quantized down to about 1.8 GB, and built specifically to run on the hardware most people already carry. I loaded it onto a two-year-old mid-range Android phone — nothing exotic, 8 GB of RAM, a chip that was never meant to impress anyone — and put it through a week of ordinary use with the radios off.

It held up far better than it had any right to.
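For a sense of what "quantized down to about 1.8 GB" implies, here is a back-of-envelope sketch. It assumes exactly 3 billion parameters and 1 GB = 10^9 bytes. The article does not say which quantization scheme Wren uses, so the result is only an average across the whole file.

```python
def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 10**9 bytes)."""
    return file_size_gb * 1e9 * 8 / n_params


def size_gb(n_params: float, bits: float) -> float:
    """File size in GB if every parameter took `bits` bits."""
    return n_params * bits / 8 / 1e9


N_PARAMS = 3e9  # assumption: "3-billion-parameter" taken as exactly 3e9

avg_bits = bits_per_weight(1.8, N_PARAMS)  # average bits per weight in the 1.8 GB file
fp16_gb = size_gb(N_PARAMS, 16)            # same model stored at 16 bits per weight

print(f"average bits per weight: {avg_bits:.1f}")
print(f"size at 16-bit precision: {fp16_gb:.1f} GB")
```

Under those assumptions the file averages a little under 5 bits per weight, against 6 GB for the same model stored at 16 bits per weight. That difference is most of why it fits comfortably on a phone with 8 GB of RAM.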

What it actually does well

The headline number is responsiveness. On my test phone, Wren produced its first token in well under a second and ran at a steady 14 to 16 tokens per second — fast enough that reading speed, not the model, is the bottleneck. There’s no spinner, no “thinking,” no dependency on whether the coffee shop Wi-Fi is having a moment. You tap, it answers.
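To put 14 to 16 tokens per second in reading terms, here is a rough conversion. Both constants are assumptions, not figures from the test: about 0.75 English words per token, a common rule of thumb that varies by tokenizer, and a silent reading pace of about 250 words per minute.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text
READING_WPM = 250       # assumption: typical silent reading pace


def words_per_minute(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert generation speed in tokens/second to words/minute."""
    return tokens_per_second * words_per_token * 60


low = words_per_minute(14)   # slow end of the measured range
high = words_per_minute(16)  # fast end of the measured range

print(f"generation: {low:.0f} to {high:.0f} words per minute")
print(f"ratio to assumed reading pace: {low / READING_WPM:.1f}x to {high / READING_WPM:.1f}x")
```

Under those assumptions the model writes at roughly 630 to 720 words per minute, more than twice the assumed reading pace, which fits the claim that the reader is the bottleneck.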

The quality is where I expected the wheels to come off, and mostly they didn’t. Wren summarizes a long email thread cleanly. It rewrites a clumsy paragraph into something I’d actually send. It answers the kind of factual questions you’d otherwise turn to a search engine for (unit conversions, the gist of a historical event, how to debug a misbehaving shell command), and it does so without the confident nonsense I’ve learned to brace for from small models.

It is, to be clear, not a frontier model. Ask it to reason through a genuinely hard multi-step problem and it stumbles. Push it past roughly 8,000 tokens of context and it starts forgetting the top of the conversation. Its knowledge of anything after early 2025 is patchy, because the training cut-off is the training cut-off and there’s no live web to lean on. But for the unglamorous 80 percent of what people actually ask an assistant to do, the gap between Wren and the giant cloud models is much narrower than the parameter counts suggest.

The most striking thing about Wren isn’t how smart it is. It’s how little you miss the cloud once the answers stop leaving your phone.

Why on-device suddenly matters

The case for running a model locally used to be theoretical — a privacy talking point that fell apart the moment you needed real capability. Wren is the first time I’ve felt the trade flip the other way. Three things make it land.

The first is privacy that you don’t have to take on faith. With the network off, nothing can leave. That’s not a policy or a promise buried in a settings menu; it’s physics. For anyone who’s hesitated before typing something genuinely personal into a chat box, that’s a different category of comfort.

The second is cost. Cloud inference is cheap per query and ruinous at scale, which is why every assistant eventually nudges you toward a subscription. A model that runs on hardware you already own has no per-query meter at all. Developers building on top of Wren can ship features that would be financially impossible if every tap pinged a paid API.

The third is resilience. On a plane, in a basement, on a hike with one bar of signal, Wren simply works. After a week I caught myself reaching for it in exactly those dead-zone moments — the times the cloud assistants are useless — and that’s when the appeal stopped being abstract.

The catch, and what comes next

None of this is frictionless yet. Getting Wren onto a phone still means sideloading a runtime app and a model file, which is fine for the tinkering crowd and a non-starter for everyone else. Battery draw under sustained use is real; a long session warmed my test phone and visibly nudged the battery meter. And because the model is frozen at download, it can’t tell you today’s news or look anything up.

But the trajectory is obvious. Phone makers are already shipping dedicated neural hardware that sits idle most of the day. Quantization keeps getting better. The moment a model this capable comes preinstalled and invisible — no sideloading, no model files, just an assistant that happens to never go online — the cloud-by-default era of mobile AI starts looking like a phase we passed through. Wren isn’t quite that moment. It’s the proof that the moment is close.

WRITTEN BY

Mira Kaur

Mira Kaur writes for Overworld on gadgets, software, and the tech we carry every day.
