I was using ChatGPT and Claude subscriptions, and neither felt like mine — context reset between sessions, data left the machine, and I couldn't tune the behavior for my actual workflow. So I built something that didn't have those problems.
The interesting part wasn't the model side — Ollama makes that easy. It was the infrastructure: threading a Tailscale mesh between two machines, writing auth middleware that enforces per-user session isolation, adding a Cloudflare tunnel so it's reliably reachable from anywhere. The kind of plumbing that doesn't show up in tutorials but is the whole reason it works.
If the server is asleep, your chat will be waiting when it wakes.