Onyx AI — private, local, yours.

open-source · self-hosted · ollama + local llms · multi-provider api · node.js · tailscale mesh · cloudflare edge · persistent memory · knowledge base · web search · deep research · yours to fork

Why I built this.

I was using ChatGPT and Claude subscriptions, and neither felt like mine — context reset between sessions, data left the machine, and I couldn't tune the behavior for my actual workflow. So I built something that didn't have those problems.

The interesting part wasn't the model side — Ollama makes that easy. It was the infrastructure: threading a Tailscale mesh between two machines, writing auth middleware that enforces per-user session isolation, adding a Cloudflare tunnel so it's reliably reachable from anywhere. The kind of plumbing that doesn't show up in tutorials but is the whole reason it works.
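
For a feel of what "per-user session isolation" can mean in Node.js, here is a minimal sketch, not Onyx's actual middleware. It assumes an in-memory store and a Bearer token in the Authorization header; `SessionStore`, `requireSession`, and `req.userId` are hypothetical names chosen for illustration.

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory session store: token -> { userId, expiresAt }.
// A real deployment would likely persist this (the stack above uses SQLite).
class SessionStore {
  constructor(ttlMs = 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.sessions = new Map();
  }

  // Issue an unguessable token bound to exactly one user.
  create(userId, now = Date.now()) {
    const token = crypto.randomBytes(32).toString("hex");
    this.sessions.set(token, { userId, expiresAt: now + this.ttlMs });
    return token;
  }

  // Return the owning userId, or null if the token is unknown or expired.
  resolve(token, now = Date.now()) {
    const session = this.sessions.get(token);
    if (!session) return null;
    if (session.expiresAt <= now) {
      this.sessions.delete(token);
      return null;
    }
    return session.userId;
  }
}

// Express-style middleware signature (req, res, next); no framework import needed.
function requireSession(store) {
  return (req, res, next) => {
    const header = (req.headers && req.headers.authorization) || "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : "";
    const userId = store.resolve(token);
    if (userId === null) {
      res.statusCode = 401;
      res.end("unauthorized");
      return;
    }
    // Handlers read req.userId and never trust a user id sent by the client.
    req.userId = userId;
    next();
  };
}
```

The design point is that every downstream query is keyed on the `userId` the server resolved from the token, so one user's request cannot name another user's history.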

Inference: Ollama (local GPU) with multi-provider API fallback (Anthropic, OpenAI, Gemini)
Network: Tailscale mesh across two servers, plus a Cloudflare tunnel for public access
Memory: SQLite for persistent per-user chat history, knowledge base, and searchable context
Auth: Two layers, with Cloudflare Access at the edge and custom session tokens in the backend
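
The inference row can be sketched as a simple ordered fallback: try the local model first, then hosted providers. This is an illustration under assumptions, not the project's code. The `{ name, generate }` provider shape and `generateWithFallback` are hypothetical; the Ollama call uses its documented `/api/generate` endpoint on the default port, and the model name is a placeholder.

```javascript
// Hypothetical provider shape: { name, generate(prompt) -> Promise<string> }.

// Local Ollama provider. Assumes Ollama is listening on its default port
// and that the placeholder model has been pulled. Requires Node 18+ for fetch.
function ollamaProvider(model = "llama3", baseUrl = "http://localhost:11434") {
  return {
    name: "ollama",
    async generate(prompt) {
      const res = await fetch(`${baseUrl}/api/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, prompt, stream: false }),
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const data = await res.json();
      return data.response;
    },
  };
}

// Try each provider in order and return the first successful reply.
// Errors are collected so a total failure explains what was attempted.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      const text = await provider.generate(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Hosted providers would plug in as further objects with the same shape, placed after `ollamaProvider()` in the array, so the local GPU is always tried first.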

Take Onyx for a spin.

If the server is asleep, your chat will be waiting when it wakes.