Blueprint

Run open LLMs on your own hardware.

Plan the model, size the hardware, deploy and monitor it — from a free desktop app. Laptop, on-prem server, or your own cloud account.

Start with Step 1 · Plan

Download Blueprint

Or request a guided demo

Free, no account, no telemetry. Built by Inspire AI Lab — a small engineering firm. The consulting practice funds the tool.

What Blueprint covers

The full lifecycle of a private LLM, in one tool.

Plan and Price run in your browser — start there and the work survives if you close the tab. Deploy, Optimize, Monitor, and Maintain live in the desktop app, because they need to actually see your hardware.

Plan (web)
A curated catalog of open models, fit-scored against your workload and constraints.

Price (web)
VRAM math and on-prem vs cloud cost in your browser. No signup, no per-token bill.

Deploy (app)
Install the runtime, pull the model, start serving, on this machine or a server you SSH to.

Optimize (app)
Tune quantization, context length, and GPU layer count without redeploying, and see how throughput changes.

Monitor (app)
Live GPU, VRAM, CPU, and tokens-per-second. Catch problems before they page someone.

Maintain (app)
Swap models, update llama.cpp, restart cleanly. The boring lifecycle work, made boring.
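
For a feel of what "VRAM math" means in the Price step, here is a back-of-the-envelope sketch. It is not Blueprint's actual calculation: the formula (quantized weights plus KV cache plus a fixed overhead), the function name `estimate_vram_gib`, the 1 GiB overhead, and the example model shape are all assumptions chosen for illustration.

```python
def estimate_vram_gib(
    n_params_billion: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    kv_bytes_per_value: int = 2,  # assumes a 16-bit KV cache
    overhead_gib: float = 1.0,    # assumed allowance for runtime buffers
) -> float:
    """Rough VRAM estimate in GiB: weights + KV cache + fixed overhead."""
    # Quantized weights: parameter count times bits per weight, in bytes.
    weights_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    # KV cache: one K and one V tensor per layer, each of shape
    # context_tokens x n_kv_heads x head_dim.
    kv_bytes = (
        2 * n_layers * n_kv_heads * head_dim
        * context_tokens * kv_bytes_per_value
    )
    return (weights_bytes + kv_bytes) / 2**30 + overhead_gib


# Hypothetical 8B-parameter model at ~4.5 bits per weight, 8K context.
estimate = estimate_vram_gib(
    n_params_billion=8,
    bits_per_weight=4.5,
    n_layers=32,
    n_kv_heads=8,
    head_dim=128,
    context_tokens=8192,
)
print(f"Estimated VRAM: {estimate:.2f} GiB")
```

With these assumed numbers the estimate comes to roughly 6.2 GiB: about 4.2 GiB of weights, 1 GiB of KV cache, and 1 GiB of overhead. Real usage varies with the runtime, quantization format, and batch size, so treat the result as a starting point rather than a guarantee.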

Why this exists

Built by an engineering firm, given away for free.

We're Inspire AI Lab — a small firm that helps organizations stand up private AI infrastructure. Most engagements look the same: choose a model, size the rack, install the runtime, harden it, monitor it. We were doing the same handful of steps by hand on every project, so we turned them into a tool.

Blueprint is what we use ourselves, polished enough to put in your hands. It's free because the consulting practice pays for it — and because we think the right way to evaluate this kind of work is to try it, not to read a brochure about it.

If you'd rather hand it to us (model selection, deployment, monitoring hand-off), that's the consulting offer.

Book a 30-min review →

Get going

Three ways to start, depending on where you are.

Plan a model

Step 1 of Blueprint — pick what fits your workload and see the hardware it needs. ~5 minutes.

Open Step 1 · Plan

Download the app

Skip the planning and grab Blueprint for your OS. Run a model in under 10 minutes.

Download Blueprint

Have us deploy it

A 30-minute review covers your workload, hardware, and security posture. Then we build it.

Book a review