Blueprint
Run open LLMs on your own hardware.
Plan the model, size the hardware, deploy and monitor it — from a free desktop app. Laptop, on-prem server, or your own cloud account.
Free, no account, no telemetry. Built by Inspire AI Lab — a small engineering firm. The consulting practice funds the tool.
What Blueprint covers
The full lifecycle of a private LLM, in one tool.
Plan and Price run in your browser — start there and the work survives if you close the tab. Deploy, Optimize, Monitor, and Maintain live in the desktop app, because they need to actually see your hardware.
Plan (web)
A curated catalog of open models, fit-scored against your workload and constraints.
Price (web)
VRAM math and on-prem vs cloud cost in your browser. No signup, no per-token bill.
Deploy (app)
Install the runtime, pull the model, start serving, on this machine or a server you SSH to.
Optimize (app)
Tune the quant, context, and GPU layer counts without redeploying, and see the throughput change.
Monitor (app)
Live GPU, VRAM, CPU, and tokens-per-second. Catch problems before they page someone.
Maintain (app)
Swap models, update llama.cpp, restart cleanly. The boring lifecycle work, made boring.
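For a sense of the VRAM math mentioned under Price, here is a rough back-of-the-envelope sketch. It is not Blueprint's actual formula: the function names, the fixed 1 GiB overhead, and the example model shape are all assumptions for illustration. It treats VRAM as quantized weights plus an fp16 KV cache plus a flat runtime overhead.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights at a given quantization."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / GIB


def estimate_vram_gib(params_billion: float, bits_per_weight: float,
                      n_layers: int, n_kv_heads: int, head_dim: int,
                      context_len: int, overhead_gib: float = 1.0) -> float:
    """Weights + KV cache + a flat overhead (assumed, not measured)."""
    return (weights_gib(params_billion, bits_per_weight)
            + kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len)
            + overhead_gib)


# Hypothetical 8B-parameter model, ~4.5 bits per weight, 8k context.
total = estimate_vram_gib(8, 4.5, n_layers=32, n_kv_heads=8,
                          head_dim=128, context_len=8192)
print(f"Estimated VRAM: {total:.2f} GiB")
```

Real usage varies with the runtime, batch size, and quantization format, so treat a figure like this as a starting point for sizing and confirm it against measured VRAM.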
Why this exists
Built by an engineering firm, given away for free.
We're Inspire AI Lab — a small firm that helps organizations stand up private AI infrastructure. Most engagements look the same: choose a model, size the rack, install the runtime, harden it, monitor it. We were doing the same handful of steps by hand on every project, so we turned them into a tool.
Blueprint is what we use ourselves, polished enough to put in your hands. It's free because the consulting practice pays for it — and because we think the right way to evaluate this kind of work is to try it, not to read a brochure about it.
If you'd rather hand it to us — model selection, deployment, monitoring hand-off — that's the consulting offer. Book a 30-min review →
Get going
Three ways to start, depending on where you are.
01
Plan a model
Step 1 of Blueprint — pick what fits your workload and see the hardware it needs. ~5 minutes.
Open Step 1 · Plan
02
Download the app
Skip the planning and grab Blueprint for your OS. Run a model in under 10 minutes.
Download Blueprint
03
Have us deploy it
A 30-minute review covers your workload, hardware, and security posture. Then we build it.
Book a review