Inspire AI Lab

Blueprint

Run open LLMs on your own hardware.

Plan the model, size the hardware, deploy and monitor it — from a free desktop app. Laptop, on-prem server, or your own cloud account.

Free, no account, no telemetry. Built by Inspire AI Lab — a small engineering firm. The consulting practice funds the tool.

What Blueprint covers

The full lifecycle of a private LLM, in one tool.

Plan and Price run in your browser — start there and the work survives if you close the tab. Deploy, Optimize, Monitor, and Maintain live in the desktop app, because they need to actually see your hardware.

  • Plan

    web

    A curated catalog of open models, fit-scored against your workload and constraints.

  • Price

    web

    VRAM math and on-prem vs cloud cost in your browser. No signup, no per-token bill.

  • Deploy

    app

    Install the runtime, pull the model, start serving — on this machine or a server you SSH to.

  • Optimize

    app

    Tune quantization, context length, and GPU layer count without redeploying, and see how throughput changes.

  • Monitor

    app

    Live GPU, VRAM, CPU, and tokens-per-second. Catch problems before they page someone.

  • Maintain

    app

    Swap models, update llama.cpp, restart cleanly. The boring lifecycle work, made boring.
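
To give a feel for the kind of VRAM math the Price step refers to, here's a back-of-the-envelope sketch. It is not Blueprint's own formula, which this page doesn't spell out. It assumes a common rule of thumb: memory for the quantized weights, plus a key/value cache that grows with context length, plus a flat overhead margin. The function name, parameters, and the 10% overhead are all illustrative assumptions.

```python
def estimate_vram_gib(
    params_billion: float,      # model size, in billions of parameters
    bits_per_weight: float,     # e.g. 16 for fp16, roughly 4.5 for a 4-bit quant
    n_layers: int,              # transformer layers
    n_kv_heads: int,            # key/value heads (fewer than query heads under GQA)
    head_dim: int,              # dimension of each attention head
    context_tokens: int,        # context length you plan to serve
    kv_bytes: int = 2,          # bytes per KV cache element (2 = fp16); an assumption
    overhead_fraction: float = 0.10,  # assumed margin for buffers and activations
) -> float:
    """Rough VRAM estimate in GiB. A rule-of-thumb sketch, not an exact figure."""
    # Weights: one value per parameter at the chosen precision.
    weights_bytes = params_billion * 1e9 * bits_per_weight / 8

    # KV cache: a K and a V tensor per layer, each sized
    # context_tokens x n_kv_heads x head_dim.
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes

    total_bytes = (weights_bytes + kv_cache_bytes) * (1 + overhead_fraction)
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 8B-parameter model with a ~4.5-bit quant and an 8K context.
    gib = estimate_vram_gib(
        params_billion=8,
        bits_per_weight=4.5,
        n_layers=32,
        n_kv_heads=8,
        head_dim=128,
        context_tokens=8192,
    )
    print(f"Estimated VRAM: {gib:.1f} GiB")
```

Real usage depends on the runtime, batch size, and how the quant format packs its weights, so treat a number like this as a starting point for sizing rather than a guarantee.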

Why this exists

Built by an engineering firm, given away for free.

We're Inspire AI Lab — a small firm that helps organizations stand up private AI infrastructure. Most engagements look the same: choose a model, size the rack, install the runtime, harden it, monitor it. We were doing the same handful of steps by hand on every project, so we turned them into a tool.

Blueprint is what we use ourselves, polished enough to put in your hands. It's free because the consulting practice pays for it — and because we think the right way to evaluate this kind of work is to try it, not to read a brochure about it.

If you'd rather hand it to us — model selection, deployment, monitoring hand-off — that's the consulting offer. Book a 30-min review →

Get going

Three ways to start, depending on where you are.

01

Plan a model

Step 1 of Blueprint — pick what fits your workload and see the hardware it needs. ~5 minutes.

Open Step 1 · Plan

02

Download the app

Skip the planning and grab Blueprint for your OS. Run a model in under 10 minutes.

Download Blueprint

03

Have us deploy it

A 30-minute review covers your workload, hardware, and security posture. Then we build it.

Book a review