How I’d build it
The engine on the previous page is the strategy. This is the part where I show I can stand it up, not just diagram it. I’d expect to be hands-on building this, so here is the architecture I’d propose, the order I’d ship it in, and the parts I’d own versus the parts that need your interpretability depth.
One data model, two modes
The whole “zero rebuild” claim only holds if the demo and the live product are literally the same application reading the same schema, with a flag that says mode = demo or mode = live. A demo account is a live account that hasn’t been switched on. Migration becomes a boolean, not a project.
So the first decision is the schema, not the UI. Everything downstream (the report, the demo, the monitoring dashboard) is a different view over the same tables: perception scores, persona × query results, competitor benchmarks, the source / citation map, and the content guidebook. Get that right once and the funnel stops generating throwaway work.
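A minimal sketch of the "one schema, two modes" idea, assuming SQLite stands in for whatever database the product actually runs on. The table names, column names, and the `go_live` helper are placeholders I made up for illustration, and only two of the tables are shown; the others would hang off `account` the same way.

```python
import sqlite3

# Hypothetical schema: a single `mode` column is the only difference
# between a demo account and a live one.
SCHEMA = """
CREATE TABLE account (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    mode TEXT NOT NULL DEFAULT 'demo' CHECK (mode IN ('demo', 'live'))
);
CREATE TABLE query_result (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account(id),
    persona    TEXT NOT NULL,
    question   TEXT NOT NULL,
    provider   TEXT NOT NULL,
    answer     TEXT NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def go_live(conn: sqlite3.Connection, account_id: int) -> None:
    """Migration as a boolean: flip the flag, touch nothing else."""
    conn.execute("UPDATE account SET mode = 'live' WHERE id = ?", (account_id,))
    conn.commit()


def report_rows(conn: sqlite3.Connection, account_id: int) -> list[tuple]:
    """One possible 'view': the report reads the same rows the demo UI would."""
    return conn.execute(
        "SELECT persona, question, provider, answer FROM query_result "
        "WHERE account_id = ? ORDER BY id",
        (account_id,),
    ).fetchall()
```

The point of the sketch is that `go_live` never copies or reshapes data, so anything rendered for the demo is already what the live dashboard reads.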
Five build blocks
1. The probing engine
async Python · job queue · provider abstraction
The core. It fires thousands of persona-conditioned questions across the major models and providers, captures the answers, and normalizes them into the schema. The hard parts are concurrency, rate-limit handling, and a clean provider abstraction so adding a new model is a config change, not a rewrite. Persona conditioning lives here: the same question asked “as a mid-market CFO” versus “as a developer” is a different prompt and a different row.
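Here is roughly the shape I’d start from, as a sketch rather than the finished engine. It assumes each provider is wrapped in a small adapter exposing an async `ask` method; `Provider`, `RateLimited`, `Row`, and the `EchoProvider` stand-in are all hypothetical names, and a real adapter would call the vendor SDK instead of echoing.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class RateLimited(Exception):
    """Raised by an adapter when the upstream API asks us to slow down."""


class Provider(Protocol):
    name: str

    async def ask(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Row:
    provider: str
    persona: str
    question: str
    answer: str


def build_prompt(persona: str, question: str) -> str:
    # Persona conditioning: same question, different prompt, different row.
    return f"Answer as {persona}. {question}"


async def probe_one(provider, persona, question, sem, retries=3, base_delay=0.01) -> Row:
    prompt = build_prompt(persona, question)
    for attempt in range(retries + 1):
        try:
            async with sem:  # cap the number of in-flight requests
                answer = await provider.ask(prompt)
            return Row(provider.name, persona, question, answer)
        except RateLimited:
            if attempt == retries:
                raise
            # Exponential backoff, sleeping outside the semaphore.
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")


async def run_probe(providers, personas, questions, max_concurrency=8) -> list[Row]:
    sem = asyncio.Semaphore(max_concurrency)
    tasks = [
        probe_one(p, persona, q, sem)
        for p in providers
        for persona in personas
        for q in questions
    ]
    return await asyncio.gather(*tasks)  # results keep task order


class EchoProvider:
    """Test stand-in: rate-limits its first call, then echoes the prompt."""

    name = "echo"

    def __init__(self) -> None:
        self.calls = 0

    async def ask(self, prompt: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise RateLimited()
        return f"echo: {prompt}"
```

Adding a model in this layout means writing one more adapter class and listing it in `providers`; the orchestration code does not change.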
2. The source & citation map
citation parsing · domain resolution · influence ranking
For each answer, pull what the model leaned on: explicit citations where the provider gives them, and inferred sources where it doesn’t. Resolve those to domains, cluster them, and rank by how often they show up across the queries that matter. This is what turns “AI thinks you’re niche” into “here are the six pages teaching it that.” It’s also the input to the guidebook.
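The resolve-and-rank step can be sketched in a few lines. This assumes citations have already been extracted as URLs per query, and it uses "number of distinct queries a domain appears in" as a stand-in influence score; the real ranking would likely weight queries by importance. `domain_of` and `rank_sources` are illustrative names.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Resolve a cited URL to a bare domain (lowercased, 'www.' stripped)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rank_sources(citations: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Map query -> cited URLs into (domain, query_count), most influential first."""
    queries_by_domain: dict[str, set[str]] = defaultdict(set)
    for query, urls in citations.items():
        for url in urls:
            domain = domain_of(url)
            if domain:  # skip anything that did not parse to a host
                queries_by_domain[domain].add(query)
    return sorted(
        ((d, len(qs)) for d, qs in queries_by_domain.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Counting distinct queries rather than raw hits keeps one answer that cites the same site five times from dominating the ranking.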
3. The crawl tracker
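log ingestion · UA matching · first-seen / last-seen
Once the client publishes guidebook content, prove it’s working. Watch their access logs for the AI crawler user-agents (GPTBot, ClaudeBot, PerplexityBot) and timestamp when each picks up the new content. That’s the “it’s landing” signal that justifies the retainer. One caveat: Google-Extended is a robots.txt control token rather than a user-agent string, so it never shows up in logs; Google’s fetches arrive under its regular crawler user agents.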
Small flex: I used exactly this technique last week, parsing crawler hits out of nginx logs, to reconstruct who had opened the report I sent your team. The plumbing is familiar.
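For a sense of how small that plumbing is, here is a sketch that assumes nginx’s default `combined` log format and matches bots by substring in the user-agent field. `track_crawls` and the `watched_prefix` parameter are hypothetical names, and the bot list only covers crawlers that announce themselves in the UA string.

```python
import re
from datetime import datetime

# Crawlers that identify themselves in the User-Agent header.
# Google-Extended is left out: it is a robots.txt token, not a UA string.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# nginx "combined" format:
# addr - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def track_crawls(lines, watched_prefix):
    """Return {(bot, path): (first_seen, last_seen)} for paths under the prefix."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m or not m["path"].startswith(watched_prefix):
            continue  # unparseable line, or not guidebook content
        ua = m["ua"].lower()
        bot = next((b for b in AI_BOTS if b.lower() in ua), None)
        if bot is None:
            continue  # ordinary browser or an unrelated crawler
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (bot, m["path"])
        first, last = seen.get(key, (ts, ts))
        seen[key] = (min(first, ts), max(last, ts))
    return seen
```

User-agent strings can be spoofed, so a production version would probably also verify source IPs against each vendor’s published ranges before counting a hit.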
4. The lite audit
throttled single-persona run · cached · instant teaser
The lead magnet. A stripped-down probing run: one persona, a handful of high-stakes queries, a single competitor. Heavily throttled and cached so a cold prospect gets an alarming taste in seconds without lighting money on fire. It’s the same engine as block 1 with the dials turned down, not separate code.
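"Same engine, dials turned down" could be as literal as two config presets feeding one planner. The sketch below is illustrative only: `AuditConfig`, `plan_jobs`, and every number in `FULL` and `LITE` are placeholders I picked, apart from the lite audit’s one persona and single competitor, which come from the description above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AuditConfig:
    max_personas: int
    max_queries: int
    max_competitors: int
    max_concurrency: int
    use_cache: bool


# Placeholder limits; the real values are a pricing decision.
FULL = AuditConfig(max_personas=20, max_queries=200, max_competitors=10,
                   max_concurrency=32, use_cache=False)
LITE = AuditConfig(max_personas=1, max_queries=5, max_competitors=1,
                   max_concurrency=2, use_cache=True)


def plan_jobs(cfg, personas, queries, competitors):
    """Expand inputs into (persona, query, competitor) jobs, capped by the config."""
    return list(product(
        personas[:cfg.max_personas],
        queries[:cfg.max_queries],
        competitors[:cfg.max_competitors],
    ))
```

Because both modes go through `plan_jobs`, upgrading a prospect from lite to full is a config swap rather than a second code path.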
5. The funnel spine
stage tracking · report render (HTML→PDF) · demo provisioning
The thin layer that moves a prospect through stages 0–5: capture the discovery-call inputs, kick off the full run, render the report, provision the demo account, and flip it to live on close. This is the part that makes the motion runnable by a rep instead of a founder.
What I’d ship, in order
I wouldn’t build all five at once. I’d sequence by what proves value fastest.
Probing engine + report
The wedge. The moment you can generate a credible “here’s what AI says about you, benchmarked” report from a set of inputs, you have the thing that closes meetings. Everything else is leverage on top.
Demo account on the unified schema
Put the report data behind the same UI a customer would log into. Now “see your fixed future” is real, and the zero-rebuild promise is provable, not just asserted.
Crawl tracking + monitoring
Turn the one-time report into a recurring product. This is what the retainer is actually for, and it answers the “the audit is a snapshot” objection directly.
Lite audit + outbound
Once the closing motion works, point a self-serve lite audit at the top of the funnel to feed it. Build the demand engine after the conversion engine, not before.
Where I’m strong, where you carry it
Honest division of labor. I’m not going to pretend I’d out-interpret the people who built the interpretability.
I’d own
- The probing-engine orchestration, concurrency, and provider abstraction
- Cost controls and the model cascade
- The unified schema and the demo-to-live flip
- Crawl tracking and log ingestion
- The funnel spine and report rendering
I’d lean on you
- The black-box interpretability methods that make the analysis defensible
- What “good” looks like in a guidebook
- Which sources actually move a given model
- The science of why a model forms an opinion, not just that it does
The thing that decides if any of this scales
Thousands of model queries per audit is the real productization ceiling, and it’s where my consultancy reflexes kick in. I run per-record LLM work for a living, and the discipline is always the same: never use a frontier model where a cheap one will do.
- Cascade by job. Breadth queries run on cheap, fast models. Only the depth analysis and the “why” reasoning hit a frontier model. Most rows never need the expensive call.
- Cache across clients. Two SaaS companies in the same category ask the models a lot of the same category-level questions. Those answers can be shared and refreshed on a schedule instead of re-queried per audit.
- Batch and schedule. Full audits are not real-time. Queue them, batch them, run them off-peak, and the cost-per-audit becomes a number you can put on a pricing page.
- Draw the lite-vs-full line on cost. The free lite audit has to be cheap by construction (one persona, cached, throttled). The line between lite and full is a budget line as much as a product line.
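To make the cascade and the cross-client cache concrete, here is a back-of-envelope sketch. The tier costs are made-up relative units, not real prices, and `Job`, `route`, `cache_key`, and `estimate_cost` are hypothetical names; the only rules it encodes are the ones above: depth and “why” jobs go to the frontier tier, and category-level answers are shared across clients.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_call: float  # relative units, assumed for illustration


CHEAP = Tier("cheap", 1.0)
FRONTIER = Tier("frontier", 20.0)


@dataclass(frozen=True)
class Job:
    kind: str              # "breadth", "depth", or "why"
    category: str
    persona: str
    question: str
    client_specific: bool  # True if the answer cannot be shared across clients


def route(job: Job) -> Tier:
    """Cascade by job: only depth and 'why' work reaches the frontier model."""
    return FRONTIER if job.kind in {"depth", "why"} else CHEAP


def cache_key(job: Job) -> str:
    """Key on category, persona, and question, deliberately not on the client."""
    raw = f"{job.category}|{job.persona}|{job.question}".lower()
    return hashlib.sha256(raw.encode()).hexdigest()


def estimate_cost(jobs, cache: set) -> float:
    """Sum call costs, skipping shareable jobs whose answer is already cached."""
    total = 0.0
    for job in jobs:
        if not job.client_specific:
            key = cache_key(job)
            if key in cache:
                continue  # served from the shared cache
            cache.add(key)
        total += route(job).cost_per_call
    return total
```

Passing the same `cache` set into successive audits in a category shows how the marginal cost of the second client falls, which is the number that would end up on a pricing page.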