How I’d build it

The engine on the previous page is the strategy. This is the part where I show I can stand it up, not just diagram it. I’d expect to be hands-on building this, so here is the architecture I’d propose, the order I’d ship it in, and the parts I’d own versus the parts that need your interpretability depth.

Caveat up front: I don’t know your internals. You’ve solved problems here I haven’t seen. So treat this as how I’d reason about the build from the outside, and tell me where it’s wrong.

The shape

One data model, two modes

The whole “zero rebuild” claim only holds if the demo and the live product are literally the same application reading the same schema, with a flag that says mode = demo or mode = live. A demo account is a live account that hasn’t been switched on. Migration becomes a boolean, not a project.

So the first decision is the schema, not the UI. Everything downstream (the report, the demo, the monitoring dashboard) is a different view over the same tables: perception scores, persona × query results, competitor benchmarks, the source / citation map, and the content guidebook. Get that right once and the funnel stops generating throwaway work.
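To make that concrete, here is a minimal sketch of the idea. The class and field names (Account, PerceptionScore, dashboard_view) are my placeholders, not your schema, and a real version would sit on database tables rather than in-memory objects.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(str, Enum):
    DEMO = "demo"
    LIVE = "live"


@dataclass
class PerceptionScore:
    # One row per persona x query x model. Field names are illustrative.
    persona: str
    query: str
    model: str
    score: float


@dataclass
class Account:
    name: str
    mode: Mode = Mode.DEMO
    scores: list[PerceptionScore] = field(default_factory=list)

    def go_live(self) -> None:
        # The whole "migration": same rows, same tables, one flag flipped.
        self.mode = Mode.LIVE


def dashboard_view(account: Account) -> dict:
    # Demo and live go through the same code path; only the banner differs.
    return {
        "banner": "Demo data" if account.mode is Mode.DEMO else None,
        "rows": len(account.scores),
    }
```

The point of the sketch is that nothing about the rows changes when the account goes live, so there is nothing to rebuild.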

The layers

Five build blocks

1. The probing engine

async Python · job queue · provider abstraction

The core. It fires thousands of persona-conditioned questions across the major models and providers, captures the answers, and normalizes them into the schema. The hard parts are concurrency, rate-limit handling, and a clean provider abstraction so adding a new model is a config change, not a rewrite. Persona conditioning lives here: the same question asked “as a mid-market CFO” versus “as a developer” is a different prompt and a different row.
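A rough sketch of that orchestration, under stated assumptions: StubProvider stands in for a real vendor client, RateLimited is a hypothetical error type that client would raise, and the prompt template is invented for illustration. Concurrency is capped with a semaphore and throttled calls are retried with exponential backoff.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class RateLimited(Exception):
    """Hypothetical error a provider client raises when throttled."""


class Provider(Protocol):
    name: str

    async def ask(self, prompt: str) -> str: ...


@dataclass
class StubProvider:
    # Stand-in for a real vendor client; fails the first `fail_first` calls.
    name: str
    fail_first: int = 0

    async def ask(self, prompt: str) -> str:
        if self.fail_first > 0:
            self.fail_first -= 1
            raise RateLimited(self.name)
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"[{self.name}] {prompt}"


@dataclass(frozen=True)
class Row:
    provider: str
    persona: str
    question: str
    answer: str


async def probe(providers, personas, questions, *,
                max_concurrency=8, retries=3, base_delay=1.0):
    sem = asyncio.Semaphore(max_concurrency)  # cap on in-flight calls

    async def one(provider, persona, question):
        # Same question, different persona: a different prompt and a different row.
        prompt = f"Answer for {persona}: {question}"
        for attempt in range(retries):
            try:
                async with sem:
                    answer = await provider.ask(prompt)
                return Row(provider.name, persona, question, answer)
            except RateLimited:
                if attempt == retries - 1:
                    raise
                await asyncio.sleep(base_delay * 2 ** attempt)  # back off

    return await asyncio.gather(
        *(one(p, persona, q)
          for p in providers for persona in personas for q in questions)
    )
```

Adding a model here means passing one more object with a name and an ask method, which is the "config change, not a rewrite" property. A production version would put this behind a job queue rather than a single gather call.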

2. The source & citation map

citation parsing · domain resolution · influence ranking

For each answer, pull what the model leaned on: explicit citations where the provider gives them, and inferred sources where it doesn’t. Resolve those to domains, cluster them, and rank by how often they show up across the queries that matter. This is what turns “AI thinks you’re niche” into “here are the six pages teaching it that.” It’s also the input to the guidebook.
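The resolve-and-rank step could look something like this. It is a sketch that covers explicit citations only (inferred sources are the harder problem and are not attempted here), and the per-query weights are my assumption about how "the queries that matter" might be expressed.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    # Resolve a cited URL to a bare domain; returns "" if there is no host.
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rank_sources(answers, weights=None):
    """answers: iterable of (query, [cited_urls]); weights: query -> importance."""
    weights = weights or {}
    influence = Counter()
    for query, urls in answers:
        # Count each domain once per answer so one link-heavy answer can't dominate.
        domains = {domain_of(u) for u in urls}
        for domain in domains:
            if domain:
                influence[domain] += weights.get(query, 1.0)
    return influence.most_common()
```

Ranking by domain is the coarse version. Getting down to "the six pages" would mean keeping the full URL as the key instead of collapsing to the host.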

3. The crawl tracker

log ingestion · UA matching · first-seen / last-seen

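Once the client publishes guidebook content, prove it’s working. Watch their access logs for the AI crawler user-agents (GPTBot, ClaudeBot, PerplexityBot) and timestamp when each picks up the new content. That’s the “it’s landing” signal that justifies the retainer. One caveat: Google-Extended is a robots.txt control token, not a request user-agent, so it never appears in access logs; Google’s fetches arrive under its regular crawler user-agents and need separate handling.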
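A minimal sketch of the tracker, assuming access logs in nginx’s default “combined” format and exact-match paths for the guidebook pages. The function name and return shape are mine. It matches on the User-Agent string only, which can be spoofed, so a production version would also verify source IPs.

```python
import re
from datetime import datetime

# Tokens that appear in request User-Agent headers.
# Google-Extended is a robots.txt token, not a user-agent, so it can't be matched here.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# nginx "combined" format:
# addr - user [time] "METHOD path proto" status bytes "referer" "user-agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawl_sightings(lines, watched_paths):
    """Return {(bot, path): (first_seen, last_seen)} for watched pages."""
    seen = {}
    for line in lines:
        m = LINE.match(line)
        if not m or m["path"] not in watched_paths:
            continue
        ua = m["ua"].lower()
        bot = next((b for b in AI_CRAWLERS if b.lower() in ua), None)
        if bot is None:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (bot, m["path"])
        first, last = seen.get(key, (ts, ts))
        seen[key] = (min(first, ts), max(last, ts))
    return seen
```

Feeding it the log lines plus the set of newly published paths gives a first-seen and last-seen timestamp per crawler per page, which is the table the monitoring dashboard would render.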

Small flex: I used exactly this technique last week, parsing crawler hits out of nginx logs, to reconstruct who had opened the report I sent your team. The plumbing is familiar.

4. The lite audit

throttled single-persona run · cached · instant teaser

The lead magnet. A stripped-down probing run: one persona, a handful of high-stakes queries, a single competitor. Heavily throttled and cached so a cold prospect gets an alarming taste in seconds without lighting money on fire. It’s the same engine as block 1 with the dials turned down, not separate code.
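“Same engine, dials turned down” might be sketched like this. The dial values are placeholders rather than real product numbers, run_audit is a stand-in for the block 1 engine, and the call-count formula (personas × queries × the client plus each competitor) is my assumption.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditConfig:
    personas: int
    queries: int
    competitors: int


# Placeholder dials, not real product numbers.
FULL = AuditConfig(personas=10, queries=200, competitors=5)
LITE = AuditConfig(personas=1, queries=5, competitors=1)


def run_audit(domain: str, cfg: AuditConfig) -> dict:
    # Stand-in for the probing engine: both tiers go through this one function.
    calls = cfg.personas * cfg.queries * (1 + cfg.competitors)
    return {"domain": domain, "model_calls": calls}


class TeaserCache:
    """Serve a recent lite result instead of re-running it."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._store = {}

    def lite_audit(self, domain: str) -> dict:
        hit = self._store.get(domain)
        if hit is not None and self.clock() - hit[0] < self.ttl_s:
            return hit[1]  # cached: no model calls at all
        result = run_audit(domain, LITE)
        self._store[domain] = (self.clock(), result)
        return result
```

A repeat request for the same domain inside the TTL costs nothing, which is what keeps a public lead magnet from being an open tap on the model budget.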

5. The funnel spine

stage tracking · report render (HTML→PDF) · demo provisioning

The thin layer that moves a prospect through stages 0–5: capture the discovery-call inputs, kick off the full run, render the report, provision the demo account, and flip it to live on close. This is the part that makes the motion runnable by a rep instead of a founder.
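Stage tracking can stay very small. In the sketch below only the 0–5 numbering comes from the funnel itself; the stage names are hypothetical labels I made up for illustration, and a real version would trigger the run, render, and provisioning jobs on each transition.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical names; only the 0-5 numbering is given.
    LITE_AUDIT = 0
    DISCOVERY = 1
    FULL_RUN = 2
    REPORT = 3
    DEMO = 4
    LIVE = 5


class Prospect:
    def __init__(self, name: str):
        self.name = name
        self.stage = Stage.LITE_AUDIT
        self.history = [Stage.LITE_AUDIT]

    def advance(self) -> Stage:
        # Forward-only: a rep can move a deal one stage at a time, never skip.
        if self.stage is Stage.LIVE:
            raise ValueError(f"{self.name} is already live")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage
```

Keeping transitions forward-only and one step at a time is a design choice, not a requirement. It makes the history an audit trail a sales lead can read without asking the rep.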

Build order

What I’d ship, in order

I wouldn’t build all five at once. I’d sequence by what proves value fastest.

1. Probing engine + report

The wedge. The moment you can generate a credible “here’s what AI says about you, benchmarked” report from a set of inputs, you have the thing that closes meetings. Everything else is leverage on top.

2. Demo account on the unified schema

Put the report data behind the same UI a customer would log into. Now “see your fixed future” is real, and the zero-rebuild promise is provable, not just asserted.

3. Crawl tracking + monitoring

Turn the one-time report into a recurring product. This is what the retainer is actually for, and it answers the “the audit is a snapshot” objection directly.

4. Lite audit + outbound

Once the closing motion works, point a self-serve lite audit at the top of the funnel to feed it. Build the demand engine after the conversion engine, not before.

What I’d own

Where I’m strong, where you carry it

Honest division of labor. I’m not going to pretend I’d out-interpret the people who built the interpretability methods.

I’d own

The probing-engine orchestration, concurrency, and provider abstraction
Cost controls and the model cascade
The unified schema and the demo-to-live flip
Crawl tracking and log ingestion
The funnel spine and report rendering

I’d lean on you

The black-box interpretability methods that make the analysis defensible
What “good” looks like in a guidebook
Which sources actually move a given model
The science of why a model forms an opinion, not just that it does

Cost ceiling

The thing that decides if any of this scales

Thousands of model queries per audit is the real productization ceiling, and it’s where my consultancy reflexes kick in. I run per-record LLM work for a living, and the discipline is always the same: never use a frontier model where a cheap one will do.

Cascade by job. Breadth queries run on cheap, fast models. Only the depth analysis and the “why” reasoning hit a frontier model. Most rows never need the expensive call.
Cache across clients. Two SaaS companies in the same category ask the models a lot of the same category-level questions. Those answers can be shared and refreshed on a schedule instead of re-queried per audit.
Batch and schedule. Full audits are not real-time. Queue them, batch them, run them off-peak, and the cost-per-audit becomes a number you can put on a pricing page.
Draw the lite-vs-full line on cost. The free lite audit has to be cheap by construction (one persona, cached, throttled). The line between lite and full is a budget line as much as a product line.
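To show how the cascade and the shared cache combine into a cost-per-audit number, here is a back-of-envelope sketch. The tier prices, job kinds, and call counts are invented placeholders, not real vendor rates, and cache hits are assumed to cost nothing and to apply only to breadth queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_call: float  # placeholder unit price, not a real vendor rate


CHEAP = Tier("cheap", 0.001)
FRONTIER = Tier("frontier", 0.05)


def route(job_kind: str) -> Tier:
    # Breadth queries go to the cheap tier; depth and "why" analysis go to frontier.
    return FRONTIER if job_kind in {"depth", "why"} else CHEAP


def audit_cost(jobs: dict[str, int], breadth_cache_hit_rate: float = 0.0) -> float:
    """jobs maps job kind -> call count. Cached breadth answers are assumed free."""
    if not 0.0 <= breadth_cache_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    total = 0.0
    for kind, calls in jobs.items():
        billable = calls
        if kind == "breadth":
            billable = calls * (1.0 - breadth_cache_hit_rate)
        total += billable * route(kind).cost_per_call
    return total
```

With these placeholder prices, 5,000 breadth calls plus 100 depth calls come to about 10 units, against 255 if every call hit the frontier tier. The absolute numbers are made up; the ratio is the argument.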

This is the question I’d most want to pressure-test with you, because it’s where strategy meets the P&L. Get the unit economics right and “anyone can sell it” becomes “anyone can sell it profitably.”