When someone pastes a $42,000 used EV listing into OFFO at 11pm on a Sunday, they want a verdict in under two seconds — not a spinner for 30 seconds while three AI models deliberate. But they also want the analysis to be right.
These two goals appear to be in tension. They’re not, if you sequence the pipeline correctly.
This post describes how OFFO’s deal intelligence pipeline works: a deterministic rule engine that produces a full, meaningful receipt in under 500ms, followed by an asynchronous three-model AI chain that upgrades it in the background. The user never sees a loading state for the AI.
## The two-layer architecture
Every OFFO receipt is generated twice. The first generation is instant and deterministic; the second is asynchronous and AI-powered.

Layer 1: deterministic rule engine

- ~150 deterministic rule patterns
- Negation-aware regex extraction
- Hard blocker detection
- Fit score + evidence score
- Returns in < 500ms

Layer 2: async AI chain

- Grok: damage classification + tone
- Gemini: routine impact + owner translation
- GPT-4o: repair cost breakdown (skipped when not needed)
- Max 3 model calls enforced
- 15-minute background budget
The key insight: the deterministic layer is not a placeholder. It produces a real, actionable verdict based on signals that AI frequently gets wrong anyway — title status, accident history, service records. These facts don’t require inference. They require pattern matching, and pattern matching is fast.
## Receipt pipeline
Here’s the full receipt pipeline from POST to response, including the background AI upgrade path. The client gets Layer 1 synchronously; the Layer 2 upgrade arrives later, surfaced by polling on generation_status.
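From the client's side, the two-layer contract reduces to "render the first response immediately, then poll until the upgrade lands." The sketch below is illustrative, not OFFO's actual client code: the Receipt shape and status values are assumptions, and only the generation_status field name comes from the pipeline description.

```typescript
// Illustrative client-side consumption of the two-layer pipeline.
// The Receipt shape and status values here are assumptions; only the
// generation_status field name is taken from the pipeline description.
interface Receipt {
  generation_status: "deterministic" | "ai_enhanced";
  verdict: "GREEN" | "YELLOW" | "RED";
}

async function pollForUpgrade(
  fetchReceipt: () => Promise<Receipt>,
  intervalMs = 2000,
  maxAttempts = 10
): Promise<Receipt> {
  // Layer 1: the first fetch already contains a full deterministic verdict.
  let receipt = await fetchReceipt();
  let attempts = 0;
  // Layer 2: keep polling until the AI-enhanced version replaces it,
  // or give up and keep the deterministic receipt.
  while (receipt.generation_status !== "ai_enhanced" && attempts < maxAttempts) {
    await new Promise((r) => setTimeout(r, intervalMs));
    receipt = await fetchReceipt();
    attempts++;
  }
  return receipt;
}
```

The important property is that every code path ends with a usable receipt: if the AI upgrade never arrives, the loop simply exits with the deterministic verdict it already has.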
## Signal extraction: negation-aware pattern matching
The most important design decision in the rule engine is negation awareness. A listing that says “no frame damage” is categorically different from one that says “frame damage.” Early versions of GPT-based extractors failed this test regularly.
Each signal pattern carries both a positive match and an optional negation override:
```typescript
// lib/receipt-signal-extractor.ts
const HARD_BLOCKER_PATTERNS: SignalPattern[] = [
  {
    signalId: "frame_damage_major",
    positive: [/\b(frame\s+damage|structural\s+damage|unibody\s+damage)\b/],
    negation: [/\b(no\s+frame\s+damage|no\s+structural\s+damage)\b/],
  },
];

// Negation is checked first — if the negation regex matches,
// the positive match is discarded even if it also fires.
```

This matters for listings that use softening language: “fully repaired flood vehicle” still triggers title_salvage via structured field extraction, even if the listing text tries to minimize it.
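The negation-first rule is simple enough to show end to end. This is a minimal, self-contained sketch of the matching loop; the SignalPattern shape mirrors the snippet above, but extractHardBlockers is an illustrative helper, not the production extractor.

```typescript
// Minimal negation-first matcher. SignalPattern mirrors the shape shown
// above; extractHardBlockers is an illustrative helper, not production code.
interface SignalPattern {
  signalId: string;
  positive: RegExp[];
  negation?: RegExp[];
}

function extractHardBlockers(text: string, patterns: SignalPattern[]): string[] {
  const lower = text.toLowerCase();
  const hits: string[] = [];
  for (const p of patterns) {
    // Negation wins: if any negation regex matches, skip this signal
    // entirely, even if a positive pattern would also fire.
    if (p.negation?.some((re) => re.test(lower))) continue;
    if (p.positive.some((re) => re.test(lower))) hits.push(p.signalId);
  }
  return hits;
}

const patterns: SignalPattern[] = [{
  signalId: "frame_damage_major",
  positive: [/\b(frame\s+damage|structural\s+damage)\b/],
  negation: [/\bno\s+(frame|structural)\s+damage\b/],
}];

console.log(extractHardBlockers("clean title, no frame damage", patterns)); // []
console.log(extractHardBlockers("minor frame damage repaired", patterns));  // ["frame_damage_major"]
```

Because the negation check short-circuits before the positive patterns are consulted, "no frame damage" can never be misread as a damage mention; there is no ordering or scoring subtlety to get wrong.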
A rebuilt-title flood vehicle with partial service history. The rule engine immediately triggers the title_salvage hard blocker and forces a RED verdict, even though the listing text uses softening language like “fully repaired.”
- title_salvage
- ownership_history_clear
- battery_proof_missing
- battery_warranty_unclear
- service_records_missing
- dcfc_unclear
- …and 4 more
{
"fit_score": 100,
"evidence_score": 0,
"verdict": "RED",
"evidence_label": "MISSING",
"scoring_reasons": [
{
"signal_id": "title_salvage",
"category": "listing_risk",
"points": 0,
"label": "Salvage, rebuilt, flood, or lemon title detected"
},
{
"signal_id": "ownership_history_clear",
"category": "listing_risk",
"points": 5,
"label": "Ownership history shown"
},
{
"signal_id": "battery_proof_missing",
"category": "missing_proof",
"points": -15,
"label": "No battery health proof provided"
},
{
"signal_id": "battery_warranty_unclear",
"category": "missing_proof",
"points": -6,
"label": "Battery warranty status not shown"
},
{
"signal_id": "service_records_missing",
"category": "missing_proof",
"points": -8,
"label": "No service history shown"
},
{
"signal_id": "dcfc_unclear",
"category": "missing_proof",
"points": -10,
"label": "DC fast charging support not confirmed"
},
{
"signal_id": "fees_unclear",
"category": "missing_proof",
"points": -8,
"label": "Out-the-door fees or add-ons not clear"
},
{
"signal_id": "tire_condition_unclear",
"category": "missing_proof",
"points": -4,
"label": "Tire condition not visible or mentioned"
},
{
"signal_id": "vin_missing",
"category": "missing_proof",
"points": -6,
"label": "VIN not provided"
},
{
"signal_id": "structural_claim_no_photo",
"category": "listing_risk",
"points": -15,
"label": "Structural damage claimed but no frame/underbody photo"
}
],
"why_not_green": [
{
"signal_id": "battery_proof_missing",
"category": "missing_proof",
"points": -15,
"label": "No battery health proof provided"
},
{
"signal_id": "structural_claim_no_photo",
"category": "listing_risk",
"points": -15,
"label": "Structural damage claimed but no frame/underbody photo"
},
{
"signal_id": "dcfc_unclear",
"category": "missing_proof",
"points": -10,
"label": "DC fast charging support not confirmed"
},
{
"signal_id": "service_records_missing",
"category": "missing_proof",
"points": -8,
"label": "No service history shown"
},
{
"signal_id": "fees_unclear",
"category": "missing_proof",
"points": -8,
"label": "Out-the-door fees or add-ons not clear"
},
{
"signal_id": "battery_warranty_unclear",
"category": "missing_proof",
"points": -6,
"label": "Battery warranty status not shown"
},
{
"signal_id": "vin_missing",
"category": "missing_proof",
"points": -6,
"label": "VIN not provided"
},
{
"signal_id": "tire_condition_unclear",
"category": "missing_proof",
"points": -4,
"label": "Tire condition not visible or mentioned"
},
{
"signal_id": "title_salvage",
"category": "listing_risk",
"points": 0,
"label": "Salvage, rebuilt, flood, or lemon title detected"
}
],
"hard_blocker_hit": true,
"verify_before_visit": [
"No battery health proof provided",
"Structural damage claimed but no frame/underbody photo",
"DC fast charging support not confirmed",
"No service history shown",
"Out-the-door fees or add-ons not clear"
]
}

## The auction AI chain: three models, max three calls
The auction pipeline is more AI-intensive than the receipt pipeline because auction lots don’t have structured fields — we’re working from raw listing text, Copart photos, and VIN lookups. But we still enforce a hard cap: maximum three meaningful model calls per request.
The routing logic is encoded in canSkipRepairCost():
```typescript
// lib/auction/auction-ai-chain.ts
function canSkipRepairCost(
  metrics: DeterministicMetrics,
  arv: number | null,
  isPaid: boolean
): boolean {
  // Only skip when damage is confirmed minor AND we have real damage data
  // (not a data-absent default) — source_confidence: "low" means skip is unsafe
  if (
    metrics.damage_severity_baseline === "minor" &&
    arv !== null &&
    metrics.source_confidence !== "low"
  ) return true;
  return false;
}
```

When the skip fires, GPT-4o is never called. Gemini handles routine impact in parallel with a skipped repair cost slot, and Grok runs the final polish. Total: 2 calls. When the skip doesn’t fire (severe damage, unknown ARV, low source confidence), all three stages run — GPT-4o and Gemini run in parallel at step 2, Grok finalizes at step 3.
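To make the routing concrete, here is a standalone sketch of the skip decision with the types stubbed out. The DeterministicMetrics shape is an assumption inferred from the fields the function reads; it is not the production type.

```typescript
// Standalone sketch of the repair-cost skip decision. The
// DeterministicMetrics shape here is an assumption inferred from the
// fields referenced in canSkipRepairCost, not the production type.
interface DeterministicMetrics {
  damage_severity_baseline: "minor" | "moderate" | "severe" | "unknown";
  source_confidence: "low" | "medium" | "high";
}

function canSkipRepairCost(
  metrics: DeterministicMetrics,
  arv: number | null
): boolean {
  return (
    metrics.damage_severity_baseline === "minor" &&
    arv !== null &&
    metrics.source_confidence !== "low"
  );
}

// Confirmed-minor damage with a known ARV: 2-call path, GPT-4o skipped.
console.log(canSkipRepairCost(
  { damage_severity_baseline: "minor", source_confidence: "high" }, 18500
)); // true

// Low source confidence forces the full 3-call chain even for minor damage.
console.log(canSkipRepairCost(
  { damage_severity_baseline: "minor", source_confidence: "low" }, 18500
)); // false
```

Note that all three conditions must hold to skip; an unknown ARV or a data-absent damage default always pays for the full chain, which keeps the skip conservative.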
Each step is logged as an AiStepLog:
```typescript
export interface AiStepLog {
  step: string;
  model: string;
  status: "success" | "failed" | "skipped" | "cancelled";
  latency_ms: number;
  error?: string;
}
```

## Routine fit: six dimensions, one sigmoid
The routine fit engine is the most mathematically interesting part of OFFO. It converts a user’s daily driving pattern into a 0–100 score across six weighted dimensions. No AI is involved — it’s a pure function.
The range buffer dimension (25% weight) uses a logistic sigmoid instead of piecewise linear buckets. This eliminates the cliff problem where a user at 59% range usage scores dramatically differently from one at 61%:
```typescript
// lib/compute-routine-fit.ts — Phase 4A: sigmoid range buffer
// Centered at 62% usage. Approximate values from the formula below:
//   0% → ~99   30% → ~94   50% → ~73   60% → ~54
//  70% → ~34   80% → ~18   90% → ~8   100% → 5 (floor)
function rangeScoreFromUsagePct(usagePct: number): number {
  const pct = Math.max(0, Math.min(100, usagePct));
  // Logistic sigmoid: 100 / (1 + e^(k*(pct - center))), k = 0.085, center = 62
  const raw = 100 / (1 + Math.exp(0.085 * (pct - 62)));
  return Math.max(5, Math.round(raw));
}
```

Cross-dimension multipliers apply after individual scoring. The catastrophic failure zone collapses the final score when the routine is near-unworkable:
```typescript
// Catastrophic zone: usage > 90% OR (public charging + poor density)
//   × 0.85 collapse multiplier applied to weighted sum
// Compound checks (non-catastrophic):
//   × 0.91 — low charging + cold climate street parking
//   × 0.93 — public charging + high mileage
//   × 0.95 — shared charger + long commute
```

A suburban commuter with garage L2 charging in a mild climate. Usage runs ~12% of range per day. The sigmoid range buffer gives a near-perfect score and the budget dimension clears comfortably.
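To see why the sigmoid removes the cliff, the snippet below reproduces rangeScoreFromUsagePct from compute-routine-fit and evaluates it on either side of the 60% boundary; the weighted-sum value in the multiplier line is invented for the example.

```typescript
// rangeScoreFromUsagePct reproduced from compute-routine-fit above.
function rangeScoreFromUsagePct(usagePct: number): number {
  const pct = Math.max(0, Math.min(100, usagePct));
  const raw = 100 / (1 + Math.exp(0.085 * (pct - 62)));
  return Math.max(5, Math.round(raw));
}

// Neighboring usage levels score within a few points of each other:
// no bucket cliff at the 60% boundary.
console.log(rangeScoreFromUsagePct(59)); // 56
console.log(rangeScoreFromUsagePct(61)); // 52

// Compound multiplier illustration: public charging + high mileage
// applies × 0.93 to the weighted sum (value invented for the example).
const weightedSum = 80;
console.log(Math.round(weightedSum * 0.93)); // 74
```

With piecewise buckets, 59% and 61% could land in different buckets and differ by 15+ points; with the sigmoid, the gap is 4 points and the score degrades continuously all the way to the 5-point floor.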
## Why not AI-first?
The original OFFO prototype was AI-first. A single GPT-4 call received the listing text and returned a structured receipt. It was slow (8–18 seconds), expensive ($0.04–0.12 per call), and wrong in predictable ways.
“Salvage title” in a listing is not ambiguous. It doesn’t require reasoning — it requires reading. A regex finds it in 0.1ms. GPT-4 occasionally missed it when buried in dense listing text.
“No frame damage” would sometimes be parsed as frame damage being mentioned. The rule engine checks the negation pattern first and short-circuits — eliminating this class of error entirely.
A single model timeout left the user with an error page. The deterministic layer means users always get a real receipt — AI upgrades it when available, but the receipt exists immediately regardless.
The deterministic layer handles the 80% of the signal space that doesn’t require inference. The AI chain handles the 20% that does: nuanced damage tone, routine lifestyle translation, repair cost breakdown when ARV is unknown.
## Observability: opt-in pipeline tracing
To instrument the pipeline for debugging without affecting production latency, we added an opt-in trace system. Pass debug_trace: true in the request body and the pipeline captures timings and step logs, then persists them to pipeline_traces via a fire-and-forget write.
```typescript
// lib/debug-trace.ts
export interface PipelineTrace {
  trace_id: string;
  pipeline: "receipt" | "auction" | "routine";
  created_at: string;
  total_latency_ms: number;
  steps: PipelineStep[];
  timings: Record<string, number>;
  meta: Record<string, unknown>;
}
```

```typescript
// In route.ts — zero cost when disabled
const debugEnabled = body.debug_trace === true;
const trace = debugEnabled ? createTrace("receipt") : null;

// Before return:
if (trace) {
  finalizeTrace(trace, { verdict, fit_score, signal_count });
  persistTrace(trace); // fire-and-forget — never blocks response
}
```

Traces are retrievable via GET /api/admin/trace/:traceId (admin key required). This lets us replay a user’s exact pipeline run for debugging without any live-traffic impact.
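The post doesn't show how persistTrace detaches from the response path, so here is one plausible fire-and-forget sketch. Everything in it is hypothetical: writeTraceRow is an invented stand-in for the pipeline_traces write, and the trimmed TraceRow type exists only for this example.

```typescript
// Hypothetical fire-and-forget persistence sketch. writeTraceRow is an
// invented stand-in for the real pipeline_traces write; TraceRow is a
// trimmed-down type for the example only.
interface TraceRow {
  trace_id: string;
  total_latency_ms: number;
}

async function writeTraceRow(_row: TraceRow): Promise<void> {
  // Stand-in for the storage write (e.g. an insert into pipeline_traces).
}

function persistTrace(row: TraceRow): void {
  // Deliberately not awaited: the response returns immediately, and a
  // failed trace write must never surface as a user-facing error.
  void writeTraceRow(row).catch((err) => {
    console.error("trace persist failed", row.trace_id, err);
  });
}
```

The void/.catch pair is the important part of this pattern: the promise is intentionally detached so the write never blocks the response, but rejections are still observed, so a storage outage cannot crash the process with an unhandled rejection.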
## What’s next
The two-layer pattern has proven stable. The areas we’re actively improving:
- ARV resolution pipeline: The four-phase ARV chain (listing comparables → VIN decode → depreciation curve → AI fallback) is the highest-variance component. Improving P1 and P2 hit rates lets us skip GPT-4o more often.
- Rule pattern expansion: The ~150-pattern signal set was built from analyzing real listing text. We continue adding patterns as new seller language shows up in production data.
- Routine fit calibration: The sigmoid center (62%) and multiplier values (0.85, 0.91, 0.93, 0.95) were set analytically. We’re building a feedback loop to calibrate them against real EV owner satisfaction data.
- Trace-driven content: The scenario generator in lib/content/scenario-generator.ts calls real functions and embeds live outputs into blog posts at build time, which means this post updates automatically when the scoring engine changes.
The core principle behind all of this: be deterministic wherever determinism is possible. Reserve inference for the genuinely ambiguous. Move the inference off the critical path. The result is a system that is faster, cheaper, and more predictable than an AI-first approach — and more reliable when models have outages or rate limits.