Kuratr
Independent process-risk audit · Published 2026-05-25

AI agent knowledge that pays back in under 3 months.

We paid three outside experts to grade the money case for Pensiv. They compared it two ways: against having no memory at all, and against the basic memory most AI products ship today. Here is what they found — and what we'll put in writing.

$950K
Net present value per mid-ACV deployment. 12% discount rate · 3-year horizon · P50 of 27-persona Monte Carlo
2.6 mo
Payback period at the median customer. P50 mid-ACV · P95 worst-case 6.7 months
56.7%
Process-risk reduction validated by three independent expert raters. Median across raters · range 7.9%–76.2%

All three figures replicate from the canonical 8-document v3 study. Ask us for it before you cite us.

Why we ran this — and why we paid three outside raters to grade it

Every AI-memory company in 2026 has a slide deck. Most have a benchmark. Almost none have run the hard money test on themselves — the kind a finance chief would actually accept — and shown their work.

So we did. We ran the same check a careful buyer runs before signing a big contract. We modeled the costs and savings across twenty-seven realistic customers. We listed fourteen ways the product could fail. We found the one factor that changes the result most. And we worked out the dollar value over three years.

Then we did the thing nobody else does. We sent the list of failures to three independent experts and asked them to score it blind — without seeing our own marks. The headline number — 56.7% less risk — is the middle of their three scores. Not the best one. Not ours.

We didn't grade our own homework. Three independent expert raters scored the failure-mode catalogue blind. The number you see is the median, with the range published alongside it.

The full eight-part study is available on request. Every assumption is named. Every source is cited. Every claim of less risk shows the risk that's left over too — including one we own up to about ourselves. Hiding it would be exactly the kind of behavior this whole exercise was meant to catch.

The dual baseline — vs no memory, and vs commodity AI memory

Comparing against just one thing is how companies hide the real answer. "Versus no memory at all" makes anything look like a win. "Versus the next-best product" hides whether memory is worth buying in the first place. The honest way is to show both at once.

Monthly savings per seat — both baselines, P50 of 27-persona Monte Carlo
Baseline | Savings per seat per month | What it represents
Running with no memory at all | $910 | The McKinsey / Panopto reconstruction tax: knowledge workers rebuilding what was already known.
Running on commodity AI memory | $426–$591 | Incremental savings above the five most-deployed memory products, after adjusting each for its published junk-memory rate.

Both numbers are large enough to justify the contract. The second is what matters: it answers the question every buyer eventually asks — "isn't memory already commoditized? What am I paying you for that I can't get from the open-source library?"

Net present value by deal size and discount rate

Finance chiefs don't buy on monthly savings. They buy on the total value over a few years, in today's dollars. Below is the same result, shown across three deal sizes and three ways of counting future money, over three years.

Bar chart of net present value by annual contract value tier — small, mid, and high ACV — at three discount rates (8%, 12%, 18%) over a three-year horizon. Mid-ACV at 12% lands at $950,000 P50.
Net present value by deal size × discount rate × three-year horizon · P50 of 27-persona Monte Carlo
Three-year NPV by deal size · 12% discount rate · P50 · per seat-bundle
Deal size | NPV P50 | Payback P50 | Notes
Small ACV ($35K) | positive | ~4 months | ~20% of personas land below break-even in the worst tail; the kill criterion below catches this.
Mid ACV ($125K) | $950,000 | 2.6 months | The headline tier: the customer we're selling to.
High ACV ($300K+) | multi-million | <2 months | Enterprise multi-seat; savings scale super-linearly with seat count.

The small-ACV tail is the honest disclosure. About one in five personas in that tier do not clear the hurdle in the three-year window. We tell you up front, and the kill criterion below names the exact buyer signature where Pensiv is the wrong product.
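For readers who want to rerun this kind of figure at their own cost of capital, here is a minimal sketch of the standard discounted-cash-flow arithmetic behind a three-year NPV and a simple payback period. It is not the study's model: the function names `npv` and `payback_months` and the input figures are hypothetical, chosen only to keep the arithmetic easy to follow, and the payback here is undiscounted and assumes constant monthly savings.

```python
from typing import Sequence


def npv(upfront_cost: float, annual_net_savings: Sequence[float], discount_rate: float) -> float:
    """Discount each year's net savings back to today, then subtract the upfront cost."""
    discounted = sum(
        cash / (1.0 + discount_rate) ** year
        for year, cash in enumerate(annual_net_savings, start=1)
    )
    return discounted - upfront_cost


def payback_months(upfront_cost: float, monthly_net_savings: float) -> float:
    """Months of constant savings needed to recover the upfront cost (undiscounted)."""
    if monthly_net_savings <= 0:
        return float("inf")  # savings never recover the cost
    return upfront_cost / monthly_net_savings


# Hypothetical inputs, not taken from the study.
upfront = 100_000.0
yearly = [60_000.0, 60_000.0, 60_000.0]  # three-year horizon

for rate in (0.08, 0.12, 0.18):  # the three discount rates shown on this page
    print(f"rate={rate:.0%}  NPV={npv(upfront, yearly, rate):,.0f}")

print(f"payback = {payback_months(upfront, yearly[0] / 12):.1f} months")
```

A higher discount rate shrinks the value of later savings, which is why the same deployment shows a lower NPV at 18% than at 8%.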

The twelve commitments — pre-agreed criteria you can hold us to

Most vendors only publish what they are good at. This study publishes twelve pre-agreed criteria. If any one of them is triggered in your pilot, that is the explicit signal to walk away from Pensiv. They are written into the master services agreement on offer. The audit log is open to inspection.

  • K1 · Customer fit: Loaded hourly rate of the rescued workflow is below the floor where Pensiv pays for itself. Pilot reports this in week one.
  • K2 · Junk rate: Memory junk-rate after sixty days exceeds the commodity baseline Pensiv claims to beat.
  • K3 · Retrieval precision: Cross-source retrieval precision is below the rate disclosed for your vertical's reference architecture.
  • K4 · Over-trust events: Any pilot incident where the system surfaces a high-confidence answer that turns out to be wrong, with no caveat shown.
  • K5 · Decision-grade brief: Auditor cannot trace every claim in the brief back to its source record within the click budget.
  • K6 · Forgetting curve: Importance decay produces a measurable false-negative rate above the published threshold over the pilot window.
  • K7 · Cross-source bridge: Pilot cannot demonstrate a same-pattern match across two source systems with zero shared vocabulary.
  • K8 · Source-trail gap: Anything gets stored without a complete record of where it came from.
  • K9 · Audit fail: The published audit-trail capabilities cannot be re-derived by an outside auditor at pilot exit.
  • K10 · Right-to-forget: A deletion request cannot be propagated across all federated sources within the published time-to-forget budget.
  • K11 · Cost overrun: Total cost of ownership exceeds the pilot quote by more than the disclosed contingency band.
  • K12 · Risk reduction floor: Three-rater process-risk reduction on your workflow lands below the published floor of the range disclosed in the master study.

The full text of each criterion — measurement protocol, evaluation window, dispute resolution — is in the v3 study, available on request. The thirty-day pilot offer at the bottom of this page operationalizes them.
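To make the "any one criterion ends the pilot" rule concrete, here is a minimal sketch of how a buyer might track the criteria in their own tooling. Everything in it is an assumption for illustration: the `KillCriterion` class, the `triggered_criteria` helper, and every observed value and limit are hypothetical. The real thresholds and measurement protocol live in the v3 study and the master services agreement.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KillCriterion:
    code: str                     # e.g. "K2"
    name: str                     # short label from the list above
    observed: float               # value measured during the pilot (hypothetical here)
    limit: float                  # pre-agreed threshold (hypothetical here)
    higher_is_worse: bool = True  # False for metrics such as precision

    def triggered(self) -> bool:
        """True when the measured value breaches the pre-agreed limit."""
        if self.higher_is_worse:
            return self.observed > self.limit
        return self.observed < self.limit


def triggered_criteria(criteria: List[KillCriterion]) -> List[str]:
    """Return the codes of every criterion that was breached."""
    return [c.code for c in criteria if c.triggered()]


# Hypothetical pilot readings for three of the twelve criteria.
pilot = [
    KillCriterion("K2", "Junk rate", observed=0.04, limit=0.10),
    KillCriterion("K3", "Retrieval precision", observed=0.91, limit=0.85, higher_is_worse=False),
    KillCriterion("K11", "Cost overrun", observed=0.22, limit=0.15),
]

failed = triggered_criteria(pilot)
# A single breached criterion is enough to exit the pilot.
print("exit pilot" if failed else "continue", failed)
```

In this made-up run only the cost overrun breaches its limit, and that alone is enough to exit.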

Eight questions to ask any AI-memory vendor

If you are evaluating Pensiv against any other vendor, these are the questions that reveal whether the other product has done this work or has only built the slide deck. Use them on us. Use them on everyone.

  1. What is the dual-baseline savings rate? Show the number against "no memory" and against the commodity AI-memory tier, both with sources.
  2. What is the junk-memory rate at the sixty-day mark? Every memory system accumulates noise; the question is how fast.
  3. What is the highest residual risk after deployment? If the vendor cannot name one, the audit was not done.
  4. Who scored the failure-mode catalogue? A vendor scoring their own catalogue is not an audit. Ask for the inter-rater agreement number.
  5. What is the net present value at your cost of capital, over your defensible horizon? "Positive ROI" is a meaningless answer.
  6. What is the payback period at the worst-case decile of your customer set? P50 is where you want to live; P95 is where you will live.
  7. What are the published kill criteria? If the answer is "there aren't any," the vendor is not yet selling.
  8. Is the audit trail re-derivable by an outside auditor without vendor cooperation? Anything else isn't a real trail.

If Pensiv ever stops being able to answer any one of these with documented evidence, that is a kill criterion you should hold us to.

The thirty-day pilot offer

Capability pilot · 30 days · with the kill criteria pre-agreed

The pilot runs on a documented measurement protocol — same protocol the master study used. The twelve kill criteria above are written into the master services agreement before kickoff. The audit log is yours to keep at exit. If any criterion fails, the pilot exits with no further obligation.

We do this because the worst outcome for both sides is a year-long deployment where the buyer cannot tell whether the product paid for itself. The pilot resolves that question in thirty days, against numbers we put in writing on day one.

Methodology — in plain English

What we ran. We modeled the costs and savings for twenty-seven realistic customers, across twelve industries and four company sizes. Every number we fed in came from a named, public source. They are all listed in the study.
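As a rough picture of what "P50" and "P95" mean in a persona simulation like this, here is a toy sketch that draws 27 hypothetical personas and reads the median and 95th-percentile payback off the results. It is not the study's model. The uniform draw, the seat count, the hours saved, and the use of the mid-ACV figure as an upfront cost are all assumptions made for illustration. Only the $95 and $500 hourly rates come from this page.

```python
import random
import statistics

rng = random.Random(27)  # fixed seed so the sketch is repeatable

UPFRONT_COST = 125_000.0    # hypothetical; borrowed from the mid-ACV tier label
SEATS = 50                  # hypothetical seat count
HOURS_SAVED_PER_SEAT = 4.0  # hypothetical hours saved per seat per month

payback = []
for _ in range(27):  # one draw per persona
    # Assumed uniform spread between the two hourly rates quoted on this page.
    hourly_rate = rng.uniform(95.0, 500.0)
    monthly_savings = SEATS * HOURS_SAVED_PER_SEAT * hourly_rate
    payback.append(UPFRONT_COST / monthly_savings)  # months to recover the cost

# quantiles(n=100) returns 99 cut points; index 49 is P50, index 94 is P95.
cuts = statistics.quantiles(payback, n=100, method="inclusive")
p50, p95 = cuts[49], cuts[94]
print(f"P50 payback: {p50:.1f} months, P95 payback: {p95:.1f} months")
```

For payback, a higher percentile is a worse outcome, so P95 describes the slow-payback tail rather than the best case.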

Who scored the risk. Three independent experts — two AI engineers and one senior reviewer — each scored our list of fourteen possible failures. They worked blind, without seeing our marks, using a standard scoring method from car safety. The 56.7% figure is the middle of their three scores. We publish the full range, 7.9% to 76.2%, right next to it.
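With only three raters, the published low end, median, and high end are the three individual scores, so the headline can be checked directly. The sketch below does that with Python's standard library, assuming the published range is the minimum and maximum across raters. The mean is shown only for comparison and is not a figure the study reports.

```python
import statistics

# Percent risk reduction per rater: the low end of the range, the median,
# and the high end are the three individual scores when there are three raters.
rater_scores = [7.9, 56.7, 76.2]

headline = statistics.median(rater_scores)
low, high = min(rater_scores), max(rater_scores)
mean = statistics.mean(rater_scores)  # for comparison only

print(f"median={headline}%  range={low}%-{high}%  mean={mean:.1f}%")
```

The median ignores how far the lowest score sits from the other two, which is why the range is worth reading next to it.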

What matters most. One thing changes the result more than anything else: how much an hour of the rescued work is worth. A team paying $500 an hour for expert time saves six times more than a team paying $95. We're honest about who the numbers fit best.

What we admit about ourselves. Pensiv adds one new risk that wasn't there before: people can trust a polished AI brief too much. All three experts agreed on this (they just disagreed on how big it is). Our plan to handle it is in the study.

Request the full study · Request a 30-day pilot