AI evaluates. Deterministic systems execute. Humans govern.

The design discipline that will define the next decade of enterprise AI architecture.

Picture a renewal flow in a regulated business. A capable model reads the account, understands the customer’s intent, decides a retention offer is warranted, and applies it. It also, in the course of being helpful, rephrases a required disclosure to sound friendlier, drops a consent step that felt redundant, and writes the change to the system of record. Every individual decision was reasonable. The customer was happy. And the company just shipped a compliance violation at machine speed, with no clean record of how it happened.

Nothing in that story is a failure of intelligence. The model was good at its job. The failure is architectural. A probabilistic system was handed an irreversible, regulated action and allowed to improvise it. That is the mistake the next decade of enterprise software has to design out, and the principle that designs it out is simple enough to state in one line.

AI evaluates. Deterministic systems execute. Humans govern.

We think this is becoming the organizing axiom of the AI stack. Not a slogan, a division of labor. Each clause names a different kind of work, and the systems that endure will be the ones that keep those three kinds of work cleanly separated.

The first clause is already settled

There is no point arguing about whether AI evaluation spreads everywhere. It does, and it should. The evidence is overwhelming and it compounds every quarter. As Ashu Garg of Foundation Capital has noted, AI now writes the large majority of new code at Google and well over ninety percent at Anthropic. The first-year tracks that used to be safe on-ramps, in programming, banking, consulting, and law, are being absorbed into model output. This is the zeitgeist, and nothing stops it.

We want to be unambiguous here, because the rest of this piece pushes in a direction that gets misread as caution. We are not AI skeptics. Evaluation, judgment, interpretation, synthesis, the reading of a messy situation and the proposal of what to do about it, is exactly what these models are becoming extraordinary at. Anywhere the job is to assess, recommend, classify, draft, or decide what should happen, AI belongs in the loop and increasingly belongs at the center of it. That half of the axiom is not in dispute.

The interesting question is what happens at the moment of action.

The clause everyone skips

The reflexive reading of the axiom is “keep a human in the loop.” That is not the point, and it is not the hard part. The hard part is the middle clause, the one most people read past on the way to the comfortable conclusion.

The danger is not AI making judgments. The danger is a probabilistic system being trusted to execute consequential, irreversible actions directly. The discipline is to separate the judgment from the act. AI evaluates the situation and proposes the action. A deterministic system carries out the action inside hard constraints, so that what executes is verifiable, repeatable, bounded, and auditable. The act becomes the same every time the inputs are the same, and you can prove afterward exactly what ran and why it was allowed to.
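To make the split concrete, here is a minimal sketch of judgment separated from action. Everything in it is hypothetical and for illustration only: the `Proposal` shape, the `ALLOWED_ACTIONS` table, and the 20 percent cap are assumptions, not a description of any particular product. The model's output is treated as data, and a plain function decides whether it runs.

```python
from dataclasses import dataclass

# Hypothetical constraint table: the only actions the executor may perform,
# each with a hard upper bound. People own this table, not the model.
ALLOWED_ACTIONS = {"apply_retention_discount": 20}  # max discount, in percent


@dataclass(frozen=True)
class Proposal:
    """What the model suggests. It is data, not an instruction that runs itself."""
    action: str
    amount: int
    rationale: str  # free text from the model, kept for the record only


def execute(proposal: Proposal) -> dict:
    """Accept or reject a proposal against fixed constraints.

    The result depends only on the action and amount, so the same
    proposal always produces the same outcome.
    """
    limit = ALLOWED_ACTIONS.get(proposal.action)
    if limit is None:
        return {"status": "rejected", "reason": "action_not_allowed"}
    if not 0 < proposal.amount <= limit:
        return {"status": "rejected", "reason": "amount_out_of_bounds"}
    # A real system would write to the system of record at this point.
    return {"status": "executed", "action": proposal.action, "amount": proposal.amount}
```

In this sketch, a proposal to rewrite a disclosure is rejected because that action is simply not in the table. The model's rationale is recorded but never consulted when deciding whether the action may run.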

This is not distrust of model capability, and that distinction matters. A model that is 99.9 percent reliable is still the wrong thing to put in charge of an irreversible action at scale, because the damage from the failing tenth of a percent is unbounded. You do not know in advance which case will fail, and you cannot take the action back. Determinism is not a verdict on how smart the model is. It is a decision about where you are willing to place irreversibility. You put the irreversible step behind a system whose behavior you can guarantee, and you let the model do everything up to that line.

Said plainly: let the model be brilliant about deciding what to do, and let a deterministic system be boring about doing it.

Engineers already bet their lives on this

The axiom sounds novel only because AI makes it feel new. In truth it is the codification of something that mature, high-stakes fields have practiced for decades. The pattern predates the technology by a long way.

Consider payments. A ledger is deterministic by construction: double-entry, idempotency keys, exactly-once settlement. AI can classify a transaction, draft the journal entry, and flag the fraud. The posting itself is deterministic, because nobody wants a probabilistic account balance. The model advises right up to the ledger, and the ledger does not improvise.
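As a rough illustration of what "deterministic by construction" can look like, the sketch below posts balanced double-entry pairs and uses an idempotency key so that a retried request has no second effect. The `Ledger` class, its sign convention, and the account names are assumptions made for this example. It is an in-memory toy, not a settlement system.

```python
class Ledger:
    """Toy double-entry ledger with idempotent posting. Amounts are integer cents."""

    def __init__(self):
        self.balances = {}  # account name -> signed balance in cents
        self.posted = {}    # idempotency key -> the entry posted under that key

    def post(self, key: str, debit: str, credit: str, amount_cents: int) -> bool:
        """Post one balanced entry. Returns False if the key was already posted."""
        if amount_cents <= 0:
            raise ValueError("amount must be a positive number of cents")
        entry = (debit, credit, amount_cents)
        if key in self.posted:
            if self.posted[key] != entry:
                raise ValueError("idempotency key reused with a different entry")
            return False  # retry of an entry that already posted: no second effect
        # Debits add and credits subtract, so all balances always sum to zero.
        self.balances[debit] = self.balances.get(debit, 0) + amount_cents
        self.balances[credit] = self.balances.get(credit, 0) - amount_cents
        self.posted[key] = entry
        return True
```

A model could draft the `debit`, `credit`, and `amount_cents` values. Whether and how they post is decided by code that behaves the same way on every call.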

Consider modern infrastructure. Tools like Terraform and Kubernetes are declarative systems that reconcile the world toward a desired state in a deterministic, convergent way. A coding assistant can author the manifest, and increasingly does. The control plane executes it. The intelligence sits in authoring. The guarantees sit in execution. The two are deliberately not the same layer.

Consider aviation. Flight control laws are deterministic, and the software that implements them is certified against standards like DO-178C. AI can advise on weather, routing, and traffic. Envelope protection, the logic that keeps the aircraft inside its safe operating limits, executes deterministically, and the pilots govern, with authority to override. The three-way split is almost exactly our axiom, written in an industry where the cost of getting it wrong is measured in lives.

Consider access control. Authorization is deterministic policy evaluation. AI is excellent at detecting anomalies and recommending policy changes. You still do not want a language model probabilistically deciding, in the moment, whether to grant root. Enforcement is deterministic on purpose. This is the same instinct that produced the recent wave of “systems of decision” thinking in security, which spun directly out of Foundation Capital’s work: detection and recommendation are probabilistic, but the enforced decision has to be something you can stand behind.

In each case the field did not arrive at determinism out of nostalgia or distrust of automation. It arrived there because the action was irreversible and the cost of an unbounded error was unacceptable. Determinism is where these systems chose to be certain.

Where the pattern stops being optional

So far, determinism is an engineering choice, adopted because it is wiser. In regulated customer-facing workflows, it stops being a choice. It becomes the law.

This is the sharpest edge of the whole pattern, and it is where the stakes change character. In most domains a paraphrasing, improvising model is a quality risk. In a regulated transaction it is a liability, because the regulator does not grade on reasonableness. The rule is the rule, and the proof that you followed it is part of the product.

A few concrete surfaces:

Disclosures and consent in financial services.

Regimes like Truth in Lending and E-SIGN require that specific language be presented, in a specific way, with consent captured and timestamped. AI can interpret the customer’s intent and route the conversation beautifully. The disclosure and the consent capture have to be deterministic and identical every time. A model that “improves” the wording of a mandated disclosure has not improved the experience. It has manufactured a violation.

Insurance quoting and binding.

Under filed-rate rules, the rates a carrier may charge are the ones approved by the state, applied exactly. AI can guide, pre-qualify, and explain. The premium calculation and the bind must execute deterministically against the filed rates, and an adverse action, a denial, has to come with specific, auditable reasons, not a model’s freeform paraphrase of why it felt like a no.

Telecom cancellation and number porting.

Rules govern how a cancellation must run and the windows in which a port must complete. AI can personalize the entire conversation. It cannot quietly reshape the regulated action or introduce the kind of friction the rule exists to forbid.

Identity, KYC, and AML.

The model scores risk and surfaces suspicious patterns extremely well. The decision to place a hold, fail verification, or escalate has to follow documented, deterministic rules, because the examiner wants to see the rule that fired, not a sentence that begins “the model found it suspicious.”

The through-line is the same in every one of these. AI is genuinely excellent at understanding the customer and deciding what should happen. But the completion of the action, the disclosure actually shown, the consent actually captured, the transaction actually written to the system of record, the audit trail actually produced, has to be deterministic and verifiable. Otherwise it is neither compliant nor defensible, and in a regulated business those are not separate problems.
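One way to picture that through-line is a disclosure path that no model output can reach. The sketch below is a simplified assumption, not a statement of what any regulation requires. The disclosure text is a placeholder, and `present_disclosure` and `capture_consent` are hypothetical names. The approved text is served verbatim, and the consent record stores a hash of exactly what was shown, along with a UTC timestamp.

```python
import hashlib
from datetime import datetime, timezone

# Placeholder only. Real mandated language would come from compliance-approved content.
DISCLOSURE_V1 = "PLACEHOLDER DISCLOSURE TEXT v1. Not real regulatory language."


def present_disclosure() -> str:
    """Return the approved text verbatim. No model output reaches this path."""
    return DISCLOSURE_V1


def capture_consent(customer_id: str, shown_text: str, accepted: bool, now=None) -> dict:
    """Build a consent record, refusing any text that differs from the approved version."""
    if shown_text != DISCLOSURE_V1:
        raise ValueError("shown text differs from the approved disclosure")
    captured_at = now or datetime.now(timezone.utc)
    return {
        "customer_id": customer_id,
        "disclosure_sha256": hashlib.sha256(shown_text.encode("utf-8")).hexdigest(),
        "accepted": accepted,
        "captured_at": captured_at.isoformat(),
    }
```

A friendlier paraphrase fails the equality check and never becomes a consent record, which is the behavior the renewal story at the top of this piece was missing.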

The honest counterargument

The strongest objection deserves a real answer, not a strawman. It goes: models are getting more reliable every month, so why bottleneck them behind a rigid execution layer? Why not let a sufficiently good model just do the thing?

Two reasons, and neither is about the model being dumb.

  1. First, the properties you need at the moment of action are not properties of intelligence. They are verifiability, reproducibility, bounded liability, and accountability. A more capable model does not give you any of these for free. A system can be smarter and still be something you cannot replay, cannot bound, and cannot defend. Reliability raises the average. It does not cap the worst case, and irreversible actions are governed by the worst case.
  2. Second, in regulated flows the standard is categorical. “The model was right almost every time” is not a defense an examiner accepts. The control either fires deterministically and provably, or it does not exist as a control. No amount of additional model quality converts a probabilistic action into a provable one. Only architecture does that.

This is why we frame determinism as a placement decision rather than a limitation. You are not holding the model back. You are deciding, deliberately, which step in the workflow is the one you refuse to leave to chance.

What governance actually means

The third clause, humans govern, is the one most likely to be sentimentalized into “humans stay in the loop to feel better.” That undersells it.

Governance is ownership of the boundaries. Humans decide which actions must be deterministic, what the constraints are, which exceptions are permitted and who may grant them, and who is accountable when something goes wrong. The model proposes, the deterministic layer executes within those boundaries, and humans own the boundaries themselves. In his recent advice to new graduates, Garg made the same point from the other direction: AI can complete tasks, but it cannot decide which tasks are worth doing, how they fit together, or whether the output is any good. Judgment, taste, and direction stay out front. That is governance.

Governance also has to scale, and this is where Foundation Capital’s most useful contribution comes in. Their thesis on decision traces and context graphs is, at heart, a theory of how governance becomes durable. Enterprise systems record what happened, the final price, the approved discount, the escalated ticket, but almost never why it was allowed to happen: which exception applied, which precedent governed, who approved the deviation. Capturing that reasoning as first-class data is what turns a one-off human judgment into reusable, auditable precedent. A deterministic execution layer is the natural place to capture it, because it sits in the execution path at the moment of action, exactly where the “why” is still visible. Determinism and decision traces are two halves of the same governance story. One makes the action provable. The other makes the reasoning behind it searchable.
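As a hedged sketch of what capturing the "why" as first-class data might look like, the example below records a small trace at the moment of execution and lets later readers search it. The field names (`rule_id`, `exception_id`, `approved_by`) and the `TraceLog` class are illustrative assumptions. They are not taken from Foundation Capital's writing or from any specific product.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionTrace:
    """The 'why' behind one executed action, captured when it runs."""
    action: str
    rule_id: str                  # the deterministic rule that allowed the action
    exception_id: Optional[str]   # the exception applied, if any
    approved_by: Optional[str]    # the person who approved a deviation, if any
    model_rationale: str          # the model's proposal text, kept as context


class TraceLog:
    """Append-only, in-memory log of decision traces stored as JSON lines."""

    def __init__(self):
        self._lines = []

    def record(self, trace: DecisionTrace) -> None:
        # Sorted keys keep the serialized form stable for identical traces.
        self._lines.append(json.dumps(asdict(trace), sort_keys=True))

    def find(self, **criteria) -> list:
        """Return every trace whose fields match all of the given values."""
        rows = [json.loads(line) for line in self._lines]
        return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]
```

With traces like these, a question such as "which discounts were approved under a given exception, and by whom" becomes a lookup rather than an investigation.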

The completion and compliance layer

Put the three clauses together and an architecture falls out of them. Evaluation belongs to the model. Execution belongs to a deterministic layer that completes the regulated action and proves it. Governance, the boundaries and the accountability, belongs to people, supported by a durable record of how decisions were actually made.

This is the layer we work on at Callvu, and we will leave it at that, because the argument does not need us to be the conclusion. The conclusion is structural. If you run customer-facing workflows in telecom, insurance, financial services, or healthcare, you cannot let a probabilistic system freewheel through a regulated transaction. You need something that completes the action deterministically and can show its work. Whether you build that or buy it, the requirement is the same.

The discipline of the next decade

The first clause of the axiom is the zeitgeist, and it will take care of itself. AI evaluation is going into everything, and that is good. The discipline, the part that separates systems you can trust from systems that merely impress, lives in the other two clauses. Decide where you will be certain. Keep the act separate from the judgment. Own the boundaries, and keep a record of why each one is where it is.

AI evaluates. Deterministic systems execute. Humans govern. It reads like an observation. It is really an instruction, and the companies that take it literally are the ones that will still be standing when the novelty wears off and only the architecture remains.

Building AI workflows in regulated industries? If you’re grappling with how to let models evaluate brilliantly while keeping execution deterministic and humans in governance, contact Callvu.
