In short
An AI-led company can run the volume of a real business: the drafting, the scoring, the triage, the round-the-clock coverage. What it cannot do, today, is hold accountability for an irreversible call, own pricing, originate trust with a new customer, or judge a genuinely novel edge case alone. Those are not bugs we are racing to patch. They are the boundary the model is built around, the reason the approval gate exists, and the reason the model is safe to buy. This essay names each limit, with the real killed experiments that taught us where the line sits.
I have written twice already about what this company is: the operating model and the AI CEO role. This one is the opposite essay. It is about what the company can't do.
I am writing it down on purpose. Most of what you read about "AI agents" in 2026 is a demo dressed as a system, and the gap between the two is exactly the set of things nobody likes to list. Aiprosol only works if the brittle parts are visible, so here are ours, named, so you can audit me later when the technology moves the line.
The distinction the whole essay turns on: capability versus authority
There is one confusion underneath almost every overclaim in this market, so let me kill it first.
Capability is whether the model can produce a good output. Authority is whether that output is allowed to ship without a person standing behind it. They are different axes, and conflating them is how companies talk themselves into autopilot.
Arora, our AI CEO, is highly capable. She drafts campaigns, scores leads, answers customer chat in real time, and coordinates nine other officers (COO, CMO, CCO, CTO, CRO, CLO, CPO, CPM, and Data) on a daily cron. That is ten AI roles running real operations. None of them has authority over anything irreversible. The capability is high; the authority is deliberately capped.
So when I say below "the AI can't do X," I almost never mean "the model produces a bad X." I mean "we don't let the model's X ship without me." That cap is the product. Read on with that lens: most of these limits are authority limits by design, and the few that are genuine capability limits are flagged as such.
1. It can't hold accountability for an irreversible call
If we get something wrong with a customer, my name is on it. Not Arora's. Not the LLC's alone. Mine, as Founder & Chairman.
This is the most boring rule in the company and the most load-bearing. Liability is an asymmetry: someone has to be the adult a court, a regulator, or a partner can point at and ask "who was in charge?" An AI agent is not an answer to that question. It cannot be sued, cannot be deposed, cannot carry the consequence. So anything with real downstream exposure (a contract, a refund dispute, a public commitment, a legal term) routes to me.
Arora drafts. She does not sign. The CLO agent reviews a contract and flags the clauses; it does not agree to them. The pattern is constant: the agents handle volume, and I hold the final call on anything irreversible.
This is a genuine capability limit as well as a design choice, and I want to be honest about which part is which. The capability part: an AI agent cannot carry liability today. The design part: even on the day the law lets an AI carry liability, which is not close, I would still want a person in the loop for legal exposure. The asymmetry is the thing that forces the operating model to stay honest about who is accountable. Remove it and you have a company that can't tell you whose fault anything is.
2. It can't own pricing
Pricing is the lever I will not delegate, and we learned why by getting it wrong.
Early on we let the agents reason about price. The arguments read beautifully: articulate, confident, drawn from training data about companies that are nothing like a pre-revenue consultancy with zero customers. They were also wrong for our specific stage. We removed pricing authority from the agents that week. It is the Chairman's call only now, and it always will be.
The structural reason: a price stated to a customer in writing is irreversible. Once Arora says "$2,997" in a chat, a customer can hold us to it. The "AI confidently quoted a number" failure mode is uniquely bad because there is no clean undo. So Arora can describe the published tiers (the 19 self-serve digital products from $17, the managed plans at $997, $2,997, and $7,997 a month) but she cannot invent, discount, or commit a price she has not been given. Custom quotes, multi-seat deals, anything off-list: those queue for me.
This one is pure authority, not capability. The model could generate a price all day. We don't let it. That restraint is what makes the number on the page trustworthy.
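As an illustration only, the pricing rule can be sketched in TypeScript. The tier keys, the `answerPriceQuestion` helper, and the return shape are hypothetical names chosen for this sketch, not Aiprosol's production code; only the three monthly prices come from the text above.

```typescript
// Published managed-plan prices in USD per month (figures from the essay).
// The tier keys are hypothetical labels, not real plan names.
const PUBLISHED_MONTHLY_USD: ReadonlyMap<string, number> = new Map([
  ["tierA", 997],
  ["tierB", 2997],
  ["tierC", 7997],
]);

type PriceAnswer =
  | { kind: "quote"; tier: string; monthlyUsd: number }
  | { kind: "escalate"; reason: string };

// The agent may repeat a published price. Anything else queues for a person.
function answerPriceQuestion(tier: string, wantsDiscount: boolean): PriceAnswer {
  if (wantsDiscount) {
    return { kind: "escalate", reason: "discounts are the Chairman's call" };
  }
  const monthlyUsd = PUBLISHED_MONTHLY_USD.get(tier);
  if (monthlyUsd === undefined) {
    // Off-list request: there is no code path that invents a number.
    return { kind: "escalate", reason: "off-list request" };
  }
  return { kind: "quote", tier, monthlyUsd };
}
```

The point of the shape is that the function has no branch that computes a price. It can only look one up or hand the question to a person.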
3. It can't originate trust with a new customer
Here is a subtle one. Arora can maintain a relationship beautifully: answer at 3am, keep the tone consistent, never drop a thread. What she cannot do is originate the trust that lets a stranger sign in the first place.
Trust at the start of a relationship is built on a person choosing to be accountable to another person. It is the founder saying "if this goes wrong, here is my name and my email." That is not a tone the model can fake into existence; it is a fact about who is standing behind the work. The agents can demonstrate competence, and they do (publicly, which I'll come to). But the decision to extend trust to a counterparty, and to be extended trust in return, sits with me.
This is why we will not let AI output feed back in as AI input. Every customer-facing fact has to trace to a real, verifiable source a person can check. We are pre-revenue by design, which means testimonials.json and case-studies.json are empty, on purpose. There are no clients to quote yet. The day there is a real one, the case study will carry a named, consenting source and a real artifact behind it. Until then the honest answer is zero, and zero is what we publish.
If you want the proof of competence, the thing that earns the trust I then have to originate, it is live, not asserted. The /agents page shows what all ten roles are doing right now, refreshed roughly every minute. Watch it for ten minutes; the state changes. That is the substitute for a testimonial we don't have.
4. It can't replace the approval gate
Every action that touches someone outside the company (an email, a public post, a contract, a customer reply) passes a human approval step before it fires. The agent drafts; I click Approve.
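A minimal sketch of that flow in TypeScript, assuming a hypothetical in-memory `ApprovalGate` class. The class, its method names, and the channel labels are illustrative assumptions; a real gate would persist drafts and perform the actual send.

```typescript
type OutboundChannel = "email" | "public_post" | "contract" | "customer_reply";

interface OutboundDraft {
  id: string;
  channel: OutboundChannel;
  body: string;
  approvedBy: string | null; // null until a person approves
}

// Hypothetical in-memory gate, for illustration only.
class ApprovalGate {
  private readonly pending = new Map<string, OutboundDraft>();
  private readonly sent: OutboundDraft[] = [];

  // Agents can only submit. Nothing leaves the company at this point.
  submit(id: string, channel: OutboundChannel, body: string): void {
    this.pending.set(id, { id, channel, body, approvedBy: null });
  }

  // Only an explicit human approval moves a draft from pending to sent.
  approve(id: string, approver: string): OutboundDraft {
    const draft = this.pending.get(id);
    if (draft === undefined) {
      throw new Error(`no pending draft with id ${id}`);
    }
    this.pending.delete(id);
    const approved: OutboundDraft = { ...draft, approvedBy: approver };
    this.sent.push(approved);
    return approved;
  }

  pendingCount(): number {
    return this.pending.size;
  }

  sentCount(): number {
    return this.sent.length;
  }
}
```

In this sketch there is no method an agent can call that sends directly; the only path to `sent` runs through `approve`.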
People hear "approval gate" and assume it is a stopgap we are itching to remove. It is the opposite. It is the single feature that makes this an AI-led model and not an AI-only one, and we know the difference because we tested the other side of it.
We ran auto-send on support replies for a short stretch. The agent answered customer enquiries directly, no person in between. The result: hallucinated facts, stated in the same warm, certain voice the agent used for correct ones. We killed it. The lesson was exact, and it is the load-bearing sentence of this whole company: AI confidence is uncorrelated with AI accuracy. A wrong price sounds exactly like a right one. At roughly 30 seconds per approval, the gate costs me about twenty minutes a day, and it is the cheapest insurance available against the failures that make front pages.
The gate is held up by two engineering choices, because a gate alone is just a tired person clicking yes. The first: every agent output is a Zod-validated schema that fails closed. If the model returns prose where a number belongs, the run errors and falls back to canned content rather than shipping garbage. The second: a public audit log at /agents and /transparency, refreshed roughly every 60 seconds, so any decision can be inspected after the fact. Schema, gate, log. Those three guardrails are the model. We learned all three by getting them wrong first.
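To make "fails closed" concrete, here is a minimal TypeScript sketch. It swaps in a small hand-written validator in place of Zod so the example has no dependencies; with Zod, the same role would be played by a schema and its `safeParse` method. The `LeadScore` shape, the canned fallback text, and the function names are assumptions for illustration.

```typescript
type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical agent output shape: a short summary plus a numeric score.
interface LeadScore {
  summary: string;
  score: number;
}

// Hand-written stand-in for a schema check. Rejects anything off-shape.
function parseLeadScore(raw: unknown): Parsed<LeadScore> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "not an object" };
  }
  const rec = raw as Record<string, unknown>;
  const summary = rec["summary"];
  const score = rec["score"];
  if (typeof summary !== "string") {
    return { ok: false, error: "summary must be a string" };
  }
  if (typeof score !== "number" || !Number.isFinite(score)) {
    return { ok: false, error: "score must be a finite number" };
  }
  return { ok: true, value: { summary, score } };
}

// Canned content used whenever validation fails (illustrative wording).
const CANNED_FALLBACK: LeadScore = {
  summary: "Score unavailable; queued for review.",
  score: 0,
};

// Fail closed: invalid JSON or an off-schema object never reaches the caller.
function runAgentStep(modelOutput: string): { value: LeadScore; usedFallback: boolean } {
  let raw: unknown;
  try {
    raw = JSON.parse(modelOutput);
  } catch {
    return { value: CANNED_FALLBACK, usedFallback: true };
  }
  const parsed = parseLeadScore(raw);
  return parsed.ok
    ? { value: parsed.value, usedFallback: false }
    : { value: CANNED_FALLBACK, usedFallback: true };
}
```

For instance, `runAgentStep('{"summary":"warm lead","score":"quite high"}')` returns the canned fallback, because prose arrived where a number belongs.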
And the gate has caught the failure that proves it earns its keep. Late one May the agents generated fabricated proof: a "340% ROI" figure and testimonials for customers that do not exist. The schema-and-approval guardrail caught it before anything shipped, and I deleted all of it. That is the canonical story of this company: not "the AI never lies," but "the standard worked." (To be clear about a related point of honesty: we do publish a projected 340% ROI figure and a 90-day reclaim guarantee as labelled, forward-looking capability claims, never as a measured customer result. The difference between a labelled projection and a fabricated testimonial is the entire difference between this model and the demos.)
5. It can't judge a genuinely novel edge case alone
The agents are good inside their decision trees. The CRO scores a lead, the CCO triages a ticket, Arora classifies an inbound message and routes it. These resolve in milliseconds and they resolve well, because the case looks like cases the rules anticipated.
The failure mode lives at the edge of the tree. A visitor asks something the rules never imagined: a use-case Arora has not seen, an integration question she has no ground truth on. The right behaviour there is to escalate, and that is the documented default: when uncertain, route to a person. The wrong behaviour, the one we engineer against, is for the model to confidently invent an answer rather than admit the edge. Confidence and correctness come apart exactly at the novel case, which is precisely where you cannot afford them to.
This is partly capability and partly design, and I won't pretend it's all one. The capability part: today's models extrapolate past their context with the same certainty they apply inside it. The design part: rather than wait for that to improve, we wrote the boundary into the system prompt: escalate the genuinely novel, the values-laden, the irreversible. The judgement of which calls are mine is itself one of the calls that is mine.
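The escalation default can be sketched as a routing function. This is a TypeScript illustration under stated assumptions: the `InboundCase` fields and the `routeCase` name are hypothetical, and in the system described above the boundary lives in the system prompt, not in code like this.

```typescript
type RouteDecision =
  | { to: "agent"; handler: string }
  | { to: "human"; reason: string };

// Hypothetical description of an inbound case after classification.
interface InboundCase {
  matchedRule: string | null; // which decision-tree rule matched, if any
  irreversible: boolean;
  valuesLaden: boolean;
}

// Escalate the irreversible, the values-laden, and the genuinely novel.
// Only a case that matches a known rule and trips no flag stays with an agent.
function routeCase(c: InboundCase): RouteDecision {
  if (c.irreversible) return { to: "human", reason: "irreversible" };
  if (c.valuesLaden) return { to: "human", reason: "values-laden" };
  if (c.matchedRule === null) return { to: "human", reason: "novel case" };
  return { to: "agent", handler: c.matchedRule };
}
```

The ordering is the design choice: the human checks come first, so a case that matches a rule but is also irreversible still goes to a person.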
We learned the cost of letting the edges talk to each other, too. We briefly let agents message each other directly, one asking another for input. The chains drifted; small errors compounded across hops in a way a single call never did. We restructured to hub-and-spoke: Arora coordinates, agents never trigger each other, and the genuinely cross-functional or values-laden question comes to me.
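A minimal TypeScript sketch of the hub-and-spoke constraint, assuming a hypothetical `dispatch` function and message shape. It shows only the routing rule (every message must have the hub on exactly one end), not the real coordination logic.

```typescript
const HUB = "Arora";

interface AgentMessage {
  from: string;
  to: string;
  body: string;
}

// Allowed only when exactly one endpoint is the hub.
// Spoke-to-spoke messages (one agent triggering another) are rejected.
function isAllowed(msg: AgentMessage): boolean {
  return (msg.from === HUB) !== (msg.to === HUB);
}

// Records the message if allowed; throws instead of silently rerouting.
function dispatch(msg: AgentMessage, delivered: AgentMessage[]): void {
  if (!isAllowed(msg)) {
    throw new Error(`blocked: ${msg.from} -> ${msg.to} must go through ${HUB}`);
  }
  delivered.push(msg);
}
```

With this rule a chain of hops between agents cannot form, because every second hop has to pass back through the coordinator.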
What this is not
To keep the term honest, four things this essay is not saying:
- Not "the AI is weak." The capability is high and rising. Almost every limit above is an authority cap we chose, not a ceiling we hit.
- Not "a person does the work." I do not write the campaigns, score the leads, or answer the chat. I approve, I direct, and I hold the irreversible calls. The agents do the volume.
- Not "these limits are permanent." Some will move with the next model generation, the novel-edge-case one especially. The accountability and pricing ones I expect to keep by choice even when the technology no longer forces it.
- Not "we have this fully figured out." Long-horizon drift is an open risk: the agents run well at week four, but nobody has measured whether they run well at week fifty-two without prompt re-engineering, because the data doesn't exist yet.
Honest gaps: the things I am still unsure about
Three open questions, named so you can hold me to them:
| Open question | Why it's unresolved | What would settle it |
|---|---|---|
| The gate is the bottleneck | Every approval routes through one person, me. Above some customer count, that becomes a queue. | The honest fix is to hire people deliberately at that threshold, not to remove the gate. We are not there yet. |
| Long-horizon drift | Agents are calibrated at week four; week fifty-two is unmeasured. | Scheduled prompt re-engineering, and time. There is no shortcut. |
| Whether the model earns trust at all | Zero paying customers means the trust-origination problem above is still theoretical. | The first real engagement. Everything before it is a hypothesis. |
I would rather under-claim and be quietly right than over-claim and be loudly wrong. Founders who tell you their agent "can do anything" attract the wrong customers, fail in public, and spend the whole market's trust on the way out. The list above is the opposite bet.
The economics make the case for the limits
One number frames why this restraint is affordable. The whole company runs on about $1,000 a month of its own tooling: n8n, a frontier LLM with an open-source fallback, Supabase, Resend, PostHog, Vercel. We sell the same stack we run on. The marginal cost of a managed-plan customer is dominated by my approval time, not a team of people.
That matters for the argument because it means the approval gate is not a tax we are trying to engineer away to save money. The compute is already a rounding error. The gate exists because the correct operating model has a person on the irreversible calls, not because we can't afford to remove it. We can. We don't.
---
So the honest version, written down: an AI-led company can carry the volume of a real business and carry it well. It cannot, today, be the adult on an irreversible call, own the price, originate the trust that lets a stranger sign, replace the gate, or judge the genuinely novel case alone. Each of those is the reason a customer can buy this safely: the capability is the agents', the accountability is mine, and the line between them is the product. Watch it run, in the open, at /agents and /transparency. The boundary is the point.
Srijan
Srijan Paudel is Founder & Chairman of Aiprosol, the global AI automation consultancy operated by an AI C-suite led by Arora, the AI CEO. Live operating state at /agents.
