Token Costs, Governance, Persistent Memory: The Three Scars of Running Virtual Employees

Veeral Lakhani | May 19, 2026 | 9 min read | virtual employees, governance, token economics

Three failure modes break most AI deployments before they reach scale. We learned each one running a suite of Virtual Employees inside Reliable Group.

Easy to say. Hard to do. That is the work.

Saying "we will deploy Virtual Employees" is a sentence anyone can write on a slide. Running them in production for two years inside a regulated business is a different thing. Three failure modes break most efforts before they reach scale. Most vendors do not talk about them, because most vendors have not run AI in production at scale inside their own operations. We have. We have scars on each. This post is what we learned, in the order we learned it.

Scar 1: Token costs

The first time you run a Virtual Employee in production, the unit economics look beautiful. Twenty cents a task. Pennies for routine work that used to take a human five minutes. The CFO is happy. The slide deck is convincing. Then the prompt structure changes. Or the system upgrades to a longer-context model. Or someone adds a few extra steps to handle an edge case. Or the volume scales from a thousand a day to ten thousand. Suddenly the bill is fifteen times what the model said.

We learned this the hard way running our own AI BDR. The first version of the prompt was structured to give the model maximum context for every contact, on every call. The token consumption per task was acceptable at low volume. When volume scaled and we added longer-context retrieval to handle relationship history, the per-task cost went up roughly four times, and we did not see it for two weeks because the bill is monthly. By the time we noticed, we had spent more on the BDR’s compute that month than we would have spent on a junior human BDR.
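The arithmetic behind that kind of jump is simple enough to model before launch. Below is a minimal sketch of a per-task cost model. The token counts and per-million-token prices are hypothetical placeholders chosen to reproduce a roughly fourfold increase; they are not our actual figures or any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical token usage for one task run by a Virtual Employee."""
    input_tokens: int
    output_tokens: int


def cost_per_task(profile: TaskProfile,
                  usd_per_m_input: float,
                  usd_per_m_output: float) -> float:
    """Cost of one task, given prices in USD per million tokens."""
    return (profile.input_tokens / 1_000_000 * usd_per_m_input
            + profile.output_tokens / 1_000_000 * usd_per_m_output)


def monthly_cost(profile: TaskProfile, tasks_per_day: int,
                 usd_per_m_input: float, usd_per_m_output: float,
                 days: int = 30) -> float:
    """Projected monthly spend at a steady daily volume."""
    return cost_per_task(profile, usd_per_m_input, usd_per_m_output) * tasks_per_day * days


# Assumed profiles: the same task before and after long-context retrieval is added.
lean = TaskProfile(input_tokens=4_000, output_tokens=500)
heavy = TaskProfile(input_tokens=22_000, output_tokens=500)

# Assumed prices, for illustration only.
PRICE_IN, PRICE_OUT = 2.0, 8.0

ratio = cost_per_task(heavy, PRICE_IN, PRICE_OUT) / cost_per_task(lean, PRICE_IN, PRICE_OUT)
print(f"per-task cost multiplier: {ratio:.1f}x")
print(f"monthly at 1,000/day (lean):   ${monthly_cost(lean, 1_000, PRICE_IN, PRICE_OUT):,.2f}")
print(f"monthly at 10,000/day (heavy): ${monthly_cost(heavy, 10_000, PRICE_IN, PRICE_OUT):,.2f}")
```

The point of the model is that a prompt change and a volume change multiply each other. Either one looks tolerable on its own. Together they produce the bill nobody forecast.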

Three things changed after that.

1. Prompt-structure economics became a design constraint, not an afterthought. Every Virtual Employee now goes through a prompt-structure review before it ships. Context that does not change per task gets cached. Context that changes per task gets retrieved on demand, not stuffed into every call.

2. Surprise bills stopped being a thing. Every Virtual Employee runs against a monthly token-cost budget with a 70 percent alert and a 100 percent hard cap.

3. Unit-economics models live before the Virtual Employee does. Before we instantiate a Virtual Employee for a client, we build the cost model. The client sees the numbers before signing. The system goes live with budget controls already in place.
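The 70 percent alert and 100 percent hard cap can be sketched as a small guard that every task passes through before it is allowed to spend. This is an illustrative sketch, not our production code: `TokenBudget`, `BudgetStatus`, and `BudgetExceeded` are hypothetical names, and a real deployment would persist the running total and reset it each billing month.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    ALERT = "alert"      # at or past the alert threshold
    CAPPED = "capped"    # at the monthly limit; no further spend allowed


class BudgetExceeded(RuntimeError):
    """Raised when a task would push spend past the hard cap."""


@dataclass
class TokenBudget:
    monthly_limit_usd: float
    alert_fraction: float = 0.70   # the 70 percent alert
    spent_usd: float = 0.0

    def status(self) -> BudgetStatus:
        if self.spent_usd >= self.monthly_limit_usd:
            return BudgetStatus.CAPPED
        if self.spent_usd >= self.alert_fraction * self.monthly_limit_usd:
            return BudgetStatus.ALERT
        return BudgetStatus.OK

    def charge(self, estimated_cost_usd: float) -> BudgetStatus:
        """Record spend for a task, refusing it if it would cross the hard cap."""
        if self.spent_usd + estimated_cost_usd > self.monthly_limit_usd:
            raise BudgetExceeded(
                f"task costing ${estimated_cost_usd:.2f} would exceed the "
                f"${self.monthly_limit_usd:.2f} monthly cap"
            )
        self.spent_usd += estimated_cost_usd
        return self.status()
```

The design choice worth noting is that the check runs before the call, on an estimate, not after the invoice. An alert that fires mid-month is useful. One that arrives with the bill is not.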

The lesson: token costs are an engineering discipline, not an accounting line. Build it in or pay for the lesson.

Scar 2: Governance

The first time a Virtual Employee makes a wrong call in a regulated workflow, you have a serious problem. Not because the Virtual Employee was wrong. Wrong calls happen. Humans make wrong calls too. The serious problem is the answer to the question "who approved that?" If the answer is "the Virtual Employee approved itself," the regulator is going to have a problem with you. If the answer is "we have a policy that requires human review but I cannot show you it ran on this case," the regulator is going to have a problem with you. If the answer is "here is the audit log showing the approval chain, the human reviewer, the timestamp, and the decision criteria," you are fine.

We learned this on our internal AI General Counsel, which triages every contract that comes through the company. The first version flagged risk levels and produced a recommendation. It worked well. Then we hit a contract where the recommendation was a low-risk pass and it should not have been. A human reviewer would have caught it. The system was not designed to require human review on low-risk passes. We caught the issue in a Friday review session, before the contract was signed. We were lucky.

We rebuilt the governance pattern after that. Every Virtual Employee now has, on day one, six documented elements. Scope. Owner. Approval path. Audit trail. Token-cost budget. Persistent memory architecture. The approval path matters most. For any output that touches a regulated workflow, a high-dollar decision, or an external commitment, a human owns the call. The Virtual Employee makes the recommendation. The human signs.
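As a rough illustration, the six elements and the approval rule can be written down as data plus one routing function. Everything here is a hypothetical sketch: the class names, field names, and the dollar threshold are assumptions made for the example, and the threshold in particular is a policy decision each company sets for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualEmployeeCharter:
    """The six elements documented on day one."""
    scope: str
    owner: str                       # a specific named human inside the company
    approval_path: str
    audit_trail: str
    monthly_token_budget_usd: float
    memory_architecture: str


@dataclass(frozen=True)
class Output:
    """Attributes of one Virtual Employee output that drive routing."""
    regulated_workflow: bool
    dollar_value: float
    external_commitment: bool


def requires_human_signoff(output: Output, high_dollar_threshold_usd: float) -> bool:
    """A human owns the call if any one of the three triggers applies."""
    return (
        output.regulated_workflow
        or output.dollar_value >= high_dollar_threshold_usd
        or output.external_commitment
    )


# Example: a small internal draft passes through; a contract recommendation does not.
internal_draft = Output(regulated_workflow=False, dollar_value=0.0, external_commitment=False)
contract_pass = Output(regulated_workflow=False, dollar_value=5_000.0, external_commitment=True)

print(requires_human_signoff(internal_draft, high_dollar_threshold_usd=50_000.0))
print(requires_human_signoff(contract_pass, high_dollar_threshold_usd=50_000.0))
```

Note that the rule keys off what the output touches, not off the risk level the Virtual Employee assigned to it. A low-risk label does not remove the signature.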

We now apply three governance rules to every client engagement.

1. The owner is named, on day one, before the Virtual Employee runs. Not the vendor. Not the platform. A specific human inside the company.

2. The audit trail is engineered, not promised. Tamper-evident for regulated workflows. Retained for the regulatory retention period.

3. The approval path is documented, sampled, and tested. Not just listed in a policy. Actually used. Tested quarterly with mock scenarios.

The lesson: governance is what makes a Virtual Employee a real role on the org chart. Skip it and you have a chatbot that produces output a regulator will eventually look at.

Scar 3: Persistent memory

A Virtual Employee that starts from zero every morning is a chatbot. A Virtual Employee that holds your business context across months of work is a system. The gap between the two is the difference between a pilot that never reaches production and an asset that compounds.

We learned this on our internal AI Recruiter. The first version was good at drafting job descriptions and summarizing candidate screens. It was bad at recognizing that the candidate it was screening on Tuesday had already been screened by a different Virtual Employee two months earlier and ruled out for a different role. The pattern repeated for weeks before a human noticed. We were doing duplicate work and producing inconsistent decisions because the system had no memory.

The fix was not "add memory." The fix was an architecture decision that took three months to ship.

1. Persistent memory has to be defined storage, not chat history. A separate system that holds canonical facts about candidates, contracts, claims, customers. The chat history is volatile. The memory store is the source of truth.

2. Retrieval has to be deterministic. When the Virtual Employee processes a new case, it looks up the canonical record first. Rule-based, not vibe-based.

3. Update rules have to be explicit. When a Virtual Employee learns something new, the update is written to the canonical store with a timestamp and a source. Not to the model. Not to the prompt. To the store.

The lesson: persistent memory is the architecture decision that determines whether you are running a system or a demo.

How Reliable Group earned scars on each of these

We run a suite of Virtual Employees inside Reliable Group every day. Our AI BDR surfaces deal signals from thousands of sources. Our AI CFO pressure-tests pricing decisions. Our AI Recruiter drafts requisitions, screens candidates, schedules interviews. Our AI General Counsel triages every contract. Our AI Marketing drafts content. Our AI Board of Directors sits monthly and challenges the leadership team on capital allocation. The suite grows every quarter. Every Virtual Employee has the six documented elements. Every one has been through the token-cost lesson, the governance lesson, and the persistent memory lesson. The patterns we ship to clients today are the patterns we tested on ourselves first.

What to ask any AI vendor before signing

Five questions. If the vendor cannot answer all five with specifics, the engagement is going to fail in one of the three ways above.

1. Show me your token-cost model for this workflow.

2. Who is the named owner of the Virtual Employee’s output, and what is the approval path?

3. Where is the audit trail stored, what is in it, and how long is it retained?

4. What is the persistent memory architecture?

5. What does this look like inside your own operation?

Easy to say. Hard to do. That is the work.
