What actually has to change going from a Claude prototype to production?

Four things change predictably: the data layer (usually a fiction held in browser state or a hardcoded array, replaced with a real schema), the auth boundary (almost always wrong, replaced with server-side authentication and row-level security), the error path (silent by default, replaced with caught, logged and recoverable errors), and the cost model (unbounded until the bill arrives, replaced with rate limits, caching and budget caps). The screens usually stay; the foundation underneath gets rebuilt.

Is a working AI-built prototype the same as a working product?

No. The prototype passes the demo. The product has to survive real traffic. AI coding tools optimise for the demo: a single session, a single user, friendly inputs, no concurrency and no failure cases. What the prototype proves is the idea and the screens your team wants, and that is a real asset worth keeping. What it does not prove is that the system can hold more than one user, more than one record, or more than one Tuesday's worth of traffic.

What is the most common security flaw in a Cursor or Claude prototype?

API keys in the frontend bundle. We have found this on every Cursor-built prototype we have been called in to review. Not most. Every. View-source on the deployed app and search for sk-, Bearer, or the service name, and the key is there in plain text. The fix is to move the key to a small backend so it never leaves the server. This is usually the first thing we ship, often inside the first week.

Does a small internal tool need the same work as a public SaaS app?

No. Production is not a single bar. For a tool under 100 users inside one firm, the four changes (schema, auth, error handling, capped costs) plus backups and a staging environment are most of the work. A tool with thousands of paying customers needs that floor plus horizontal scaling, a deployment pipeline, multi-layer monitoring, an on-call rotation and often a SOC 2 or ISO 27001 path. The work scales to the risk. Most firms calling us need the floor.

How long does it take to make an inherited prototype safe to use?

The first two weeks always look the same. Week one is the read: we open the codebase, the database, the deployed app and the browser dev tools, and answer the four questions about data, auth, errors and cost. Week two is the first wave of fixes: secrets moved to the server, a real auth layer, a real schema behind the screens with existing data migrated, row-level security turned on, basic error logging and alerting, and capped API spend. After two weeks the prototype is no longer wide open.

From Claude prototype to production: what actually has to change

A working prototype built in Cursor, Claude or v0 is not a working product. The prototype passes the demo. The product has to survive real traffic. The four things that change between the two are predictable: the data layer (missing), the auth boundary (almost always wrong), the error path (silent by default), and the cost model (unbounded until the bill arrives). We’ve been called in to fix several of these over the last year for operator-builders across accounting, legal, wealth and trades. The pattern is consistent enough to publish.

This post is for the firm owner who has a tool a non-engineer built. It works on their laptop. The team is already using it. Nobody has the stomach to keep prompting until it becomes a system the business can run on.

What the prototype actually proves

The prototype proves the idea. That is a real asset. Most software fails at the idea stage, not the engineering stage, and the prototype is the cheapest way to find out whether the workflow even makes sense. We do not dismiss the work that produced it. The Claude session, the Cursor file tree, the v0 screen layouts: those are product decisions, and they are usually decent ones.

What the prototype does not prove is that the system can hold more than one user, more than one record, or more than one Tuesday’s worth of traffic. The reason is structural. AI coding tools optimise for the demo. The demo is a single session, single user, friendly inputs, no concurrency, no failure cases. The model writes the code that makes the demo work. Anything outside the demo is out of scope by default, and the model does not know to flag it.

Before any rebuild starts, we run a one-day read of the prototype. We answer two questions. What did the prototype actually prove about the workflow? And which of the four things below is missing or broken? The first answer is the asset. The second is the scope.

What actually changes going from prototype to production

The four changes are not a checklist of features. They are the four places where prototype thinking and production thinking diverge, in the order we hit them.

1. The data layer (missing)

In a working production system, the database is the spine. Every screen reads from it, every action writes to it, and the schema is the place where the rules of the business live. In a Claude or Cursor prototype, the database is usually a fiction. The data is held in browser state, in a local JSON file, in a Supabase table the model spun up to make the demo run, or in a hardcoded array that someone forgot to replace.

The gap is in modelling, not in coding. The prototype was built around a screen, and the screen was built around an example record. The relationships between records, the constraints, the foreign keys, the indexes: none of those exist yet, because the prototype never had two records of anything that needed to relate to each other.

What changes in production: a real schema. Tables that map to the actual things the business tracks. Keys that prevent duplicates. Constraints that keep bad data out. An audit table that records who changed what and when, because firms in regulated industries are asked for that record sooner or later.
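
In a real system these rules are declared in the database itself as unique keys, foreign keys and check constraints. As a rough illustration only, the sketch below mimics them in memory for a hypothetical firm that tracks clients and matters. Every name in it (`Client`, `Matter`, `AuditEntry`, `Store`) is an assumption made for the example, not taken from any real prototype.

```typescript
// Hypothetical entities; a production system would declare these as database tables.
interface Client { id: string; email: string; name: string; }
interface Matter { id: string; clientId: string; title: string; feeCents: number; }
interface AuditEntry { at: number; actor: string; action: string; recordId: string; }

class Store {
  private clients = new Map<string, Client>();
  private clientEmails = new Set<string>();
  private matters = new Map<string, Matter>();
  readonly audit: AuditEntry[] = [];

  addClient(actor: string, client: Client): void {
    const email = client.email.toLowerCase();
    // Unique keys: one client per id and per email address.
    if (this.clients.has(client.id)) throw new Error(`duplicate client id ${client.id}`);
    if (this.clientEmails.has(email)) throw new Error(`duplicate client email ${client.email}`);
    this.clients.set(client.id, client);
    this.clientEmails.add(email);
    this.record(actor, "client.create", client.id);
  }

  addMatter(actor: string, matter: Matter): void {
    if (this.matters.has(matter.id)) throw new Error(`duplicate matter id ${matter.id}`);
    // Foreign key: a matter must point at a client that exists.
    if (!this.clients.has(matter.clientId)) throw new Error(`unknown client ${matter.clientId}`);
    // Check constraint: fees are whole, non-negative cents.
    if (!Number.isInteger(matter.feeCents) || matter.feeCents < 0) {
      throw new Error("feeCents must be a non-negative integer");
    }
    this.matters.set(matter.id, matter);
    this.record(actor, "matter.create", matter.id);
  }

  // Audit table: who changed what and when. Only successful writes are recorded.
  private record(actor: string, action: string, recordId: string): void {
    this.audit.push({ at: Date.now(), actor, action, recordId });
  }
}
```

The point of the sketch is where the rules sit: in one place behind every screen, so a second record that breaks them is rejected rather than stored.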

We’ve rebuilt the data layer for several inherited prototypes in the last year. The pattern is the same: keep the screens, throw out the data shape, model the schema properly, and re-point the screens at the new database. The screens usually need only minor changes once the database is real, because the product decisions in the screens were sound. The data layer was the gap.

2. The auth boundary (almost always wrong)

In a production system, the authentication boundary is the wall between what one user can see and what another user cannot. Every request crosses it. Every record has an owner. Every action checks who is doing it. In a prototype, the auth boundary is usually a costume.

The most common shapes we see:

A hardcoded admin login, sometimes with the password in the source.
A check that runs in the browser, which means anyone with a browser can turn it off.
An API key for OpenAI, Anthropic or Stripe pasted into the frontend code, which means anyone who opens dev tools can read it and run up the bill.
A Supabase or Firebase project with the row-level security disabled, because turning it on broke the demo and nobody turned it back on.
An endpoint that returns every customer’s records when the URL is hit directly, because the screen filters the records on the client side.

We reviewed a Cursor-built internal tool earlier this year where the API keys were in the frontend, the customer data flowed through an unauthenticated endpoint, and the tool was already in daily use by a small team. The tool worked. It was also wide open. The team building it had no engineering background and no way to know what they were looking at. The model that wrote the code did not flag it, because the demo worked.

What changes in production: real authentication, with sessions that expire, password reset, multi-factor for anything sensitive, and single sign-on where a team already runs Microsoft 365 or Google Workspace. Role-based access that runs on the server, not in the browser. Secrets in environment variables, not in the bundle. Row-level security on the database, tested. An audit trail of logins and privileged actions.
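
The difference between a filter in the browser and a check on the server can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Session`, `CustomerRecord` and `listRecordsFor` are hypothetical names, the records are passed in as an array rather than queried, and where the platform supports row-level security the same ownership rule would also be enforced in the database.

```typescript
type Role = "admin" | "staff";
interface Session { userId: string; role: Role; expiresAt: number; }
interface CustomerRecord { id: string; ownerId: string; name: string; }

type ListResult =
  | { status: 200; records: CustomerRecord[] }
  | { status: 401; error: string };

// Runs on the server. The browser never receives rows it is not allowed to see,
// so there is nothing for a user to "un-filter" in dev tools.
function listRecordsFor(
  session: Session | null,
  all: CustomerRecord[],
  now: number,
): ListResult {
  // No session, or an expired one: refuse before touching any data.
  if (session === null || session.expiresAt <= now) {
    return { status: 401, error: "sign in required" };
  }
  const userId = session.userId;
  // Role check on the server: admins see everything, everyone else only their own rows.
  const records = session.role === "admin"
    ? all
    : all.filter((record) => record.ownerId === userId);
  return { status: 200, records };
}
```

Hitting the endpoint directly now returns a 401 or the caller's own records, never every customer's.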

The auth boundary is the change that matters most in regulated industries. An AFSL-licensed wealth firm, a legal practice, an allied health provider: any of them running an AI-generated tool with a broken auth boundary has a compliance exposure they cannot defend in front of a regulator. We always fix this first.

3. The error path (silent by default)

In a production system, when something goes wrong, somebody finds out. The error is logged, surfaced in monitoring, sometimes routed to an on-call person, and the user is told something useful. In a prototype, errors are silent by default. The model wrote the happy path. The unhappy path was out of scope.

The shapes we see:

A failed write that returns success because the catch block swallows the exception.
A network error that leaves the screen showing stale data, with no indication anything went wrong.
A field that arrives in the wrong shape, gets coerced to undefined, and quietly disappears from the next save.
A background job that crashes silently and never restarts. The next time anyone notices, six weeks of data is missing.
Logs that go to the developer console of whoever happens to have the page open, which is nobody once the demo is over.

The prototype passes the demo because the demo runs on clean inputs. Real users supply messy inputs. They paste content with smart quotes. They lose internet halfway through a save. They have two browser tabs open and submit the same form twice. The prototype was not built to survive any of that.

What changes in production: a real error model. Errors are caught, logged, surfaced, and recoverable. The user sees a message that helps them. The team sees an alert that tells them what happened. The data does not silently disappear. There are tests for the unhappy path, not just the happy one.
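
One possible shape for "caught, logged, surfaced, and recoverable" is sketched below. It is an illustration, not a prescribed design: `Result`, `LogEntry`, `attempt` and `requireString` are hypothetical names, the logger is injected so the example stays self-contained, and the operation is synchronous for brevity (a real write would be asynchronous, with the same structure around an `await`).

```typescript
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; userMessage: string; errorId: string };

interface LogEntry { errorId: string; operation: string; detail: string; at: number; }
type Logger = (entry: LogEntry) => void;

let nextErrorId = 1;

// Runs an operation; on failure it logs the detail for the team and returns a
// message for the user, instead of swallowing the exception and reporting success.
function attempt<T>(operation: string, log: Logger, fn: () => T): Result<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    const errorId = `err-${nextErrorId++}`;
    const detail = err instanceof Error ? err.message : String(err);
    log({ errorId, operation, detail, at: Date.now() });
    return {
      ok: false,
      errorId,
      userMessage: `Could not ${operation}. Nothing was saved; please try again (ref ${errorId}).`,
    };
  }
}

// Rejects a field that arrives in the wrong shape rather than coercing it to undefined.
function requireString(input: Record<string, unknown>, field: string): string {
  const value = input[field];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`field "${field}" must be a non-empty string`);
  }
  return value;
}
```

The reference in the user message matches the `errorId` in the log, so a support request can be tied back to the exact failure.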

4. The cost model (unbounded until the bill arrives)

In a production system that calls a paid API (OpenAI, Anthropic, Stripe, Twilio, Google Maps, anything metered), there is a budget, a meter, and a stopper. In a prototype, there is none of those. The model wrote the code that calls the API. It did not write the code that limits how often.

The shapes we see:

A loop that calls Claude or GPT-4 once per record, with no rate limit, no cache, and no budget cap. A user hits a button that processes a thousand records and the bill is in the hundreds of dollars before the page finishes loading.
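A retry that fires every five seconds for as long as the upstream is down, with no backoff and no attempt limit, multiplying the call volume and, wherever failed or timed-out calls are still billed, the cost.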
A cron job that runs nightly without checking if the previous run finished, so two runs overlap and double the spend.
A test environment using the production API key, because the model never separated the two.
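
For the retry case in the list above, the usual alternative to a fixed interval is a bounded retry with exponential backoff. The sketch below is one way to write it; `backoffDelayMs`, `retrySchedule` and `withRetry` are hypothetical names, the default numbers are arbitrary, and the `sleep` function is injected so the example does not depend on any particular runtime.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// The full schedule for a bounded number of retries.
function retrySchedule(maxRetries: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs, maxMs));
  }
  return delays;
}

// Makes at most maxRetries + 1 calls, waiting longer between each, then gives up
// and rethrows the last error so the caller can log it and raise an alert.
async function withRetry<T>(
  fn: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 5,
  baseMs = 1000,
  maxMs = 60000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

With `retrySchedule(5, 1000, 10000)` the waits are 1000, 2000, 4000, 8000 and 10000 milliseconds, after which the call stops instead of retrying indefinitely. Production versions commonly add random jitter to each delay so that many clients do not retry in step.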

The cost surprise usually arrives at the end of the first month. Until then, nobody is watching, because the prototype was running on a free trial or a small balance. By the time the firm notices, the spend is real and the team has no idea where it came from.

What changes in production: rate limits, caching, budget caps, and observability. A test key for the test environment. Alerts when daily spend crosses a threshold. A sense, at any moment, of how much the system is costing per user, per request, per day. None of this is hard to add. It just has to be added on purpose.
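
A cache, a budget cap and a spend alert can sit in one thin wrapper around the paid call. The sketch below is a simplified illustration: `MeteredClient` and its parameters are hypothetical, the paid API is stubbed as a plain function, and cost is assumed to be a flat amount per call (metered APIs usually bill by tokens or usage, so a real meter would read the cost from each response).

```typescript
interface SpendAlert { spentCents: number; capCents: number; }

class MeteredClient {
  private cache = new Map<string, string>();
  private spentCents = 0;
  private alerted = false;

  constructor(
    private readonly callApi: (input: string) => string, // stand-in for the paid call
    private readonly costPerCallCents: number,           // assumed flat cost per call
    private readonly dailyCapCents: number,
    private readonly alertAtCents: number,
    private readonly onAlert: (alert: SpendAlert) => void,
  ) {}

  run(input: string): string {
    // Cache: an identical input is answered without a second paid call.
    const cached = this.cache.get(input);
    if (cached !== undefined) return cached;

    // Budget cap: refuse the call rather than exceed the daily limit.
    if (this.spentCents + this.costPerCallCents > this.dailyCapCents) {
      throw new Error("daily API budget reached; call refused");
    }

    const output = this.callApi(input);
    this.spentCents += this.costPerCallCents;
    this.cache.set(input, output);

    // Alert once per day when spend crosses the threshold.
    if (!this.alerted && this.spentCents >= this.alertAtCents) {
      this.alerted = true;
      this.onAlert({ spentCents: this.spentCents, capCents: this.dailyCapCents });
    }
    return output;
  }

  spent(): number { return this.spentCents; }

  // Called by a scheduler at the start of each day.
  resetDay(): void { this.spentCents = 0; this.alerted = false; }
}
```

With this in place, the thousand-record button stops at the cap and someone is told, instead of the run completing silently and the total appearing on the invoice.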

What is the first vulnerability we find in an AI-built prototype?

When we open a vibe-coded prototype, we know where to look first. The API keys are in the frontend bundle. We open the browser, view-source on the deployed app, search for sk-, Bearer, or the name of the service, and the key is right there in plain text.
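
The same search can be run over a downloaded bundle with a short script. The sketch below is a heuristic, not a scanner: `findLikelySecrets` is a hypothetical helper, and the two patterns are assumptions based on the strings named above. Key formats differ by provider, so a match is a candidate to inspect by hand and an empty result proves nothing.

```typescript
interface Finding { pattern: string; preview: string; }

// Assumed patterns only; real key formats vary by provider and change over time.
const SECRET_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "sk- prefixed key", regex: /\bsk-[A-Za-z0-9_-]{16,}/g },
  { name: "Bearer token", regex: /Bearer\s+[A-Za-z0-9._-]{16,}/g },
];

function findLikelySecrets(bundleText: string): Finding[] {
  const findings: Finding[] = [];
  for (const { name, regex } of SECRET_PATTERNS) {
    const matches = bundleText.match(regex) || [];
    for (const match of matches) {
      // Keep only the first characters so the report does not itself leak the key.
      findings.push({ pattern: name, preview: match.slice(0, 8) + "..." });
    }
  }
  return findings;
}
```

Any finding should be treated as a leaked credential: rotate the key at the provider first, then move the call behind the server.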

We have found keys this way on every Cursor-built prototype we’ve been called in to review. Not most. Every.

The keys are sometimes for Anthropic, sometimes for OpenAI, sometimes for Supabase service-role, sometimes for Stripe, sometimes all of them in one bundle. The model writes the code that uses the key. The simplest place to put the key is the frontend, because that is where the call originates. The model does not move the call to a backend, because there is no backend yet.

This is not a sophisticated finding. It is the first thing a security-conscious developer would look for. The reason it stays in the bundle is that the person who built the prototype does not know what dev tools are. They are not careless. They have no engineering background, and the AI tool that wrote the code did not flag the issue, because the demo worked with the key in the bundle.

The fix is straightforward. Move the key to the server. Put a small backend (a function on Vercel, a worker on Cloudflare, an Express server on Render) between the frontend and the paid API. The frontend calls the backend, the backend calls the paid API, the key never leaves the server. This is usually the first thing we ship in a vibe-code-to-production engagement, often inside the first week, because the exposure window is open until it is closed.
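
The shape of that small backend can be sketched independently of the host. Everything below is illustrative: the upstream URL `https://api.example.com/v1/complete`, its request body and the `PAID_API_KEY` variable name are placeholders rather than any provider's real API, and `fetchImpl` is injected so the same function could run as a Vercel function, a Cloudflare worker or an Express route.

```typescript
interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Built on the server only. The key comes from the server's environment
// and is never sent to, or read from, the browser.
function buildUpstreamRequest(prompt: string, apiKey: string): UpstreamRequest {
  return {
    url: "https://api.example.com/v1/complete", // placeholder upstream endpoint
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "content-type": "application/json",
    },
    body: JSON.stringify({ prompt }), // placeholder body shape
  };
}

// The frontend posts { prompt } to this handler; the handler calls the paid API.
async function handleCompletion(
  browserBody: { prompt?: unknown },
  env: { PAID_API_KEY?: string },
  fetchImpl: FetchLike,
): Promise<{ status: number; body: string }> {
  const prompt = browserBody.prompt;
  if (typeof prompt !== "string" || prompt.length === 0 || prompt.length > 4000) {
    return { status: 400, body: "prompt must be a string of 1 to 4000 characters" };
  }
  if (!env.PAID_API_KEY) {
    return { status: 500, body: "server is not configured" };
  }
  const request = buildUpstreamRequest(prompt, env.PAID_API_KEY);
  const upstream = await fetchImpl(request.url, {
    method: request.method,
    headers: request.headers,
    body: request.body,
  });
  if (!upstream.ok) {
    return { status: 502, body: "upstream request failed" };
  }
  return { status: 200, body: await upstream.text() };
}
```

Moving the key is only the first step. An open proxy can still be used by anyone who finds its URL, so the handler also needs the server-side session check and the spend cap before it is safe to leave running.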

What does “production-ready” mean for a 100-user app vs a 10,000-user app?

Production is not a single bar. The four changes above are the floor. How far above the floor a system has to sit depends on who is using it.

For a tool with under 100 users, all inside a single firm, with predictable usage and a small enough blast radius that one bad day will not end the business: the floor is most of the work. Real schema, real auth, real error handling, capped costs, nightly backups, a staging environment. This is most of the rebuilds we do, because most of the prototypes we inherit are internal tools or first customer-facing products serving a small group.

For a tool with thousands of users, customers paying real money, and a usage pattern the firm cannot predict: the floor is the start of the work. The system needs the four changes plus horizontal scaling, a proper deployment pipeline, multi-layer monitoring, an on-call rotation, security review, often a SOC 2 or ISO 27001 path. The data model has to be designed for the workload, not the demo. Cost has to be tracked per customer, not just in aggregate.

The honest answer for most firms calling us is that they need the floor. The exposure they have is the four things we listed, and fixing those buys them eighteen months of runway before the next set of constraints bites. We do not pretend a small internal tool needs the same treatment as a public SaaS. The work scales to the risk.

What happens in the first two weeks of taking a prototype to production?

When a firm hands us a Claude, Cursor or v0 prototype and asks us to take it to production, the first two weeks always look the same.

Week one is the read. We open the codebase, the database (if any), the deployed app, and the browser dev tools. We answer the four questions: what is the data layer, what is the auth boundary, what is the error path, what is the cost model. We test the worst thing first (the keys in the frontend, almost always). We agree on the floor for this specific tool, given who is using it and the blast radius if it stays as it is.

Week two is the first wave of fixes. We move the secrets to the server. We add a real auth layer or harden the existing one. We put a real schema behind the screens, with the existing data migrated. We turn on row-level security where the platform has it. We add basic error logging and an alert when the system breaks. We cap the API spend.

After two weeks the prototype is no longer wide open. It is not yet a polished product, but it is a system the firm can keep using without holding its breath. The remaining work is the long tail: tests, monitoring, the deployment pipeline, the staging environment, the second wave of features the team has been waiting to ship. That work runs in the weeks that follow, alongside the team using the system.

If the prototype turns out to be unrecoverable (a single screen of fake data, a backend that is mostly placeholder), we say so on day three and switch to a rebuild that keeps the product decisions and replaces the code. The honest call saves the firm a month.

If you have a tool a non-engineer built and it is now in daily use, the conversation worth having is which of the four things is missing and what the blast radius looks like if it stays missing. We run a half-day vibe-code-to-production review that surfaces this in writing. Most of the time the data layer is the gap, which means the conversation extends into our integrations and custom CRM work, depending on what the prototype was actually trying to be. The pattern, by now, is consistent.

Marty Papamanolis