
The Ultimate AI App & MVP Workflow - Ship Production Software, Not Demos
Most AI apps fail. Not because the models are bad. Not because the idea is wrong. They fail because the people building them treat AI apps like regular apps, or worse, like demos.
I've shipped 30+ production AI products. I've seen the same mistakes kill projects before they even get users. This isn't a tutorial. This is the system I use to ship software that actually works.
Table of Contents
- Hook / Context
- The Real AI App Stack (High-Level)
- IDE & Core Workflow
- Design & UX Rules for AI Products
- Backend & AI Orchestration
- Security Checklist (NON-NEGOTIABLE)
- DevOps & Deployment Setup
- Monetization & Paywalls
- The Exact Workflow I Use (Step-by-Step)
- Common Mistakes I See After Reviewing Dozens of AI Apps
- Final Cheat Sheet (Skimmable)
- Closing
Hook / Context
You've seen the demos. The Twitter threads showing "I built an AI app in 2 hours." The GitHub repos with 10k stars and zero production users. The landing pages promising the moon, backed by code that breaks when you look at it wrong.
Here's what they're not showing you: the $5,000 OpenAI bill from one weekend. The security holes that leak API keys. The users who hit rate limits on day one. The apps that work in the demo but fail when real people use them.
The gap between "AI demo" and "production AI app" is massive. Most people never cross it.
I've built AI products for startups that raised Series A. I've built internal tools for enterprises processing millions of requests. I've also seen dozens of "AI apps" that were one API call away from being a security disaster.
The difference isn't the model. It's the system.
This post is that system. It's the workflow I use to ship AI products that don't break, don't leak secrets, and don't cost $10k in unexpected API bills. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.
If you want to build something real, read this. If you want to build a demo, there are plenty of YouTube tutorials for that.
The Real AI App Stack (High-Level)
An AI app isn't a frontend calling OpenAI. That's a prototype. A production AI app has eight layers, and most people skip four of them.
1. Product & UX Layer
This is where most AI apps die. You can't prompt-engineer your way out of a bad product.
What it includes:
- User intent understanding (what are they actually trying to do?)
- Input constraints (don't let users paste novels)
- Output expectations (what does "done" look like?)
- Failure states (what happens when the model hallucinates?)
The mistake: Building the AI feature first, then figuring out the product.
The fix: Design the user outcome first. The AI is a means, not the end.
2. Frontend Layer
Your UI needs to handle latency, streaming, partial outputs, and failures gracefully.
What it includes:
- Streaming UI (show progress, not spinners)
- Optimistic updates (make it feel instant)
- Skeleton states (mask loading)
- Error boundaries (fail gracefully)
- Input validation (constrain before sending)
The mistake: Building a form that submits and shows a spinner for 10 seconds.
The fix: Stream responses, show progress, validate inputs client-side.
3. Backend & Orchestration Layer
This is where the magic happens. Or where everything breaks.
What it includes:
- API proxy (never expose keys to frontend)
- Request routing (which model? which endpoint?)
- Tool calling / function routing (when to call external APIs)
- State machines (multi-step workflows)
- Retry logic (with exponential backoff)
- Fallback chains (model A fails, try model B)
- Rate limiting (per user, per IP, per feature)
- Cost tracking (log every token)
The mistake: Frontend → OpenAI directly. No backend. No protection.
The fix: Everything goes through your backend. Always.
4. AI Layer
The models themselves. This is the smallest part of the stack, but everyone obsesses over it.
What it includes:
- Model selection (GPT-4 vs Claude vs open source)
- Prompt templates (versioned, tested)
- Context management (RAG, memory, conversation history)
- Token optimization (trim context, compress prompts)
- Output parsing (structured extraction, validation)
The mistake: Using GPT-4 for everything, ignoring costs, no prompt versioning.
The fix: Right model for the job. Track costs. Version prompts like code.
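Versioning prompts like code can be as simple as a typed template with an explicit version number that gets logged with every request. A minimal sketch; the file name, prompt, and helper names are illustrative:

// prompts/summarize.ts -- illustrative prompt-versioning sketch
export const SUMMARIZE_PROMPT = {
  id: 'summarize',
  version: 3, // bump on every change; old versions live in git history
  template: (input: { text: string; tone: string }) =>
    `Summarize the following text in a ${input.tone} tone:\n\n${input.text}`,
};

// Log promptId + promptVersion with every request so you can tie
// output-quality regressions back to a specific prompt change.
export function buildSummarizePrompt(text: string, tone: string) {
  return {
    promptId: SUMMARIZE_PROMPT.id,
    promptVersion: SUMMARIZE_PROMPT.version,
    content: SUMMARIZE_PROMPT.template({ text, tone }),
  };
}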
5. Data & Memory Layer
AI apps need memory. Users expect continuity.
What it includes:
- Conversation history (vector DB or SQL)
- User preferences (what they like, what they don't)
- Context windows (what to include, what to exclude)
- Embeddings (for RAG, search, similarity)
- Cache layer (don't regenerate the same thing)
The mistake: Stateless apps that forget everything.
The fix: Store conversations. Build context. Use RAG when needed.
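For the cache layer, one low-effort pattern is hashing the model + prompt and storing the output. A minimal sketch, assuming the Upstash Redis client that shows up again in the rate-limiting section:

import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// Identical model + input = identical cache key, so repeat requests
// return the stored output instead of paying for a regeneration.
export async function cachedGenerate(
  model: string,
  input: string,
  generate: (input: string) => Promise<string>,
  ttlSeconds = 60 * 60 * 24
): Promise<string> {
  const key = 'gen:' + createHash('sha256').update(`${model}:${input}`).digest('hex');
  const cached = await redis.get<string>(key);
  if (cached) return cached;

  const output = await generate(input);
  await redis.set(key, output, { ex: ttlSeconds });
  return output;
}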
6. Security Layer
This isn't optional. AI apps are attack vectors waiting to happen.
What it includes:
- API key management (never in code, never in frontend)
- Authentication (who is this user?)
- Authorization (what can they do?)
- Input sanitization (prevent injection attacks)
- Output filtering (prevent data leaks)
- Rate limiting (prevent abuse)
- Audit logging (who did what, when)
The mistake: Hardcoded keys, no auth, no rate limits.
The fix: Secrets in env vars. Auth on every endpoint. Rate limits everywhere.
7. Infra & DevOps Layer
How you deploy, monitor, and scale.
What it includes:
- Environment separation (dev, staging, prod)
- CI/CD (automated tests, deployments)
- Observability (logs, errors, metrics)
- Cost monitoring (track API spend)
- Kill switches (turn off expensive features)
- Rollback procedures (when things break)
The mistake: Deploying to production from localhost. No monitoring. No rollback plan.
The fix: Proper environments. Automated deployments. Real observability.
8. Monetization & Scaling Layer
Most AI apps never get here because they die earlier. But if you make it, this is critical.
What it includes:
- Usage tracking (credits, tokens, requests)
- Billing integration (Stripe, Paddle)
- Paywall logic (free tier limits)
- Subscription management
- Cost allocation (what features cost what)
The mistake: Building features, then trying to monetize.
The fix: Design monetization into the product from day one.
IDE & Core Workflow
I use Cursor. Not because it's perfect, but because it's the best tool for shipping AI products fast. Here's how I use it without creating garbage code.
Why Cursor Works
Cursor understands your codebase. It can read multiple files, understand context, and make changes across your project. ChatGPT can't do that. GitHub Copilot can't do that. This is why Cursor wins for production work.
Rules for Prompting Cursor
1. Be specific about scope
Bad: "Add authentication"
Good: "Add NextAuth.js authentication to this Next.js app. Use email/password and Google OAuth. Store sessions in the existing PostgreSQL database. Add a protected route at /dashboard that requires auth."
2. Reference existing patterns
Bad: "Create a new API route"
Good: "Create a new API route following the same pattern as /api/users/route.ts. Use the same error handling and response format."
3. Specify file locations
Bad: "Add a component for user profiles"
Good: "Create a new component at components/user-profile.tsx that displays user information. Use the existing User type from lib/types.ts."
4. Include constraints
Bad: "Make it responsive"
Good: "Make it responsive using Tailwind breakpoints. Mobile-first design. Max width 1280px on desktop."
When to Let AI Generate Code
Let AI generate:
- Boilerplate (API routes, CRUD operations)
- Type definitions (from existing data structures)
- Test cases (unit tests, integration tests)
- Documentation (JSDoc comments, README sections)
- Error handling patterns (try/catch, validation)
Don't let AI generate:
- Business logic (you understand the domain better)
- Security-critical code (auth, payments, secrets)
- Performance-critical paths (AI doesn't optimize well)
- Complex state management (AI creates overcomplicated solutions)
Folder-Level Prompting
When working on a feature that spans multiple files:
I'm building a feature for user onboarding. It needs:

1. A new API route at `/api/onboarding/route.ts` that:
   - Accepts POST requests with user data
   - Validates input using Zod
   - Creates a user record in the database
   - Sends a welcome email
   - Returns the created user

2. A new page at `app/onboarding/page.tsx` that:
   - Shows a multi-step form (3 steps)
   - Uses the existing form components from `components/forms/`
   - Calls the API route on submit
   - Handles errors and loading states

3. Update the database schema to include an `onboarding_completed` field

Follow existing patterns in the codebase. Use TypeScript. Use the existing error handling utilities.
File-Level Prompting
When editing a single file:
In this file, I need to:
1. Add a new function `validateUserInput` that takes user data and returns validation errors
2. Update the `createUser` function to use the new validator
3. Add error handling for database connection failures
4. Add JSDoc comments to all exported functions
Keep the existing code style. Don't change anything else.
Anti-Patterns That Cause Bad AI Code
1. Vague prompts
"Make it better" → AI will change random things.
2. No context
"Add a button" → AI doesn't know where, what style, what it does.
3. Too many changes at once
"Refactor the entire auth system and add OAuth and update the UI" → AI will break things.
4. Ignoring existing patterns
"Add a new API route" without showing existing routes → AI creates inconsistent code.
5. Not reviewing AI output
Accepting everything AI generates → Technical debt and bugs.
The Cursor Workflow I Use
- Plan the change (in my head or notes)
- Find similar code (grep for patterns)
- Prompt Cursor with context (reference existing code)
- Review the diff (does it make sense?)
- Test it (does it work?)
- Refine if needed (small follow-up prompts)
I never let Cursor make large architectural changes. I use it for implementation, not design.
Design & UX Rules for AI Products
AI products have different UX requirements than regular apps. Most people ignore this and build forms that submit to APIs. That's not good enough.
Why AI UX Is Different
Latency is unpredictable
A regular API call takes 100-500ms. An AI call takes 2-10 seconds. Sometimes 30 seconds. Users will think your app is broken.
Outputs are non-deterministic
The same input can produce different outputs. Users need to understand this.
Failures are common
Models hallucinate. APIs rate limit. Networks fail. Your UI must handle this gracefully.
Partial outputs are valuable
Users don't want to wait 10 seconds for nothing. Show progress. Stream responses.
Latency Masking Patterns
1. Streaming responses
Don't wait for the full response. Stream tokens as they arrive.
// Bad: Wait for everything
const response = await fetch('/api/generate');
const data = await response.json();
setOutput(data.text);
// Good: Stream it
const response = await fetch('/api/generate');
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  setOutput(prev => prev + chunk);
}
2. Optimistic UI
Show the expected result immediately, update when real data arrives.
// User submits form
setOptimisticResult(calculateExpectedResult(input));
// Then fetch the real result and replace the optimistic one
const response = await fetch('/api/process');
const realResult = await response.json();
setOptimisticResult(realResult);
3. Skeleton states
Show the structure of what's coming, not a spinner.
// Bad: Spinner
{isLoading && <Spinner />}
// Good: Skeleton
{isLoading && <ResultSkeleton />}
4. Progressive enhancement
Show what you can, when you can.
// Show metadata first
setMetadata(extractMetadata(response));
// Then show full content
setContent(await streamFullContent(response));
Input Constraints > Prompt Engineering
Most people spend hours on prompts. They should spend hours on input validation.
Why constraints matter:
- Shorter inputs = faster responses = lower costs
- Validated inputs = fewer errors = better outputs
- Constrained inputs = predictable outputs = better UX
What to constrain:
- Length (max characters, max words)
- Format (structured data, specific fields)
- Content (no PII, no sensitive data)
- Language (if you only support English, say so)
Example:
// Bad: Accept anything
const prompt = userInput;
// Good: Constrain it
const schema = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
  length: z.enum(['short', 'medium', 'long'])
});
const validated = schema.parse(userInput);
const prompt = buildPrompt(validated);
Designing for Failure
Your AI will fail. Design for it.
1. Retry logic (with limits)
async function generateWithRetry(input: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(input);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * Math.pow(2, i)); // Exponential backoff: 1s, 2s, 4s...
    }
  }
}
2. Fallback chains
async function generateWithFallback(input: string) {
  try {
    return await generateWithGPT4(input);
  } catch (error) {
    console.warn('GPT-4 failed, trying GPT-3.5');
    return await generateWithGPT35(input);
  }
}
3. Partial outputs
// If generation fails halfway, show what you got
try {
  const fullOutput = await streamGeneration(input);
  setOutput(fullOutput);
} catch (error) {
  // Keep partial output, show error message
  setError('Generation incomplete. Partial result shown.');
}
4. Clear error messages
// Bad: Generic error
setError('Something went wrong');
// Good: Specific error
if (error.code === 'RATE_LIMIT') {
  setError('Too many requests. Please wait a moment.');
} else if (error.code === 'INVALID_INPUT') {
  setError('Your input is too long. Please shorten it.');
} else {
  setError('Generation failed. Please try again.');
}
The UX Checklist
Before shipping, ask:
- Can users see progress during long operations?
- Are inputs validated before sending?
- Are errors clear and actionable?
- Is there a retry mechanism?
- Can users cancel long-running operations?
- Are partial outputs shown if generation fails?
- Is the UI responsive during AI operations?
- Are loading states informative (not just spinners)?
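One checklist item that trips people up is cancellation. A minimal sketch using AbortController with the streaming fetch pattern from earlier; the endpoint and handler names are illustrative:

const controller = new AbortController();

async function generateCancellable(input: string, onChunk: (s: string) => void) {
  const response = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({ input }),
    signal: controller.signal, // aborting rejects the fetch and any pending read
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value));
  }
}

// Wire the controller to a Cancel button:
// <button onClick={() => controller.abort()}>Cancel</button>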
Backend & AI Orchestration
This is where most AI apps die. People build frontends that call OpenAI directly. That's a prototype, not a product.
Why Direct Frontend → OpenAI Is a Mistake
1. Security
You can't hide API keys in the frontend. They'll be exposed. Someone will find them. You'll get a $10k bill.
2. No control
You can't rate limit. You can't log requests. You can't track costs. You can't prevent abuse.
3. No orchestration
You can't chain multiple API calls. You can't use tool calling. You can't implement retries or fallbacks.
4. No business logic
You can't enforce usage limits. You can't check subscriptions. You can't add paywalls.
Always use a backend proxy.
Backend Proxy Patterns
Pattern 1: Simple Proxy
// app/api/generate/route.ts
export async function POST(req: Request) {
  const { input } = await req.json();

  // Validate input
  if (!input || input.length > 1000) {
    return Response.json({ error: 'Invalid input' }, { status: 400 });
  }

  // Check auth
  const user = await getCurrentUser(req);
  if (!user) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Check rate limits
  const rateLimited = await checkRateLimit(user.id);
  if (rateLimited) {
    return Response.json({ error: 'Rate limited' }, { status: 429 });
  }

  // Call OpenAI
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
  });

  // Log usage
  await logUsage(user.id, response.usage);

  return Response.json({ output: response.choices[0].message.content });
}
Pattern 2: Streaming Proxy
export async function POST(req: Request) {
  const { input } = await req.json();
  // ... validation, auth, rate limits ...

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    stream: true,
  });

  // Create a readable stream
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(content));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}
Pattern 3: Tool Calling / Function Routing
export async function POST(req: Request) {
  const { input } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get weather for a location',
          parameters: {
            type: 'object',
            properties: {
              location: { type: 'string' },
            },
          },
        },
      },
    ],
  });

  const message = response.choices[0].message;

  // Check if model wants to call a function
  if (message.tool_calls) {
    for (const toolCall of message.tool_calls) {
      if (toolCall.function.name === 'get_weather') {
        const args = JSON.parse(toolCall.function.arguments);
        const weather = await fetchWeather(args.location);

        // Call model again with function result
        const secondResponse = await openai.chat.completions.create({
          model: 'gpt-4',
          messages: [
            { role: 'user', content: input },
            message,
            {
              role: 'tool',
              tool_call_id: toolCall.id,
              content: JSON.stringify(weather),
            },
          ],
        });

        return Response.json({
          output: secondResponse.choices[0].message.content,
        });
      }
    }
  }

  return Response.json({ output: message.content });
}
State Machines for Multi-Step Workflows
Complex AI workflows need state machines. Don't try to manage this with if/else.
type WorkflowState =
  | { type: 'idle' }
  | { type: 'validating'; input: string }
  | { type: 'generating'; validatedInput: string }
  | { type: 'post-processing'; output: string }
  | { type: 'complete'; finalOutput: string }
  | { type: 'error'; error: string };

async function runWorkflow(input: string): Promise<string> {
  let state: WorkflowState = { type: 'idle' };
  try {
    // Validate
    state = { type: 'validating', input };
    const validated = await validateInput(input);

    // Generate
    state = { type: 'generating', validatedInput: validated };
    const generated = await generate(validated);

    // Post-process
    state = { type: 'post-processing', output: generated };
    const processed = await postProcess(generated);

    // Complete
    state = { type: 'complete', finalOutput: processed };
    return processed;
  } catch (error) {
    state = { type: 'error', error: error.message };
    throw error;
  }
}
Managing Retries, Fallbacks, and Hallucination Control
Retry logic:
async function generateWithRetry(
  input: string,
  options: { maxRetries?: number; backoffMs?: number } = {}
): Promise<string> {
  const { maxRetries = 3, backoffMs = 1000 } = options;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await generate(input);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = backoffMs * Math.pow(2, attempt);
      await sleep(delay);
    }
  }
  throw new Error('Max retries exceeded');
}
Fallback chains:
async function generateWithFallback(input: string): Promise<string> {
  const models = ['gpt-4', 'gpt-3.5-turbo', 'claude-3-opus'];
  for (const model of models) {
    try {
      return await generate(input, { model });
    } catch (error) {
      console.warn(`${model} failed, trying next`);
      continue;
    }
  }
  throw new Error('All models failed');
}
Hallucination control:
async function generateWithValidation(input: string): Promise<string> {
  const output = await generate(input);

  // Check for hallucinations
  const validation = await validateOutput(output, input);
  if (!validation.isValid) {
    // Regenerate with stricter prompt
    return await generate(input, {
      systemPrompt: 'Be extremely factual. If unsure, say so.',
    });
  }
  return output;
}
Logging and Traceability as First-Class Concerns
Every AI request should be logged. You need to debug issues, track costs, and understand usage.
async function generateWithLogging(input: string, userId: string) {
  const requestId = crypto.randomUUID();
  const startTime = Date.now();
  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: input }],
    });

    const duration = Date.now() - startTime;
    const tokens = response.usage?.total_tokens || 0;
    const cost = calculateCost(tokens, 'gpt-4');

    // Log success
    await logRequest({
      requestId,
      userId,
      input,
      output: response.choices[0].message.content,
      tokens,
      cost,
      duration,
      status: 'success',
    });

    return response.choices[0].message.content;
  } catch (error) {
    const duration = Date.now() - startTime;

    // Log failure
    await logRequest({
      requestId,
      userId,
      input,
      error: error.message,
      duration,
      status: 'error',
    });

    throw error;
  }
}
What to log:
- Request ID (for tracing)
- User ID (for attribution)
- Input (for debugging)
- Output (for quality analysis)
- Tokens used (for cost tracking)
- Duration (for performance)
- Model used (for cost allocation)
- Status (success/error)
- Error messages (if failed)
Security Checklist (NON-NEGOTIABLE)
I've seen too many AI apps with hardcoded API keys, no authentication, and zero rate limiting. This section is non-negotiable. If you skip it, you're building a liability, not a product.
API Key Handling
Never do this:
// ❌ NEVER
const OPENAI_API_KEY = 'sk-...';
Always do this:
// ✅ ALWAYS
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
if (!OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is not set');
}
Environment variables:
- Use .env.local for local development (gitignored)
- Use your hosting platform's secrets manager for production
- Never commit secrets to git
- Rotate keys regularly
- Use different keys for dev/staging/prod
Secrets Management
For local development:
# .env.local (gitignored)
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://...
NEXTAUTH_SECRET=...
For production (Vercel example):
# Set in Vercel dashboard
vercel env add OPENAI_API_KEY
For other platforms:
- AWS: AWS Secrets Manager
- GCP: Secret Manager
- Azure: Key Vault
- Railway/Render: Environment variables in dashboard
Never:
- Hardcode secrets
- Commit .env files
- Share secrets in Slack/Discord
- Log secrets (even in error messages)
Authentication vs Authorization
Authentication: Who is this user?
// Check if user is logged in
const user = await getCurrentUser(req);
if (!user) {
  return Response.json({ error: 'Unauthorized' }, { status: 401 });
}
Authorization: What can this user do?
// Check if user has permission
if (user.role !== 'admin') {
  return Response.json({ error: 'Forbidden' }, { status: 403 });
}

// Check if user has credits
if (user.credits < requiredCredits) {
  return Response.json({ error: 'Insufficient credits' }, { status: 402 });
}
Common patterns:
- JWT for stateless auth (NextAuth.js, Clerk, Auth0)
- Session-based auth for stateful apps
- API keys for server-to-server (different from user auth)
JWT Usage (Where It Fits, Where It Doesn't)
Use JWT when:
- Stateless authentication (no server-side sessions)
- Microservices (token can be verified without DB lookup)
- Mobile apps (token stored on device)
Don't use JWT when:
- You need to revoke tokens immediately (JWT is valid until expiry)
- You need server-side session management
- Token size matters (JWTs can be large)
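If you need revocation but still want JWTs, the usual workaround is a denylist checked on every request. A minimal sketch, assuming Upstash Redis (the same client used for rate limiting below) and that your tokens carry a `jti` claim:

import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// On logout or compromise: denylist the token id until it would
// have expired on its own anyway.
export async function revokeToken(jti: string, secondsUntilExpiry: number) {
  await redis.set(`revoked:${jti}`, 1, { ex: secondsUntilExpiry });
}

// On every authenticated request: one fast Redis lookup instead of
// a full server-side session store.
export async function isRevoked(jti: string): Promise<boolean> {
  return (await redis.exists(`revoked:${jti}`)) === 1;
}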
Example (NextAuth.js):
// app/api/auth/[...nextauth]/route.ts
import NextAuth from 'next-auth';

export const authOptions = {
  providers: [
    // ... providers
  ],
  callbacks: {
    async jwt({ token, user }) {
      if (user) {
        token.id = user.id;
        token.role = user.role;
      }
      return token;
    },
    async session({ session, token }) {
      session.user.id = token.id;
      session.user.role = token.role;
      return session;
    },
  },
};

// App Router route files must export HTTP method handlers
const handler = NextAuth(authOptions);
export { handler as GET, handler as POST };
Rate Limiting & Abuse Prevention
Why it matters:
- Prevents API key abuse
- Prevents cost explosions
- Prevents DDoS attacks
- Ensures fair usage
Implementation:
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'), // 10 requests per 10 seconds
});

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') || 'unknown';
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      { status: 429 }
    );
  }
  // ... rest of handler
}
Rate limit strategies:
- Per IP (prevent abuse from single source)
- Per user (prevent abuse from single account)
- Per feature (different limits for different features)
- Tiered (free users: 10/min, paid: 100/min)
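For the tiered strategy, a minimal sketch with one limiter per plan, following the same Upstash API as above (the specific limits are illustrative):

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// One limiter per plan; keyed by user id so the limit follows
// the account rather than the IP.
const limiters = {
  free: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, '1 m') }),
  pro: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1 m') }),
};

export async function checkTieredLimit(userId: string, plan: 'free' | 'pro') {
  const { success } = await limiters[plan].limit(`user:${userId}`);
  return success;
}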
Input Sanitization
Never trust user input:
// ❌ BAD
const prompt = userInput;
await openai.chat.completions.create({
  messages: [{ role: 'user', content: prompt }],
});

// ✅ GOOD
const sanitized = sanitizeInput(userInput);
const validated = validateInput(sanitized);
await openai.chat.completions.create({
  messages: [{ role: 'user', content: validated }],
});
What to sanitize:
- Remove PII (emails, phone numbers, SSNs)
- Remove sensitive data (passwords, API keys)
- Limit length (prevent token bombs)
- Validate format (structured inputs)
- Escape special characters (prevent injection)
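A minimal sketch of the `sanitizeInput` helper used above. The regexes are illustrative; real PII detection deserves a dedicated library:

export function sanitizeInput(raw: string, maxLength = 10000): string {
  let input = raw.slice(0, maxLength); // cap length first: prevents token bombs

  // Strip obvious PII and secret-shaped strings (illustrative patterns)
  input = input.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email removed]');
  input = input.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[ssn removed]');
  input = input.replace(/sk-[A-Za-z0-9]{20,}/g, '[api key removed]');

  return input.trim();
}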
Output Filtering
Filter outputs before sending to users:
const output = await generate(input);

// Filter sensitive data
const filtered = filterOutput(output, {
  removePII: true,
  removeSecrets: true,
  maxLength: 10000,
});

return Response.json({ output: filtered });
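And a matching sketch of `filterOutput` (again, the options and patterns are illustrative, not exhaustive):

export function filterOutput(
  output: string,
  opts: { removePII?: boolean; removeSecrets?: boolean; maxLength?: number } = {}
): string {
  let filtered = output;
  if (opts.removeSecrets) {
    filtered = filtered.replace(/sk-[A-Za-z0-9]{20,}/g, '[redacted]');
  }
  if (opts.removePII) {
    filtered = filtered.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[redacted]');
  }
  if (opts.maxLength && filtered.length > opts.maxLength) {
    filtered = filtered.slice(0, opts.maxLength);
  }
  return filtered;
}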
Why "Vibe Coding" Without Security Is Dangerous
I've seen apps that:
- Exposed API keys in client-side code → $5k OpenAI bill
- Had no rate limits → DDoS'd themselves
- Accepted unlimited input → Token bombs that cost $100s per request
- Had no auth → Anyone could use the API
- Logged sensitive data → Privacy violations
The cost of skipping security:
- Financial (unexpected API bills)
- Legal (data breaches, privacy violations)
- Reputational (users lose trust)
- Operational (downtime, abuse)
The fix:
Security isn't optional. Build it in from day one. It's easier to add security early than to retrofit it later.
Security Checklist
Before shipping, verify:
- No hardcoded API keys or secrets
- All secrets in environment variables
- Authentication on all protected endpoints
- Authorization checks for user permissions
- Rate limiting implemented
- Input validation and sanitization
- Output filtering for sensitive data
- Error messages don't leak secrets
- HTTPS only (no HTTP in production)
- CORS configured correctly
- SQL injection prevention (parameterized queries)
- XSS prevention (sanitize user input)
- Audit logging for sensitive operations
DevOps & Deployment Setup
Most AI apps are deployed like demos: push to main, hope it works. That's not how you ship production software.
Environment Separation
Three environments minimum:
1. Development (local)
   - Your machine
   - .env.local for secrets
   - Can break freely
2. Staging (pre-production)
   - Mirrors production
   - Test deployments here first
   - Real API keys (but test accounts)
3. Production (live)
   - Real users
   - Real money
   - Zero tolerance for breaks
Why this matters:
- Test changes before production
- Catch bugs before users see them
- Safe rollbacks
- Different API keys (so staging doesn't affect production costs)
Implementation:
// lib/config.ts
const env = process.env.NODE_ENV;

export const config = {
  env,
  isDev: env === 'development',
  isStaging: env === 'staging',
  isProd: env === 'production',
  openai: {
    apiKey: process.env.OPENAI_API_KEY!,
    model: env === 'production' ? 'gpt-4' : 'gpt-3.5-turbo', // Cheaper in dev
  },
  database: {
    url: process.env.DATABASE_URL!,
  },
  rateLimit: {
    requests: env === 'production' ? 100 : 1000, // Stricter in prod
    window: '1m',
  },
};
CI/CD Expectations for MVPs vs Scale
For MVPs (shipping fast):
- Automated tests (unit tests for critical paths)
- Automated deployments (push to main = deploy)
- Basic monitoring (errors, logs)
For scale (shipping safely):
- Comprehensive test suite (unit, integration, E2E)
- Staged deployments (staging → production)
- Code review requirements
- Automated security scans
- Performance testing
- Canary deployments
- Rollback automation
MVP CI/CD example (GitHub Actions):
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Deploy to Vercel
        uses: amondnet/vercel-action@v20
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
Observability Basics
What to monitor:
1. Errors
   - Unhandled exceptions
   - API failures
   - Database errors
2. Performance
   - Response times
   - API latency
   - Database query times
3. Usage
   - Request volume
   - User activity
   - Feature usage
4. Costs
   - API token usage
   - Cost per request
   - Daily/weekly/monthly spend
Implementation:
// lib/monitoring.ts
import * as Sentry from '@sentry/nextjs';

export function logError(error: Error, context?: Record<string, any>) {
  console.error(error);
  Sentry.captureException(error, { extra: context });
}

export function logEvent(name: string, data?: Record<string, any>) {
  console.log(`[EVENT] ${name}`, data);
  Sentry.captureMessage(name, { level: 'info', extra: data });
}

export function trackCost(feature: string, tokens: number, cost: number) {
  logEvent('cost_tracked', {
    feature,
    tokens,
    cost,
    timestamp: new Date().toISOString(),
  });
}
Tools:
- Errors: Sentry, Rollbar, Bugsnag
- Logs: Vercel Logs, Datadog, Logtail
- Metrics: Vercel Analytics, PostHog, Mixpanel
- APM: New Relic, Datadog APM
Cost Explosions and How to Prevent Them
Common causes:
- No rate limiting → Users spam requests
- No input validation → Token bombs (100k token inputs)
- Wrong model → Using GPT-4 for everything
- No caching → Regenerating same content
- No kill switches → Can't turn off expensive features
Prevention:
// lib/cost-control.ts
export async function generateWithCostControl(
  input: string,
  userId: string
): Promise<string> {
  // 1. Validate input length
  if (input.length > 10000) {
    throw new Error('Input too long');
  }

  // 2. Check user credits
  const user = await getUser(userId);
  if (user.credits < 10) {
    throw new Error('Insufficient credits');
  }

  // 3. Check daily limit
  const dailyUsage = await getDailyUsage(userId);
  if (dailyUsage.cost > 100) {
    throw new Error('Daily limit exceeded');
  }

  // 4. Use appropriate model
  const model = user.plan === 'premium' ? 'gpt-4' : 'gpt-3.5-turbo';

  // 5. Generate
  const response = await generate(input, { model });

  // 6. Track cost
  const cost = calculateCost(response.usage.total_tokens, model);
  await deductCredits(userId, cost);
  await logCost(userId, cost);

  // 7. Check for anomalies
  if (cost > 10) {
    logEvent('high_cost_request', { userId, cost, input: input.substring(0, 100) });
  }

  return response.choices[0].message.content;
}
Kill switches:
// lib/feature-flags.ts
export const featureFlags = {
  expensiveFeature: process.env.ENABLE_EXPENSIVE_FEATURE === 'true',
  experimentalModel: process.env.ENABLE_EXPERIMENTAL_MODEL === 'true',
};

export async function generate(input: string) {
  if (!featureFlags.expensiveFeature) {
    throw new Error('Feature temporarily disabled');
  }
  // ... generate
}
Rollbacks and Kill Switches
When things go wrong:
- Immediate: Kill switch (disable feature)
- Short-term: Rollback deployment
- Long-term: Fix and redeploy
Kill switch implementation:
// app/api/generate/route.ts
export async function POST(req: Request) {
  // Kill switch check
  if (process.env.KILL_SWITCH_GENERATE === 'true') {
    return Response.json(
      { error: 'Service temporarily unavailable' },
      { status: 503 }
    );
  }
  // ... rest of handler
}
Rollback procedure:
- Identify the bad deployment
- Revert to previous version (Git + Vercel/Railway/etc.)
- Verify fix
- Investigate root cause
- Deploy fix
Automated rollback (advanced):
# Monitor error rate, auto-rollback if > threshold
# (Use your platform's health checks + automation)
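For the health-check side of that automation, a minimal sketch of an endpoint your platform can poll. It assumes a Prisma-style `db` client like the one in the monetization snippets; the import path is illustrative:

// app/api/health/route.ts
import { db } from '@/lib/db'; // illustrative import path

export async function GET() {
  try {
    await db.$queryRaw`SELECT 1`; // cheap DB liveness probe
    return Response.json({ status: 'ok' });
  } catch {
    return Response.json({ status: 'degraded' }, { status: 503 });
  }
}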
Monetization & Paywalls
Most AI apps never monetize. They build features, then try to add billing later. That's backwards. Design monetization into the product from day one.
Why Most AI Apps Fail to Monetize
1. No value proposition
Users don't understand why they should pay. The free tier does everything.
2. Wrong pricing model
Charging per month when usage varies wildly.
3. No enforcement
Free tier limits exist on paper, not in code.
4. Poor paywall UX
Paywalls block users instead of converting them.
Credits vs Subscriptions
Credits (usage-based):
- Good for: Variable usage, pay-as-you-go
- Example: 1000 credits = $10, each generation costs 10 credits
- Pros: Fair, scales with usage
- Cons: Harder to predict revenue
Subscriptions (recurring):
- Good for: Predictable usage, SaaS model
- Example: $29/month = unlimited generations
- Pros: Predictable revenue, better for users
- Cons: Heavy users cost you money
Hybrid (best of both):
- Base subscription + usage-based overage
- Example: $19/month = 1000 generations, $0.01 per extra
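A sketch of how the hybrid math works at the end of a billing cycle. The plan numbers and `reportOverage` helper are illustrative; with Stripe this would map onto metered billing:

// Included quota + metered overage, settled at the end of the cycle
const PLAN = { included: 1000, overagePerGeneration: 0.01 };

// Hypothetical billing helper; with Stripe this would create a
// usage record or invoice item.
async function reportOverage(userId: string, amount: number): Promise<void> {
  console.log(`[billing] overage for ${userId}: $${amount.toFixed(2)}`);
}

export async function settleMonthlyUsage(userId: string, generationsUsed: number) {
  const overage = Math.max(0, generationsUsed - PLAN.included);
  const overageCharge = overage * PLAN.overagePerGeneration;
  if (overageCharge > 0) {
    await reportOverage(userId, overageCharge);
  }
  return { overage, overageCharge };
}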
Usage-Based Pricing Logic
Track everything:
// lib/usage-tracking.ts
export async function trackUsage(
  userId: string,
  feature: string,
  cost: number
) {
  await db.usage.create({
    data: {
      userId,
      feature,
      cost,
      timestamp: new Date(),
    },
  });

  // Update user's credit balance
  await db.user.update({
    where: { id: userId },
    data: {
      credits: { decrement: cost },
    },
  });
}
Calculate costs:
export function calculateCost(tokens: number, model: string): number {
  const pricing = {
    'gpt-4': { input: 0.03 / 1000, output: 0.06 / 1000 },
    'gpt-3.5-turbo': { input: 0.0015 / 1000, output: 0.002 / 1000 },
  };
  const rates = pricing[model] || pricing['gpt-3.5-turbo'];
  // Simplified: assume a 50/50 input/output split
  const half = tokens / 2;
  return half * rates.input + half * rates.output;
}
Enforcing Limits Server-Side
Never trust the client:
// ❌ BAD: Client-side check
if (user.credits < 10) {
  alert('Not enough credits');
  return;
}
await generate();

// ✅ GOOD: Server-side check
export async function POST(req: Request) {
  const user = await getCurrentUser(req);
  if (user.credits < 10) {
    return Response.json(
      { error: 'Insufficient credits' },
      { status: 402 }
    );
  }

  // Deduct before generation (atomic)
  await db.user.update({
    where: { id: user.id },
    data: { credits: { decrement: 10 } },
  });

  try {
    const result = await generate();
    return Response.json({ result });
  } catch (error) {
    // Refund on error
    await db.user.update({
      where: { id: user.id },
      data: { credits: { increment: 10 } },
    });
    throw error;
  }
}
Free tier limits:
export async function checkLimit(userId: string, feature: string): Promise<boolean> {
  const user = await getUser(userId);
  if (user.plan === 'free') {
    const dailyUsage = await getDailyUsage(userId, feature);
    // Free tier: 10 generations per day
    if (dailyUsage.count >= 10) {
      return false;
    }
  }
  return true;
}
Designing Paywalls Without Killing Activation
Bad paywall:
// Blocks user immediately
{user.credits === 0 && <PaywallModal />}
Good paywall:
// Shows value first, then paywall
{user.credits > 0 && <GenerateButton />}
{user.credits === 0 && (
  <div>
    <p>You've used your free credits! Upgrade to continue.</p>
    <UpgradeButton />
    <p>Or share on Twitter for 10 free credits</p>
  </div>
)}
Paywall best practices:
- Show value first (let users try before paying)
- Clear pricing (no hidden fees)
- Multiple options (free, pro, enterprise)
- Social proof (testimonials, usage stats)
- Easy upgrade (one click, no friction)
- Transparent limits (show what they get)
Example paywall component:
export function Paywall({ user, feature }: { user: User; feature: string }) {
  const limits = {
    free: { generations: 10, features: ['basic'] },
    pro: { generations: 1000, features: ['basic', 'advanced', 'api'] },
  };

  return (
    <div className="paywall">
      <h2>Upgrade to continue</h2>
      <p>You've reached your free tier limit for {feature}</p>
      <div className="plans">
        <Plan
          name="Pro"
          price="$29/month"
          features={limits.pro.features}
          current={user.plan === 'pro'}
        />
      </div>
      <Button onClick={handleUpgrade}>Upgrade now</Button>
    </div>
  );
}
The Exact Workflow I Use (Step-by-Step)
This is the workflow I use to ship AI products. It's not theoretical. It's what I do, in order, every time.
Step 1: Idea → Validation
Don't build yet. Validate first.
1. Define the outcome (not the feature)
   - Bad: "Build an AI writing assistant"
   - Good: "Help users write blog posts 10x faster"
2. Find 5 people who have this problem
   - Talk to them
   - Understand their current solution
   - Validate they'd pay for a better solution
3. Build a landing page (no code yet)
   - Explain the outcome
   - Show mockups
   - Collect emails
   - If 50+ people sign up, proceed
Why this matters:
Most ideas are bad. Validation filters them out before you waste weeks building.
Step 2: UX First, Not Model First
Design the experience before choosing the model.
1. Map the user journey
   - Where do they start?
   - What do they input?
   - What do they see?
   - What happens when it fails?
2. Design the UI (Figma, sketches, whatever)
   - Input form
   - Loading states
   - Output display
   - Error states
3. Define the API contract (see the sketch below)
   - What does the request look like?
   - What does the response look like?
   - What are the error cases?
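A sketch of what that contract can look like as shared Zod schemas, consistent with the validation examples earlier (the field names are illustrative):

import { z } from 'zod';

// Request and response shapes, agreed before any UI work
export const GenerateRequest = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
});
export const GenerateResponse = z.object({
  output: z.string(),
  tokensUsed: z.number(),
});

// Error cases, agreed up front:
// 400 invalid input, 401 unauthenticated, 402 out of credits, 429 rate limited
export type GenerateRequest = z.infer<typeof GenerateRequest>;
export type GenerateResponse = z.infer<typeof GenerateResponse>;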
Why this matters:
The model is a detail. The experience is the product. Design the experience first.
Step 3: API Contracts Before UI Polish
Build the backend first. Make it work. Then make it pretty.
1. Create the API route
   - Input validation
   - Auth check
   - Rate limiting
   - AI generation
   - Response formatting
2. Test with curl/Postman
   - Valid requests
   - Invalid requests
   - Edge cases
   - Error handling
3. Only then build the UI
   - Call the API
   - Handle responses
   - Handle errors
   - Polish the design
Why this matters:
Backend defines what's possible. Frontend is presentation. Build the foundation first.
Step 4: Build Thin, Ship Early, Harden Later
Ship the minimum that delivers value. Improve based on feedback.
v1 (Week 1):
- Basic input/output
- No streaming
- No retries
- Basic error handling
- Deploy to production
v2 (Week 2):
- Add streaming
- Add retries
- Better error messages
- Improve UX
v3 (Week 3):
- Add paywall
- Add usage tracking
- Add analytics
- Optimize costs
Why this matters:
Perfect is the enemy of shipped. Ship something that works, then improve it.
Step 5: What I Deliberately Ignore in v1
Things I skip in the first version:
- Comprehensive test suite (unit tests for critical paths only)
- Perfect error handling (basic try/catch is enough)
- Advanced monitoring (basic logs are enough)
- Multiple models (pick one, stick with it)
- Complex state management (keep it simple)
- Optimizations (premature optimization is evil)
- Documentation (code should be self-documenting)
Things I never skip:
- Authentication (who is this user?)
- Rate limiting (prevent abuse)
- Input validation (prevent token bombs)
- Error boundaries (don't crash the app)
- Basic logging (need to debug)
- Environment variables (never hardcode secrets)
Why this matters:
Focus on what matters. Ignore what doesn't. Ship faster.
The Complete Workflow Timeline
Week 1: Foundation
- Day 1-2: Validation (landing page, user interviews)
- Day 3-4: UX design (journey map, UI mockups)
- Day 5-6: API development (backend, testing)
- Day 7: Basic UI (connect to API, deploy)
Week 2: Polish
- Day 8-9: Streaming, better UX
- Day 10-11: Error handling, retries
- Day 12-13: Testing, bug fixes
- Day 14: Launch (share with initial users)
Week 3: Scale
- Day 15-16: Paywall, usage tracking
- Day 17-18: Analytics, monitoring
- Day 19-20: Optimizations, cost control
- Day 21: Iterate based on feedback
This is realistic. Not "ship in 2 hours." Not "ship in 6 months." Three weeks to something real.
Common Mistakes I See After Reviewing Dozens of AI Apps
I've reviewed dozens of AI apps. The same mistakes show up over and over. Here's what to avoid.
Mistake 1: Hardcoded Keys
The mistake:
const apiKey = 'sk-...';
Why it's bad:
- Keys get committed to git
- Keys get exposed in client-side code
- Keys get shared in screenshots
- Result: $10k OpenAI bill
The fix:
const apiKey = process.env.OPENAI_API_KEY;
Always use environment variables. Always.
Mistake 2: No Backend
The mistake:
// Frontend calling OpenAI directly
const openai = new OpenAI({
  apiKey: 'sk-...', // Exposed to anyone who opens devtools!
  dangerouslyAllowBrowser: true,
});
const response = await openai.chat.completions.create({
  // ...
});
Why it's bad:
- Can't hide API keys
- Can't rate limit
- Can't track usage
- Can't enforce paywalls
- Can't prevent abuse
The fix:
Always use a backend proxy. Always.
Mistake 3: No Abuse Protection
The mistake:
// No rate limiting, no input validation
export async function POST(req: Request) {
  const { input } = await req.json();
  return await generate(input); // Anything goes!
}
Why it's bad:
- Users can send 100k token inputs
- Users can spam requests
- Users can DDoS your API
- Result: $1000s in unexpected costs
The fix:
// Rate limit + input validation
export async function POST(req: Request) {
  // Rate limit (keyed by the authenticated user)
  const user = await getCurrentUser(req);
  const { success } = await ratelimit.limit(user.id);
  if (!success) return Response.json({ error: 'Rate limited' }, { status: 429 });

  // Validate input
  const { input } = await req.json();
  if (input.length > 10000) {
    return Response.json({ error: 'Input too long' }, { status: 400 });
  }

  return await generate(input);
}
Mistake 4: Over-Engineering Infra Too Early
The mistake:
- Kubernetes for an MVP
- Microservices for 100 users
- Complex CI/CD for a weekend project
- Over-architecting before you have users
Why it's bad:
- Wastes time
- Adds complexity
- Slows iteration
- Premature optimization
The fix:
Start simple. Vercel/Railway/Render for hosting. PostgreSQL for database. Add complexity when you need it.
Mistake 5: Shipping Features Instead of Outcomes
The mistake:
- Building 10 features before launching
- Perfecting the UI before validating the idea
- Adding "nice to have" features before core works
Why it's bad:
- Wastes time on things users don't want
- Delays learning
- Delays revenue
- Builds the wrong product
The fix:
Ship the minimum that delivers value. One feature that works is better than ten features that don't.
Mistake 6: No Error Handling
The mistake:
const result = await generate(input);
return result; // What if it fails?
Why it's bad:
- App crashes on errors
- Users see cryptic error messages
- No way to debug issues
- Bad user experience
The fix:
try {
  const result = await generate(input);
  return Response.json({ result });
} catch (error) {
  console.error(error);
  return Response.json(
    { error: 'Generation failed. Please try again.' },
    { status: 500 }
  );
}
Mistake 7: Ignoring Costs
The mistake:
- Using GPT-4 for everything
- No cost tracking
- No usage limits
- No kill switches
Why it's bad:
- Unexpected bills
- Can't optimize
- Can't price correctly
- Can go bankrupt
The fix:
Track every request. Log costs. Set limits. Use cheaper models when possible.
Mistake 8: No Observability
The mistake:
- No error tracking
- No usage analytics
- No performance monitoring
- Flying blind
Why it's bad:
- Can't debug issues
- Can't understand usage
- Can't optimize
- Can't make data-driven decisions
The fix:
Add error tracking (Sentry). Add analytics (PostHog). Add logging. Know what's happening.
Final Cheat Sheet (Skimmable)
Print this. Keep it handy. Reference it before shipping.
Architecture Checklist
- Backend proxy (never frontend → OpenAI directly)
- Environment variables for all secrets
- Authentication on all protected endpoints
- Rate limiting implemented
- Input validation and sanitization
- Error handling and logging
- Cost tracking and limits
- Observability (errors, logs, metrics)
Security Checklist
- No hardcoded API keys
- Secrets in environment variables
- Authentication implemented
- Authorization checks
- Rate limiting
- Input sanitization
- Output filtering
- HTTPS only
- CORS configured
- Audit logging
UX Checklist
- Streaming responses (not spinners)
- Input constraints (length, format)
- Clear error messages
- Retry mechanisms
- Loading states (skeletons, not spinners)
- Optimistic UI where possible
- Mobile responsive
- Accessible (keyboard navigation, screen readers)
Deployment Checklist
- Environment separation (dev/staging/prod)
- CI/CD pipeline
- Error tracking (Sentry)
- Logging (structured logs)
- Monitoring (uptime, performance)
- Cost monitoring (API spend)
- Kill switches for expensive features
- Rollback procedure documented
Monetization Checklist
- Usage tracking implemented
- Credits/subscriptions system
- Paywall designed (not blocking)
- Limits enforced server-side
- Billing integration (Stripe/Paddle)
- Cost calculation accurate
- Free tier limits clear
Development Workflow
- Validate → Landing page, user interviews
- Design → UX first, model second
- Build → API contracts before UI polish
- Ship → Thin v1, improve based on feedback
- Iterate → Data-driven improvements
Tools I Use
- IDE: Cursor
- Framework: Next.js
- Database: PostgreSQL (Supabase/Railway)
- Auth: NextAuth.js / Clerk
- Hosting: Vercel / Railway
- Error Tracking: Sentry
- Analytics: PostHog / Vercel Analytics
- Rate Limiting: Upstash
- Billing: Stripe
Rules to Live By
- Backend always. Never frontend → AI directly.
- Security first. Never skip it.
- Ship thin. Perfect is the enemy of shipped.
- Track costs. Every request, every token.
- Design for failure. AI will fail. Handle it.
- Validate early. Don't build in a vacuum.
- Monitor everything. You can't fix what you can't see.
Closing
If you've read this far, you're serious about building real AI products. That's good. The world needs more builders, fewer demos.
This workflow isn't theoretical. It's what I use to ship software that works. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.
The gap between "AI demo" and "production AI app" is massive. Most people never cross it. They build demos, get excited, then hit a wall when real users show up.
You don't have to hit that wall.
Follow this system. Use these patterns. Avoid these mistakes. Ship something real.
The models are good enough. The tools are good enough. The only thing missing is the system. Now you have it.
Build something people actually use. Charge for it. Make it work.
If you're building something real and want to talk shop, find me on Twitter. I review AI apps and give honest feedback. No fluff. No corporate speak. Just real talk about what works and what doesn't.
Now go ship.
Need a build partner?
Launch your AI app development workflow with DreamLaunch
We deliver production-grade products in 28 days, with research, design, engineering, and launch support handled end-to-end. Our team pairs production AI and MVP development experience with senior founders so you can stay focused on growth.
Ready to Build Your MVP?
Turn your idea into a revenue-ready product in just 28 days.
