
The Ultimate AI App & MVP Workflow - Ship Production Software, Not Demos
Most AI apps fail. Not because the models are bad. Not because the idea is wrong. They fail because the people building them treat AI apps like regular apps, or worse, like demos.
I've shipped 30+ production AI products. I've seen the same mistakes kill projects before they even get users. This isn't a tutorial. This is the system I use to ship software that actually works.
Table of Contents
- Hook / Context
- The Real AI App Stack (High-Level)
- IDE & Core Workflow
- Design & UX Rules for AI Products
- Backend & AI Orchestration
- Security Checklist (NON-NEGOTIABLE)
- DevOps & Deployment Setup
- Monetization & Paywalls
- The Exact Workflow I Use (Step-by-Step)
- Common Mistakes I See After Reviewing Dozens of AI Apps
- Final Cheat Sheet (Skimmable)
- Closing
Hook / Context
You've seen the demos. The Twitter threads showing "I built an AI app in 2 hours." The GitHub repos with 10k stars and zero production users. The landing pages promising the moon, backed by code that breaks when you look at it wrong.
Here's what they're not showing you: the $5,000 OpenAI bill from one weekend. The security holes that leak API keys. The users who hit rate limits on day one. The apps that work in the demo but fail when real people use them.
The gap between "AI demo" and "production AI app" is massive. Most people never cross it.
I've built AI products for startups that raised Series A. I've built internal tools for enterprises processing millions of requests. I've also seen dozens of "AI apps" that were one API call away from being a security disaster.
The difference isn't the model. It's the system.
This post is that system. It's the workflow I use to ship AI products that don't break, don't leak secrets, and don't cost $10k in unexpected API bills. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.
If you want to build something real, read this. If you want to build a demo, there are plenty of YouTube tutorials for that.
The Real AI App Stack (High-Level)
An AI app isn't a frontend calling OpenAI. That's a prototype. A production AI app has eight layers, and most people skip four of them.
1. Product & UX Layer
This is where most AI apps die. You can't prompt-engineer your way out of a bad product.
What it includes:
- User intent understanding (what are they actually trying to do?)
- Input constraints (don't let users paste novels)
- Output expectations (what does "done" look like?)
- Failure states (what happens when the model hallucinates?)
The mistake: Building the AI feature first, then figuring out the product.
The fix: Design the user outcome first. The AI is a means, not the end.
2. Frontend Layer
Your UI needs to handle latency, streaming, partial outputs, and failures gracefully.
What it includes:
- Streaming UI (show progress, not spinners)
- Optimistic updates (make it feel instant)
- Skeleton states (mask loading)
- Error boundaries (fail gracefully)
- Input validation (constrain before sending)
The mistake: Building a form that submits and shows a spinner for 10 seconds.
The fix: Stream responses, show progress, validate inputs client-side.
3. Backend & Orchestration Layer
This is where the magic happens. Or where everything breaks.
What it includes:
- API proxy (never expose keys to frontend)
- Request routing (which model? which endpoint?)
- Tool calling / function routing (when to call external APIs)
- State machines (multi-step workflows)
- Retry logic (with exponential backoff)
- Fallback chains (model A fails, try model B)
- Rate limiting (per user, per IP, per feature)
- Cost tracking (log every token)
The mistake: Frontend → OpenAI directly. No backend. No protection.
The fix: Everything goes through your backend. Always.
4. AI Layer
The models themselves. This is the smallest part of the stack, but everyone obsesses over it.
What it includes:
- Model selection (GPT-4 vs Claude vs open source)
- Prompt templates (versioned, tested)
- Context management (RAG, memory, conversation history)
- Token optimization (trim context, compress prompts)
- Output parsing (structured extraction, validation)
The mistake: Using GPT-4 for everything, ignoring costs, no prompt versioning.
The fix: Right model for the job. Track costs. Version prompts like code.
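Versioning prompts like code can be as simple as a typed template with an explicit version number that gets logged with every request. A minimal sketch; the file name, prompt, and helper names are illustrative:

// prompts/summarize.ts -- illustrative prompt-versioning sketch
export const SUMMARIZE_PROMPT = {
  id: 'summarize',
  version: 3, // bump on every change; old versions live in git history
  template: (input: { text: string; tone: string }) =>
    `Summarize the following text in a ${input.tone} tone:\n\n${input.text}`,
};

// Log promptId + promptVersion with every request so you can tie
// output-quality regressions back to a specific prompt change.
export function buildSummarizePrompt(text: string, tone: string) {
  return {
    promptId: SUMMARIZE_PROMPT.id,
    promptVersion: SUMMARIZE_PROMPT.version,
    content: SUMMARIZE_PROMPT.template({ text, tone }),
  };
}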
5. Data & Memory Layer
AI apps need memory. Users expect continuity.
What it includes:
- Conversation history (vector DB or SQL)
- User preferences (what they like, what they don't)
- Context windows (what to include, what to exclude)
- Embeddings (for RAG, search, similarity)
- Cache layer (don't regenerate the same thing)
The mistake: Stateless apps that forget everything.
The fix: Store conversations. Build context. Use RAG when needed.
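For the cache layer, one low-effort pattern is hashing the model + prompt and storing the output. A minimal sketch, assuming the Upstash Redis client that shows up again in the rate-limiting section:

import { createHash } from 'crypto';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// Identical model + input = identical cache key, so repeat requests
// return the stored output instead of paying for a regeneration.
export async function cachedGenerate(
  model: string,
  input: string,
  generate: (input: string) => Promise<string>,
  ttlSeconds = 60 * 60 * 24
): Promise<string> {
  const key = 'gen:' + createHash('sha256').update(`${model}:${input}`).digest('hex');
  const cached = await redis.get<string>(key);
  if (cached) return cached;

  const output = await generate(input);
  await redis.set(key, output, { ex: ttlSeconds });
  return output;
}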
6. Security Layer
This isn't optional. AI apps are attack vectors waiting to happen.
What it includes:
- API key management (never in code, never in frontend)
- Authentication (who is this user?)
- Authorization (what can they do?)
- Input sanitization (prevent injection attacks)
- Output filtering (prevent data leaks)
- Rate limiting (prevent abuse)
- Audit logging (who did what, when)
The mistake: Hardcoded keys, no auth, no rate limits.
The fix: Secrets in env vars. Auth on every endpoint. Rate limits everywhere.
7. Infra & DevOps Layer
How you deploy, monitor, and scale.
What it includes:
- Environment separation (dev, staging, prod)
- CI/CD (automated tests, deployments)
- Observability (logs, errors, metrics)
- Cost monitoring (track API spend)
- Kill switches (turn off expensive features)
- Rollback procedures (when things break)
The mistake: Deploying to production from localhost. No monitoring. No rollback plan.
The fix: Proper environments. Automated deployments. Real observability.
8. Monetization & Scaling Layer
Most AI apps never get here because they die earlier. But if you make it, this is critical.
What it includes:
- Usage tracking (credits, tokens, requests)
- Billing integration (Stripe, Paddle)
- Paywall logic (free tier limits)
- Subscription management
- Cost allocation (what features cost what)
The mistake: Building features, then trying to monetize.
The fix: Design monetization into the product from day one.
IDE & Core Workflow
I use Cursor. Not because it's perfect, but because it's the best tool for shipping AI products fast. Here's how I use it without creating garbage code.
Why Cursor Works
Cursor understands your codebase. It can read multiple files, understand context, and make changes across your project. ChatGPT can't do that. GitHub Copilot can't do that. This is why Cursor wins for production work.
Rules for Prompting Cursor
1. Be specific about scope
Bad: "Add authentication"
Good: "Add NextAuth.js authentication to this Next.js app. Use email/password and Google OAuth. Store sessions in the existing PostgreSQL database. Add a protected route at /dashboard that requires auth."
2. Reference existing patterns
Bad: "Create a new API route"
Good: "Create a new API route following the same pattern as /api/users/route.ts. Use the same error handling and response format."
3. Specify file locations
Bad: "Add a component for user profiles"
Good: "Create a new component at components/user-profile.tsx that displays user information. Use the existing User type from lib/types.ts."
4. Include constraints
Bad: "Make it responsive"
Good: "Make it responsive using Tailwind breakpoints. Mobile-first design. Max width 1280px on desktop."
When to Let AI Generate Code
Let AI generate:
- Boilerplate (API routes, CRUD operations)
- Type definitions (from existing data structures)
- Test cases (unit tests, integration tests)
- Documentation (JSDoc comments, README sections)
- Error handling patterns (try/catch, validation)
Don't let AI generate:
- Business logic (you understand the domain better)
- Security-critical code (auth, payments, secrets)
- Performance-critical paths (AI doesn't optimize well)
- Complex state management (AI creates overcomplicated solutions)
Folder-Level Prompting
When working on a feature that spans multiple files:
I'm building a feature for user onboarding. It needs:

1. A new API route at `/api/onboarding/route.ts` that:
   - Accepts POST requests with user data
   - Validates input using Zod
   - Creates a user record in the database
   - Sends a welcome email
   - Returns the created user

2. A new page at `app/onboarding/page.tsx` that:
   - Shows a multi-step form (3 steps)
   - Uses the existing form components from `components/forms/`
   - Calls the API route on submit
   - Handles errors and loading states

3. Update the database schema to include an `onboarding_completed` field

Follow existing patterns in the codebase. Use TypeScript. Use the existing error handling utilities.
File-Level Prompting
When editing a single file:
In this file, I need to:
1. Add a new function `validateUserInput` that takes user data and returns validation errors
2. Update the `createUser` function to use the new validator
3. Add error handling for database connection failures
4. Add JSDoc comments to all exported functions
Keep the existing code style. Don't change anything else.
Anti-Patterns That Cause Bad AI Code
1. Vague prompts
"Make it better" → AI will change random things.
2. No context
"Add a button" → AI doesn't know where, what style, what it does.
3. Too many changes at once
"Refactor the entire auth system and add OAuth and update the UI" → AI will break things.
4. Ignoring existing patterns
"Add a new API route" without showing existing routes → AI creates inconsistent code.
5. Not reviewing AI output
Accepting everything AI generates → Technical debt and bugs.
The Cursor Workflow I Use
- Plan the change (in my head or notes)
- Find similar code (grep for patterns)
- Prompt Cursor with context (reference existing code)
- Review the diff (does it make sense?)
- Test it (does it work?)
- Refine if needed (small follow-up prompts)
I never let Cursor make large architectural changes. I use it for implementation, not design.
Design & UX Rules for AI Products
AI products have different UX requirements than regular apps. Most people ignore this and build forms that submit to APIs. That's not good enough.
Why AI UX Is Different
Latency is unpredictable
A regular API call takes 100-500ms. An AI call takes 2-10 seconds. Sometimes 30 seconds. Users will think your app is broken.
Outputs are non-deterministic
The same input can produce different outputs. Users need to understand this.
Failures are common
Models hallucinate. APIs rate limit. Networks fail. Your UI must handle this gracefully.
Partial outputs are valuable
Users don't want to wait 10 seconds for nothing. Show progress. Stream responses.
Latency Masking Patterns
1. Streaming responses
Don't wait for the full response. Stream tokens as they arrive.
// Bad: Wait for everything
const response = await fetch('/api/generate');
const data = await response.json();
setOutput(data.text);
// Good: Stream it
const response = await fetch('/api/generate');
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  setOutput(prev => prev + chunk);
}
2. Optimistic UI
Show the expected result immediately, update when real data arrives.
// User submits form
setOptimisticResult(calculateExpectedResult(input));
// Then fetch the real result and replace the optimistic one
const response = await fetch('/api/process');
const realResult = await response.json();
setOptimisticResult(realResult);
3. Skeleton states
Show the structure of what's coming, not a spinner.
// Bad: Spinner
{isLoading && <Spinner />}
// Good: Skeleton
{isLoading && <ResultSkeleton />}
4. Progressive enhancement
Show what you can, when you can.
// Show metadata first
setMetadata(extractMetadata(response));
// Then show full content
setContent(await streamFullContent(response));
Input Constraints > Prompt Engineering
Most people spend hours on prompts. They should spend hours on input validation.
Why constraints matter:
- Shorter inputs = faster responses = lower costs
- Validated inputs = fewer errors = better outputs
- Constrained inputs = predictable outputs = better UX
What to constrain:
- Length (max characters, max words)
- Format (structured data, specific fields)
- Content (no PII, no sensitive data)
- Language (if you only support English, say so)
Example:
// Bad: Accept anything
const prompt = userInput;
// Good: Constrain it
const schema = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
  length: z.enum(['short', 'medium', 'long'])
});
const validated = schema.parse(userInput);
const prompt = buildPrompt(validated);
Designing for Failure
Your AI will fail. Design for it.
1. Retry logic (with limits)
async function generateWithRetry(input: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(input);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * Math.pow(2, i)); // Exponential backoff: 1s, 2s, 4s...
    }
  }
}
2. Fallback chains
async function generateWithFallback(input: string) {
  try {
    return await generateWithGPT4(input);
  } catch (error) {
    console.warn('GPT-4 failed, trying GPT-3.5');
    return await generateWithGPT35(input);
  }
}
3. Partial outputs
// If generation fails halfway, show what you got
try {
  const fullOutput = await streamGeneration(input);
  setOutput(fullOutput);
} catch (error) {
  // Keep partial output, show error message
  setError('Generation incomplete. Partial result shown.');
}
4. Clear error messages
// Bad: Generic error
setError('Something went wrong');
// Good: Specific error
if (error.code === 'RATE_LIMIT') {
  setError('Too many requests. Please wait a moment.');
} else if (error.code === 'INVALID_INPUT') {
  setError('Your input is too long. Please shorten it.');
} else {
  setError('Generation failed. Please try again.');
}
The UX Checklist
Before shipping, ask:
- Can users see progress during long operations?
- Are inputs validated before sending?
- Are errors clear and actionable?
- Is there a retry mechanism?
- Can users cancel long-running operations?
- Are partial outputs shown if generation fails?
- Is the UI responsive during AI operations?
- Are loading states informative (not just spinners)?
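One checklist item that trips people up is cancellation. A minimal sketch using AbortController with the streaming fetch pattern from earlier; the endpoint and handler names are illustrative:

const controller = new AbortController();

async function generateCancellable(input: string, onChunk: (s: string) => void) {
  const response = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({ input }),
    signal: controller.signal, // aborting rejects the fetch and any pending read
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value));
  }
}

// Wire the controller to a Cancel button:
// <button onClick={() => controller.abort()}>Cancel</button>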
Backend & AI Orchestration
This is where most AI apps die. People build frontends that call OpenAI directly. That's a prototype, not a product.
Why Direct Frontend → OpenAI Is a Mistake
1. Security
You can't hide API keys in the frontend. They'll be exposed. Someone will find them. You'll get a $10k bill.
2. No control
You can't rate limit. You can't log requests. You can't track costs. You can't prevent abuse.
3. No orchestration
You can't chain multiple API calls. You can't use tool calling. You can't implement retries or fallbacks.
4. No business logic
You can't enforce usage limits. You can't check subscriptions. You can't add paywalls.
Always use a backend proxy.
Backend Proxy Patterns
Pattern 1: Simple Proxy
// app/api/generate/route.ts
export async function POST(req: Request) {
  const { input } = await req.json();

  // Validate input
  if (!input || input.length > 1000) {
    return Response.json({ error: 'Invalid input' }, { status: 400 });
  }

  // Check auth
  const user = await getCurrentUser(req);
  if (!user) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Check rate limits
  const rateLimited = await checkRateLimit(user.id);
  if (rateLimited) {
    return Response.json({ error: 'Rate limited' }, { status: 429 });
  }

  // Call OpenAI
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
  });

  // Log usage
  await logUsage(user.id, response.usage);

  return Response.json({ output: response.choices[0].message.content });
}
Pattern 2: Streaming Proxy
export async function POST(req: Request) {
  const { input } = await req.json();
  // ... validation, auth, rate limits ...

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    stream: true,
  });

  // Create a readable stream
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(content));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}
Pattern 3: Tool Calling / Function Routing
export async function POST(req: Request) {
  const { input } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get weather for a location',
          parameters: {
            type: 'object',
            properties: {
              location: { type: 'string' },
            },
          },
        },
      },
    ],
  });

  const message = response.choices[0].message;

  // Check if model wants to call a function
  if (message.tool_calls) {
    for (const toolCall of message.tool_calls) {
      if (toolCall.function.name === 'get_weather') {
        const args = JSON.parse(toolCall.function.arguments);
        const weather = await fetchWeather(args.location);

        // Call model again with function result
        const secondResponse = await openai.chat.completions.create({
          model: 'gpt-4',
          messages: [
            { role: 'user', content: input },
            message,
            {
              role: 'tool',
              tool_call_id: toolCall.id,
              content: JSON.stringify(weather),
            },
          ],
        });

        return Response.json({
          output: secondResponse.choices[0].message.content,
        });
      }
    }
  }

  return Response.json({ output: message.content });
}
State Machines for Multi-Step Workflows
Complex AI workflows need state machines. Don't try to manage this with if/else.
type WorkflowState =
  | { type: 'idle' }
  | { type: 'validating'; input: string }
  | { type: 'generating'; validatedInput: string }
  | { type: 'post-processing'; output: string }
  | { type: 'complete'; finalOutput: string }
  | { type: 'error'; error: string };

async function runWorkflow(input: string): Promise<string> {
  let state: WorkflowState = { type: 'idle' };
  try {
    // Validate
    state = { type: 'validating', input };
    const validated = await validateInput(input);

    // Generate
    state = { type: 'generating', validatedInput: validated };
    const generated = await generate(validated);

    // Post-process
    state = { type: 'post-processing', output: generated };
    const processed = await postProcess(generated);

    // Complete
    state = { type: 'complete', finalOutput: processed };
    return processed;
  } catch (error) {
    state = { type: 'error', error: error.message };
    throw error;
  }
}
Managing Retries, Fallbacks, and Hallucination Control
Retry logic:
async function generateWithRetry(
  input: string,
  options: { maxRetries?: number; backoffMs?: number } = {}
): Promise<string> {
  const { maxRetries = 3, backoffMs = 1000 } = options;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await generate(input);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = backoffMs * Math.pow(2, attempt);
      await sleep(delay);
    }
  }
  throw new Error('Max retries exceeded');
}
Fallback chains:
async function generateWithFallback(input: string): Promise<string> {
  const models = ['gpt-4', 'gpt-3.5-turbo', 'claude-3-opus'];
  for (const model of models) {
    try {
      return await generate(input, { model });
    } catch (error) {
      console.warn(`${model} failed, trying next`);
      continue;
    }
  }
  throw new Error('All models failed');
}
Hallucination control:
async function generateWithValidation(input: string): Promise<string> {
  const output = await generate(input);

  // Check for hallucinations
  const validation = await validateOutput(output, input);
  if (!validation.isValid) {
    // Regenerate with stricter prompt
    return await generate(input, {
      systemPrompt: 'Be extremely factual. If unsure, say so.',
    });
  }
  return output;
}
Logging and Traceability as First-Class Concerns
Every AI request should be logged. You need to debug issues, track costs, and understand usage.
async function generateWithLogging(input: string, userId: string) {
  const requestId = crypto.randomUUID();
  const startTime = Date.now();
  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: input }],
    });

    const duration = Date.now() - startTime;
    const tokens = response.usage?.total_tokens || 0;
    const cost = calculateCost(tokens, 'gpt-4');

    // Log success
    await logRequest({
      requestId,
      userId,
      input,
      output: response.choices[0].message.content,
      tokens,
      cost,
      duration,
      status: 'success',
    });

    return response.choices[0].message.content;
  } catch (error) {
    const duration = Date.now() - startTime;

    // Log failure
    await logRequest({
      requestId,
      userId,
      input,
      error: error.message,
      duration,
      status: 'error',
    });

    throw error;
  }
}
What to log:
- Request ID (for tracing)
- User ID (for attribution)
- Input (for debugging)
- Output (for quality analysis)
- Tokens used (for cost tracking)
- Duration (for performance)
- Model used (for cost allocation)
- Status (success/error)
- Error messages (if failed)
Security Checklist (NON-NEGOTIABLE)
I've seen too many AI apps with hardcoded API keys, no authentication, and zero rate limiting. This section is non-negotiable. If you skip it, you're building a liability, not a product.
API Key Handling
Never do this:
// ❌ NEVER
const OPENAI_API_KEY = 'sk-...';
Always do this:
// ✅ ALWAYS
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
if (!OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is not set');
}
Environment variables:
- Use .env.local for local development (gitignored)
- Use your hosting platform's secrets manager for production
- Never commit secrets to git
- Rotate keys regularly
- Use different keys for dev/staging/prod
Secrets Management
For local development:
# .env.local (gitignored)
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://...
NEXTAUTH_SECRET=...
For production (Vercel example):
# Set in Vercel dashboard
vercel env add OPENAI_API_KEY
For other platforms:
- AWS: AWS Secrets Manager
- GCP: Secret Manager
- Azure: Key Vault
- Railway/Render: Environment variables in dashboard
Never:
- Hardcode secrets
- Commit .env files
- Share secrets in Slack/Discord
- Log secrets (even in error messages)
Authentication vs Authorization
Authentication: Who is this user?
// Check if user is logged in
const user = await getCurrentUser(req);
if (!user) {
  return Response.json({ error: 'Unauthorized' }, { status: 401 });
}
Authorization: What can this user do?
// Check if user has permission
if (user.role !== 'admin') {
  return Response.json({ error: 'Forbidden' }, { status: 403 });
}

// Check if user has credits
if (user.credits < requiredCredits) {
  return Response.json({ error: 'Insufficient credits' }, { status: 402 });
}
Common patterns:
- JWT for stateless auth (NextAuth.js, Clerk, Auth0)
- Session-based auth for stateful apps
- API keys for server-to-server (different from user auth)
JWT Usage (Where It Fits, Where It Doesn't)
Use JWT when:
- Stateless authentication (no server-side sessions)
- Microservices (token can be verified without DB lookup)
- Mobile apps (token stored on device)
Don't use JWT when:
- You need to revoke tokens immediately (JWT is valid until expiry)
- You need server-side session management
- Token size matters (JWTs can be large)
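If you need revocation but still want JWTs, the usual workaround is a denylist checked on every request. A minimal sketch, assuming Upstash Redis (the same client used for rate limiting below) and that your tokens carry a `jti` claim:

import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// On logout or compromise: denylist the token id until it would
// have expired on its own anyway.
export async function revokeToken(jti: string, secondsUntilExpiry: number) {
  await redis.set(`revoked:${jti}`, 1, { ex: secondsUntilExpiry });
}

// On every authenticated request: one fast Redis lookup instead of
// a full server-side session store.
export async function isRevoked(jti: string): Promise<boolean> {
  return (await redis.exists(`revoked:${jti}`)) === 1;
}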
Example (NextAuth.js):
// app/api/auth/[...nextauth]/route.ts
import NextAuth from 'next-auth';

export const authOptions = {
  providers: [
    // ... providers
  ],
  callbacks: {
    async jwt({ token, user }) {
      if (user) {
        token.id = user.id;
        token.role = user.role;
      }
      return token;
    },
    async session({ session, token }) {
      session.user.id = token.id;
      session.user.role = token.role;
      return session;
    },
  },
};

// App Router route files must export HTTP method handlers
const handler = NextAuth(authOptions);
export { handler as GET, handler as POST };
Rate Limiting & Abuse Prevention
Why it matters:
- Prevents API key abuse
- Prevents cost explosions
- Prevents DDoS attacks
- Ensures fair usage
Implementation:
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'), // 10 requests per 10 seconds
});

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') || 'unknown';
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      { status: 429 }
    );
  }
  // ... rest of handler
}
Rate limit strategies:
- Per IP (prevent abuse from single source)
- Per user (prevent abuse from single account)
- Per feature (different limits for different features)
- Tiered (free users: 10/min, paid: 100/min)
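For the tiered strategy, a minimal sketch with one limiter per plan, following the same Upstash API as above (the specific limits are illustrative):

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// One limiter per plan; keyed by user id so the limit follows
// the account rather than the IP.
const limiters = {
  free: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, '1 m') }),
  pro: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1 m') }),
};

export async function checkTieredLimit(userId: string, plan: 'free' | 'pro') {
  const { success } = await limiters[plan].limit(`user:${userId}`);
  return success;
}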
Input Sanitization
Never trust user input:
// ❌ BAD
const prompt = userInput;
await openai.chat.completions.create({
  messages: [{ role: 'user', content: prompt }],
});

// ✅ GOOD
const sanitized = sanitizeInput(userInput);
const validated = validateInput(sanitized);
await openai.chat.completions.create({
  messages: [{ role: 'user', content: validated }],
});
What to sanitize:
- Remove PII (emails, phone numbers, SSNs)
- Remove sensitive data (passwords, API keys)
- Limit length (prevent token bombs)
- Validate format (structured inputs)
- Escape special characters (prevent injection)
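A minimal sketch of the `sanitizeInput` helper used above. The regexes are illustrative; real PII detection deserves a dedicated library:

export function sanitizeInput(raw: string, maxLength = 10000): string {
  let input = raw.slice(0, maxLength); // cap length first: prevents token bombs

  // Strip obvious PII and secret-shaped strings (illustrative patterns)
  input = input.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email removed]');
  input = input.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[ssn removed]');
  input = input.replace(/sk-[A-Za-z0-9]{20,}/g, '[api key removed]');

  return input.trim();
}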
Output Filtering
Filter outputs before sending to users:
const output = await generate(input);

// Filter sensitive data
const filtered = filterOutput(output, {
  removePII: true,
  removeSecrets: true,
  maxLength: 10000,
});

return Response.json({ output: filtered });
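And a matching sketch of `filterOutput` (again, the options and patterns are illustrative, not exhaustive):

export function filterOutput(
  output: string,
  opts: { removePII?: boolean; removeSecrets?: boolean; maxLength?: number } = {}
): string {
  let filtered = output;
  if (opts.removeSecrets) {
    filtered = filtered.replace(/sk-[A-Za-z0-9]{20,}/g, '[redacted]');
  }
  if (opts.removePII) {
    filtered = filtered.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[redacted]');
  }
  if (opts.maxLength && filtered.length > opts.maxLength) {
    filtered = filtered.slice(0, opts.maxLength);
  }
  return filtered;
}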
Why "Vibe Coding" Without Security Is Dangerous
I've seen apps that:
- Exposed API keys in client-side code → $5k OpenAI bill
- Had no rate limits → DDoS'd themselves
- Accepted unlimited input → Token bombs that cost $100s per request
- Had no auth → Anyone could use the API
- Logged sensitive data → Privacy violations
The cost of skipping security:
- Financial (unexpected API bills)
- Legal (data breaches, privacy violations)
- Reputational (users lose trust)
- Operational (downtime, abuse)
The fix:
Security isn't optional. Build it in from day one. It's easier to add security early than to retrofit it later.
Security Checklist
Before shipping, verify:
- No hardcoded API keys or secrets
- All secrets in environment variables
- Authentication on all protected endpoints
- Authorization checks for user permissions
- Rate limiting implemented
- Input validation and sanitization
- Output filtering for sensitive data
- Error messages don't leak secrets
- HTTPS only (no HTTP in production)
- CORS configured correctly
- SQL injection prevention (parameterized queries)
- XSS prevention (sanitize user input)
- Audit logging for sensitive operations
DevOps & Deployment Setup
Most AI apps are deployed like demos: push to main, hope it works. That's not how you ship production software.
Environment Separation
Three environments minimum:
1. Development (local)
   - Your machine
   - .env.local for secrets
   - Can break freely
2. Staging (pre-production)
   - Mirrors production
   - Test deployments here first
   - Real API keys (but test accounts)
3. Production (live)
   - Real users
   - Real money
   - Zero tolerance for breaks
Why this matters:
- Test changes before production
- Catch bugs before users see them
- Safe rollbacks
- Different API keys (so staging doesn't affect production costs)
Implementation:
// lib/config.ts
const env = process.env.NODE_ENV;

export const config = {
  env,
  isDev: env === 'development',
  isStaging: env === 'staging',
  isProd: env === 'production',
  openai: {
    apiKey: process.env.OPENAI_API_KEY!,
    model: env === 'production' ? 'gpt-4' : 'gpt-3.5-turbo', // Cheaper in dev
  },
  database: {
    url: process.env.DATABASE_URL!,
  },
  rateLimit: {
    requests: env === 'production' ? 100 : 1000, // Stricter in prod
    window: '1m',
  },
};
CI/CD Expectations for MVPs vs Scale
For MVPs (shipping fast):
- Automated tests (unit tests for critical paths)
- Automated deployments (push to main = deploy)
- Basic monitoring (errors, logs)
For scale (shipping safely):
- Comprehensive test suite (unit, integration, E2E)
- Staged deployments (staging → production)
- Code review requirements
- Automated security scans
- Performance testing
- Canary deployments
- Rollback automation
MVP CI/CD example (GitHub Actions):
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Deploy to Vercel
        uses: amondnet/vercel-action@v20
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
Observability Basics
What to monitor:
1. Errors
   - Unhandled exceptions
   - API failures
   - Database errors
2. Performance
   - Response times
   - API latency
   - Database query times
3. Usage
   - Request volume
   - User activity
   - Feature usage
4. Costs
   - API token usage
   - Cost per request
   - Daily/weekly/monthly spend
Implementation:
// lib/monitoring.ts
import * as Sentry from '@sentry/nextjs';

export function logError(error: Error, context?: Record<string, any>) {
  console.error(error);
  Sentry.captureException(error, { extra: context });
}

export function logEvent(name: string, data?: Record<string, any>) {
  console.log(`[EVENT] ${name}`, data);
  Sentry.captureMessage(name, { level: 'info', extra: data });
}

export function trackCost(feature: string, tokens: number, cost: number) {
  logEvent('cost_tracked', {
    feature,
    tokens,
    cost,
    timestamp: new Date().toISOString(),
  });
}
Tools:
- Errors: Sentry, Rollbar, Bugsnag
- Logs: Vercel Logs, Datadog, Logtail
- Metrics: Vercel Analytics, PostHog, Mixpanel
- APM: New Relic, Datadog APM
Cost Explosions and How to Prevent Them
Common causes:
- No rate limiting → Users spam requests
- No input validation → Token bombs (100k token inputs)
- Wrong model → Using GPT-4 for everything
- No caching → Regenerating same content
- No kill switches → Can't turn off expensive features
Prevention:
// lib/cost-control.ts
export async function generateWithCostControl(
  input: string,
  userId: string
): Promise<string> {
  // 1. Validate input length
  if (input.length > 10000) {
    throw new Error('Input too long');
  }

  // 2. Check user credits
  const user = await getUser(userId);
  if (user.credits < 10) {
    throw new Error('Insufficient credits');
  }

  // 3. Check daily limit
  const dailyUsage = await getDailyUsage(userId);
  if (dailyUsage.cost > 100) {
    throw new Error('Daily limit exceeded');
  }

  // 4. Use appropriate model
  const model = user.plan === 'premium' ? 'gpt-4' : 'gpt-3.5-turbo';

  // 5. Generate
  const response = await generate(input, { model });

  // 6. Track cost
  const cost = calculateCost(response.usage.total_tokens, model);
  await deductCredits(userId, cost);
  await logCost(userId, cost);

  // 7. Check for anomalies
  if (cost > 10) {
    logEvent('high_cost_request', { userId, cost, input: input.substring(0, 100) });
  }

  return response.choices[0].message.content;
}
Kill switches:
// lib/feature-flags.ts
export const featureFlags = {
  expensiveFeature: process.env.ENABLE_EXPENSIVE_FEATURE === 'true',
  experimentalModel: process.env.ENABLE_EXPERIMENTAL_MODEL === 'true',
};

export async function generate(input: string) {
  if (!featureFlags.expensiveFeature) {
    throw new Error('Feature temporarily disabled');
  }
  // ... generate
}
Rollbacks and Kill Switches
When things go wrong:
- Immediate: Kill switch (disable feature)
- Short-term: Rollback deployment
- Long-term: Fix and redeploy
Kill switch implementation:
// app/api/generate/route.ts
export async function POST(req: Request) {
  // Kill switch check
  if (process.env.KILL_SWITCH_GENERATE === 'true') {
    return Response.json(
      { error: 'Service temporarily unavailable' },
      { status: 503 }
    );
  }
  // ... rest of handler
}
Rollback procedure:
- Identify the bad deployment
- Revert to previous version (Git + Vercel/Railway/etc.)
- Verify fix
- Investigate root cause
- Deploy fix
Automated rollback (advanced):
# Monitor error rate, auto-rollback if > threshold
# (Use your platform's health checks + automation)
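For the health-check side of that automation, a minimal sketch of an endpoint your platform can poll. It assumes a Prisma-style `db` client like the one in the monetization snippets; the import path is illustrative:

// app/api/health/route.ts
import { db } from '@/lib/db'; // illustrative import path

export async function GET() {
  try {
    await db.$queryRaw`SELECT 1`; // cheap DB liveness probe
    return Response.json({ status: 'ok' });
  } catch {
    return Response.json({ status: 'degraded' }, { status: 503 });
  }
}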
Monetization & Paywalls
Most AI apps never monetize. They build features, then try to add billing later. That's backwards. Design monetization into the product from day one.
Why Most AI Apps Fail to Monetize
1. No value proposition
Users don't understand why they should pay. The free tier does everything.
2. Wrong pricing model
Charging per month when usage varies wildly.
3. No enforcement
Free tier limits exist on paper, not in code.
4. Poor paywall UX
Paywalls block users instead of converting them.
Credits vs Subscriptions
Credits (usage-based):
- Good for: Variable usage, pay-as-you-go
- Example: 1000 credits = $10, each generation costs 10 credits
- Pros: Fair, scales with usage
- Cons: Harder to predict revenue
Subscriptions (recurring):
- Good for: Predictable usage, SaaS model
- Example: $29/month = unlimited generations
- Pros: Predictable revenue, better for users
- Cons: Heavy users cost you money
Hybrid (best of both):
- Base subscription + usage-based overage
- Example: $19/month = 1000 generations, $0.01 per extra
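A sketch of how the hybrid math works at the end of a billing cycle. The plan numbers and `reportOverage` helper are illustrative; with Stripe this would map onto metered billing:

// Included quota + metered overage, settled at the end of the cycle
const PLAN = { included: 1000, overagePerGeneration: 0.01 };

// Hypothetical billing helper; with Stripe this would create a
// usage record or invoice item.
async function reportOverage(userId: string, amount: number): Promise<void> {
  console.log(`[billing] overage for ${userId}: $${amount.toFixed(2)}`);
}

export async function settleMonthlyUsage(userId: string, generationsUsed: number) {
  const overage = Math.max(0, generationsUsed - PLAN.included);
  const overageCharge = overage * PLAN.overagePerGeneration;
  if (overageCharge > 0) {
    await reportOverage(userId, overageCharge);
  }
  return { overage, overageCharge };
}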
Usage-Based Pricing Logic
Track everything:
// lib/usage-tracking.ts
export async function trackUsage(
  userId: string,
  feature: string,
  cost: number
) {
  await db.usage.create({
    data: {
      userId,
      feature,
      cost,
      timestamp: new Date(),
    },
  });

  // Update user's credit balance
  await db.user.update({
    where: { id: userId },
    data: {
      credits: { decrement: cost },
    },
  });
}
Calculate costs:
export function calculateCost(tokens: number, model: string): number {
  const pricing = {
    'gpt-4': { input: 0.03 / 1000, output: 0.06 / 1000 },
    'gpt-3.5-turbo': { input: 0.0015 / 1000, output: 0.002 / 1000 },
  };
  const rates = pricing[model] || pricing['gpt-3.5-turbo'];
  // Simplified: assume a 50/50 input/output split
  const half = tokens / 2;
  return half * rates.input + half * rates.output;
}
Enforcing Limits Server-Side
Never trust the client:
// ❌ BAD: Client-side check
if (user.credits < 10) {
  alert('Not enough credits');
  return;
}
await generate();

// ✅ GOOD: Server-side check
export async function POST(req: Request) {
  const user = await getCurrentUser(req);
  if (user.credits < 10) {
    return Response.json(
      { error: 'Insufficient credits' },
      { status: 402 }
    );
  }

  // Deduct before generation (atomic)
  await db.user.update({
    where: { id: user.id },
    data: { credits: { decrement: 10 } },
  });

  try {
    const result = await generate();
    return Response.json({ result });
  } catch (error) {
    // Refund on error
    await db.user.update({
      where: { id: user.id },
      data: { credits: { increment: 10 } },
    });
    throw error;
  }
}
Free tier limits:
export async function checkLimit(userId: string, feature: string): Promise<boolean> {
  const user = await getUser(userId);
  if (user.plan === 'free') {
    const dailyUsage = await getDailyUsage(userId, feature);
    // Free tier: 10 generations per day
    if (dailyUsage.count >= 10) {
      return false;
    }
  }
  return true;
}
Designing Paywalls Without Killing Activation
Bad paywall:
// Blocks user immediately
{user.credits === 0 && <PaywallModal />}
Good paywall:
// Shows value first, then paywall
{user.credits > 0 && <GenerateButton />}
{user.credits === 0 && (
  <div>
    <p>You've used your free credits! Upgrade to continue.</p>
    <UpgradeButton />
    <p>Or share on Twitter for 10 free credits</p>
  </div>
)}
Paywall best practices:
- Show value first (let users try before paying)
- Clear pricing (no hidden fees)
- Multiple options (free, pro, enterprise)
- Social proof (testimonials, usage stats)
- Easy upgrade (one click, no friction)
- Transparent limits (show what they get)
Example paywall component:
export function Paywall({ user, feature }: { user: User; feature: string }) {
  const limits = {
    free: { generations: 10, features: ['basic'] },
    pro: { generations: 1000, features: ['basic', 'advanced', 'api'] },
  };

  return (
    <div className="paywall">
      <h2>Upgrade to continue</h2>
      <p>You've reached your free tier limit for {feature}</p>
      <div className="plans">
        <Plan
          name="Pro"
          price="$29/month"
          features={limits.pro.features}
          current={user.plan === 'pro'}
        />
      </div>
      <Button onClick={handleUpgrade}>Upgrade now</Button>
    </div>
  );
}
The Exact Workflow I Use (Step-by-Step)
This is the workflow I use to ship AI products. It's not theoretical. It's what I do, in order, every time.
Step 1: Idea → Validation
Don't build yet. Validate first.
1. Define the outcome (not the feature)
   - Bad: "Build an AI writing assistant"
   - Good: "Help users write blog posts 10x faster"
2. Find 5 people who have this problem
   - Talk to them
   - Understand their current solution
   - Validate they'd pay for a better solution
3. Build a landing page (no code yet)
   - Explain the outcome
   - Show mockups
   - Collect emails
   - If 50+ people sign up, proceed
Why this matters:
Most ideas are bad. Validation filters them out before you waste weeks building.
Step 2: UX First, Not Model First
Design the experience before choosing the model.
1. Map the user journey
   - Where do they start?
   - What do they input?
   - What do they see?
   - What happens when it fails?
2. Design the UI (Figma, sketches, whatever)
   - Input form
   - Loading states
   - Output display
   - Error states
3. Define the API contract (see the sketch below)
   - What does the request look like?
   - What does the response look like?
   - What are the error cases?
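A sketch of what that contract can look like as shared Zod schemas, consistent with the validation examples earlier (the field names are illustrative):

import { z } from 'zod';

// Request and response shapes, agreed before any UI work
export const GenerateRequest = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
});
export const GenerateResponse = z.object({
  output: z.string(),
  tokensUsed: z.number(),
});

// Error cases, agreed up front:
// 400 invalid input, 401 unauthenticated, 402 out of credits, 429 rate limited
export type GenerateRequest = z.infer<typeof GenerateRequest>;
export type GenerateResponse = z.infer<typeof GenerateResponse>;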
Why this matters:
The model is a detail. The experience is the product. Design the experience first.
Step 3: API Contracts Before UI Polish
Build the backend first. Make it work. Then make it pretty.
1. Create the API route
   - Input validation
   - Auth check
   - Rate limiting
   - AI generation
   - Response formatting
2. Test with curl/Postman
   - Valid requests
   - Invalid requests
   - Edge cases
   - Error handling
3. Only then build the UI
   - Call the API
   - Handle responses
   - Handle errors
   - Polish the design
Why this matters:
Backend defines what's possible. Frontend is presentation. Build the foundation first.
Step 4: Build Thin, Ship Early, Harden Later
Ship the minimum that delivers value. Improve based on feedback.
v1 (Week 1):
- Basic input/output
- No streaming
- No retries
- Basic error handling
- Deploy to production
v2 (Week 2):
- Add streaming
- Add retries
- Better error messages
- Improve UX
v3 (Week 3):
- Add paywall
- Add usage tracking
- Add analytics
- Optimize costs
Why this matters:
Perfect is the enemy of shipped. Ship something that works, then improve it.
Step 5: What I Deliberately Ignore in v1
Things I skip in the first version:
- Comprehensive test suite (unit tests for critical paths only)
- Perfect error handling (basic try/catch is enough)
- Advanced monitoring (basic logs are enough)
- Multiple models (pick one, stick with it)
- Complex state management (keep it simple)
- Optimizations (premature optimization is evil)
- Documentation (code should be self-documenting)
Things I never skip:
- Authentication (who is this user?)
- Rate limiting (prevent abuse)
- Input validation (prevent token bombs)
- Error boundaries (don't crash the app)
- Basic logging (need to debug)
- Environment variables (never hardcode secrets)
Why this matters:
Focus on what matters. Ignore what doesn't. Ship faster.
The Complete Workflow Timeline
Week 1: Foundation
- Day 1-2: Validation (landing page, user interviews)
- Day 3-4: UX design (journey map, UI mockups)
- Day 5-6: API development (backend, testing)
- Day 7: Basic UI (connect to API, deploy)
Week 2: Polish
- Day 8-9: Streaming, better UX
- Day 10-11: Error handling, retries
- Day 12-13: Testing, bug fixes
- Day 14: Launch (share with initial users)
Week 3: Scale
- Day 15-16: Paywall, usage tracking
- Day 17-18: Analytics, monitoring
- Day 19-20: Optimizations, cost control
- Day 21: Iterate based on feedback
This is realistic. Not "ship in 2 hours." Not "ship in 6 months." Three weeks to something real.
Common Mistakes I See After Reviewing Dozens of AI Apps
I've reviewed dozens of AI apps. The same mistakes show up over and over. Here's what to avoid.
Mistake 1: Hardcoded Keys
The mistake:
const apiKey = 'sk-...';
Why it's bad:
- Keys get committed to git
- Keys get exposed in client-side code
- Keys get shared in screenshots
- Result: $10k OpenAI bill
The fix:
const apiKey = process.env.OPENAI_API_KEY;
Always use environment variables. Always.
Mistake 2: No Backend
The mistake:
// Frontend calling OpenAI directly
const openai = new OpenAI({
  apiKey: 'sk-...', // Exposed to anyone who opens devtools!
  dangerouslyAllowBrowser: true,
});
const response = await openai.chat.completions.create({
  // ...
});
Why it's bad:
- Can't hide API keys
- Can't rate limit
- Can't track usage
- Can't enforce paywalls
- Can't prevent abuse
The fix:
Always use a backend proxy. Always.
Mistake 3: No Abuse Protection
The mistake:
// No rate limiting, no input validation
export async function POST(req: Request) {
  const { input } = await req.json();
  return await generate(input); // Anything goes!
}
Why it's bad:
- Users can send 100k token inputs
- Users can spam requests
- Users can DDoS your API
- Result: $1000s in unexpected costs
The fix:
// Rate limit + input validation
export async function POST(req: Request) {
  // Rate limit (keyed by the authenticated user)
  const user = await getCurrentUser(req);
  const { success } = await ratelimit.limit(user.id);
  if (!success) return Response.json({ error: 'Rate limited' }, { status: 429 });

  // Validate input
  const { input } = await req.json();
  if (input.length > 10000) {
    return Response.json({ error: 'Input too long' }, { status: 400 });
  }

  return await generate(input);
}
Mistake 4: Over-Engineering Infra Too Early
The mistake:
- Kubernetes for an MVP
- Microservices for 100 users
- Complex CI/CD for a weekend project
- Over-architecting before you have users
Why it's bad:
- Wastes time
- Adds complexity
- Slows iteration
- Premature optimization
The fix:
Start simple. Vercel/Railway/Render for hosting. PostgreSQL for database. Add complexity when you need it.
Mistake 5: Shipping Features Instead of Outcomes
The mistake:
- Building 10 features before launching
- Perfecting the UI before validating the idea
- Adding "nice to have" features before core works
Why it's bad:
- Wastes time on things users don't want
- Delays learning
- Delays revenue
- Builds the wrong product
The fix:
Ship the minimum that delivers value. One feature that works is better than ten features that don't.
Mistake 6: No Error Handling
The mistake:
const result = await generate(input);
return result; // What if it fails?
Why it's bad:
- App crashes on errors
- Users see cryptic error messages
- No way to debug issues
- Bad user experience
The fix:
try {
  const result = await generate(input);
  return Response.json({ result });
} catch (error) {
  console.error(error);
  return Response.json(
    { error: 'Generation failed. Please try again.' },
    { status: 500 }
  );
}
Mistake 7: Ignoring Costs
The mistake:
- Using GPT-4 for everything
- No cost tracking
- No usage limits
- No kill switches
Why it's bad:
- Unexpected bills
- Can't optimize
- Can't price correctly
- Can go bankrupt
The fix:
Track every request. Log costs. Set limits. Use cheaper models when possible.
Mistake 8: No Observability
The mistake:
- No error tracking
- No usage analytics
- No performance monitoring
- Flying blind
Why it's bad:
- Can't debug issues
- Can't understand usage
- Can't optimize
- Can't make data-driven decisions
The fix:
Add error tracking (Sentry). Add analytics (PostHog). Add logging. Know what's happening.
Final Cheat Sheet (Skimmable)
Print this. Keep it handy. Reference it before shipping.
Architecture Checklist
- Backend proxy (never frontend → OpenAI directly)
- Environment variables for all secrets
- Authentication on all protected endpoints
- Rate limiting implemented
- Input validation and sanitization
- Error handling and logging
- Cost tracking and limits
- Observability (errors, logs, metrics)
Security Checklist
- No hardcoded API keys
- Secrets in environment variables
- Authentication implemented
- Authorization checks
- Rate limiting
- Input sanitization
- Output filtering
- HTTPS only
- CORS configured
- Audit logging
UX Checklist
- Streaming responses (not spinners)
- Input constraints (length, format)
- Clear error messages
- Retry mechanisms
- Loading states (skeletons, not spinners)
- Optimistic UI where possible
- Mobile responsive
- Accessible (keyboard navigation, screen readers)
Deployment Checklist
- Environment separation (dev/staging/prod)
- CI/CD pipeline
- Error tracking (Sentry)
- Logging (structured logs)
- Monitoring (uptime, performance)
- Cost monitoring (API spend)
- Kill switches for expensive features
- Rollback procedure documented
Monetization Checklist
- Usage tracking implemented
- Credits/subscriptions system
- Paywall designed (not blocking)
- Limits enforced server-side
- Billing integration (Stripe/Paddle)
- Cost calculation accurate
- Free tier limits clear
Development Workflow
- Validate → Landing page, user interviews
- Design → UX first, model second
- Build → API contracts before UI polish
- Ship → Thin v1, improve based on feedback
- Iterate → Data-driven improvements
Tools I Use
- IDE: Cursor
- Framework: Next.js
- Database: PostgreSQL (Supabase/Railway)
- Auth: NextAuth.js / Clerk
- Hosting: Vercel / Railway
- Error Tracking: Sentry
- Analytics: PostHog / Vercel Analytics
- Rate Limiting: Upstash
- Billing: Stripe
Rules to Live By
- Backend always. Never frontend → AI directly.
- Security first. Never skip it.
- Ship thin. Perfect is the enemy of shipped.
- Track costs. Every request, every token.
- Design for failure. AI will fail. Handle it.
- Validate early. Don't build in a vacuum.
- Monitor everything. You can't fix what you can't see.
Closing
If you've read this far, you're serious about building real AI products. That's good. The world needs more builders, fewer demos.
This workflow isn't theoretical. It's what I use to ship software that works. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.
The gap between "AI demo" and "production AI app" is massive. Most people never cross it. They build demos, get excited, then hit a wall when real users show up.
You don't have to hit that wall.
Follow this system. Use these patterns. Avoid these mistakes. Ship something real.
The models are good enough. The tools are good enough. The only thing missing is the system. Now you have it.
Build something people actually use. Charge for it. Make it work.
If you're building something real and want to talk shop, find me on Twitter. I review AI apps and give honest feedback. No fluff. No corporate speak. Just real talk about what works and what doesn't.
Now go ship.
Need a build partner?
Launch your AI app development workflow with DreamLaunch
We deliver production-grade products in 28 days, with research, design, engineering, and launch support handled end-to-end. Our team pairs production AI and MVP development experience with senior founders so you can stay focused on growth.
Ready to Build Your MVP?
Turn your idea into a revenue-ready product in just 28 days.
