
The Ultimate AI App & MVP Workflow - Ship Production Software, Not Demos
45 min read · AI app development workflow


Most AI apps fail. Not because the models are bad. Not because the idea is wrong. They fail because the people building them treat AI apps like regular apps, or worse, like demos.

I've shipped 30+ production AI products. I've seen the same mistakes kill projects before they even get users. This isn't a tutorial. This is the system I use to ship software that actually works.


Table of Contents

  1. Hook / Context
  2. The Real AI App Stack (High-Level)
  3. IDE & Core Workflow
  4. Design & UX Rules for AI Products
  5. Backend & AI Orchestration
  6. Security Checklist (NON-NEGOTIABLE)
  7. DevOps & Deployment Setup
  8. Monetization & Paywalls
  9. The Exact Workflow I Use (Step-by-Step)
  10. Common Mistakes I See After Reviewing Dozens of AI Apps
  11. Final Cheat Sheet (Skimmable)
  12. Closing

Hook / Context

You've seen the demos. The Twitter threads showing "I built an AI app in 2 hours." The GitHub repos with 10k stars and zero production users. The landing pages promising the moon, backed by code that breaks when you look at it wrong.

Here's what they're not showing you: the $5,000 OpenAI bill from one weekend. The security holes that leak API keys. The users who hit rate limits on day one. The apps that work in the demo but fail when real people use them.

The gap between "AI demo" and "production AI app" is massive. Most people never cross it.

I've built AI products for startups that raised Series A. I've built internal tools for enterprises processing millions of requests. I've also seen dozens of "AI apps" that were one API call away from being a security disaster.

The difference isn't the model. It's the system.

This post is that system. It's the workflow I use to ship AI products that don't break, don't leak secrets, and don't cost $10k in unexpected API bills. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.

If you want to build something real, read this. If you want to build a demo, there are plenty of YouTube tutorials for that.


The Real AI App Stack (High-Level)

An AI app isn't a frontend calling OpenAI. That's a prototype. A production AI app has eight layers, and most people skip four of them.

1. Product & UX Layer

This is where most AI apps die. You can't prompt-engineer your way out of a bad product.

What it includes:

  • User intent understanding (what are they actually trying to do?)
  • Input constraints (don't let users paste novels)
  • Output expectations (what does "done" look like?)
  • Failure states (what happens when the model hallucinates?)

The mistake: Building the AI feature first, then figuring out the product.

The fix: Design the user outcome first. The AI is a means, not the end.

2. Frontend Layer

Your UI needs to handle latency, streaming, partial outputs, and failures gracefully.

What it includes:

  • Streaming UI (show progress, not spinners)
  • Optimistic updates (make it feel instant)
  • Skeleton states (mask loading)
  • Error boundaries (fail gracefully)
  • Input validation (constrain before sending)

The mistake: Building a form that submits and shows a spinner for 10 seconds.

The fix: Stream responses, show progress, validate inputs client-side.

3. Backend & Orchestration Layer

This is where the magic happens. Or where everything breaks.

What it includes:

  • API proxy (never expose keys to frontend)
  • Request routing (which model? which endpoint?)
  • Tool calling / function routing (when to call external APIs)
  • State machines (multi-step workflows)
  • Retry logic (with exponential backoff)
  • Fallback chains (model A fails, try model B)
  • Rate limiting (per user, per IP, per feature)
  • Cost tracking (log every token)

The mistake: Frontend → OpenAI directly. No backend. No protection.

The fix: Everything goes through your backend. Always.

4. AI Layer

The models themselves. This is the smallest part of the stack, but everyone obsesses over it.

What it includes:

  • Model selection (GPT-4 vs Claude vs open source)
  • Prompt templates (versioned, tested)
  • Context management (RAG, memory, conversation history)
  • Token optimization (trim context, compress prompts)
  • Output parsing (structured extraction, validation)

The mistake: Using GPT-4 for everything, ignoring costs, no prompt versioning.

The fix: Right model for the job. Track costs. Version prompts like code.
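
To make "version prompts like code" concrete, here's a minimal sketch of a versioned prompt registry. Names like PROMPTS and buildPrompt are illustrative, not a specific library; the point is that every template carries an explicit version you can log with each request.

// lib/prompts.ts — prompts versioned like code (illustrative)
type PromptTemplate = {
  version: string; // bump on every change; store it with usage logs
  build: (vars: Record<string, string>) => string;
};

export const PROMPTS: Record<string, PromptTemplate> = {
  'summarize-article': {
    version: '2024-01-15.2',
    build: ({ article, tone }) =>
      `Summarize the following article in a ${tone} tone.\n\n${article}`,
  },
};

export function buildPrompt(name: string, vars: Record<string, string>) {
  const template = PROMPTS[name];
  if (!template) throw new Error(`Unknown prompt: ${name}`);
  // Return the version so callers can log which prompt produced which output
  return { prompt: template.build(vars), promptVersion: template.version };
}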

5. Data & Memory Layer

AI apps need memory. Users expect continuity.

What it includes:

  • Conversation history (vector DB or SQL)
  • User preferences (what they like, what they don't)
  • Context windows (what to include, what to exclude)
  • Embeddings (for RAG, search, similarity)
  • Cache layer (don't regenerate the same thing)

The mistake: Stateless apps that forget everything.

The fix: Store conversations. Build context. Use RAG when needed.
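
One way to implement the cache layer above is to key responses by a hash of the model and prompt, so identical requests never hit the API twice. A minimal in-memory sketch, assuming generate is your existing generation call; swap the Map for Redis or your database in production.

import { createHash } from 'crypto';

declare function generate(prompt: string, opts: { model: string }): Promise<string>; // your existing call

// Identical (model, prompt) pairs are served from the cache, costing zero tokens
const cache = new Map<string, string>();

export async function generateCached(model: string, prompt: string): Promise<string> {
  const key = createHash('sha256').update(`${model}:${prompt}`).digest('hex');

  const hit = cache.get(key);
  if (hit) return hit;

  const output = await generate(prompt, { model });
  cache.set(key, output);
  return output;
}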

6. Security Layer

This isn't optional. AI apps are attack vectors waiting to happen.

What it includes:

  • API key management (never in code, never in frontend)
  • Authentication (who is this user?)
  • Authorization (what can they do?)
  • Input sanitization (prevent injection attacks)
  • Output filtering (prevent data leaks)
  • Rate limiting (prevent abuse)
  • Audit logging (who did what, when)

The mistake: Hardcoded keys, no auth, no rate limits.

The fix: Secrets in env vars. Auth on every endpoint. Rate limits everywhere.

7. Infra & DevOps Layer

How you deploy, monitor, and scale.

What it includes:

  • Environment separation (dev, staging, prod)
  • CI/CD (automated tests, deployments)
  • Observability (logs, errors, metrics)
  • Cost monitoring (track API spend)
  • Kill switches (turn off expensive features)
  • Rollback procedures (when things break)

The mistake: Deploying to production from localhost. No monitoring. No rollback plan.

The fix: Proper environments. Automated deployments. Real observability.

8. Monetization & Scaling Layer

Most AI apps never get here because they die earlier. But if you make it, this is critical.

What it includes:

  • Usage tracking (credits, tokens, requests)
  • Billing integration (Stripe, Paddle)
  • Paywall logic (free tier limits)
  • Subscription management
  • Cost allocation (what features cost what)

The mistake: Building features, then trying to monetize.

The fix: Design monetization into the product from day one.


IDE & Core Workflow

I use Cursor. Not because it's perfect, but because it's the best tool for shipping AI products fast. Here's how I use it without creating garbage code.

Why Cursor Works

Cursor understands your codebase. It can read multiple files, understand context, and make changes across your project. ChatGPT can't do that. GitHub Copilot can't do that. This is why Cursor wins for production work.

Rules for Prompting Cursor

1. Be specific about scope

Bad: "Add authentication"

Good: "Add NextAuth.js authentication to this Next.js app. Use email/password and Google OAuth. Store sessions in the existing PostgreSQL database. Add a protected route at /dashboard that requires auth."

2. Reference existing patterns

Bad: "Create a new API route"

Good: "Create a new API route following the same pattern as /api/users/route.ts. Use the same error handling and response format."

3. Specify file locations

Bad: "Add a component for user profiles"

Good: "Create a new component at components/user-profile.tsx that displays user information. Use the existing User type from lib/types.ts."

4. Include constraints

Bad: "Make it responsive"

Good: "Make it responsive using Tailwind breakpoints. Mobile-first design. Max width 1280px on desktop."

When to Let AI Generate Code

Let AI generate:

  • Boilerplate (API routes, CRUD operations)
  • Type definitions (from existing data structures)
  • Test cases (unit tests, integration tests)
  • Documentation (JSDoc comments, README sections)
  • Error handling patterns (try/catch, validation)

Don't let AI generate:

  • Business logic (you understand the domain better)
  • Security-critical code (auth, payments, secrets)
  • Performance-critical paths (AI doesn't optimize well)
  • Complex state management (AI creates overcomplicated solutions)

Folder-Level Prompting

When working on a feature that spans multiple files:

I'm building a feature for user onboarding. It needs:

1. A new API route at `/api/onboarding/route.ts` that:
   - Accepts POST requests with user data
   - Validates input using Zod
   - Creates a user record in the database
   - Sends a welcome email
   - Returns the created user

2. A new page at `app/onboarding/page.tsx` that:
   - Shows a multi-step form (3 steps)
   - Uses the existing form components from `components/forms/`
   - Calls the API route on submit
   - Handles errors and loading states

3. Update the database schema to include an `onboarding_completed` field

Follow existing patterns in the codebase. Use TypeScript. Use the existing error handling utilities.

File-Level Prompting

When editing a single file:

In this file, I need to:

1. Add a new function `validateUserInput` that takes user data and returns validation errors
2. Update the `createUser` function to use the new validator
3. Add error handling for database connection failures
4. Add JSDoc comments to all exported functions

Keep the existing code style. Don't change anything else.

Anti-Patterns That Cause Bad AI Code

1. Vague prompts

"Make it better" → AI will change random things.

2. No context

"Add a button" → AI doesn't know where, what style, what it does.

3. Too many changes at once

"Refactor the entire auth system and add OAuth and update the UI" → AI will break things.

4. Ignoring existing patterns

"Add a new API route" without showing existing routes → AI creates inconsistent code.

5. Not reviewing AI output

Accepting everything AI generates → Technical debt and bugs.

The Cursor Workflow I Use

  1. Plan the change (in my head or notes)
  2. Find similar code (grep for patterns)
  3. Prompt Cursor with context (reference existing code)
  4. Review the diff (does it make sense?)
  5. Test it (does it work?)
  6. Refine if needed (small follow-up prompts)

I never let Cursor make large architectural changes. I use it for implementation, not design.


Design & UX Rules for AI Products

AI products have different UX requirements than regular apps. Most people ignore this and build forms that submit to APIs. That's not good enough.

Why AI UX Is Different

Latency is unpredictable

A regular API call takes 100-500ms. An AI call takes 2-10 seconds. Sometimes 30 seconds. Users will think your app is broken.

Outputs are non-deterministic

The same input can produce different outputs. Users need to understand this.

Failures are common

Models hallucinate. APIs rate limit. Networks fail. Your UI must handle this gracefully.

Partial outputs are valuable

Users don't want to wait 10 seconds for nothing. Show progress. Stream responses.

Latency Masking Patterns

1. Streaming responses

Don't wait for the full response. Stream tokens as they arrive.

// Bad: Wait for everything
const response = await fetch('/api/generate');
const data = await response.json();
setOutput(data.text);

// Good: Stream it
const response = await fetch('/api/generate');
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  const chunk = decoder.decode(value);
  setOutput(prev => prev + chunk);
}

2. Optimistic UI

Show the expected result immediately, update when real data arrives.

// User submits form
setOptimisticResult(calculateExpectedResult(input));

// Then fetch the real result and replace the optimistic one
const response = await fetch('/api/process');
const realResult = await response.json();
setOptimisticResult(realResult);

3. Skeleton states

Show the structure of what's coming, not a spinner.

// Bad: Spinner
{isLoading && <Spinner />}

// Good: Skeleton
{isLoading && <ResultSkeleton />}

4. Progressive enhancement

Show what you can, when you can.

// Show metadata first
setMetadata(extractMetadata(response));

// Then show full content
setContent(await streamFullContent(response));

Input Constraints > Prompt Engineering

Most people spend hours on prompts. They should spend hours on input validation.

Why constraints matter:

  • Shorter inputs = faster responses = lower costs
  • Validated inputs = fewer errors = better outputs
  • Constrained inputs = predictable outputs = better UX

What to constrain:

  • Length (max characters, max words)
  • Format (structured data, specific fields)
  • Content (no PII, no sensitive data)
  • Language (if you only support English, say so)

Example:

// Bad: Accept anything
const prompt = userInput;

// Good: Constrain it
const schema = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
  length: z.enum(['short', 'medium', 'long'])
});

const validated = schema.parse(userInput);
const prompt = buildPrompt(validated);

Designing for Failure

Your AI will fail. Design for it.

1. Retry logic (with limits)

async function generateWithRetry(input: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await generate(input);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * 2 ** i); // Exponential backoff: 1s, 2s, 4s...
    }
  }
}

2. Fallback chains

async function generateWithFallback(input: string) {
  try {
    return await generateWithGPT4(input);
  } catch (error) {
    console.warn('GPT-4 failed, trying GPT-3.5');
    return await generateWithGPT35(input);
  }
}

3. Partial outputs

// If generation fails halfway, keep what you already streamed
try {
  for await (const chunk of streamGeneration(input)) {
    setOutput(prev => prev + chunk); // Partial output accumulates as it arrives
  }
} catch (error) {
  // The streamed portion is already on screen; just surface the error
  setError('Generation incomplete. Partial result shown.');
}

4. Clear error messages

// Bad: Generic error
setError('Something went wrong');

// Good: Specific error
if (error.code === 'RATE_LIMIT') {
  setError('Too many requests. Please wait a moment.');
} else if (error.code === 'INVALID_INPUT') {
  setError('Your input is too long. Please shorten it.');
} else {
  setError('Generation failed. Please try again.');
}

The UX Checklist

Before shipping, ask:

  • Can users see progress during long operations?
  • Are inputs validated before sending?
  • Are errors clear and actionable?
  • Is there a retry mechanism?
  • Can users cancel long-running operations? (see the cancellation sketch after this checklist)
  • Are partial outputs shown if generation fails?
  • Is the UI responsive during AI operations?
  • Are loading states informative (not just spinners)?

Backend & AI Orchestration

This is where most AI apps die. People build frontends that call OpenAI directly. That's a prototype, not a product.

Why Direct Frontend → OpenAI Is a Mistake

1. Security

You can't hide API keys in the frontend. They'll be exposed. Someone will find them. You'll get a $10k bill.

2. No control

You can't rate limit. You can't log requests. You can't track costs. You can't prevent abuse.

3. No orchestration

You can't chain multiple API calls. You can't use tool calling. You can't implement retries or fallbacks.

4. No business logic

You can't enforce usage limits. You can't check subscriptions. You can't add paywalls.

Always use a backend proxy.

Backend Proxy Patterns

Pattern 1: Simple Proxy

// app/api/generate/route.ts
export async function POST(req: Request) {
  const { input } = await req.json();
  
  // Validate input
  if (!input || input.length > 1000) {
    return Response.json({ error: 'Invalid input' }, { status: 400 });
  }
  
  // Check auth
  const user = await getCurrentUser(req);
  if (!user) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }
  
  // Check rate limits
  const rateLimited = await checkRateLimit(user.id);
  if (rateLimited) {
    return Response.json({ error: 'Rate limited' }, { status: 429 });
  }
  
  // Call OpenAI
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
  });
  
  // Log usage
  await logUsage(user.id, response.usage);
  
  return Response.json({ output: response.choices[0].message.content });
}

Pattern 2: Streaming Proxy

export async function POST(req: Request) {
  const { input } = await req.json();
  
  // ... validation, auth, rate limits ...
  
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    stream: true,
  });
  
  // Create a readable stream
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(content));
      }
      controller.close();
    },
  });
  
  return new Response(readable, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}

Pattern 3: Tool Calling / Function Routing

export async function POST(req: Request) {
  const { input } = await req.json();
  
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: input }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get weather for a location',
          parameters: {
            type: 'object',
            properties: {
              location: { type: 'string' },
            },
          },
        },
      },
    ],
  });
  
  const message = response.choices[0].message;
  
  // Check if model wants to call a function
  if (message.tool_calls) {
    for (const toolCall of message.tool_calls) {
      if (toolCall.function.name === 'get_weather') {
        const args = JSON.parse(toolCall.function.arguments);
        const weather = await fetchWeather(args.location);
        
        // Call model again with function result
        const secondResponse = await openai.chat.completions.create({
          model: 'gpt-4',
          messages: [
            { role: 'user', content: input },
            message,
            {
              role: 'tool',
              tool_call_id: toolCall.id,
              content: JSON.stringify(weather),
            },
          ],
        });
        
        return Response.json({
          output: secondResponse.choices[0].message.content,
        });
      }
    }
  }
  
  return Response.json({ output: message.content });
}

State Machines for Multi-Step Workflows

Complex AI workflows need state machines. Don't try to manage this with if/else.

type WorkflowState =
  | { type: 'idle' }
  | { type: 'validating'; input: string }
  | { type: 'generating'; validatedInput: string }
  | { type: 'post-processing'; output: string }
  | { type: 'complete'; finalOutput: string }
  | { type: 'error'; error: string };

async function runWorkflow(input: string): Promise<string> {
  let state: WorkflowState = { type: 'idle' };
  
  try {
    // Validate
    state = { type: 'validating', input };
    const validated = await validateInput(input);
    
    // Generate
    state = { type: 'generating', validatedInput: validated };
    const generated = await generate(validated);
    
    // Post-process
    state = { type: 'post-processing', output: generated };
    const processed = await postProcess(generated);
    
    // Complete
    state = { type: 'complete', finalOutput: processed };
    return processed;
  } catch (error) {
    state = { type: 'error', error: error.message };
    throw error;
  }
}

Managing Retries, Fallbacks, and Hallucination Control

Retry logic:

async function generateWithRetry(
  input: string,
  options: { maxRetries?: number; backoffMs?: number } = {}
): Promise<string> {
  const { maxRetries = 3, backoffMs = 1000 } = options;
  
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await generate(input);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      
      const delay = backoffMs * Math.pow(2, attempt);
      await sleep(delay);
    }
  }
  
  throw new Error('Max retries exceeded');
}

Fallback chains:

async function generateWithFallback(input: string): Promise<string> {
  const models = ['gpt-4', 'gpt-3.5-turbo', 'claude-3-opus'];
  
  for (const model of models) {
    try {
      return await generate(input, { model });
    } catch (error) {
      console.warn(`${model} failed, trying next`);
      continue;
    }
  }
  
  throw new Error('All models failed');
}

Hallucination control:

async function generateWithValidation(input: string): Promise<string> {
  const output = await generate(input);
  
  // Check for hallucinations
  const validation = await validateOutput(output, input);
  
  if (!validation.isValid) {
    // Regenerate with stricter prompt
    return await generate(input, {
      systemPrompt: 'Be extremely factual. If unsure, say so.',
    });
  }
  
  return output;
}

Logging and Traceability as First-Class Concerns

Every AI request should be logged. You need to debug issues, track costs, and understand usage.

async function generateWithLogging(input: string, userId: string) {
  const requestId = crypto.randomUUID();
  const startTime = Date.now();
  
  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: input }],
    });
    
    const duration = Date.now() - startTime;
    const tokens = response.usage?.total_tokens || 0;
    const cost = calculateCost(tokens, 'gpt-4');
    
    // Log success
    await logRequest({
      requestId,
      userId,
      input,
      output: response.choices[0].message.content,
      tokens,
      cost,
      duration,
      status: 'success',
    });
    
    return response.choices[0].message.content;
  } catch (error) {
    const duration = Date.now() - startTime;
    
    // Log failure
    await logRequest({
      requestId,
      userId,
      input,
      error: error.message,
      duration,
      status: 'error',
    });
    
    throw error;
  }
}

What to log:

  • Request ID (for tracing)
  • User ID (for attribution)
  • Input (for debugging)
  • Output (for quality analysis)
  • Tokens used (for cost tracking)
  • Duration (for performance)
  • Model used (for cost allocation)
  • Status (success/error)
  • Error messages (if failed)

Security Checklist (NON-NEGOTIABLE)

I've seen too many AI apps with hardcoded API keys, no authentication, and zero rate limiting. This section is non-negotiable. If you skip it, you're building a liability, not a product.

API Key Handling

Never do this:

// ❌ NEVER
const OPENAI_API_KEY = 'sk-...';

Always do this:

// ✅ ALWAYS
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;

if (!OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is not set');
}

Environment variables:

  • Use .env.local for local development (gitignored)
  • Use your hosting platform's secrets manager for production
  • Never commit secrets to git
  • Rotate keys regularly
  • Use different keys for dev/staging/prod

Secrets Management

For local development:

# .env.local (gitignored)
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://...
NEXTAUTH_SECRET=...

For production (Vercel example):

# Set in Vercel dashboard
vercel env add OPENAI_API_KEY

For other platforms:

  • AWS: AWS Secrets Manager
  • GCP: Secret Manager
  • Azure: Key Vault
  • Railway/Render: Environment variables in dashboard

Never:

  • Hardcode secrets
  • Commit .env files
  • Share secrets in Slack/Discord
  • Log secrets (even in error messages)

Authentication vs Authorization

Authentication: Who is this user?

// Check if user is logged in
const user = await getCurrentUser(req);
if (!user) {
  return Response.json({ error: 'Unauthorized' }, { status: 401 });
}

Authorization: What can this user do?

// Check if user has permission
if (user.role !== 'admin') {
  return Response.json({ error: 'Forbidden' }, { status: 403 });
}

// Check if user has credits
if (user.credits < requiredCredits) {
  return Response.json({ error: 'Insufficient credits' }, { status: 402 });
}

Common patterns:

  • JWT for stateless auth (NextAuth.js, Clerk, Auth0)
  • Session-based auth for stateful apps
  • API keys for server-to-server (different from user auth)

JWT Usage (Where It Fits, Where It Doesn't)

Use JWT when:

  • Stateless authentication (no server-side sessions)
  • Microservices (token can be verified without DB lookup)
  • Mobile apps (token stored on device)

Don't use JWT when:

  • You need to revoke tokens immediately (JWT is valid until expiry)
  • You need server-side session management
  • Token size matters (JWTs can be large)

Example (NextAuth.js):

// app/api/auth/[...nextauth]/route.ts
import NextAuth from 'next-auth';

export const authOptions = {
  providers: [
    // ... providers
  ],
  callbacks: {
    async jwt({ token, user }) {
      if (user) {
        token.id = user.id;
        token.role = user.role;
      }
      return token;
    },
    async session({ session, token }) {
      session.user.id = token.id;
      session.user.role = token.role;
      return session;
    },
  },
};

const handler = NextAuth(authOptions);
export { handler as GET, handler as POST };

Rate Limiting & Abuse Prevention

Why it matters:

  • Prevents API key abuse
  • Prevents cost explosions
  • Prevents DDoS attacks
  • Ensures fair usage

Implementation:

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'), // 10 requests per 10 seconds
});

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') || 'unknown';
  const { success } = await ratelimit.limit(ip);
  
  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      { status: 429 }
    );
  }
  
  // ... rest of handler
}

Rate limit strategies:

  • Per IP (prevent abuse from single source)
  • Per user (prevent abuse from single account)
  • Per feature (different limits for different features)
  • Tiered (free users: 10/min, paid: 100/min)
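
For the tiered strategy, one approach with Upstash is to keep one limiter per plan and pick it from the user record. A sketch, with placeholder numbers:

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// One limiter per plan; adjust the windows to your pricing
const limiters = {
  free: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(10, '1 m') }),
  pro: new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(100, '1 m') }),
};

export async function checkTieredLimit(user: { id: string; plan: 'free' | 'pro' }) {
  const limiter = limiters[user.plan] ?? limiters.free;
  // Key by user ID so each account gets its own window
  return limiter.limit(user.id);
}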

Input Sanitization

Never trust user input:

// ❌ BAD
const prompt = userInput;
await openai.chat.completions.create({
  messages: [{ role: 'user', content: prompt }],
});

// ✅ GOOD
const sanitized = sanitizeInput(userInput);
const validated = validateInput(sanitized);
await openai.chat.completions.create({
  messages: [{ role: 'user', content: validated }],
});

What to sanitize:

  • Remove PII (emails, phone numbers, SSNs)
  • Remove sensitive data (passwords, API keys)
  • Limit length (prevent token bombs)
  • Validate format (structured inputs)
  • Escape special characters (prevent injection)
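
The sanitizeInput helper referenced above isn't a library function, so here's a minimal sketch covering length caps and a couple of obvious PII patterns. Real PII detection needs more than regexes; treat this as a starting point, not a complete solution.

export function sanitizeInput(raw: string, maxLength = 10000): string {
  let input = raw.slice(0, maxLength); // Hard length cap prevents token bombs

  // Strip obvious PII patterns (illustrative, not exhaustive)
  input = input.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email removed]');
  input = input.replace(/\b\d{3}[- ]?\d{2}[- ]?\d{4}\b/g, '[ssn removed]');

  // Drop control characters that can confuse downstream parsing
  input = input.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, '');

  return input.trim();
}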

Output Filtering

Filter outputs before sending to users:

const output = await generate(input);

// Filter sensitive data
const filtered = filterOutput(output, {
  removePII: true,
  removeSecrets: true,
  maxLength: 10000,
});

return Response.json({ output: filtered });

Why "Vibe Coding" Without Security Is Dangerous

I've seen apps that:

  • Exposed API keys in client-side code → $5k OpenAI bill
  • Had no rate limits → DDoS'd themselves
  • Accepted unlimited input → Token bombs that cost $100s per request
  • Had no auth → Anyone could use the API
  • Logged sensitive data → Privacy violations

The cost of skipping security:

  • Financial (unexpected API bills)
  • Legal (data breaches, privacy violations)
  • Reputational (users lose trust)
  • Operational (downtime, abuse)

The fix:

Security isn't optional. Build it in from day one. It's easier to add security early than to retrofit it later.

Security Checklist

Before shipping, verify:

  • No hardcoded API keys or secrets
  • All secrets in environment variables
  • Authentication on all protected endpoints
  • Authorization checks for user permissions
  • Rate limiting implemented
  • Input validation and sanitization
  • Output filtering for sensitive data
  • Error messages don't leak secrets
  • HTTPS only (no HTTP in production)
  • CORS configured correctly
  • SQL injection prevention (parameterized queries)
  • XSS prevention (sanitize user input)
  • Audit logging for sensitive operations

DevOps & Deployment Setup

Most AI apps are deployed like demos: push to main, hope it works. That's not how you ship production software.

Environment Separation

Three environments minimum:

  1. Development (local)

    • Your machine
    • .env.local for secrets
    • Can break freely
  2. Staging (pre-production)

    • Mirrors production
    • Test deployments here first
    • Real API keys (but test accounts)
  3. Production (live)

    • Real users
    • Real money
    • Zero tolerance for breaks

Why this matters:

  • Test changes before production
  • Catch bugs before users see them
  • Safe rollbacks
  • Different API keys (so staging doesn't affect production costs)

Implementation:

// lib/config.ts
// NODE_ENV only distinguishes development/production/test, so use a separate
// APP_ENV variable if you need a real staging environment
const env = process.env.APP_ENV ?? process.env.NODE_ENV;

export const config = {
  env,
  isDev: env === 'development',
  isStaging: env === 'staging',
  isProd: env === 'production',
  
  openai: {
    apiKey: process.env.OPENAI_API_KEY!,
    model: env === 'production' ? 'gpt-4' : 'gpt-3.5-turbo', // Cheaper in dev
  },
  
  database: {
    url: process.env.DATABASE_URL!,
  },
  
  rateLimit: {
    requests: env === 'production' ? 100 : 1000, // Stricter in prod
    window: '1m',
  },
};

CI/CD Expectations for MVPs vs Scale

For MVPs (shipping fast):

  • Automated tests (unit tests for critical paths)
  • Automated deployments (push to main = deploy)
  • Basic monitoring (errors, logs)

For scale (shipping safely):

  • Comprehensive test suite (unit, integration, E2E)
  • Staged deployments (staging → production)
  • Code review requirements
  • Automated security scans
  • Performance testing
  • Canary deployments
  • Rollback automation

MVP CI/CD example (GitHub Actions):

# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run tests
        run: npm test
      
      - name: Deploy to Vercel
        uses: amondnet/vercel-action@v20
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}

Observability Basics

What to monitor:

  1. Errors

    • Unhandled exceptions
    • API failures
    • Database errors
  2. Performance

    • Response times
    • API latency
    • Database query times
  3. Usage

    • Request volume
    • User activity
    • Feature usage
  4. Costs

    • API token usage
    • Cost per request
    • Daily/weekly/monthly spend

Implementation:

// lib/monitoring.ts
import * as Sentry from '@sentry/nextjs';

export function logError(error: Error, context?: Record<string, any>) {
  console.error(error);
  Sentry.captureException(error, { extra: context });
}

export function logEvent(name: string, data?: Record<string, any>) {
  console.log(`[EVENT] ${name}`, data);
  Sentry.captureMessage(name, { level: 'info', extra: data });
}

export function trackCost(feature: string, tokens: number, cost: number) {
  logEvent('cost_tracked', {
    feature,
    tokens,
    cost,
    timestamp: new Date().toISOString(),
  });
}

Tools:

  • Errors: Sentry, Rollbar, Bugsnag
  • Logs: Vercel Logs, Datadog, Logtail
  • Metrics: Vercel Analytics, PostHog, Mixpanel
  • APM: New Relic, Datadog APM

Cost Explosions and How to Prevent Them

Common causes:

  1. No rate limiting → Users spam requests
  2. No input validation → Token bombs (100k token inputs)
  3. Wrong model → Using GPT-4 for everything
  4. No caching → Regenerating same content
  5. No kill switches → Can't turn off expensive features

Prevention:

// lib/cost-control.ts
export async function generateWithCostControl(
  input: string,
  userId: string
): Promise<string> {
  // 1. Validate input length
  if (input.length > 10000) {
    throw new Error('Input too long');
  }
  
  // 2. Check user credits
  const user = await getUser(userId);
  if (user.credits < 10) {
    throw new Error('Insufficient credits');
  }
  
  // 3. Check daily limit
  const dailyUsage = await getDailyUsage(userId);
  if (dailyUsage.cost > 100) {
    throw new Error('Daily limit exceeded');
  }
  
  // 4. Use appropriate model
  const model = user.plan === 'premium' ? 'gpt-4' : 'gpt-3.5-turbo';
  
  // 5. Generate
  const response = await generate(input, { model });
  
  // 6. Track cost
  const cost = calculateCost(response.usage.total_tokens, model);
  await deductCredits(userId, cost);
  await logCost(userId, cost);
  
  // 7. Check for anomalies
  if (cost > 10) {
    logEvent('high_cost_request', { userId, cost, input: input.substring(0, 100) });
  }
  
  return response.choices[0].message.content;
}

Kill switches:

// lib/feature-flags.ts
export const featureFlags = {
  expensiveFeature: process.env.ENABLE_EXPENSIVE_FEATURE === 'true',
  experimentalModel: process.env.ENABLE_EXPERIMENTAL_MODEL === 'true',
};

export async function generate(input: string) {
  if (!featureFlags.expensiveFeature) {
    throw new Error('Feature temporarily disabled');
  }
  
  // ... generate
}

Rollbacks and Kill Switches

When things go wrong:

  1. Immediate: Kill switch (disable feature)
  2. Short-term: Rollback deployment
  3. Long-term: Fix and redeploy

Kill switch implementation:

// app/api/generate/route.ts
export async function POST(req: Request) {
  // Kill switch check
  if (process.env.KILL_SWITCH_GENERATE === 'true') {
    return Response.json(
      { error: 'Service temporarily unavailable' },
      { status: 503 }
    );
  }
  
  // ... rest of handler
}

Rollback procedure:

  1. Identify the bad deployment
  2. Revert to previous version (Git + Vercel/Railway/etc.)
  3. Verify fix
  4. Investigate root cause
  5. Deploy fix

Automated rollback (advanced):

# Monitor error rate, auto-rollback if > threshold
# (Use your platform's health checks + automation)

Monetization & Paywalls

Most AI apps never monetize. They build features, then try to add billing later. That's backwards. Design monetization into the product from day one.

Why Most AI Apps Fail to Monetize

1. No value proposition

Users don't understand why they should pay. The free tier does everything.

2. Wrong pricing model

Charging per month when usage varies wildly.

3. No enforcement

Free tier limits exist on paper, not in code.

4. Poor paywall UX

Paywalls block users instead of converting them.

Credits vs Subscriptions

Credits (usage-based):

  • Good for: Variable usage, pay-as-you-go
  • Example: 1000 credits = $10, each generation costs 10 credits
  • Pros: Fair, scales with usage
  • Cons: Harder to predict revenue

Subscriptions (recurring):

  • Good for: Predictable usage, SaaS model
  • Example: $29/month = unlimited generations
  • Pros: Predictable revenue, better for users
  • Cons: Heavy users cost you money

Hybrid (best of both):

  • Base subscription + usage-based overage
  • Example: $19/month = 1000 generations, $0.01 per extra
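
As a worked example of the hybrid model, the monthly bill is the base fee plus overage on whatever exceeds the included quota. The numbers below match the example above.

// Hybrid pricing: $19/month includes 1,000 generations, $0.01 per extra
function monthlyBill(generationsUsed: number): number {
  const base = 19;
  const included = 1000;
  const overageRate = 0.01;

  const overage = Math.max(0, generationsUsed - included) * overageRate;
  return base + overage; // e.g. 1,500 generations → $19 + $5 = $24
}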

Usage-Based Pricing Logic

Track everything:

// lib/usage-tracking.ts
export async function trackUsage(
  userId: string,
  feature: string,
  cost: number
) {
  await db.usage.create({
    data: {
      userId,
      feature,
      cost,
      timestamp: new Date(),
    },
  });
  
  // Update user's credit balance
  await db.user.update({
    where: { id: userId },
    data: {
      credits: { decrement: cost },
    },
  });
}

Calculate costs:

export function calculateCost(tokens: number, model: string): number {
  const pricing = {
    'gpt-4': { input: 0.03 / 1000, output: 0.06 / 1000 },
    'gpt-3.5-turbo': { input: 0.0015 / 1000, output: 0.002 / 1000 },
  };
  
  const rates = pricing[model] || pricing['gpt-3.5-turbo'];
  // Simplified: assume a 50/50 input/output split
  return (tokens / 2) * rates.input + (tokens / 2) * rates.output;
}

Enforcing Limits Server-Side

Never trust the client:

// ❌ BAD: Client-side check
if (user.credits < 10) {
  alert('Not enough credits');
  return;
}
await generate();

// ✅ GOOD: Server-side check
export async function POST(req: Request) {
  const user = await getCurrentUser(req);
  
  if (user.credits < 10) {
    return Response.json(
      { error: 'Insufficient credits' },
      { status: 402 }
    );
  }
  
  // Deduct before generation (atomic)
  await db.user.update({
    where: { id: user.id },
    data: { credits: { decrement: 10 } },
  });
  
  try {
    const result = await generate();
    return Response.json({ result });
  } catch (error) {
    // Refund on error
    await db.user.update({
      where: { id: user.id },
      data: { credits: { increment: 10 } },
    });
    throw error;
  }
}

Free tier limits:

export async function checkLimit(userId: string, feature: string): Promise<boolean> {
  const user = await getUser(userId);
  
  if (user.plan === 'free') {
    const dailyUsage = await getDailyUsage(userId, feature);
    
    // Free tier: 10 generations per day
    if (dailyUsage.count >= 10) {
      return false;
    }
  }
  
  return true;
}

Designing Paywalls Without Killing Activation

Bad paywall:

// Blocks user immediately
{user.credits === 0 && <PaywallModal />}

Good paywall:

// Shows value first, then paywall
{user.credits > 0 && <GenerateButton />}
{user.credits === 0 && (
  <div>
    <p>You've used your free credits! Upgrade to continue.</p>
    <UpgradeButton />
    <p>Or share on Twitter for 10 free credits</p>
  </div>
)}

Paywall best practices:

  1. Show value first (let users try before paying)
  2. Clear pricing (no hidden fees)
  3. Multiple options (free, pro, enterprise)
  4. Social proof (testimonials, usage stats)
  5. Easy upgrade (one click, no friction)
  6. Transparent limits (show what they get)

Example paywall component:

export function Paywall({ user, feature }: { user: User; feature: string }) {
  const limits = {
    free: { generations: 10, features: ['basic'] },
    pro: { generations: 1000, features: ['basic', 'advanced', 'api'] },
  };
  
  return (
    <div className="paywall">
      <h2>Upgrade to continue</h2>
      <p>You've reached your free tier limit for {feature}</p>
      
      <div className="plans">
        <Plan
          name="Pro"
          price="$29/month"
          features={limits.pro.features}
          current={user.plan === 'pro'}
        />
      </div>
      
      <Button onClick={handleUpgrade}>Upgrade now</Button>
    </div>
  );
}

The Exact Workflow I Use (Step-by-Step)

This is the workflow I use to ship AI products. It's not theoretical. It's what I do, in order, every time.

Step 1: Idea → Validation

Don't build yet. Validate first.

  1. Define the outcome (not the feature)

    • Bad: "Build an AI writing assistant"
    • Good: "Help users write blog posts 10x faster"
  2. Find 5 people who have this problem

    • Talk to them
    • Understand their current solution
    • Validate they'd pay for a better solution
  3. Build a landing page (no code yet)

    • Explain the outcome
    • Show mockups
    • Collect emails
    • If 50+ people sign up, proceed

Why this matters:

Most ideas are bad. Validation filters them out before you waste weeks building.

Step 2: UX First, Not Model First

Design the experience before choosing the model.

  1. Map the user journey

    • Where do they start?
    • What do they input?
    • What do they see?
    • What happens when it fails?
  2. Design the UI (Figma, sketches, whatever)

    • Input form
    • Loading states
    • Output display
    • Error states
  3. Define the API contract

    • What does the request look like?
    • What does the response look like?
    • What are the error cases?

Why this matters:

The model is a detail. The experience is the product. Design the experience first.
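
To make the contract from step 2 concrete before any UI exists, one option is to define the request and response with Zod and derive the TypeScript types from the schemas, so the route and the client share one source of truth. The shapes below are illustrative.

// lib/contracts/generate.ts — illustrative request/response contract
import { z } from 'zod';

export const GenerateRequest = z.object({
  topic: z.string().min(10).max(200),
  tone: z.enum(['professional', 'casual', 'friendly']),
});

export const GenerateResponse = z.object({
  output: z.string(),
  tokensUsed: z.number().int().nonnegative(),
});

export type GenerateRequest = z.infer<typeof GenerateRequest>;
export type GenerateResponse = z.infer<typeof GenerateResponse>;

// Error cases are part of the contract too
export type GenerateError =
  | { error: 'INVALID_INPUT' }
  | { error: 'RATE_LIMITED' }
  | { error: 'GENERATION_FAILED' };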

Step 3: API Contracts Before UI Polish

Build the backend first. Make it work. Then make it pretty.

  1. Create the API route

    • Input validation
    • Auth check
    • Rate limiting
    • AI generation
    • Response formatting
  2. Test with curl/Postman

    • Valid requests
    • Invalid requests
    • Edge cases
    • Error handling
  3. Only then build the UI

    • Call the API
    • Handle responses
    • Handle errors
    • Polish the design

Why this matters:

Backend defines what's possible. Frontend is presentation. Build the foundation first.

Step 4: Build Thin, Ship Early, Harden Later

Ship the minimum that delivers value. Improve based on feedback.

v1 (Week 1):

  • Basic input/output
  • No streaming
  • No retries
  • Basic error handling
  • Deploy to production

v2 (Week 2):

  • Add streaming
  • Add retries
  • Better error messages
  • Improve UX

v3 (Week 3):

  • Add paywall
  • Add usage tracking
  • Add analytics
  • Optimize costs

Why this matters:

Perfect is the enemy of shipped. Ship something that works, then improve it.

Step 5: What I Deliberately Ignore in v1

Things I skip in the first version:

  • Comprehensive test suite (unit tests for critical paths only)
  • Perfect error handling (basic try/catch is enough)
  • Advanced monitoring (basic logs are enough)
  • Multiple models (pick one, stick with it)
  • Complex state management (keep it simple)
  • Optimizations (premature optimization is evil)
  • Documentation (code should be self-documenting)

Things I never skip:

  • Authentication (who is this user?)
  • Rate limiting (prevent abuse)
  • Input validation (prevent token bombs)
  • Error boundaries (don't crash the app)
  • Basic logging (need to debug)
  • Environment variables (never hardcode secrets)

Why this matters:

Focus on what matters. Ignore what doesn't. Ship faster.

The Complete Workflow Timeline

Week 1: Foundation

  • Day 1-2: Validation (landing page, user interviews)
  • Day 3-4: UX design (journey map, UI mockups)
  • Day 5-6: API development (backend, testing)
  • Day 7: Basic UI (connect to API, deploy)

Week 2: Polish

  • Day 8-9: Streaming, better UX
  • Day 10-11: Error handling, retries
  • Day 12-13: Testing, bug fixes
  • Day 14: Launch (share with initial users)

Week 3: Scale

  • Day 15-16: Paywall, usage tracking
  • Day 17-18: Analytics, monitoring
  • Day 19-20: Optimizations, cost control
  • Day 21: Iterate based on feedback

This is realistic. Not "ship in 2 hours." Not "ship in 6 months." Three weeks to something real.


Common Mistakes I See After Reviewing Dozens of AI Apps

I've reviewed dozens of AI apps. The same mistakes show up over and over. Here's what to avoid.

Mistake 1: Hardcoded Keys

The mistake:

const apiKey = 'sk-...';

Why it's bad:

  • Keys get committed to git
  • Keys get exposed in client-side code
  • Keys get shared in screenshots
  • Result: $10k OpenAI bill

The fix:

const apiKey = process.env.OPENAI_API_KEY;

Always use environment variables. Always.

Mistake 2: No Backend

The mistake:

// Frontend calling OpenAI directly
const openai = new OpenAI({
  apiKey: 'sk-...', // Exposed in the browser bundle!
  dangerouslyAllowBrowser: true,
});
const response = await openai.chat.completions.create({
  // ...
});

Why it's bad:

  • Can't hide API keys
  • Can't rate limit
  • Can't track usage
  • Can't enforce paywalls
  • Can't prevent abuse

The fix:

Always use a backend proxy. Always.

Mistake 3: No Abuse Protection

The mistake:

// No rate limiting, no input validation
export async function POST(req: Request) {
  const { input } = await req.json();
  return await generate(input); // Anything goes!
}

Why it's bad:

  • Users can send 100k token inputs
  • Users can spam requests
  • Users can DDoS your API
  • Result: $1000s in unexpected costs

The fix:

// Rate limit + input validation
export async function POST(req: Request) {
  // Identify the caller, then rate limit per user
  const user = await getCurrentUser(req);
  const { success } = await ratelimit.limit(user.id);
  if (!success) return Response.json({ error: 'Rate limited' }, { status: 429 });
  
  // Validate input
  const { input } = await req.json();
  if (!input || input.length > 10000) {
    return Response.json({ error: 'Input too long' }, { status: 400 });
  }
  
  return Response.json({ output: await generate(input) });
}

Mistake 4: Over-Engineering Infra Too Early

The mistake:

  • Kubernetes for an MVP
  • Microservices for 100 users
  • Complex CI/CD for a weekend project
  • Over-architecting before you have users

Why it's bad:

  • Wastes time
  • Adds complexity
  • Slows iteration
  • Premature optimization

The fix:

Start simple. Vercel/Railway/Render for hosting. PostgreSQL for database. Add complexity when you need it.

Mistake 5: Shipping Features Instead of Outcomes

The mistake:

  • Building 10 features before launching
  • Perfecting the UI before validating the idea
  • Adding "nice to have" features before core works

Why it's bad:

  • Wastes time on things users don't want
  • Delays learning
  • Delays revenue
  • Builds the wrong product

The fix:

Ship the minimum that delivers value. One feature that works is better than ten features that don't.

Mistake 6: No Error Handling

The mistake:

const result = await generate(input);
return result; // What if it fails?

Why it's bad:

  • App crashes on errors
  • Users see cryptic error messages
  • No way to debug issues
  • Bad user experience

The fix:

try {
  const result = await generate(input);
  return Response.json({ result });
} catch (error) {
  console.error(error);
  return Response.json(
    { error: 'Generation failed. Please try again.' },
    { status: 500 }
  );
}

Mistake 7: Ignoring Costs

The mistake:

  • Using GPT-4 for everything
  • No cost tracking
  • No usage limits
  • No kill switches

Why it's bad:

  • Unexpected bills
  • Can't optimize
  • Can't price correctly
  • Can go bankrupt

The fix:

Track every request. Log costs. Set limits. Use cheaper models when possible.

Mistake 8: No Observability

The mistake:

  • No error tracking
  • No usage analytics
  • No performance monitoring
  • Flying blind

Why it's bad:

  • Can't debug issues
  • Can't understand usage
  • Can't optimize
  • Can't make data-driven decisions

The fix:

Add error tracking (Sentry). Add analytics (PostHog). Add logging. Know what's happening.


Final Cheat Sheet (Skimmable)

Print this. Keep it handy. Reference it before shipping.

Architecture Checklist

  • Backend proxy (never frontend → OpenAI directly)
  • Environment variables for all secrets
  • Authentication on all protected endpoints
  • Rate limiting implemented
  • Input validation and sanitization
  • Error handling and logging
  • Cost tracking and limits
  • Observability (errors, logs, metrics)

Security Checklist

  • No hardcoded API keys
  • Secrets in environment variables
  • Authentication implemented
  • Authorization checks
  • Rate limiting
  • Input sanitization
  • Output filtering
  • HTTPS only
  • CORS configured
  • Audit logging

UX Checklist

  • Streaming responses (not spinners)
  • Input constraints (length, format)
  • Clear error messages
  • Retry mechanisms
  • Loading states (skeletons, not spinners)
  • Optimistic UI where possible
  • Mobile responsive
  • Accessible (keyboard navigation, screen readers)

Deployment Checklist

  • Environment separation (dev/staging/prod)
  • CI/CD pipeline
  • Error tracking (Sentry)
  • Logging (structured logs)
  • Monitoring (uptime, performance)
  • Cost monitoring (API spend)
  • Kill switches for expensive features
  • Rollback procedure documented

Monetization Checklist

  • Usage tracking implemented
  • Credits/subscriptions system
  • Paywall designed (not blocking)
  • Limits enforced server-side
  • Billing integration (Stripe/Paddle)
  • Cost calculation accurate
  • Free tier limits clear

Development Workflow

  1. Validate → Landing page, user interviews
  2. Design → UX first, model second
  3. Build → API contracts before UI polish
  4. Ship → Thin v1, improve based on feedback
  5. Iterate → Data-driven improvements

Tools I Use

  • IDE: Cursor
  • Framework: Next.js
  • Database: PostgreSQL (Supabase/Railway)
  • Auth: NextAuth.js / Clerk
  • Hosting: Vercel / Railway
  • Error Tracking: Sentry
  • Analytics: PostHog / Vercel Analytics
  • Rate Limiting: Upstash
  • Billing: Stripe

Rules to Live By

  1. Backend always. Never frontend → AI directly.
  2. Security first. Never skip it.
  3. Ship thin. Perfect is the enemy of shipped.
  4. Track costs. Every request, every token.
  5. Design for failure. AI will fail. Handle it.
  6. Validate early. Don't build in a vacuum.
  7. Monitor everything. You can't fix what you can't see.

Closing

If you've read this far, you're serious about building real AI products. That's good. The world needs more builders, fewer demos.

This workflow isn't theoretical. It's what I use to ship software that works. It's opinionated. It's specific. It assumes you can code but haven't shipped production AI software before.

The gap between "AI demo" and "production AI app" is massive. Most people never cross it. They build demos, get excited, then hit a wall when real users show up.

You don't have to hit that wall.

Follow this system. Use these patterns. Avoid these mistakes. Ship something real.

The models are good enough. The tools are good enough. The only thing missing is the system. Now you have it.

Build something people actually use. Charge for it. Make it work.

If you're building something real and want to talk shop, find me on Twitter. I review AI apps and give honest feedback. No fluff. No corporate speak. Just real talk about what works and what doesn't.

Now go ship.

Need a build partner?

Launch your AI app or MVP with DreamLaunch

We deliver production-grade products in 28 days with research, design, engineering, and launch support handled end-to-end. Our team pairs production AI software and MVP development experience with senior founders so you can stay focused on growth.

Ready to Build Your MVP?

Turn your idea into a revenue-ready product in just 28 days.

Start your new project with DreamLaunch today!

Or send us a mail at → harshil@dreamlaunch.studio

© DreamLaunch LLC