The demo works beautifully. You show it to your team. Everyone is impressed. Then you put it in front of real users and it falls apart.
Real users ask questions you didn't anticipate. They provide context in unexpected ways. They ask follow-up questions in the middle of different topics. They paste in long text. They send one-word messages. They try to make the chatbot say inappropriate things. They get frustrated when it doesn't understand them.
Building a chatbot that survives this is a different project from building a chatbot that works in your demo. This guide covers the version that works for real users.
What we're building: a production-ready customer support chatbot for a SaaS product. Features:
- Streaming responses over server-sent events (SSE)
- Conversation history persisted in a database
- Rate limiting and input validation
- Escalation to a human support queue when the AI hits its limits
Tech stack: Next.js (API route plus chat widget), Convex (conversations, messages, rate limits), and the Claude API via the Anthropic TypeScript SDK.
Most chatbot tutorials skip architecture and jump straight to code. That's where they fail. Here's the architecture that scales.
User Browser
     |
     |  (streaming SSE)
     v
Next.js API Route
     |        |
     |        v
     |   Rate Limiter
     |        |
     v        v
Claude API   Abuse Detection
     |
     v
Convex Database
     |
     |---> Conversation History
     |---> Message Analytics
     '---> User Sessions
Key principle: separate the stateless AI call from the stateful conversation management. The AI model doesn't hold state. Your database does. The API layer connects them.
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
export default defineSchema({
conversations: defineTable({
sessionId: v.string(),
userId: v.optional(v.string()),
createdAt: v.number(),
updatedAt: v.number(),
status: v.union(v.literal("active"), v.literal("resolved"), v.literal("escalated")),
metadata: v.optional(v.object({
page: v.optional(v.string()),
userAgent: v.optional(v.string()),
referrer: v.optional(v.string()),
})),
}).index("by_session", ["sessionId"])
.index("by_user", ["userId"]),
messages: defineTable({
conversationId: v.id("conversations"),
role: v.union(v.literal("user"), v.literal("assistant"), v.literal("system")),
content: v.string(),
timestamp: v.number(),
metadata: v.optional(v.object({
tokensUsed: v.optional(v.number()),
model: v.optional(v.string()),
latencyMs: v.optional(v.number()),
})),
}).index("by_conversation", ["conversationId"]),
rateLimits: defineTable({
identifier: v.string(), // IP or userId
messageCount: v.number(),
windowStart: v.number(),
}).index("by_identifier", ["identifier"]),
});
// app/api/chat/route.ts
import { NextRequest } from "next/server";
import Anthropic from "@anthropic-ai/sdk";
import { ConvexHttpClient } from "convex/browser";
import { api } from "@/convex/_generated/api";
const anthropic = new Anthropic();
const convex = new ConvexHttpClient(process.env.NEXT_PUBLIC_CONVEX_URL!);
const SYSTEM_PROMPT = `You are a helpful customer support assistant for Acme SaaS.
Your role:
- Answer questions about product features, pricing, and troubleshooting
- Be concise and helpful
- If you don't know something, say so and offer to connect them with the team
- Never make up information about features or pricing
Escalation triggers (respond with [ESCALATE] prefix):
- Customer expresses significant frustration or mentions legal action
- Technical issues you cannot resolve
- Requests for refunds or account modifications
- Any security concerns
Knowledge base context will be provided when available.`;
const RATE_LIMIT = { messages: 20, windowMs: 60000 }; // 20 messages per minute
async function checkRateLimit(identifier: string): Promise<boolean> {
const now = Date.now();
const windowStart = now - RATE_LIMIT.windowMs;
const existing = await convex.query(api.rateLimit.get, { identifier });
if (!existing || existing.windowStart < windowStart) {
await convex.mutation(api.rateLimit.set, {
identifier,
messageCount: 1,
windowStart: now,
});
return true;
}
if (existing.messageCount >= RATE_LIMIT.messages) {
return false;
}
await convex.mutation(api.rateLimit.increment, { identifier });
return true;
}
export async function POST(req: NextRequest) {
// x-forwarded-for can be a comma-separated chain; the first entry is the client
const ip = (req.headers.get("x-forwarded-for") ?? "unknown").split(",")[0].trim();
// Rate limiting
const allowed = await checkRateLimit(ip);
if (!allowed) {
return new Response("Too many requests. Please wait before sending another message.", {
status: 429,
});
}
const { message, sessionId, conversationHistory } = await req.json();
// Input validation
if (!message || typeof message !== "string" || message.trim().length === 0) {
return new Response("Invalid message", { status: 400 });
}
if (message.length > 2000) {
return new Response("Message too long. Please keep messages under 2000 characters.", { status: 400 });
}
// Get or create conversation
let conversationId = req.headers.get("x-conversation-id");
if (!conversationId) {
conversationId = await convex.mutation(api.conversations.create, {
sessionId,
metadata: {
userAgent: req.headers.get("user-agent") ?? undefined,
referrer: req.headers.get("referer") ?? undefined,
},
});
}
// Save user message
await convex.mutation(api.messages.add, {
conversationId: conversationId as any,
role: "user",
content: message,
timestamp: Date.now(),
});
// Build message history (last 10 messages for context)
const recentHistory = Array.isArray(conversationHistory) ? conversationHistory.slice(-10) : []; // tolerate a missing or malformed history array
const startTime = Date.now();
// Stream the response
const stream = anthropic.messages.stream({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
system: SYSTEM_PROMPT,
messages: [
...recentHistory,
{ role: "user", content: message },
],
});
// Collect full response for storage
let fullResponse = "";
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
try {
for await (const chunk of stream) {
if (
chunk.type === "content_block_delta" &&
chunk.delta.type === "text_delta"
) {
const text = chunk.delta.text;
fullResponse += text;
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
}
}
// Save assistant response
const latencyMs = Date.now() - startTime;
const usage = (await stream.finalMessage()).usage;
await convex.mutation(api.messages.add, {
conversationId: conversationId as any,
role: "assistant",
content: fullResponse,
timestamp: Date.now(),
metadata: {
tokensUsed: usage.input_tokens + usage.output_tokens,
model: "claude-sonnet-4-20250514",
latencyMs,
},
});
// Check for escalation trigger
if (fullResponse.startsWith("[ESCALATE]")) {
await convex.mutation(api.conversations.escalate, {
conversationId: conversationId as any,
});
}
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ done: true, conversationId })}\n\n`));
controller.close();
} catch (error) {
controller.enqueue(
encoder.encode(`data: ${JSON.stringify({ error: "Stream error" })}\n\n`)
);
controller.close();
}
},
});
return new Response(readable, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
Connection: "keep-alive",
"X-Conversation-Id": conversationId,
},
});
}
// components/ChatWidget.tsx
"use client";
import { useState, useRef, useEffect } from "react";
import { nanoid } from "nanoid";
interface Message {
role: "user" | "assistant";
content: string;
timestamp: Date;
}
export function ChatWidget() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState("");
const [isStreaming, setIsStreaming] = useState(false);
const [conversationId, setConversationId] = useState<string | null>(null);
const sessionId = useRef(nanoid());
const messagesEndRef = useRef<HTMLDivElement>(null);
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function sendMessage() {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
role: "user",
content: input.trim(),
timestamp: new Date(),
};
setMessages(prev => [...prev, userMessage]);
setInput("");
setIsStreaming(true);
// Add placeholder assistant message
const assistantMessage: Message = {
role: "assistant",
content: "",
timestamp: new Date(),
};
setMessages(prev => [...prev, assistantMessage]);
try {
const response = await fetch("/api/chat", {
method: "POST",
headers: {
"Content-Type": "application/json",
...(conversationId ? { "x-conversation-id": conversationId } : {}),
},
body: JSON.stringify({
message: userMessage.content,
sessionId: sessionId.current,
conversationHistory: messages.map(m => ({
role: m.role,
content: m.content,
})),
}),
});
if (!response.ok) {
const errorText = await response.text();
setMessages(prev => [
...prev.slice(0, -1),
{ ...assistantMessage, content: errorText || "Something went wrong. Please try again." },
]);
return;
}
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let fullContent = "";
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // SSE events can be split across network chunks, so buffer until a full
  // "\n\n"-terminated event arrives before parsing its JSON payload.
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";
  for (const event of events) {
    const line = event.split("\n").find(l => l.startsWith("data: "));
    if (!line) continue;
    const data = JSON.parse(line.slice(6));
    if (data.text) {
      fullContent += data.text;
      setMessages(prev => [
        ...prev.slice(0, -1),
        { ...assistantMessage, content: fullContent },
      ]);
    }
    if (data.done && data.conversationId) {
      setConversationId(data.conversationId);
    }
  }
}
} catch (error) {
setMessages(prev => [
...prev.slice(0, -1),
{ ...assistantMessage, content: "Connection error. Please check your internet and try again." },
]);
} finally {
setIsStreaming(false);
}
}
return (
<div className="flex flex-col h-[500px] border rounded-lg overflow-hidden">
<div className="bg-primary p-4">
<h3 className="text-primary-foreground font-semibold">Support Chat</h3>
</div>
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.length === 0 && (
<p className="text-muted-foreground text-sm text-center">
How can I help you today?
</p>
)}
{messages.map((msg, i) => (
<div key={i} className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}>
<div
className={`max-w-[80%] rounded-lg px-4 py-2 text-sm ${
msg.role === "user"
? "bg-primary text-primary-foreground"
: "bg-muted"
}`}
>
{msg.content || <span className="animate-pulse">...</span>}
</div>
</div>
))}
<div ref={messagesEndRef} />
</div>
<div className="border-t p-4 flex gap-2">
<input
className="flex-1 text-sm border rounded px-3 py-2 focus:outline-none focus:ring-2 focus:ring-primary"
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={e => e.key === "Enter" && !e.shiftKey && sendMessage()}
placeholder="Type your message..."
disabled={isStreaming}
maxLength={2000}
/>
<button
onClick={sendMessage}
disabled={isStreaming || !input.trim()}
className="bg-primary text-primary-foreground px-4 py-2 rounded text-sm disabled:opacity-50"
>
Send
</button>
</div>
</div>
);
}
When the chatbot can't help, it needs to hand off gracefully. The [ESCALATE] prefix in the system prompt triggers a flag in Convex.
Your human support queue should watch for conversations with status: "escalated" and surface them to an agent along with the full message history. The implementation details vary by your support tooling, but the pattern is always the same: the agent detects a limitation, sets a flag, and your backend routes the conversation to a human queue.
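The route above calls api.conversations.create and api.conversations.escalate, but we haven't shown those functions. Here's a minimal sketch of what they could look like; listEscalated is a hypothetical extra that a support dashboard might poll, not something the chat route depends on.
// convex/conversations.ts (sketch)
import { mutation, query } from "./_generated/server";
import { v } from "convex/values";

export const create = mutation({
  args: {
    sessionId: v.string(),
    metadata: v.optional(v.object({
      page: v.optional(v.string()),
      userAgent: v.optional(v.string()),
      referrer: v.optional(v.string()),
    })),
  },
  handler: async (ctx, args) => {
    // Returns the new conversation id, which the route echoes back to the client.
    const now = Date.now();
    return await ctx.db.insert("conversations", {
      sessionId: args.sessionId,
      createdAt: now,
      updatedAt: now,
      status: "active",
      ...(args.metadata ? { metadata: args.metadata } : {}),
    });
  },
});

export const escalate = mutation({
  args: { conversationId: v.id("conversations") },
  handler: async (ctx, args) => {
    // Flip the flag that the human queue watches.
    await ctx.db.patch(args.conversationId, {
      status: "escalated",
      updatedAt: Date.now(),
    });
  },
});

// Hypothetical: everything currently waiting on a human.
export const listEscalated = query({
  args: {},
  handler: async (ctx) => {
    return await ctx.db
      .query("conversations")
      .filter(q => q.eq(q.field("status"), "escalated"))
      .collect();
  },
});
Because the flag lives on the conversation document, a dashboard, a Slack webhook, or a ticketing integration can all key off the same status field.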
Three things separate this from a demo:
Rate limiting. Without it, a single automated script can exhaust your API budget in minutes. The windowed counter above absorbs normal bursts while cutting off abuse.
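The checkRateLimit helper leans on three small Convex functions (api.rateLimit.get, api.rateLimit.set, api.rateLimit.increment) that weren't shown above. A minimal sketch, assuming the rateLimits table from the schema:
// convex/rateLimit.ts (sketch)
import { mutation, query } from "./_generated/server";
import { v } from "convex/values";

export const get = query({
  args: { identifier: v.string() },
  handler: async (ctx, args) => {
    return await ctx.db
      .query("rateLimits")
      .withIndex("by_identifier", q => q.eq("identifier", args.identifier))
      .unique();
  },
});

export const set = mutation({
  args: { identifier: v.string(), messageCount: v.number(), windowStart: v.number() },
  handler: async (ctx, args) => {
    // Start a fresh window: overwrite the existing row if there is one.
    const existing = await ctx.db
      .query("rateLimits")
      .withIndex("by_identifier", q => q.eq("identifier", args.identifier))
      .unique();
    if (existing) {
      await ctx.db.patch(existing._id, {
        messageCount: args.messageCount,
        windowStart: args.windowStart,
      });
    } else {
      await ctx.db.insert("rateLimits", args);
    }
  },
});

export const increment = mutation({
  args: { identifier: v.string() },
  handler: async (ctx, args) => {
    const existing = await ctx.db
      .query("rateLimits")
      .withIndex("by_identifier", q => q.eq("identifier", args.identifier))
      .unique();
    if (existing) {
      await ctx.db.patch(existing._id, { messageCount: existing.messageCount + 1 });
    }
  },
});
One caveat: the query-then-mutate sequence in checkRateLimit isn't atomic across the HTTP client, so a burst of parallel requests can sneak a few messages past the limit. Convex mutations are transactional, so folding the whole check into a single mutation closes that gap if you need it airtight.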
Error boundaries everywhere. Every async operation can fail. The streaming connection can drop. The AI API can time out. Each failure mode has a graceful fallback that doesn't leave the user staring at a spinner.
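One concrete pattern: race the non-streaming calls against a timer so a slow dependency fails fast instead of hanging the request. withTimeout below is a hypothetical helper, not part of any SDK:
// Hypothetical helper: reject if a promise takes longer than `ms`.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Example: a slow database write shouldn't stall the user's stream.
// await withTimeout(convex.mutation(api.messages.add, messageArgs), 5000, "save message");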
Conversation persistence. When the user refreshes the page, their conversation history is still there. This sounds obvious, but most chatbot tutorials don't implement it, and users notice immediately when history disappears.
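Rehydrating that history takes one extra query plus a stable session id. A minimal sketch, assuming a listBySession query added to convex/messages.ts (hypothetical, not shown above):
// convex/messages.ts (addition, sketch)
import { query } from "./_generated/server";
import { v } from "convex/values";

export const listBySession = query({
  args: { sessionId: v.string() },
  handler: async (ctx, args) => {
    // Find this browser session's conversation, then return its messages in creation order.
    const conversation = await ctx.db
      .query("conversations")
      .withIndex("by_session", q => q.eq("sessionId", args.sessionId))
      .first();
    if (!conversation) return [];
    return await ctx.db
      .query("messages")
      .withIndex("by_conversation", q => q.eq("conversationId", conversation._id))
      .collect();
  },
});
On the client, persist the session id (localStorage is enough) instead of generating a fresh nanoid on every mount, run this query when the widget mounts, and seed the messages state with the result.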
