LLM Observability & Cost Optimization
Completed April 2026
A practical guide to observability for LLM applications using Langfuse. Covers tracing, cost optimization, privacy compliance, and production monitoring with real examples across Claude and OpenAI.
Key Features
- LLM Observability — trace every LLM call, nested span, and event across Claude and OpenAI
- Cost Optimization — model routing, prompt optimization, and semantic caching (30–50% cost reduction)
- Monitoring & Alerting — webhook alerts on cost spikes, extensible to Slack/PagerDuty
- Privacy & Compliance — PII redaction before logging for GDPR/HIPAA/SOC2
LLM Observability
Three levels of instrumentation:
- Level 1 (OpenAI wrapper) — wrap the client once; no changes to individual call sites
- Level 2 (@observe decorator) — trace any function; nested calls become child spans automatically
- Level 3 (OpenTelemetry) — automatic tracing for LangChain with no decorators needed
Three types of observations:
- Generation — LLM API calls: completions, token counts, model costs
- Span — any operation with duration: DB queries, retrieval steps, processing
- Event — point-in-time occurrences: cache hits, errors, milestones
Cost Optimization Strategy
- Smart model routing — classifies task type (simple, code, complex, creative) and routes to the cheapest right-fit model (Haiku → Sonnet → GPT-4o)
- Prompt optimization — strips filler phrases and deduplicates instructions to reduce input tokens
- Semantic caching — uses sentence-transformers + ChromaDB to return cached responses for queries with >92% similarity; persists across restarts
- Combined approach targets 50–70% reduction in LLM spend
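The routing and caching steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: the keyword classifier, the `ROUTES` mapping of task types to model tiers, and the bag-of-words `embed` function are hypothetical stand-ins (the project itself uses sentence-transformers embeddings with ChromaDB for persistence). Only the 0.92 similarity threshold comes from the description above.

```python
import math
from collections import Counter

# Assumed mapping from task type to model tier; the exact assignment is
# illustrative, not taken from the project.
ROUTES = {"simple": "haiku", "code": "sonnet", "complex": "gpt-4o", "creative": "gpt-4o"}


def classify(prompt):
    """Keyword heuristic standing in for a real task classifier."""
    text = prompt.lower()
    if any(k in text for k in ("function", "bug", "code", "def ")):
        return "code"
    if any(k in text for k in ("story", "poem", "slogan")):
        return "creative"
    if len(text.split()) > 40:
        return "complex"
    return "simple"


def route(prompt):
    """Pick the model tier for a prompt based on its task type."""
    return ROUTES[classify(prompt)]


def embed(text):
    """Bag-of-words vector; a stand-in for a sentence embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory cache that returns a stored response for near-duplicate queries."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def get(self, query):
        vec = embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) > self.threshold:
            return best[1]
        return None  # cache miss: caller should query the model, then put()

    def put(self, query, response):
        self.entries.append((embed(query), response))


cache = SemanticCache()
cache.put("what is the refund policy", "Refunds are accepted within 30 days.")
hit = cache.get("What is the refund policy")
miss = cache.get("how do I reset my password")
```

In this sketch a cache hit skips the model call entirely, while a miss falls through to `route(prompt)` to choose the cheapest tier that fits the task.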
Monitoring & Alerting
- Checks hourly cost against a configurable threshold
- Fires webhook alerts (webhook.site, Slack, Discord, PagerDuty)
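A minimal sketch of the hourly threshold check, using only the standard library. The record shape (`timestamp`, `cost_usd`), the function names, and the `{"text": ...}` payload are assumptions for illustration; Slack incoming webhooks accept a `text` field, while other services such as Discord expect a different payload shape.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def hourly_cost(records, now):
    """Sum the cost of records timestamped within the hour before `now`."""
    cutoff = now - timedelta(hours=1)
    return sum(r["cost_usd"] for r in records if cutoff <= r["timestamp"] <= now)


def build_alert(cost, threshold):
    """Return an alert payload if cost exceeds the threshold, else None."""
    if cost <= threshold:
        return None
    return {"text": f"LLM cost alert: ${cost:.2f} in the last hour (threshold ${threshold:.2f})"}


def send_alert(webhook_url, payload):
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"timestamp": now - timedelta(minutes=10), "cost_usd": 1.25},
    {"timestamp": now - timedelta(minutes=45), "cost_usd": 2.50},
    {"timestamp": now - timedelta(hours=2), "cost_usd": 10.00},  # outside the window
]
cost = hourly_cost(records, now)
alert = build_alert(cost, threshold=3.00)
# To deliver the alert, call send_alert(<your webhook URL>, alert) when alert is not None.
```

Keeping the payload construction separate from the network call makes the threshold logic testable without a live webhook endpoint.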
Privacy & Compliance
- Redacts PII (email, phone, SSN, credit card, IP) from prompts and responses before sending to Langfuse
- Recursive dictionary redaction for nested payloads
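The redaction step can be sketched as a set of regular expressions applied recursively to a payload before it is logged. The patterns below are deliberately simplified assumptions, not the project's actual rules; production-grade detection of phone numbers and card numbers in particular needs stricter validation.

```python
import re

# Order matters: more specific patterns (SSN, card) run before the looser phone pattern.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d[ -]?){13,16}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\b\d{3}[ .-]?\d{3}[ .-]?\d{4}\b")),
]


def redact_text(text):
    """Replace every PII match in a string with a labeled placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def redact(value):
    """Recursively redact strings inside dicts, lists, and tuples."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(redact(item) for item in value)
    return value  # numbers, booleans, None pass through unchanged


payload = {
    "prompt": "Email jane.doe@example.com or call 555-123-4567",
    "meta": {"ip": "192.168.1.10", "ssn": "123-45-6789", "tokens": 42},
    "history": ["card 4111 1111 1111 1111"],
}
clean = redact(payload)
```

Here `clean["prompt"]` becomes `"Email [REDACTED_EMAIL] or call [REDACTED_PHONE]"`, and the original `payload` is left unmodified because `redact` builds new containers instead of mutating in place.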
RAG Pipeline
End-to-end retrieval-augmented generation with full trace visibility: document indexing → embedding → semantic retrieval → Claude generation, with token counts and cost tracked per step.
Langfuse, LangChain, Python, Anthropic, OpenAI, Debugging, Privacy, OpenTelemetry, ChromaDB, Cost Optimization