Langfuse LLM Tracing with the @observe Decorator
What? (Concept Overview)
Langfuse is an LLM-observability platform that records the prompt, completion, latency, token cost, and trace context of each call made inside a function decorated with @observe(name=...). Wrapping the agent’s classify/produce/respond method emits a span into Langfuse’s trace tree. When the surrounding FastAPI request handler is itself traced, it becomes the parent span and the decorated calls nest beneath it, with no manual context plumbing required.
Project Context
The FCA Support Agent wires Langfuse into every LLM-bound agent method:
- IntentClassifierAgent.classify (app/agents/intent_classifier.py): @observe(name="intent_classifier.classify") emits one span per classification
- BaseAgent (app/agents/base.py): initialises a Langfuse client during __init__ from Settings.langfuse_public_key and langfuse_secret_key
- is_observability_enabled computed property on Settings: single switch to disable globally when keys are missing in dev
How? (Quick Reference Blocks)
3.1 The Decorator on an Agent Method
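# app/agents/intent_classifier.py
import json

from groq import Groq  # type of the client assumed to sit behind self.client
from langfuse import observe  # SDK v3 path; v2 uses `from langfuse.decorators import observe`

from app.agents.base import BaseAgent


class IntentClassifierAgent(BaseAgent):
    @observe(name="intent_classifier.classify")
    async def classify(self, message: str, history=None) -> dict:
        prompt = (
            "You are an FCA-compliant intent classifier.\n"
            "Choose exactly one label from: product_acquisition | "
            "account_data | knowledge_inquiry | complaint | general_inquiry.\n"
            'Reply with a JSON object of the form {"intent": "<label>"}.\n'
            f"Customer message: {message!r}\n"
        )
        # self.client and self.model are assumed to be set elsewhere in the agent.
        # With a synchronous Groq client this call blocks the event loop;
        # with AsyncGroq, await it instead.
        chat = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
            response_format={"type": "json_object"},
        )
        return json.loads(chat.choices[0].message.content)
3.2 Initialising the Langfuse Client in BaseAgent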
# app/agents/base.py: BaseAgent.__init__
from langfuse import Langfuse
from app.config import settings


class BaseAgent:
    name = "base"
    role = "generic"

    def __init__(self) -> None:
        if settings.is_observability_enabled:
            self.langfuse = Langfuse(
                public_key=settings.langfuse_public_key,
                secret_key=settings.langfuse_secret_key,
                host=settings.langfuse_host,
            )
        else:
            self.langfuse = None
3.3 Initialising the LangChain Callback Handler
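# inside an agent that uses LangChain, not raw OpenAI/Groq
from langfuse.callback import CallbackHandler  # SDK v2 path; v3 moved it to langfuse.langchain

@observe(name="rag.search")
async def search(self, query: str) -> list[str]:
    handler = CallbackHandler(
        public_key=settings.langfuse_public_key,
        secret_key=settings.langfuse_secret_key,
        host=settings.langfuse_host,  # keep in step with BaseAgent for self-hosted setups
    )
    # Whether `callbacks` is honoured here depends on the vector store
    # implementation; retrievers accept config={"callbacks": [handler]}.
    result = await self.vector_store.asimilarity_search(
        query, k=4, callbacks=[handler],  # wires spans automatically
    )
    return [doc.page_content for doc in result]
Why? (Parameter Breakdown)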
- @observe(name="...") decorator: auto-creates a span for every invocation. Manual langfuse.span(...) calls would require try/finally cleanup; the decorator wraps the call so spans always close, even when the function raises.
- Naming convention agent_name.method_name: groups trace spans by agent in the Langfuse UI. Without it, every span shows the raw function name (classify) and you can’t filter by agent.
- settings.is_observability_enabled gate: instantiating the Langfuse client without keys fails or logs errors, depending on the SDK version. A property-driven gate keeps that check in one place instead of scattering try/except guards across every agent.
- LangChain callbacks=[handler] for vector stores: @observe only wraps Python functions; the underlying LangChain call (similarity_search) emits its own spans through the callback system. Both must be wired up.
- host=settings.langfuse_host: defaults to cloud.langfuse.com; override it to point at a self-hosted instance for compliance or VPC deployments.
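The gate itself is never shown in these notes, so here is a minimal sketch of how it might look. It assumes a plain dataclass rather than the project's real Settings class, and build_client is a hypothetical helper: it takes the client constructor as an argument so the example runs without the Langfuse SDK installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Illustrative stand-in for the project's Settings class."""

    langfuse_public_key: Optional[str] = None
    langfuse_secret_key: Optional[str] = None
    langfuse_host: str = "https://cloud.langfuse.com"

    @property
    def is_observability_enabled(self) -> bool:
        # Tracing is on only when both keys are present and non-empty.
        return bool(self.langfuse_public_key and self.langfuse_secret_key)


def build_client(settings: Settings, factory):
    """Return factory(...) when tracing is enabled, otherwise None.

    `factory` is whatever constructs the client (for example the Langfuse
    class); passing it in keeps this sketch free of third-party imports.
    """
    if not settings.is_observability_enabled:
        return None
    return factory(
        public_key=settings.langfuse_public_key,
        secret_key=settings.langfuse_secret_key,
        host=settings.langfuse_host,
    )
```

In BaseAgent.__init__ this would reduce to a single call such as build_client(settings, Langfuse), with None meaning "tracing off".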
Common Pitfalls
- Constructing the Langfuse client without checking is_observability_enabled. With keys missing this can trigger a KeyError at startup. Wrap the constructor with the gate so production never crashes on missing keys.
- Decorating methods that don’t actually call an LLM. Every @observe span is serialised and shipped to Langfuse. The SDK batches events in the background, but decorating a pure-Python utility still adds overhead to every call and inflates trace volume.
Real-World Interview Prep
Q1: How do you calculate LLM cost from Langfuse traces?
A: Each LLM span records usage.prompt_tokens and usage.completion_tokens. Multiply each by the per-model price (e.g., llama-3.1-8b-instant at $0.05 input / $0.08 output per 1M tokens). Most teams add a daily cron that queries Langfuse’s /api/public/v2/usage endpoint and posts the totals to a metrics store, then chart sum(increase(agent_token_usage_total[1d])) * price in Prometheus. Use increase() rather than rate() here: rate() returns a per-second average, whereas increase() gives the token count over the whole day.
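The multiplication in that answer can be sketched as follows. The Usage dataclass and function names are hypothetical, and the prices are the ones quoted in the answer, so check them against the provider's current price list before relying on the numbers.

```python
from dataclasses import dataclass
from typing import Iterable

# USD per 1M tokens, as quoted in the answer above (assumed, may be out of date).
PRICES_PER_MILLION = {
    "llama-3.1-8b-instant": {"prompt": 0.05, "completion": 0.08},
}


@dataclass(frozen=True)
class Usage:
    """Token counts for one LLM span (hypothetical shape)."""

    model: str
    prompt_tokens: int
    completion_tokens: int


def span_cost(usage: Usage) -> float:
    """Cost in USD of a single span."""
    price = PRICES_PER_MILLION[usage.model]
    return (
        usage.prompt_tokens * price["prompt"]
        + usage.completion_tokens * price["completion"]
    ) / 1_000_000


def total_cost(usages: Iterable[Usage]) -> float:
    """Cost in USD summed over many spans, e.g. one day of traces."""
    return sum(span_cost(u) for u in usages)
```

For example, a span with 1,200 prompt tokens and 300 completion tokens costs (1200 × 0.05 + 300 × 0.08) / 1,000,000 USD, which is a little under one hundredth of a cent.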
Q2: Why use Langfuse rather than LangSmith?
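A: Open-source core (self-hostable in your VPC for compliance), provider-agnostic (works with Groq, OpenAI, Anthropic, Bedrock, and local models), and suited to SOC 2 / GDPR requirements. LangSmith is a proprietary service that is most tightly integrated with LangChain and LangGraph, although it can trace other stacks too. For multi-provider stacks that need self-hosting, Langfuse is usually the easier fit. Pick LangSmith if you are already all-in on LangChain and want minimal setup.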
Q3: How do you debug a trace where the agent “hallucinated”?
A: Open the trace and inspect (a) the exact prompt that was sent, (b) the full completion text (no truncation), and (c) metadata.docs_retrieved if the agent was RAG-backed. The root cause of a hallucination is usually either (1) the wrong document being retrieved (a RAG gap) or (2) conflicting instructions in the prompt. Edit the prompt and re-run on the same input; if the prompt is versioned in Langfuse, you can compare traces across the two versions to verify the fix worked.
Top-to-Bottom Code Walkthrough (app/agents/base.py — BaseAgent._observe_decorator())
Tracing is the first thing you wire when working with LLMs — every prompt and response is a black-box experiment otherwise. This file specialises the @observe() decorator so per-agent method spans land in Langfuse correctly.
Imports
- from langfuse import observe: the decorator.
- from langfuse.callback import CallbackHandler: the callback that hooks into LangChain’s runtime so every prompt/response pair becomes a trace event.
- from app.config import settings: is_observability_enabled is the kill switch.
BaseAgent.__init__(...) — observability wiring
if settings.is_observability_enabled:
    self.langfuse_handler = CallbackHandler()
else:
    self.langfuse_handler = None  # No-op when keys missing
Key design choice: never crash at import time when keys are missing. Just no-op. The CI/dev team can run the whole stack without Langfuse.
@observe(name="<agent_name>.<method>"): per-call span
Every agent method (IntentClassifierAgent.classify, etc.) is decorated:
class IntentClassifierAgent(BaseAgent):
    @observe(name="intent_classifier.classify")
    async def classify(self, message: str) -> dict:
        ...
What it produces in Langfuse’s UI:
- One span per call: name, start_time, end_time, latency.
- Auto-captured inputs: the function’s arguments (message).
- Auto-captured outputs: the function’s return value ({"intent": "product_acquisition", ...}).
- Nested hierarchies: a parent @observe(name="process_message") on the orchestrator captures child spans for intent_classify, account_agent, compliance_check, etc.
LangChain integration: config={"callbacks": [self.langfuse_handler]}
Inside the actual LLM call:
response = await self.llm.ainvoke(
    messages,
    config={"callbacks": [self.langfuse_handler] if self.langfuse_handler else []},
)
Why pass callbacks explicitly instead of relying on global state: LangChain reads callbacks from the config passed to each call (or from callbacks bound to the runnable); it does not pick up the Langfuse handler from any global registry. Forgetting this is the #1 reason “Langfuse shows nothing”.
Sampling
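For high-traffic agents, decide per call whether to trace. Branching around the decorator itself does not work: that if/else runs once, when the function is defined, so the process would trace either every call or none. Make the decision inside a thin wrapper instead:
import random

async def classify_impl(message: str) -> dict: ...  # untraced implementation

traced_classify = observe(name="intent_classifier.classify")(classify_impl)

async def classify(message: str) -> dict:
    # Decide on every call, not once at definition time
    if random.random() < settings.langfuse_sample_rate:  # e.g. 0.1
        return await traced_classify(message)
    return await classify_impl(message)
Capturing 100% of calls in production can balloon your Langfuse bill. Sampling 10% gives statistical insight at roughly 1/10 the cost. The Langfuse SDK also offers a client-level sample_rate setting; check the documentation for your SDK version before hand-rolling this.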
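A random coin flip per call has one drawback: spans belonging to the same conversation can land on different sides of the flip, leaving fragmented traces. A different technique, sketched here under the assumption that each request carries a stable identifier (the conversation id is hypothetical), is to hash that identifier so every span in a conversation gets the same keep/drop decision.

```python
import hashlib


def should_sample(trace_key: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all keys (0.0 <= rate <= 1.0).

    The same key always yields the same answer, so every span that shares
    a conversation id is either kept or dropped together.
    """
    digest = hashlib.sha256(trace_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64
```

A caller would evaluate should_sample(conversation_id, settings.langfuse_sample_rate) once at the top of the request and pass the result down, rather than flipping a coin inside each agent method.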
Common Pitfalls
- Decorating an async generator (async def stream()) with @observe runs without errors, but when it yields raw text chunks the trace can come out empty, because Langfuse tries to serialise each yielded item. Decorate a plain async wrapper function that consumes the generator instead.
- Applying @observe twice to the same function opens nested spans with the same name. If a parent span is open, this duplicates the trace metadata; pick one decorator.
- Forgetting await on a decorated coroutine method means the call returns a coroutine object immediately and the body has not run yet, so the trace is missing or appears to have completed instantly.
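The wrapper pattern suggested for the streaming pitfall can be sketched like this. stream_tokens and collect_stream are hypothetical names, and the generator is a stand-in for a real streaming LLM call; in an agent, collect_stream is the function that would carry @observe, since it returns one serialisable string.

```python
import asyncio
from typing import AsyncIterator, List


async def stream_tokens() -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call; yields text chunks."""
    for chunk in ["Your ", "balance ", "is ", "available."]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield chunk


async def collect_stream() -> str:
    """Plain coroutine that drains the generator and returns the full text."""
    parts: List[str] = []
    async for chunk in stream_tokens():
        parts.append(chunk)
    return "".join(parts)


result = asyncio.run(collect_stream())
```

The trade-off is that the caller only receives text once the stream has finished. If chunks must reach the client as they arrive, the wrapper would need to forward them as well as collect them.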
Real-World Interview Prep
Q1: When should observability be optional vs mandatory?
A: Always optional. CI runs without keys, and so do dev sandboxes. Make it a property on Settings (is_observability_enabled) and never raise at import when keys are missing; just skip opening spans.
Q2: How does @observe differ from manual langfuse.trace()?
A: @observe is declarative. The decorator captures inputs (function args), outputs (the return value), and raised exceptions automatically. Manual traces require you to pass every field by hand and to close the span yourself. Default to @observe unless the span does not map onto a function boundary, or you need to attach metadata that only exists partway through the function.
Q3: Why sample 10% in production?
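A: Langfuse Cloud, like most hosted trace stores, bills by event volume; self-hosting moves that cost to storage and compute. A LangGraph graph with 8 nodes × 100 conversations/sec produces 800 trace events/sec at full sampling. Sampling 10% cuts that to 80/sec, which is still enough for latency and error-rate statistics at about one tenth of the cost. The trade-off is that 90% of individual failures go unrecorded.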
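The arithmetic in that answer, extended to a daily volume, can be checked with a few lines. The function name is hypothetical, and the figures are the ones from the answer rather than measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def trace_events_per_second(
    nodes_per_conversation: int,
    conversations_per_second: float,
    sample_rate: float = 1.0,
) -> float:
    """Events emitted per second when each graph node produces one event."""
    return nodes_per_conversation * conversations_per_second * sample_rate


full = trace_events_per_second(8, 100)            # every call traced
sampled = trace_events_per_second(8, 100, 0.1)    # 10% of calls traced

full_per_day = full * SECONDS_PER_DAY
sampled_per_day = sampled * SECONDS_PER_DAY
```

At full sampling that is 69,120,000 events per day; at 10% it is roughly 6,912,000, which is the figure to plug into whatever pricing tier applies.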