Langfuse LLM Tracing with the @observe Decorator

What? (Concept Overview)

Langfuse is an LLM-observability platform that records the input, output, latency, and trace context of every function decorated with @observe(name=...), plus token usage and cost for the LLM calls it recognises. Wrapping the agent’s classify/produce/respond method emits a span into Langfuse’s trace tree. Nesting is handled through context propagation: when the FastAPI request handler (or the orchestrator it calls) is itself observed, it becomes the parent span and the agent spans attach beneath it, with no manual context plumbing required.

Project Context

The FCA Support Agent wires Langfuse into every LLM-bound agent method:

  • IntentClassifierAgent.classify (app/agents/intent_classifier.py) — @observe(name="intent_classifier.classify") emits one span per classification
  • BaseAgent (app/agents/base.py) — initialises a Langfuse client during __init__ from Settings.langfuse_public_key and langfuse_secret_key
  • is_observability_enabled computed property on Settings — single switch to disable globally when keys are missing in dev
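The switch described in the last bullet can be sketched as follows. This is a minimal stand-in that uses a plain dataclass rather than the project's real Settings class, which is not shown here. The field names mirror those mentioned above; reading them from LANGFUSE_* environment variables is an assumption.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str = "") -> str:
    """Read an environment variable, treating whitespace-only values as empty."""
    return os.environ.get(name, default).strip()


@dataclass(frozen=True)
class Settings:
    # Hypothetical stand-in for app.config.Settings; env var names are assumed.
    langfuse_public_key: str = field(default_factory=lambda: _env("LANGFUSE_PUBLIC_KEY"))
    langfuse_secret_key: str = field(default_factory=lambda: _env("LANGFUSE_SECRET_KEY"))
    langfuse_host: str = field(
        default_factory=lambda: _env("LANGFUSE_HOST", "https://cloud.langfuse.com")
    )

    @property
    def is_observability_enabled(self) -> bool:
        # Enabled only when both keys are present and non-empty.
        return bool(self.langfuse_public_key and self.langfuse_secret_key)
```

Because the check lives in one property, agents never inspect the keys themselves; they only branch on settings.is_observability_enabled.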

How? (Quick Reference Blocks)

3.1 The Decorator on an Agent Method

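    # app/agents/intent_classifier.py
    import json

    from groq import Groq  # type of self.client, which is created elsewhere
    from langfuse import observe

    from app.agents.base import BaseAgent


    class IntentClassifierAgent(BaseAgent):
        @observe(name="intent_classifier.classify")
        async def classify(self, message: str, history=None) -> dict:
            prompt = (
                "You are an FCA-compliant intent classifier.\n"
                "Choose exactly one label from: product_acquisition | "
                "account_data | knowledge_inquiry | complaint | general_inquiry.\n"
                # JSON mode generally expects the prompt itself to ask for JSON.
                'Reply as a JSON object: {"intent": "<label>"}.\n'
                f"Customer message: {message!r}\n"
            )
            # Note: with the sync Groq client this call blocks the event loop;
            # under load, switch to groq.AsyncGroq and await the call.
            chat = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.0,
                response_format={"type": "json_object"},
            )
            return json.loads(chat.choices[0].message.content)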

3.2 Initialising the Langfuse Client in BaseAgent

    # app/agents/base.py (BaseAgent.__init__)
    from langfuse import Langfuse

    from app.config import settings


    class BaseAgent:
        name = "base"
        role = "generic"

        def __init__(self) -> None:
            if settings.is_observability_enabled:
                self.langfuse = Langfuse(
                    public_key=settings.langfuse_public_key,
                    secret_key=settings.langfuse_secret_key,
                    host=settings.langfuse_host,
                )
            else:
                self.langfuse = None

3.3 Initialising the LangChain Callback Handler

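    # inside an agent that uses LangChain, not raw OpenAI/Groq
    from langfuse import observe
    # SDK v3 import path; on SDK v2 this is `from langfuse.callback import CallbackHandler`
    from langfuse.langchain import CallbackHandler

    @observe(name="rag.search")
    async def search(self, query: str) -> list[str]:
        # Credentials come from the client configured in BaseAgent
        # or from the LANGFUSE_* environment variables.
        handler = CallbackHandler()
        # Callbacks are honoured by Runnables such as retrievers; a bare
        # similarity_search call does not emit callback events.
        retriever = self.vector_store.as_retriever(search_kwargs={"k": 4})
        result = await retriever.ainvoke(
            query,
            config={"callbacks": [handler]},  # wires spans automatically
        )
        return [doc.page_content for doc in result]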

Why? (Parameter Breakdown)

  • @observe(name="...") decorator: Auto-creates a span for every invocation. Manual langfuse.span(...) calls would require try/finally cleanup; the decorator wraps the call so spans always close, even when the function raises.
  • Naming convention agent_name.method_name: Groups trace spans by agent in the Langfuse UI. Without this, every span shows the raw function name (classify) and you can’t filter by agent.
  • settings.is_observability_enabled gate: Instantiating the Langfuse SDK without keys is an error condition (depending on the SDK version it raises or logs a warning and disables itself). A property-driven gate handles this in one place instead of littering every agent with try/except blocks.
  • LangChain callbacks=[handler] for vector-store retrieval: @observe only wraps Python functions; the underlying LangChain call emits its own spans through the callback system, which only Runnables (retrievers, chains, models) invoke. Both must be wired.
  • host=settings.langfuse_host: Defaults to cloud.langfuse.com; override to a self-hosted instance for compliance / VPC deployment.
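To make the first bullet concrete, here is a toy stand-in for the decorator. It is not the Langfuse implementation; toy_observe and RECORDED_SPANS are hypothetical names used only to show why wrapping the call in try/finally guarantees that a span is closed and that inputs, outputs, and exceptions are captured without any code in the function body.

```python
import functools
import time

RECORDED_SPANS: list = []  # stand-in for the tracing backend


def toy_observe(name: str):
    """Toy stand-in for @observe: records input, output, error, and latency."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            span = {
                "name": name,
                "input": {"args": args, "kwargs": kwargs},
                "output": None,
                "error": None,
            }
            start = time.perf_counter()
            try:
                span["output"] = func(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so the span always closes.
                span["latency_s"] = time.perf_counter() - start
                RECORDED_SPANS.append(span)

        return wrapper

    return decorator


@toy_observe(name="intent_classifier.classify")
def classify(message: str) -> dict:
    if not message:
        raise ValueError("empty message")
    return {"intent": "general_inquiry"}
```

Calling classify("hello") and then classify("") leaves two entries in RECORDED_SPANS: one with an output and one with an error, both with a latency.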

Common Pitfalls

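  1. Instantiating the Langfuse client without checking is_observability_enabled. Missing keys can then surface as a KeyError at startup. Wrap the constructor with the gate so production never crashes on missing keys.
  2. Decorating methods that don’t actually call an LLM. The SDK batches and sends events in the background, so the added latency is small, but every decorated call still serialises its arguments and return value and adds an observation to ingest. Decorating a pure-Python utility inflates trace volume (and the bill) without telling you anything about model behaviour.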

Real-World Interview Prep

Q1: How do you calculate LLM cost from Langfuse traces?

A: Each trace records usage.prompt_tokens and usage.completion_tokens per span. Multiply each by the per-model price (e.g., llama-3.1-8b-instant = $0.05 input / $0.08 output per 1M tokens). Most teams add a daily cron that queries Langfuse’s /api/public/v2/usage endpoint and posts to a metrics store, then chart daily spend from Prometheus with sum(increase(agent_token_usage_total[1d])) * price_per_token. Use increase rather than rate here: rate returns a per-second average, not a daily total.
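The multiplication in this answer can be written out as a short worked example. The prices are the ones quoted above and may be out of date; the usage rows are made-up illustration data, not real trace output, and span_cost_usd is a hypothetical helper.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output). Figures quoted in the text above;
# verify current provider pricing before relying on them.
PRICE_PER_MILLION = {
    "llama-3.1-8b-instant": (Decimal("0.05"), Decimal("0.08")),
}


def span_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one span: tokens times the per-million price, per direction."""
    input_price, output_price = PRICE_PER_MILLION[model]
    weighted = prompt_tokens * input_price + completion_tokens * output_price
    return weighted / Decimal(1_000_000)


# Illustrative daily aggregates (invented numbers).
daily_usage = [
    {"model": "llama-3.1-8b-instant", "prompt_tokens": 1_200_000, "completion_tokens": 300_000},
    {"model": "llama-3.1-8b-instant", "prompt_tokens": 800_000, "completion_tokens": 200_000},
]

total = sum((span_cost_usd(**row) for row in daily_usage), Decimal(0))
```

For the first row this is 1.2 × $0.05 + 0.3 × $0.08 = $0.084. Decimal is used so that small per-span amounts do not accumulate floating-point error when summed.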

Q2: Why use Langfuse rather than LangSmith?

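A: Open-source core (self-hostable in your VPC for compliance), multi-provider (works with Groq, OpenAI, Anthropic, Bedrock, local models), and SOC 2 / GDPR ready. LangSmith is a proprietary product from the LangChain team; it also accepts traces from other providers, but its tightest integration is with LangChain and LangGraph. For multi-model stacks that need self-hosting, Langfuse is usually the better fit. Pick LangSmith if you’re fully on LangChain/LangGraph and want minimal setup.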

Q3: How do you debug a trace where the agent “hallucinated”?

A: Open the trace and inspect (a) the exact prompt sent, (b) the full completion text (no truncation), and (c) metadata.docs_retrieved if the agent was RAG-backed. The root cause of a hallucination is almost always (1) the wrong document retrieved (a RAG gap) or (2) conflicting prompt instructions. Edit the prompt and re-run on the same input; if the prompt is versioned in Langfuse, you can compare traces across versions to verify the fix worked.

Top-to-Bottom Code Walkthrough (app/agents/base.py: BaseAgent._observe_decorator())

Tracing is the first thing you wire when working with LLMs — every prompt and response is a black-box experiment otherwise. This file specialises the @observe() decorator so per-agent method spans land in Langfuse correctly.

Imports

  • from langfuse import observe: the decorator.
  • from langfuse.langchain import CallbackHandler: the callback that hooks into LangChain’s runtime so every prompt/response pair becomes a trace event (on SDK v2 the import path is langfuse.callback).
  • from app.config import settings: is_observability_enabled is the kill-switch.

BaseAgent.__init__(...) — observability wiring

    if settings.is_observability_enabled:
        self.langfuse_handler = CallbackHandler()
    else:
        self.langfuse_handler = None  # No-op when keys missing

Key design choice: never crash at import time when keys are missing. Just no-op. The CI/dev team can run the whole stack without Langfuse.
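The same no-op idea can be extended to the decorator itself. The sketch below is a hypothetical helper (gated is not part of Langfuse or of this project) that returns the real decorator factory when observability is enabled and a pass-through otherwise. A fake decorator stands in for @observe so the example runs without any keys.

```python
from typing import Callable


def gated(decorator_factory: Callable, enabled: bool) -> Callable:
    """Return decorator_factory when enabled, else a factory of no-op decorators."""
    if enabled:
        return decorator_factory

    def noop_factory(*_args, **_kwargs):
        def passthrough(func):
            return func  # the function is returned untouched

        return passthrough

    return noop_factory


calls: list = []


def fake_observe(name: str):
    """Stand-in for a tracing decorator; records the span name on each call."""

    def decorator(func):
        def wrapper(*args, **kwargs):
            calls.append(name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


observe_off = gated(fake_observe, enabled=False)
observe_on = gated(fake_observe, enabled=True)


@observe_off(name="intent_classifier.classify")
def classify_untraced(message: str) -> str:
    return "general_inquiry"


@observe_on(name="intent_classifier.classify")
def classify_traced(message: str) -> str:
    return "general_inquiry"
```

In the project this would presumably be wired once, for example as gated(observe, settings.is_observability_enabled), so that CI and dev runs never touch the tracing path.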

@observe(name="<agent_name>.<method>"): per-call span

Every agent method (IntentClassifierAgent.classify, etc.) is decorated:

    class IntentClassifierAgent(BaseAgent):
        @observe(name="intent_classifier.classify")
        async def classify(self, message: str) -> dict:
            ...

What it produces in Langfuse’s UI:

  • One span per call: name, start_time, end_time, latency.
  • Auto-captured inputs: the function’s arguments (message).
  • Auto-captured outputs: the function’s return value ({"intent": "product_acquisition", ...}).
  • Nested hierarchies: a parent @observe(name="process_message") on the orchestrator captures child spans such as intent_classifier.classify, account_agent, compliance_check, etc.

LangChain integration: config={"callbacks": [self.langfuse_handler]}

Inside the actual LLM call:

    response = await self.llm.ainvoke(
        messages,
        config={"callbacks": [self.langfuse_handler] if self.langfuse_handler else []},
    )

Why pass callbacks explicitly instead of relying on global state: LangChain reads callbacks from the config passed to each call (or from the object’s constructor), not from a global registry. Forgetting this is the #1 reason “Langfuse shows nothing”.

Sampling

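For high-traffic agents, set a sample rate on the Langfuse client. An if/else wrapped around the method definition is evaluated once at import time, so it would trace either every call or none:

    # app/agents/base.py, inside BaseAgent.__init__
    self.langfuse = Langfuse(
        public_key=settings.langfuse_public_key,
        secret_key=settings.langfuse_secret_key,
        host=settings.langfuse_host,
        sample_rate=settings.langfuse_sample_rate,  # e.g. 0.1
    )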

Capturing 100% of calls in production can balloon your Langfuse bill. Sampling 10% gives statistical insight at 1/10 the cost.
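Whatever mechanism is used, the sampling decision has to be made on every call rather than once at definition time. A minimal sketch of that idea follows; sampled and counting_decorator are hypothetical helpers, the wrapper handles synchronous functions only, and a counting decorator stands in for the tracing decorator.

```python
import functools
import random
from typing import Callable, Optional


def sampled(decorator: Callable, rate: float, rng: Optional[random.Random] = None):
    """Apply `decorator` to roughly `rate` of calls, deciding on every call."""
    rng = rng or random.Random()

    def outer(func):
        traced = decorator(func)  # wrap once, up front

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # random() is in [0, 1): rate=0 never traces, rate=1 always traces.
            chosen = traced if rng.random() < rate else func
            return chosen(*args, **kwargs)

        return wrapper

    return outer


counter = {"traced": 0}


def counting_decorator(func):
    """Stand-in for a tracing decorator: counts how many calls were traced."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        counter["traced"] += 1
        return func(*args, **kwargs)

    return wrapper


@sampled(counting_decorator, rate=0.1, rng=random.Random(42))
def classify(message: str) -> str:
    return "general_inquiry"


for _ in range(10_000):
    classify("hi")
```

After 10,000 calls, counter["traced"] lands near 1,000. Note that sampling each span independently can leave traces with missing children, which is one reason to prefer sampling at the trace level where the SDK supports it.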

Common Pitfalls

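Decorating an async generator (async def stream()) with @observe runs without error, but if you yield raw text or byte chunks the trace can end up with no usable output, because Langfuse tries to serialise each yielded item. Decorate a wrapper coroutine that consumes the generator and returns the full text instead.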
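A minimal sketch of the wrapper approach, with the decorator itself left out so the example runs standalone: stream_tokens is a made-up stand-in for a streaming LLM response, and respond is the plain coroutine that would carry @observe.

```python
import asyncio
from typing import AsyncIterator


async def stream_tokens() -> AsyncIterator[str]:
    """Stand-in for a streaming LLM response (invented chunks)."""
    for chunk in ("Your ", "balance ", "is ", "available."):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk


async def respond() -> str:
    """Plain coroutine wrapper: it returns one value, so a decorator
    that records the return value sees the complete text."""
    parts = [chunk async for chunk in stream_tokens()]
    return "".join(parts)


full_text = asyncio.run(respond())
```

The trade-off is that the caller no longer receives chunks as they arrive; if streaming to the client matters, the chunks would need to be forwarded separately while the wrapper still returns the joined text.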

Calling observe twice on the same function opens nested spans with the same name. If a parent span is open it duplicates the trace metadata; pick one decorator.

Forgetting to await a decorated coroutine method. The call returns a coroutine object immediately and the body has not run yet, so the trace is either missing or looks as if the method completed instantly.
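The effect of the missing await can be seen with plain asyncio, independent of any tracing library. In this sketch the classify coroutine is a stand-in for the agent method.

```python
import asyncio
import inspect


async def classify(message: str) -> dict:
    await asyncio.sleep(0)
    return {"intent": "general_inquiry"}


async def main() -> tuple:
    pending = classify("hi")  # missing await: nothing has run yet
    is_coroutine = inspect.iscoroutine(pending)
    result = await pending  # the body runs only here
    return is_coroutine, result


is_coroutine, result = asyncio.run(main())
```

Here is_coroutine is True, which shows that the un-awaited call produced only a coroutine object. The dictionary appears only after the await.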

Real-World Interview Prep

Q1: When should observability be optional vs mandatory?

A: Always optional. CI runs without keys, and so do dev sandboxes. Make it a property on Settings (is_observability_enabled) and never raise at import when keys are missing; just skip opening spans.

Q2: How does @observe differ from manual langfuse.trace()?

A: @observe is declarative. The decorator captures inputs (function args) and outputs (return value) automatically, and records exceptions raised by the function. Manual traces require you to pass every field by hand. Default to @observe unless the span does not map onto a function boundary or you need to attach metadata that is only available inside the function.

Q3: Why sample 10% in production?

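A: Langfuse (like most trace stores) bills by event volume. A LangGraph graph with 8 nodes × 100 conversations/sec = 800 trace-events/sec at full sample. Sampling 10% cuts that to 80/sec, which is still enough for aggregate latency, cost, and error-rate statistics at roughly 1/10th the cost; the trade-off is that rare failures may go unsampled.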
