Langfuse LLM Tracing with the `@observe` Decorator

What? (Concept Overview)

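Langfuse is an LLM-observability platform that records prompts, completions, latency, token usage and cost, and trace context for any function decorated with @observe(name=...). Wrapping the agent’s classify/produce/respond method emits a span into Langfuse’s trace tree. Nesting follows the call stack: the outermost decorated function becomes the root of the trace and every decorated call beneath it becomes a child span, with no manual context plumbing. For the FastAPI request to show up as the parent, the route handler (or the orchestrator method it calls) must itself be decorated.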

Project Context

The FCA Support Agent wires Langfuse into every LLM-bound agent method:

IntentClassifierAgent.classify (app/agents/intent_classifier.py) — @observe(name="intent_classifier.classify") emits one span per classification
BaseAgent (app/agents/base.py) — initialises a Langfuse client during __init__ from Settings.langfuse_public_key and langfuse_secret_key
is_observability_enabled computed property on Settings — single switch to disable globally when keys are missing in dev
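
The real Settings class is not shown here, so the following is only a minimal sketch of what such a gate could look like. It uses a plain dataclass as a hypothetical stand-in (the project may well use pydantic settings instead); the field names follow the ones mentioned above, and the default host is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for app.config.Settings (the real class is not shown)."""

    langfuse_public_key: Optional[str] = None
    langfuse_secret_key: Optional[str] = None
    langfuse_host: str = "https://cloud.langfuse.com"  # assumed default

    @property
    def is_observability_enabled(self) -> bool:
        # Enabled only when both keys are present and non-empty.
        return bool(self.langfuse_public_key and self.langfuse_secret_key)


dev_settings = Settings()                      # no keys, e.g. local dev or CI
prod_settings = Settings("pk-demo", "sk-demo")  # placeholder values
print(dev_settings.is_observability_enabled, prod_settings.is_observability_enabled)
```

Keeping the check in one property means agents only ever ask a yes/no question and never inspect the keys themselves.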

How? (Quick Reference Blocks)

3.1 The Decorator on an Agent Method


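# app/agents/intent_classifier.py
import json

from langfuse import observe  # SDK v3 path; v2 uses `from langfuse.decorators import observe`
from groq import Groq

from app.agents.base import BaseAgent

class IntentClassifierAgent(BaseAgent):
    @observe(name="intent_classifier.classify")
    async def classify(self, message: str, history=None) -> dict:
        prompt = (
            "You are an FCA-compliant intent classifier.\n"
            "Choose exactly one label from: product_acquisition | "
            "account_data | knowledge_inquiry | complaint | general_inquiry.\n"
            # JSON mode expects the prompt itself to ask for JSON output.
            'Reply with a JSON object of the form {"intent": "<label>"}.\n'
            f"Customer message: {message!r}\n"
        )
        # NOTE: Groq is the synchronous client, so this call blocks the event loop.
        # groq.AsyncGroq with `await` avoids that if self.client can be swapped.
        chat = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
            response_format={"type": "json_object"},
        )
        # The decorator records this return value as the span output.
        return json.loads(chat.choices[0].message.content)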

3.2 Initialising the Langfuse Client in `BaseAgent`


# app/agents/base.py — BaseAgent.__init__
from langfuse import Langfuse
from app.config import settings
 
class BaseAgent:
    name = "base"
    role = "generic"
 
    def __init__(self) -> None:
        if settings.is_observability_enabled:
            self.langfuse = Langfuse(
                public_key=settings.langfuse_public_key,
                secret_key=settings.langfuse_secret_key,
                host=settings.langfuse_host,
            )
        else:
            self.langfuse = None

3.3 Initialising the LangChain Callback Handler


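# inside an agent that uses LangChain, not raw OpenAI/Groq
from langfuse.callback import CallbackHandler  # SDK v2 path; v3 moved it to langfuse.langchain

@observe(name="rag.search")
async def search(self, query: str) -> list[str]:
    handler = CallbackHandler(
        public_key=settings.langfuse_public_key,
        secret_key=settings.langfuse_secret_key,
    )
    # Callbacks travel through the Runnable config, so search via the
    # retriever interface rather than calling the vector store directly.
    retriever = self.vector_store.as_retriever(search_kwargs={"k": 4})
    result = await retriever.ainvoke(
        query, config={"callbacks": [handler]},    # ← wires spans automatically
    )
    return [doc.page_content for doc in result]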

Why? (Parameter Breakdown)

@observe(name="...") decorator: auto-creates a span for every invocation. Manual langfuse.span(...) calls would require try/finally cleanup; the decorator wraps the call site so spans always close, even when the function raises.
Naming convention agent_name.method_name: groups trace spans by agent in the Langfuse UI. Without it, every span shows the raw function name (classify) and you can’t filter by agent.
settings.is_observability_enabled gate: instantiating the SDK without keys leaves you, at best, with a disabled client that logs warnings. A property-driven gate makes the off state explicit and avoids littering every agent with try/except blocks.
LangChain callbacks=[handler] for vector stores: @observe only wraps Python functions; the underlying LangChain retrieval call emits its own spans through the callback system. Both must be wired up.
host=settings.langfuse_host: defaults to cloud.langfuse.com; override it to point at a self-hosted instance for compliance or VPC deployment.

Common Pitfalls

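Instantiating the Langfuse client without checking is_observability_enabled. Guard the constructor with the gate so a deployment with missing keys never crashes or floods the logs.
Decorating methods that don’t actually call an LLM. Every decorated call serialises its arguments and return value and queues an event for Langfuse. The SDK batches events in the background, so the cost is mostly serialisation and trace volume rather than a blocking round-trip, but on a hot pure-Python utility it is still overhead with no diagnostic value.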

Real-World Interview Prep

Q1: How do you calculate LLM cost from Langfuse traces?

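A: Each trace records usage.prompt_tokens and usage.completion_tokens per span. Multiply each by the per-model price (e.g., llama-3.1-8b-instant = $0.05 input / $0.08 output per 1M tokens) and sum. Most teams add a daily cron that queries the Langfuse public API for usage (check the API reference for the exact endpoint in your version) and posts the result to a metrics store. Then dashboard sum(increase(agent_token_usage_total[1d])) * price from Prometheus; increase() gives the token count over the window, whereas rate() would give a per-second average.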
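
As a worked example of the arithmetic, here is a small sketch. The price table holds only the llama-3.1-8b-instant figures quoted above (check them against your provider’s current price list), and the function name and token counts are made up for illustration.

```python
# Prices in USD per 1M tokens as (input, output). Only the figures quoted in
# the text are included; treat them as illustrative, not authoritative.
PRICES_PER_MILLION = {"llama-3.1-8b-instant": (0.05, 0.08)}


def span_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one span: tokens times the per-million price, for input and output."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Hypothetical span: 1,200 prompt tokens and 300 completion tokens.
cost = span_cost_usd("llama-3.1-8b-instant", prompt_tokens=1_200, completion_tokens=300)
print(f"{cost:.6f}")  # (1200 * 0.05 + 300 * 0.08) / 1e6 = 0.000084
```

Summing this over all spans in a day gives the daily cost figure the cron job would publish.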

Q2: Why use Langfuse rather than LangSmith?

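A: Open-source core (self-hostable in your VPC for compliance), multi-provider (works with Groq, OpenAI, Anthropic, Bedrock, local models), and SOC 2 / GDPR ready. LangSmith is not limited to one provider either, but it is a proprietary product that integrates most tightly with LangChain and LangGraph. For multi-model stacks that must be self-hosted, Langfuse is the more common choice. Pick LangSmith if you’re already all-in on LangChain and want the least setup.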

Q3: How do you debug a trace where the agent “hallucinated”?

A: Open the trace and inspect (a) the exact prompt that was sent, (b) the full completion text (no truncation), and (c) metadata.docs_retrieved if the agent was RAG-backed. The root cause of a hallucination is almost always (1) the wrong document retrieved (a RAG gap) or (2) conflicting instructions in the prompt. Edit the prompt and re-run on the same input; if the prompt is versioned in Langfuse you can compare the two versions and verify the fix worked.

Top-to-Bottom Code Walkthrough (`app/agents/base.py` — `BaseAgent._observe_decorator()`)

Tracing is the first thing you wire when working with LLMs — every prompt and response is a black-box experiment otherwise. This file specialises the @observe() decorator so per-agent method spans land in Langfuse correctly.

Imports

from langfuse import observe — the decorator.
from langfuse.callback import CallbackHandler — the callback that hooks into LangChain’s run-time so every prompt/response pair becomes a trace event.
from app.config import settings — is_observability_enabled is the kill-switch.

`BaseAgent.__init__(...)` — observability wiring


if settings.is_observability_enabled:
    self.langfuse_handler = CallbackHandler()
else:
    self.langfuse_handler = None  # No-op when keys missing

Key design choice: never crash at import time when keys are missing. Just no-op. The CI/dev team can run the whole stack without Langfuse.

`@observe(name="<agent_name>.<method>")` — per-call span

Every agent method (IntentClassifierAgent.classify, etc.) is decorated:


class IntentClassifierAgent(BaseAgent):
    @observe(name="intent_classifier.classify")
    async def classify(self, message: str) -> dict:
        ...

What it produces in Langfuse’s UI:

One span per call: name, start_time, end_time, latency.
Auto-captured inputs: the function’s arguments (message).
Auto-captured outputs: the function’s return value ({"intent": "product_acquisition", ...}).
Nested hierarchies: a parent @observe(name="process_message") on the orchestrator captures child spans for intent_classify, account_agent, compliance_check, etc.

LangChain integration: `config={"callbacks": [self.langfuse_handler]}`

Inside the actual LLM call:


response = await self.llm.ainvoke(
    messages,
    config={"callbacks": [self.langfuse_handler] if self.langfuse_handler else []},
)

Why pass callbacks explicitly instead of relying on global state: LangChain reads callbacks from the config passed to each invocation; it does not pick up a handler that was merely constructed somewhere else. Forgetting this is the #1 reason “Langfuse shows nothing”.
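
If several agents repeat that conditional, it can be pulled into a small helper. This is only a sketch; build_run_config is a hypothetical name, and it assumes an empty config dict is acceptable when tracing is off.

```python
from typing import Any, Dict, Optional


def build_run_config(handler: Optional[Any]) -> Dict[str, Any]:
    """Return a per-call config dict: callbacks only when a handler exists."""
    if handler is None:
        return {}  # observability disabled: pass no callbacks at all
    return {"callbacks": [handler]}


# Stand-in object; in the agent this would be self.langfuse_handler.
fake_handler = object()
print(build_run_config(None), len(build_run_config(fake_handler)["callbacks"]))
```

The call site then becomes config=build_run_config(self.langfuse_handler), which keeps the on/off decision in one place.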

Sampling

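For high-traffic agents, trace only a sampled fraction of calls. The decision has to be made per call: an if around the def would run once, at import time, and fix the outcome for the whole process.


import random

# _classify is the undecorated coroutine; wrap it once, up front.
_traced_classify = observe(name="intent_classifier.classify")(_classify)

async def classify(message: str) -> dict:
    # Sample per call: trace roughly langfuse_sample_rate of invocations.
    if random.random() < settings.langfuse_sample_rate:  # e.g. 0.1
        return await _traced_classify(message)
    return await _classify(message)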

Capturing 100% of calls in production can balloon your Langfuse bill. Sampling 10% gives statistical insight at 1/10 the cost.
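
A reusable version of per-call sampling can be sketched as a decorator factory. Everything here is hypothetical scaffolding: sampled and counting_tracer are illustrative names, the tracer argument stands in for something like observe(name=...), and the random source is injectable so the behaviour can be tested deterministically.

```python
import asyncio
import functools
import random


def sampled(tracer, rate, rng=random.random):
    """Apply `tracer` (any decorator) to roughly `rate` of calls."""
    def decorator(func):
        traced = tracer(func)  # wrap once, at definition time

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            # The sampling decision is made on every call.
            target = traced if rng() < rate else func
            return await target(*args, **kwargs)
        return wrapper
    return decorator


traced_calls = []


def counting_tracer(func):
    """Stand-in tracer that only counts traced calls (for illustration)."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        traced_calls.append(func.__name__)
        return await func(*args, **kwargs)
    return wrapper


draws = iter([0.05, 0.50, 0.95])  # deterministic "random" values


@sampled(counting_tracer, rate=0.1, rng=lambda: next(draws))
async def classify(message):
    return {"intent": "general_inquiry"}


results = [asyncio.run(classify("hi")) for _ in range(3)]
print(len(traced_calls), len(results))
```

Only the first draw (0.05) falls below the 0.1 rate, so one of the three calls is traced while all three still return normally. Note that sampling each method independently can leave a child span without its parent; deciding once per request and reusing that decision is a common refinement.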

Common Pitfalls

Decorating an async generator (async def stream()) with @observe. The decorator applies, but depending on the SDK version the trace may show no output for the yielded chunks. If the trace comes up empty, collect the chunks inside a decorated wrapper coroutine and return the joined result instead.

Calling observe twice on the same function opens nested spans with the same name. If a parent span is already open, the trace metadata is duplicated; pick one decorator.

Forgetting await on a decorated coroutine. The call only creates a coroutine object and the body never runs, so the trace is missing or misleading, and Python emits a “coroutine ... was never awaited” warning.

Real-World Interview Prep

Q1: When should observability be optional vs mandatory?

A: Always optional. CI runs without keys, and so do dev sandboxes. Make it a property on Settings (is_observability_enabled) and never raise at import when keys are missing; just skip opening spans.

Q2: How does `@observe` differ from manual `langfuse.trace()`?

A: @observe is declarative. The decorator captures inputs (function args) and outputs (return value) automatically, and records exceptions too. Manual traces require you to pass every field by hand. Default to @observe unless you need a span around a block that isn’t its own function, or have to attach metadata that only exists inside the function body.

Q3: Why sample 10% in production?

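A: Langfuse (like most trace stores) bills by event volume. A LangGraph graph with 8 nodes × 100 conversations/sec = 800 trace-events/sec at full sample. Sampling 10% brings that down to 80/sec, which is still a statistically useful sample for latency and error-rate trends at 1/10th the cost. The trade-off is that a rare failure may not be in the sampled set.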
