LangGraph Multi-Agent Supervisor Routing

What? (Concept Overview)

A supervisor-style LangGraph topology uses one classifier node plus a set of specialised agent nodes with conditional routing edges that forward control to the agent whose domain matches the customer’s intent, gated by a confidence threshold. The pattern keeps each agent narrowly scoped (one responsibility, one prompt, one tool surface) while the graph orchestrates the dispatch.

Project Context

The FCA Support Agent uses an IntentClassifierAgent (Groq + Langfuse) at the classify node of MessageWorkflow, then routes the request to one of account, product, general, or human based on the returned intent string and confidence. Product recommendations take a mandatory second stop at compliance. This staircase of conditional edges is a common topology for production LLM systems in which every node must be auditable and replaceable.

How? (Quick Reference Blocks)

3.1 The Classifier Node Source


# app/agents/intent_classifier.py: IntentClassifierAgent
import json
from langfuse import observe
from groq import AsyncGroq  # async client, so classify() does not block the event loop
from app.agents.base import BaseAgent
from app.config import settings

INTENT_LABELS = [
    "product_acquisition",
    "account_data",
    "knowledge_inquiry",
    "complaint",
    "general_inquiry",
]

class IntentClassifierAgent(BaseAgent):
    name = "intent_classifier"
    role = "intent_routing"

    def __init__(self) -> None:
        super().__init__()
        self.client = AsyncGroq(api_key=settings.groq_api_key)
        self.model = settings.groq_model

    @observe(name="intent_classifier.classify")
    async def classify(self, message: str, history: list[dict] | None = None) -> dict:
        # Note: history is not used by this prompt.
        prompt = (
            "You are an FCA-compliant intent classifier.\n"
            f"Choose exactly one label from: {INTENT_LABELS}.\n"
            "Respond with strict JSON: {\"intent\": ..., \"confidence\": 0.0-1.0}.\n"
            f"Customer message: {message!r}\n"
        )
        # The call is awaited because classify() is a coroutine.
        chat = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,                # keep routing as repeatable as possible
            response_format={"type": "json_object"},
        )
        parsed = json.loads(chat.choices[0].message.content)
        return {
            "intent": parsed["intent"],
            "confidence": float(parsed["confidence"]),
        }

3.2 Conditional Routing Function

The routing function reads the full state and returns the name of the next node. Confidence gating lives here, not in the classifier, so the threshold can be tuned without editing the classifier prompt.


# app/workflows/message_workflow.py — _route_by_intent
CONF_ROUTING_THRESHOLD = 0.55
 
async def _route_by_intent(self, state: dict) -> str:
    intent = state.get("intent", "general_inquiry")
    confidence = float(state.get("confidence", 0.0))
    if confidence < CONF_ROUTING_THRESHOLD:
        return "human"      # too uncertain → operator
    return {
        "product_acquisition": "product",
        "account_data": "account",
        "knowledge_inquiry": "general",
        "complaint": "human",
        "general_inquiry": "general",
    }.get(intent, "general")
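One way to make the threshold tunable without touching the classifier is to read it from configuration. The sketch below is a standalone, synchronous version of the same routing rule. The ROUTING_CONFIDENCE_THRESHOLD environment variable and the load_threshold helper are assumptions made for illustration and are not part of the project.

```python
import os

# Same intent-to-node mapping as _route_by_intent above.
INTENT_TO_NODE = {
    "product_acquisition": "product",
    "account_data": "account",
    "knowledge_inquiry": "general",
    "complaint": "human",
    "general_inquiry": "general",
}


def load_threshold(default: float = 0.55) -> float:
    """Read the gate from the environment (hypothetical variable name)."""
    raw = os.environ.get("ROUTING_CONFIDENCE_THRESHOLD")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # unparsable value: keep the default gate
    # Keep the value inside the valid confidence range.
    return min(max(value, 0.0), 1.0)


def route_by_intent(state: dict, threshold: float) -> str:
    """Gate on confidence first, then map the intent to a node name."""
    confidence = float(state.get("confidence") or 0.0)
    if confidence < threshold:
        return "human"
    return INTENT_TO_NODE.get(state.get("intent", "general_inquiry"), "general")
```

Because the function is pure, the gate can be covered by table-driven unit tests without building a graph or calling the LLM.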

3.3 Wiring the Conditional Edges


# app/workflows/message_workflow.py
workflow.add_node("classify", self._node_classify)
workflow.add_node("account",  self._node_account_agent)
workflow.add_node("product",  self._node_product_recommender)
workflow.add_node("general",  self._node_general_agent)
workflow.add_node("human",    self._node_human_agent)
 
workflow.add_conditional_edges(
    "classify",
    self._route_by_intent,
    {
        "account": "account",
        "product": "product",
        "general": "general",
        "human":   "human",
    },
)

3.4 Compliance Gate After Product Recommender

Product recommendations run an extra compliance check before terminating; flagged ones funnel into HITL (see Human-in-the-Loop page).


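# app/workflows/message_workflow.py
from langgraph.graph import END

# "compliance" and "human_approval" must also be registered with add_node.
workflow.add_edge("product", "compliance")
workflow.add_conditional_edges(
    "compliance",
    self._route_compliance,
    # END is LangGraph's terminal sentinel; the string "end" would need a node of that name.
    {"approved": END, "review": "human_approval"},
)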
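The source does not show _route_compliance itself. A minimal sketch of what such a router could look like follows, written as a plain function so it can be tested on its own. The compliance_flags state field is a hypothetical name. Only the return values "approved" and "review" come from the wiring above.

```python
def route_compliance(state: dict) -> str:
    """Return "approved" only when the compliance node recorded no flags."""
    flags = state.get("compliance_flags")  # hypothetical field written by the compliance node
    if flags is None:
        # The compliance node did not run or did not write a result:
        # fail closed and send the reply to a reviewer.
        return "review"
    return "review" if flags else "approved"
```

Failing closed on a missing result is a design choice of this sketch: an unchecked recommendation is treated the same way as a flagged one.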

Why? (Parameter Breakdown)

temperature=0.0 for the classifier: Routing should be as repeatable as possible across pods. Sampling variance lets identical messages reach different agents, which makes production routing hard to audit. A temperature of 0 reduces that variance but does not guarantee identical outputs on every call.
response_format={"type": "json_object"}: Asks the LLM for syntactically valid JSON, which removes the trailing-token / partial-JSON repair branch. It does not enforce a schema, so the intent and confidence keys still need validation.
@observe(name="intent_classifier.classify"): Langfuse decorator that creates a span for every classification. Without it, you cannot attribute a hallucination to a specific model and prompt in a trace. The span name also drives Langfuse dashboards grouped by agent.
Confidence below threshold routes to human, not general: Low confidence is qualitatively different from “general inquiry”. Answering an uncertain classification with a generic reply hides the uncertainty from the customer and from auditors. Operators on the human queue can confirm the right agent before responding.
Compliance as a SEPARATE node, not a flag inside the product node: Compliance predicates can be unit-tested, swapped, and traced independently. Inlining them mixes business logic with regulation and breaks compliance auditability.
DEFAULT in dict.get(intent, "general"): Prevents a KeyError when the classifier returns a new or unknown intent label, so the graph falls back to the general agent instead of crashing.
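JSON mode addresses syntax only, so the classifier output still benefits from a validation step before it reaches the router. The sketch below shows one way to do that with the standard library. The parse_classification helper and the choice of fallback values are assumptions. With a fallback confidence of 0.0, a malformed reply falls below the routing threshold and ends up on the human queue.

```python
import json
import math

INTENT_LABELS = (
    "product_acquisition",
    "account_data",
    "knowledge_inquiry",
    "complaint",
    "general_inquiry",
)

# Hypothetical fallback: confidence 0.0 keeps the reply below any positive threshold.
FALLBACK = {"intent": "general_inquiry", "confidence": 0.0}


def parse_classification(raw: str) -> dict:
    """Validate the classifier's JSON reply; return FALLBACK on any problem."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return dict(FALLBACK)
    if not isinstance(payload, dict):
        return dict(FALLBACK)
    intent = payload.get("intent")
    if intent not in INTENT_LABELS:
        return dict(FALLBACK)
    try:
        confidence = float(payload.get("confidence"))
    except (TypeError, ValueError):
        return dict(FALLBACK)
    if math.isnan(confidence):
        return dict(FALLBACK)
    # Clamp out-of-range scores into [0.0, 1.0].
    return {"intent": intent, "confidence": min(max(confidence, 0.0), 1.0)}
```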

Common Pitfalls

Routing on intent string alone. Without a confidence gate, low-confidence classifications (“I’m not sure which agent”) silently flow downstream to the wrong specialist, producing confident-sounding but unrelated replies. Always gate with confidence >= threshold.
Putting the threshold in the prompt. “Reply with low confidence if uncertain” sounds clever but is unreliable: the model calibrates to its training data, not to your product’s needs. Apply the threshold to the number returned in the JSON, not to prose.

Real-World Interview Prep

Q1: How would you evolve this graph to support fallback routing — i.e., “if agent X fails, try agent Y before escalating”?

A: Wrap each agent node in a fallback edge. Two approaches: (a) per-node try/except inside _node_X that, on RetryableError, calls _node_Y directly (simple, but mixes retry and routing); (b) add a route_fallback conditional after each agent that checks state["last_error"] and returns "Y" if recoverable. Approach (b) preserves a clean node graph (agent_X → fallback_router → agent_Y | human), making fallbacks observable and testable in isolation. Combine with tenacity (@retry(stop=stop_after_attempt(3), wait=wait_exponential())) for transient infra failures, but reserve application-level fallbacks (LLM error → rule-based reply) for the conditional router.
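Approach (b) can be sketched as a pure routing function. Only state["last_error"] comes from the answer above. The last_agent and attempted_agents fields, the fallback pairs, and the error codes are hypothetical names chosen for the example.

```python
# Hypothetical fallback pairs: which agent to try when another one fails.
FALLBACK_CHAIN = {"product": "general", "account": "general"}
# Hypothetical error codes that are worth a second attempt.
RECOVERABLE = {"timeout", "rate_limited"}


def route_fallback(state: dict) -> str:
    """Decide what follows an agent node: finish, try the fallback agent, or escalate."""
    error = state.get("last_error")
    if error is None:
        return "done"
    failed_agent = state.get("last_agent")
    already_tried = state.get("attempted_agents", [])
    fallback = FALLBACK_CHAIN.get(failed_agent)
    # Retry elsewhere only for recoverable errors, and never loop back to an agent
    # that has already been tried.
    if error in RECOVERABLE and fallback is not None and fallback not in already_tried:
        return fallback
    return "human"
```

In the graph, a function like this would be passed to add_conditional_edges after each agent node, with a path map that sends "done" to END and the other return values to the matching nodes.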

Q2: How do you debug a case where the customer complains the agent “answered the wrong question”?

A: Inspect three layers in order. (1) Classifier span: pull intent, confidence, and the raw prompt from Langfuse. Is the LLM misreading the message? (2) Routing decision: from the same snapshot, read state["intent"], state["confidence"], and the _route_by_intent source. Was the threshold logic wrong? Was a new intent label missing from the routing dict? (3) Agent span: trace into the downstream _node_X execution to see which retrieval results and prompt were used. Most “wrong answer” reports turn out to be correct routing into an agent whose prompt or data is too weak, so the fix is rarely a routing change and usually a prompt or data fix in the targeted agent.

Q3: Why bother with a separate `compliance` node instead of running the check inside the `product` agent?

A: Three reasons. (1) Auditability: regulators and security reviewers can read _route_compliance and the compliance prompt independently, without parsing the entire product agent. (2) Replaceability: swap the LLM, swap the heuristic, or route to a human without touching the product agent’s prompt. (3) Cost visibility: the token cost of a compliance check is small but predictable, and bundling it inside the product agent hides that spend in the agent’s larger completion. Isolated nodes also let you budget token spend per node and detect regressions when compliance costs spike.

Top-to-Bottom Code Walkthrough (`app/coordinator/agent_coordinator.py` + `app/workflows/message_workflow.py`)

The “supervisor routing” pattern is a layered graph where one node decides which specialised agent to invoke next. In this project the routing decision lives in route_intent(state).

The graph assembly (`app/workflows/message_workflow.py`)


from langgraph.graph import StateGraph, END
from app.schemas.common import WorkflowState  # base class of MessageWorkflowState

# MessageWorkflowState, the agent classes, route_intent and route_compliance
# are assumed to be defined or imported elsewhere in this module.
graph = StateGraph(MessageWorkflowState)

# Nodes
graph.add_node("intent_classifier", IntentClassifier().run)
graph.add_node("account_agent", AccountAgent().run)
graph.add_node("product_recommender", ProductRecommender().run)
graph.add_node("general_agent", GeneralAgent().run)
graph.add_node("compliance_checker", ComplianceChecker().run)
graph.add_node("human_agent", HumanAgent().run)

# Edges
graph.set_entry_point("intent_classifier")
graph.add_conditional_edges(
    "intent_classifier",
    route_intent,  # this is the supervisor
    {
        # Keys are the strings route_intent returns; values are node names.
        "account_agent": "account_agent",
        "product_recommender": "product_recommender",
        "general_agent": "general_agent",
        "human_agent": "human_agent",
    },
)
graph.add_edge("account_agent", "compliance_checker")
graph.add_edge("product_recommender", "compliance_checker")
graph.add_edge("general_agent", "compliance_checker")
graph.add_conditional_edges(
    "compliance_checker",
    route_compliance,  # second supervisor
    {"OK": END, "FAIL": "human_agent"},
)
graph.add_edge("human_agent", END)  # give the escalation node an explicit exit

`route_intent(state)` — the supervisor


LOW_CONFIDENCE_THRESHOLD = 0.6

def route_intent(state: MessageWorkflowState) -> str:
    intent = state.intent  # filled by intent_classifier
    # A missing score counts as 0.0, so it also takes the fallback branch.
    confidence = state.confidence_scores.get(intent, 0.0)

    if confidence < LOW_CONFIDENCE_THRESHOLD:
        return "general_agent"  # route low-confidence to fallback

    if intent in ("account_balance", "transaction_history"):
        return "account_agent"
    if intent in ("loan_inquiry", "credit_card"):
        return "product_recommender"
    if intent == "escalation":
        return "human_agent"
    return "general_agent"

Why a low-confidence fallback: the classifier has known accuracy ceilings; below 60% confidence the right answer is not to pick a specific specialist but to use the most general agent. This is the “drift to generalist” pattern.

The `state` object

MessageWorkflowState extends WorkflowState(BaseModel):

Adds intent: str
Adds confidence_scores: dict[str, float]
Adds agent_outputs: dict[str, dict]
Adds escalation_required: bool

Every node reads and extends it. The supervisor reads two fields (intent and confidence_scores) and returns a string (next-node name).
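The field list above can be pictured as a small class hierarchy. The sketch below swaps the pydantic BaseModel for standard-library dataclasses so that it runs without extra dependencies. The base-class field message is a placeholder, since the real WorkflowState fields are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Stand-in for the real base class, whose fields are not shown in this page."""
    message: str = ""  # hypothetical field


@dataclass
class MessageWorkflowState(WorkflowState):
    intent: str = ""
    confidence_scores: dict[str, float] = field(default_factory=dict)
    agent_outputs: dict[str, dict] = field(default_factory=dict)
    escalation_required: bool = False


def classify_node(state: MessageWorkflowState) -> dict:
    """A node returns only the fields it changes; the graph merges them into the state."""
    # Hard-coded result for illustration; the real node would call the classifier.
    return {
        "intent": "account_balance",
        "confidence_scores": {"account_balance": 0.91},
    }
```

Mutable defaults use default_factory so that two state instances never share the same dictionary.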

Coordinator wrapper (`app/coordinator/agent_coordinator.py`)

A method-level convenience: coordinator.process_message(...) builds the state, calls ainvoke(state) on the compiled graph, and returns state.final_response. The graph’s internal complexity stays hidden from the route layer.
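The shape of such a wrapper can be sketched as follows. This is not the project's coordinator: the process_message parameters, the state keys, and the StubGraph class are assumptions, and the stub exists only so the example runs without LangGraph. The sketch reads final_response from a dict returned by ainvoke.

```python
import asyncio


class StubGraph:
    """Stands in for the compiled graph so the sketch runs on its own."""

    async def ainvoke(self, state: dict) -> dict:
        # Pretend the graph classified the message and produced a reply.
        return {
            **state,
            "intent": "general_faq",
            "final_response": f"Echo: {state['message']}",
        }


class AgentCoordinator:
    def __init__(self, compiled_graph) -> None:
        self._graph = compiled_graph

    async def process_message(self, message: str, customer_id: str) -> str:
        """Build the initial state, run the graph once, return only the reply text."""
        initial_state = {"message": message, "customer_id": customer_id}
        result = await self._graph.ainvoke(initial_state)
        return result["final_response"]


reply = asyncio.run(AgentCoordinator(StubGraph()).process_message("hello", "c-1"))
```

Injecting the compiled graph through the constructor keeps the coordinator testable with a stub like the one above.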

Common Pitfalls

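Returning a string from route_intent that is not a key in the path map: the mismatch typically surfaces only at run time, as a KeyError when that branch is first taken. A path-map target that is not a registered node is rejected earlier, when the graph is compiled.

Leaving confidence_scores unpopulated for the chosen intent: state.confidence_scores.get(intent, 0.0) then returns 0.0, which silently routes the message to the fallback. Make the classifier always write a score for the intent it returns, and log the missing-score case.

Omitting compliance_checker between a responder and END: the agent’s text then reaches the customer unchecked. Route every responder through the compliance node.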
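The first pitfall can be caught in a unit test before the graph ever runs. The sketch below is a hypothetical check_routing helper, not a LangGraph API. It compares the strings a router can return against the path map and the registered node names.

```python
def check_routing(router_outputs: set[str], path_map: dict[str, str], nodes: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the wiring is consistent."""
    problems = []
    # Every string the router can return must be a key in the path map.
    for label in sorted(router_outputs - set(path_map)):
        problems.append(f"router can return {label!r}, which is not a path-map key")
    # Every path-map target must be a registered node.
    for key, target in sorted(path_map.items()):
        if target not in nodes:
            problems.append(f"path-map key {key!r} points at unknown node {target!r}")
    return problems


# A path map keyed by intent labels while the router returns node names:
mismatched = check_routing(
    router_outputs={"account_agent", "general_agent"},
    path_map={"account_balance": "account_agent"},
    nodes={"account_agent", "general_agent"},
)
```

The set of router outputs has to be maintained by hand in this sketch, so the check is only as good as that list.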

Real-World Interview Prep

Q1: Supervisor vs hand-coded `if/else` — what’s the difference?

A: Hand-coded logic is procedural: the state machine lives in your head, and it is easy to mix in business logic. The supervisor pattern is declarative: each branch becomes a node with its own observability, retry, and checkpoint. The graph can also be rendered as a visual diagram.

Q2: Why does the supervisor have a confidence threshold (0.6)?

A: The classifier can hallucinate. When confidence is low, the most general agent is the safest choice — it can recognise “I don’t know” and route back to a human. Sending low-confidence input to a specialist amplifies the mistake.

Q3: How many agents can a supervisor route to before it becomes too complex?

A: As a rule of thumb, five to seven. Beyond that, route composition (e.g. parallel chains merging into a final node) works better than a single supervisor. Use subgraphs: define a small supervisor for related agents, then have the top-level supervisor route to the subgraph.
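The two-level idea can be sketched with plain routing functions: the top-level supervisor picks a group, and a group supervisor picks the agent inside it. The intent labels come from route_intent above. The group names, the agent names inside the account group, and the resolve helper are hypothetical.

```python
ACCOUNT_INTENTS = {"account_balance", "transaction_history"}
PRODUCT_INTENTS = {"loan_inquiry", "credit_card"}


def route_top_level(intent: str) -> str:
    """Top-level supervisor: pick a group, not an individual agent."""
    if intent in ACCOUNT_INTENTS:
        return "account_group"
    if intent in PRODUCT_INTENTS:
        return "product_group"
    if intent == "escalation":
        return "human_agent"
    return "general_agent"


def route_account_group(intent: str) -> str:
    """Group supervisor: knows only the agents inside its own subgraph."""
    agents = {
        "account_balance": "balance_agent",
        "transaction_history": "transactions_agent",
    }
    return agents.get(intent, "balance_agent")


def resolve(intent: str) -> str:
    """Follow both routing levels, for testing the combined decision."""
    target = route_top_level(intent)
    if target == "account_group":
        return route_account_group(intent)
    return target
```

Each supervisor stays small enough to read at a glance, and adding an agent to one group does not change the top-level routing table.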

See also: Specialist Agent Deep Dives
