LangGraph Multi-Agent Supervisor Routing
What? (Concept Overview)
A supervisor-style LangGraph topology uses one classifier node plus a set of specialised agent nodes with conditional routing edges that forward control to the agent whose domain matches the customer’s intent, gated by a confidence threshold. The pattern keeps each agent narrowly scoped (one responsibility, one prompt, one tool surface) while the graph orchestrates the dispatch.
Project Context
The FCA Support Agent uses an IntentClassifierAgent (Groq + Langfuse) at the classify node of MessageWorkflow, then routes the request to one of account, product, general, or human based on the returned intent string and confidence. Product recommendations take a mandatory second stop at compliance. This staircase of conditional edges is a common topology for production LLM systems in which every node must be auditable and replaceable.
How? (Quick Reference Blocks)
3.1 The Classifier Node Source
# app/agents/intent_classifier.py: IntentClassifierAgent
import json

from groq import Groq
from langfuse import observe

from app.agents.base import BaseAgent
from app.config import settings

INTENT_LABELS = [
    "product_acquisition",
    "account_data",
    "knowledge_inquiry",
    "complaint",
    "general_inquiry",
]


class IntentClassifierAgent(BaseAgent):
    name = "intent_classifier"
    role = "intent_routing"

    def __init__(self) -> None:
        super().__init__()
        self.client = Groq(api_key=settings.groq_api_key)
        self.model = settings.groq_model

    @observe(name="intent_classifier.classify")
    async def classify(self, message: str, history: list[dict] | None = None) -> dict:
        prompt = (
            "You are an FCA-compliant intent classifier.\n"
            f"Choose exactly one label from: {INTENT_LABELS}.\n"
            "Respond with strict JSON: {\"intent\": ..., \"confidence\": 0.0-1.0}.\n"
            f"Customer message: {message!r}\n"
        )
        # Groq() is the synchronous client, so this call blocks the event loop
        # while the request is in flight; groq.AsyncGroq is the awaitable variant.
        chat = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # keep routing as repeatable as possible
            response_format={"type": "json_object"},
        )
        # JSON mode yields parseable JSON; it does not guarantee both keys exist.
        parsed = json.loads(chat.choices[0].message.content)
        return {
            "intent": parsed["intent"],
            "confidence": float(parsed["confidence"]),
        }

3.2 Conditional Routing Function
The routing function reads the full state and returns the name of the next node. Confidence gating lives here, not in the classifier, so the threshold can be tuned without rewriting the classifier prompt.
# app/workflows/message_workflow.py: _route_by_intent
CONF_ROUTING_THRESHOLD = 0.55

async def _route_by_intent(self, state: dict) -> str:
    intent = state.get("intent", "general_inquiry")
    confidence = float(state.get("confidence", 0.0))
    if confidence < CONF_ROUTING_THRESHOLD:
        return "human"  # too uncertain → operator
    return {
        "product_acquisition": "product",
        "account_data": "account",
        "knowledge_inquiry": "general",
        "complaint": "human",
        "general_inquiry": "general",
    }.get(intent, "general")

3.3 Wiring the Conditional Edges
# app/workflows/message_workflow.py
workflow.add_node("classify", self._node_classify)
workflow.add_node("account", self._node_account_agent)
workflow.add_node("product", self._node_product_recommender)
workflow.add_node("general", self._node_general_agent)
workflow.add_node("human", self._node_human_agent)
workflow.add_conditional_edges(
    "classify",
    self._route_by_intent,
    {
        "account": "account",
        "product": "product",
        "general": "general",
        "human": "human",
    },
)

3.4 Compliance Gate After Product Recommender
Product recommendations run an extra compliance check before terminating; flagged ones funnel into HITL (see Human-in-the-Loop page).
# app/workflows/message_workflow.py
workflow.add_edge("product", "compliance")
workflow.add_conditional_edges(
    "compliance",
    self._route_compliance,
    {"approved": "end", "review": "human_approval"},
)

Why? (Parameter Breakdown)
- temperature=0.0 for the classifier: Routing should be as repeatable as possible across pods. With a higher temperature the same message can land on different agents from one request to the next, which shows up as routing drift in production. Temperature 0 minimises that variance, although hosted models do not strictly guarantee identical outputs.
- response_format={"type": "json_object"}: Constrains the LLM to emit syntactically valid JSON, which removes the trailing-token / partial-JSON repair branch. It does not guarantee that the intent and confidence keys are present, so those still need checking.
- @observe(name="intent_classifier.classify"): The Langfuse decorator creates a span for every classification. Without it, a misclassification cannot be attributed to a specific model and prompt in the trace. The span name also drives Langfuse dashboards grouped by agent.
- Confidence below threshold routes to human, not general: Low confidence is qualitatively different from “general inquiry”; funneling it into a generic reply erodes regulator-trust signals. Operators on the human queue can confirm the right agent before responding.
- Compliance as a SEPARATE node, not a flag inside the product node: Compliance predicates can be unit-tested, swapped, and traced independently. Inlining them mixes business logic with regulation and breaks compliance auditability.
- DEFAULT in dict.get(intent, "general"): Prevents a KeyError when the classifier returns a new or unknown intent label, so the graph never crashes on a label the routing table has not seen.
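JSON mode constrains syntax only, so a small validation step between json.loads and the router is a reasonable safeguard. The sketch below is one way to do it, not the project's code: parse_classification and the FALLBACK value are hypothetical names, and only the label list is taken from section 3.1.

```python
import json
import math

# Labels copied from section 3.1; everything else here is illustrative.
INTENT_LABELS = {
    "product_acquisition",
    "account_data",
    "knowledge_inquiry",
    "complaint",
    "general_inquiry",
}

# Hypothetical fallback: zero confidence, so a confidence gate will escalate it.
FALLBACK = {"intent": "general_inquiry", "confidence": 0.0}


def parse_classification(raw: str) -> dict:
    """Validate the classifier's raw JSON reply before it reaches the router."""
    try:
        parsed = json.loads(raw)
        intent = parsed["intent"]
        confidence = float(parsed["confidence"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys, or a non-numeric confidence.
        return dict(FALLBACK)
    if not isinstance(intent, str) or intent not in INTENT_LABELS:
        return dict(FALLBACK)
    if not math.isfinite(confidence):
        return dict(FALLBACK)
    # Clamp to [0, 1] in case the model reports an out-of-range value.
    return {"intent": intent, "confidence": min(max(confidence, 0.0), 1.0)}
```

Returning a zero-confidence fallback instead of raising keeps the graph running and leaves the escalation decision to the routing function.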
Common Pitfalls
- Routing on intent string alone. Without a confidence gate, low-confidence classifications (“I’m not sure which agent”) silently flow downstream to the wrong specialist, producing confident-sounding but unrelated replies. Always gate with confidence >= threshold.
- Putting the threshold in the prompt. “Reply with low confidence if uncertain” sounds clever but is unreliable: the model’s self-reported confidence reflects its training, not your product’s needs. Apply the threshold in code to the number returned in the JSON, not to prose.
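To make the first pitfall concrete, the gate can be written as a pure function and checked with a few cases. This is a simplified, synchronous stand-in for _route_by_intent from section 3.2 (same threshold and routing table, no self), not the project's actual method.

```python
CONF_ROUTING_THRESHOLD = 0.55

ROUTES = {
    "product_acquisition": "product",
    "account_data": "account",
    "knowledge_inquiry": "general",
    "complaint": "human",
    "general_inquiry": "general",
}


def route_by_intent(state: dict) -> str:
    """Return the next node name; the confidence gate runs before the table lookup."""
    confidence = float(state.get("confidence", 0.0))
    if confidence < CONF_ROUTING_THRESHOLD:
        return "human"
    return ROUTES.get(state.get("intent", "general_inquiry"), "general")


print(route_by_intent({"intent": "account_data", "confidence": 0.91}))     # account
print(route_by_intent({"intent": "account_data", "confidence": 0.30}))     # human
print(route_by_intent({"intent": "brand_new_label", "confidence": 0.80}))  # general
```

The same intent string produces two different destinations depending only on the score, which is the behaviour a router keyed on intent alone cannot express.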
Real-World Interview Prep
Q1: How would you evolve this graph to support fallback routing — i.e., “if agent X fails, try agent Y before escalating”?
A: Add an explicit fallback path after each agent node. Two approaches: (a) per-node try/except inside _node_X that, on RetryableError, calls _node_Y directly (simple, but mixes retry and routing); (b) add a route_fallback conditional after each agent that checks state["last_error"] and returns "Y" if recoverable. Approach (b) preserves a clean node graph (agent_X → fallback_router → agent_Y | human), making fallbacks observable and testable in isolation. Combine with tenacity (@retry(stop=stop_after_attempt(3), wait=wait_exponential())) for transient infra failures, but reserve application-level fallbacks (LLM error → rule-based reply) for the conditional router.
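Approach (b) might look like the sketch below. The state keys last_error and attempted, the RetryableError class, the FALLBACK_FOR table, and the node names are all assumptions made for illustration; route_fallback is plain Python and would be registered with add_conditional_edges after each agent node.

```python
class RetryableError(Exception):
    """Hypothetical marker for failures that another agent may recover from."""


# Hypothetical fallback table: which agent to try when a given agent fails.
FALLBACK_FOR = {"product": "general", "account": "general"}


def run_agent_safely(agent_name: str, agent_fn, state: dict) -> dict:
    """Run an agent and record a recoverable failure in state instead of raising."""
    attempted = [*state.get("attempted", []), agent_name]
    try:
        update = agent_fn(state)
        return {**state, **update, "attempted": attempted, "last_error": None}
    except RetryableError as exc:
        # repr() is never empty, so a failure is always truthy in last_error.
        return {**state, "attempted": attempted, "last_error": repr(exc)}


def route_fallback(state: dict) -> str:
    """Conditional router placed after each agent node."""
    if not state.get("last_error"):
        return "done"
    candidate = FALLBACK_FOR.get(state["attempted"][-1])
    # Escalate when no fallback is defined or the fallback was already tried.
    if candidate is None or candidate in state["attempted"]:
        return "human"
    return candidate
```

Because the router only reads state, it can be unit-tested without invoking any agent, and the attempted list prevents a fallback loop between two failing agents.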
Q2: How do you debug a case where the customer complains the agent “answered the wrong question”?
A: Inspect three layers in order. (1) Classifier span: pull intent, confidence, and the raw prompt from Langfuse. Is the LLM misreading the message? (2) Routing decision: from the same snapshot, read state["intent"], state["confidence"], and the _route_by_intent source. Was the threshold logic wrong? Was the new intent label missing from the routing dict? (3) Agent span: trace into the downstream _node_X execution to see which retrieval results and prompt were used. In practice, many “wrong answer” reports turn out to be correct routing into an agent whose prompt or data is weak; the fix is then a prompt/data change in the targeted agent rather than a change to the router.
Q3: Why bother with a separate compliance node instead of running the check inside the product agent?
A: Three reasons. (1) Auditability: regulators and security reviewers can read _route_compliance and the compliance prompt independently, without parsing the entire product agent. (2) Replaceability: swap the LLM, swap the heuristic, or route to a human without touching the product agent’s prompt. (3) Cost visibility: the LLM token cost of a compliance check is small but predictable; bundling it inside the product agent hides the spend in the agent’s larger completion. Isolated nodes also let you budget per-node token spend and detect regressions when compliance costs spike.
Top-to-Bottom Code Walkthrough (app/coordinator/agent_coordinator.py + app/workflows/message_workflow.py)
The “supervisor routing” pattern is a layered graph where one node decides which specialised agent to invoke next. In this project the routing decision lives in route_intent(state).
The graph assembly (app/workflows/message_workflow.py)
from langgraph.graph import StateGraph, END
from app.schemas.common import WorkflowState

graph = StateGraph(MessageWorkflowState)

# Nodes
graph.add_node("intent_classifier", IntentClassifier().run)
graph.add_node("account_agent", AccountAgent().run)
graph.add_node("product_recommender", ProductRecommender().run)
graph.add_node("general_agent", GeneralAgent().run)
graph.add_node("compliance_checker", ComplianceChecker().run)
graph.add_node("human_agent", HumanAgent().run)

# Edges
graph.set_entry_point("intent_classifier")
graph.add_conditional_edges(
    "intent_classifier",
    route_intent,  # this is the supervisor
    # Keys must match the strings route_intent returns
    # (node names, not intent labels).
    {
        "account_agent": "account_agent",
        "product_recommender": "product_recommender",
        "general_agent": "general_agent",
        "human_agent": "human_agent",
    },
)
graph.add_edge("account_agent", "compliance_checker")
graph.add_edge("product_recommender", "compliance_checker")
graph.add_edge("general_agent", "compliance_checker")
graph.add_conditional_edges(
    "compliance_checker",
    route_compliance,  # second supervisor
    {"OK": END, "FAIL": "human_agent"},
)

route_intent(state): the supervisor
def route_intent(state: MessageWorkflowState) -> str:
    intent = state.intent  # filled by intent_classifier
    confidence = state.confidence_scores.get(intent, 0.0)
    if confidence < 0.6:
        return "general_agent"  # route low-confidence to fallback
    if intent in ["account_balance", "transaction_history"]:
        return "account_agent"
    if intent in ["loan_inquiry", "credit_card"]:
        return "product_recommender"
    if intent == "escalation":
        return "human_agent"
    return "general_agent"

Why a low-confidence fallback: the classifier has known accuracy ceilings; below a confidence of 0.6 the right answer is not to pick a specific specialist but to use the most general agent. This is the “drift to generalist” pattern.
The state object
MessageWorkflowState extends WorkflowState(BaseModel) and adds:
- intent: str
- confidence_scores: dict[str, float]
- agent_outputs: dict[str, dict]
- escalation_required: bool
Every node reads and extends it. The supervisor reads intent and confidence_scores and returns a string (the next node's name).
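As a rough picture of that shape, the sketch below uses stdlib dataclasses in place of the BaseModel classes so that it runs without dependencies. The base-class fields (message, final_response), the default values, and the supervisor_view helper are assumptions; only the four added fields come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    # Assumed base fields; the real WorkflowState is a BaseModel subclass.
    message: str = ""
    final_response: str = ""


@dataclass
class MessageWorkflowState(WorkflowState):
    intent: str = ""
    confidence_scores: dict[str, float] = field(default_factory=dict)
    agent_outputs: dict[str, dict] = field(default_factory=dict)
    escalation_required: bool = False


def supervisor_view(state: MessageWorkflowState) -> tuple[str, float]:
    """Hypothetical helper: the intent and score that the router inspects."""
    return state.intent, state.confidence_scores.get(state.intent, 0.0)


state = MessageWorkflowState(message="What is my balance?")
state.intent = "account_balance"                    # written by intent_classifier
state.confidence_scores["account_balance"] = 0.92   # written by intent_classifier
state.agent_outputs["account_agent"] = {"text": "..."}  # written by a specialist
```

Using default_factory for the dict fields gives every state instance its own containers, so one conversation's scores never leak into another's.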
Coordinator wrapper (app/coordinator/agent_coordinator.py)
A method-level convenience: coordinator.process_message(...) builds the state, calls graph.ainvoke(state), and returns state.final_response. The inside-graph complexity is hidden from the route layer.
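A minimal sketch of such a wrapper, assuming the compiled graph exposes an async ainvoke that returns the final state as a mapping. The AgentCoordinator constructor, the customer_id parameter, the state keys, and the FakeGraph stand-in are hypothetical and exist only to make the example runnable without LangGraph.

```python
import asyncio


class FakeGraph:
    """Stand-in for a compiled graph; only ainvoke is modelled."""

    async def ainvoke(self, state: dict) -> dict:
        return {
            **state,
            "intent": "general_faq",
            "final_response": f"Echo: {state['message']}",
        }


class AgentCoordinator:
    def __init__(self, graph) -> None:
        self._graph = graph  # the coordinator itself holds no per-request data

    async def process_message(self, message: str, customer_id: str) -> str:
        # Build the initial state, run the graph, return only the reply text.
        state = {"message": message, "customer_id": customer_id}
        result = await self._graph.ainvoke(state)
        return result["final_response"]


reply = asyncio.run(AgentCoordinator(FakeGraph()).process_message("hello", "c-1"))
print(reply)  # Echo: hello
```

Injecting the graph through the constructor keeps the route layer testable with a fake like this one.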
Common Pitfalls
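- Returning a value from route_intent that is not a key in the path map: LangGraph raises KeyError when the branch runs. A path-map target that is not a registered node is rejected earlier, when the graph is compiled.
- Missing entry in confidence_scores: if the classifier never writes a score for the chosen intent, state.confidence_scores.get(intent, 0.0) returns 0.0 and the message silently routes to the fallback. Make sure the classifier always populates the score, and log when it is absent.
- Forgetting compliance_checker on the edge between a responder and END: agents send unmoderated text. Always pass through.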
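The first pitfall can be caught in a unit test before the graph is built. The helper below is a hypothetical check, not a LangGraph API: it compares the values a router can return (listed by hand) with the keys of the path map, and the map's targets with the registered nodes.

```python
def check_routing(returns: set[str], path_map: dict[str, str], nodes: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the wiring is consistent."""
    problems = []
    for value in sorted(returns - set(path_map)):
        problems.append(f"router returns {value!r}, which is not a key in the path map")
    for target in sorted(set(path_map.values()) - nodes):
        problems.append(f"path map targets {target!r}, which is not a registered node")
    return problems


NODES = {"account_agent", "product_recommender", "general_agent", "human_agent"}
RETURNS = set(NODES)  # every string route_intent can return

# A map keyed by intent labels does not match a router that returns node names.
label_keyed = {"account_balance": "account_agent", "general_faq": "general_agent"}
node_keyed = {name: name for name in NODES}

for line in check_routing(RETURNS, label_keyed, NODES):
    print(line)
```

If the path map also targets the END sentinel, include it in the nodes set passed to the check.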
Real-World Interview Prep
Q1: Supervisor vs hand-coded if/else — what’s the difference?
A: Hand-coded logic is procedural: the state machine lives in your head, and it is easy to mix in business logic. The supervisor pattern is declarative: each branch becomes a node with its own observability, retries, and checkpointing. The compiled graph can also be rendered as a diagram.
Q2: Why does the supervisor have a confidence threshold (0.6)?
A: The classifier can hallucinate. When confidence is low, the most general agent is the safest choice — it can recognise “I don’t know” and route back to a human. Sending low-confidence input to a specialist amplifies the mistake.
Q3: How many agents can a supervisor route to before it becomes too complex?
A: As a rule of thumb, five to seven. Beyond that, composing routes (e.g. parallel chains merging into a final node) works better than a single supervisor. Use subgraphs: define a small supervisor for each group of related agents, then have the top-level supervisor route to the subgraph.
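The two-level idea can be illustrated without LangGraph as nested routing functions. The domain names, agent names, and grouping below are invented for the example; in a real graph each inner router would live in its own subgraph that the top-level supervisor treats as a single node.

```python
# Hypothetical grouping of intents into domains.
DOMAIN_OF = {
    "account_balance": "accounts",
    "transaction_history": "accounts",
    "loan_inquiry": "products",
    "credit_card": "products",
}


def route_accounts(intent: str) -> str:
    return "balance_agent" if intent == "account_balance" else "history_agent"


def route_products(intent: str) -> str:
    return "loan_agent" if intent == "loan_inquiry" else "card_agent"


# One small supervisor per domain.
SUB_ROUTERS = {"accounts": route_accounts, "products": route_products}


def top_level_route(intent: str) -> str:
    """The top-level supervisor only chooses a domain (or the generalist)."""
    return DOMAIN_OF.get(intent, "general_agent")


def resolve(intent: str) -> list[str]:
    """Return the sequence of routing decisions taken for an intent."""
    domain = top_level_route(intent)
    if domain not in SUB_ROUTERS:
        return [domain]
    return [domain, SUB_ROUTERS[domain](intent)]


print(resolve("loan_inquiry"))  # ['products', 'loan_agent']
print(resolve("weather"))       # ['general_agent']
```

Each router here chooses among two or three options, so adding a new agent changes only one small table instead of the top-level supervisor.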
See also: Specialist Agent Deep Dives
The four specialist agents that the supervisor routes between (IntentClassifier, AccountAgent, ProductRecommender, GeneralAgent) are walked through line-by-line in Specialist Agent Deep Dives & LangGraph Flow. That page covers the full nine-node LangGraph flow (guardrail → classify → {account|general|product} → compliance → END), the per-agent process() methods, the tiered cache pattern of GeneralAgent, the FCA-strict prompt of ProductRecommender, and the stateless AgentCoordinator that wraps the whole graph.