Presidio PII Redaction with Lakera Prompt-Injection Defense
What
A two-layer outbound safety net for any agent that ingests user text: Microsoft Presidio labels and redacts PII before the text reaches the LLM, and Lakera Guard screens prompts for jailbreak patterns with a configurable confidence threshold.
Project Context
In full_project_context_updated.txt -> app/config.py, the security settings block exposes:
- pii_redaction_enabled: bool = True
- lakera_guard_api_key: Optional[str] = None
- security_jailbreak_threshold: float = 0.8
- security_enabled: bool = True (master kill-switch)
In the Dockerfile, the third build stage downloads the spaCy en_core_web_lg wheel from the explosion/spacy-models GitHub release. presidio-analyzer's default NLP engine loads that model when the AnalyzerEngine is first constructed, so it must be baked into the image rather than fetched at runtime.
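A cheap way to catch a missing model before the first user request is a startup guard. This is a sketch assuming the service may refuse to boot without the model: spaCy model wheels install a top-level Python package named after the model, so importlib.util.find_spec can detect it without importing spaCy. Both function names are hypothetical.

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_lg") -> bool:
    """Return True if the model wheel is importable as a top-level package."""
    return importlib.util.find_spec(model_name) is not None


def assert_spacy_model_baked_in(model_name: str = "en_core_web_lg") -> None:
    """Fail fast at container start instead of on the first request."""
    if not spacy_model_installed(model_name):
        raise RuntimeError(
            f"spaCy model {model_name!r} is not installed; "
            "bake the wheel into the image instead of downloading it at runtime"
        )
```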
How
Settings block with master kill-switch and bounds-checked threshold
```python
from typing import Optional

from pydantic import Field
from pydantic_settings import BaseSettings  # assumed base class; the original shows only the fields


class Settings(BaseSettings):
    security_enabled: bool = Field(default=True, description="Enable security guardrails")
    pii_redaction_enabled: bool = Field(default=True, description="Redact PII from logs and DB")
    security_jailbreak_threshold: float = Field(default=0.8, ge=0.0, le=1.0)
    lakera_guard_api_key: Optional[str] = Field(
        default=None,
        description="API Key for Lakera Guard (Advanced Prompt Injection Defense)",
    )
```
- The master security_enabled switch short-circuits BOTH checks in one place when devs want raw prompts.
- ge=0.0, le=1.0 enforces threshold bounds: Lakera returns confidence scores in the [0.0, 1.0] range, so any value outside it is a misconfiguration.
Presidio analyzer with the spaCy en_core_web_lg engine
```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

from app.config import settings  # assumed import path for the settings instance

# Built once at import time: constructing AnalyzerEngine loads the spaCy pipeline.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def redact_pii(prompt: str) -> str:
    if not settings.pii_redaction_enabled:
        return prompt
    results = analyzer.analyze(text=prompt, language="en")
    anonymized = anonymizer.anonymize(text=prompt, analyzer_results=results)
    return anonymized.text
```
- The AnalyzerEngine() constructor loads the spaCy model pipeline; the wheel installed in the Dockerfile makes that load local, so first-call latency stays low.
- anonymizer.anonymize(...) with the default "replace" operator swaps each entity for a placeholder such as <EMAIL_ADDRESS> or <PHONE_NUMBER>. This preserves enough structure for downstream agents to reason about entity positions without keeping the raw values.
- Calling analyze(...) per prompt is fine; the expensive step is constructing AnalyzerEngine, which is why it is created once at module load rather than inside redact_pii.
Lakera Guard jailbreak check
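```python
import httpx

from app.config import settings  # assumed import path, as above


async def check_prompt_safety(prompt: str) -> bool:
    """Return True if the prompt is safe to forward to the LLM."""
    if not settings.lakera_guard_api_key:
        return True  # fail-open when no key configured
    async with httpx.AsyncClient() as client:
        r = await client.post(
            "https://api.lakera.ai/v1/prompt_injection",
            json={"input": prompt},
            headers={"Authorization": f"Bearer {settings.lakera_guard_api_key}"},
            timeout=settings.groq_timeout,
        )
    r.raise_for_status()
    data = r.json()
    # v1 response shape (verify against current Lakera docs):
    # {"results": [{"category_scores": {"prompt_injection": <float 0-1>, ...}, "flagged": <bool>, ...}]}
    score = data["results"][0]["category_scores"]["prompt_injection"]
    return score < settings.security_jailbreak_threshold
```
- Fail-open when the key is missing is the correct dev default; production deployments must add an explicit alarm when the API key is absent.
- The threshold is read from config rather than hard-coded, so it can be tuned or A/B-tested without a code change.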
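The alarm the first bullet calls for could be a startup check along these lines. It is a sketch: the environment argument is hypothetical (the config block defines no such setting), and the "alarm" is modelled as an ERROR-level log line that monitoring can alert on.

```python
import logging
from typing import Optional

logger = logging.getLogger("security.startup")


def check_lakera_key_configured(api_key: Optional[str], environment: str) -> bool:
    """Return True if Lakera screening is active; otherwise emit an alarm and return False."""
    if api_key:
        return True
    if environment == "production":
        # ERROR level so log-based alerting can page someone.
        logger.error("Lakera Guard API key missing in production: jailbreak screening is fail-open")
    else:
        logger.warning("Lakera Guard disabled (no API key); acceptable outside production only")
    return False
```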
Common Pitfalls
Running Presidio without the spaCy model it expects (missing or wrong wheel) raises OSError: [E050] Can't find model 'en_core_web_lg' when the AnalyzerEngine loads it. Pin the wheel URL in the Dockerfile so the image is self-sufficient.
Setting the threshold too high (above 0.95) makes Lakera permissive and lets jailbreaks slip through; setting it too low (below 0.5) blocks legitimate queries. Benchmark before deployment.
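Benchmarking can be as simple as sweeping candidate thresholds over detector scores for a labelled set of prompts and reading off the trade-off. A sketch assuming you already have one score per prompt; extracting that score from the Lakera response is outside this snippet.

```python
from typing import Dict, Sequence, Tuple


def sweep_thresholds(
    scores: Sequence[float],
    labels: Sequence[bool],
    thresholds: Sequence[float],
) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (attack recall, false-positive rate on legitimate prompts).

    scores are detector confidences in [0.0, 1.0]; labels are True for attacks.
    A prompt counts as blocked when score >= threshold, the complement of the
    `score < threshold` pass condition in check_prompt_safety.
    """
    attacks = [s for s, is_attack in zip(scores, labels) if is_attack]
    legit = [s for s, is_attack in zip(scores, labels) if not is_attack]
    if not attacks or not legit:
        raise ValueError("need at least one attack and one legitimate prompt")
    report = {}
    for t in thresholds:
        recall = sum(s >= t for s in attacks) / len(attacks)
        fpr = sum(s >= t for s in legit) / len(legit)
        report[t] = (recall, fpr)
    return report
```

Pick the highest threshold whose recall still meets your target; the false-positive column shows how many legitimate queries that choice would block.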
Calling Presidio per token inside the agent loop adds O(n²) total work, because each call re-scans the growing buffer. Invoke it once on the final user message before the LLM call, not on every chunk of streaming output.
Real-World Interview Prep
Q1: How accurate is Presidio on banking-domain PII?
A: Presidio achieves 90-95% recall on standard entities (PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD) when the spaCy model is en_core_web_lg (as in this stack). The 5-10% miss rate is concentrated in (a) non-Western names not in the spaCy vocabulary, (b) domain-specific entities without a default recogniser (e.g., UK National Insurance numbers, sort codes), a gap the inline PatternRecognizer for UK_NINO closes with a regex recogniser, and (c) heavily obfuscated formats (john[at]bank[dot]com). For high-stakes compliance, follow Presidio with a custom regex pass for the categories that slip past the NER model but are easy to match with a pattern. Precision remains high (very few false positives) because Presidio's scoring weights spaCy + recogniser agreement.
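A minimal sketch of the custom regex pass suggested above, targeting (c) obfuscated emails and the sort-code part of (b), assuming sort codes are written as 12-34-56. The patterns and the <UK_SORT_CODE> label are hypothetical, not Presidio built-ins; the sort-code pattern also matches DD-MM-YY dates, so in practice it needs context words or a score threshold.

```python
import re

# Emails where "@" / "." may be written as [at]/(at) and [dot]/(dot), e.g. john[at]bank[dot]com.
# Whitespace is only tolerated around the bracketed forms, so "bank. Then" is not swallowed.
_EMAIL_LIKE = re.compile(
    r"\b[\w.+-]+(?:@|\s*[\[(]\s*at\s*[\])]\s*)[\w-]+"
    r"(?:(?:\.|\s*[\[(]\s*dot\s*[\])]\s*)[\w-]+)+\b",
    re.IGNORECASE,
)
# UK sort code written as 12-34-56 (also matches DD-MM-YY dates).
_SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")


def regex_pii_pass(text: str) -> str:
    """Second-pass redaction for formats an NER-based pass tends to miss."""
    text = _EMAIL_LIKE.sub("<EMAIL_ADDRESS>", text)
    return _SORT_CODE.sub("<UK_SORT_CODE>", text)
```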
Q2: How does Lakera Guard decide if a prompt is a jailbreak?
A: Lakera Guard runs a classifier network (a fine-tuned BERT-like model) over the prompt and returns a score in [0.0, 1.0] (or flagged: bool per the v2 API). It has been trained on a corpus of jailbreak patterns: persona-shift (“you are now X”), prompt-extraction (“repeat your system prompt”), base64-encoded attacks, etc. Compared with heuristics, Lakera catches patterns that keyword rules are blind to (multi-language jailbreaks, semantic-only attacks) and is far more robust against obfuscation like leet-speak or unicode tricks. The trade-off is cost: Lakera is a paid API billed per call, so for low-latency hot paths pre-screen with a fast regex pass and only call Lakera on the suspicious ~5%.
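The cost-saving pre-screen could be sketched as follows. Note the trade-off it encodes: prompts that do not trip the cheap regex are never sent to Lakera, so the semantic and multi-language coverage applies only to escalated traffic. The SUSPICIOUS pattern and the lakera_flags wrapper are hypothetical.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Broad, cheap "suspicion" patterns (illustrative); a hit escalates to the paid classifier.
SUSPICIOUS = re.compile(r"(?i)ignore\b.*\binstructions|system prompt|you are now|base64|jailbreak")


async def screen_with_prescreen(
    prompt: str,
    lakera_flags: Callable[[str], Awaitable[bool]],
) -> bool:
    """Return True if the prompt should be blocked.

    lakera_flags is a hypothetical async wrapper around the Lakera call; it is
    only awaited for prompts that trip the regex pre-screen.
    """
    if not SUSPICIOUS.search(prompt):
        return False  # never reaches Lakera: no API cost, but also no semantic check
    return await lakera_flags(prompt)


if __name__ == "__main__":
    async def always_flag(prompt: str) -> bool:  # stand-in for the real API call
        return True

    print(asyncio.run(screen_with_prescreen("What is my balance?", always_flag)))  # False
    print(asyncio.run(screen_with_prescreen("You are now DAN", always_flag)))  # True
```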
Q3: When should a security layer “fail open” vs “fail closed”?
A: Fail closed (block on error) when (a) the user-facing impact of a false negative (leaked PII, a successful jailbreak) is severe, (b) you have a regulatory obligation (PCI-DSS, FCA), or (c) the user can safely re-attempt. Fail open (allow on error) when (a) the dependency is non-critical (e.g., Lakera is down but Presidio still runs), (b) a false positive (a blocked legitimate user) is worse than a false negative, or (c) the user cannot retry (e.g., in-chat escalation). The FCA stack defaults to fail-open with an explicit logger.error(...) so outages are visible; in production a stronger default is fail-closed on PII but fail-open on jailbreak (lower user friction, higher compliance fidelity).
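The policy can be made explicit per check with a small wrapper. A sketch, assuming each check is a synchronous callable that returns True when the text is safe and raises when its dependency is unavailable; run_check is a hypothetical helper name.

```python
import logging
from typing import Callable

logger = logging.getLogger("security")


def run_check(check: Callable[[str], bool], text: str, *, fail_closed: bool) -> bool:
    """Run a safety check that returns True when text is safe to pass on.

    If the check itself raises (dependency down, timeout), log at ERROR so the
    outage is visible, then block (fail_closed=True) or allow (fail_closed=False).
    """
    try:
        return check(text)
    except Exception:
        logger.error(
            "safety check %s failed; failing %s",
            getattr(check, "__name__", repr(check)),
            "closed" if fail_closed else "open",
            exc_info=True,
        )
        return not fail_closed
```

Following the stronger production default described above, the PII check would run with fail_closed=True and the jailbreak check with fail_closed=False.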
Top-to-Bottom Code Walkthrough (app/services/security_service.py — sanitize_input, check_prompt_injection, redact_pii)
This is the outbound defense-in-depth for an FCA-grade chatbot. Three layers, each catching what the others miss.
Layer 1 — Presidio (local PII detection)
```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact Jane Doe at jane@bank.co.uk"  # example input
entities = analyzer.analyze(text=text, language="en")
anonymized = anonymizer.anonymize(text=text, analyzer_results=entities)
```
Built-in entity types include PERSON, EMAIL_ADDRESS, PHONE_NUMBER, IBAN_CODE, CREDIT_CARD, IP_ADDRESS and URL. UK-specific identifiers such as UK_NINO need custom regex recognisers:
```python
from presidio_analyzer import Pattern, PatternRecognizer

# NINO: two prefix letters, six digits, suffix A-D (unspaced form only).
uk_nino_pattern = PatternRecognizer(
    supported_entity="UK_NINO",
    patterns=[Pattern(name="nino", regex=r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b", score=0.85)],
)
analyzer.registry.add_recognizer(uk_nino_pattern)
```
Layer 2 — Lakera Guard (cloud prompt-injection detection)
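```python
import httpx

from app.config import settings  # assumed import path


async def lakera_check(user_text: str) -> bool:
    """Return True if Lakera flags the text as a prompt injection."""
    async with httpx.AsyncClient() as client:
        r = await client.post(
            "https://api.lakera.ai/v1/prompt_injection",
            headers={"Authorization": f"Bearer {settings.lakera_guard_api_key}"},
            json={"input": user_text},
            timeout=2.0,
        )
    r.raise_for_status()
    data = r.json()
    # v1 response shape (verify against current Lakera docs): results[0].category_scores.prompt_injection
    score = data["results"][0]["category_scores"]["prompt_injection"]
    return score >= settings.security_jailbreak_threshold  # default 0.8
```
timeout=2.0: Lakera is called inline, so a slow response would block the user message. A 2-second budget bounds that delay; if Lakera is unhealthy, fall back to the regex + heuristic layer.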
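The fallback can be wired with asyncio.wait_for so the 2-second budget holds even if the HTTP client's own timeout is misconfigured. A sketch: remote_check stands in for the Lakera call and local_check for the regex layer (e.g. local_jailbreak_check); catching Exception broadly is a simplification, and real code would narrow it to timeout and httpx errors.

```python
import asyncio
from typing import Awaitable, Callable


async def jailbreak_verdict(
    text: str,
    remote_check: Callable[[str], Awaitable[bool]],
    local_check: Callable[[str], bool],
    budget_s: float = 2.0,
) -> bool:
    """Return True if the text looks like a prompt injection.

    If the remote classifier exceeds the time budget or raises, the local
    regex verdict is used instead of blocking the user.
    """
    try:
        return await asyncio.wait_for(remote_check(text), timeout=budget_s)
    except Exception:  # timeout, connection or HTTP error; narrow this in real code
        return local_check(text)
```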
Layer 3 — Local regex (always-on fallback)
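```python
import re

JAILBREAK_REGEXES = [
    # Optional words so "ignore all previous instructions" matches too.
    r"(?i)ignore\s+(?:all\s+)?(?:(?:previous|above)\s+)?instructions",
    r"(?i)you are now (?:DAN|jailbroken)",
    r"(?i)system prompt",
    r"(?i)prompt injection",
]


def local_jailbreak_check(text: str) -> bool:
    """Return True if any known jailbreak phrase appears in the text."""
    return any(re.search(pattern, text) for pattern in JAILBREAK_REGEXES)
```
Why three layers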
- Presidio alone: misses novel injection patterns. Strong on PII, weak on jailbreaks.
- Lakera alone: misses PII. Strong on jailbreaks. Requires API key + connectivity.
- Local regex alone: limited vocabulary; too many false negatives.
Together: Presidio scrubs PII in roughly 50-300ms, Lakera flags jailbreaks in roughly 100-500ms (capped by the 2s timeout), and the regex layer catches what Lakera cannot cover (e.g., during Lakera downtime).
The combined pipeline in sanitize_input
```python
async def sanitize_input(self, text: str) -> dict:
    # Layer 1: PII
    entities = self.analyzer.analyze(text=text, language="en")
    anonymized = self.anonymizer.anonymize(text=text, analyzer_results=entities)
    # Layer 2 or 3: jailbreak check (Lakera when a key is configured, else local regex)
    if settings.lakera_guard_api_key:
        is_inj_lakera = await self._lakera_check(anonymized.text)
    else:
        is_inj_lakera = self.local_jailbreak_check(text)
    return {
        "safe_text": anonymized.text,
        "is_injection": is_inj_lakera,
        "redacted_entities": [e.to_dict() for e in entities],
    }
```
Critical sequence: PII redaction runs BEFORE the jailbreak check, because Lakera should never see raw PII (data sovereignty).
redact_pii(text) — log-safe variant
```python
def redact_pii(self, text: str) -> str:
    entities = self.analyzer.analyze(text=text, language="en")
    return self.anonymizer.anonymize(text=text, analyzer_results=entities).text
```
Used by logger.info(f"User said: {redact_pii(user_input)}"). Never log the raw prompt.
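Rather than relying on every call site to remember redact_pii, one alternative (not what the original code does) is a logging.Filter attached to the handler, so every record is redacted before it is written. A handler-level filter sees records from every logger that reaches the handler, whereas a filter on one logger only sees records created by that logger. The redact callable stands in for the redact_pii method above.

```python
import logging
from typing import Callable


class RedactingFilter(logging.Filter):
    """Logging filter that redacts each record's message before it is emitted."""

    def __init__(self, redact: Callable[[str], str]) -> None:
        super().__init__()
        self._redact = redact

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so PII passed as a %s argument is redacted too.
        record.msg = self._redact(record.getMessage())
        record.args = ()
        return True
```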
Kill switches
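```python
# At the top of sanitize_input: bypass every layer but keep the same return shape.
if not settings.security_enabled:
    return {"safe_text": text, "is_injection": False, "redacted_entities": []}
```
Two booleans:
- security_enabled: master switch for ALL sanitisation (set to false ONLY in dev).
- pii_redaction_enabled: Presidio only; Lakera still runs.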
Performance characteristics
- Presidio NLP: 50-300ms (spaCy model overhead). Cached instance at startup.
- Lakera: 100-500ms. Async httpx.
- Regex: <1ms.
Throughput: ~3-5 user messages/sec per worker. Sufficient for hundreds of concurrent users with a few workers.
Common Pitfalls
Sending raw PII to Lakera: Presidio must redact first. Lakera has its own SOC 2 compliance, but the data still leaves your VPC.
Not running Presidio on the LLM responses: agent output might echo back PII the user entered. Run the anonymiser on BOTH input and output.
Caching Lakera results without a TTL: attack patterns evolve, so a 30-day cache could re-allow a prompt that is now known to be bad. Use a TTL of around 5 minutes.
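A 5-minute TTL cache for Lakera verdicts might look like this. It is a sketch under assumptions: an in-process dict (no size cap, not shared across workers), keys are SHA-256 digests so the cache never holds raw prompt text, and the injectable clock exists only to make expiry testable. VerdictCache is a hypothetical name.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class VerdictCache:
    """Cache injection verdicts per prompt hash, expiring after ttl_s seconds."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Optional[bool]:
        """Return the cached verdict, or None if absent or expired."""
        key = self._key(prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, verdict = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: force a fresh Lakera call
            return None
        return verdict

    def put(self, prompt: str, is_injection: bool) -> None:
        self._entries[self._key(prompt)] = (self._clock() + self._ttl, is_injection)
```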
Real-World Interview Prep
Q1: Why is UK_NINO not a built-in Presidio entity?
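A: Presidio's predefined recognisers skew US-centric (US_SSN, US_PASSPORT), with only a few UK ones such as UK_NHS. Identifiers without a predefined recogniser, like the NINO in the Presidio version this stack uses, need custom regex recognisers, registered via PatternRecognizer and added to the analyzer's registry.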
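The UK_NINO regex registered in Layer 1 can be sanity-checked with plain re before it goes into a PatternRecognizer. The sketch below (test strings are made up) shows that it only matches the unspaced form; NINO_SPACED is a hypothetical variant, not part of the original, that also accepts the common spaced layout.

```python
import re
from typing import List

# Pattern from the UK_NINO PatternRecognizer in Layer 1.
NINO = re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b")
# Hypothetical tolerant variant: also accepts "AB 12 34 56 C".
NINO_SPACED = re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")


def find_ninos(text: str, pattern: re.Pattern = NINO) -> List[str]:
    """Return every substring the given NINO pattern matches."""
    return pattern.findall(text)
```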
Q2: When would you fail-open vs fail-closed on Lakera downtime?
A: Fail-open (let request through with a warning) for inbound prompt-injection — blocking every request because Lakera is down is worse than running with degraded checks. Fail-closed for outbound PII redaction if Presidio is down — sending PII to LLM models is a regulatory violation.
Q3: How do you measure the false-positive rate of the jailbreak detector?
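A: Build a labelled test corpus of 1000 legitimate + 500 attack prompts and run each through the detector. The false-positive rate is the share of legitimate prompts that get flagged, FP / (FP + TN); report precision, recall and F1 alongside it. Production tuning targets F1 ≈ 0.95. False positives are worse than misses here: they block legitimate users.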
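These metrics follow directly from the confusion matrix. A minimal sketch, with True meaning "attack" in both the ground-truth labels and the detector's predictions; detector_metrics is a hypothetical helper name.

```python
from typing import Dict, Sequence


def detector_metrics(labels: Sequence[bool], predictions: Sequence[bool]) -> Dict[str, float]:
    """Precision, recall, F1 and false-positive rate for a jailbreak detector."""
    pairs = list(zip(labels, predictions))
    tp = sum(truth and flagged for truth, flagged in pairs)
    fp = sum((not truth) and flagged for truth, flagged in pairs)
    fn = sum(truth and (not flagged) for truth, flagged in pairs)
    tn = sum((not truth) and (not flagged) for truth, flagged in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "false_positive_rate": fpr}
```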