Skip to Content
SecurityPresidio PII Redaction with Lakera Prompt-Injection Defense

Presidio PII Redaction with Lakera Prompt-Injection Defense

What

A two-layer outbound safety net for any agent that ingests user text: Microsoft Presidio labels and redacts PII before the text reaches the LLM, and Lakera Guard screens prompts for jailbreak patterns with a configurable confidence threshold.

Project Context

In full_project_context_updated.txt -> app/config.py, the security settings block exposes:

  • pii_redaction_enabled: bool = True
  • lakera_guard_api_key: Optional[str] = None
  • security_jailbreak_threshold: float = 0.8
  • security_enabled: bool = True (master kill-switch)

In Dockerfile, the third build stage downloads the Spacy en_core_web_lg wheel from the explosion-models GitHub release. That Spacy model is required by presidio-analyzer on first invocation, so it must be baked into the image rather than fetched at runtime.

How

Settings block with master kill-switch and bounds-checked threshold

security_enabled: bool = Field(default=True, description="Enable security guardrails") pii_redaction_enabled: bool = Field(default=True, description="Redact PII from logs and DB") security_jailbreak_threshold: float = Field(default=0.8, ge=0.0, le=1.0) lakera_guard_api_key: Optional[str] = Field( default=None, description="API Key for Lakera Guard (Advanced Prompt Injection Defense)", )
  • The master security_enabled switch short-circuits BOTH checks in one place when devs want raw prompts.
  • ge=0.0, le=1.0 enforces threshold bounds — Lakera returns confidence scores in the [0.0, 1.0] range, so any value outside is a misconfiguration.

Presidio analyzer with the Spacy en_core_web_lg engine

from presidio_analyzer import AnalyzerEngine from presidio_anonymizer import AnonymizerEngine analyzer = AnalyzerEngine() anonymizer = AnonymizerEngine() def redact_pii(prompt: str) -> str: if not settings.pii_redaction_enabled: return prompt results = analyzer.analyze(text=prompt, language="en") anonymized = anonymizer.anonymize(text=prompt, analyzer_results=results) return anonymized.text
  • The AnalyzerEngine() constructor reads the Spacy model pipeline; the wheel installed in the Dockerfile ensures first-call latency stays low.
  • anonymizer.anonymize(...) replaces entities with canonical placeholders such as EMAIL and PHONE_NUMBER, rendered at runtime as the familiar entity-tag blocks. This preserves enough structure for downstream agents to reason about entity positions without keeping the raw values.
  • Calling analyze(...) per prompt is the simple version; in hot paths cache the AnalyzerEngine once at module load.

Lakera Guard jailbreak check

import httpx async def check_prompt_safety(prompt: str) -> bool: if not settings.lakera_guard_api_key: return True # fail-open when no key configured async with httpx.AsyncClient() as client: r = await client.post( "https://api.lakera.ai/v1/prompt_injection", json={"input": prompt}, headers={"Authorization": f"Bearer {settings.lakera_guard_api_key}"}, timeout=settings.groq_timeout, ) data = r.json() return data.get("score", 0.0) < settings.security_jailbreak_threshold
  • Fail-open when the key is missing is the correct dev default; production deployments must add an explicit alarm when the API key is absent.
  • The threshold is round-tripped from config so A/B-tuning thresholds can be done without a redeploy.

Common Pitfalls

Running Presidio with the wrong Spacy wheel raises OSError: [E050] Can't find model 'en_core_web_lg' on first call. Pin the wheel URL in the Dockerfile so the image is self-sufficient.

Setting the threshold too high (above 0.95) makes Lakera permissive and lets jailbreaks slip through; setting it too low (below 0.5) blocks legitimate queries. Benchmark before deployment.

Calling Presidio per-token inside the agent loop is O(n²) overhead — invoke once on the final user message before the LLM call, not on every chunk of streaming output.

Real-World Interview Prep

Q1: How accurate is Presidio on banking-domain PII?

A: Presidio achieves 90-95% recall on standard entities (PERSON, EMAIL, PHONE, CREDIT_CARD) when the Spacy model is en_core_web_lg (as in this stack). The 5-10% miss rate is concentrated in (a) non-Western names not in the Spacy vocabulary, (b) domain-specific entities not in the default presets (e.g., UK National Insurance, sort codes, IBAN) — the inline PatternRecognizer for UK_NINO adds a regex recogniser to cover this gap, (c) heavily obfuscated formats (john[at]bank[dot]com). For high-stakes compliance, follow Presidio with a custom regex pass for the regex-foolable categories. Precision remains high (very few false positives) because Presidio’s scoring weights Spacy + recogniser agreement.

Q2: How does Lakera Guard decide if a prompt is a jailbreak?

A: Lakera Guard runs a classifier network (a fine-tuned BERT-like model) over the prompt and returns a score in [0.0, 1.0] (or flagged: bool per the v2 API). It’s been trained on a corpus of jailbreak patterns: persona-shift (“you are now X”), prompt-extraction (“repeat your system prompt”), base64-decoded attacks, etc. Comparison vs heuristic: Lakera catches heuristics-blind patterns (multi-language jailbreaks, semantic-only attacks) and is far more robust against obfuscation like leet-speak or unicode tricks. The trade-off is cost — Lakera is a paid API billed per call; for low-latency hot paths pre-screen with a fast regex pass and only call Lakera on the suspicious ~5%.

Q3: When should a security layer “fail open” vs “fail closed”?

A: Fail closed (block on error) when (a) the user-facing impact of a false negative (a leaked PII / a successful jailbreak) is severe, (b) you have regulatory obligation (PCI-DSS, FCA), (c) the user can re-attempt safely. Fail open (allow on error) when (a) the dependency is non-critical (e.g., Lakera is down but Presidio still runs), (b) a false positive (blocked legit user) is worse than a false negative, (c) the user cannot retry (e.g., in-chat escalation). The FCA stack defaults to fail-open with explicit logger.error(...) so outages are visible; in production a stronger default is fail-closed on PII but fail-open on jailbreak (lower user friction, higher compliance fidelity).

Top-to-Bottom Code Walkthrough (app/services/security_service.pysanitize_input, check_prompt_injection, redact_pii)

This is the outbound defense-in-depth for an FCA-grade chatbot. Three layers, each catching what the others miss.

Layer 1 — Presidio (local PII detection)

from presidio_analyzer import AnalyzerEngine from presidio_anonymizer import AnonymizerEngine analyzer = AnalyzerEngine() anonymizer = AnonymizerEngine() entities = analyzer.analyze(text=text, language="en") anonymized = anonymizer.anonymize(text=text, analyzer_results=entities)

Built-in entity types: PERSON, EMAIL, PHONE_NUMBER, IBAN, UK_NINO, CREDIT_CARD, IP_ADDRESS, URL, ORGANIZATION. Custom regex recognisers (for UK-specific):

from presidio_analyzer import Pattern, PatternRecognizer uk_nino_pattern = PatternRecognizer( supported_entity="UK_NINO", patterns=[Pattern(name="nino", regex=r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b", score=0.85)], ) analyzer.registry.add_recognizer(uk_nino_pattern)

Layer 2 — Lakera Guard (cloud prompt-injection detection)

import httpx async with httpx.AsyncClient() as client: r = await client.post( "https://api.lakera.ai/v1/prompt_injection", headers={"Authorization": f"Bearer {settings.lakera_guard_api_key}"}, json={"input": user_text}, timeout=2.0, ) data = r.json() is_injection = data["injection"] > settings.security_jailbreak_threshold # default 0.8

timeout=2.0 — Lakera is called inline; a slow response would block the user message. 2 seconds is the right budget; if Lakera is unhealthy, fall back to regex + heuristic.

Layer 3 — Local regex (always-on fallback)

import re JAILBREAK_REGEXES = [ r"(?i)ignore (?:all|previous|above) instructions", r"(?i)you are now (?:DAN|jailbroken)", r"(?i)system prompt", r"(?i)prompt injection", ] def local_jailbreak_check(text): for pattern in JAILBREAK_REGEXES: if re.search(pattern, text): return True return False

Why three layers

  • Presidio alone: misses novel injection patterns. Strong on PII, weak on jailbreaks.
  • Lakera alone: misses PII. Strong on jailbreaks. Requires API key + connectivity.
  • Local regex alone: limited vocabularly; too many false negatives.

Together: Presidio scrubs PII in 50ms. Lakera flags jailbreaks in 200ms-2s. Regex catches what Lakera misses (e.g., Lakera downtime).

The combined pipeline in sanitize_input

async def sanitize_input(self, text: str) -> dict: # Layer 1: PII entities = self.analyzer.analyze(text=text, language="en") anonymized = self.anonymizer.anonymize(text=text, analyzer_results=entities) # Layer 2 & 3: Jailbreak (parallel) if settings.lakera_guard_api_key: is_inj_lakera = await self._lakera_check(anonymized.text) else: is_inj_lakera = self.local_jailbreak_check(text) return { "safe_text": anonymized.text, "is_injection": is_inj_lakera, "redacted_entities": [e.to_dict() for e in entities], }

Critical sequence: PII redaction runs BEFORE jailbreak check. Why? Because lazera should never see raw PII (data sovereignty).

redact_pii(text) — log-safe variant

def redact_pii(self, text: str) -> str: entities = self.analyzer.analyze(text=text, language="en") return self.anonymizer.anonymize(text=text, analyzer_results=entities).text

Used by logger.info(f"User said: {redact_pii(user_input)}"). Never log the raw prompt.

Kill switches

if not settings.security_enabled: return {"safe_text": text, "is_injection": False}

Two booleans:

  • security_enabled — master switch for ALL sanitisation (set to false ONLY in dev).
  • pii_redaction_enabled — Presidio only, Lakera still runs.

Performance characteristics

  • Presidio NLP: 50-300ms (spaCy model overhead). Cached instance at startup.
  • Lakera: 100-500ms. Async httpx.
  • Regex: <1ms.

Throughput: ~3-5 user messages/sec per worker. Sufficient for hundreds of concurrent users with a few workers.

Common Pitfalls

Sending raw PII to Lakera — Presidio must redact first. Lakera has its own SOC2; data still leaves your VPC.

Not running Presidio on the LLM responses — agent output might echo back entered PII. Anonymiser runs BOTH on input and output.

Caching Lakera results without TTL — attack patterns evolve; a 30-day cache could re-allow a now-known-bad prompt. TTL 5 min.

Real-World Interview Prep

Q1: Why is UK_NINO not a built-in Presidio entity?

A: Presidio ships with US-centric recognisers (US_SSN, US_PASSPORT). UK-specific identifiers need custom regex recognisers, registered via PatternRecognizer and added to the analyzer’s registry.

Q2: When would you fail-open vs fail-closed on Lakera downtime?

A: Fail-open (let request through with a warning) for inbound prompt-injection — blocking every request because Lakera is down is worse than running with degraded checks. Fail-closed for outbound PII redaction if Presidio is down — sending PII to LLM models is a regulatory violation.

Q3: How do you measure the false-positive rate of the jailbreak detector?

A: Build a labelled test corpus of 1000 legitimate + 500 attack prompts. Run each through the detector. Compute F1. Production tuning targets F1 ≈ 0.95. False positives are worse than misses — they block legitimate users.

Last updated on