feat(guardrails): add ATR (Agent Threat Rules) guardrail integration by eeee2345 · Pull Request #28050 · BerriAI/litellm

eeee2345 · 2026-05-16T09:46:01Z

Adds ATR (Agent Threat Rules) as a guardrail integration for LiteLLM proxy.

ATR is an MIT-licensed open detection rule format for AI agent security threats: prompt injection, tool poisoning, credential exfiltration, context manipulation, and other categories. Same family as Sigma/YARA but targeted at LLM I/O and agent runtime events. Detection runs locally via the pyatr reference engine, so no request data leaves the proxy.

What this adds

litellm/proxy/guardrails/guardrail_hooks/atr/atr.py plus __init__.py registration: ATRGuardrail class with async_pre_call_hook and async_post_call_success_hook, mirroring the per-package layout used by Lasso, Aporia, XecGuard, and the other recent guardrail integrations
litellm/types/proxy/guardrails/guardrail_hooks/atr.py: ATRGuardrailConfigModel for the UI / config surface
litellm/types/guardrails.py: SupportedGuardrailIntegrations.ATR enum value and ATRGuardrailLitellmParams mixin exposing rules_path on the proxy config (severity_threshold already lives on ContentFilterConfigModel and is reused)
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_atr.py: 7 unit tests covering missing-dependency, missing-path, invalid-severity, rule loading, severity filtering, pre-call blocking, and pre-call passing
docs/my-website/docs/proxy/guardrails/atr.md: install + config docs

The package __init__.py exports guardrail_class_registry and guardrail_initializer_registry, so it is auto-discovered by litellm/proxy/guardrails/guardrail_registry.py with no central-registry edits required.

Usage

guardrails:
  - guardrail_name: "atr-pre-call"
    litellm_params:
      guardrail: atr
      mode: "pre_call"
      rules_path: "./rules"
      severity_threshold: "high"

Requires pip install pyatr (optional dependency, not added to LiteLLM's own requirements).

Verification

make lint checks locally: black --check and ruff check . both pass on the new files and across litellm/
python tests/documentation_tests/test_circular_imports.py exits 0 with no new violations
from litellm import * succeeds
All 7 ATR tests pass; existing Lasso tests still pass (sanity regression on the shared LitellmParams mixin)

Production context

The ATR rule set this guardrail consumes is deployed at Microsoft Agent Governance Toolkit (PRs #908 and #1277, merged 2026-04), Cisco AI Defense skill-scanner (PRs #79 and #99, merged 2026-04), MISP / CIRCL via misp-taxonomies #323 and misp-galaxy #1207 (merged 2026-05), Gen Digital Sage (PR #33, merged 2026-05), and OWASP Agent-Security-Regression-Harness (PR #74, merged 2026-05). pyatr v0.2.4 is on PyPI.

Rule format and rules: https://github.com/Agent-Threat-Rule/agent-threat-rules

Happy to adjust the scope, severity mapping, or hook signature if you prefer a different pattern.

greptile-apps · 2026-05-16T09:48:21Z

Greptile Summary

Adds ATRGuardrail, a local-only guardrail that scans LLM input and output against the open-source Agent Threat Rules detection format via the pyatr engine. The implementation covers async_pre_call_hook, async_post_call_success_hook, and async_post_call_streaming_hook, addressing the streaming bypass gap flagged in the initial review.

atr.py implements all three hooks; unknown/None severity is now treated conservatively (rank 0), include_tags filtering is wired end-to-end, and the streaming hook follows the established per-chunk-with-accumulator pattern used by Azure text moderation.
litellm/types/guardrails.py adds the ATR enum value and ATRGuardrailLitellmParams mixin; __init__.py auto-registers the guardrail without any central-registry edits.
tests/test_atr.py adds 14 fully-mocked unit tests covering all hooks and edge cases (None severity, unknown severity, include_tags scoping, streaming block/pass/empty).

Confidence Score: 5/5

Safe to merge; all three guardrail hooks are correctly implemented and previously flagged issues are resolved.

Well-isolated new guardrail package with no risk to existing code paths. Streaming hook follows the same pattern as Azure text moderation, severity and tag-filtering logic is correct and tested, and no changes touch critical proxy infrastructure. The only outstanding item is documentation file placement, which does not affect runtime behavior.

docs/my-website/docs/proxy/guardrails/atr.md should be moved to the litellm-docs repo per repository policy.

Important Files Changed

Filename	Overview
litellm/proxy/guardrails/guardrail_hooks/atr/atr.py	Core guardrail implementation; all three hooks present; severity handling, None-severity conservatism, and include_tags filtering correctly implemented; docstring for streaming hook overstates aggregation guarantee
litellm/proxy/guardrails/guardrail_hooks/atr/init.py	Correctly registers ATRGuardrail in both registries; include_tags forwarded via getattr with safe fallback
litellm/types/guardrails.py	Adds ATR enum value and ATRGuardrailLitellmParams mixin; cleanly extends LitellmParams without touching existing fields
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_atr.py	14 fully-mocked unit tests covering all hooks and edge cases including None severity, unknown severity, include_tags scoping, and streaming
docs/my-website/docs/proxy/guardrails/atr.md	New guardrail documentation; should be placed in the litellm-docs repo per repository policy

_{Reviews (5): Last reviewed commit: "feat(atr-guardrail): add async_post_call..." | Re-trigger Greptile}

codecov · 2026-05-16T09:48:43Z

Codecov Report

❌ Patch coverage is 81.64794% with 49 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...itellm/proxy/guardrails/guardrail_hooks/atr/atr.py	81.48%	45 Missing ⚠️
...m/proxy/guardrails/guardrail_hooks/atr/__init__.py	60.00%	4 Missing ⚠️

📢 Thoughts on this report? Let us know!

veria-ai · 2026-05-16T09:53:24Z

PR overview

This PR adds an ATR (Agent Threat Rules) guardrail integration for the LiteLLM proxy, wiring ATR evaluation into request and response handling. The touched code focuses on extracting model-visible content from chat, responses, tools, and related payload fields for guardrail processing.

There are still three open gaps where model-visible or client-visible fields are not included in ATR evaluation: Responses API instructions and prompt variables, legacy chat function definitions, and tool-call/function-call arguments in model outputs. These allow callers or model responses to place relevant content outside the currently scanned fields, reducing the effectiveness of the new guardrail integration. Four issues have already been addressed, so the PR is moving in the right direction but still needs coverage fixes before the guardrail can be considered complete.

Open issues (3)

Medium: Responses instructions bypass ATR scanning — litellm/proxy/guardrails/guardrail_hooks/atr/atr.py:362
Medium: Legacy function definitions bypass input scanning — litellm/proxy/guardrails/guardrail_hooks/atr/atr.py:331
Medium: Tool call arguments bypass output scanning — litellm/proxy/guardrails/guardrail_hooks/atr/atr.py:381

Fixed/addressed: 4 · PR risk: 6/10

oss-pr-review-agent-shin · 2026-05-16T09:54:33Z

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 2/5 ❌

Why blocked:

1 PR-related CI failure (Greptile gate: score 3/5 below required 4/5 — request a Greptile review (@greptileai) and resolve its comments before maintainer review.) (pr_related_failures, -2 pts)
Greptile 3/5 (greptile_low, -1 pts)

Details: Score docked for: 1 PR-related CI failure (Greptile gate: score 3/5 below required 4/5 — request a Greptile review (@greptileai) and resolve its comments before maintainer review.); Greptile 3/5.

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

eeee2345 · 2026-05-18T00:48:17Z

Pushed dc394457 addressing both the veria-ai review and the coverage gap:

non-chat completion payloads: _extract_request_content now also reads data["prompt"] (str or list[str]), covering /v1/completions in addition to /v1/chat/completions messages
text completion responses: _extract_response_content now also reads choice.text alongside the existing choice.message.content path
coverage: added 5 tests (post-call block, post-call pass, text completion request as str, text completion request as list, text completion response via choice.text)

@greptileai

eeee2345 · 2026-05-18T09:30:25Z

Pushed 514f108 addressing the three Greptile findings:

P1 (severity=None AttributeError): guarded against None severity attribute using (raw_severity or "").lower() so the call never fails when severity is explicitly set to None

P2 (unknown severity silently excluded): changed the fallback rank from len(_SEVERITY_RANK) (4, never matches) to 0 (critical, always included) so rules with unrecognised severity strings are conservative-blocked rather than silently dropped

P1 (include_tags not wired): added include_tags parameter to ATRGuardrail.__init__, stored as self.include_tags, forwarded through initialize_guardrail via getattr(litellm_params, "include_tags", None), and applied in _scan before the severity check

Three new unit tests: test_scan_include_tags_filters_rules, test_scan_none_severity_treated_conservatively, test_scan_unknown_severity_treated_conservatively

@greptileai

eeee2345 · 2026-05-21T05:12:17Z

@veria-ai — wanted to address the streaming-bypass finding explicitly since the recent pushes (dc394457, 514f108) focused on the non-streaming and Greptile threads and the streaming question deserves its own reply.

Acknowledged that the current implementation hooks async_pre_call_hook and async_post_call_success_hook only, not async_post_call_streaming_iterator_hook (LiteLLM's per-chunk streaming surface). The trade-off I made for v1 of this guardrail:

Per-chunk scanning is unsafe for ATR's rule shape. ATR rules match against complete content (a full LLM response, a full SKILL.md document, a full MCP tool descriptor). A chunk-by-chunk scan against a regex would emit false negatives (the attack pattern split across two chunks never appears in either) and inconsistent false positives (a benign string that happens to look adversarial mid-emission triggers, then doesn't trigger when context arrives). Either failure mode is worse than no streaming-side detection.

The correct shape for streaming is buffered post-stream scan, which LiteLLM supports via async_post_call_streaming_hook (full-response, post-aggregation). I'll add that hook in this PR. It scans the assembled streamed response against the same output_block_threshold policy as the non-streaming path, so the policy is uniform whether the caller opts into streaming or not. The latency cost is one regex pass on response completion, not per-chunk.

What this does not cover: an attacker who streams a long-running response specifically to inject content before the buffered post-stream hook fires (e.g. a tool-call interleaved mid-stream that the agent acts on). That requires per-chunk inspection with a stateful aggregator, which is out of scope for a regex-based guardrail and belongs in a separate semantic-gate layer. I'll document this limitation explicitly in the guardrail's docstring rather than silently leaving the gap.

Will push the streaming-hook addition shortly. After that, the parity is:

Pre-call hook: scans input (chat messages + completion prompts)
Post-call hook (non-streaming): scans aggregated response
Post-call streaming hook (new): scans aggregated streamed response

@greptileai — please re-review after the streaming push. The three Greptile P1/P2/P3 findings from the earlier review are addressed in 514f108 with unit tests; the streaming hook will land as a separate commit so it's reviewable independently.

eeee2345 · 2026-05-26T18:46:33Z

Pushed the async_post_call_streaming_hook in commit on the branch.

The hook buffers the complete aggregated output and scans it once at end-of-stream. The design rationale for not scanning per-chunk is in the docstring: attack patterns split across chunks cause false negatives (missed detections) and premature blocking causes false positives. LiteLLM accumulates the full stream before calling this hook, so ATR evaluates the complete response in a single engine.evaluate() call.

@greptileai please re-review — this addresses the streaming bypass gap flagged in the initial review.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>

…ion responses + add coverage - _extract_request_content: also reads data["prompt"] (str or list[str]) so /v1/completions payloads are scanned, not only chat messages - _extract_response_content: also reads choice.text for text completion responses alongside the existing choice.message.content path - tests: add 5 tests covering post-call hook (block + pass), text completion request (str prompt, list prompt), and text completion response (choice.text) to address coverage gap flagged in review

- include_tags: wire config param through __init__ and initialize_guardrail so tag-based rule filtering is honoured at runtime - severity=None: guard against AttributeError when match.severity is explicitly set to None rather than missing (getattr default is bypassed) - unknown severity: treat unrecognised severity strings conservatively (rank 0 = critical) so they are always included in scan results rather than silently dropped - tests: add three new unit tests covering include_tags filtering, None severity, and unknown severity strings

Addresses the veria-ai streaming-bypass finding. Scans the aggregated streamed response after stream completion using LiteLLM's existing post-call streaming surface; per-chunk scanning would emit false negatives for split-across-chunk attack patterns, so we wait for the aggregated text. Three new tests covering the streaming hook: block-on-match, pass-when-no-match, and no-op-on-empty-response. Signed-off-by: Adam Lin <adam@agentthreatrule.org>

…BerriAI#28050 review 2026-05-27) Addresses the two open medium findings veria-ai flagged on BerriAI#28050: 1. tool content bypasses scanning (atr.py:267) _extract_request_content now walks data['tools'] and concatenates function.name + function.description + json.dumps(function.parameters) into the scanned text. Same path applied to tool_choice when carrying a description. Anthropic / Claude tool shape (name + description directly on the tool object) also covered. 2. Responses API content bypasses scanning (atr.py:281) _extract_request_content branches on data['input'] alongside the existing data['messages'] / data['prompt'] paths. Supports both the string-input shape and the content-part-list shape used by /v1/responses. _extract_response_content mirrors this for the response side: walks response.output[*].content[*].text + the top-level response.output_text convenience field. 3. doc file removal docs/my-website/docs/proxy/guardrails/atr.md is removed from this PR per Greptile's repository-policy nit. Will open the equivalent in BerriAI/litellm-docs as a follow-up. Three new tests pin the behaviour: - test_scan_tools_function_description_blocked: tool.function.description with hidden instructions reaches the engine and triggers a block. - test_scan_responses_api_input_blocked: data['input'] content-part shape reaches the engine. - test_scan_responses_api_output_blocked: response['output'][*].content[*].text reaches the engine. All 21 tests pass locally (was 18 before).

Same code paths, same tests; refactored into four helper methods so the top-level extractor stays under Ruff's PLR0915 statement-count limit. _extract_messages_content chat completions messages[] _extract_prompt_content text completions prompt str | list[str] _extract_responses_input OpenAI Responses API data['input'] _extract_tools_content tool definitions + tool_choice _extract_request_content composes the above All 21 tests still pass locally; ruff check clean.

eeee2345 · 2026-05-29T13:05:13Z

@veria-ai pushed addressing both open findings.

Tool content (atr.py:267): _extract_request_content now walks data["tools"]
and concatenates function.name + function.description + json.dumps(parameters)
into the scanned text. Anthropic-shape tools and tool_choice with a
forced-function description are both covered.

Responses API (atr.py:281): the request side now branches on data["input"]
alongside the existing messages / prompt paths and handles both the string
shape and the content-part list shape used by /v1/responses. The response
side mirrors this in _extract_response_content for response.output[].content[].text
plus the top-level response.output_text convenience field.

Three new tests pin the behaviour:

test_scan_tools_function_description_blocked
test_scan_responses_api_input_blocked
test_scan_responses_api_output_blocked

All 21 ATR-guardrail tests pass locally; ruff clean.

Also removed docs/my-website/docs/proxy/guardrails/atr.md from this PR
per Greptile's repository-policy nit; will open the equivalent in
BerriAI/litellm-docs as a follow-up.

The PLR0915 ruff violation that surfaced on the first push (function
length over 50 statements) is fixed in b6df3fd by splitting
_extract_request_content into four small helper methods. Same code
paths, same tests.

veria-ai · 2026-05-29T13:11:41Z

+                    parts.append(desc)
+        return parts
+
+    def _extract_request_content(self, data: dict) -> str:


Medium: Responses instructions bypass ATR scanning

A caller can send a benign /v1/responses input and put the blocked prompt in top-level instructions or prompt template variables; LiteLLM forwards those fields to the model, but _extract_request_content never includes them in the ATR scan. Include model-visible Responses fields such as instructions and string values under prompt.variables before evaluating the request.

… atr-guardrail # Conflicts: # litellm/types/guardrails.py

veria-ai · 2026-06-04T22:44:50Z

+        Anthropic / Claude direct shape.
+        """
+        parts: List[str] = []
+        for tool in data.get("tools") or []:


Medium: Legacy function definitions bypass input scanning

LiteLLM accepts functions / function_call on chat completions and forwards them, or folds them into the provider prompt for unsupported models. This extractor only walks tools, so a caller can put the blocked prompt in functions[0].description or functions[0].parameters while keeping messages benign and pass the pre-call ATR scan. Include the legacy functions array in the scanned parts as well.

veria-ai · 2026-06-04T22:44:50Z

+            if message is None and isinstance(choice, dict):
+                message = choice.get("message", {})
+            if message is not None:
+                content: Optional[str] = getattr(message, "content", None)


Medium: Tool call arguments bypass output scanning

Chat model tool calls are returned in message.tool_calls[].function.arguments or legacy message.function_call.arguments, but this method only appends message.content and choice.text. A prompt that makes the model return a blocked command or secret in tool-call arguments is returned to the client without ATR post-call scanning. Extract and scan those fields before returning the response.

eeee2345 · 2026-06-11T18:43:36Z

Status: this is green and ready for review. CI passes (lint, semgrep, secret-scan, guardrails tests, codecov), and Greptile's re-review is at 5/5 "safe to merge" after the earlier P1/P2 fixes. The first-pass blocker score predates those fixes. Could a maintainer take a look? Happy to fold the remaining response-field coverage suggestions into this PR if you'd like them in scope.

greptile-apps Bot reviewed May 16, 2026

View reviewed changes

Comment thread litellm/types/proxy/guardrails/guardrail_hooks/atr.py

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated

veria-ai Bot reviewed May 16, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated

greptile-apps Bot reviewed May 18, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated

veria-ai Bot reviewed May 18, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py

eeee2345 mentioned this pull request May 21, 2026

docs(cookbook): add Agent Threat Rules detection callback example #27522

Open

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py

veria-ai Bot reviewed May 21, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py

eeee2345 and others added 5 commits May 28, 2026 02:45

feat(guardrails): add ATR (Agent Threat Rules) guardrail integration

7f6ab95

Signed-off-by: Adam Lin <adam@agentthreatrule.org>

chore: apply black formatting

f58694e

eeee2345 force-pushed the atr-guardrail branch from ba0dae9 to b33290f Compare May 27, 2026 18:45

veria-ai Bot reviewed May 27, 2026

View reviewed changes

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py

eeee2345 added 2 commits May 29, 2026 20:50

veria-ai Bot reviewed May 29, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/litellm_internal_staging' into…

aa52a34

… atr-guardrail # Conflicts: # litellm/types/guardrails.py

veria-ai Bot reviewed Jun 4, 2026

View reviewed changes

Uh oh!

Conversation

eeee2345 commented May 16, 2026

What this adds

Usage

Verification

Production context

Uh oh!

greptile-apps Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

veria-ai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Open issues (3)

Uh oh!

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

eeee2345 commented May 18, 2026

Uh oh!

Uh oh!

eeee2345 commented May 18, 2026

Uh oh!

Uh oh!

eeee2345 commented May 21, 2026

Uh oh!

Uh oh!

Uh oh!

eeee2345 commented May 26, 2026

Uh oh!

Uh oh!

eeee2345 commented May 29, 2026

Uh oh!

veria-ai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

eeee2345 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented May 16, 2026 •

edited

Loading

codecov Bot commented May 16, 2026 •

edited

Loading

veria-ai Bot commented May 16, 2026 •

edited

Loading