Skip to content

feat(guardrails): add ATR (Agent Threat Rules) guardrail integration#28050

Open
eeee2345 wants to merge 8 commits into
BerriAI:litellm_internal_stagingfrom
eeee2345:atr-guardrail
Open

feat(guardrails): add ATR (Agent Threat Rules) guardrail integration#28050
eeee2345 wants to merge 8 commits into
BerriAI:litellm_internal_stagingfrom
eeee2345:atr-guardrail

Conversation

@eeee2345

Copy link
Copy Markdown

Adds ATR (Agent Threat Rules) as a guardrail integration for LiteLLM proxy.

ATR is an MIT-licensed open detection rule format for AI agent security threats: prompt injection, tool poisoning, credential exfiltration, context manipulation, and other categories. Same family as Sigma/YARA but targeted at LLM I/O and agent runtime events. Detection runs locally via the pyatr reference engine, so no request data leaves the proxy.

What this adds

  • litellm/proxy/guardrails/guardrail_hooks/atr/atr.py plus __init__.py registration: ATRGuardrail class with async_pre_call_hook and async_post_call_success_hook, mirroring the per-package layout used by Lasso, Aporia, XecGuard, and the other recent guardrail integrations
  • litellm/types/proxy/guardrails/guardrail_hooks/atr.py: ATRGuardrailConfigModel for the UI / config surface
  • litellm/types/guardrails.py: SupportedGuardrailIntegrations.ATR enum value and ATRGuardrailLitellmParams mixin exposing rules_path on the proxy config (severity_threshold already lives on ContentFilterConfigModel and is reused)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_atr.py: 7 unit tests covering missing-dependency, missing-path, invalid-severity, rule loading, severity filtering, pre-call blocking, and pre-call passing
  • docs/my-website/docs/proxy/guardrails/atr.md: install + config docs

The package __init__.py exports guardrail_class_registry and guardrail_initializer_registry, so it is auto-discovered by litellm/proxy/guardrails/guardrail_registry.py with no central-registry edits required.

Usage

guardrails:
  - guardrail_name: "atr-pre-call"
    litellm_params:
      guardrail: atr
      mode: "pre_call"
      rules_path: "./rules"
      severity_threshold: "high"

Requires pip install pyatr (optional dependency, not added to LiteLLM's own requirements).

Verification

  • make lint checks locally: black --check and ruff check . both pass on the new files and across litellm/
  • python tests/documentation_tests/test_circular_imports.py exits 0 with no new violations
  • from litellm import * succeeds
  • All 7 ATR tests pass; existing Lasso tests still pass (sanity regression on the shared LitellmParams mixin)

Production context

The ATR rule set this guardrail consumes is deployed at Microsoft Agent Governance Toolkit (PRs #908 and #1277, merged 2026-04), Cisco AI Defense skill-scanner (PRs #79 and #99, merged 2026-04), MISP / CIRCL via misp-taxonomies #323 and misp-galaxy #1207 (merged 2026-05), Gen Digital Sage (PR #33, merged 2026-05), and OWASP Agent-Security-Regression-Harness (PR #74, merged 2026-05). pyatr v0.2.4 is on PyPI.

Rule format and rules: https://github.com/Agent-Threat-Rule/agent-threat-rules

Happy to adjust the scope, severity mapping, or hook signature if you prefer a different pattern.

@greptile-apps

greptile-apps Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds ATRGuardrail, a local-only guardrail that scans LLM input and output against the open-source Agent Threat Rules detection format via the pyatr engine. The implementation covers async_pre_call_hook, async_post_call_success_hook, and async_post_call_streaming_hook, addressing the streaming bypass gap flagged in the initial review.

  • atr.py implements all three hooks; unknown/None severity is now treated conservatively (rank 0), include_tags filtering is wired end-to-end, and the streaming hook follows the established per-chunk-with-accumulator pattern used by Azure text moderation.
  • litellm/types/guardrails.py adds the ATR enum value and ATRGuardrailLitellmParams mixin; __init__.py auto-registers the guardrail without any central-registry edits.
  • tests/test_atr.py adds 14 fully-mocked unit tests covering all hooks and edge cases (None severity, unknown severity, include_tags scoping, streaming block/pass/empty).

Confidence Score: 5/5

Safe to merge; all three guardrail hooks are correctly implemented and previously flagged issues are resolved.

Well-isolated new guardrail package with no risk to existing code paths. Streaming hook follows the same pattern as Azure text moderation, severity and tag-filtering logic is correct and tested, and no changes touch critical proxy infrastructure. The only outstanding item is documentation file placement, which does not affect runtime behavior.

docs/my-website/docs/proxy/guardrails/atr.md should be moved to the litellm-docs repo per repository policy.

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Core guardrail implementation; all three hooks present; severity handling, None-severity conservatism, and include_tags filtering correctly implemented; docstring for streaming hook overstates aggregation guarantee
litellm/proxy/guardrails/guardrail_hooks/atr/init.py Correctly registers ATRGuardrail in both registries; include_tags forwarded via getattr with safe fallback
litellm/types/guardrails.py Adds ATR enum value and ATRGuardrailLitellmParams mixin; cleanly extends LitellmParams without touching existing fields
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_atr.py 14 fully-mocked unit tests covering all hooks and edge cases including None severity, unknown severity, include_tags scoping, and streaming
docs/my-website/docs/proxy/guardrails/atr.md New guardrail documentation; should be placed in the litellm-docs repo per repository policy

Reviews (5): Last reviewed commit: "feat(atr-guardrail): add async_post_call..." | Re-trigger Greptile

Comment thread litellm/types/proxy/guardrails/guardrail_hooks/atr.py
Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated
@codecov

codecov Bot commented May 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.64794% with 49 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...itellm/proxy/guardrails/guardrail_hooks/atr/atr.py 81.48% 45 Missing ⚠️
...m/proxy/guardrails/guardrail_hooks/atr/__init__.py 60.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@veria-ai

veria-ai Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

PR overview

This PR adds an ATR (Agent Threat Rules) guardrail integration for the LiteLLM proxy, wiring ATR evaluation into request and response handling. The touched code focuses on extracting model-visible content from chat, responses, tools, and related payload fields for guardrail processing.

There are still three open gaps where model-visible or client-visible fields are not included in ATR evaluation: Responses API instructions and prompt variables, legacy chat function definitions, and tool-call/function-call arguments in model outputs. These allow callers or model responses to place relevant content outside the currently scanned fields, reducing the effectiveness of the new guardrail integration. Four issues have already been addressed, so the PR is moving in the right direction but still needs coverage fixes before the guardrail can be considered complete.

Open issues (3)

Fixed/addressed: 4 · PR risk: 6/10

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated
@oss-pr-review-agent-shin

Copy link
Copy Markdown
Contributor

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 2/5

Why blocked:

  • 1 PR-related CI failure (Greptile gate: score 3/5 below required 4/5 — request a Greptile review (@greptileai) and resolve its comments before maintainer review.) (pr_related_failures, -2 pts)
  • Greptile 3/5 (greptile_low, -1 pts)

Details: Score docked for: 1 PR-related CI failure (Greptile gate: score 3/5 below required 4/5 — request a Greptile review (@greptileai) and resolve its comments before maintainer review.); Greptile 3/5.

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

@eeee2345

Copy link
Copy Markdown
Author

Pushed dc394457 addressing both the veria-ai review and the coverage gap:

  1. non-chat completion payloads: _extract_request_content now also reads data["prompt"] (str or list[str]), covering /v1/completions in addition to /v1/chat/completions messages

  2. text completion responses: _extract_response_content now also reads choice.text alongside the existing choice.message.content path

  3. coverage: added 5 tests (post-call block, post-call pass, text completion request as str, text completion request as list, text completion response via choice.text)

@greptileai

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py Outdated
@eeee2345

Copy link
Copy Markdown
Author

Pushed 514f108 addressing the three Greptile findings:

P1 (severity=None AttributeError): guarded against None severity attribute using (raw_severity or "").lower() so the call never fails when severity is explicitly set to None

P2 (unknown severity silently excluded): changed the fallback rank from len(_SEVERITY_RANK) (4, never matches) to 0 (critical, always included) so rules with unrecognised severity strings are conservative-blocked rather than silently dropped

P1 (include_tags not wired): added include_tags parameter to ATRGuardrail.__init__, stored as self.include_tags, forwarded through initialize_guardrail via getattr(litellm_params, "include_tags", None), and applied in _scan before the severity check

Three new unit tests: test_scan_include_tags_filters_rules, test_scan_none_severity_treated_conservatively, test_scan_unknown_severity_treated_conservatively

@greptileai

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py
@eeee2345

Copy link
Copy Markdown
Author

@veria-ai — wanted to address the streaming-bypass finding explicitly since the recent pushes (dc394457, 514f108) focused on the non-streaming and Greptile threads and the streaming question deserves its own reply.

Acknowledged that the current implementation hooks async_pre_call_hook and async_post_call_success_hook only, not async_post_call_streaming_iterator_hook (LiteLLM's per-chunk streaming surface). The trade-off I made for v1 of this guardrail:

Per-chunk scanning is unsafe for ATR's rule shape. ATR rules match against complete content (a full LLM response, a full SKILL.md document, a full MCP tool descriptor). A chunk-by-chunk scan against a regex would emit false negatives (the attack pattern split across two chunks never appears in either) and inconsistent false positives (a benign string that happens to look adversarial mid-emission triggers, then doesn't trigger when context arrives). Either failure mode is worse than no streaming-side detection.

The correct shape for streaming is buffered post-stream scan, which LiteLLM supports via async_post_call_streaming_hook (full-response, post-aggregation). I'll add that hook in this PR. It scans the assembled streamed response against the same output_block_threshold policy as the non-streaming path, so the policy is uniform whether the caller opts into streaming or not. The latency cost is one regex pass on response completion, not per-chunk.

What this does not cover: an attacker who streams a long-running response specifically to inject content before the buffered post-stream hook fires (e.g. a tool-call interleaved mid-stream that the agent acts on). That requires per-chunk inspection with a stateful aggregator, which is out of scope for a regex-based guardrail and belongs in a separate semantic-gate layer. I'll document this limitation explicitly in the guardrail's docstring rather than silently leaving the gap.

Will push the streaming-hook addition shortly. After that, the parity is:

  • Pre-call hook: scans input (chat messages + completion prompts)
  • Post-call hook (non-streaming): scans aggregated response
  • Post-call streaming hook (new): scans aggregated streamed response

@greptileai — please re-review after the streaming push. The three Greptile P1/P2/P3 findings from the earlier review are addressed in 514f108 with unit tests; the streaming hook will land as a separate commit so it's reviewable independently.

Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py
Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py
@eeee2345

Copy link
Copy Markdown
Author

Pushed the async_post_call_streaming_hook in commit on the branch.

The hook buffers the complete aggregated output and scans it once at end-of-stream. The design rationale for not scanning per-chunk is in the docstring: attack patterns split across chunks cause false negatives (missed detections) and premature blocking causes false positives. LiteLLM accumulates the full stream before calling this hook, so ATR evaluates the complete response in a single engine.evaluate() call.

@greptileai please re-review — this addresses the streaming bypass gap flagged in the initial review.

eeee2345 and others added 5 commits May 28, 2026 02:45
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
…ion responses + add coverage

- _extract_request_content: also reads data["prompt"] (str or list[str])
  so /v1/completions payloads are scanned, not only chat messages
- _extract_response_content: also reads choice.text for text completion
  responses alongside the existing choice.message.content path
- tests: add 5 tests covering post-call hook (block + pass), text
  completion request (str prompt, list prompt), and text completion
  response (choice.text) to address coverage gap flagged in review
- include_tags: wire config param through __init__ and initialize_guardrail
  so tag-based rule filtering is honoured at runtime
- severity=None: guard against AttributeError when match.severity is
  explicitly set to None rather than missing (getattr default is bypassed)
- unknown severity: treat unrecognised severity strings conservatively
  (rank 0 = critical) so they are always included in scan results rather
  than silently dropped
- tests: add three new unit tests covering include_tags filtering,
  None severity, and unknown severity strings
Addresses the veria-ai streaming-bypass finding. Scans the aggregated
streamed response after stream completion using LiteLLM's existing
post-call streaming surface; per-chunk scanning would emit false
negatives for split-across-chunk attack patterns, so we wait for the
aggregated text.

Three new tests covering the streaming hook: block-on-match,
pass-when-no-match, and no-op-on-empty-response.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
Comment thread litellm/proxy/guardrails/guardrail_hooks/atr/atr.py
eeee2345 added 2 commits May 29, 2026 20:50
…BerriAI#28050 review 2026-05-27)

Addresses the two open medium findings veria-ai flagged on
BerriAI#28050:

1. tool content bypasses scanning (atr.py:267)
   _extract_request_content now walks data['tools'] and concatenates
   function.name + function.description + json.dumps(function.parameters)
   into the scanned text. Same path applied to tool_choice when carrying
   a description. Anthropic / Claude tool shape (name + description
   directly on the tool object) also covered.

2. Responses API content bypasses scanning (atr.py:281)
   _extract_request_content branches on data['input'] alongside the
   existing data['messages'] / data['prompt'] paths. Supports both the
   string-input shape and the content-part-list shape used by
   /v1/responses. _extract_response_content mirrors this for the
   response side: walks response.output[*].content[*].text + the
   top-level response.output_text convenience field.

3. doc file removal
   docs/my-website/docs/proxy/guardrails/atr.md is removed from this
   PR per Greptile's repository-policy nit. Will open the equivalent
   in BerriAI/litellm-docs as a follow-up.

Three new tests pin the behaviour:
- test_scan_tools_function_description_blocked: tool.function.description
  with hidden instructions reaches the engine and triggers a block.
- test_scan_responses_api_input_blocked: data['input'] content-part shape
  reaches the engine.
- test_scan_responses_api_output_blocked: response['output'][*].content[*].text
  reaches the engine.

All 21 tests pass locally (was 18 before).
Same code paths, same tests; refactored into four helper methods so the
top-level extractor stays under Ruff's PLR0915 statement-count limit.

  _extract_messages_content    chat completions messages[]
  _extract_prompt_content      text completions prompt str | list[str]
  _extract_responses_input     OpenAI Responses API data['input']
  _extract_tools_content       tool definitions + tool_choice
  _extract_request_content     composes the above

All 21 tests still pass locally; ruff check clean.
@eeee2345

Copy link
Copy Markdown
Author

@veria-ai pushed addressing both open findings.

Tool content (atr.py:267): _extract_request_content now walks data["tools"]
and concatenates function.name + function.description + json.dumps(parameters)
into the scanned text. Anthropic-shape tools and tool_choice with a
forced-function description are both covered.

Responses API (atr.py:281): the request side now branches on data["input"]
alongside the existing messages / prompt paths and handles both the string
shape and the content-part list shape used by /v1/responses. The response
side mirrors this in _extract_response_content for response.output[].content[].text
plus the top-level response.output_text convenience field.

Three new tests pin the behaviour:

  • test_scan_tools_function_description_blocked
  • test_scan_responses_api_input_blocked
  • test_scan_responses_api_output_blocked

All 21 ATR-guardrail tests pass locally; ruff clean.

Also removed docs/my-website/docs/proxy/guardrails/atr.md from this PR
per Greptile's repository-policy nit; will open the equivalent in
BerriAI/litellm-docs as a follow-up.

The PLR0915 ruff violation that surfaced on the first push (function
length over 50 statements) is fixed in b6df3fd by splitting
_extract_request_content into four small helper methods. Same code
paths, same tests.

parts.append(desc)
return parts

def _extract_request_content(self, data: dict) -> str:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: Responses instructions bypass ATR scanning

A caller can send a benign /v1/responses input and put the blocked prompt in top-level instructions or prompt template variables; LiteLLM forwards those fields to the model, but _extract_request_content never includes them in the ATR scan. Include model-visible Responses fields such as instructions and string values under prompt.variables before evaluating the request.

… atr-guardrail

# Conflicts:
#	litellm/types/guardrails.py
Anthropic / Claude direct shape.
"""
parts: List[str] = []
for tool in data.get("tools") or []:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: Legacy function definitions bypass input scanning

LiteLLM accepts functions / function_call on chat completions and forwards them, or folds them into the provider prompt for unsupported models. This extractor only walks tools, so a caller can put the blocked prompt in functions[0].description or functions[0].parameters while keeping messages benign and pass the pre-call ATR scan. Include the legacy functions array in the scanned parts as well.

if message is None and isinstance(choice, dict):
message = choice.get("message", {})
if message is not None:
content: Optional[str] = getattr(message, "content", None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: Tool call arguments bypass output scanning

Chat model tool calls are returned in message.tool_calls[].function.arguments or legacy message.function_call.arguments, but this method only appends message.content and choice.text. A prompt that makes the model return a blocked command or secret in tool-call arguments is returned to the client without ATR post-call scanning. Extract and scan those fields before returning the response.

@eeee2345

Copy link
Copy Markdown
Author

Status: this is green and ready for review. CI passes (lint, semgrep, secret-scan, guardrails tests, codecov), and Greptile's re-review is at 5/5 "safe to merge" after the earlier P1/P2 fixes. The first-pass blocker score predates those fixes. Could a maintainer take a look? Happy to fold the remaining response-field coverage suggestions into this PR if you'd like them in scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant