detectors: add Agent Threat Rules #1676
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
Please sign the DCO for review.
Branch updated: c960ed7 → 90c887d
I have read the DCO Document and I hereby sign the DCO
recheck
jmartin-tech left a comment
Thanks for the submission.
As an extra detector to mix into a run, this could be useful. The helper methods placed in the detector should likely be extracted as separate tooling.
```python
subprocess.run(
    ["git", "clone", "--depth", "1", "-b", branch,
     f"https://github.com/{repo}.git", tmpdir],
    check=True, capture_output=True,
)
```
subprocess.run is not an acceptable method of retrieving data at runtime.
```python
def generate_rule_from_probe(
    probe_outputs: list[str],
    category: str = "prompt-injection",
    severity: str = "high",
    min_common_length: int = 8,
) -> str:
    """Generate an ATR rule YAML draft from successful Garak probe outputs.

    Takes a list of strings that bypassed defenses (successful attacks)
    and extracts common substrings as detection patterns. Returns a
    YAML rule string ready for review and submission to ATR.

    This is a starting point -- generated rules should be reviewed by
    a human before being added to the ATR ruleset.

    Usage::

        from garak.detectors.atr import generate_rule_from_probe
        attacks = ["ignore previous instructions and ...", "forget all rules and ..."]
        rule_yaml = generate_rule_from_probe(attacks, category="prompt-injection")
        print(rule_yaml)
    """
    if not probe_outputs:
        return ""

    # Extract keywords that appear in 50%+ of outputs
    word_counts: dict[str, int] = {}
    for text in probe_outputs:
        words = set(re.findall(r"\b[a-zA-Z]{4,}\b", text.lower()))
        for w in words:
            word_counts[w] = word_counts.get(w, 0) + 1

    threshold = max(2, len(probe_outputs) // 2)
    common_words = sorted(
        [w for w, c in word_counts.items() if c >= threshold],
        key=lambda w: word_counts[w],
        reverse=True,
    )[:6]

    if not common_words:
        return ""

    # Build regex pattern from common words
    pattern = r"(?i)\b" + r"\b.*\b".join(re.escape(w) for w in common_words[:4]) + r"\b"

    date = datetime.now().strftime("%Y/%m/%d")
    rule_id = f"ATR-DRAFT-{hash(pattern) % 100000:05d}"

    return f"""title: "Garak-generated: {common_words[0]} pattern"
id: {rule_id}
rule_version: 1
status: draft
description: >
  Auto-generated from {len(probe_outputs)} successful Garak probe outputs.
  Common keywords: {', '.join(common_words[:6])}.
  REVIEW REQUIRED before adding to production ruleset.
author: "garak + ATR"
date: "{date}"
schema_version: "0.1"
detection_tier: pattern
maturity: experimental
severity: {severity}
tags:
  category: {category}
  subcategory: garak-generated
  confidence: low
agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
detection:
  conditions:
    - field: content
      operator: regex
      value: '{pattern}'
      description: "Pattern from {len(probe_outputs)} Garak probe outputs"
  condition: any
response:
  actions: [alert]
test_cases:
  true_positives:
    - input: "{probe_outputs[0][:100].replace(chr(34), chr(39))}"
      expected: triggered
"""
```
This is an interesting utility for creating YAML files to contribute to defensive tooling, but it is never called by code in this PR. I would suggest extracting it as a utility that could post-process either a `report.jsonl` or a `hitlog.jsonl`. I could see this being placed in the `tools` path for use only in a repo-based install, or exposed as an `analyze` module so it ships as a package-provided utility in the installed package, similar to how `report_digest` is exposed.
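A minimal sketch of that post-processing step, assuming each line of `hitlog.jsonl` is a JSON object carrying `probe` and `output` fields, where `output` may be a plain string or an object with a `text` key. Those field names and shapes are assumptions to check against the report format of the installed garak version; `collect_hit_outputs` is a hypothetical helper:

```python
import json
from collections import defaultdict
from pathlib import Path


def collect_hit_outputs(hitlog: Path) -> dict[str, list[str]]:
    """Group successful-attack outputs from a hitlog by probe name.

    Assumes one JSON object per line with "probe" and "output" keys.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    with hitlog.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            output = entry.get("output")
            # Some log versions may wrap text in an object; accept both shapes.
            if isinstance(output, dict):
                output = output.get("text")
            if isinstance(output, str) and output:
                grouped[entry.get("probe", "unknown")].append(output)
    return dict(grouped)
```

Each per-probe list could then be handed to a rule generator like `generate_rule_from_probe`, so one draft rule is produced per probe rather than per run.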
```python
def sync_rules_from_github(
    repo: str = "Agent-Threat-Rule/agent-threat-rules",
    branch: str = "main",
    output: Path | None = None,
) -> int:
    """Fetch latest ATR rules from GitHub and update the bundled JSON.

    Requires: git, PyYAML (pip install pyyaml).
    Returns the number of patterns synced.

    Usage::

        from garak.detectors.atr import sync_rules_from_github
        count = sync_rules_from_github()
        print(f"Synced {count} patterns")
    """
    import yaml  # PyYAML -- optional dependency

    dest = output or _RULES_PATH
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.run(
            ["git", "clone", "--depth", "1", "-b", branch,
             f"https://github.com/{repo}.git", tmpdir],
            check=True, capture_output=True,
        )
        rules_dir = Path(tmpdir) / "rules"
        if not rules_dir.exists():
            raise FileNotFoundError(f"No rules/ directory in {repo}")

        result: dict[str, list[list[str]]] = {}
        for yaml_file in sorted(rules_dir.rglob("*.yaml")):
            doc = yaml.safe_load(yaml_file.read_text())
            if not doc or not doc.get("detection", {}).get("conditions"):
                continue
            cat = doc.get("tags", {}).get("category", "unknown")
            if cat not in result:
                result[cat] = []
            for cond in doc["detection"]["conditions"]:
                if cond.get("operator") == "regex" and cond.get("value"):
                    pat = re.sub(r"^\(\?[imsx]+\)", "", cond["value"])
                    result[cat].append([doc["id"], doc.get("severity", "medium"), pat])

    dest.write_text(json.dumps(result, indent=2, ensure_ascii=True))
    total = sum(len(v) for v in result.values())
    logger.info("ATR sync: %d patterns across %d categories -> %s", total, len(result), dest)
    return total
```
I would suggest this is also better extracted into the `tools` path as a separate utility, executed independently to configure the user's system. By default the utility should write the generated configuration to the user's XDG-based `data_path`, or to stdout so the user can place it in the correct location under their `XDG_DATA_HOME` for the detector to pick up in place of the shipped version.
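A minimal sketch of the default-or-stdout behaviour described here, assuming the user override lives at `$XDG_DATA_HOME/garak/data/atr/rules.json` (with `~/.local/share` as the fallback when `XDG_DATA_HOME` is unset or empty, per the XDG Base Directory spec). That exact subpath is an assumption to verify against how `garak.data.path` searches the user data directory; `default_rules_path` and `write_bundle` are hypothetical names:

```python
from __future__ import annotations

import json
import os
import sys
from collections.abc import Mapping
from pathlib import Path
from typing import TextIO


def default_rules_path(env: Mapping[str, str] | None = None) -> Path:
    """Resolve the per-user override location for the ATR rules bundle."""
    env = os.environ if env is None else env
    # XDG spec: use ~/.local/share when XDG_DATA_HOME is unset or empty.
    base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    # Assumed layout: <data home>/garak/data/atr/rules.json
    return Path(base) / "garak" / "data" / "atr" / "rules.json"


def write_bundle(
    bundle: dict,
    output: Path | None = None,
    to_stdout: bool = False,
    stream: TextIO | None = None,
) -> Path | None:
    """Write the bundle to stdout, an explicit output path, or the XDG default.

    Returns the file path written, or None when writing to stdout.
    """
    text = json.dumps(bundle, indent=2, ensure_ascii=True)
    if to_stdout:
        (stream or sys.stdout).write(text + "\n")
        return None
    dest = output or default_rules_path()
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(text, encoding="utf-8")
    return dest
```

A CLI wrapper would map `--output` and `--stdout` onto these arguments; the detector never writes anything itself.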
```python
_RULES_PATH = Path(__file__).parent / "atr_rules.json"
_ALL_RULES: dict[str, list[list[str]]] = {}
if _RULES_PATH.exists():
    with open(_RULES_PATH) as f:
        _ALL_RULES = json.load(f)
else:
    logger.warning("ATR rules file not found: %s", _RULES_PATH)
```
This should use the `data_path` access pattern; see garak/garak/probes/snowball.py, lines 42 to 47 in 2569a60:
`from garak.data import path as data_path`
This helper class provides access to files in the installed package's data directory and supports user override of a file via the XDG Base Directory specification, so users can provide their own content without needing write permissions to the Python runtime library path.
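The override behaviour described here boils down to a lookup order: a user-supplied file wins, otherwise the packaged copy is used. A minimal sketch of that ordering, written without garak so it runs standalone; `resolve_data_file` and both directory arguments are hypothetical stand-ins, not garak's actual `data_path` implementation:

```python
from pathlib import Path


def resolve_data_file(relative: str, user_dir: Path, package_dir: Path) -> Path:
    """Return the user override if present, else the packaged file.

    Users can drop a replacement file under user_dir without needing
    write access to the installed package directory.
    """
    for root in (user_dir, package_dir):
        candidate = root / relative
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{relative} not found in {user_dir} or {package_dir}")
```

With this ordering the shipped `rules.json` stays read-only, and a refreshed bundle only has to be written to the user directory.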
It is also preferable to load this inside `__init__` for a detector instead of globally on module import. This could be accomplished using the ABC abstract class pattern; see garak/garak/probes/packagehallucination.py, lines 63 to 102 in 2569a60.
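A minimal sketch of per-instance loading with an abstract base class, assuming the bundle shape written by the sync code above (`{category: [[rule_id, severity, pattern], ...]}`) and case-insensitive matching. The class names, the `categories` attribute, and the `rules_file` parameter are illustrative only; `detect` takes a plain list of outputs rather than a garak `Attempt`:

```python
from __future__ import annotations

import json
import re
from abc import ABC, abstractmethod
from pathlib import Path


class RuleDetectorBase(ABC):
    """Loads and compiles rules when instantiated, not on module import."""

    @property
    @abstractmethod
    def categories(self) -> tuple[str, ...]:
        """Bundle categories this detector covers; empty means all."""

    def __init__(self, rules_file: Path):
        with rules_file.open(encoding="utf-8") as fh:
            bundle: dict[str, list[list[str]]] = json.load(fh)
        wanted = self.categories or tuple(bundle)
        self.patterns = [
            re.compile(pattern, re.IGNORECASE)
            for cat in wanted
            for _rule_id, _severity, pattern in bundle.get(cat, [])
        ]

    def detect(self, outputs: list[str | None]) -> list[float | None]:
        """Score 1.0 on any pattern match, 0.0 otherwise, None for None."""
        return [
            None if text is None
            else float(any(p.search(text) for p in self.patterns))
            for text in outputs
        ]


class AllThreats(RuleDetectorBase):
    categories = ()  # every category in the bundle


class PromptInjectionOnly(RuleDetectorBase):
    categories = ("prompt-injection",)
```

Because the base class is abstract, each concrete detector only declares which slice of the bundle it covers; importing the module costs nothing until a detector is instantiated.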
Thanks @jmartin-tech @leondz for the thorough review. I've addressed all four points:
For context on the rule set and methodology, the full spec is at agentthreatrule.org and the academic paper is on Zenodo. I would appreciate any design feedback on how the detector categories map to garak's existing taxonomy, and I'm happy to adjust the tagging.
Branch updated: abf1394 → 63f60e8
jmartin-tech left a comment
A few more adjustments requested.
```python
    xdg_dir.mkdir(parents=True, exist_ok=True)
    return xdg_dir / "rules.json"
except Exception:
    return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
```
I am not sure the fallback location in the exception handler makes sense. If the import fails, the tool was likely executed from a location other than the repo source, and it is somewhat unexpected for a tool to create something in a relative path like that. I would hazard that supporting either the XDG path or a user-supplied command-line location is sufficient; if the XDG path search raises an exception, it may be best to exit early and suggest that the user supply a valid `--output` value or use the `--stdout` option.
Suggested change:
```diff
-    return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
+    print("The user XDG storage location could not be identified, supply --output or --stdout options or ensure garak is available in the python environment.", file=sys.stderr)
+    sys.exit(1)
```
@jmartin-tech @leondz, all four review points have been addressed (data_path, no subprocess, extracted tools, init-time loading). Ready for re-review when you have a moment. Since the original submission, ATR has shipped v2.0.0 with some changes worth noting:
Happy to update the PR to the v2.0.0 rules if that's useful. The sync tool already supports pulling the latest rules from npm, so garak users would get rule updates automatically via
Also, if there's interest, ATR's Threat Cloud can accept detection signals from garak runs. That means every garak user running ATR detectors would contribute back to the rule pipeline. No PII, just pattern hashes. Happy to discuss if that's in scope.
Apologies: my previous comment referenced only the round-1 feedback. I missed that there was a second round of review on 4/10. This commit addresses all round-2 items:
I also updated rules.json to ATR v2.0.0 (113 rules, 736 patterns).
Branch updated: 35d0cd9 → ddef4a5
Two non-architectural changes since `ddef4a58`: updated `rules.json` to the current production set and corrected the `atr.py` docstring.
Bundle update
172 production-only rules: no garak probe equivalent
121 of 293 rules derive from garak probe payloads via `metadata_provenance.garak_probe`, covering all 32 probe modules. The remaining 172 come from ATR's scan of live MCP/skill registries: patterns observed in production deployments that don't correspond to any current garak probe class. If useful to the project, these could inform future probe development.
IITW validation
On `inthewild_jailbreak_llms.json` (666 real-world jailbreaks, ATR v2.0.11): 647/666 detected, 97.1% recall. Caveat: the 121 probe-aligned rules account for the majority of that coverage, so this measures probe-rule completeness more than blind generalization. On benign traffic (498 real-world SKILL.md samples), the FP rate is 0.20%. Reproducible: `bash scripts/eval-garak.sh` in the ATR repo.
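The headline figures can be re-derived from the counts quoted in this comment. A minimal check; the benign-traffic result is only given as a rate, so the implied false-positive count (roughly one sample) is an approximation rather than a reported number:

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


recall = pct(647, 666)           # detected / total in-the-wild jailbreaks
implied_fp = 0.0020 * 498        # FP rate times benign SKILL.md samples
probe_aligned, production_only = 121, 172

print(recall, round(implied_fp), probe_aligned + production_only)
# prints: 97.1 1 293
```

The 121 + 172 split also sums to the 293 production rules in the bundle.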
@leondz @jmartin-tech, quick update. Since the last comment (4/21, v2.0.12 / 293 rules), ATR has shipped v2.0.17 with 314 production rules. Cisco AI Defense merged the full 314-rule pack on 4/22 (skill-scanner #99), and Microsoft Agent Governance Toolkit followed on 4/26 (#1277, 287 rules + weekly auto-sync workflow).
For this PR, I'm happy to bump rules.json to v2.0.17, or leave it at v2.0.12 if you'd prefer to merge first and let users update via the sync tool. Either works. If anything else is needed to unblock the merge, please let me know; all round-2 review items have been addressed in commits since ddef4a5. Thanks for the two rounds of review.
Just a heads-up, not a re-bump: ATR shipped v2.1.0 today with 100% NIST AI RMF mapping (330 rules across 16 RMF subcategories, 1,566 mappings). Mentioning it in case the RMF traceability is useful for downstream garak users running compliance-adjacent evals, or for how this PR's detector categories surface in reports. Available now as
Happy to leave the PR pinned at v2.0.12 since the sync tool pulls the latest; no commit is needed from your side. I'll watch for any further review feedback whenever you have a moment.
@jmartin-tech @leondz, two updates since 5/9, both of which strengthen the case for ATR detectors in garak.
Still happy to address any open review items whenever the cycle lines up.
@jmartin-tech @leondz, a quick update on ATR's standardisation footprint; both items are relevant to garak's downstream operators. ATR was integrated into MISP at two layers on 2026-05-10, both merged by adulau (MISP project lead):
What this means for garak users: red-team runs that emit ATR rule IDs now resolve natively in MISP. The taxonomy gives rule-ID labelling, and the galaxy cluster gives the cluster-level context CSIRTs use for incident triage. garak operators routing red-team findings into MISP-compatible SIEM / CSIRT workflows get full enterprise threat-intel shape on every detection without a translation layer. Still happy to address any open review items on #1676 whenever the cycle lines up.
Branch updated: 5e7d484 → 1fb9709
DCO signed. Rebased all 4 commits with `Signed-off-by: Adam Lin <adam@agentthreatrule.org>` and force-pushed to feat/atr-detectors. CI should re-run now.
Signed-off-by: Panguard AI <support@panguard.ai>
Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
…subprocess
Changes per reviewer comments:
1. Rules loading uses garak's data_path mechanism (L37 feedback)
   - Moved atr_rules.json -> garak/data/atr/rules.json
   - Detector loads via from garak.data import path as data_path
   - Supports XDG user override
2. Removed subprocess.run (L64 feedback)
   - sync tool uses urllib.request to download zip
   - No git dependency required
3. Extracted helper methods to tools/ (L85, L171 feedback)
   - tools/atr.py: sync_rules() + generate_rule()
   - Writes to XDG data_path by default, or --stdout
   - Detector is now pure detection logic only
4. Rules loaded in __init__, not module level (L37 feedback)
   - _load_rules() called per-instance, not on import
Signed-off-by: eeee2345 <eeee2345@users.noreply.github.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
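A minimal sketch of a git-free fetch in the spirit of item 2, assuming GitHub's branch archive URL form (`https://github.com/{repo}/archive/refs/heads/{branch}.zip`) and that rules live under a top-level `rules/` directory inside the archive. `fetch_archive` and `rule_files_from_zip` are hypothetical names, not the PR's `sync_rules()`; YAML parsing is left to the caller so the sketch has no third-party dependency:

```python
import io
import urllib.request
import zipfile
from pathlib import PurePosixPath


def fetch_archive(repo: str, branch: str = "main", timeout: float = 30.0) -> bytes:
    """Download a branch snapshot as a zip archive over HTTPS (no git needed)."""
    url = f"https://github.com/{repo}/archive/refs/heads/{branch}.zip"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def rule_files_from_zip(data: bytes) -> dict[str, str]:
    """Return {archive path: text} for YAML files under <top-level>/rules/."""
    found: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in sorted(zf.namelist()):
            parts = PurePosixPath(name).parts
            # GitHub archives nest everything under a "<repo>-<branch>/" folder,
            # so the rules directory is the second path component.
            if len(parts) > 2 and parts[1] == "rules" and name.endswith(".yaml"):
                found[name] = zf.read(name).decode("utf-8")
    return found
```

Usage would look like `rule_files_from_zip(fetch_archive("Agent-Threat-Rule/agent-threat-rules"))`, with each text passed to `yaml.safe_load` afterwards. Splitting download from extraction keeps the parsing testable offline.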
- tools/atr.py: exit with error if XDG path unavailable (no relative fallback)
- tools/atr.py: add sys.stdout.reconfigure for Windows encoding
- detectors/atr.py: inline data_path construction, drop module-level constant
- detectors/atr.py: move _rules to local scope in __init__
- rules.json: update to ATR v2.0.0 (113 rules, 736 patterns, 9 categories)
Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
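When stdout is redirected to a file or pipe on Windows, Python typically falls back to the locale's legacy code page, so printing non-ASCII rule text can raise `UnicodeEncodeError`. A minimal sketch of the reconfigure step listed in this commit, guarded because only `io.TextIOWrapper` streams expose `reconfigure()` (Python 3.7+); `ensure_utf8` is a hypothetical helper name:

```python
from __future__ import annotations

import io
import sys
from typing import TextIO


def ensure_utf8(stream: TextIO | None = None) -> TextIO:
    """Switch a text stream to UTF-8 output if it supports reconfigure()."""
    stream = sys.stdout if stream is None else stream
    # Only io.TextIOWrapper exposes reconfigure(); objects such as StringIO
    # or some test-capture wrappers are left untouched.
    if isinstance(stream, io.TextIOWrapper) and stream.encoding.lower() != "utf-8":
        stream.reconfigure(encoding="utf-8")
    return stream
```

Calling it once at the start of the `--stdout` path is enough; on platforms already using UTF-8 it is a no-op.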
…terns
- rules.json: 113 rules / 736 patterns → 293 production rules / 1,597 patterns (21 draft-maturity rules excluded; compact JSON format)
- atr.py docstring: corrected rule/pattern counts to match bundle
All 32 garak probe modules have bidirectional ATR coverage via metadata_provenance.garak_probe in each rule's YAML.
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
Resolves the test_docs.py::test_docs_detectors[atr] failure by adding the required documentation stub and linking it from detectors.rst. Verified locally: 12 atr-specific doc tests pass (848 total in test_docs.py).
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
Branch updated: 40b43c0 → e76b676
Rebased against the latest main and resolved the conflict. `docs/source/detectors.rst` was deleted upstream in favor of an auto-generated `docs/source/index_detectors.rst`, so I moved the ATR detector page to `docs/source/detectors/atr.rst` to match the new layout and added a corresponding toctree entry.
Covers PromptInjection, ToolPoisoning, PrivilegeEscalation, ExcessiveAutonomy with parametrized hit / no-hit cases, a None-output guard test, and a smoke test confirming AgentThreats loads all 1,586 patterns from the bundled rules.json.
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@jmartin-tech, I pushed one more commit (b24aba5): test(detectors): 24 test cases covering PromptInjection, ToolPoisoning, PrivilegeEscalation, and ExcessiveAutonomy, plus a None-output guard and a smoke test confirming AgentThreats loads all 1,586 patterns from the bundled rules.json. The tests follow the apikey detector test layout. All items from both review rounds should now be addressed:
Happy to rebase or adjust anything if there are remaining items.
@jmartin-tech @leondz, bumping for re-review. Each architectural point from your earlier passes has landed on the branch (current HEAD
- Use
- Load rules inside
- Extract bundle-creation utility to
- XDG fallback path in tool (jmartin-tech, 4/9) → applied. On XDG-resolution failure the tool prints the suggested error to stderr and exits with code 1; there is no silent relative-path fallback.
- Windows encoding in
DCO check is green (probot/dco status). All commits in this PR carry
Whenever the queue allows, I would appreciate a re-review pass; happy to address any remaining feedback in a follow-up commit.
@jmartin-tech @leondz, a quick refresh for whenever a review slot opens up. Since the 5/21 round, upstream ATR has shipped v3.1.0 (462 rules across 10 categories). The bundled
For related context only: microsoft/PyRIT#1715 (an adjacent red-team framework's ATR scorer integration) merged 2026-05-27. Earlier architectural feedback remains resolved (
…egory pattern count
Addresses review: replace the dynamic __name__-split + single-use filename constant with literal 'atr'/'rules.json' (matches the data_path pattern in probes/snowball.py); fix the AgentThreats docstring count to 1,597 patterns to match the bundled rules.json.
Signed-off-by: eeee2345 <217509886+eeee2345@users.noreply.github.com>
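Hardcoded counts in docstrings and smoke tests drift whenever the bundle is refreshed (compare the 1,586 figure in the test commit with the 1,597 in this one). A minimal sketch of deriving the counts from the bundle itself, assuming the `{category: [[rule_id, severity, pattern], ...]}` shape written by the sync code earlier in this thread; `bundle_stats` is a hypothetical helper:

```python
import json
from pathlib import Path


def bundle_stats(rules_file: Path) -> tuple[int, int, int]:
    """Return (categories, distinct rule IDs, patterns) for a rules bundle."""
    bundle: dict[str, list[list[str]]] = json.loads(rules_file.read_text(encoding="utf-8"))
    rule_ids = {entry[0] for entries in bundle.values() for entry in entries}
    patterns = sum(len(entries) for entries in bundle.values())
    return len(bundle), len(rule_ids), patterns
```

A smoke test could then compare the number of patterns a detector compiled against `bundle_stats(path)[2]` instead of a literal, so a bundle refresh never requires touching the test.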
Hi @leondz, ready for another look. Since the DCO note I've worked through the round-1 and round-2 feedback: switched to literal data_path components, extracted the tools/ helper, and added 24 detector tests across 4 categories. DCO is green. It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.
Hi @jmartin-tech, ready for another look when you have a moment. I've worked through the round-1 and round-2 feedback: extracted the helper methods into tools/, used literal data_path components, removed the subprocess use, and added 24 detector tests across 4 categories. DCO is green. It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.
@jmartin-tech @leondz, thanks for the thorough review, and apologies for the slow turnaround. All the points are addressed in the later commits (the GitHub review predates them, so most inline threads now show as outdated):
Could you take another pass when you have a moment? Happy to adjust anything else.
Summary
Adds ATR (Agent Threat Rules) detectors for AI agent-specific threats not covered by garak's existing injection/jailbreak detectors. Focuses on MCP tool poisoning, skill compromise, context exfiltration, and excessive autonomy.
What's included
Three files:
- `garak/detectors/atr.py`: 9 detector classes
- `garak/data/atr/rules.json`: 1,597 regex patterns from 293 production rules (bundled, no runtime dependency)
- `garak/tools/atr.py`: sync and rule-generation utilities (optional, separate from detector logic)
9 detector classes:
- `atr.AgentThreats`
- `atr.PromptInjection`
- `atr.ToolPoisoning`
- `atr.CredentialExfiltration`
- `atr.PrivilegeEscalation`
- `atr.SkillCompromise`
- `atr.ExcessiveAutonomy`
- `atr.AgentManipulation`
- `atr.DataPoisoning`
Why this belongs in garak
garak's existing detectors cover LLM-level threats (jailbreaks, known-bad signatures, encoding tricks). ATR covers agent-level threats specific to the MCP/tool-use ecosystem:
These are not redundant with existing garak detectors; they scan a different attack surface.
Usage
Provenance
ATR rules are MIT-licensed, community-driven, and adopted by Cisco AI Defense. The bundled `rules.json` is a static snapshot (293 production rules, 21 draft-maturity rules excluded). `tools/atr.py sync` pulls the latest rules from the ATR repo without waiting for a garak release.
Source: https://github.com/Agent-Threat-Rule/agent-threat-rules