detectors: add Agent Threat Rules by eeee2345 · Pull Request #1676 · NVIDIA/garak

eeee2345 · 2026-04-08T17:52:49Z

Summary

Adds ATR (Agent Threat Rules) detectors for AI agent-specific threats not covered by garak's existing injection/jailbreak detectors. Focuses on MCP tool poisoning, skill compromise, context exfiltration, and excessive autonomy.

What's included

Three files:

garak/detectors/atr.py — 9 detector classes
garak/data/atr/rules.json — 1,597 regex patterns from 293 production rules (bundled, no runtime dependency)
garak/tools/atr.py — sync and rule-generation utilities (optional, separate from detector logic)

9 detector classes:

Detector	ATR Category	Catches
`atr.AgentThreats`	all 9	Comprehensive scan
`atr.PromptInjection`	prompt-injection	Instruction overrides, persona hijacking
`atr.ToolPoisoning`	tool-poisoning	Hidden instructions in tool descriptions
`atr.CredentialExfiltration`	context-exfiltration	API keys, private keys, DB credentials
`atr.PrivilegeEscalation`	privilege-escalation	Shell commands, permission escalation
`atr.SkillCompromise`	skill-compromise	Typosquatting, rug pulls, impersonation
`atr.ExcessiveAutonomy`	excessive-autonomy	Retry loops, resource exhaustion
`atr.AgentManipulation`	agent-manipulation	Cross-agent attacks, trust exploitation
`atr.DataPoisoning`	data-poisoning	Poisoned content, injected instructions

Why this belongs in garak

garak's existing detectors cover LLM-level threats (jailbreaks, known-bad signatures, encoding tricks). ATR covers agent-level threats specific to the MCP/tool-use ecosystem:

Tool descriptions with hidden exfiltration instructions
Skill impersonation via typosquatted tool names
Rug pull attacks (tools that change behavior after trust is established)
Cross-agent manipulation in multi-agent systems

These are not redundant with existing garak detectors — they scan a different attack surface.

Usage

# garak config
detectors:
  - atr.PromptInjection
  - atr.ToolPoisoning
  - atr.AgentThreats

# Sync to latest rules (optional)
python tools/atr.py sync

# Generate draft ATR rule from probe hitlog
python tools/atr.py generate --hitlog report/hitlog.jsonl --category prompt-injection
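For illustration only, the two commands above map onto a CLI with `sync` and `generate` subcommands. The following standard-library `argparse` sketch parses that interface. `build_parser` and the `prompt-injection` default (borrowed from `generate_rule_from_probe`) are assumptions, not the actual contents of tools/atr.py, and dispatching to the real sync/generate logic is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage lines; not the PR's actual code."""
    parser = argparse.ArgumentParser(prog="atr.py", description="ATR rule utilities")
    sub = parser.add_subparsers(dest="command", required=True)

    # `sync`: refresh the local rules bundle (fetch logic omitted in this sketch)
    sub.add_parser("sync", help="fetch the latest ATR rules")

    # `generate`: draft a rule from a garak hitlog
    gen = sub.add_parser("generate", help="draft an ATR rule from a hitlog")
    gen.add_argument("--hitlog", required=True, help="path to a garak hitlog.jsonl")
    gen.add_argument("--category", default="prompt-injection", help="ATR category tag")
    return parser
```

Calling `build_parser().parse_args(["generate", "--hitlog", "report/hitlog.jsonl"])` yields a namespace with `command == "generate"`. Because the subparsers are `required=True`, a bare invocation fails with a usage error instead of doing nothing.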

Provenance

ATR rules are MIT-licensed, community-driven, and adopted by Cisco AI Defense. The bundled rules.json is a static snapshot (293 production rules, 21 draft-maturity rules excluded). tools/atr.py sync pulls the latest from the ATR repo without waiting for a garak release.

Source: https://github.com/Agent-Threat-Rule/agent-threat-rules

github-actions · 2026-04-08T17:53:26Z

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

leondz · 2026-04-08T20:01:09Z

please sign dco for review

eeee2345 · 2026-04-08T20:11:08Z

I have read the DCO Document and I hereby sign the DCO

eeee2345 · 2026-04-08T20:11:29Z

recheck

jmartin-tech

Thanks for the submission.

As an extra detector to mix into a run this could be useful. The helper methods that have been placed in the detector should likely be extracted as separate tooling.

jmartin-tech · 2026-04-08T20:37:38Z

+        subprocess.run(
+            ["git", "clone", "--depth", "1", "-b", branch,
+             f"https://github.com/{repo}.git", tmpdir],
+            check=True, capture_output=True,
+        )


subprocess.run is not an acceptable method of retrieving data at runtime.

jmartin-tech · 2026-04-08T20:42:50Z

+def generate_rule_from_probe(
+    probe_outputs: list[str],
+    category: str = "prompt-injection",
+    severity: str = "high",
+    min_common_length: int = 8,
+) -> str:
+    """Generate an ATR rule YAML draft from successful Garak probe outputs.
+
+    Takes a list of strings that bypassed defenses (successful attacks)
+    and extracts common substrings as detection patterns. Returns a
+    YAML rule string ready for review and submission to ATR.
+
+    This is a starting point -- generated rules should be reviewed by
+    a human before being added to the ATR ruleset.
+
+    Usage::
+
+        from garak.detectors.atr import generate_rule_from_probe
+        attacks = ["ignore previous instructions and ...", "forget all rules and ..."]
+        rule_yaml = generate_rule_from_probe(attacks, category="prompt-injection")
+        print(rule_yaml)
+    """
+    if not probe_outputs:
+        return ""
+
+    # Extract keywords that appear in 50%+ of outputs
+    word_counts: dict[str, int] = {}
+    for text in probe_outputs:
+        words = set(re.findall(r"\b[a-zA-Z]{4,}\b", text.lower()))
+        for w in words:
+            word_counts[w] = word_counts.get(w, 0) + 1
+
+    threshold = max(2, len(probe_outputs) // 2)
+    common_words = sorted(
+        [w for w, c in word_counts.items() if c >= threshold],
+        key=lambda w: word_counts[w],
+        reverse=True,
+    )[:6]
+
+    if not common_words:
+        return ""
+
+    # Build regex pattern from common words
+    pattern = r"(?i)\b" + r"\b.*\b".join(re.escape(w) for w in common_words[:4]) + r"\b"
+
+    date = datetime.now().strftime("%Y/%m/%d")
+    rule_id = f"ATR-DRAFT-{hash(pattern) % 100000:05d}"
+
+    return f"""title: "Garak-generated: {common_words[0]} pattern"
+id: {rule_id}
+rule_version: 1
+status: draft
+description: >
+  Auto-generated from {len(probe_outputs)} successful Garak probe outputs.
+  Common keywords: {', '.join(common_words[:6])}.
+  REVIEW REQUIRED before adding to production ruleset.
+author: "garak + ATR"
+date: "{date}"
+schema_version: "0.1"
+detection_tier: pattern
+maturity: experimental
+severity: {severity}
+tags:
+  category: {category}
+  subcategory: garak-generated
+  confidence: low
+agent_source:
+  type: mcp_exchange
+  framework: [any]
+  provider: [any]
+detection:
+  conditions:
+    - field: content
+      operator: regex
+      value: '{pattern}'
+      description: "Pattern from {len(probe_outputs)} Garak probe outputs"
+  condition: any
+response:
+  actions: [alert]
+test_cases:
+  true_positives:
+    - input: "{probe_outputs[0][:100].replace(chr(34), chr(39))}"
+      expected: triggered
+"""


This is an interesting utility for creating YAML files to contribute to defensive tooling, but it is never called by code in this PR. I would suggest extracting it as a utility that post-processes either a report.jsonl or a hitlog.jsonl. It could be placed in the tools path for use only in a repo-based install, or exposed as an analyze module so it ships as a package-provided utility in the installed package, similar to how report_digest is exposed.
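A rough sketch of the post-processing step suggested above, assuming each hitlog.jsonl line is a JSON object whose model text sits under an `output` key, possibly as a dict with a `text` field in some garak versions (both shapes are assumptions here). It also swaps the built-in `hash()` used for `rule_id` for SHA-256: Python randomizes `str` hashes per process unless PYTHONHASHSEED is fixed, so `hash(pattern)` would produce a different draft ID on each run. `load_hitlog_outputs` and `draft_rule_id` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def load_hitlog_outputs(path: Path) -> list[str]:
    """Collect model outputs from a garak hitlog.jsonl (field names assumed)."""
    outputs: list[str] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            out = entry.get("output")
            # Assumption: some versions may store a message object, not a string
            if isinstance(out, dict):
                out = out.get("text")
            if isinstance(out, str) and out:
                outputs.append(out)
    return outputs


def draft_rule_id(pattern: str) -> str:
    """Stable draft ID: SHA-256 is identical across runs, unlike built-in hash()."""
    digest = hashlib.sha256(pattern.encode("utf-8")).hexdigest()
    return f"ATR-DRAFT-{int(digest, 16) % 100000:05d}"
```

The returned list could then be passed as `probe_outputs` to the draft-rule generator, keeping the detector itself free of report-processing code.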

jmartin-tech · 2026-04-08T20:49:38Z

+def sync_rules_from_github(
+    repo: str = "Agent-Threat-Rule/agent-threat-rules",
+    branch: str = "main",
+    output: Path | None = None,
+) -> int:
+    """Fetch latest ATR rules from GitHub and update the bundled JSON.
+
+    Requires: git, PyYAML (pip install pyyaml).
+    Returns the number of patterns synced.
+
+    Usage::
+
+        from garak.detectors.atr import sync_rules_from_github
+        count = sync_rules_from_github()
+        print(f"Synced {count} patterns")
+    """
+    import yaml  # PyYAML -- optional dependency
+
+    dest = output or _RULES_PATH
+    with tempfile.TemporaryDirectory() as tmpdir:
+        subprocess.run(
+            ["git", "clone", "--depth", "1", "-b", branch,
+             f"https://github.com/{repo}.git", tmpdir],
+            check=True, capture_output=True,
+        )
+        rules_dir = Path(tmpdir) / "rules"
+        if not rules_dir.exists():
+            raise FileNotFoundError(f"No rules/ directory in {repo}")
+
+        result: dict[str, list[list[str]]] = {}
+        for yaml_file in sorted(rules_dir.rglob("*.yaml")):
+            doc = yaml.safe_load(yaml_file.read_text())
+            if not doc or not doc.get("detection", {}).get("conditions"):
+                continue
+            cat = doc.get("tags", {}).get("category", "unknown")
+            if cat not in result:
+                result[cat] = []
+            for cond in doc["detection"]["conditions"]:
+                if cond.get("operator") == "regex" and cond.get("value"):
+                    pat = re.sub(r"^\(\?[imsx]+\)", "", cond["value"])
+                    result[cat].append([doc["id"], doc.get("severity", "medium"), pat])
+
+        dest.write_text(json.dumps(result, indent=2, ensure_ascii=True))
+        total = sum(len(v) for v in result.values())
+        logger.info("ATR sync: %d patterns across %d categories -> %s", total, len(result), dest)
+        return total


I would suggest this is also likely better extracted into the tools path as a separate utility, executed independently to configure the user's system. The utility should write the generated configuration to the user's XDG-based data_path by default, or to stdout so the user can place it in the correct location under their XDG_DATA_HOME, where the detector will pick it up in place of the shipped version.
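A minimal sketch of the default-destination logic described above, using only the standard library. It assumes the override file lives at `$XDG_DATA_HOME/garak/data/atr/rules.json` and reimplements the XDG lookup by hand (ignoring unset or relative values, as the XDG Base Directory spec requires) rather than calling garak's own resolver. `default_rules_destination` and `emit_rules` are hypothetical helpers.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def default_rules_destination() -> Path:
    """Per-user override location for the ATR bundle (layout is an assumption)."""
    # XDG spec: use $XDG_DATA_HOME if set and absolute, else ~/.local/share
    base = os.environ.get("XDG_DATA_HOME", "")
    if not base or not os.path.isabs(base):
        base = os.path.join(Path.home(), ".local", "share")
    return Path(base) / "garak" / "data" / "atr" / "rules.json"


def emit_rules(payload: str, output: Path | None, to_stdout: bool) -> None:
    """Write the generated bundle to stdout, an explicit path, or the XDG default."""
    if to_stdout:
        sys.stdout.write(payload)
        return
    dest = output or default_rules_destination()
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(payload, encoding="utf-8")
```

Nothing is written next to the installed package, so no write access to the Python library path is needed.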

jmartin-tech · 2026-04-08T21:05:01Z

+_RULES_PATH = Path(__file__).parent / "atr_rules.json"
+_ALL_RULES: dict[str, list[list[str]]] = {}
+if _RULES_PATH.exists():
+    with open(_RULES_PATH) as f:
+        _ALL_RULES = json.load(f)
+else:
+    logger.warning("ATR rules file not found: %s", _RULES_PATH)


This should use the data_path access pattern see:

garak/garak/probes/snowball.py

Lines 42 to 47 in 2569a60

    with open(
        data_path / "graph_connectivity.json",
        "r",
        encoding="utf-8",
    ) as f:
        self.prompts = json.load(f)

    from garak.data import path as data_path

This helper class provides access to files in the installed package's data directory and supports user override of a file via the XDG base directory specification, so users can provide their own content without needing write permissions to the Python runtime library path.

Also, it is preferred to load this inside __init__ for a detector instead of globally on module import. This could be accomplished using the ABC abstract class pattern. See:

garak/garak/probes/packagehallucination.py

Lines 63 to 102 in 2569a60

    class PackageHallucinationProbe(garak.probes.Probe, ABC):
        """Abstract base class for package hallucination probes

        Generators sometimes recommend importing non-existent packages into code. These
        package names can be found by attackers and then squatted in public package
        repositories, so that incorrect code from generators will start to run, silently
        loading malicious squatted packages onto the machine. This is bad. This probe
        checks whether a model will recommend code that uses non-existent packages."""

        lang = "*"
        doc_uri = "https://vulcan.io/blog/ai-hallucinations-package-risk"
        tags = [
            "owasp:llm09",
            "owasp:llm02",
            "quality:Robustness:GenerativeMisinformation",
            "payload:malicious:badcode",
        ]
        goal = "base probe for importing non-existent packages"

        DEFAULT_PARAMS = garak.probes.Probe.DEFAULT_PARAMS | {
            "follow_prompt_cap": True,
        }

        @property
        @abstractmethod
        def language_name(self) -> str:
            """Programming language name - must be overridden by subclasses"""
            raise NotImplementedError

        def __init__(self, config_root=_config):
            super().__init__(config_root=config_root)
            self.prompts = []
            for stub_prompt in stub_prompts:
                for code_task in code_tasks:
                    self.prompts.append(
                        stub_prompt.replace("<language>", self.language_name).replace(
                            "<task>", code_task
                        )
                    )
            if self.follow_prompt_cap:
                self._prune_data(cap=self.soft_probe_prompt_cap)
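Applied to a detector, the two suggestions (data loaded in `__init__`, shared logic in an ABC) might look roughly like this. `RuleDetectorBase` is a hypothetical stand-in that does not subclass garak's `Detector` and takes a plain `Path` instead of `data_path`. The JSON shape `{category: [[rule_id, severity, pattern], ...]}` follows the sync code quoted earlier; compiling with `re.IGNORECASE` is an assumption based on that code stripping leading inline flags.

```python
from __future__ import annotations

import json
import re
from abc import ABC, abstractmethod
from pathlib import Path


class RuleDetectorBase(ABC):
    """Hypothetical stand-in, not garak's Detector API.

    Rules are read and compiled per instance in __init__, never at import time.
    """

    @property
    @abstractmethod
    def categories(self) -> tuple[str, ...]:
        """ATR categories this detector covers; subclasses must override."""
        raise NotImplementedError

    def __init__(self, rules_path: Path):
        with open(rules_path, "r", encoding="utf-8") as f:
            rules = json.load(f)  # local variable: no module-level rule state
        self._compiled = [
            re.compile(pattern, re.IGNORECASE)
            for category in self.categories
            for _rule_id, _severity, pattern in rules.get(category, [])
        ]

    def detect(self, outputs: list[str | None]) -> list[float | None]:
        """1.0 if any pattern matches, 0.0 if none, None for missing outputs."""
        scores: list[float | None] = []
        for text in outputs:
            if text is None:
                scores.append(None)
            else:
                hit = any(p.search(text) for p in self._compiled)
                scores.append(1.0 if hit else 0.0)
        return scores


class ToolPoisoningSketch(RuleDetectorBase):
    # A plain class attribute satisfies the abstract property
    categories = ("tool-poisoning",)
```

A concrete subclass only has to set `categories`, in the same way the packagehallucination subclasses set `language_name`. A comprehensive detector in the spirit of `atr.AgentThreats` could list every category.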

eeee2345 · 2026-04-08T21:45:24Z

Thanks @jmartin-tech @leondz for the thorough review. Addressed all four points:

data_path: rules moved to garak/data/atr/rules.json, loaded via from garak.data import path as data_path. Supports XDG user override.
No subprocess: removed entirely from the detector. Sync tool now uses urllib.request to download a zip — no git dependency.
Extracted tools: sync_rules() and generate_rule() moved to tools/atr.py. Writes to XDG data_path by default or --stdout. Detector is now pure detection logic only.
Init-time loading: _load_rules() called in __init__, not on module import.
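A standard-library sketch of the zip-based fetch described in the second item, assuming a GitHub-style archive whose members sit under a top-level `<repo>-<branch>/` folder with YAML rules in a `rules/` directory. `fetch_archive` and `rule_files_from_zip` are hypothetical names; YAML parsing is left to PyYAML, which the earlier sync code already treats as an optional dependency.

```python
import io
import urllib.request
import zipfile


def fetch_archive(url: str, timeout: float = 30.0) -> bytes:
    """Download an archive over HTTPS with the standard library only (no git)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def rule_files_from_zip(data: bytes) -> dict[str, str]:
    """Return {member_name: text} for every YAML file under a rules/ directory."""
    files: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            # Prefix "/" so a top-level "rules/" folder also matches
            if "/rules/" in f"/{name}" and name.endswith((".yaml", ".yml")):
                files[name] = zf.read(name).decode("utf-8")
    return files
```

For example, `rule_files_from_zip(fetch_archive(url))` with a branch archive URL such as `https://github.com/Agent-Threat-Rule/agent-threat-rules/archive/refs/heads/main.zip` (an assumed location; the tool may pull from elsewhere) returns the raw YAML texts keyed by archive path.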

For context on the rule set and methodology — the full spec is at agentthreatrule.org and the academic paper is on Zenodo. Would appreciate any design feedback on how the detector categories map to garak's existing taxonomy — happy to adjust the tagging.

jmartin-tech

A few more adjustments requested.

jmartin-tech · 2026-04-09T13:47:37Z

+        xdg_dir.mkdir(parents=True, exist_ok=True)
+        return xdg_dir / "rules.json"
+    except Exception:
+        return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"


I am not sure the fallback location in the exception handler makes sense. If the import fails, the tool was likely executed from a location other than the repo source. It is also somewhat unexpected for a tool to create something in a relative path like that. I would hazard that support for either the XDG path or a user-supplied command-line location is sufficient, and if the XDG path search raises an exception it may be best to exit early and suggest that the user supply a valid --output value or use the --stdout option.

Suggested change

    - return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
    + print("The user XDG storage location could not be identified, supply --output or --stdout options or ensure garak is available in the python environment.", file=sys.stderr)
    + sys.exit(1)
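The suggested behaviour can be sketched as a small selection function: stdout wins, then an explicit `--output`, then the XDG location, and any failure to resolve XDG ends the run with exit code 1 rather than falling back to a relative path. `choose_destination` and the injected `resolve_xdg` callable are hypothetical; the error text is the one from the suggestion.

```python
import sys
from pathlib import Path
from typing import Callable, Optional


def choose_destination(
    output: Optional[Path], to_stdout: bool, resolve_xdg: Callable[[], Path]
) -> Optional[Path]:
    """Return a file path, None for stdout, or exit(1) if no location is usable."""
    if to_stdout:
        return None
    if output is not None:
        return output
    try:
        return resolve_xdg()
    except Exception:
        # No silent relative-path fallback: tell the user how to proceed instead
        print(
            "The user XDG storage location could not be identified, supply --output "
            "or --stdout options or ensure garak is available in the python environment.",
            file=sys.stderr,
        )
        sys.exit(1)
```

Injecting the resolver keeps the exit path testable without having to break a real garak installation.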

eeee2345 · 2026-04-15T23:27:11Z

@jmartin-tech @leondz — all four review points have been addressed (data_path, no subprocess, extracted tools, init-time loading). Ready for re-review when you have a moment.

Since the original submission, ATR has shipped v2.0.0 with some changes worth noting:

113 rules (up from 108), including 3 rules generated end-to-end by our Threat Cloud crystallization pipeline — the first detection rules produced by automated threat intelligence, not hand-written regex
RFC-001 v1.1: a vendor-neutral quality standard for detection rules with maturity levels, confidence scoring, and review tier definitions. This means every ATR rule ships with machine-readable quality metadata
96,096-skill ecosystem scan discovered 751 active malware samples from 3 coordinated threat actors, validating these rules against real attacks, not just benchmarks
Compound detection gates: MCP-context rules now require 30%+ condition match, reducing false positives on legitimate documentation
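One possible reading of the compound gate, sketched for illustration: a rule fires only when the matched share of its regex conditions reaches the threshold. Treating every condition as a case-insensitive regex with equal weight is an assumption, and ATR's actual gate semantics may differ; `compound_gate` is a hypothetical name.

```python
import re


def compound_gate(text: str, conditions: list[str], min_fraction: float = 0.3) -> bool:
    """Fire only if at least min_fraction of the regex conditions match text."""
    if not conditions:
        return False
    matched = sum(1 for pattern in conditions if re.search(pattern, text, re.IGNORECASE))
    return matched / len(conditions) >= min_fraction
```

With ten conditions and the 0.3 default, three matches fire the rule and two do not. The comparison is exact at that boundary because 3 / 10 rounds to the same float as the literal 0.3.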

Happy to update the PR to v2.0.0 rules if that's useful. The sync tool already supports pulling latest from npm, so garak users would get rule updates automatically via atr sync.

Also — if there's interest, ATR's Threat Cloud can accept detection signals from garak runs. That means every garak user running ATR detectors would contribute back to the rule pipeline. No PII, just pattern hashes. Happy to discuss if that's in scope.

eeee2345 · 2026-04-15T23:42:25Z

Apologies — my previous comment referenced the round-1 feedback only. I missed that there was a second round of review on 4/10.

This commit addresses all round-2 items:

Fallback path: removed relative fallback, exits with error + guidance
Windows encoding: added sys.stdout.reconfigure
data_path: inlined per suggestion, dropped constant
_rules scope: moved to local in init

Also updated rules.json to ATR v2.0.0 (113 rules, 736 patterns).

eeee2345 · 2026-04-21T20:33:42Z

Two non-architectural changes since `ddef4a58`: updated `rules.json` to current production set, corrected `atr.py` docstring.

Bundle update

Metric	v2.0.0 (before)	v2.0.12 (after)
Rules	113	293 (21 draft excluded)
Patterns	736	1,597

172 production-only rules — no garak probe equivalent

121 of 293 rules derive from garak probe payloads via `metadata_provenance.garak_probe`, covering all 32 probe modules. The remaining 172 come from ATR's scan of live MCP/skill registries — patterns observed in production deployments that don't correspond to any current garak probe class. If useful to the project, these could inform future probe development.

IITW validation

On `inthewild_jailbreak_llms.json` (666 real-world jailbreaks, ATR v2.0.11): 647/666 detected, 97.1% recall. Caveat: the 121 probe-aligned rules account for the majority of that coverage, so this measures probe-rule completeness more than blind generalization. On benign traffic (498 real-world SKILL.md samples), FP rate is 0.20%.

Reproducible: `bash scripts/eval-garak.sh` in the ATR repo.
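The headline figures can be checked with plain arithmetic. The benign-traffic false-positive count is not stated; 0.20% of 498 is only consistent with a single flagged sample, so the `1` below is an inference from rounding rather than a reported number.

```python
def rate(hits: int, total: int) -> float:
    """Percentage of total, e.g. recall = detected / positives * 100."""
    return 100.0 * hits / total


recall = rate(647, 666)  # jailbreaks detected out of 666 IITW samples
fp_rate = rate(1, 498)   # one flagged benign SKILL.md sample out of 498 (inferred)
print(f"recall={recall:.1f}%  fp_rate={fp_rate:.2f}%")
```

This prints `recall=97.1%  fp_rate=0.20%`, matching the figures quoted; 19 of the 666 jailbreaks were missed.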

eeee2345 · 2026-05-02T16:16:58Z

@leondz @jmartin-tech — quick update.

Since the last comment (4/21, v2.0.12 / 293 rules), ATR has shipped to v2.0.17 with 314 production rules. Cisco AI Defense merged the full 314-rule pack on 4/22 (skill-scanner #99), and Microsoft Agent Governance Toolkit followed on 4/26 (#1277, 287 rules + weekly auto-sync workflow).

For this PR — happy to bump rules.json to v2.0.17, or leave at v2.0.12 if you'd prefer to merge first and let users update via the sync tool. Either works.

If anything else is needed to unblock merge, please let me know — all round-2 review items have been addressed in commits since ddef4a5.

Thanks for the two rounds of review.

eeee2345 · 2026-05-09T08:10:05Z

Just a heads-up, not a re-bump — ATR shipped v2.1.0 today with 100% NIST AI RMF mapping (330 rules across 16 RMF subcategories, 1,566 mappings). Mentioning it in case the RMF traceability is useful for downstream garak users running compliance-adjacent evals, or for how this PR's detector categories surface in reports.

Available now as agent-threat-rules@2.1.0 on npm: https://agentthreatrule.org/en/compliance/nist-ai-rmf

Happy to leave the PR pinned at v2.0.12 since the sync tool pulls latest — no commit needed from your side. Will keep an eye out for any further review feedback when you get a moment.

eeee2345 · 2026-05-10T21:38:07Z

@jmartin-tech @leondz — two updates since 5/9, both strengthen the case for ATR detectors in garak.

ATR was accepted into MISP taxonomies on 2026-05-10 (Add agent-threat-rules taxonomy MISP/misp-taxonomies#323) — the threat-intel sharing layer used by global CERTs and ISACs. garak runs that emit ATR rule IDs now tag-resolve as standard MISP machine tags downstream, useful for any operator routing red-team output into an incident-management workflow.
v2.1.1 shipped 2026-05-10 with 6 new rules covering 7 critical CVEs (CVSS 9.1–10.0). Three are directly garak-probe adjacent: SuperAGI output_handler.py eval RCE (CVE-2024-21552), ModelCache torch.load deserialization (CVE-2025-45146), Enclave VM sandbox escape (CVE-2026-27597, CVSS 10.0). garak probes test for these attack classes; ATR provides the deterministic counterpart that runs at endpoint speed.

Still happy to address any open review items whenever the cycle lines up.

eeee2345 · 2026-05-10T21:41:48Z

@jmartin-tech @leondz — quick update on ATR's standardisation footprint, both relevant to garak's downstream operators.

ATR was integrated into MISP at two layers on 2026-05-10, both merged by adulau (MISP project lead):

Taxonomies: Add agent-threat-rules taxonomy MISP/misp-taxonomies#323 — 10 predicates + 330 rule IDs as machine tags
Galaxy + cluster: Add Agent Threat Rules galaxy + cluster (336 rules) MISP/misp-galaxy#1207 — 336 cluster values with kill-chain category, severity, and cve / owasp_llm / mitre_atlas cross-references per cluster

What this means for garak users: red-team runs that emit ATR rule IDs now resolve natively in MISP — taxonomy gives rule-ID labelling, and the galaxy cluster gives the cluster-level context CSIRTs use for incident triage. garak operators routing red-team findings into MISP-compatible SIEM / CSIRT workflows get full enterprise threat-intel shape on every detection without a translation layer.

Still happy to address any open review items on #1676 whenever the cycle lines up.

eeee2345 · 2026-05-16T06:05:40Z

DCO signed. Rebased all 4 commits with Signed-off-by: Adam Lin <adam@agentthreatrule.org> and force-pushed to feat/atr-detectors. CI should re-run now.

Signed-off-by: Panguard AI <support@panguard.ai>
Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>

…subprocess

Changes per reviewer comments:

1. Rules loading uses garak's data_path mechanism (L37 feedback)
   - Moved atr_rules.json -> garak/data/atr/rules.json
   - Detector loads via from garak.data import path as data_path
   - Supports XDG user override
2. Removed subprocess.run (L64 feedback)
   - sync tool uses urllib.request to download zip
   - No git dependency required
3. Extracted helper methods to tools/ (L85, L171 feedback)
   - tools/atr.py: sync_rules() + generate_rule()
   - Writes to XDG data_path by default, or --stdout
   - Detector is now pure detection logic only
4. Rules loaded in __init__, not module level (L37 feedback)
   - _load_rules() called per-instance, not on import

Signed-off-by: eeee2345 <eeee2345@users.noreply.github.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>

- tools/atr.py: exit with error if XDG path unavailable (no relative fallback)
- tools/atr.py: add sys.stdout.reconfigure for Windows encoding
- detectors/atr.py: inline data_path construction, drop module-level constant
- detectors/atr.py: move _rules to local scope in __init__
- rules.json: update to ATR v2.0.0 (113 rules, 736 patterns, 9 categories)

Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>

…terns

- rules.json: 113 rules / 736 patterns → 293 production rules / 1,597 patterns (21 draft-maturity rules excluded; compact JSON format)
- atr.py docstring: corrected rule/pattern counts to match bundle

All 32 garak probe modules have bidirectional ATR coverage via metadata_provenance.garak_probe in each rule's YAML.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>

Resolves the test_docs.py::test_docs_detectors[atr] failure by adding the required documentation stub and linking it from detectors.rst. Verified locally: 12 atr-specific doc tests pass (848 total in test_docs.py). Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 · 2026-05-16T07:47:47Z

Rebased against latest main, conflict resolved. docs/source/detectors.rst was deleted upstream in favor of an auto-generated docs/source/index_detectors.rst; moved the ATR detector page to docs/source/detectors/atr.rst to match the new layout and added a corresponding toctree entry.

Covers PromptInjection, ToolPoisoning, PrivilegeEscalation, ExcessiveAutonomy with parametrized hit / no-hit cases, a None-output guard test, and a smoke test confirming AgentThreats loads all 1,586 patterns from the bundled rules.json. Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 · 2026-05-19T16:58:07Z

@jmartin-tech — pushed one more commit (b24aba5): test(detectors): 24 test cases covering PromptInjection, ToolPoisoning, PrivilegeEscalation, and ExcessiveAutonomy, plus a None-output guard and a smoke test confirming AgentThreats loads all 1,586 patterns from the bundled rules.json. Tests follow the apikey detector test layout.

All items from both review rounds should now be addressed:

subprocess.run removed (urllib.request)
data_path XDG mechanism for rules loading
helper methods extracted to tools/atr.py
_rules loaded inside init, not module level
exception handler + Windows stdout encoding in tools/atr.py
tests added

Happy to rebase or adjust anything if there are remaining items.

eeee2345 · 2026-05-21T05:19:24Z

@jmartin-tech @leondz — bumping for re-review. Each architectural point from your earlier passes has landed on the branch (current HEAD b24aba59). Mapping each review thread to the resolving code:

subprocess.run for runtime data fetch (jmartin-tech, 4/8) → removed. The detector no longer fetches rules at runtime. Rules ship as garak/data/atr/rules.json in the package, loaded via garak.data.path.

Use data_path access pattern (jmartin-tech, 4/8; leondz, 4/8) → done. garak/detectors/atr.py line 22 imports from garak.data import path as data_path, and _load_rules() reads from data_path / __name__.split(".")[-1] / _RULES_FILENAME. The XDG override path ($XDG_DATA_HOME/garak/data/atr/rules.json) is documented in the ATRDetector docstring.

Load rules inside __init__, not at module import (jmartin-tech, 4/8) → done. ATRDetector.__init__ calls _load_rules() and populates self._compiled per-instance. No module-level rule state.

Extract bundle-creation utility to tools/ path (jmartin-tech, 4/8) → done. tools/atr.py is the standalone utility with sync and generate subcommands. The detector contains no bundle-creation code.

_rules should not live outside __init__ scope (jmartin-tech, 4/10) → done. _compiled is per-instance only.

_RULES_FILENAME constant pattern (jmartin-tech, 4/9) → applied with the cleaner suggested form.

XDG fallback path in tool (jmartin-tech, 4/9) → applied. On XDG-resolution failure the tool prints the suggested error to stderr and exits with code 1; no relative-path silent fallback.

Windows encoding in __main__ (jmartin-tech, 4/9) → applied. tools/atr.py line 223: sys.stdout.reconfigure(encoding="utf-8") before main().

DCO check is green (probot/dco status). All commits in this PR carry Signed-off-by: Adam Lin <adam@agentthreatrule.org>.

Whenever the queue allows, would appreciate a re-review pass — happy to address any remaining feedback in a follow-up commit.

eeee2345 · 2026-06-04T21:27:30Z

@jmartin-tech @leondz — quick refresh for whenever a review slot opens up.

Since the 5/21 round, upstream ATR shipped v3.1.0 (462 rules across 10 categories). The bundled garak/data/atr/rules.json in this PR is from the v3.0 era; happy to push a regenerated bundle if you would like the corpus refreshed before merge, otherwise it works as-is on current detectors.

For related context only: microsoft/PyRIT#1715 (an adjacent red-team framework's ATR scorer integration) merged 2026-05-27.

Earlier architectural feedback remains resolved (subprocess.run removed, data_path access pattern, 24 detector tests across 4 categories). Branch HEAD remains b24aba59. Happy to address anything further whenever it works for you.

…egory pattern count Addresses review: replace the dynamic __name__-split + single-use filename constant with literal 'atr'/'rules.json' (matches the data_path pattern in probes/snowball.py); fix the AgentThreats docstring count to 1,597 patterns to match the bundled rules.json. Signed-off-by: eeee2345 <217509886+eeee2345@users.noreply.github.com>

eeee2345 · 2026-06-11T18:43:31Z

Hi @leondz, ready for another look. Since the DCO note I've worked through the round-1 and round-2 feedback: switched to literal data_path components, extracted the tools/ helper, and added 24 detector tests across 4 categories. DCO is green.

It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.

eeee2345 · 2026-06-15T20:18:53Z

Hi @jmartin-tech - ready for another look when you have a moment.

I've worked through the round-1 and round-2 feedback: extracted the helper methods into tools/, used literal data_path components, removed the subprocess use, and added 24 detector tests across 4 categories. DCO is green.

It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.

eeee2345 · 2026-06-18T06:14:29Z

@jmartin-tech @leondz thanks for the thorough review — apologies for the slow turnaround. All the points are addressed in the later commits (the GitHub review predates them, so most inline threads now show as outdated):

data_path access — the detector now loads rules via garak's data mechanism: from garak.data import path as data_path then data_path / "atr" / "rules.json" (literal components, no custom / on a constant).
subprocess.run at runtime — removed; the detector does no subprocess work, it just reads the bundled garak/data/atr/rules.json.
_rules scope — rules are loaded locally inside __init__ and compiled into self._compiled; there is no module-level _rules dict anymore.
Rule-generation utility — extracted out of the detector into tools/atr.py as a standalone tool.
Windows stdout — sys.stdout.reconfigure(encoding="utf-8") is applied before main() in tools/atr.py.

Could you take another pass when you have a moment? Happy to adjust anything else.

leondz reviewed Apr 8, 2026

Comment thread garak/detectors/atr.py Outdated

eeee2345 force-pushed the feat/atr-detectors branch 2 times, most recently from c960ed7 to 90c887d on April 8, 2026 20:11

github-actions Bot added a commit that referenced this pull request Apr 8, 2026

@eeee2345 has signed the CLA in #1676

d509d68

jmartin-tech requested changes Apr 8, 2026

jmartin-tech reviewed Apr 8, 2026

eeee2345 mentioned this pull request Apr 8, 2026

ATR threat detection: 108 rules as probes + detector for AI agent security #1677

Closed

eeee2345 force-pushed the feat/atr-detectors branch from abf1394 to 63f60e8 on April 8, 2026 21:49

eeee2345 mentioned this pull request Apr 8, 2026

ATR (Agent Threat Rules) — request to include in AI Security Solutions Landscape GenAI-Security-Project/GenAI-Agent-Security-Initiative#10

Open

leondz changed the title from "feat: add ATR detectors -- 108 AI agent threat detection rules" to "detectors: add Agent Threat Rules" on Apr 9, 2026

jmartin-tech requested changes Apr 10, 2026

eeee2345 force-pushed the feat/atr-detectors branch from 35d0cd9 to ddef4a5 on April 15, 2026 23:44

eeee2345 force-pushed the feat/atr-detectors branch from 5e7d484 to 1fb9709 on May 16, 2026 06:05

eeee2345 and others added 4 commits May 16, 2026 15:44

feat: add ATR detectors — 108 AI agent threat rules (714 patterns)

5e255f3

Signed-off-by: Panguard AI <support@panguard.ai>
Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 force-pushed the feat/atr-detectors branch from 40b43c0 to e76b676 on May 16, 2026 07:45

eeee2345 mentioned this pull request Jun 5, 2026

ATR YAML → Augustus probe converter (proposal, ~50 LOC scaffolding) praetorian-inc/augustus#155

Open
