
detectors: add Agent Threat Rules #1676

Open
eeee2345 wants to merge 7 commits into NVIDIA:main from eeee2345:feat/atr-detectors

Conversation

@eeee2345

@eeee2345 eeee2345 commented Apr 8, 2026


Summary

Adds ATR (Agent Threat Rules) detectors for AI agent-specific threats not covered by garak's existing injection/jailbreak detectors. Focuses on MCP tool poisoning, skill compromise, context exfiltration, and excessive autonomy.

What's included

Three files:

  • garak/detectors/atr.py — 9 detector classes
  • garak/data/atr/rules.json — 1,597 regex patterns from 293 production rules (bundled, no runtime dependency)
  • garak/tools/atr.py — sync and rule-generation utilities (optional, separate from detector logic)

9 detector classes:

| Detector | ATR Category | Catches |
| --- | --- | --- |
| atr.AgentThreats | all 9 | Comprehensive scan |
| atr.PromptInjection | prompt-injection | Instruction overrides, persona hijacking |
| atr.ToolPoisoning | tool-poisoning | Hidden instructions in tool descriptions |
| atr.CredentialExfiltration | context-exfiltration | API keys, private keys, DB credentials |
| atr.PrivilegeEscalation | privilege-escalation | Shell commands, permission escalation |
| atr.SkillCompromise | skill-compromise | Typosquatting, rug pulls, impersonation |
| atr.ExcessiveAutonomy | excessive-autonomy | Retry loops, resource exhaustion |
| atr.AgentManipulation | agent-manipulation | Cross-agent attacks, trust exploitation |
| atr.DataPoisoning | data-poisoning | Poisoned content, injected instructions |

Why this belongs in garak

garak's existing detectors cover LLM-level threats (jailbreaks, known-bad signatures, encoding tricks). ATR covers agent-level threats specific to the MCP/tool-use ecosystem:

  • Tool descriptions with hidden exfiltration instructions
  • Skill impersonation via typosquatted tool names
  • Rug pull attacks (tools that change behavior after trust is established)
  • Cross-agent manipulation in multi-agent systems

These are not redundant with existing garak detectors — they scan a different attack surface.

Usage

# garak config
detectors:
  - atr.PromptInjection
  - atr.ToolPoisoning
  - atr.AgentThreats

# Sync to latest rules (optional)
python tools/atr.py sync

# Generate draft ATR rule from probe hitlog
python tools/atr.py generate --hitlog report/hitlog.jsonl --category prompt-injection

Provenance

ATR rules are MIT-licensed, community-driven, and adopted by Cisco AI Defense. The bundled rules.json is a static snapshot (293 production rules, 21 draft-maturity rules excluded). tools/atr.py sync pulls the latest from the ATR repo without waiting for a garak release.

Source: https://github.com/Agent-Threat-Rule/agent-threat-rules

@github-actions

github-actions Bot commented Apr 8, 2026

Contributor

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

@leondz

leondz commented Apr 8, 2026

Collaborator

please sign dco for review

Comment thread garak/detectors/atr.py Outdated
@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch 2 times, most recently from c960ed7 to 90c887d Compare April 8, 2026 20:11
@eeee2345

eeee2345 commented Apr 8, 2026

Author

I have read the DCO Document and I hereby sign the DCO

github-actions Bot added a commit that referenced this pull request Apr 8, 2026
@eeee2345

eeee2345 commented Apr 8, 2026

Author

recheck

@jmartin-tech jmartin-tech left a comment

Collaborator

Thanks for submission.

As an extra detector to mix into a run this could be useful. The helper methods that have been placed in the detector should likely be extracted as separate tooling.

Comment thread garak/detectors/atr.py Outdated
Comment on lines +60 to +64
subprocess.run(
    ["git", "clone", "--depth", "1", "-b", branch,
     f"https://github.com/{repo}.git", tmpdir],
    check=True, capture_output=True,
)

Collaborator

subprocess.run is not an acceptable method of retrieving data at runtime.

Comment thread garak/detectors/atr.py Outdated
Comment on lines +88 to +171
def generate_rule_from_probe(
    probe_outputs: list[str],
    category: str = "prompt-injection",
    severity: str = "high",
    min_common_length: int = 8,
) -> str:
    """Generate an ATR rule YAML draft from successful Garak probe outputs.

    Takes a list of strings that bypassed defenses (successful attacks)
    and extracts common substrings as detection patterns. Returns a
    YAML rule string ready for review and submission to ATR.

    This is a starting point -- generated rules should be reviewed by
    a human before being added to the ATR ruleset.

    Usage::

        from garak.detectors.atr import generate_rule_from_probe
        attacks = ["ignore previous instructions and ...", "forget all rules and ..."]
        rule_yaml = generate_rule_from_probe(attacks, category="prompt-injection")
        print(rule_yaml)
    """
    if not probe_outputs:
        return ""

    # Extract keywords that appear in 50%+ of outputs
    word_counts: dict[str, int] = {}
    for text in probe_outputs:
        words = set(re.findall(r"\b[a-zA-Z]{4,}\b", text.lower()))
        for w in words:
            word_counts[w] = word_counts.get(w, 0) + 1

    threshold = max(2, len(probe_outputs) // 2)
    common_words = sorted(
        [w for w, c in word_counts.items() if c >= threshold],
        key=lambda w: word_counts[w],
        reverse=True,
    )[:6]

    if not common_words:
        return ""

    # Build regex pattern from common words
    pattern = r"(?i)\b" + r"\b.*\b".join(re.escape(w) for w in common_words[:4]) + r"\b"

    date = datetime.now().strftime("%Y/%m/%d")
    rule_id = f"ATR-DRAFT-{hash(pattern) % 100000:05d}"

    return f"""title: "Garak-generated: {common_words[0]} pattern"
id: {rule_id}
rule_version: 1
status: draft
description: >
  Auto-generated from {len(probe_outputs)} successful Garak probe outputs.
  Common keywords: {', '.join(common_words[:6])}.
  REVIEW REQUIRED before adding to production ruleset.
author: "garak + ATR"
date: "{date}"
schema_version: "0.1"
detection_tier: pattern
maturity: experimental
severity: {severity}
tags:
  category: {category}
  subcategory: garak-generated
  confidence: low
agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
detection:
  conditions:
    - field: content
      operator: regex
      value: '{pattern}'
      description: "Pattern from {len(probe_outputs)} Garak probe outputs"
  condition: any
response:
  actions: [alert]
test_cases:
  true_positives:
    - input: "{probe_outputs[0][:100].replace(chr(34), chr(39))}"
      expected: triggered
"""

Collaborator

This is an interesting utility for creating yaml files to contribute to defensive tooling. It is never called by code in this PR. I would suggest it should be extracted as a utility that could be used to post process either a report.jsonl or hitlog.jsonl. I could see this being placed in the tools path for use only in a repo based install or exposed as an analyze module to be available as a package provided utility shipped as part of installed package similar to how report_digest is exposed.
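As a rough sketch of the post-processing utility suggested here, the snippet below reads a JSONL file line by line and collects one string field per record. The field name `output` and both helper names are assumptions for illustration, not garak's documented hitlog schema or code from this PR. It also derives the draft rule ID with `hashlib` instead of the built-in `hash()`, because Python salts string hashes per process, so the quoted `hash(pattern)` would yield a different ID on every run.

```python
import hashlib
import json
from pathlib import Path


def read_outputs(hitlog: Path, field: str = "output") -> list[str]:
    """Collect one string field from each JSONL record (field name is an assumption)."""
    outputs = []
    with open(hitlog, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            value = json.loads(line).get(field)
            if isinstance(value, str):
                outputs.append(value)
    return outputs


def stable_draft_id(pattern: str) -> str:
    """Derive a repeatable 5-digit draft ID from the pattern text."""
    digest = hashlib.sha256(pattern.encode("utf-8")).hexdigest()
    return f"ATR-DRAFT-{int(digest, 16) % 100000:05d}"
```

The list returned by `read_outputs` could then be passed to a rule generator such as the one quoted above.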

Comment thread garak/detectors/atr.py Outdated
Comment on lines +40 to +85
def sync_rules_from_github(
    repo: str = "Agent-Threat-Rule/agent-threat-rules",
    branch: str = "main",
    output: Path | None = None,
) -> int:
    """Fetch latest ATR rules from GitHub and update the bundled JSON.

    Requires: git, PyYAML (pip install pyyaml).
    Returns the number of patterns synced.

    Usage::

        from garak.detectors.atr import sync_rules_from_github
        count = sync_rules_from_github()
        print(f"Synced {count} patterns")
    """
    import yaml  # PyYAML -- optional dependency

    dest = output or _RULES_PATH
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.run(
            ["git", "clone", "--depth", "1", "-b", branch,
             f"https://github.com/{repo}.git", tmpdir],
            check=True, capture_output=True,
        )
        rules_dir = Path(tmpdir) / "rules"
        if not rules_dir.exists():
            raise FileNotFoundError(f"No rules/ directory in {repo}")

        result: dict[str, list[list[str]]] = {}
        for yaml_file in sorted(rules_dir.rglob("*.yaml")):
            doc = yaml.safe_load(yaml_file.read_text())
            if not doc or not doc.get("detection", {}).get("conditions"):
                continue
            cat = doc.get("tags", {}).get("category", "unknown")
            if cat not in result:
                result[cat] = []
            for cond in doc["detection"]["conditions"]:
                if cond.get("operator") == "regex" and cond.get("value"):
                    pat = re.sub(r"^\(\?[imsx]+\)", "", cond["value"])
                    result[cat].append([doc["id"], doc.get("severity", "medium"), pat])

    dest.write_text(json.dumps(result, indent=2, ensure_ascii=True))
    total = sum(len(v) for v in result.values())
    logger.info("ATR sync: %d patterns across %d categories -> %s", total, len(result), dest)
    return total

Collaborator

I would suggest this is also likely better extracted into the tools path as a separate utility to be executed independently to configure the user's system. The utility should likely write the generated configuration to the user's XDG-based data_path by default, or to stdout, so the user can place it in the correct location in their XDG_DATA_HOME for the detector to pick it up in place of the shipped version.
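To illustrate the lookup order described here, the following is a minimal sketch that prefers a user-supplied file under the XDG data home and falls back to the packaged copy. The function name `resolve_rules_path` and its parameters are hypothetical, and garak's own `data_path` helper may resolve files differently; the override location mirrors the `$XDG_DATA_HOME/garak/data/atr/rules.json` path mentioned later in this thread.

```python
import os
from pathlib import Path


def resolve_rules_path(packaged_dir: Path, environ=os.environ) -> Path:
    """Prefer a user-supplied rules file under the XDG data home, else the packaged one."""
    # XDG Base Directory spec: data home defaults to ~/.local/share when unset or empty
    xdg_home = environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    override = Path(xdg_home) / "garak" / "data" / "atr" / "rules.json"
    if override.is_file():
        return override  # user copy wins; no write access to site-packages needed
    return packaged_dir / "atr" / "rules.json"
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real `XDG_DATA_HOME`.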

Comment thread garak/detectors/atr.py Outdated
Comment on lines +31 to +37
_RULES_PATH = Path(__file__).parent / "atr_rules.json"
_ALL_RULES: dict[str, list[list[str]]] = {}
if _RULES_PATH.exists():
    with open(_RULES_PATH) as f:
        _ALL_RULES = json.load(f)
else:
    logger.warning("ATR rules file not found: %s", _RULES_PATH)

@jmartin-tech jmartin-tech Apr 8, 2026

Collaborator

This should use the data_path access pattern, see:

    with open(
        data_path / "graph_connectivity.json",
        "r",
        encoding="utf-8",
    ) as f:
        self.prompts = json.load(f)

    from garak.data import path as data_path

This helper class provides access to files in the installed package's data directory and supports user override of the file via the XDG base directory specification, so users can provide their own content without needing write permissions to the Python runtime library path.

Also, it is preferred to load this inside of __init__ for a detector instead of globally on module import. This could be accomplished using the ABC abstract class pattern. See:

class PackageHallucinationProbe(garak.probes.Probe, ABC):
    """Abstract base class for package hallucination probes

    Generators sometimes recommend importing non-existent packages into code. These
    package names can be found by attackers and then squatted in public package
    repositories, so that incorrect code from generators will start to run, silently
    loading malicious squatted packages onto the machine. This is bad. This probe
    checks whether a model will recommend code that uses non-existent packages."""

    lang = "*"
    doc_uri = "https://vulcan.io/blog/ai-hallucinations-package-risk"
    tags = [
        "owasp:llm09",
        "owasp:llm02",
        "quality:Robustness:GenerativeMisinformation",
        "payload:malicious:badcode",
    ]
    goal = "base probe for importing non-existent packages"
    DEFAULT_PARAMS = garak.probes.Probe.DEFAULT_PARAMS | {
        "follow_prompt_cap": True,
    }

    @property
    @abstractmethod
    def language_name(self) -> str:
        """Programming language name - must be overridden by subclasses"""
        raise NotImplementedError

    def __init__(self, config_root=_config):
        super().__init__(config_root=config_root)
        self.prompts = []
        for stub_prompt in stub_prompts:
            for code_task in code_tasks:
                self.prompts.append(
                    stub_prompt.replace("<language>", self.language_name).replace(
                        "<task>", code_task
                    )
                )
        if self.follow_prompt_cap:
            self._prune_data(cap=self.soft_probe_prompt_cap)
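Applied to a regex rule detector, the two requests above (read via a data directory, load inside `__init__`) might look roughly like the sketch below. It is a standalone stand-in and deliberately does not subclass garak's detector base class; the class names, the `score` method, and the use of `re.IGNORECASE` are assumptions. The JSON shape `{category: [[rule_id, severity, pattern], ...]}` follows the sync code quoted earlier in this review.

```python
import json
import re
from pathlib import Path
from typing import Optional


class RegexRuleDetector:
    """Standalone stand-in for a detector; not a garak Detector subclass."""

    categories: Optional[list] = None  # None means "use every category"

    def __init__(self, data_dir: Path):
        # Rules are read and compiled per instance, not at module import
        with open(data_dir / "atr" / "rules.json", "r", encoding="utf-8") as f:
            rules = json.load(f)  # {category: [[rule_id, severity, pattern], ...]}
        self._compiled = []
        for category, entries in rules.items():
            if self.categories is not None and category not in self.categories:
                continue
            for rule_id, _severity, pattern in entries:
                try:
                    self._compiled.append((rule_id, re.compile(pattern, re.IGNORECASE)))
                except re.error:
                    continue  # skip patterns Python's re module cannot compile

    def score(self, output: Optional[str]) -> Optional[float]:
        """Return 1.0 on any rule match, 0.0 on none, None for a missing output."""
        if output is None:
            return None
        return 1.0 if any(rx.search(output) for _, rx in self._compiled) else 0.0


class PromptInjectionSketch(RegexRuleDetector):
    categories = ["prompt-injection"]
```

Per-category subclasses only set `categories`, so importing the module does no file I/O and each instance picks up whatever rules file is present when it is constructed.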

@eeee2345

eeee2345 commented Apr 8, 2026

Author

Thanks @jmartin-tech @leondz for the thorough review. Addressed all four points:

  1. data_path: rules moved to garak/data/atr/rules.json, loaded via from garak.data import path as data_path. Supports XDG user override.

  2. No subprocess: removed entirely from the detector. Sync tool now uses urllib.request to download a zip — no git dependency.

  3. Extracted tools: sync_rules() and generate_rule() moved to tools/atr.py. Writes to XDG data_path by default or --stdout. Detector is now pure detection logic only.

  4. Init-time loading: _load_rules() called in __init__, not on module import.
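For point 2, a git-free fetch along these lines is possible with the standard library alone. This is a sketch under assumptions, not the code in tools/atr.py: the helper names are hypothetical, the caller supplies the archive URL, and the `rules/` directory layout is taken from the earlier `sync_rules_from_github` hunk.

```python
import io
import urllib.request
import zipfile


def download_zip(url: str, timeout: float = 30.0) -> bytes:
    """Fetch an archive over HTTP(S); the caller supplies the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def read_rule_files(zip_bytes: bytes, suffix: str = ".yaml") -> dict[str, str]:
    """Return {archive member name: text} for rule files under a rules/ directory."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            # Prefixing "/" lets a top-level "rules/..." match as well as "repo-main/rules/..."
            if "/rules/" in f"/{name}" and name.endswith(suffix):
                files[name] = zf.read(name).decode("utf-8")
    return files
```

Keeping the download separate from the parsing means the archive handling can be tested against an in-memory zip without network access.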

For context on the rule set and methodology — the full spec is at agentthreatrule.org and the academic paper is on Zenodo. Would appreciate any design feedback on how the detector categories map to garak's existing taxonomy — happy to adjust the tagging.

@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from abf1394 to 63f60e8 Compare April 8, 2026 21:49
@leondz leondz changed the title from "feat: add ATR detectors -- 108 AI agent threat detection rules" to "detectors: add Agent Threat Rules" Apr 9, 2026

@jmartin-tech jmartin-tech left a comment

Collaborator

A few more adjustments requested.

Comment thread tools/atr.py Outdated
        xdg_dir.mkdir(parents=True, exist_ok=True)
        return xdg_dir / "rules.json"
    except Exception:
        return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"

Collaborator

I am not sure the fallback location for the exception handler makes sense. If the import fails, the tool was likely executed from a location other than the repo source. It is also somewhat unexpected for a tool to create something in a relative path like that. I would hazard that support for either the XDG path or a user-supplied command-line location is sufficient, and if the XDG path search raises an exception it may be best to exit early and suggest that the user supply a valid --output value or use the --stdout option.

Suggested change
-        return Path(__file__).parent.parent / "garak" / "data" / "atr" / "rules.json"
+        print("The user XDG storage location could not be identified, supply --output or --stdout options or ensure garak is available in the python environment.", file=sys.stderr)
+        sys.exit(1)
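Put together, the destination logic being requested could be sketched as follows. The `--output` and `--stdout` flags come from this thread; making them mutually exclusive, the helper names, and the `xdg_lookup` callable are assumptions for illustration rather than the actual contents of tools/atr.py.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="atr")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--output", type=Path, help="explicit destination file")
    group.add_argument("--stdout", action="store_true", help="write to stdout")
    return parser


def resolve_destination(args: argparse.Namespace, xdg_lookup) -> Optional[Path]:
    """Return a destination path, or None when output should go to stdout."""
    if args.stdout:
        return None
    if args.output is not None:
        return args.output
    try:
        return xdg_lookup()  # hypothetical callable that locates the XDG data path
    except Exception:
        # No relative-path fallback: explain the options and stop
        print(
            "The user XDG storage location could not be identified, "
            "supply --output or --stdout options.",
            file=sys.stderr,
        )
        sys.exit(1)
```

Exiting with a non-zero code keeps the failure visible to scripts that call the tool, instead of silently writing into a path relative to the tool's own location.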

Comment thread tools/atr.py
Comment thread garak/detectors/atr.py Outdated
Comment thread garak/detectors/atr.py Outdated
@eeee2345

Author

@jmartin-tech @leondz — all four review points have been addressed (data_path, no subprocess, extracted tools, init-time loading). Ready for re-review when you have a moment.

Since the original submission, ATR has shipped v2.0.0 with some changes worth noting:

  • 113 rules (up from 108), including 3 rules generated end-to-end by our Threat Cloud crystallization pipeline — the first detection rules produced by automated threat intelligence, not hand-written regex
  • RFC-001 v1.1: a vendor-neutral quality standard for detection rules with maturity levels, confidence scoring, and review tier definitions. This means every ATR rule ships with machine-readable quality metadata
  • 96,096-skill ecosystem scan discovered 751 active malware from 3 coordinated threat actors — validating these rules against real attacks, not just benchmarks
  • Compound detection gates: MCP-context rules now require 30%+ condition match, reducing false positives on legitimate documentation
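The compound gate in the last bullet can be pictured with a small sketch. ATR's actual gate logic is not shown in this thread, so the function name, the case-insensitive matching, and the reading of "30%+ condition match" as a fraction of a rule's regex conditions are all assumptions.

```python
import re


def gate_matches(patterns: list[str], text: str, min_fraction: float = 0.3) -> bool:
    """True when at least min_fraction of a rule's conditions match the text."""
    if not patterns:
        return False
    hits = sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))
    return hits / len(patterns) >= min_fraction
```

Under this reading, a rule with four conditions needs two of them to match (one match is only 25%), which is how a single incidental keyword in ordinary documentation would fail to trigger the rule.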

Happy to update the PR to v2.0.0 rules if that's useful. The sync tool already supports pulling latest from npm, so garak users would get rule updates automatically via atr sync.

Also — if there's interest, ATR's Threat Cloud can accept detection signals from garak runs. That means every garak user running ATR detectors would contribute back to the rule pipeline. No PII, just pattern hashes. Happy to discuss if that's in scope.

@eeee2345

Author

Apologies — my previous comment referenced the round-1 feedback only. I missed that there was a second round of review on 4/10.

This commit addresses all round-2 items:

  1. Fallback path: removed relative fallback, exits with error + guidance
  2. Windows encoding: added sys.stdout.reconfigure
  3. data_path: inlined per suggestion, dropped constant
  4. _rules scope: moved to local in init

Also updated rules.json to ATR v2.0.0 (113 rules, 736 patterns).

@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from 35d0cd9 to ddef4a5 Compare April 15, 2026 23:44
@eeee2345

Author

Two non-architectural changes since `ddef4a58`: updated `rules.json` to current production set, corrected `atr.py` docstring.

Bundle update

| | v2.0.0 (before) | v2.0.12 (after) |
| --- | --- | --- |
| Rules | 113 | 293 (21 draft excluded) |
| Patterns | 736 | 1,597 |

172 production-only rules — no garak probe equivalent

121 of 293 rules derive from garak probe payloads via `metadata_provenance.garak_probe`, covering all 32 probe modules. The remaining 172 come from ATR's scan of live MCP/skill registries — patterns observed in production deployments that don't correspond to any current garak probe class. If useful to the project, these could inform future probe development.

IITW validation

On `inthewild_jailbreak_llms.json` (666 real-world jailbreaks, ATR v2.0.11): 647/666 detected, 97.1% recall. Caveat: the 121 probe-aligned rules account for the majority of that coverage, so this measures probe-rule completeness more than blind generalization. On benign traffic (498 real-world SKILL.md samples), FP rate is 0.20%.

Reproducible: `bash scripts/eval-garak.sh` in the ATR repo.
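The recall figure above can be checked directly from the stated counts. The false-positive count is not given in the comment; the second calculation only shows that a 0.20% rate on 498 samples is consistent with a single false positive, which is an inference rather than a reported number.

```python
# Recall on the in-the-wild jailbreak set, from the counts stated above
detected, total = 647, 666
recall = detected / total
print(f"recall = {recall:.1%}")  # → recall = 97.1%

# What one false positive would look like on the benign set (inferred, not reported)
benign_samples = 498
one_fp = 1 / benign_samples
print(f"1 FP in {benign_samples} = {one_fp:.2%}")  # → 1 FP in 498 = 0.20%
```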

@eeee2345

eeee2345 commented May 2, 2026

Author

@leondz @jmartin-tech — quick update.

Since the last comment (4/21, v2.0.12 / 293 rules), ATR has shipped to v2.0.17 with 314 production rules. Cisco AI Defense merged the full 314-rule pack on 4/22 (skill-scanner #99), and Microsoft Agent Governance Toolkit followed on 4/26 (#1277, 287 rules + weekly auto-sync workflow).

For this PR — happy to bump rules.json to v2.0.17, or leave at v2.0.12 if you'd prefer to merge first and let users update via the sync tool. Either works.

If anything else is needed to unblock merge, please let me know — all round-2 review items have been addressed in commits since ddef4a5.

Thanks for the two rounds of review.

@eeee2345

eeee2345 commented May 9, 2026

Author

Just a heads-up, not a re-bump — ATR shipped v2.1.0 today with 100% NIST AI RMF mapping (330 rules across 16 RMF subcategories, 1,566 mappings). Mentioning it in case the RMF traceability is useful for downstream garak users running compliance-adjacent evals, or for how this PR's detector categories surface in reports.

Available now as agent-threat-rules@2.1.0 on npm: https://agentthreatrule.org/en/compliance/nist-ai-rmf

Happy to leave the PR pinned at v2.0.12 since the sync tool pulls latest — no commit needed from your side. Will keep an eye out for any further review feedback when you get a moment.

@eeee2345

Author

@jmartin-tech @leondz — two updates since 5/9, both strengthen the case for ATR detectors in garak.

  1. ATR was accepted into MISP taxonomies on 2026-05-10 (Add agent-threat-rules taxonomy MISP/misp-taxonomies#323) — the threat-intel sharing layer used by global CERTs and ISACs. garak runs that emit ATR rule IDs now tag-resolve as standard MISP machine tags downstream, useful for any operator routing red-team output into an incident-management workflow.

  2. v2.1.1 shipped 2026-05-10 with 6 new rules covering 7 critical CVEs (CVSS 9.1–10.0). Three are directly garak-probe adjacent: SuperAGI output_handler.py eval RCE (CVE-2024-21552), ModelCache torch.load deserialization (CVE-2025-45146), Enclave VM sandbox escape (CVE-2026-27597, CVSS 10.0). garak probes test for these attack classes; ATR provides the deterministic counterpart that runs at endpoint speed.

Still happy to address any open review items whenever the cycle lines up.

@eeee2345

Author

@jmartin-tech @leondz — quick update on ATR's standardisation footprint, both relevant to garak's downstream operators.

ATR was integrated into MISP at two layers on 2026-05-10, both merged by adulau (MISP project lead):

What this means for garak users: red-team runs that emit ATR rule IDs now resolve natively in MISP — taxonomy gives rule-ID labelling, and the galaxy cluster gives the cluster-level context CSIRTs use for incident triage. garak operators routing red-team findings into MISP-compatible SIEM / CSIRT workflows get full enterprise threat-intel shape on every detection without a translation layer.

Still happy to address any open review items on #1676 whenever the cycle lines up.

@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from 5e7d484 to 1fb9709 Compare May 16, 2026 06:05
@eeee2345

Author

DCO signed. Rebased all 4 commits with Signed-off-by: Adam Lin adam@agentthreatrule.org and force-pushed to feat/atr-detectors. CI should re-run now.

eeee2345 and others added 4 commits May 16, 2026 15:44
Signed-off-by: Panguard AI <support@panguard.ai>
Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
…subprocess

Changes per reviewer comments:
1. Rules loading uses garak's data_path mechanism (L37 feedback)
   - Moved atr_rules.json -> garak/data/atr/rules.json
   - Detector loads via from garak.data import path as data_path
   - Supports XDG user override
2. Removed subprocess.run (L64 feedback)
   - sync tool uses urllib.request to download zip
   - No git dependency required
3. Extracted helper methods to tools/ (L85, L171 feedback)
   - tools/atr.py: sync_rules() + generate_rule()
   - Writes to XDG data_path by default, or --stdout
   - Detector is now pure detection logic only
4. Rules loaded in __init__, not module level (L37 feedback)
   - _load_rules() called per-instance, not on import

Signed-off-by: eeee2345 <eeee2345@users.noreply.github.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
- tools/atr.py: exit with error if XDG path unavailable (no relative fallback)
- tools/atr.py: add sys.stdout.reconfigure for Windows encoding
- detectors/atr.py: inline data_path construction, drop module-level constant
- detectors/atr.py: move _rules to local scope in __init__
- rules.json: update to ATR v2.0.0 (113 rules, 736 patterns, 9 categories)

Signed-off-by: eeee2345 <imadam4real@gmail.com>
Signed-off-by: Adam Lin <adam@agentthreatrule.org>
…terns

- rules.json: 113 rules / 736 patterns → 293 production rules / 1,597 patterns
  (21 draft-maturity rules excluded; compact JSON format)
- atr.py docstring: corrected rule/pattern counts to match bundle

All 32 garak probe modules have bidirectional ATR coverage via
metadata_provenance.garak_probe in each rule's YAML.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
Resolves the test_docs.py::test_docs_detectors[atr] failure by adding the
required documentation stub and linking it from detectors.rst.

Verified locally: 12 atr-specific doc tests pass (848 total in test_docs.py).

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345 eeee2345 force-pushed the feat/atr-detectors branch from 40b43c0 to e76b676 Compare May 16, 2026 07:45
@eeee2345

Author

Rebased against latest main, conflict resolved. docs/source/detectors.rst was deleted upstream in favor of an auto-generated docs/source/index_detectors.rst; moved the ATR detector page to docs/source/detectors/atr.rst to match the new layout and added a corresponding toctree entry.

Covers PromptInjection, ToolPoisoning, PrivilegeEscalation, ExcessiveAutonomy
with parametrized hit / no-hit cases, a None-output guard test, and a smoke
test confirming AgentThreats loads all 1,586 patterns from the bundled
rules.json.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345

Author

@jmartin-tech — pushed one more commit (b24aba5): test(detectors): 24 test cases covering PromptInjection, ToolPoisoning, PrivilegeEscalation, and ExcessiveAutonomy, plus a None-output guard and a smoke test confirming AgentThreats loads all 1,586 patterns from the bundled rules.json. Tests follow the apikey detector test layout.

All items from both review rounds should now be addressed:

  • subprocess.run removed (urllib.request)
  • data_path XDG mechanism for rules loading
  • helper methods extracted to tools/atr.py
  • _rules loaded inside init, not module level
  • exception handler + Windows stdout encoding in tools/atr.py
  • tests added

Happy to rebase or adjust anything if there are remaining items.

@eeee2345

Author

@jmartin-tech @leondz — bumping for re-review. Each architectural point from your earlier passes has landed on the branch (current HEAD b24aba59). Mapping each review thread to the resolving code:

subprocess.run for runtime data fetch (jmartin-tech, 4/8) → removed. The detector no longer fetches rules at runtime. Rules ship as garak/data/atr/rules.json in the package, loaded via garak.data.path.

Use data_path access pattern (jmartin-tech, 4/8; leondz, 4/8) → done. garak/detectors/atr.py line 22 imports from garak.data import path as data_path, and _load_rules() reads from data_path / __name__.split(".")[-1] / _RULES_FILENAME. The XDG override path ($XDG_DATA_HOME/garak/data/atr/rules.json) is documented in the ATRDetector docstring.

Load rules inside __init__, not at module import (jmartin-tech, 4/8) → done. ATRDetector.__init__ calls _load_rules() and populates self._compiled per-instance. No module-level rule state.

Extract bundle-creation utility to tools/ path (jmartin-tech, 4/8) → done. tools/atr.py is the standalone utility with sync and generate subcommands. The detector contains no bundle-creation code.

_rules should not live outside __init__ scope (jmartin-tech, 4/10) → done. _compiled is per-instance only.

_RULES_FILENAME constant pattern (jmartin-tech, 4/9) → applied with the cleaner suggested form.

XDG fallback path in tool (jmartin-tech, 4/9) → applied. On XDG-resolution failure the tool prints the suggested error to stderr and exits with code 1; no relative-path silent fallback.

Windows encoding in __main__ (jmartin-tech, 4/9) → applied. tools/atr.py line 223: sys.stdout.reconfigure(encoding="utf-8") before main().
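For the Windows encoding item, a guarded variant of the `sys.stdout.reconfigure` call might look like the sketch below. The helper name `force_utf8` is hypothetical; the guard is there because `reconfigure()` is a method of `io.TextIOWrapper`, so a replaced stream such as `io.StringIO` does not have it.

```python
import sys


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure(); report whether it did."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # e.g. io.StringIO or another stream without reconfigure()
    reconfigure(encoding="utf-8")
    return True


force_utf8(sys.stdout)  # call once, before any output is written
```

Without such a call, printing non-ASCII rule text on a console using a legacy code page can raise `UnicodeEncodeError`.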

DCO check is green (probot/dco status). All commits in this PR carry Signed-off-by: Adam Lin <adam@agentthreatrule.org>.

Whenever the queue allows, would appreciate a re-review pass — happy to address any remaining feedback in a follow-up commit.

@eeee2345

eeee2345 commented Jun 4, 2026

Author

@jmartin-tech @leondz — quick refresh for whenever a review slot opens up.

Since the 5/21 round, upstream ATR shipped v3.1.0 (462 rules across 10 categories). The bundled garak/data/atr/rules.json in this PR is from the v3.0 era; happy to push a regenerated bundle if you would like the corpus refreshed before merge, otherwise it works as-is on current detectors.

For related context only: microsoft/PyRIT#1715 (an adjacent red-team framework's ATR scorer integration) merged 2026-05-27.

Earlier architectural feedback remains resolved (subprocess.run removed, data_path access pattern, 24 detector tests across 4 categories). Branch HEAD remains b24aba59. Happy to address anything further whenever it works for you.

…egory pattern count

Addresses review: replace the dynamic __name__-split + single-use filename constant with literal 'atr'/'rules.json' (matches the data_path pattern in probes/snowball.py); fix the AgentThreats docstring count to 1,597 patterns to match the bundled rules.json.

Signed-off-by: eeee2345 <217509886+eeee2345@users.noreply.github.com>
@eeee2345

Author

Hi @leondz — ready for another look. Since the DCO note I've worked through the round-1 and round-2 feedback: literal data_path components, extracted the tools/ helper, and added 24 detector tests across 4 categories. DCO is green.

It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.

@eeee2345

Author

Hi @jmartin-tech - ready for another look when you have a moment.

I've worked through the round-1 and round-2 feedback: extracted the helper methods into tools/, used literal data_path components, removed the subprocess use, and added 24 detector tests across 4 categories. DCO is green.

It's +637 across 6 files, no deletions. Happy to rebase or split it up if that makes review easier.

@eeee2345

Author

@jmartin-tech @leondz thanks for the thorough review — apologies for the slow turnaround. All the points are addressed in the later commits (the GitHub review predates them, so most inline threads now show as outdated):

  • data_path access — the detector now loads rules via garak's data mechanism: from garak.data import path as data_path then data_path / "atr" / "rules.json" (literal components, no custom / on a constant).
  • subprocess.run at runtime — removed; the detector does no subprocess work, it just reads the bundled garak/data/atr/rules.json.
  • _rules scope — rules are loaded locally inside __init__ and compiled into self._compiled; there is no module-level _rules dict anymore.
  • Rule-generation utility — extracted out of the detector into tools/atr.py as a standalone tool.
  • Windows stdout: sys.stdout.reconfigure(encoding="utf-8") is applied before main() in tools/atr.py.

Could you take another pass when you have a moment? Happy to adjust anything else.


3 participants