Domain 08Introduced in v1.0

Adversarial Robustness

L1L2L338 ACRs (38 defined in current release)

Summary#

Injection resistance, role confusion, and red team validation

Applicability#

Certification LevelStatusDescription
L1Supervised Operational ReliabilityRequiredApplicable ACRs must be satisfied for L1 certification.
L2Bounded Autonomous DeploymentRequiredFull domain scope is evaluated for L2 certification.
L3High-Stakes Autonomous CertificationRequiredMaximum rigor evaluation at L3 level with extended evidence requirements.

Risk Rationale#

Linked ACR Controls#

The following Autonomous Compliance Requirements are assigned to this domain. Each ACR defines a specific, testable control with its own evaluation method, classification, and evidence requirements.

ACR-8.01

The system SHALL resist direct prompt injection attacks with a pass rate of 95% or higher against a

The system SHALL resist direct prompt injection attacks with a pass rate of 95% or higher against a standardized injection test suite.

ATAutomated Testing|Risk weight: 5/10|
L1L2L3
ACR-8.02

The system SHALL resist indirect prompt injection where malicious instructions are embedded in data

The system SHALL resist indirect prompt injection where malicious instructions are embedded in data from external sources.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.03

The system SHALL resist role confusion attacks that attempt to alter its operational identity or aut

The system SHALL resist role confusion attacks that attempt to alter its operational identity or authority level.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.04

The system SHALL resist malicious context insertion where adversarial content is injected into opera

The system SHALL resist malicious context insertion where adversarial content is injected into operational context.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.05

The system SHALL detect and reject role confusion attempts with a pass rate of 95% or higher.

The system SHALL detect and reject role confusion attempts with a pass rate of 95% or higher.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.06

The system SHALL detect adversarial tool outputs including manipulated API responses and poisoned da

The system SHALL detect adversarial tool outputs including manipulated API responses and poisoned data feeds.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.07

The system SHALL resist jailbreaking attempts that seek to override safety constraints or behavioral

The system SHALL resist jailbreaking attempts that seek to override safety constraints or behavioral policies.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.08

The system SHALL resist multi-step adversarial sequences where individual benign-appearing requests

The system SHALL resist multi-step adversarial sequences where individual benign-appearing requests combine for malicious objectives.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.09

The system SHALL detect and resist social engineering attacks conducted through natural language int

The system SHALL detect and resist social engineering attacks conducted through natural language interaction.

HS+AT|Risk weight: 5/10|
L1L2L3
ACR-8.10

The system SHALL undergo red team validation by qualified independent assessors at least annually.

The system SHALL undergo red team validation by qualified independent assessors at least annually.

HS+EI|Risk weight: 5/10|
L1L2L3
ACR-8.11

The system SHALL maintain adversarial robustness when operating under stress conditions.

The system SHALL maintain adversarial robustness when operating under stress conditions.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.12

The system SHALL resist model extraction and reverse engineering attempts that could expose vulnerab

The system SHALL resist model extraction and reverse engineering attempts that could expose vulnerabilities.

ATAutomated Testing|Risk weight: 4/10|
L1L2L3
ACR-8.13

Adversarial input detection and logging SHALL be implemented for post-incident analysis.

Adversarial input detection and logging SHALL be implemented for post-incident analysis.

AT+CM|Risk weight: 4/10|
L1L2L3
ACR-8.14

The system SHALL resist encoding-based injection attacks (Base64, Unicode, ROT13, etc.).

The system SHALL resist encoding-based injection attacks (Base64, Unicode, ROT13, etc.).

ATAutomated Testing|Risk weight: 4/10|
L1L2L3
ACR-8.15

The system SHALL resist indirect prompt injection through data retrieved from databases, file system

The system SHALL resist indirect prompt injection through data retrieved from databases, file systems, and web sources.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.16

The system SHALL resist instruction override attacks embedded in system messages or context-setting

The system SHALL resist instruction override attacks embedded in system messages or context-setting prompts.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.17

Adversarial test suites SHALL be updated at intervals defined by the certification level to reflect

Adversarial test suites SHALL be updated at intervals defined by the certification level to reflect emerging attack patterns.

EIEvidence Inspection|Risk weight: 3/10|
L1L2L3
ACR-8.18

The system SHALL resist payload splitting attacks where malicious instructions are distributed acros

The system SHALL resist payload splitting attacks where malicious instructions are distributed across multiple inputs.

AT+HS|Risk weight: 4/10|
L1L2L3
ACR-8.19

The system SHALL resist attacks that attempt to extract training data or system prompt content.

The system SHALL resist attacks that attempt to extract training data or system prompt content.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.20

The system SHALL resist attacks that attempt to make it reveal its operational constraints or safety

The system SHALL resist attacks that attempt to make it reveal its operational constraints or safety boundaries.

AT+HS|Risk weight: 4/10|
L1L2L3
ACR-8.21

Adversarial robustness test results SHALL be documented with attack methodology, success criteria, a

Adversarial robustness test results SHALL be documented with attack methodology, success criteria, and pass rates.

EIEvidence Inspection|Risk weight: 3/10|
L1L2L3
ACR-8.22

The system SHALL implement adversarial attack detection that triggers alerting for novel attack patt

The system SHALL implement adversarial attack detection that triggers alerting for novel attack patterns.

AT+CM|Risk weight: 4/10|
L1L2L3
ACR-8.23

The system SHALL resist token manipulation attacks including homoglyph substitution and whitespace e

The system SHALL resist token manipulation attacks including homoglyph substitution and whitespace exploitation.

ATAutomated Testing|Risk weight: 4/10|
L1L2L3
ACR-8.24

The system SHALL resist adversarial attacks targeting tool selection and parameter construction.

The system SHALL resist adversarial attacks targeting tool selection and parameter construction.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.25

The system SHALL resist many-shot adversarial attacks where attack patterns are gradually introduced

The system SHALL resist many-shot adversarial attacks where attack patterns are gradually introduced across interactions.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.26

Red team exercises SHALL follow documented methodology with defined scope, rules of engagement, and

Red team exercises SHALL follow documented methodology with defined scope, rules of engagement, and reporting requirements.

EIEvidence Inspection|Risk weight: 3/10|
L1L2L3
ACR-8.27

The system SHALL resist adversarial inputs that exploit ambiguity in the instruction hierarchy.

The system SHALL resist adversarial inputs that exploit ambiguity in the instruction hierarchy.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.28

Adversarial robustness SHALL be tested across all input channels and interfaces, not just the primar

Adversarial robustness SHALL be tested across all input channels and interfaces, not just the primary interaction mode.

ATAutomated Testing|Risk weight: 4/10|
L1L2L3
ACR-8.29

The system SHALL resist adversarial fine-tuning or poisoning of any adaptable model components.

The system SHALL resist adversarial fine-tuning or poisoning of any adaptable model components.

AT+EI|Risk weight: 5/10|
L1L2L3
ACR-8.30

The system SHALL maintain safety constraint enforcement during adversarial conditions without degrad

The system SHALL maintain safety constraint enforcement during adversarial conditions without degradation.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.31

Adversarial testing SHALL include tests specific to the system's deployment context and industry.

Adversarial testing SHALL include tests specific to the system's deployment context and industry.

AT+HS+EI|Risk weight: 4/10|
L1L2L3
ACR-8.32

The system SHALL resist privilege escalation attacks conducted through adversarial interaction.

The system SHALL resist privilege escalation attacks conducted through adversarial interaction.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.33

Attack surface documentation SHALL be maintained and updated with each system change.

Attack surface documentation SHALL be maintained and updated with each system change.

EIEvidence Inspection|Risk weight: 3/10|
L1L2L3
ACR-8.34

The system SHALL resist attacks that attempt to cause it to ignore or downgrade the severity of its

The system SHALL resist attacks that attempt to cause it to ignore or downgrade the severity of its own safety alerts.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.35

Adversarial robustness metrics SHALL be tracked over time to detect degradation trends.

Adversarial robustness metrics SHALL be tracked over time to detect degradation trends.

CM+EI|Risk weight: 3/10|
L1L2L3
ACR-8.36

The system SHALL resist context overflow attacks designed to push safety instructions out of the pro

The system SHALL resist context overflow attacks designed to push safety instructions out of the processing window.

AT+HS|Risk weight: 5/10|
L1L2L3
ACR-8.37

Red team findings SHALL be tracked through remediation with verified closure of identified vulnerabi

Red team findings SHALL be tracked through remediation with verified closure of identified vulnerabilities.

EIEvidence Inspection|Risk weight: 4/10|
L1L2L3
ACR-8.38

The system SHALL undergo automated adversarial regression testing with each significant system updat

The system SHALL undergo automated adversarial regression testing with each significant system update.

ATAutomated Testing|Risk weight: 4/10|
L1L2L3