Societal Impact and Responsible Deployment
Summary
Controls covering CBRN and cyber misuse prevention, content safety, bias monitoring, and transparency.
Risk Rationale

Linked ACR Controls
The following Autonomous Compliance Requirements are assigned to this domain. Each ACR defines a specific, testable control with its own evaluation method, classification, and evidence requirements.
The system SHALL implement guardrails preventing generation of content that could materially enable chemical, biological, radiological, or nuclear harm, with effectiveness validated through adversarial testing.
The system SHALL implement controls preventing AI-assisted generation of malware, exploitation code, phishing content, or other material that materially enables cyberattacks.
The system operator SHALL maintain a documented dual-use risk assessment identifying potential misuse pathways, assessed likelihood, and implemented mitigations, reviewed at intervals defined by the certification level.
The system SHALL implement safeguards against generating harmful, illegal, or dangerous content as defined in the system's documented content policy, with categories and thresholds appropriate to the deployment context.
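One way to realize a policy-driven safeguard is a category-and-threshold filter. The sketch below is illustrative only: the category names, thresholds, and the classifier producing the scores are assumptions, to be replaced by the system's documented content policy and deployment-specific moderation tooling.

```python
from dataclasses import dataclass

@dataclass
class PolicyCategory:
    name: str
    block_threshold: float  # score at or above which output is suppressed

# Hypothetical policy; real categories and thresholds come from the
# system's documented content policy, not from this sketch.
POLICY = [
    PolicyCategory("violence", 0.85),
    PolicyCategory("illegal_activity", 0.70),
]

def enforce_policy(scores: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (allowed, violated_categories) for one candidate output.

    `scores` maps category name -> classifier score in [0, 1]; the
    classifier itself is deployment-specific and assumed here.
    """
    violations = [c.name for c in POLICY
                  if scores.get(c.name, 0.0) >= c.block_threshold]
    return (not violations, violations)

allowed, hits = enforce_policy({"violence": 0.9, "illegal_activity": 0.2})
# violence (0.9) exceeds its 0.85 threshold, so this output is blocked
```

Keeping categories and thresholds as data, rather than hard-coding them, lets the deployed policy be versioned and audited alongside the evidence this ACR requires.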
The system SHALL enforce output boundaries preventing generation in categories prohibited by the system's operational scope definition (Domain 1), with boundary enforcement validated through adversarial testing.
The system SHALL implement mechanisms to detect when it is generating ungrounded, fabricated, or confabulated content, with configurable thresholds for flagging or suppressing unverified assertions.
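The "configurable thresholds" part of this control can be a simple triage over a groundedness score. In the sketch below, the score is assumed to come from a deployment-specific grounding checker (for example retrieval overlap or an entailment model); only the two thresholds are the control's configurable knobs.

```python
def triage_assertion(support_score: float,
                     flag_below: float = 0.7,
                     suppress_below: float = 0.3) -> str:
    """Classify one assertion by how well it is grounded in sources.

    `support_score` is assumed to be in [0, 1], produced upstream by a
    grounding checker that is out of scope for this sketch.
    """
    if support_score < suppress_below:
        return "suppress"   # fabricated / unverifiable: drop the assertion
    if support_score < flag_below:
        return "flag"       # weakly grounded: surface with a warning
    return "pass"           # adequately supported by sources
```

The default thresholds here are placeholders; a certified deployment would set and document them per the evaluation method attached to this ACR.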
The system SHALL implement controls preventing generation of persuasive false content, manipulative messaging, or social engineering material designed to deceive users or third parties.
For systems capable of generating synthetic media (audio, video, images), the system SHALL implement provenance markers, watermarking, or disclosure mechanisms to identify AI-generated content.
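Production systems would typically use a standard such as C2PA content credentials for this. As a minimal illustration of the disclosure-mechanism option, the sketch below attaches a JSON provenance record to a generated asset; all field names are illustrative assumptions, not a standard format.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_provenance_record(media_bytes: bytes, generator_id: str) -> str:
    """Build a minimal JSON disclosure record for one generated asset.

    The record binds a content hash to an AI-generation disclosure and a
    timestamp; a real deployment would embed or sign this per a
    provenance standard rather than using these ad-hoc fields.
    """
    return json.dumps({
        "ai_generated": True,
        "generator": generator_id,
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    })
```

Hashing the media bytes ties the disclosure to the specific asset, so the record can be verified against the content it describes.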
The system SHALL implement controls preventing generation of content that promotes hatred, harassment, or violence targeting individuals or groups based on protected characteristics.
The system SHALL implement ongoing monitoring for discriminatory, biased, or inequitable outputs with documented detection methods, measurement thresholds, and remediation procedures.
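A measurement threshold for this kind of monitoring can be as simple as a gap check over per-group outcome rates. The sketch below uses demographic parity difference as the metric; both the metric choice and the `max_gap` default are assumptions to be replaced by the operator's documented detection method and thresholds.

```python
def parity_gap(rates: dict[str, float]) -> float:
    """Largest difference in favorable-outcome rate across groups."""
    return max(rates.values()) - min(rates.values())

def check_fairness(rates: dict[str, float], max_gap: float = 0.1) -> bool:
    """True if the observed gap is within the documented variance threshold.

    `rates` maps group label -> fraction of favorable outcomes observed
    over a monitoring window; a failure here would trigger the documented
    remediation procedure.
    """
    return parity_gap(rates) <= max_gap
```

In practice this check would run on a schedule over production outputs, with failures logged as evidence and routed into the remediation procedures the ACR requires.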
The system operator SHALL maintain a documented fairness evaluation methodology specifying metrics, benchmarks, evaluation frequency, and acceptable variance thresholds.
The system operator SHALL conduct and document impact analysis assessing differential outcomes for protected demographic groups, with remediation procedures for identified disparities.
The system operator SHALL maintain documented procedures for addressing identified bias or fairness violations, including rollback triggers, retraining protocols, and stakeholder notification.
The system operator SHALL document and disclose known limitations, failure modes, and conditions under which the system's outputs should not be relied upon.
The system operator SHALL evaluate and document whether the system is appropriate for the intended deployment context, including an assessment of potential harm to vulnerable populations.
The system SHALL implement controls to prevent or limit downstream misuse of outputs, including use restrictions, output watermarking, and terms governing redistribution.
The system operator SHALL maintain a documented AI risk taxonomy categorizing risks across harmful outputs, out-of-scope outputs, hallucinated outputs, and misuse vectors, with severity levels and response procedures for each category.
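The four risk categories and per-category severity and response fields named by this ACR map naturally onto a small data model. The sketch below is one possible shape for machine-readable taxonomy entries; the severity scale and the runbook reference are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class RiskCategory(Enum):
    HARMFUL_OUTPUT = "harmful_output"
    OUT_OF_SCOPE = "out_of_scope"
    HALLUCINATED = "hallucinated"
    MISUSE_VECTOR = "misuse_vector"

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class RiskEntry:
    category: RiskCategory
    severity: Severity
    response_procedure: str  # pointer to the documented response runbook

# Illustrative entry; real taxonomies are maintained by the operator and
# the runbook identifier below is hypothetical.
example = RiskEntry(RiskCategory.HALLUCINATED, Severity.MEDIUM,
                    "flag-and-review runbook")
```

Encoding the taxonomy this way makes it diffable and reviewable, which helps demonstrate the maintenance obligation this ACR carries.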
The system operator SHALL document the computational and energy footprint of the system's training, inference, and monitoring operations, with periodic review of optimization opportunities.
The system operator SHALL assess and document dependency on single upstream model providers, data sources, or infrastructure providers, with contingency planning for provider disruption.
For systems that generate public-facing content at scale, the system operator SHALL assess potential impact on information integrity including contribution to misinformation amplification, filter bubbles, or epistemic degradation.
The system SHALL undergo independent third-party evaluation of societal safety controls at intervals defined by the certification level, with results documented and remediation tracked.
The system SHALL undergo adversarial red-team testing specifically targeting societal risk vectors (CBRN, bias, deception, manipulation) with test scenarios, results, and remediation documented.
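A documented red-team campaign of this kind can be driven by a scenario table pairing each risk vector with adversarial prompts and safety predicates. The harness below is a hypothetical sketch: `generate` stands in for the system under test, and the scenario format is an assumption, not a prescribed structure.

```python
from typing import Callable

# (risk_vector, adversarial_prompt, predicate over the response that
#  returns True when the response is safe)
Scenario = tuple[str, str, Callable[[str], bool]]

def run_red_team(generate: Callable[[str], str],
                 scenarios: list[Scenario]) -> dict[str, bool]:
    """Return pass/fail per risk vector across all of its scenarios.

    A vector passes only if every scenario targeting it produced a safe
    response; failures would feed the remediation tracking this ACR
    requires.
    """
    results: dict[str, bool] = {}
    for vector, prompt, is_safe in scenarios:
        response = generate(prompt)
        results[vector] = results.get(vector, True) and is_safe(response)
    return results
```

Real campaigns would of course use far richer scenario sets and graded judgments rather than boolean predicates; the point is that scenarios, results, and remediation status stay in one auditable structure.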