Failure Mode Containment
Summary
Anomaly detection, safe fallback states, and cascading failure prevention
Risk Rationale
Linked ACR Controls
The following Autonomous Compliance Requirements are assigned to this domain. Each ACR defines a specific, testable control with its own evaluation method, classification, and evidence requirements.
The system SHALL detect outputs that deviate from expected distributions by more than defined thresholds.
The system SHALL detect outputs that violate format constraints or contain impossible values.
The system SHALL escalate critical failures to human operators with sufficient context for informed intervention.
Upon critical failure detection, the system SHALL enter a documented safe fallback state within defined time bounds.
Safe fallback states SHALL be documented, independently tested, and verified for each failure category.
The system SHALL prevent cascading failure across subsystems through isolation mechanisms.
Circuit breaker mechanisms SHALL be implemented and tested for all inter-subsystem connections.
The system SHALL implement circuit breakers preventing cascading failure to connected subsystems.
Timeout mechanisms SHALL be implemented for all operations with defined maximum execution durations.
Operation idempotency SHALL be maintained where possible to prevent duplicate actions during retry sequences.
Transaction rollback capabilities SHALL be implemented for operations that support reversal.
The system SHALL detect and handle resource exhaustion conditions including memory, compute, storage, and API rate limits.
A failure taxonomy SHALL be defined and maintained classifying failure modes by severity, impact, and required response.
All defined failure modes SHALL be tested through deliberate fault injection to verify containment effectiveness.
Failure detection latency SHALL be measured and SHALL NOT exceed defined maximum detection time bounds.
The system SHALL maintain a failure recovery log documenting each failure event, detection time, response action, and resolution.
Graceful degradation modes SHALL be defined for each subsystem with documented reduced-capability operation.
The system SHALL prevent data corruption during failure and recovery sequences.
Failure containment boundaries SHALL be independently verifiable by external assessors.
The system SHALL implement automated health checks that detect pre-failure degradation indicators.
Recovery procedures SHALL be automated where possible and SHALL NOT require system restart for non-critical failures.
The system SHALL maintain service to unaffected functions during localized failure containment.
Failure simulation tests SHALL be conducted at intervals defined by the certification level.
The system SHALL implement dead-letter queues or equivalent mechanisms for failed operations requiring post-mortem review.
Failure mode testing SHALL include simultaneous multi-fault scenarios at Level 2 and above.
The system SHALL NOT silently drop operations during failure conditions without logging and notification.
Failure containment mechanisms SHALL be tested independently from the components they protect.
The system SHALL define and enforce maximum blast radius limits for each failure category.
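
Several of the controls above (circuit breakers on inter-subsystem connections, a safe fallback on failure, and no silently dropped operations) can be illustrated together. The following is a minimal sketch, not a reference implementation: the class name CircuitBreaker, its fields, the threshold and cool-down values, and the in-memory dead_letters list are all hypothetical and are not defined by the ACRs. A real deployment would likely use a durable dead-letter queue and notify operators.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class CircuitBreaker:
    """Hypothetical breaker guarding one inter-subsystem connection."""

    failure_threshold: int = 3      # consecutive failures before opening (assumed value)
    reset_after_s: float = 30.0     # cool-down before one trial call (assumed value)
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None
    dead_letters: List[dict] = field(default_factory=list)

    def is_open(self) -> bool:
        """True while calls must be short-circuited to the fallback."""
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let one trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_after_s

    def call(
        self,
        operation: Callable[[Any], Any],
        payload: Any,
        fallback: Callable[[Any], Any],
    ) -> Any:
        """Run operation(payload); on failure or open circuit, use the fallback."""
        if self.is_open():
            self._record(payload, "circuit open")
            return fallback(payload)
        try:
            result = operation(payload)
        except Exception as exc:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            self._record(payload, repr(exc))
            return fallback(payload)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result

    def _record(self, payload: Any, reason: str) -> None:
        """Keep every failed operation for post-mortem review; nothing is dropped silently."""
        self.dead_letters.append(
            {"payload": payload, "reason": reason, "at": self.clock()}
        )
```

In this sketch the caller supplies the fallback, so each connection can return its own documented safe state. Injecting the clock makes the breaker testable independently of the component it protects, since a test can advance time without waiting.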