Certification Lifecycle
The ARA certification lifecycle defines the end-to-end process for achieving and maintaining certification. It consists of 10 phases spanning from initial intake through ongoing compliance monitoring. Each phase produces specific outputs that feed into subsequent phases, creating a structured and auditable certification pathway.
Phase Overview
1. Intake Assessment (1-2 weeks)
2. Documentation Review (2-4 weeks)
3. Automated Testing (2-6 weeks)
4. Human Simulation Testing (2-4 weeks)
5. Evidence Inspection (1-3 weeks)
6. Continuous Monitoring Validation (4-12 weeks)
7. Adversarial Evaluation (2-8 weeks)
8. Certification Decision (1-2 weeks)
9. Post-Certification Onboarding (2-4 weeks)
10. Ongoing Compliance Monitoring (continuous)
Intake Assessment
Typical duration: 1-2 weeks
The certifying organization submits a formal intake request to an Authorized Verification Body (AVB). The AVB conducts a preliminary assessment of the system to determine scope, applicable certification level, and evaluation feasibility. This phase identifies whether the system is a candidate for ARA certification and which domains apply.
Key Activities
- Organization submits system description, deployment context, and requested certification level
- AVB reviews system architecture and operational scope documentation
- AVB determines applicable domains based on system category (Agent, Multi-Agent, Physical, Hybrid)
- Preliminary gap analysis identifies potential areas of non-compliance
- Engagement agreement is formalized with scope, timeline, and fee structure
Phase Outputs
- Intake assessment report with feasibility determination
- Applicable domain and ACR mapping
- Preliminary evaluation plan and timeline
- Signed engagement agreement
Documentation Review
Typical duration: 2-4 weeks
The AVB conducts a comprehensive review of the organization's technical and governance documentation. This phase evaluates the completeness and adequacy of evidence artifacts required by applicable ACRs before proceeding to active testing phases.
Key Activities
- Review of system architecture documentation and operational boundary declarations
- Assessment of governance framework documents, change management procedures, and incident response plans
- Verification of audit trail schemas, monitoring configurations, and telemetry pipeline specifications
- Review of credential lifecycle management, identity isolation, and permission boundary documentation
- Gap identification for missing or insufficient documentation
Phase Outputs
- Documentation review report with completeness assessment
- Evidence gap register identifying required supplementary documentation
- Readiness determination for active evaluation phases
Automated Testing
Typical duration: 2-6 weeks
ACRs designated with the Automated Testing (AT) evaluation method are assessed through structured test execution. The AVB either executes standardized test suites or reviews the organization's test results against defined acceptance criteria. This phase covers the majority of technical controls across all applicable domains.
Key Activities
- Execution of operational boundary enforcement tests
- Privilege escalation and identity isolation testing
- Graceful degradation and failure blast radius containment verification
- Prompt injection resistance and adversarial input testing
- Behavioral consistency testing under sustained load and temporal pressure
- Drift detection regression testing and data integrity verification
- API response validation and cross-system data flow integrity testing
Phase Outputs
- Automated test execution report with pass/fail results for each AT-designated ACR
- Test coverage analysis mapping test cases to ACR requirements
- Deficiency notices for any failed controls
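As an illustration of the coverage analysis this phase produces, the sketch below maps executed test cases to AT-designated ACRs, flags controls with no mapped test, and summarizes pass/fail per control. The ACR identifiers and test names are hypothetical, not real ARA controls.

```python
# Hypothetical sketch: check that every AT-designated ACR is exercised by at
# least one test case, and roll results up to a per-control pass/fail verdict.
from collections import defaultdict

# AT-designated controls in scope (example IDs, not real ARA ACRs)
at_acrs = {"ACR-OB-01", "ACR-ID-03", "ACR-FR-02"}

# (test case name, ACR it exercises, passed?)
test_results = [
    ("test_boundary_enforcement", "ACR-OB-01", True),
    ("test_privilege_escalation", "ACR-ID-03", True),
    ("test_blast_radius", "ACR-FR-02", False),
]

coverage = defaultdict(list)
for name, acr, passed in test_results:
    coverage[acr].append(passed)

uncovered = at_acrs - coverage.keys()          # controls with no mapped test
report = {acr: all(results) for acr, results in coverage.items()}

print("uncovered:", sorted(uncovered))
print("report:", report)  # a False entry would trigger a deficiency notice
```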
Human Simulation Testing
Typical duration: 2-4 weeks
ACRs designated with the Human Simulation (HS) evaluation method are assessed through structured scenarios conducted by qualified human evaluators. These scenarios simulate realistic operational conditions including adversarial interactions, failure events, and edge cases that cannot be adequately assessed through automated testing alone.
Key Activities
- Value alignment constraint enforcement testing through adversarial scenario simulation
- Human override activation testing under nominal, degraded, and failure conditions
- Adversarial input behavioral robustness evaluation across input channels
- Safe state recovery verification following simulated failure events
- Contested decision arbitration protocol evaluation
- Emergency stop mechanism testing for physical systems (L3)
- Multi-agent permission boundary bypass scenario evaluation (L2/L3)
Phase Outputs
- Human simulation test report with scenario descriptions and outcomes
- Evaluator assessment forms with structured scoring for each HS-designated ACR
- Behavioral observation notes and safety concern flags
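One plausible way to aggregate the structured scoring forms into per-ACR outcomes is sketched below. The 1-5 scale and the pass threshold are assumptions, since the framework text does not specify the scoring rubric; the ACR and evaluator identifiers are likewise illustrative.

```python
# Illustrative aggregation of evaluator assessment forms into per-ACR outcomes.
# Scale (1-5) and pass threshold are assumed, not specified by the framework.
from statistics import mean

PASS_THRESHOLD = 3.0  # assumed: minimum mean score on a 1-5 scale

# (ACR, evaluator, score) entries from structured scoring forms
scores = [
    ("ACR-HO-01", "evaluator_a", 4),
    ("ACR-HO-01", "evaluator_b", 5),
    ("ACR-VA-02", "evaluator_a", 2),
    ("ACR-VA-02", "evaluator_b", 3),
]

by_acr = {}
for acr, _, score in scores:
    by_acr.setdefault(acr, []).append(score)

outcomes = {acr: mean(s) >= PASS_THRESHOLD for acr, s in by_acr.items()}
print(outcomes)  # {'ACR-HO-01': True, 'ACR-VA-02': False}
```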
Evidence Inspection
Typical duration: 1-3 weeks
ACRs designated with the Evidence Inspection (EI) evaluation method are assessed through detailed examination of documentary evidence, configuration artifacts, and operational records. This phase validates that required infrastructure, processes, and documentation are in place and adequately maintained.
Key Activities
- Inspection of decision provenance chain records and tamper-evidence verification
- Review of behavioral drift baseline specifications with cryptographic signature verification
- Assessment of telemetry pipeline architecture and integrity verification mechanisms
- Audit trail completeness verification through sample reconstruction exercises
- Governance framework document review and role/authority matrix validation
- Algorithmic impact disclosure document assessment
- Supply chain integrity verification including SBOM review and vulnerability monitoring
Phase Outputs
- Evidence inspection report with compliance assessment for each EI-designated ACR
- Evidence quality assessment with recommendations for improvement
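The tamper-evidence verification of decision provenance chains could, for example, rely on a hash chain in which each record stores the hash of its predecessor. The sketch below assumes that common design; the actual ARA record schema is not defined here.

```python
# Minimal hash-chain verification sketch for decision provenance records.
# Assumes each record carries a 'prev_hash' equal to the SHA-256 digest of
# the preceding record; this schema is an assumption, not the ARA standard.
import hashlib
import json

def record_hash(record: dict) -> str:
    # Canonical serialization so the digest is stable across key orderings
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def verify_chain(records: list) -> bool:
    """Each record's prev_hash must match the hash of the prior record."""
    for prev, curr in zip(records, records[1:]):
        if curr["prev_hash"] != record_hash(prev):
            return False
    return True

r1 = {"decision": "approve", "prev_hash": None}
r2 = {"decision": "escalate", "prev_hash": record_hash(r1)}
chain = [r1, r2]

print(verify_chain(chain))   # True: chain is intact
r1["decision"] = "deny"      # simulate tampering with an earlier record
print(verify_chain(chain))   # False: tampering breaks the link
```

Any mutation of an earlier record invalidates every subsequent link, which is what makes the chain tamper-evident during inspection.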
Continuous Monitoring Validation
Typical duration: 4-12 weeks
ACRs designated with the Continuous Monitoring (CM) evaluation method are assessed through analysis of telemetry and monitoring data collected over a defined observation period. This phase validates that ongoing monitoring infrastructure is operational and capable of detecting the conditions specified in applicable controls.
Key Activities
- Validation of resource exhaustion monitoring thresholds and shedding activation
- Continuous drift monitoring verification against certified behavioral baseline
- Data distribution shift detection capability assessment
- Anomaly detection effectiveness evaluation over the observation period
- Monitoring coverage verification across all declared operational parameters
Phase Outputs
- Continuous monitoring validation report with telemetry analysis
- Monitoring coverage matrix mapping monitored parameters to ACR requirements
- Observation period summary with detected events and system responses
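Drift monitoring against a certified baseline can be illustrated with a simple mean-shift check over an observation window. The statistic and the three-sigma threshold below are assumptions for illustration; the framework does not mandate a particular drift metric.

```python
# Hedged sketch: flag behavioral drift when the observation-window mean of a
# monitored metric deviates from the certified baseline mean by more than
# z_max baseline standard deviations. Metric values are illustrative.
from statistics import mean, stdev

def drift_alert(baseline, window, z_max: float = 3.0) -> bool:
    """Return True when the window mean drifts beyond z_max baseline sigmas."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(window) - mu) > z_max * sigma

baseline = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48]   # certified baseline samples

print(drift_alert(baseline, [0.51, 0.49, 0.50]))  # stable window: False
print(drift_alert(baseline, [0.80, 0.82, 0.79]))  # shifted window: True
```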
Adversarial Evaluation
Typical duration: 2-8 weeks
For L2 and L3 certifications, structured adversarial evaluation is conducted to validate system resilience against deliberate attack. L2 requires a minimum of 40 hours of structured human adversarial simulation. L3 requires 80 or more hours plus an independent red team assessment approved by ARAF.
Key Activities
- Structured red team exercises targeting all adversarial robustness controls
- Multi-turn attack sequence evaluation including social engineering and role confusion
- Supply chain attack simulation and third-party component compromise testing
- For physical systems: adversarial example testing in perception pipelines
- Independent red team validation by ARAF-approved evaluators (L3 only)
Phase Outputs
- Adversarial evaluation report with attack taxonomy coverage analysis
- Resistance rate calculations against known attack categories
- Independent red team report with findings and severity classifications (L3)
- Remediation recommendations for identified vulnerabilities
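The resistance rate calculation listed among the outputs can be sketched as a per-category ratio of resisted attempts over total attempts. The category names and log entries below are illustrative; the ARA attack taxonomy itself is not reproduced here.

```python
# Illustrative per-category resistance rates from red team exercise logs.
from collections import Counter

# (attack category, attack resisted?) entries from the evaluation log
attempts = [
    ("prompt_injection", True), ("prompt_injection", True),
    ("prompt_injection", False), ("role_confusion", True),
    ("role_confusion", True), ("supply_chain", True),
]

total, resisted = Counter(), Counter()
for category, ok in attempts:
    total[category] += 1
    resisted[category] += ok  # True counts as 1, False as 0

rates = {c: resisted[c] / total[c] for c in total}
print(rates)  # e.g. prompt_injection resisted 2 of 3 attempts
```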
Certification Decision
Typical duration: 1-2 weeks
The AVB consolidates all evaluation findings and renders a formal certification decision. The decision is one of: Certified (full compliance at the requested level), Conditionally Certified (minor non-compliances with mandated remediation), or Denied (blocking control failures or insufficient overall compliance).
Key Activities
- Consolidation of all evaluation phase reports into a comprehensive assessment
- Domain compliance score calculation using risk-weighted ACR results
- Comparison of domain scores against certification level thresholds
- Identification of any blocking control failures that mandate denial
- Formulation of conditions and remediation timelines for conditional certification
- Peer review of certification decision by a second qualified evaluator
Phase Outputs
- Formal certification decision document
- Certification certificate with scope statement, level, and validity period
- Conditions register with remediation timelines (if conditionally certified)
- Denial rationale with specific control failures identified (if denied)
- Registry entry for certified systems
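The decision logic described in this phase (risk-weighted domain scores, level thresholds, blocking failures) might be sketched as follows. The weights, level thresholds, denial floor, and ACR identifiers are illustrative assumptions, not values published by the framework.

```python
# Sketch of the three-outcome decision rule: blocking failures force denial,
# any domain below an assumed floor denotes insufficient overall compliance,
# scores between floor and threshold yield conditional certification.
LEVEL_THRESHOLD = {"L1": 0.80, "L2": 0.90, "L3": 0.95}  # assumed thresholds
DENIAL_FLOOR = 0.60                                      # assumed floor

# Per domain: (ACR, risk weight, passed?, blocking?) tuples (example IDs)
domain_results = {
    "operational_boundaries": [("ACR-OB-01", 3, True, True),
                               ("ACR-OB-02", 1, True, False)],
    "adversarial_robustness": [("ACR-AR-01", 3, True, False),
                               ("ACR-AR-02", 1, False, False)],
}

def decide(results: dict, level: str) -> str:
    # Any failed blocking control mandates denial outright
    if any(blocking and not passed
           for acrs in results.values()
           for _, _, passed, blocking in acrs):
        return "Denied"
    # Risk-weighted score per domain: passed weight / total weight
    scores = {
        domain: sum(w for _, w, p, _ in acrs if p) / sum(w for _, w, _, _ in acrs)
        for domain, acrs in results.items()
    }
    if any(s < DENIAL_FLOOR for s in scores.values()):
        return "Denied"
    if all(s >= LEVEL_THRESHOLD[level] for s in scores.values()):
        return "Certified"
    return "Conditionally Certified"  # shortfall with mandated remediation

print(decide(domain_results, "L2"))  # Conditionally Certified
```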
Post-Certification Onboarding
Typical duration: 2-4 weeks
Following a positive certification decision, the certified organization completes onboarding into the ARA monitoring framework. This includes establishing continuous monitoring integrations, configuring alerting thresholds, and setting up the reporting cadence for the ongoing compliance monitoring phase.
Key Activities
- Configuration of continuous compliance monitoring integrations
- Establishment of drift detection baseline synchronization with monitoring infrastructure
- Configuration of alerting thresholds and notification channels
- Onboarding to the ARA public registry with verified certification details
- Distribution of ARA certification mark assets with usage guidelines
- Scheduling of first reassessment based on certification level requirements
Phase Outputs
- Monitoring onboarding confirmation with integration verification
- Public registry entry with certification details
- Certification mark package with brand usage guidelines
- Reassessment schedule confirmation
Ongoing Compliance Monitoring
Typical duration: Continuous
Certified systems are subject to ongoing compliance monitoring for the duration of their certification period. The monitoring cadence and depth are determined by the certification level: L1 annual, L2 semi-annual, L3 quarterly. Material changes to the system or its operating environment may trigger interim reassessment requirements.
Key Activities
- Continuous monitoring of behavioral drift against certified baseline
- Periodic reassessment at the cadence defined by the certification level
- Review of change management logs and incident response records
- Verification that conditional certification remediation has been completed on schedule
- Investigation of monitoring alerts that indicate potential compliance deviations
- Assessment of material changes that may affect certification scope or validity
- Certification renewal evaluation at the end of each certification period
Phase Outputs
- Periodic compliance monitoring reports
- Reassessment results with updated compliance status
- Monitoring alert investigation reports
- Certification renewal decision at period expiry
- Registry status updates reflecting current compliance state
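The level-based cadence stated above (L1 annual, L2 semi-annual, L3 quarterly) translates directly into a reassessment scheduler. The day counts below are approximations of those intervals, chosen for illustration.

```python
# Sketch: compute the next reassessment date from the certification level.
# Interval lengths approximate annual / semi-annual / quarterly cadences.
from datetime import date, timedelta

CADENCE_DAYS = {"L1": 365, "L2": 182, "L3": 91}

def next_reassessment(last: date, level: str) -> date:
    return last + timedelta(days=CADENCE_DAYS[level])

print(next_reassessment(date(2025, 1, 15), "L3"))  # 2025-04-16
```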
Duration Estimates
The total time from intake to certification decision varies by certification level and system complexity. The following are typical duration ranges:
| Level | Typical Duration | Reassessment |
|---|---|---|
| L1 (Supervised) | 8-16 weeks | Annual |
| L2 (Bounded) | 14-26 weeks | Semi-annual |
| L3 (High-Stakes) | 20-40 weeks | Quarterly |