Certification Lifecycle

The ARA certification lifecycle defines the end-to-end process for achieving and maintaining certification. It consists of 10 phases, from initial intake through ongoing compliance monitoring. Each phase produces specific outputs that feed into subsequent phases, creating a structured and auditable certification pathway.

1. Intake Assessment

Typical duration: 1-2 weeks

The organization seeking certification submits a formal intake request to an Authorized Verification Body (AVB). The AVB conducts a preliminary assessment of the system to determine scope, applicable certification level, and evaluation feasibility. This phase identifies whether the system is a candidate for ARA certification and which domains apply.

Key Activities

  • Organization submits system description, deployment context, and requested certification level
  • AVB reviews system architecture and operational scope documentation
  • AVB determines applicable domains based on system category (Agent, Multi-Agent, Physical, Hybrid)
  • Preliminary gap analysis identifies potential areas of non-compliance
  • Engagement agreement is formalized with scope, timeline, and fee structure

Phase Outputs

  • Intake assessment report with feasibility determination
  • Applicable domain and ACR mapping
  • Preliminary evaluation plan and timeline
  • Signed engagement agreement

2. Documentation Review

Typical duration: 2-4 weeks

The AVB conducts a comprehensive review of the organization's technical and governance documentation. This phase evaluates the completeness and adequacy of evidence artifacts required by applicable ACRs before proceeding to active testing phases.

Key Activities

  • Review of system architecture documentation and operational boundary declarations
  • Assessment of governance framework documents, change management procedures, and incident response plans
  • Verification of audit trail schemas, monitoring configurations, and telemetry pipeline specifications
  • Review of credential lifecycle management, identity isolation, and permission boundary documentation
  • Gap identification for missing or insufficient documentation

Phase Outputs

  • Documentation review report with completeness assessment
  • Evidence gap register identifying required supplementary documentation
  • Readiness determination for active evaluation phases
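
The evidence gap register described above can be pictured as a set difference between the artifacts each applicable ACR requires and the artifacts the organization has submitted. The following is a minimal sketch under stated assumptions: the ACR identifiers, the artifact names, and the rule that readiness requires an empty register are all hypothetical and are not taken from the ARA standard.

```python
# Hypothetical ACR identifiers and artifact names, for illustration only.
REQUIRED_ARTIFACTS = {
    "ACR-EXAMPLE-01": {"architecture_overview", "operational_boundary_declaration"},
    "ACR-EXAMPLE-02": {"incident_response_plan", "change_management_procedure"},
    "ACR-EXAMPLE-03": {"audit_trail_schema"},
}


def build_gap_register(required, submitted):
    """Return {acr_id: sorted list of missing artifacts} for ACRs with gaps."""
    register = {}
    for acr_id, artifacts in required.items():
        missing = artifacts - submitted
        if missing:
            register[acr_id] = sorted(missing)
    return register


def ready_for_active_evaluation(register):
    """Assumed rule: the system is ready only when no gaps remain."""
    return not register


submitted = {"architecture_overview", "incident_response_plan", "audit_trail_schema"}
gaps = build_gap_register(REQUIRED_ARTIFACTS, submitted)
for acr_id, missing in gaps.items():
    print(acr_id, "is missing:", ", ".join(missing))
```

In practice an AVB would also judge whether a submitted document is adequate, not merely present; that judgment is outside what this sketch models.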

3. Automated Testing

Typical duration: 2-6 weeks

ACRs designated with the Automated Testing (AT) evaluation method are assessed through structured test execution. The AVB either executes standardized test suites or reviews the organization's test results against defined acceptance criteria. This phase covers the majority of technical controls across all applicable domains.

Key Activities

  • Execution of operational boundary enforcement tests
  • Privilege escalation and identity isolation testing
  • Graceful degradation and failure blast radius containment verification
  • Prompt injection resistance and adversarial input testing
  • Behavioral consistency testing under sustained load and temporal pressure
  • Drift detection regression testing and data integrity verification
  • API response validation and cross-system data flow integrity testing

Phase Outputs

  • Automated test execution report with pass/fail results for each AT-designated ACR
  • Test coverage analysis mapping test cases to ACR requirements
  • Deficiency notices for any failed controls
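
The three outputs of this phase (per-ACR pass/fail results, coverage analysis, and deficiency notices) can be derived from a flat list of test case results. The sketch below assumes each test case maps to exactly one ACR and that an ACR passes only when all of its mapped test cases pass; both rules, along with the `CaseResult` type and the identifiers, are illustrative assumptions rather than ARA definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    test_id: str
    acr_id: str  # hypothetical identifier format
    passed: bool


def summarize_by_acr(results, at_designated_acrs):
    """Map each AT-designated ACR to PASS, FAIL, or NOT_COVERED."""
    by_acr = defaultdict(list)
    for result in results:
        by_acr[result.acr_id].append(result)

    report = {}
    for acr_id in at_designated_acrs:
        cases = by_acr.get(acr_id, [])
        if not cases:
            report[acr_id] = "NOT_COVERED"  # coverage gap: no test maps to this ACR
        elif all(case.passed for case in cases):
            report[acr_id] = "PASS"
        else:
            report[acr_id] = "FAIL"
    return report


def deficiency_notices(results, report):
    """List the failing test IDs for every ACR marked FAIL."""
    return {
        acr_id: sorted(r.test_id for r in results if r.acr_id == acr_id and not r.passed)
        for acr_id, status in report.items()
        if status == "FAIL"
    }


results = [
    CaseResult("boundary-001", "ACR-EXAMPLE-01", True),
    CaseResult("boundary-002", "ACR-EXAMPLE-01", False),
    CaseResult("injection-001", "ACR-EXAMPLE-02", True),
]
report = summarize_by_acr(results, ["ACR-EXAMPLE-01", "ACR-EXAMPLE-02", "ACR-EXAMPLE-03"])
notices = deficiency_notices(results, report)
```

Reporting `NOT_COVERED` separately from `FAIL` keeps a missing test from being mistaken for either a passed or a failed control.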

4. Human Simulation Testing

Typical duration: 2-4 weeks

ACRs designated with the Human Simulation (HS) evaluation method are assessed through structured scenarios conducted by qualified human evaluators. These scenarios simulate realistic operational conditions including adversarial interactions, failure events, and edge cases that cannot be adequately assessed through automated testing alone.

Key Activities

  • Value alignment constraint enforcement testing through adversarial scenario simulation
  • Human override activation testing under nominal, degraded, and failure conditions
  • Adversarial input behavioral robustness evaluation across input channels
  • Safe state recovery verification following simulated failure events
  • Contested decision arbitration protocol evaluation
  • Emergency stop mechanism testing for physical systems (L3)
  • Multi-agent permission boundary bypass scenario evaluation (L2/L3)

Phase Outputs

  • Human simulation test report with scenario descriptions and outcomes
  • Evaluator assessment forms with structured scoring for each HS-designated ACR
  • Behavioral observation notes and safety concern flags

5. Evidence Inspection

Typical duration: 1-3 weeks

ACRs designated with the Evidence Inspection (EI) evaluation method are assessed through detailed examination of documentary evidence, configuration artifacts, and operational records. This phase validates that required infrastructure, processes, and documentation are in place and adequately maintained.

Key Activities

  • Inspection of decision provenance chain records and tamper-evidence verification
  • Review of behavioral drift baseline specifications with cryptographic signature verification
  • Assessment of telemetry pipeline architecture and integrity verification mechanisms
  • Audit trail completeness verification through sample reconstruction exercises
  • Governance framework document review and role/authority matrix validation
  • Algorithmic impact disclosure document assessment
  • Supply chain integrity verification including SBOM review and vulnerability monitoring

Phase Outputs

  • Evidence inspection report with compliance assessment for each EI-designated ACR
  • Evidence quality assessment with recommendations for improvement
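
Tamper-evidence verification of provenance records is often implemented with a hash chain, in which each record stores a hash covering its own content and the previous record's hash. The source does not say which mechanism ARA expects, so the following is a sketch of that one common technique; the record layout, the `GENESIS` value, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first record


def record_hash(payload, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": record_hash(payload, prev_hash)}
    )


def first_broken_index(chain):
    """Return the index of the first record that fails verification, or None."""
    prev_hash = GENESIS
    for index, record in enumerate(chain):
        expected = record_hash(record["payload"], prev_hash)
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return index
        prev_hash = record["hash"]
    return None


chain = []
append_record(chain, {"decision": "approve", "actor": "agent-1"})
append_record(chain, {"decision": "escalate", "actor": "agent-1"})
append_record(chain, {"decision": "deny", "actor": "agent-2"})
```

Editing any stored payload changes its recomputed hash, so `first_broken_index` reports the altered record. A chain alone does not stop an attacker who can rewrite every later record; anchoring or signing the latest hash externally addresses that case.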

6. Continuous Monitoring Validation

Typical duration: 4-12 weeks

ACRs designated with the Continuous Monitoring (CM) evaluation method are assessed through analysis of telemetry and monitoring data collected over a defined observation period. This phase validates that ongoing monitoring infrastructure is operational and capable of detecting the conditions specified in applicable controls.

Key Activities

  • Validation of resource exhaustion monitoring thresholds and load-shedding activation
  • Continuous drift monitoring verification against certified behavioral baseline
  • Data distribution shift detection capability assessment
  • Anomaly detection effectiveness evaluation over the observation period
  • Monitoring coverage verification across all declared operational parameters

Phase Outputs

  • Continuous monitoring validation report with telemetry analysis
  • Monitoring coverage matrix mapping monitored parameters to ACR requirements
  • Observation period summary with detected events and system responses
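
One way to picture drift monitoring against a certified baseline is the Population Stability Index (PSI), which compares how a metric is distributed across fixed bins in the baseline and in the observation window. The source does not name a drift metric, so PSI, the cut points, and the 0.25 alert threshold (a commonly cited rule of thumb, not an ARA requirement) are all assumptions in this sketch.

```python
import math
from bisect import bisect_right


def bin_fractions(values, cut_points, eps=1e-6):
    """Fraction of values per bin; cut_points must be sorted ascending.

    Empty bins are clamped to eps so the logarithm below stays defined.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(cut_points) + 1)
    for value in values:
        counts[bisect_right(cut_points, value)] += 1
    return [max(count / len(values), eps) for count in counts]


def psi(baseline, observed, cut_points):
    """Population Stability Index between two samples over shared bins."""
    base = bin_fractions(baseline, cut_points)
    obs = bin_fractions(observed, cut_points)
    return sum((o - b) * math.log(o / b) for b, o in zip(base, obs))


PSI_ALERT_THRESHOLD = 0.25  # rule-of-thumb value, assumed for illustration

baseline = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
shifted = [0.8, 0.85, 0.9, 0.95, 0.6, 0.7, 0.9, 0.99]
cuts = [0.25, 0.5, 0.75]

drift_detected = psi(baseline, shifted, cuts) > PSI_ALERT_THRESHOLD
```

Identical samples give a PSI of zero, and the value grows as mass moves between bins, which is why a single threshold can serve as an alert condition.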

7. Adversarial Evaluation

Typical duration: 2-8 weeks

For L2 and L3 certifications, structured adversarial evaluation is conducted to validate system resilience against deliberate attack. L2 requires a minimum of 40 hours of structured human adversarial simulation. L3 requires a minimum of 80 hours plus an independent red team assessment by ARAF-approved evaluators.

Key Activities

  • Structured red team exercises targeting all adversarial robustness controls
  • Multi-turn attack sequence evaluation including social engineering and role confusion
  • Supply chain attack simulation and third-party component compromise testing
  • For physical systems: adversarial example testing in perception pipelines
  • Independent red team validation by ARAF-approved evaluators (L3 only)

Phase Outputs

  • Adversarial evaluation report with attack taxonomy coverage analysis
  • Resistance rate calculations against known attack categories
  • Independent red team report with findings and severity classifications (L3)
  • Remediation recommendations for identified vulnerabilities
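
The hour minimums stated for this phase, and the resistance rate output, lend themselves to a small worked example. The 40-hour and 80-hour figures and the L3 independent red team requirement come from the text above; the definition of resistance rate as resisted attempts divided by total attempts per category, the category names, and the type names are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Minimum structured adversarial simulation hours, as stated for L2 and L3.
MIN_ADVERSARIAL_HOURS = {"L2": 40, "L3": 80}


@dataclass(frozen=True)
class AdversarialEngagement:
    level: str
    simulation_hours: float
    independent_red_team_done: bool = False


def meets_minimums(engagement):
    """Check the hour minimum and, for L3, the independent red team assessment."""
    if engagement.level not in MIN_ADVERSARIAL_HOURS:
        raise ValueError("adversarial evaluation is described for L2 and L3 only")
    if engagement.simulation_hours < MIN_ADVERSARIAL_HOURS[engagement.level]:
        return False
    if engagement.level == "L3" and not engagement.independent_red_team_done:
        return False
    return True


def resistance_rates(attempts):
    """attempts: iterable of (category, resisted) pairs.

    Assumed definition: resisted attempts / total attempts, per category.
    """
    totals, resisted = Counter(), Counter()
    for category, was_resisted in attempts:
        totals[category] += 1
        if was_resisted:
            resisted[category] += 1
    return {category: resisted[category] / totals[category] for category in totals}


rates = resistance_rates(
    [
        ("prompt_injection", True),
        ("prompt_injection", False),
        ("role_confusion", True),
    ]
)
```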

8. Certification Decision

Typical duration: 1-2 weeks

The AVB consolidates all evaluation findings and renders a formal certification decision. The decision is one of: Certified (full compliance at the requested level), Conditionally Certified (minor non-compliances with mandated remediation), or Denied (blocking control failures or insufficient overall compliance).

Key Activities

  • Consolidation of all evaluation phase reports into a comprehensive assessment
  • Domain compliance score calculation using risk-weighted ACR results
  • Comparison of domain scores against certification level thresholds
  • Identification of any blocking control failures that mandate denial
  • Formulation of conditions and remediation timelines for conditional certification
  • Peer review of certification decision by a second qualified evaluator

Phase Outputs

  • Formal certification decision document
  • Certification certificate with scope statement, level, and validity period
  • Conditions register with remediation timelines (if conditionally certified)
  • Denial rationale with specific control failures identified (if denied)
  • Registry entry for certified systems
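
The decision logic described in this phase can be sketched as a short function. The three outcomes and the roles of blocking controls, risk-weighted domain scores, and level thresholds come from the text above. The scoring formula (passed weight divided by total weight per domain), the single threshold applied to every domain, and the sample values are assumptions; the actual ARA formula and thresholds are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcrResult:
    acr_id: str
    domain: str
    risk_weight: float  # assumed to be positive
    passed: bool
    blocking: bool = False


def domain_scores(results):
    """Assumed formula: sum of passed risk weights / sum of all risk weights."""
    total, earned = {}, {}
    for r in results:
        total[r.domain] = total.get(r.domain, 0.0) + r.risk_weight
        earned[r.domain] = earned.get(r.domain, 0.0) + (r.risk_weight if r.passed else 0.0)
    return {domain: earned[domain] / total[domain] for domain in total}


def decide(results, level_threshold):
    """Return 'Certified', 'Conditionally Certified', or 'Denied'."""
    # A failed blocking control mandates denial regardless of scores.
    if any(r.blocking and not r.passed for r in results):
        return "Denied"
    # Any domain below the level threshold counts as insufficient compliance.
    if any(score < level_threshold for score in domain_scores(results).values()):
        return "Denied"
    # Remaining failures are treated as minor non-compliances needing remediation.
    if any(not r.passed for r in results):
        return "Conditionally Certified"
    return "Certified"


sample = [
    AcrResult("ACR-EXAMPLE-01", "domain-a", 9.0, True, blocking=True),
    AcrResult("ACR-EXAMPLE-02", "domain-a", 1.0, False),
]
outcome = decide(sample, level_threshold=0.8)
```

Checking blocking failures before scores mirrors the text's statement that such failures mandate denial on their own.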

9. Post-Certification Onboarding

Typical duration: 2-4 weeks

Following a positive certification decision, the certified organization completes onboarding into the ARA monitoring framework. This includes establishing continuous monitoring integrations, configuring alerting thresholds, and setting up the reporting cadence for the ongoing compliance monitoring phase.

Key Activities

  • Configuration of continuous compliance monitoring integrations
  • Establishment of drift detection baseline synchronization with monitoring infrastructure
  • Configuration of alerting thresholds and notification channels
  • Onboarding to the ARA public registry with verified certification details
  • Distribution of ARA certification mark assets with usage guidelines
  • Scheduling of first reassessment based on certification level requirements

Phase Outputs

  • Monitoring onboarding confirmation with integration verification
  • Public registry entry with certification details
  • Certification mark package with brand usage guidelines
  • Reassessment schedule confirmation

10. Ongoing Compliance Monitoring

Typical duration: Continuous

Certified systems are subject to ongoing compliance monitoring for the duration of their certification period. The monitoring cadence and depth are determined by the certification level: annual for L1, semi-annual for L2, and quarterly for L3. Material changes to the system or its operating environment may trigger interim reassessment requirements.

Key Activities

  • Continuous monitoring of behavioral drift against certified baseline
  • Periodic reassessment at the cadence defined by the certification level
  • Review of change management logs and incident response records
  • Verification that conditional certification remediation has been completed on schedule
  • Investigation of monitoring alerts that indicate potential compliance deviations
  • Assessment of material changes that may affect certification scope or validity
  • Certification renewal evaluation at the end of each certification period

Phase Outputs

  • Periodic compliance monitoring reports
  • Reassessment results with updated compliance status
  • Monitoring alert investigation reports
  • Certification renewal decision at period expiry
  • Registry status updates reflecting current compliance state
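
The reassessment cadence per level translates directly into a schedule. The sketch below maps the stated cadences to 12, 6, and 3 months and lists the reassessment dates within a certification period. The length of the certification period is not stated in this section, so `period_months` is a hypothetical parameter, as are the function names.

```python
import calendar
from datetime import date

# Cadence from the text: L1 annual, L2 semi-annual, L3 quarterly.
REASSESSMENT_INTERVAL_MONTHS = {"L1": 12, "L2": 6, "L3": 3}


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reassessment_dates(certified_on, level, period_months):
    """Reassessment dates within the period, measured from the certification date."""
    step = REASSESSMENT_INTERVAL_MONTHS[level]
    return [add_months(certified_on, offset) for offset in range(step, period_months + 1, step)]


schedule = reassessment_dates(date(2025, 1, 31), "L3", period_months=12)
for due in schedule:
    print(due.isoformat())
```

Each date is computed from the original certification date rather than from the previous reassessment, so a clamp to a short month (for example 30 April) does not shift later dates.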

Duration Estimates

The total time from intake to certification decision varies by certification level and system complexity. The following are typical duration ranges, along with the reassessment cadence for each level:

Level             Typical Duration   Reassessment
L1 Supervised     8-16 weeks         Annual
L2 Bounded        14-26 weeks        Semi-annual
L3 High-Stakes    20-40 weeks        Quarterly
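
For planning purposes, the per-phase typical durations listed earlier can be summed as if phases 1 through 8 ran strictly one after another. The figures below are copied from the phase descriptions; treating the phases as sequential is an assumption of this sketch, not something the lifecycle states.

```python
# Typical duration in weeks (min, max) for the phases up to the certification decision.
PHASE_WEEKS = {
    "Intake Assessment": (1, 2),
    "Documentation Review": (2, 4),
    "Automated Testing": (2, 6),
    "Human Simulation Testing": (2, 4),
    "Evidence Inspection": (1, 3),
    "Continuous Monitoring Validation": (4, 12),
    "Adversarial Evaluation": (2, 8),  # described for L2 and L3 only
    "Certification Decision": (1, 2),
}


def sequential_range(phase_names):
    """Sum the minimum and maximum weeks as if phases never overlapped."""
    low = sum(PHASE_WEEKS[name][0] for name in phase_names)
    high = sum(PHASE_WEEKS[name][1] for name in phase_names)
    return low, high


all_phases = list(PHASE_WEEKS)
without_adversarial = [name for name in all_phases if name != "Adversarial Evaluation"]

print(sequential_range(all_phases))           # (15, 41)
print(sequential_range(without_adversarial))  # (13, 33)
```

The strictly sequential totals are longer than the ranges in the table, most visibly for L1 (13 to 33 weeks without Adversarial Evaluation, against 8 to 16 weeks). One reading is that phases overlap in practice or run shorter at lower levels; that is an inference from the arithmetic rather than a statement in the source.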