Telemetry SDK Integration Guide

The @araf/telemetry-sdk package provides a TypeScript client for submitting telemetry events from your autonomous system to the ARA Continuous Assurance Platform. It handles batching, retries, and transport negotiation based on your Assurance Class.

Installation

Package Manager

npm install @araf/telemetry-sdk
# or
yarn add @araf/telemetry-sdk

Configuration

Initialize the SDK with your ARA System Identifier, system profile, and telemetry endpoint. For Class B/C systems, your CAPO provides the endpoint; Class A systems use the ARA public ingestion endpoint.

TypeScript

config.ts

import { ARATelemetry, SystemProfile } from '@araf/telemetry-sdk';

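const telemetry = new ARATelemetry({
  systemId: process.env.ARA_SYSTEM_ID!,          // ARA System Identifier, e.g. ASI-2026-00142
  profile: SystemProfile.STANDARD,               // One of the four system profiles
  endpoint: process.env.ARA_TELEMETRY_ENDPOINT!, // Ingestion URL from your CAPO or ARA
  batchSize: 100,                                // Overrides the default of 50
  flushIntervalMs: 5000,                         // Overrides the default of 10000
});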

Option	Type	Required	Description
systemId	string	Yes	Your ARA System Identifier (e.g., ASI-2026-00142)
profile	SystemProfile	Yes	System profile: FOUNDATIONAL, STANDARD, ADVANCED, or COMPREHENSIVE
endpoint	string	Yes	Telemetry ingestion URL (provided by CAPO or ARA)
batchSize	number	No	Events per batch (default: 50, max: 1000)
flushIntervalMs	number	No	Auto-flush interval in milliseconds (default: 10000)
maxRetries	number	No	Maximum retry attempts on failure (default: 3)
debug	boolean	No	Enable verbose logging (default: false)
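
The defaults and limits in the table can be made concrete with a small sketch. The resolveOptions helper and the TelemetryOptions and ResolvedOptions types below are hypothetical and are not part of the SDK; they cover only a subset of the options (profile is omitted) and mirror the documented defaults and the documented batchSize maximum of 1000. How the SDK itself treats an out-of-range batchSize is not stated in this guide, so rejecting it is an assumption.

```typescript
// Hypothetical subset of the documented options; not the SDK's own type.
interface TelemetryOptions {
  systemId: string;
  endpoint: string;
  batchSize?: number;
  flushIntervalMs?: number;
  maxRetries?: number;
  debug?: boolean;
}

type ResolvedOptions = Required<TelemetryOptions>;

// Fill in the documented defaults, then validate batchSize against the
// documented maximum of 1000. Throwing on violation is an assumption.
function resolveOptions(options: TelemetryOptions): ResolvedOptions {
  const resolved: ResolvedOptions = {
    systemId: options.systemId,
    endpoint: options.endpoint,
    batchSize: options.batchSize ?? 50,
    flushIntervalMs: options.flushIntervalMs ?? 10000,
    maxRetries: options.maxRetries ?? 3,
    debug: options.debug ?? false,
  };
  const { batchSize } = resolved;
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > 1000) {
    throw new RangeError(`batchSize must be an integer from 1 to 1000, got ${batchSize}`);
  }
  return resolved;
}
```

Validating before constructing the client surfaces a misconfigured batch size at startup rather than at the first flush.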

Event Types

Operational Events

Track core system operations: decisions, tool invocations, and escalations to human oversight.

trackDecision()

DecisionEvent

interface DecisionEvent {
  domain: string;        // ARA domain slug
  action: string;        // Action taken (e.g., 'recommendation', 'approval')
  confidence: number;    // 0.0 - 1.0
  reasoning: string;     // Human-readable rationale
  outcome?: string;      // Result of the decision
  escalated?: boolean;   // Whether human oversight was invoked
}

telemetry.trackDecision({
  domain: 'decision-integrity',
  action: 'transaction.approve',
  confidence: 0.94,
  reasoning: 'Policy rules #14, #22 satisfied; amount within threshold.',
  outcome: 'approved',
  escalated: false,
});

trackToolCall()

ToolCallEvent

interface ToolCallEvent {
  toolName: string;          // Identifier of the tool invoked
  parameters: Record<string, unknown>;
  responseStatus: 'success' | 'error' | 'timeout';
  latencyMs: number;         // Round-trip time
}

telemetry.trackToolCall({
  toolName: 'credit-check-api',
  parameters: { applicantId: 'app_12345' },
  responseStatus: 'success',
  latencyMs: 234,
});

trackEscalation()

EscalationEvent

interface EscalationEvent {
  reason: string;            // Why escalation was triggered
  domain: string;            // ARA domain slug
  severity: 'warning' | 'critical';
  assignedTo?: string;       // Human operator identifier
  resolvedWithinMs?: number; // Time to human resolution
}

telemetry.trackEscalation({
  reason: 'Confidence below threshold for high-value transaction',
  domain: 'human-oversight',
  severity: 'warning',
  assignedTo: 'ops-team-lead',
});

Drift Metrics

Report behavioral baseline measurements and deviations that feed the drift detection pipeline.

trackDrift()

DriftEvent

interface DriftEvent {
  metricName: string;        // Metric identifier
  baselineValue: number;     // Expected value
  currentValue: number;      // Observed value
  deviationPercent: number;  // Percentage deviation
  windowHours: number;       // Measurement window
}

telemetry.trackDrift({
  metricName: 'approval_rate',
  baselineValue: 0.73,
  currentValue: 0.82,
  deviationPercent: 12.3,
  windowHours: 72,
});
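
The deviationPercent value in the example above is consistent with a signed relative change from the baseline. This guide does not define the formula, so the sketch below is an assumption, and deviationPercent and roundTo1 are hypothetical helper names rather than SDK functions.

```typescript
// Assumed definition: signed change relative to the baseline, in percent.
function deviationPercent(baselineValue: number, currentValue: number): number {
  if (baselineValue === 0) {
    throw new RangeError('baselineValue must be non-zero');
  }
  return ((currentValue - baselineValue) / Math.abs(baselineValue)) * 100;
}

// Round to one decimal place for reporting.
function roundTo1(value: number): number {
  return Math.round(value * 10) / 10;
}

// Baseline 0.73 and observed 0.82, as in the trackDrift() example.
const deviation = roundTo1(deviationPercent(0.73, 0.82));
console.log(deviation); // 12.3
```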

trackBehavioralBaseline()

BaselineEvent

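interface BaselineEvent {
  metricName: string;    // Metric identifier
  value: number;         // Baseline value measured over the period
  sampleSize: number;    // Number of observations in the period
  periodStart: string;   // ISO 8601
  periodEnd: string;     // ISO 8601
}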

telemetry.trackBehavioralBaseline({
  metricName: 'approval_rate',
  value: 0.73,
  sampleSize: 14200,
  periodStart: '2026-01-01T00:00:00Z',
  periodEnd: '2026-01-31T23:59:59Z',
});

Incident Signals

Report incidents and anomalies detected in your system, both for the compliance record and for alerting.

trackIncident()

IncidentEvent

interface IncidentEvent {
  incidentType: string;          // e.g., 'boundary_exceedance', 'data_anomaly'
  severity: 'warning' | 'critical';
  affectedDomains: string[];     // ARA domain slugs
  description: string;
  resolutionStatus: 'open' | 'investigating' | 'resolved';
}

telemetry.trackIncident({
  incidentType: 'boundary_exceedance',
  severity: 'critical',
  affectedDomains: ['decision-integrity', 'operational-boundaries'],
  description: 'System approved transaction exceeding declared limit.',
  resolutionStatus: 'investigating',
});

trackAnomaly()

AnomalyEvent

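interface AnomalyEvent {
  anomalyType: string;               // e.g., 'statistical_outlier'
  domain: string;                    // ARA domain slug
  severity: 'info' | 'warning' | 'critical';
  metric: string;                    // Metric identifier
  expectedRange: [number, number];   // [lower bound, upper bound]
  observedValue: number;             // Value actually observed
}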

telemetry.trackAnomaly({
  anomalyType: 'statistical_outlier',
  domain: 'performance-reliability',
  severity: 'warning',
  metric: 'response_latency_p99',
  expectedRange: [100, 500],
  observedValue: 1240,
});
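
Before calling trackAnomaly(), a caller typically needs to decide whether an observation falls outside its expected range. A minimal sketch of that check follows; isOutsideExpectedRange is a hypothetical helper, and treating both bounds as inclusive is an assumption, since this guide does not say how the platform interprets expectedRange.

```typescript
// Returns true when observedValue lies outside the range [min, max].
// Both bounds are treated as inclusive (an assumption).
function isOutsideExpectedRange(
  expectedRange: [number, number],
  observedValue: number,
): boolean {
  const [min, max] = expectedRange;
  return observedValue < min || observedValue > max;
}

// Values from the trackAnomaly() example: 1240 is outside [100, 500].
console.log(isOutsideExpectedRange([100, 500], 1240)); // true
```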

Health Checks

Periodic health check events confirm system liveness and SDK connectivity.

trackHealthCheck()

HealthCheckEvent

interface HealthCheckEvent {
  status: 'healthy' | 'degraded' | 'unhealthy';
  uptime: number;          // Seconds since last restart
  eventQueueDepth: number; // Pending events in buffer
  lastFlushAt: string;     // ISO 8601
}

telemetry.trackHealthCheck({
  status: 'healthy',
  uptime: 86400,
  eventQueueDepth: 12,
  lastFlushAt: '2026-02-28T14:29:55Z',
});

Class B vs Class C Integration

The SDK automatically adjusts its transport and batching behavior based on the Assurance Class of your certification. The key differences are summarized below.

Capability	Class B (Monitored)	Class C (Continuous)
Transport	HTTPS batch (POST)	WebSocket streaming
Flush interval	5,000 - 30,000 ms	Real-time (<1,000 ms)
Batch size	Up to 1,000 events	Single event streaming
Data retention	90 days	365 days
Health check frequency	Every 5 minutes	Every 60 seconds
Connection recovery	Retry with exponential backoff	Auto-reconnect with buffered replay

Error Handling & Retry Logic

The SDK includes built-in error handling with configurable retry behavior. Failed transmissions are buffered locally and retried with exponential backoff.

Error Handling Configuration

const telemetry = new ARATelemetry({
  systemId: process.env.ARA_SYSTEM_ID!,
  profile: SystemProfile.STANDARD,
  endpoint: process.env.ARA_TELEMETRY_ENDPOINT!,
  maxRetries: 5,
  retryBaseDelayMs: 1000,     // Initial retry delay
  retryMaxDelayMs: 30000,     // Maximum retry delay
  onError: (error, events) => {
    console.error('Telemetry transmission failed:', error.message);
    console.error('Affected events:', events.length);
  },
  onRetry: (attempt, delay) => {
    console.warn(`Retry attempt ${attempt}, next in ${delay}ms`);
  },
});

Automatic Buffering

Events are buffered in memory during network failures. The buffer holds up to 10,000 events; once it is full, the oldest entries are dropped to make room for new ones.
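
The drop-oldest policy can be illustrated with a short sketch. BoundedEventBuffer is a hypothetical class written for this guide, not the SDK's internal buffer, whose implementation is not documented here.

```typescript
// Hypothetical drop-oldest buffer illustrating the documented policy.
class BoundedEventBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private droppedCount = 0;

  constructor(capacity: number = 10000) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
  }

  // Add an event; when the buffer is full, evict the oldest entry first.
  push(event: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift();
      this.droppedCount += 1;
    }
    this.items.push(event);
  }

  // Remove and return up to `max` of the oldest events, e.g. to build a batch.
  drain(max: number): T[] {
    return this.items.splice(0, max);
  }

  get size(): number {
    return this.items.length;
  }

  get dropped(): number {
    return this.droppedCount;
  }
}
```

Array.prototype.shift() is linear in the buffer length, which is acceptable for a sketch; a ring buffer would avoid that cost under sustained overflow.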

Exponential Backoff

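Retries use exponential backoff with jitter: delay = min(baseDelay * 2^attempt + random(0, 1000), maxDelay), where baseDelay is retryBaseDelayMs, maxDelay is retryMaxDelayMs, and random(0, 1000) adds up to 1,000 ms of jitter.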
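
The formula can be written out as a small function. backoffDelayMs is a hypothetical helper and not an SDK export; it assumes that attempt is zero-based (the guide does not say) and that the jitter is drawn uniformly. The random source is injectable so the result can be checked deterministically.

```typescript
// delay = min(baseDelay * 2^attempt + random(0, 1000), maxDelay)
// `random` must return a value in [0, 1); it defaults to Math.random.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): number {
  const jitterMs = random() * 1000;
  return Math.min(baseDelayMs * 2 ** attempt + jitterMs, maxDelayMs);
}

// With jitter disabled, base 1000 ms and cap 30000 ms, attempts 0..5 give
// 1000, 2000, 4000, 8000, 16000, 30000 (the last value is capped).
for (let attempt = 0; attempt <= 5; attempt += 1) {
  console.log(attempt, backoffDelayMs(attempt, 1000, 30000, () => 0));
}
```

Applying the cap after adding jitter, as the formula does, means that every retry past the cap waits exactly maxDelay with no randomization.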

Graceful Shutdown

Call telemetry.flush() before process exit to transmit all buffered events. The SDK registers SIGTERM/SIGINT handlers automatically.
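
For exit paths that the signal handlers do not cover, a bounded wait around flush() keeps shutdown from hanging on an unreachable endpoint. The sketch below assumes that flush() returns a Promise that settles once buffered events have been sent; this guide does not state its return type. The Flushable interface and flushWithTimeout helper are hypothetical.

```typescript
// Assumption: flush() returns a Promise that settles when sending is done.
interface Flushable {
  flush(): Promise<void>;
}

// Wait for flush(), but give up after timeoutMs.
// Resolves to true if the flush completed in time, false on timeout or failure.
async function flushWithTimeout(client: Flushable, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      client.flush().then(() => true, () => false),
      timedOut,
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive after the race.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller might await flushWithTimeout(telemetry, 5000) immediately before exiting and log a warning when it resolves to false.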