Telemetry SDK Integration Guide

The @araf/telemetry-sdk package provides a TypeScript client for submitting telemetry events from your autonomous system to the ARA Continuous Assurance Platform. It handles batching, retries, and transport negotiation based on your Assurance Class.

Installation

Package Manager

npm install @araf/telemetry-sdk
# or
yarn add @araf/telemetry-sdk

Configuration

Initialize the SDK with your ARA System Identifier, system profile, and telemetry endpoint. For Class B/C systems, your CAPO provides the endpoint; Class A systems use the ARA public ingestion endpoint.

TypeScript

config.ts
import { ARATelemetry, SystemProfile } from '@araf/telemetry-sdk';

const telemetry = new ARATelemetry({
  systemId: process.env.ARA_SYSTEM_ID!,
  profile: SystemProfile.STANDARD,
  endpoint: process.env.ARA_TELEMETRY_ENDPOINT!,
  batchSize: 100,
  flushIntervalMs: 5000,
});

| Option | Type | Required | Description |
|---|---|---|---|
| systemId | string | Yes | Your ARA System Identifier (e.g., ASI-2026-00142) |
| profile | SystemProfile | Yes | System profile: FOUNDATIONAL, STANDARD, ADVANCED, or COMPREHENSIVE |
| endpoint | string | Yes | Telemetry ingestion URL (provided by CAPO or ARA) |
| batchSize | number | No | Events per batch (default: 50, max: 1000) |
| flushIntervalMs | number | No | Auto-flush interval in milliseconds (default: 10000) |
| maxRetries | number | No | Maximum retry attempts on failure (default: 3) |
| debug | boolean | No | Enable verbose logging (default: false) |
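
The non-null assertions (!) in the configuration example satisfy the compiler but still let a missing environment variable reach the SDK as undefined at runtime. The sketch below shows one way to fail fast instead. loadTelemetryConfig and TelemetryEnvConfig are hypothetical names introduced for illustration and are not part of @araf/telemetry-sdk; the default and maximum for batchSize come from the options listed above.

```typescript
// Hypothetical helper, not part of @araf/telemetry-sdk.
interface TelemetryEnvConfig {
  systemId: string;
  endpoint: string;
  batchSize: number;
}

const DEFAULT_BATCH_SIZE = 50; // documented default for batchSize
const MAX_BATCH_SIZE = 1000;   // documented maximum for batchSize

function loadTelemetryConfig(
  env: Record<string, string | undefined>,
  batchSize: number = DEFAULT_BATCH_SIZE,
): TelemetryEnvConfig {
  const systemId = env['ARA_SYSTEM_ID'];
  const endpoint = env['ARA_TELEMETRY_ENDPOINT'];
  // Reject missing or empty variables before the client is constructed.
  if (!systemId) throw new Error('ARA_SYSTEM_ID is not set');
  if (!endpoint) throw new Error('ARA_TELEMETRY_ENDPOINT is not set');
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
    throw new RangeError(`batchSize must be an integer between 1 and ${MAX_BATCH_SIZE}`);
  }
  return { systemId, endpoint, batchSize };
}
```

In an application, the result of loadTelemetryConfig(process.env, 100) could be spread into the ARATelemetry constructor options alongside profile.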

Event Types

Operational Events

Track core system operations: decisions, tool invocations, and escalations to human oversight.

trackDecision()

DecisionEvent
interface DecisionEvent {
  domain: string;        // ARA domain slug
  action: string;        // Action taken (e.g., 'recommendation', 'approval')
  confidence: number;    // 0.0 - 1.0
  reasoning: string;     // Human-readable rationale
  outcome?: string;      // Result of the decision
  escalated?: boolean;   // Whether human oversight was invoked
}

telemetry.trackDecision({
  domain: 'decision-integrity',
  action: 'transaction.approve',
  confidence: 0.94,
  reasoning: 'Policy rules #14, #22 satisfied; amount within threshold.',
  outcome: 'approved',
  escalated: false,
});

trackToolCall()

ToolCallEvent
interface ToolCallEvent {
  toolName: string;          // Identifier of the tool invoked
  parameters: Record<string, unknown>;
  responseStatus: 'success' | 'error' | 'timeout';
  latencyMs: number;         // Round-trip time
}

telemetry.trackToolCall({
  toolName: 'credit-check-api',
  parameters: { applicantId: 'app_12345' },
  responseStatus: 'success',
  latencyMs: 234,
});

trackEscalation()

EscalationEvent
interface EscalationEvent {
  reason: string;            // Why escalation was triggered
  domain: string;            // ARA domain slug
  severity: 'warning' | 'critical';
  assignedTo?: string;       // Human operator identifier
  resolvedWithinMs?: number; // Time to human resolution
}

telemetry.trackEscalation({
  reason: 'Confidence below threshold for high-value transaction',
  domain: 'human-oversight',
  severity: 'warning',
  assignedTo: 'ops-team-lead',
});
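
The escalation example cites a confidence threshold as its reason, so a decision and an escalation are often produced together. The sketch below shows one possible way to derive an EscalationEvent from a DecisionEvent. The threshold policy and the escalationFor helper are assumptions made for illustration; the SDK does not prescribe when to escalate. The interfaces are repeated from the definitions above so the snippet stands alone.

```typescript
// Shapes repeated from the interfaces documented in this section.
interface DecisionEvent {
  domain: string;
  action: string;
  confidence: number;
  reasoning: string;
  outcome?: string;
  escalated?: boolean;
}

interface EscalationEvent {
  reason: string;
  domain: string;
  severity: 'warning' | 'critical';
  assignedTo?: string;
  resolvedWithinMs?: number;
}

// Hypothetical policy: escalate when confidence falls below minConfidence.
function escalationFor(
  decision: DecisionEvent,
  minConfidence: number,
): EscalationEvent | null {
  if (decision.confidence < 0 || decision.confidence > 1) {
    throw new RangeError('confidence must be within 0.0 - 1.0');
  }
  if (decision.confidence >= minConfidence) return null; // no escalation needed
  return {
    reason: `Confidence ${decision.confidence} below threshold ${minConfidence} for ${decision.action}`,
    domain: 'human-oversight',
    severity: 'warning',
  };
}
```

A non-null result could be passed to telemetry.trackEscalation(), with escalated: true set on the corresponding trackDecision() call.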

Drift Metrics

Report behavioral baseline measurements and deviations that feed the drift detection pipeline.

trackDrift()

DriftEvent
interface DriftEvent {
  metricName: string;        // Metric identifier
  baselineValue: number;     // Expected value
  currentValue: number;      // Observed value
  deviationPercent: number;  // Percentage deviation
  windowHours: number;       // Measurement window
}

telemetry.trackDrift({
  metricName: 'approval_rate',
  baselineValue: 0.73,
  currentValue: 0.82,
  deviationPercent: 12.3,
  windowHours: 72,
});
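
The guide does not state how deviationPercent is derived. The values in the example are consistent with a signed deviation relative to the baseline, rounded to one decimal place, so the sketch below assumes that definition. deviationPercent here is a hypothetical helper function, not an SDK export.

```typescript
// Assumption: deviation is signed and measured relative to the baseline value.
function deviationPercent(baselineValue: number, currentValue: number): number {
  if (baselineValue === 0) {
    throw new RangeError('baselineValue must be non-zero for a relative deviation');
  }
  const raw = ((currentValue - baselineValue) / Math.abs(baselineValue)) * 100;
  return Math.round(raw * 10) / 10; // round to one decimal place
}

// Reproduces the example above: baseline 0.73, current 0.82.
const approvalRateDeviation = deviationPercent(0.73, 0.82); // 12.3
```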

trackBehavioralBaseline()

BaselineEvent
interface BaselineEvent {
  metricName: string;
  value: number;
  sampleSize: number;
  periodStart: string;   // ISO 8601
  periodEnd: string;     // ISO 8601
}

telemetry.trackBehavioralBaseline({
  metricName: 'approval_rate',
  value: 0.73,
  sampleSize: 14200,
  periodStart: '2026-01-01T00:00:00Z',
  periodEnd: '2026-01-31T23:59:59Z',
});
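
A baseline event summarizes raw observations over a period. As a minimal sketch, assuming the baseline metric is an approval rate computed from a list of boolean outcomes, the helper below builds the payload. approvalRateBaseline is a hypothetical function, and the BaselineEvent interface is repeated from above so the snippet is self-contained.

```typescript
// Shape repeated from the BaselineEvent interface documented above.
interface BaselineEvent {
  metricName: string;
  value: number;
  sampleSize: number;
  periodStart: string; // ISO 8601
  periodEnd: string;   // ISO 8601
}

// Hypothetical helper: summarizes approve/reject outcomes into a baseline.
function approvalRateBaseline(
  outcomes: boolean[],
  periodStart: Date,
  periodEnd: Date,
): BaselineEvent {
  if (outcomes.length === 0) {
    throw new RangeError('at least one sample is required');
  }
  const approved = outcomes.filter((ok) => ok).length;
  return {
    metricName: 'approval_rate',
    value: approved / outcomes.length,
    sampleSize: outcomes.length,
    // toISOString() always emits UTC with millisecond precision.
    periodStart: periodStart.toISOString(),
    periodEnd: periodEnd.toISOString(),
  };
}
```

The returned object can be passed directly to telemetry.trackBehavioralBaseline().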

Incident Signals

Report incidents and anomalies detected in your system for compliance records and alerting.

trackIncident()

IncidentEvent
interface IncidentEvent {
  incidentType: string;          // e.g., 'boundary_exceedance', 'data_anomaly'
  severity: 'warning' | 'critical';
  affectedDomains: string[];     // ARA domain slugs
  description: string;
  resolutionStatus: 'open' | 'investigating' | 'resolved';
}

telemetry.trackIncident({
  incidentType: 'boundary_exceedance',
  severity: 'critical',
  affectedDomains: ['decision-integrity', 'operational-boundaries'],
  description: 'System approved transaction exceeding declared limit.',
  resolutionStatus: 'investigating',
});

trackAnomaly()

AnomalyEvent
interface AnomalyEvent {
  anomalyType: string;
  domain: string;
  severity: 'info' | 'warning' | 'critical';
  metric: string;
  expectedRange: [number, number];
  observedValue: number;
}

telemetry.trackAnomaly({
  anomalyType: 'statistical_outlier',
  domain: 'performance-reliability',
  severity: 'warning',
  metric: 'response_latency_p99',
  expectedRange: [100, 500],
  observedValue: 1240,
});
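
An anomaly event pairs an observed value with the range it was expected to fall in. The sketch below shows one way to produce the payload only when the observation is out of range. detectOutlier is a hypothetical helper, and the fixed 'warning' severity is an assumption; how severity should be graded is left to your own policy.

```typescript
// Shape repeated from the AnomalyEvent interface documented above.
interface AnomalyEvent {
  anomalyType: string;
  domain: string;
  severity: 'info' | 'warning' | 'critical';
  metric: string;
  expectedRange: [number, number];
  observedValue: number;
}

// Hypothetical helper: returns an event only for out-of-range observations.
function detectOutlier(
  metric: string,
  domain: string,
  expectedRange: [number, number],
  observedValue: number,
): AnomalyEvent | null {
  const [low, high] = expectedRange;
  if (low > high) throw new RangeError('expectedRange must be [low, high]');
  if (observedValue >= low && observedValue <= high) return null; // within range
  return {
    anomalyType: 'statistical_outlier',
    domain,
    severity: 'warning', // assumption: severity grading is application-specific
    metric,
    expectedRange,
    observedValue,
  };
}
```

A non-null result can be passed to telemetry.trackAnomaly().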

Health Checks

Periodic health check events confirm system liveness and SDK connectivity.

trackHealthCheck()

HealthCheckEvent
interface HealthCheckEvent {
  status: 'healthy' | 'degraded' | 'unhealthy';
  uptime: number;          // Seconds since last restart
  eventQueueDepth: number; // Pending events in buffer
  lastFlushAt: string;     // ISO 8601
}

telemetry.trackHealthCheck({
  status: 'healthy',
  uptime: 86400,
  eventQueueDepth: 12,
  lastFlushAt: '2026-02-28T14:29:55Z',
});

Class B vs Class C Integration

The SDK automatically adjusts its transport and batching behavior based on the Assurance Class of your certification. The key differences are summarized below.

| Capability | Class B (Monitored) | Class C (Continuous) |
|---|---|---|
| Transport | HTTPS batch (POST) | WebSocket streaming |
| Flush interval | 5,000 - 30,000 ms | Real-time (<1,000 ms) |
| Batch size | Up to 1,000 events | Single event streaming |
| Data retention | 90 days | 365 days |
| Health check frequency | Every 5 minutes | Every 60 seconds |
| Connection recovery | Retry with exponential backoff | Auto-reconnect with buffered replay |
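
The SDK applies these differences itself, but it can help to encode them in your own code, for example to check a proposed flushIntervalMs before deployment. The sketch below restates the figures from the comparison above as data. AssuranceClass, TransportProfile, TRANSPORT_PROFILES, and isFlushIntervalAllowed are all hypothetical names, not SDK exports.

```typescript
// Hypothetical types restating the Class B / Class C comparison as data.
type AssuranceClass = 'B' | 'C';

interface TransportProfile {
  transport: 'https-batch' | 'websocket';
  maxBatchSize: number;
  healthCheckIntervalMs: number;
  retentionDays: number;
}

const TRANSPORT_PROFILES: Record<AssuranceClass, TransportProfile> = {
  B: { transport: 'https-batch', maxBatchSize: 1000, healthCheckIntervalMs: 5 * 60 * 1000, retentionDays: 90 },
  C: { transport: 'websocket', maxBatchSize: 1, healthCheckIntervalMs: 60 * 1000, retentionDays: 365 },
};

// Class B flushes every 5,000 - 30,000 ms; Class C streams in under 1,000 ms.
function isFlushIntervalAllowed(cls: AssuranceClass, flushIntervalMs: number): boolean {
  if (cls === 'B') {
    return flushIntervalMs >= 5000 && flushIntervalMs <= 30000;
  }
  return flushIntervalMs > 0 && flushIntervalMs < 1000;
}
```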

Error Handling & Retry Logic

The SDK includes built-in error handling with configurable retry behavior. Failed transmissions are buffered locally and retried with exponential backoff.

Error Handling Configuration

const telemetry = new ARATelemetry({
  systemId: process.env.ARA_SYSTEM_ID!,
  profile: SystemProfile.STANDARD,
  endpoint: process.env.ARA_TELEMETRY_ENDPOINT!,
  maxRetries: 5,
  retryBaseDelayMs: 1000,     // Initial retry delay
  retryMaxDelayMs: 30000,     // Maximum retry delay
  onError: (error, events) => {
    console.error('Telemetry transmission failed:', error.message);
    console.error('Affected events:', events.length);
  },
  onRetry: (attempt, delay) => {
    console.warn(`Retry attempt ${attempt}, next in ${delay}ms`);
  },
});

Automatic Buffering

Events are buffered in memory during network failures. The buffer holds up to 10,000 events before the oldest entries are dropped.
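
The drop-oldest behavior described above can be pictured as a bounded queue. The class below is an illustrative sketch only; BoundedEventBuffer is a hypothetical name, and the SDK's internal buffer may be implemented differently.

```typescript
// Illustrative sketch of a drop-oldest buffer; not the SDK's implementation.
class BoundedEventBuffer<T> {
  private readonly capacity: number;
  private readonly events: T[] = [];
  private dropped = 0;

  constructor(capacity: number = 10000) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
  }

  // Append an event; when over capacity, discard the oldest entry.
  push(event: T): void {
    this.events.push(event);
    if (this.events.length > this.capacity) {
      this.events.shift();
      this.dropped += 1;
    }
  }

  // Remove and return all buffered events, oldest first.
  drain(): T[] {
    return this.events.splice(0, this.events.length);
  }

  get size(): number {
    return this.events.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Tracking the dropped count, as done here, is one way to surface data loss during a long outage.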

Exponential Backoff

Retries use exponential backoff with jitter: delay = min(baseDelay * 2^attempt + random(0, 1000), maxDelay).
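
The formula above translates directly into code. The sketch below is a standalone illustration rather than the SDK's own implementation; backoffDelayMs is a hypothetical function, and it assumes attempt counts from 0, which the guide does not specify. The random source is injectable so the result can be made deterministic in tests.

```typescript
// delay = min(baseDelay * 2^attempt + random(0, 1000), maxDelay)
// Assumption: the first retry uses attempt = 0.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): number {
  const jitterMs = random() * 1000; // random(0, 1000) in the formula
  return Math.min(baseDelayMs * 2 ** attempt + jitterMs, maxDelayMs);
}

// With retryBaseDelayMs = 1000, retryMaxDelayMs = 30000, and jitter disabled:
const noJitter = () => 0;
const schedule = [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n, 1000, 30000, noJitter));
// schedule is [1000, 2000, 4000, 8000, 16000, 30000]; the last value is capped.
```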

Graceful Shutdown

Call telemetry.flush() before process exit to transmit all buffered events. The SDK registers SIGTERM/SIGINT handlers automatically.
