Last verified: 2026-02-13 | Commit scope: bc0fb41

Domain Objects

Node

Location: internal/models/node.go
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| subscription_id | UUID | Owning subscription |
| chain_profile_id | string | Chain identifier (e.g., celestia-mocha) |
| node_type | string | Node type (full, archive) |
| state | NodeState | Infrastructure lifecycle state (owned by heartbeat evaluator) |
| sync_status | SyncStatus | Chain sync status (syncing, synced) |
| application_health | ApplicationHealth | Application health (owned by policy evaluator) |
| provider | string | Cloud provider (internal) |
| instance_type | string | VM instance type (internal) |
| host_id | string | Provider host identifier (internal) |
| host_ip | string | Host IP address (internal) |
| region | string | Cloud region (internal) |
| last_heartbeat | timestamp | Last agent heartbeat |
| last_migration_triggered_at | timestamp | Migration cooldown tracking |
Field Ownership: All state mutations flow through NodeHealthMachine (internal/health/machine.go), which enforces transitions, uses optimistic locking, and emits events via health_event_outbox.
| Field | Writer | Machine Method | Purpose |
| --- | --- | --- | --- |
| state | Heartbeat Evaluator | ApplyHeartbeatDecisions() | Infrastructure liveness |
| application_health | Policy Evaluator | UpdateApplicationHealth() | Application health (CEL policies) |
| sync_status | NATS sync events | UpdateSyncStatus() | Chain sync progress |
State values:
  • state: provisioning, syncing, healthy, degraded, down, maintenance, terminating, terminated, failed
  • application_health: unknown, ok, degraded, critical
Display state: DisplayState() computes a combined state for API responses: infrastructure states take precedence, then application_health, then sync_status.
Invariants:
  • Node belongs to exactly one subscription
  • Infrastructure fields (provider, host_id, host_ip) are internal-only
  • Terminal states (terminated, failed) have no outgoing transitions
  • Terminal transitions auto-reset sync_status and application_health to unknown
  • Migration cooldown prevents rapid re-triggering (default: 1h)
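The display-state precedence described above can be sketched roughly as follows. This is an illustration only: the source states the ordering (infrastructure state, then application_health, then sync_status) but not the exact rules, so the conditions below (any non-healthy infrastructure state wins, then degraded or critical application health, then syncing) are assumptions, as is the simplified Node struct.

```go
package main

import "fmt"

// Node is a simplified stand-in for the real model; only the three
// state fields relevant to display-state computation are included.
type Node struct {
	State             string
	ApplicationHealth string
	SyncStatus        string
}

// DisplayState applies an assumed precedence: infrastructure state first,
// then application health, then sync status. The real rules may differ.
func (n Node) DisplayState() string {
	// Any infrastructure state other than healthy is shown as-is.
	if n.State != "healthy" {
		return n.State
	}
	// Infrastructure is healthy: surface application problems next.
	switch n.ApplicationHealth {
	case "degraded", "critical":
		return n.ApplicationHealth
	}
	// Finally, report an in-progress chain sync.
	if n.SyncStatus == "syncing" {
		return "syncing"
	}
	return "healthy"
}

func main() {
	n := Node{State: "healthy", ApplicationHealth: "critical", SyncStatus: "synced"}
	fmt.Println(n.DisplayState()) // critical
}
```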

Node State Machine

Transition table (defined in models.validTransitions):
| From | Valid targets |
| --- | --- |
| provisioning | syncing, terminating, failed |
| syncing | healthy, degraded, down, terminating |
| healthy | degraded, down, maintenance, terminating |
| degraded | healthy, down, maintenance, terminating |
| down | syncing, healthy, maintenance, terminating |
| maintenance | syncing, healthy, terminating |
| terminating | terminated, failed |
| terminated, failed | none (terminal) |
API: CanTransitionTo(target), IsTerminal(), ValidSourceStates(target). Force updates via ForceUpdateState() bypass validation for compensation.
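As a minimal sketch, the transition table and the three query helpers could look like the following. The map contents are copied from the table above; the method receivers, the constant names, and the sorted return order of ValidSourceStates are assumptions, and the real definitions in the models package may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeState mirrors the documented state values.
type NodeState string

const (
	Provisioning NodeState = "provisioning"
	Syncing      NodeState = "syncing"
	Healthy      NodeState = "healthy"
	Degraded     NodeState = "degraded"
	Down         NodeState = "down"
	Maintenance  NodeState = "maintenance"
	Terminating  NodeState = "terminating"
	Terminated   NodeState = "terminated"
	Failed       NodeState = "failed"
)

// validTransitions reproduces the documented table.
var validTransitions = map[NodeState][]NodeState{
	Provisioning: {Syncing, Terminating, Failed},
	Syncing:      {Healthy, Degraded, Down, Terminating},
	Healthy:      {Degraded, Down, Maintenance, Terminating},
	Degraded:     {Healthy, Down, Maintenance, Terminating},
	Down:         {Syncing, Healthy, Maintenance, Terminating},
	Maintenance:  {Syncing, Healthy, Terminating},
	Terminating:  {Terminated, Failed},
	Terminated:   {},
	Failed:       {},
}

// CanTransitionTo reports whether target is listed for the current state.
func (s NodeState) CanTransitionTo(target NodeState) bool {
	for _, t := range validTransitions[s] {
		if t == target {
			return true
		}
	}
	return false
}

// IsTerminal reports whether a known state has no outgoing transitions.
func (s NodeState) IsTerminal() bool {
	targets, known := validTransitions[s]
	return known && len(targets) == 0
}

// ValidSourceStates lists every state that may move to target,
// sorted alphabetically so the result is deterministic.
func ValidSourceStates(target NodeState) []NodeState {
	var sources []NodeState
	for from := range validTransitions {
		if from.CanTransitionTo(target) {
			sources = append(sources, from)
		}
	}
	sort.Slice(sources, func(i, j int) bool { return sources[i] < sources[j] })
	return sources
}

func main() {
	fmt.Println(Healthy.CanTransitionTo(Maintenance)) // true
	fmt.Println(Terminated.IsTerminal())              // true
	fmt.Println(ValidSourceStates(Failed))            // [provisioning terminating]
}
```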
See also: Health and Incidents for how transitions are triggered.

Subscription

Location: internal/models/subscription.go
| Field | Type | Source | Description |
| --- | --- | --- | --- |
| id | UUID | System | Unique identifier |
| user_id | string | Auth context | Owner |
| chain_profile_id | string | User input | Chain profile |
| node_type | string | User input | Node type |
| provider | string | System-resolved | Cloud provider (from duration mapping) |
| region | string | System-resolved | Deployment region |
| instance_type | string | System-resolved | Compute instance type |
| duration | string | User input | 1w, 2w, 1m, 3m, 6m |
| status | SubscriptionStatus | System | Lifecycle state |
| expires_at | timestamp | System | Subscription period end |
| grace_period_expires_at | timestamp (nullable) | System | Renewal deadline after expiration |
| payment_id | string (nullable) | System | Payment service reference |
User-provided: chain_profile_id, node_type, duration. System-resolved: provider, region, instance_type (from chain profile + duration mapping). API responses hide system-resolved fields.

Subscription State Machine

| Status | Meaning |
| --- | --- |
| pending_payment | Created but unpaid. No nodes provisioned. PaymentID is null. |
| active | Paid and running. Nodes provisioned and serving. |
| expiring | Past expires_at, in 24h grace period. Nodes still running. |
| terminated | Grace period ended. All nodes terminated, backups deleted. |
Invariants:
  • Infrastructure fields resolved at creation time (not user-provided)
  • pending_payment subscriptions deleted after TTL (default 30m)
  • Expiration triggers 24h grace period before termination
  • Abandoned pending_payment subscriptions have payment_id = null
See also: Workflows — Subscription Lifecycle for execution flow.

Incident

Location: internal/models/incident.go
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| node_id | UUID | Associated node |
| subscription_id | UUID | Associated subscription |
| user_id | UUID (nullable) | Resolved owner (notification routing) |
| chain_profile_id | string | Chain identifier |
| category | IncidentCategory | Incident type |
| severity | IncidentSeverity | Current severity |
| status | IncidentStatus | Lifecycle status |
| title | string | Human-readable title |
| description | string | Detailed description |
| metadata | map | Trigger-specific context |
| occurrence_count | int | Dedup counter |
| first_seen_at | timestamp | First occurrence |
| last_seen_at | timestamp | Most recent occurrence |
| acknowledged_at | timestamp (nullable) | User acknowledgment |
| resolved_at | timestamp (nullable) | Resolution time |
| is_flapping | bool | Flapping flag (persisted, survives restarts) |
| resolution_debounce | int | Per-incident debounce counter for auto-resolution |
Categories:
| Category | Trigger | Initial Severity |
| --- | --- | --- |
| node_down | State transition to down | critical |
| app_critical | application_health -> critical | critical |
| app_degraded | application_health -> degraded | warning |
| sync_stalled | sync_status -> stalled (CEL pipeline) | warning |
| provision_failed | State -> failed from provisioning | critical |
| migration_failed | State -> failed from maintenance | critical |
[Diagram: Incident Status Lifecycle]
Dedup: Partial unique index on (node_id, category) WHERE status NOT IN ('resolved', 'auto_resolved'). Upsert increments occurrence_count, updates last_seen_at.
No foreign keys: Incidents are historical records that survive node/subscription deletion.
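The dedup behavior can be illustrated with an in-memory stand-in for the partial unique index. This is a sketch, not the repository code: the Store type, the method names, and the "open" status are hypothetical (the source names only the resolved and auto_resolved statuses), and in the real system the uniqueness is enforced by PostgreSQL rather than a Go map.

```go
package main

import (
	"fmt"
	"time"
)

// Incident holds only the fields involved in deduplication.
type Incident struct {
	NodeID          string
	Category        string
	Status          string // "open" is a placeholder for any unresolved status
	OccurrenceCount int
	FirstSeenAt     time.Time
	LastSeenAt      time.Time
}

type dedupKey struct{ nodeID, category string }

// Store emulates the partial unique index: at most one unresolved
// incident exists per (node_id, category) pair.
type Store struct {
	active map[dedupKey]*Incident
}

func NewStore() *Store { return &Store{active: make(map[dedupKey]*Incident)} }

// Upsert creates a new incident, or bumps the counter and last_seen_at
// of the existing unresolved one.
func (s *Store) Upsert(nodeID, category string, now time.Time) *Incident {
	k := dedupKey{nodeID, category}
	if inc, ok := s.active[k]; ok {
		inc.OccurrenceCount++
		inc.LastSeenAt = now
		return inc
	}
	inc := &Incident{
		NodeID: nodeID, Category: category, Status: "open",
		OccurrenceCount: 1, FirstSeenAt: now, LastSeenAt: now,
	}
	s.active[k] = inc
	return inc
}

// Resolve sets a resolved status and drops the incident from the dedup
// set, so a later trigger opens a fresh incident.
func (s *Store) Resolve(nodeID, category, status string) bool {
	k := dedupKey{nodeID, category}
	inc, ok := s.active[k]
	if !ok {
		return false
	}
	inc.Status = status
	delete(s.active, k)
	return true
}

func main() {
	s := NewStore()
	now := time.Now()
	s.Upsert("node-1", "node_down", now)
	inc := s.Upsert("node-1", "node_down", now.Add(time.Minute))
	fmt.Println(inc.OccurrenceCount) // 2
	s.Resolve("node-1", "node_down", "auto_resolved")
	fmt.Println(s.Upsert("node-1", "node_down", now.Add(time.Hour)).OccurrenceCount) // 1
}
```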
See also: Health and Incidents for the full incident pipeline.

User

Location: internal/models/user.go
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| wallet_address | *string (nullable) | Blockchain wallet address (checksummed for Ethereum) |
| email | *string (nullable) | Email address |
| wallet_public_key | []byte | Public key bytes from signature |
| wallet_type | *WalletType (nullable) | ethereum, solana |
| external_auth_id | *string (nullable) | Clerk user_xxxx ID |
Identity combinations:
| Scenario | wallet_address | wallet_type | external_auth_id |
| --- | --- | --- | --- |
| Clerk user (no wallet) | nil | nil | set |
| Clerk user + Ethereum wallet | set | ethereum | set |
| Clerk user + Solana wallet | set | solana | set |
Invariants:
  • Wallet address stored checksummed (EIP-55), lookups case-insensitive (lowercase index)
  • Email optional and unique when provided
  • external_auth_id unique when not null
  • Created via Clerk webhook

API Key

Location: internal/models/apikey.go, internal/models/apikey_scope.go
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| user_id | UUID | Owning user |
| name | string | Human-readable name (max 100 chars) |
| key_hash | string | SHA-256 hash (plaintext never stored) |
| scopes | string[] | Granted permissions |
| expires_at | timestamp (nullable) | Expiration |
| revoked_at | timestamp (nullable) | Revocation |
| last_used_at | timestamp (nullable) | Last validation |
| deleted_at | timestamp (nullable) | Soft-delete |
Scopes: nodes:read, nodes:write, subscriptions:read, subscriptions:write, chains:read, keys:export, payments:read, payments:write, api-keys:manage, * (wildcard)
Business rules:
  • At least one scope required
  • Max 50 active keys per user
  • Min 1h expiration
  • Soft-deleted for audit trail
  • Rotation is atomic (new key + old revoked in one transaction)
  • Plaintext returned only on creation/rotation
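A short sketch of how key hashing and scope checks might fit together. The function names are hypothetical, hex encoding of the SHA-256 digest is an assumption (the source only says the hash is SHA-256), and the wildcard is treated as granting every scope.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HashKey returns the hex-encoded SHA-256 digest of a plaintext API key.
// Only this value would be stored; the plaintext is shown once to the user.
func HashKey(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// HasScope reports whether the granted scopes cover the required one,
// either by exact match or through the "*" wildcard.
func HasScope(granted []string, required string) bool {
	for _, s := range granted {
		if s == "*" || s == required {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(len(HashKey("example-plaintext-key")))              // 64
	fmt.Println(HasScope([]string{"nodes:read"}, "nodes:write"))    // false
	fmt.Println(HasScope([]string{"*"}, "subscriptions:write"))     // true
}
```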

Node Keys

Location: migrations/001_initial_schema.sql:88
Encryption hierarchy:
Master Key (Vault) --encrypts--> DEK (32-byte AES, stored encrypted) --encrypts--> Chain Key (mnemonic, stored encrypted)
User-provided secrets flow:
Client fetches X25519 public key (GET /api/v1/crypto/public-key)
 -> Client encrypts with NaCl sealed box (tweetnacl)
 -> Sealed ciphertext stored in provisioning_input_secrets
 -> PrepareUserProvidedKeys activity: decrypt with server X25519 private key (Vault)
 -> Re-encrypt with node DEK (AES-256-GCM) -> stored in node_keys
Invariants: Master key never leaves Vault. DEK in plaintext only in agent memory during operation. All keys deleted on subscription expiration.
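The two-level hierarchy is a form of envelope encryption, which can be sketched with the Go standard library. Several things here are assumptions for illustration: the master key is a local byte slice standing in for a Vault-held key (in the real system it never leaves Vault), AES-256-GCM is used for both layers although the source only names it for the DEK layer, and the helper names sealGCM and openGCM are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealGCM encrypts plaintext with AES-GCM and prepends the random nonce.
func sealGCM(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openGCM splits off the nonce and authenticates and decrypts the rest.
func openGCM(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	masterKey := make([]byte, 32) // stand-in for the Vault-held master key
	dek := make([]byte, 32)       // 32-byte data encryption key
	if _, err := rand.Read(masterKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(dek); err != nil {
		panic(err)
	}

	encryptedDEK, err := sealGCM(masterKey, dek) // stored as encrypted_dek
	if err != nil {
		panic(err)
	}
	encryptedChainKey, err := sealGCM(dek, []byte("example mnemonic")) // stored as encrypted_chain_key
	if err != nil {
		panic(err)
	}

	// Recovery walks the hierarchy in the same order: master key, DEK, chain key.
	recoveredDEK, err := openGCM(masterKey, encryptedDEK)
	if err != nil {
		panic(err)
	}
	chainKey, err := openGCM(recoveredDEK, encryptedChainKey)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(chainKey)) // example mnemonic
}
```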

Auth Nonce

Location: internal/models/auth.go
Single-use challenge for wallet verification. 64-char random, 5-min TTL.
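One way to produce a nonce with these properties is sketched below. The AuthNonce struct and its methods are hypothetical, and hex encoding of 32 random bytes is only one possible way to arrive at 64 characters; the source does not specify the encoding.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

const nonceTTL = 5 * time.Minute

// AuthNonce is a simplified single-use challenge.
type AuthNonce struct {
	Value     string
	ExpiresAt time.Time
	Used      bool
}

// NewAuthNonce draws 32 random bytes and hex-encodes them (64 characters).
func NewAuthNonce(now time.Time) (*AuthNonce, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return nil, err
	}
	return &AuthNonce{Value: hex.EncodeToString(buf), ExpiresAt: now.Add(nonceTTL)}, nil
}

// Consume succeeds at most once, and only before the nonce expires.
func (n *AuthNonce) Consume(now time.Time) bool {
	if n.Used || !now.Before(n.ExpiresAt) {
		return false
	}
	n.Used = true
	return true
}

func main() {
	now := time.Now()
	n, err := NewAuthNonce(now)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(n.Value))   // 64
	fmt.Println(n.Consume(now)) // true
	fmt.Println(n.Consume(now)) // false: already used
}
```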

Refresh Token

Location: internal/models/auth.go
SHA-256 hashed, 7-day TTL. Atomic rotation on refresh (old revoked, new created in one transaction). Multiple active tokens per user (different devices).

Rollout Group

Location: internal/models/rollout.go
Coordinates multi-binary upgrades (e.g., ethereum-holesky: geth + lighthouse). Executes component rollouts in declared order.
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| chain_profile_id | string | Target chain |
| status | RolloutGroupStatus | Lifecycle state |
| failure_policy | GroupFailurePolicy | partial_ok, rollback_all, manual (REQUIRED, no default) |
| component_order | JSONB | Ordered array of {binary_name, version, url, checksum, upgrade_id (optional for auto-population)} |
| desired_versions | JSONB | Target state: {binary_name: version} |
| strategy | string | rolling, canary, all_at_once |
| batch_size | int | Nodes per batch |
| canary_size | int | Canary batch size |
| failure_threshold | float | Auto-pause threshold (0.0–1.0) |
| health_wait_duration | interval | Health gate duration between components |
| created_by | string | Operator identifier |
Group status values: pending, running, paused, completed, partial, failed, cancelled, rolled_back
Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'running', 'paused'), which allows one active group per chain. Standalone single-component rollouts do not require a group.

Rollout

Location: internal/models/rollout.go
Single-component upgrade rollout. Tracks progress, strategy, and per-node status.
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| group_id | UUID (nullable) | Parent rollout group (null for standalone) |
| component_index | int | Ordering within group |
| chain_profile_id | string | Target chain |
| binary_name | string | Binary to upgrade |
| source_version | string | Current version |
| target_version | string | Target version |
| target_binary_url | string | Fully resolved artifact URL |
| target_binary_checksum | string | SHA256 checksum (REQUIRED) |
| strategy | RolloutStrategy | rolling, canary, all_at_once |
| status | RolloutStatus | Lifecycle state |
| batch_size | int | Nodes per batch |
| failure_threshold | float | Auto-pause threshold |
| health_wait_duration | interval | Post-batch health gate |
| total_nodes | int | Denormalized progress counter |
| succeeded_nodes | int | Denormalized progress counter |
| failed_nodes | int | Denormalized progress counter |
| manifest_content_hash | string | SHA256 of manifest at creation time |
Rollout status values: pending, scheduled, running, paused, completed, failed, cancelled, rolled_back
Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'scheduled', 'running', 'paused').
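To make batch_size and failure_threshold concrete, here is a rough sketch of batch assignment and an auto-pause check. Both functions are hypothetical, and the pause rule (pause when failed_nodes / total_nodes strictly exceeds the threshold) is an assumption; the source does not say which ratio or comparison the rollout workflow uses.

```go
package main

import "fmt"

// Batches splits node IDs into consecutive groups of at most batchSize,
// corresponding to batch_index 0, 1, 2, ...
func Batches(nodeIDs []string, batchSize int) [][]string {
	if batchSize <= 0 {
		batchSize = 1 // defensive default for the sketch
	}
	var out [][]string
	for start := 0; start < len(nodeIDs); start += batchSize {
		end := start + batchSize
		if end > len(nodeIDs) {
			end = len(nodeIDs)
		}
		out = append(out, nodeIDs[start:end])
	}
	return out
}

// ShouldPause applies the assumed rule: pause once the failed fraction
// of all nodes is strictly greater than the threshold.
func ShouldPause(failedNodes, totalNodes int, failureThreshold float64) bool {
	if totalNodes == 0 {
		return false
	}
	return float64(failedNodes)/float64(totalNodes) > failureThreshold
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	fmt.Println(Batches(nodes, 2))       // [[n1 n2] [n3 n4] [n5]]
	fmt.Println(ShouldPause(1, 5, 0.25)) // false
	fmt.Println(ShouldPause(2, 5, 0.25)) // true
}
```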

Rollout Status State Machine

Rollout Node

Location: internal/models/rollout.go
Per-node upgrade tracking within a rollout.
| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier |
| rollout_id | UUID | Parent rollout |
| node_id | UUID | Target node |
| batch_index | int | Assigned batch |
| status | NodeUpgradeStatus | Lifecycle state |
| upgrade_phase | UpgradePhase | Current execution phase (display state) |
| previous_version | string | Version before upgrade |
| previous_binary_url | string | URL before upgrade (for rollback) |
| previous_checksum | string | Checksum before upgrade |
| error_message | string | Failure details |
| attempt_count | int | Retry counter |
Node upgrade status values: pending, in_progress, validating, succeeded, failed, rolled_back, skipped

UpgradePhase Enum

Display states derived from executor progress through the action sequence:
| Phase | Meaning |
| --- | --- |
| preparing | Backup + artifact acquisition + verification |
| stopped | Service stopped |
| mutating | Artifact/config replacement in progress |
| starting | Service start issued |
| verifying | Health validation in progress |
| completed | Health gate passed |
| rollbacking | Compensation in progress |
| rolled_back | All compensations succeeded |
| failed_rollback | Compensation failed; manual intervention needed |

NodeConfigState

Location: internal/models/node_config_state.go
Tracks the binary and config version installed on each node. The pre-existing node_config_state table now has a Go model and repository.
| Field | Type | Description |
| --- | --- | --- |
| node_id | UUID | Node reference |
| binary_version | string (nullable) | Currently installed binary version |
| config_version | string (nullable) | Currently applied config version |
Written by UpdateNodeBinaryVersion activity after successful upgrade, and by provisioning workflow after initial CONFIGURE.
See also: Workflows — Upgrade Rollout Workflows for the full workflow hierarchy.

Business Rules

Key Management

  1. Keys exist in plaintext only temporarily for node operation
  2. Encrypted recovery backups only within subscription TTL
  3. No human-accessible key material (no SSH, keys only via encrypted gRPC)
  4. Recovery without backup is impossible by design
  5. User-provided secrets use client-side NaCl sealed box encryption

Migration

  1. Gradual detection: Consecutive failures tracked per node (default: 3 required for DOWN)
  2. Cooldown: 1h between migration attempts (persisted in DB)
  3. Auto-recovery: Heartbeat during DOWN grace period -> back to HEALTHY, counter reset
  4. Steps: Verify backup -> provision new infra -> restore keys -> start node -> destroy old
  5. Non-blocking: No user approval required
| Parameter | Default | Purpose |
| --- | --- | --- |
| HeartbeatTimeout | 60s | Time without heartbeat to count as failure |
| ConsecutiveFailuresForDown | 3 | Failures before DOWN state |
| MigrationGracePeriod | 60s | Wait after DOWN before migration |
| MigrationCooldown | 1h | Min time between migrations |
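The interplay of these four parameters can be sketched as a per-tick evaluation function. This is a simplified model under stated assumptions: the HealthState struct, the Decision values, and the Evaluate function are hypothetical, one call is assumed per evaluation tick, and the real heartbeat evaluator applies its decisions through NodeHealthMachine rather than mutating a struct directly.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults taken from the parameter table.
const (
	HeartbeatTimeout           = 60 * time.Second
	ConsecutiveFailuresForDown = 3
	MigrationGracePeriod       = 60 * time.Second
	MigrationCooldown          = time.Hour
)

// HealthState is a trimmed-down per-node tracking record.
type HealthState struct {
	ConsecutiveFailures      int
	DownSince                time.Time // zero while the node is not DOWN
	LastMigrationTriggeredAt time.Time // zero if never migrated
}

type Decision string

const (
	NoChange Decision = "no_change"
	MarkDown Decision = "mark_down"
	Recover  Decision = "recover"
	Migrate  Decision = "migrate"
)

// Evaluate processes one tick for one node and updates hs in place.
func Evaluate(hs *HealthState, lastHeartbeat, now time.Time) Decision {
	if now.Sub(lastHeartbeat) <= HeartbeatTimeout {
		// Fresh heartbeat: reset the counter and leave DOWN if needed.
		wasDown := !hs.DownSince.IsZero()
		hs.ConsecutiveFailures = 0
		hs.DownSince = time.Time{}
		if wasDown {
			return Recover
		}
		return NoChange
	}

	hs.ConsecutiveFailures++
	if hs.DownSince.IsZero() {
		// Gradual detection: only N consecutive misses mark the node DOWN.
		if hs.ConsecutiveFailures >= ConsecutiveFailuresForDown {
			hs.DownSince = now
			return MarkDown
		}
		return NoChange
	}

	// Already DOWN: migrate once the grace period and cooldown both allow it.
	graceOver := now.Sub(hs.DownSince) >= MigrationGracePeriod
	cooledDown := hs.LastMigrationTriggeredAt.IsZero() ||
		now.Sub(hs.LastMigrationTriggeredAt) >= MigrationCooldown
	if graceOver && cooledDown {
		hs.LastMigrationTriggeredAt = now
		return Migrate
	}
	return NoChange
}

func main() {
	hs := &HealthState{}
	last := time.Now()
	for tick := 2; tick <= 6; tick++ {
		now := last.Add(time.Duration(tick) * HeartbeatTimeout)
		fmt.Println(tick, Evaluate(hs, last, now))
	}
}
```

With the defaults above, the loop in main reports no change for the first two missed ticks, then mark_down, then migrate one grace period later, and then no change while the cooldown is in effect.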

Subscription Lifecycle

  1. Two-phase creation: POST /subscriptions creates with pending_payment. Idempotent — returns existing if same user/chain/nodeType/duration.
  2. Pending cleanup: Abandoned pending_payment deleted after TTL (default 30m)
  3. Expiration: active -> expiring (24h grace period). Nodes continue running.
  4. Termination: Grace period expired -> terminate all nodes, delete backups -> terminated
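The time-driven part of the lifecycle above can be sketched as a single step function. The Subscription struct and Advance function are hypothetical, and measuring the 24h grace period from expires_at (rather than from the moment the transition is processed) is an assumption. Payment confirmation, which moves pending_payment to active, is out of scope here.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults taken from the rules above.
const (
	PendingPaymentTTL = 30 * time.Minute
	GracePeriod       = 24 * time.Hour
)

// Subscription carries only the fields that drive time-based transitions.
type Subscription struct {
	Status               string
	CreatedAt            time.Time
	ExpiresAt            time.Time
	GracePeriodExpiresAt *time.Time
}

// Advance applies at most one time-based step. It returns true when an
// abandoned pending_payment subscription should be deleted.
func Advance(s *Subscription, now time.Time) (deleted bool) {
	switch s.Status {
	case "pending_payment":
		return now.Sub(s.CreatedAt) >= PendingPaymentTTL
	case "active":
		if !now.Before(s.ExpiresAt) {
			deadline := s.ExpiresAt.Add(GracePeriod)
			s.GracePeriodExpiresAt = &deadline
			s.Status = "expiring" // nodes keep running during the grace period
		}
	case "expiring":
		if s.GracePeriodExpiresAt != nil && !now.Before(*s.GracePeriodExpiresAt) {
			s.Status = "terminated" // nodes terminated, backups deleted
		}
	}
	return false
}

func main() {
	start := time.Now()
	s := &Subscription{Status: "active", CreatedAt: start, ExpiresAt: start.Add(7 * 24 * time.Hour)}
	Advance(s, s.ExpiresAt)
	fmt.Println(s.Status) // expiring
	Advance(s, s.ExpiresAt.Add(GracePeriod))
	fmt.Println(s.Status) // terminated
}
```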

Database Schema Summary

Main App Database

| Table | Purpose |
| --- | --- |
| users | User identity (wallet, email, external_auth_id) |
| refresh_tokens | JWT refresh tokens (hashed) |
| api_keys | API authentication (scoped, soft-delete) |
| subscriptions | Subscription lifecycle (status, duration, payment_id, grace_period) |
| nodes | Node lifecycle (state, sync_status, application_health) |
| node_health_state | Per-node health tracking (consecutive_failures, down_since, version) |
| health_event_outbox | Health events pending dispatch |
| incidents | Health incidents (no foreign keys, survives deletion) |
| notification_outbox | Failed notifications queued for retry |
| agent_registrations | Agent connectivity (last_seen_at, running_command_ids) |
| node_keys | Encrypted key material (encrypted_dek, encrypted_chain_key) |
| provisioning_inputs | Provisioning input values (text/proof, subscription-scoped) |
| provisioning_input_secrets | User-provided secrets (sealed box ciphertext) |
| node_tokens | Internal authentication |
| node_config_state | Per-node version tracking (binary_version, config_version) |
| rollout_groups | Multi-component rollout coordination |
| rollouts | Single-component upgrade rollouts |
| rollout_nodes | Per-node upgrade tracking within a rollout |

Payment Service Database

Separate PostgreSQL instance. See Payment Service for schema.
| Data | Payment Service | Main App |
| --- | --- | --- |
| User identity | Mirror (customer_id) | Primary (users) |
| Payment records | Primary | Reference only |
| Subscriptions | Reference only | Primary |

Key Indexes

| Index | Table | Purpose |
| --- | --- | --- |
| wallet_address (LOWER, unique) | users | Case-insensitive lookup |
| email (unique where not null) | users | Email uniqueness |
| external_auth_id (partial unique) | users | External auth mapping |
| token_hash (unique) | refresh_tokens | Token validation |
| state | nodes | Health evaluation queries |
| subscription_id | nodes | Subscription cleanup |
| last_seen_at | agent_registrations | Heartbeat timeout |
| expires_at | subscriptions | Expiration cleanup |
| (node_id, category) WHERE status NOT IN (resolved, auto_resolved) | incidents | Dedup upsert |
| payment_id WHERE status = 'pending_payment' | subscriptions | Payment lookup |
| created_at WHERE status = 'pending_payment' | subscriptions | TTL cleanup |
| (chain_profile_id) WHERE status IN active | rollout_groups | One active group per chain |
| (chain_profile_id) WHERE status IN active | rollouts | One active rollout per chain |
| (rollout_id, node_id) UNIQUE | rollout_nodes | Dedup per rollout-node pair |
| (rollout_id, status) | rollout_nodes | Batch status queries |

State Storage Locations

| State | Storage | Access Pattern |
| --- | --- | --- |
| Infrastructure state | PostgreSQL (nodes.state) | Written via NodeHealthMachine by heartbeat evaluator |
| Application health | PostgreSQL (nodes.application_health) | Written via NodeHealthMachine by policy evaluator |
| Health tracking | PostgreSQL (node_health_state) | Per-node: consecutive_failures, down_since, version |
| Health events | PostgreSQL (health_event_outbox) | Emitted by health machine, consumed by outbox worker |
| Incidents | PostgreSQL (incidents) | No foreign keys (survives deletion) |
| Notification retry | PostgreSQL (notification_outbox) | Exponential backoff retry |
| Encrypted keys | PostgreSQL (node_keys) | Written once, read for injection/migration |
| Key backups | S3 | Created at provisioning, deleted at termination |
| Terraform state | Files (/app/terraform-state/{node_id}/) | Per-node, managed by Terraform |
| Pending commands | Redis queue | Pushed by orchestrator, consumed by agent |
| Command progress | Redis (progress:{commandID}) | Written by gateway, read by orchestrator |
| In-flight workflows | Temporal | Managed by Temporal server |
| Metrics | Victoria Metrics TSDB | 30-day retention, queried by policy evaluator |
| Rollout state | PostgreSQL (rollouts, rollout_nodes) | Written by workflow activities |
| Version tracking | PostgreSQL (node_config_state) | Updated after successful upgrade/provision |
| Upgrade backup | Node filesystem ({data_dir}/.upgrade-backup/{upgrade_id}/) | Per-upgrade, retained 24h |
| Metrics transport | NATS JetStream | 1-hour retention |