Last verified: 2026-02-13 | Commit scope: bc0fb41
Domain Objects
Node
Location: internal/models/node.go

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| subscription_id | UUID | Owning subscription |
| chain_profile_id | string | Chain identifier (e.g., celestia-mocha) |
| node_type | string | Node type (full, archive) |
| state | NodeState | Infrastructure lifecycle state (owned by heartbeat evaluator) |
| sync_status | SyncStatus | Chain sync status (syncing, synced) |
| application_health | ApplicationHealth | Application health (owned by policy evaluator) |
| provider | string | Cloud provider (internal) |
| instance_type | string | VM instance type (internal) |
| host_id | string | Provider host identifier (internal) |
| host_ip | string | Host IP address (internal) |
| region | string | Cloud region (internal) |
| last_heartbeat | timestamp | Last agent heartbeat |
| last_migration_triggered_at | timestamp | Migration cooldown tracking |

Each of the three status fields has a single writer, and every write goes through NodeHealthMachine (internal/health/machine.go), which enforces transitions, uses optimistic locking, and emits events via health_event_outbox.
| Field | Writer | Machine Method | Purpose |
|---|---|---|---|
| state | Heartbeat Evaluator | ApplyHeartbeatDecisions() | Infrastructure liveness |
| application_health | Policy Evaluator | UpdateApplicationHealth() | Application health (CEL policies) |
| sync_status | NATS sync events | UpdateSyncStatus() | Chain sync progress |

- state: provisioning, syncing, healthy, degraded, down, maintenance, terminating, terminated, failed
- application_health: unknown, ok, degraded, critical

DisplayState() computes a combined state for API responses — infrastructure states take precedence, then application_health, then sync_status.
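
The precedence rule can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which infrastructure states win, so the sketch assumes that any state other than healthy is shown as-is, that only degraded and critical application health are surfaced, and that syncing is the only sync status worth showing. The Node struct here is a hypothetical stand-in for the real model.

```go
package main

import "fmt"

// Node is a reduced, hypothetical stand-in for the model in internal/models/node.go.
type Node struct {
	State             string
	ApplicationHealth string
	SyncStatus        string
}

// DisplayState collapses the three status fields into one value.
// Assumed precedence: infrastructure state, then application_health, then sync_status.
func (n Node) DisplayState() string {
	// 1. Assumption: any infrastructure state other than "healthy" is reported directly.
	if n.State != "healthy" {
		return n.State
	}
	// 2. Infrastructure is healthy, so surface application-level problems.
	switch n.ApplicationHealth {
	case "degraded", "critical":
		return n.ApplicationHealth
	}
	// 3. Nothing wrong at the application level, so report sync progress.
	if n.SyncStatus == "syncing" {
		return "syncing"
	}
	return "healthy"
}

func main() {
	n := Node{State: "healthy", ApplicationHealth: "degraded", SyncStatus: "syncing"}
	fmt.Println(n.DisplayState())
}
```
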
Invariants:
- Node belongs to exactly one subscription
- Infrastructure fields (provider, host_id, host_ip) are internal-only
- Terminal states (terminated, failed) have no outgoing transitions
- Terminal transitions auto-reset sync_status and application_health to unknown
- Migration cooldown prevents rapid re-triggering (default: 1h)

Node State Machine
Transition table (defined in models.validTransitions):

| From | Valid targets |
|---|---|
| provisioning | syncing, terminating, failed |
| syncing | healthy, degraded, down, terminating |
| healthy | degraded, down, maintenance, terminating |
| degraded | healthy, down, maintenance, terminating |
| down | syncing, healthy, maintenance, terminating |
| maintenance | syncing, healthy, terminating |
| terminating | terminated, failed |
| terminated, failed | none (terminal) |

Helpers: CanTransitionTo(target), IsTerminal(), ValidSourceStates(target). Force updates via ForceUpdateState() bypass validation and are used for compensation.
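
A minimal sketch of how the transition table and its helpers might be expressed in Go. The map contents come from the table above; the constant names, receiver types, and return types are assumptions, since the source gives only the method names.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeState mirrors the infrastructure lifecycle states listed in this document.
type NodeState string

const (
	Provisioning NodeState = "provisioning"
	Syncing      NodeState = "syncing"
	Healthy      NodeState = "healthy"
	Degraded     NodeState = "degraded"
	Down         NodeState = "down"
	Maintenance  NodeState = "maintenance"
	Terminating  NodeState = "terminating"
	Terminated   NodeState = "terminated"
	Failed       NodeState = "failed"
)

// validTransitions reproduces the transition table row by row.
var validTransitions = map[NodeState][]NodeState{
	Provisioning: {Syncing, Terminating, Failed},
	Syncing:      {Healthy, Degraded, Down, Terminating},
	Healthy:      {Degraded, Down, Maintenance, Terminating},
	Degraded:     {Healthy, Down, Maintenance, Terminating},
	Down:         {Syncing, Healthy, Maintenance, Terminating},
	Maintenance:  {Syncing, Healthy, Terminating},
	Terminating:  {Terminated, Failed},
	Terminated:   {},
	Failed:       {},
}

// CanTransitionTo reports whether target is listed as a valid target for s.
func (s NodeState) CanTransitionTo(target NodeState) bool {
	for _, t := range validTransitions[s] {
		if t == target {
			return true
		}
	}
	return false
}

// IsTerminal reports whether s is a known state with no outgoing transitions.
func (s NodeState) IsTerminal() bool {
	targets, known := validTransitions[s]
	return known && len(targets) == 0
}

// ValidSourceStates returns, in sorted order, every state that may move to target.
func ValidSourceStates(target NodeState) []NodeState {
	var sources []NodeState
	for from := range validTransitions {
		if from.CanTransitionTo(target) {
			sources = append(sources, from)
		}
	}
	sort.Slice(sources, func(i, j int) bool { return sources[i] < sources[j] })
	return sources
}

func main() {
	fmt.Println(Healthy.CanTransitionTo(Down))
	fmt.Println(Terminated.IsTerminal())
	fmt.Println(ValidSourceStates(Maintenance))
}
```

Keeping the table as data keeps the reverse lookup in ValidSourceStates consistent with the forward check without a second hand-maintained table.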
See also: Health and Incidents for how transitions are triggered.
Subscription
Location: internal/models/subscription.go

| Field | Type | Source | Description |
|---|---|---|---|
| id | UUID | System | Unique identifier |
| user_id | string | Auth context | Owner |
| chain_profile_id | string | User input | Chain profile |
| node_type | string | User input | Node type |
| provider | string | System-resolved | Cloud provider (from duration mapping) |
| region | string | System-resolved | Deployment region |
| instance_type | string | System-resolved | Compute instance type |
| duration | string | User input | 1w, 2w, 1m, 3m, 6m |
| status | SubscriptionStatus | System | Lifecycle state |
| expires_at | timestamp | System | Subscription period end |
| grace_period_expires_at | timestamp (nullable) | System | Renewal deadline after expiration |
| payment_id | string (nullable) | System | Payment service reference |

User input: chain_profile_id, node_type, duration. System-resolved: provider, region, instance_type (from chain profile + duration mapping). API responses hide system-resolved fields.
Subscription State Machine
| Status | Meaning |
|---|---|
| pending_payment | Created but unpaid. No nodes provisioned. PaymentID is null. |
| active | Paid and running. Nodes provisioned and serving. |
| expiring | Past expires_at, in 24h grace period. Nodes still running. |
| terminated | Grace period ended. All nodes terminated, backups deleted. |

- Infrastructure fields resolved at creation time (not user-provided)
- pending_payment subscriptions deleted after TTL (default 30m)
- Expiration triggers 24h grace period before termination
- Abandoned pending_payment subscriptions have payment_id = null

See also: Workflows — Subscription Lifecycle for execution flow.
Incident
Location: internal/models/incident.go

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| node_id | UUID | Associated node |
| subscription_id | UUID | Associated subscription |
| user_id | UUID (nullable) | Resolved owner (notification routing) |
| chain_profile_id | string | Chain identifier |
| category | IncidentCategory | Incident type |
| severity | IncidentSeverity | Current severity |
| status | IncidentStatus | Lifecycle status |
| title | string | Human-readable title |
| description | string | Detailed description |
| metadata | map | Trigger-specific context |
| occurrence_count | int | Dedup counter |
| first_seen_at | timestamp | First occurrence |
| last_seen_at | timestamp | Most recent occurrence |
| acknowledged_at | timestamp (nullable) | User acknowledgment |
| resolved_at | timestamp (nullable) | Resolution time |
| is_flapping | bool | Flapping flag (persisted, survives restarts) |
| resolution_debounce | int | Per-incident debounce counter for auto-resolution |

| Category | Trigger | Initial Severity |
|---|---|---|
| node_down | State transition to down | critical |
| app_critical | application_health -> critical | critical |
| app_degraded | application_health -> degraded | warning |
| sync_stalled | sync_status -> stalled (CEL pipeline) | warning |
| provision_failed | State -> failed from provisioning | critical |
| migration_failed | State -> failed from maintenance | critical |

Deduplication key: (node_id, category) WHERE status NOT IN ('resolved', 'auto_resolved'). An upsert on this key increments occurrence_count and updates last_seen_at.
No foreign keys: Incidents are historical records that survive node/subscription deletion.
See also: Health and Incidents for the full incident pipeline.
User
Location: internal/models/user.go

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| wallet_address | *string (nullable) | Blockchain wallet address (checksummed for Ethereum) |
| email | *string (nullable) | Email address |
| wallet_public_key | []byte | Public key bytes from signature |
| wallet_type | *WalletType (nullable) | ethereum, solana |
| external_auth_id | *string (nullable) | Clerk user_xxxx ID |

| Scenario | wallet_address | wallet_type | external_auth_id |
|---|---|---|---|
| Clerk user (no wallet) | nil | nil | set |
| Clerk user + Ethereum wallet | set | ethereum | set |
| Clerk user + Solana wallet | set | solana | set |
external_auth_id is unique when not null. User records are created via Clerk webhook.
API Key
Location: internal/models/apikey.go, internal/models/apikey_scope.go

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| user_id | UUID | Owning user |
| name | string | Human-readable name (max 100 chars) |
| key_hash | string | SHA-256 hash (plaintext never stored) |
| scopes | string[] | Granted permissions |
| expires_at | timestamp (nullable) | Expiration |
| revoked_at | timestamp (nullable) | Revocation |
| last_used_at | timestamp (nullable) | Last validation |
| deleted_at | timestamp (nullable) | Soft-delete |

Scopes: nodes:read, nodes:write, subscriptions:read, subscriptions:write, chains:read, keys:export, payments:read, payments:write, api-keys:manage, * (wildcard)
Business rules: At least one scope required. Max 50 active keys per user. Min 1h expiration. Soft-deleted for audit trail. Rotation is atomic (new key + old revoked in one transaction). Plaintext returned only on creation/rotation.
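
As a rough illustration of two of these rules, the sketch below hashes a key with SHA-256 before storage and checks a required scope against the granted list, treating * as a wildcard. The function names and the hex encoding of the hash are assumptions; the source states only that a SHA-256 hash is stored.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HashKey returns the hex-encoded SHA-256 digest of a plaintext API key.
// Only this digest would be persisted (hex encoding is an assumption).
func HashKey(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// HasScope reports whether the granted scopes cover the required one.
// The "*" scope is treated as matching every required scope.
func HasScope(granted []string, required string) bool {
	for _, s := range granted {
		if s == "*" || s == required {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(len(HashKey("example-plaintext-key")))
	fmt.Println(HasScope([]string{"nodes:read"}, "nodes:write"))
	fmt.Println(HasScope([]string{"*"}, "nodes:write"))
}
```
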
Node Keys
Location: migrations/001_initial_schema.sql:88
Encryption hierarchy:
Auth Nonce
Location: internal/models/auth.go. Single-use challenge for wallet verification: 64 random characters, 5-minute TTL.
Refresh Token
Location: internal/models/auth.go. SHA-256 hashed, 7-day TTL. Atomic rotation on refresh (old revoked, new created in one transaction). Multiple active tokens per user (different devices).
Rollout Group
Location: internal/models/rollout.go
Coordinates multi-binary upgrades (e.g., ethereum-holesky: geth + lighthouse). Executes component rollouts in declared order.

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| chain_profile_id | string | Target chain |
| status | RolloutGroupStatus | Lifecycle state |
| failure_policy | GroupFailurePolicy | partial_ok, rollback_all, manual (REQUIRED, no default) |
| component_order | JSONB | Ordered array of {binary_name, version, url, checksum, upgrade_id (optional for auto-population)} |
| desired_versions | JSONB | Target state: {binary_name: version} |
| strategy | string | rolling, canary, all_at_once |
| batch_size | int | Nodes per batch |
| canary_size | int | Canary batch size |
| failure_threshold | float | Auto-pause threshold (0.0–1.0) |
| health_wait_duration | interval | Health gate duration between components |
| created_by | string | Operator identifier |

Statuses: pending, running, paused, completed, partial, failed, cancelled, rolled_back
Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'running', 'paused') — one active group per chain.
Standalone single-component rollouts do not require a group.
Rollout
Location: internal/models/rollout.go
Single-component upgrade rollout. Tracks progress, strategy, and per-node status.

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| group_id | UUID (nullable) | Parent rollout group (null for standalone) |
| component_index | int | Ordering within group |
| chain_profile_id | string | Target chain |
| binary_name | string | Binary to upgrade |
| source_version | string | Current version |
| target_version | string | Target version |
| target_binary_url | string | Fully resolved artifact URL |
| target_binary_checksum | string | SHA256 checksum (REQUIRED) |
| strategy | RolloutStrategy | rolling, canary, all_at_once |
| status | RolloutStatus | Lifecycle state |
| batch_size | int | Nodes per batch |
| failure_threshold | float | Auto-pause threshold |
| health_wait_duration | interval | Post-batch health gate |
| total_nodes | int | Denormalized progress counter |
| succeeded_nodes | int | Denormalized progress counter |
| failed_nodes | int | Denormalized progress counter |
| manifest_content_hash | string | SHA256 of manifest at creation time |

Statuses: pending, scheduled, running, paused, completed, failed, cancelled, rolled_back
Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'scheduled', 'running', 'paused').
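
To make batch_size and failure_threshold concrete, here is a hedged sketch of how nodes could be split into batches and how an auto-pause check could use the denormalized counters. The function names are hypothetical, and the comparison (failed_nodes / total_nodes >= failure_threshold, with at least one failure) is an assumption; the source does not define how the threshold is evaluated.

```go
package main

import "fmt"

// SplitBatches partitions node IDs into consecutive batches of at most batchSize.
// A non-positive batchSize is treated as 1 so the loop always terminates.
func SplitBatches(nodeIDs []string, batchSize int) [][]string {
	if batchSize <= 0 {
		batchSize = 1
	}
	var batches [][]string
	for start := 0; start < len(nodeIDs); start += batchSize {
		end := start + batchSize
		if end > len(nodeIDs) {
			end = len(nodeIDs)
		}
		batches = append(batches, nodeIDs[start:end])
	}
	return batches
}

// ShouldPause applies the assumed rule: pause once at least one node has failed
// and the failed fraction of all nodes reaches the threshold.
func ShouldPause(failedNodes, totalNodes int, failureThreshold float64) bool {
	if totalNodes == 0 || failedNodes == 0 {
		return false
	}
	return float64(failedNodes)/float64(totalNodes) >= failureThreshold
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for i, batch := range SplitBatches(nodes, 2) {
		fmt.Println(i, batch)
	}
	fmt.Println(ShouldPause(3, 10, 0.2))
}
```
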
Rollout Status State Machine
Rollout Node
Location: internal/models/rollout.go
Per-node upgrade tracking within a rollout.

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| rollout_id | UUID | Parent rollout |
| node_id | UUID | Target node |
| batch_index | int | Assigned batch |
| status | NodeUpgradeStatus | Lifecycle state |
| upgrade_phase | UpgradePhase | Current execution phase (display state) |
| previous_version | string | Version before upgrade |
| previous_binary_url | string | URL before upgrade (for rollback) |
| previous_checksum | string | Checksum before upgrade |
| error_message | string | Failure details |
| attempt_count | int | Retry counter |

Statuses: pending, in_progress, validating, succeeded, failed, rolled_back, skipped
UpgradePhase Enum
Display states derived from executor progress through the action sequence:

| Phase | Meaning |
|---|---|
| preparing | Backup + artifact acquisition + verification |
| stopped | Service stopped |
| mutating | Artifact/config replacement in progress |
| starting | Service start issued |
| verifying | Health validation in progress |
| completed | Health gate passed |
| rollbacking | Compensation in progress |
| rolled_back | All compensations succeeded |
| failed_rollback | Compensation failed; manual intervention needed |

NodeConfigState
Location: internal/models/node_config_state.go
Tracks binary version per node. The pre-existing node_config_state table now has a Go model and repository.

| Field | Type | Description |
|---|---|---|
| node_id | UUID | Node reference |
| binary_version | string (nullable) | Currently installed binary version |
| config_version | string (nullable) | Currently applied config version |

Updated by the UpdateNodeBinaryVersion activity after a successful upgrade, and by the provisioning workflow after the initial CONFIGURE.
See also: Workflows — Upgrade Rollout Workflows for the full workflow hierarchy.
Business Rules
Key Management
- Keys exist in plaintext only temporarily for node operation
- Encrypted recovery backups only within subscription TTL
- No human-accessible key material (no SSH, keys only via encrypted gRPC)
- Recovery without backup is impossible by design
- User-provided secrets use client-side NaCl sealed box encryption
Migration
- Gradual detection: Consecutive failures tracked per node (default: 3 required for DOWN)
- Cooldown: 1h between migration attempts (persisted in DB)
- Auto-recovery: Heartbeat during DOWN grace period -> back to HEALTHY, counter reset
- Steps: Verify backup -> provision new infra -> restore keys -> start node -> destroy old
- Non-blocking: No user approval required
| Parameter | Default | Purpose |
|---|---|---|
| HeartbeatTimeout | 60s | Time without heartbeat to count as failure |
| ConsecutiveFailuresForDown | 3 | Failures before DOWN state |
| MigrationGracePeriod | 60s | Wait after DOWN before migration |
| MigrationCooldown | 1h | Min time between migrations |
Subscription Lifecycle
- Two-phase creation: POST /subscriptions creates with pending_payment. Idempotent: returns the existing subscription if same user/chain/nodeType/duration.
- Pending cleanup: Abandoned pending_payment deleted after TTL (default 30m)
- Expiration: active -> expiring (24h grace period). Nodes continue running.
- Termination: Grace period expired -> terminate all nodes, delete backups -> terminated

Database Schema Summary
Main App Database
| Table | Purpose |
|---|---|
| users | User identity (wallet, email, external_auth_id) |
| refresh_tokens | JWT refresh tokens (hashed) |
| api_keys | API authentication (scoped, soft-delete) |
| subscriptions | Subscription lifecycle (status, duration, payment_id, grace_period) |
| nodes | Node lifecycle (state, sync_status, application_health) |
| node_health_state | Per-node health tracking (consecutive_failures, down_since, version) |
| health_event_outbox | Health events pending dispatch |
| incidents | Health incidents (no foreign keys, survives deletion) |
| notification_outbox | Failed notifications queued for retry |
| agent_registrations | Agent connectivity (last_seen_at, running_command_ids) |
| node_keys | Encrypted key material (encrypted_dek, encrypted_chain_key) |
| provisioning_inputs | Provisioning input values (text/proof, subscription-scoped) |
| provisioning_input_secrets | User-provided secrets (sealed box ciphertext) |
| node_tokens | Internal authentication |
| node_config_state | Per-node version tracking (binary_version, config_version) |
| rollout_groups | Multi-component rollout coordination |
| rollouts | Single-component upgrade rollouts |
| rollout_nodes | Per-node upgrade tracking within a rollout |

Payment Service Database
Separate PostgreSQL instance. See Payment Service for schema.

| Data | Payment Service | Main App |
|---|---|---|
| User identity | Mirror (customer_id) | Primary (users) |
| Payment records | Primary | Reference only |
| Subscriptions | Reference only | Primary |
Key Indexes
| Index | Table | Purpose |
|---|---|---|
| wallet_address (LOWER, unique) | users | Case-insensitive lookup |
| email (unique where not null) | users | Email uniqueness |
| external_auth_id (partial unique) | users | External auth mapping |
| token_hash (unique) | refresh_tokens | Token validation |
| state | nodes | Health evaluation queries |
| subscription_id | nodes | Subscription cleanup |
| last_seen_at | agent_registrations | Heartbeat timeout |
| expires_at | subscriptions | Expiration cleanup |
| (node_id, category) WHERE status NOT IN ('resolved', 'auto_resolved') | incidents | Dedup upsert |
| payment_id WHERE status = 'pending_payment' | subscriptions | Payment lookup |
| created_at WHERE status = 'pending_payment' | subscriptions | TTL cleanup |
| (chain_profile_id) WHERE status IN ('pending', 'running', 'paused') | rollout_groups | One active group per chain |
| (chain_profile_id) WHERE status IN ('pending', 'scheduled', 'running', 'paused') | rollouts | One active rollout per chain |
| (rollout_id, node_id) UNIQUE | rollout_nodes | Dedup per rollout-node pair |
| (rollout_id, status) | rollout_nodes | Batch status queries |

State Storage Locations
| State | Storage | Access Pattern |
|---|---|---|
| Infrastructure state | PostgreSQL (nodes.state) | Written via NodeHealthMachine by heartbeat evaluator |
| Application health | PostgreSQL (nodes.application_health) | Written via NodeHealthMachine by policy evaluator |
| Health tracking | PostgreSQL (node_health_state) | Per-node: consecutive_failures, down_since, version |
| Health events | PostgreSQL (health_event_outbox) | Emitted by health machine, consumed by outbox worker |
| Incidents | PostgreSQL (incidents) | No foreign keys (survives deletion) |
| Notification retry | PostgreSQL (notification_outbox) | Exponential backoff retry |
| Encrypted keys | PostgreSQL (node_keys) | Written once, read for injection/migration |
| Key backups | S3 | Created at provisioning, deleted at termination |
| Terraform state | Files (/app/terraform-state/{node_id}/) | Per-node, managed by Terraform |
| Pending commands | Redis queue | Pushed by orchestrator, consumed by agent |
| Command progress | Redis (progress:{commandID}) | Written by gateway, read by orchestrator |
| In-flight workflows | Temporal | Managed by Temporal server |
| Metrics | Victoria Metrics TSDB | 30-day retention, queried by policy evaluator |
| Rollout state | PostgreSQL (rollouts, rollout_nodes) | Written by workflow activities |
| Version tracking | PostgreSQL (node_config_state) | Updated after successful upgrade/provision |
| Upgrade backup | Node filesystem ({data_dir}/.upgrade-backup/{upgrade_id}/) | Per-upgrade, retained 24h |
| Metrics transport | NATS JetStream | 1-hour retention |
Related Documents
- Overview — System overview and service descriptions
- Workflows — State transition triggers, execution flows
- Health and Incidents — Health evaluation pipeline
- Extending — Adding wallet types