Domain Model - HoodCloud

Last verified: 2026-02-13 | Commit scope: bc0fb41

Domain Objects

Node

Location: internal/models/node.go

Field	Type	Description
`id`	UUID	Unique identifier
`subscription_id`	UUID	Owning subscription
`chain_profile_id`	string	Chain identifier (e.g., `celestia-mocha`)
`node_type`	string	Node type (`full`, `archive`)
`state`	NodeState	Infrastructure lifecycle state (owned by heartbeat evaluator)
`sync_status`	SyncStatus	Chain sync status (`syncing`, `synced`)
`application_health`	ApplicationHealth	Application health (owned by policy evaluator)
`provider`	string	Cloud provider (internal)
`instance_type`	string	VM instance type (internal)
`host_id`	string	Provider host identifier (internal)
`host_ip`	string	Host IP address (internal)
`region`	string	Cloud region (internal)
`last_heartbeat`	timestamp	Last agent heartbeat
`last_migration_triggered_at`	timestamp	Migration cooldown tracking

Field Ownership: All state mutations flow through NodeHealthMachine (internal/health/machine.go), which enforces transitions, uses optimistic locking, and emits events via health_event_outbox.

Field	Writer	Machine Method	Purpose
`state`	Heartbeat Evaluator	`ApplyHeartbeatDecisions()`	Infrastructure liveness
`application_health`	Policy Evaluator	`UpdateApplicationHealth()`	Application health (CEL policies)
`sync_status`	NATS sync events	`UpdateSyncStatus()`	Chain sync progress
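The optimistic locking mentioned above can be pictured as a compare-and-swap on a per-row version number. The following is a minimal in-memory sketch, not the NodeHealthMachine implementation: the `store`, `nodeRow`, and `updateState` names are hypothetical, and a real implementation would presumably express the same check in SQL as an `UPDATE ... WHERE version = ?`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionConflict signals that another writer updated the row first.
var ErrVersionConflict = errors.New("version conflict: row was modified concurrently")

// nodeRow is a hypothetical stand-in for a persisted node record.
type nodeRow struct {
	State   string
	Version int
}

// store is an in-memory substitute for the database (assumption for illustration).
type store struct{ rows map[string]*nodeRow }

// updateState writes newState only if the caller read the latest version,
// then bumps the version so that stale writers are rejected.
func (s *store) updateState(id, newState string, expectedVersion int) error {
	row, ok := s.rows[id]
	if !ok {
		return errors.New("node not found")
	}
	if row.Version != expectedVersion {
		return ErrVersionConflict
	}
	row.State = newState
	row.Version++
	return nil
}

func main() {
	s := &store{rows: map[string]*nodeRow{"n1": {State: "healthy", Version: 1}}}
	fmt.Println(s.updateState("n1", "degraded", 1)) // <nil>
	fmt.Println(s.updateState("n1", "down", 1))     // version conflict: row was modified concurrently
}
```

The second call fails because it still carries version 1 while the row has moved to version 2; a caller would re-read the row and retry.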

State values:

state: provisioning, syncing, healthy, degraded, down, maintenance, terminating, terminated, failed
application_health: unknown, ok, degraded, critical

Display state: DisplayState() computes a combined state for API responses: infrastructure states take precedence, then application_health, then sync_status.

Invariants:

Node belongs to exactly one subscription
Infrastructure fields (provider, host_id, host_ip) are internal-only
Terminal states (terminated, failed) have no outgoing transitions
Terminal transitions auto-reset sync_status and application_health to unknown
Migration cooldown prevents rapid re-triggering (default: 1h)
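The display-state precedence described above can be sketched as a short cascade. This is only an illustration under stated assumptions: the document does not say which infrastructure states "take precedence", so the sketch assumes every state other than `healthy` does, and that only non-OK application health and an unfinished sync are surfaced afterwards. The `displayState` function is hypothetical and is not the real DisplayState() method.

```go
package main

import "fmt"

// displayState is a hypothetical stand-in for Node.DisplayState().
// Assumed precedence: infrastructure state, then application health, then sync status.
func displayState(state, appHealth, syncStatus string) string {
	// 1. Any infrastructure state other than "healthy" wins outright (assumption).
	if state != "healthy" {
		return state
	}
	// 2. Otherwise a non-OK application health is surfaced.
	if appHealth == "degraded" || appHealth == "critical" {
		return appHealth
	}
	// 3. Otherwise an unfinished chain sync is surfaced.
	if syncStatus == "syncing" {
		return syncStatus
	}
	return state
}

func main() {
	fmt.Println(displayState("down", "critical", "syncing"))   // down
	fmt.Println(displayState("healthy", "critical", "synced")) // critical
	fmt.Println(displayState("healthy", "ok", "syncing"))      // syncing
	fmt.Println(displayState("healthy", "ok", "synced"))       // healthy
}
```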

Node State Machine

Transition table (defined in models.validTransitions):

From	Valid targets
`provisioning`	syncing, terminating, failed
`syncing`	healthy, degraded, down, terminating
`healthy`	degraded, down, maintenance, terminating
`degraded`	healthy, down, maintenance, terminating
`down`	syncing, healthy, maintenance, terminating
`maintenance`	syncing, healthy, terminating
`terminating`	terminated, failed
`terminated`, `failed`	— (terminal)

API: CanTransitionTo(target), IsTerminal(), ValidSourceStates(target). Force updates via ForceUpdateState() bypass validation for compensation.
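As an illustration, the transition table above can be encoded as a map from source state to allowed targets. The sketch below mirrors the table and the three API names, but the receiver types, constant names, and return types are assumptions; the real definitions live in internal/models and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeState mirrors the lifecycle states listed in this document.
type NodeState string

const (
	Provisioning NodeState = "provisioning"
	Syncing      NodeState = "syncing"
	Healthy      NodeState = "healthy"
	Degraded     NodeState = "degraded"
	Down         NodeState = "down"
	Maintenance  NodeState = "maintenance"
	Terminating  NodeState = "terminating"
	Terminated   NodeState = "terminated"
	Failed       NodeState = "failed"
)

// validTransitions reproduces the transition table; terminal states have no entry.
var validTransitions = map[NodeState][]NodeState{
	Provisioning: {Syncing, Terminating, Failed},
	Syncing:      {Healthy, Degraded, Down, Terminating},
	Healthy:      {Degraded, Down, Maintenance, Terminating},
	Degraded:     {Healthy, Down, Maintenance, Terminating},
	Down:         {Syncing, Healthy, Maintenance, Terminating},
	Maintenance:  {Syncing, Healthy, Terminating},
	Terminating:  {Terminated, Failed},
}

// CanTransitionTo reports whether s -> target appears in the table.
func (s NodeState) CanTransitionTo(target NodeState) bool {
	for _, t := range validTransitions[s] {
		if t == target {
			return true
		}
	}
	return false
}

// IsTerminal reports whether s has no outgoing transitions.
func (s NodeState) IsTerminal() bool {
	return s == Terminated || s == Failed
}

// ValidSourceStates lists, in sorted order, every state that may move to target.
func ValidSourceStates(target NodeState) []NodeState {
	var out []NodeState
	for from, targets := range validTransitions {
		for _, t := range targets {
			if t == target {
				out = append(out, from)
				break
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	fmt.Println(Healthy.CanTransitionTo(Down))       // true
	fmt.Println(Terminated.CanTransitionTo(Syncing)) // false
	fmt.Println(ValidSourceStates(Failed))           // [provisioning terminating]
}
```

Keeping the table as data means a query such as ValidSourceStates can be derived from it instead of being maintained separately.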

See also: Health and Incidents for how transitions are triggered.

Subscription

Location: internal/models/subscription.go

Field	Type	Source	Description
`id`	UUID	System	Unique identifier
`user_id`	string	Auth context	Owner
`chain_profile_id`	string	User input	Chain profile
`node_type`	string	User input	Node type
`provider`	string	System-resolved	Cloud provider (from duration mapping)
`region`	string	System-resolved	Deployment region
`instance_type`	string	System-resolved	Compute instance type
`duration`	string	User input	`1w`, `2w`, `1m`, `3m`, `6m`
`status`	SubscriptionStatus	System	Lifecycle state
`expires_at`	timestamp	System	Subscription period end
`grace_period_expires_at`	timestamp (nullable)	System	Renewal deadline after expiration
`payment_id`	string (nullable)	System	Payment service reference

User-provided: chain_profile_id, node_type, duration. System-resolved: provider, region, instance_type (from chain profile + duration mapping). API responses hide system-resolved fields.

Subscription State Machine

Status	Meaning
`pending_payment`	Created but unpaid. No nodes provisioned. PaymentID is null.
`active`	Paid and running. Nodes provisioned and serving.
`expiring`	Past `expires_at`, in 24h grace period. Nodes still running.
`terminated`	Grace period ended. All nodes terminated, backups deleted.

Invariants:

Infrastructure fields resolved at creation time (not user-provided)
pending_payment subscriptions deleted after TTL (default 30m)
Expiration triggers 24h grace period before termination
Abandoned pending_payment subscriptions have payment_id = null
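The time-based rules above (30m pending TTL, 24h grace period) can be sketched as a pure function of the current status and the clock. The `Subscription` struct here is a trimmed, hypothetical version of the real model, `nextStatus` is an illustrative helper, and the `"deleted"` return value is only a marker for row removal, not a real status.

```go
package main

import (
	"fmt"
	"time"
)

const (
	pendingTTL  = 30 * time.Minute // default TTL for unpaid subscriptions
	gracePeriod = 24 * time.Hour   // renewal window after expires_at
)

// epoch is a fixed reference time used by the examples.
var epoch = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// Subscription carries only the fields needed for this sketch.
type Subscription struct {
	Status    string
	CreatedAt time.Time
	ExpiresAt time.Time
}

// nextStatus returns the status the subscription should have at time now.
// "deleted" is a hypothetical marker meaning the pending row is removed.
func nextStatus(s Subscription, now time.Time) string {
	switch s.Status {
	case "pending_payment":
		if now.Sub(s.CreatedAt) > pendingTTL {
			return "deleted"
		}
	case "active":
		if now.After(s.ExpiresAt) {
			return "expiring"
		}
	case "expiring":
		if now.After(s.ExpiresAt.Add(gracePeriod)) {
			return "terminated"
		}
	}
	return s.Status
}

func main() {
	pending := Subscription{Status: "pending_payment", CreatedAt: epoch}
	fmt.Println(nextStatus(pending, epoch.Add(31*time.Minute))) // deleted

	expiring := Subscription{Status: "expiring", ExpiresAt: epoch}
	fmt.Println(nextStatus(expiring, epoch.Add(23*time.Hour))) // expiring
	fmt.Println(nextStatus(expiring, epoch.Add(25*time.Hour))) // terminated
}
```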

See also: Workflows — Subscription Lifecycle for execution flow.

Incident

Location: internal/models/incident.go

Field	Type	Description
`id`	UUID	Unique identifier
`node_id`	UUID	Associated node
`subscription_id`	UUID	Associated subscription
`user_id`	UUID (nullable)	Resolved owner (notification routing)
`chain_profile_id`	string	Chain identifier
`category`	IncidentCategory	Incident type
`severity`	IncidentSeverity	Current severity
`status`	IncidentStatus	Lifecycle status
`title`	string	Human-readable title
`description`	string	Detailed description
`metadata`	map	Trigger-specific context
`occurrence_count`	int	Dedup counter
`first_seen_at`	timestamp	First occurrence
`last_seen_at`	timestamp	Most recent occurrence
`acknowledged_at`	timestamp (nullable)	User acknowledgment
`resolved_at`	timestamp (nullable)	Resolution time
`is_flapping`	bool	Flapping flag (persisted, survives restarts)
`resolution_debounce`	int	Per-incident debounce counter for auto-resolution

Categories:

Category	Trigger	Initial Severity
`node_down`	State transition to `down`	critical
`app_critical`	`application_health` -> `critical`	critical
`app_degraded`	`application_health` -> `degraded`	warning
`sync_stalled`	`sync_status` -> `stalled` (CEL pipeline)	warning
`provision_failed`	State -> `failed` from `provisioning`	critical
`migration_failed`	State -> `failed` from `maintenance`	critical

Incident Status Lifecycle: [diagram]

Dedup: Partial unique index on (node_id, category) WHERE status NOT IN ('resolved', 'auto_resolved'). Upsert increments occurrence_count, updates last_seen_at.

No foreign keys: Incidents are historical records that survive node/subscription deletion.
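The dedup rule can be illustrated with an in-memory map keyed by (node_id, category) that holds only unresolved incidents, which plays the role of the partial unique index. This is a sketch under assumptions: the `Store` type and its methods are hypothetical, and `"open"` is a placeholder status, since the full list of incident statuses is not given here.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is a fixed reference time used by the examples.
var epoch = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// dedupKey matches the columns of the partial unique index.
type dedupKey struct{ NodeID, Category string }

// Incident carries only the fields relevant to dedup.
type Incident struct {
	NodeID, Category, Status string
	OccurrenceCount          int
	FirstSeenAt, LastSeenAt  time.Time
}

// Store keeps at most one unresolved incident per (node, category).
type Store struct{ open map[dedupKey]*Incident }

func NewStore() *Store { return &Store{open: map[dedupKey]*Incident{}} }

// Upsert bumps the counter of an existing unresolved incident or creates a new one.
func (s *Store) Upsert(nodeID, category string, now time.Time) *Incident {
	k := dedupKey{nodeID, category}
	if inc, ok := s.open[k]; ok {
		inc.OccurrenceCount++
		inc.LastSeenAt = now
		return inc
	}
	inc := &Incident{
		NodeID: nodeID, Category: category, Status: "open", // placeholder status
		OccurrenceCount: 1, FirstSeenAt: now, LastSeenAt: now,
	}
	s.open[k] = inc
	return inc
}

// Resolve removes the incident from the unresolved set, so the next
// occurrence starts a fresh incident.
func (s *Store) Resolve(nodeID, category string) {
	k := dedupKey{nodeID, category}
	if inc, ok := s.open[k]; ok {
		inc.Status = "resolved"
		delete(s.open, k)
	}
}

func main() {
	s := NewStore()
	s.Upsert("n1", "node_down", epoch)
	inc := s.Upsert("n1", "node_down", epoch.Add(time.Minute))
	fmt.Println(inc.OccurrenceCount) // 2

	s.Resolve("n1", "node_down")
	fmt.Println(s.Upsert("n1", "node_down", epoch.Add(time.Hour)).OccurrenceCount) // 1
}
```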

See also: Health and Incidents for the full incident pipeline.

User

Location: internal/models/user.go

Field	Type	Description
`id`	UUID	Unique identifier
`wallet_address`	*string (nullable)	Blockchain wallet address (checksummed for Ethereum)
`email`	*string (nullable)	Email address
`wallet_public_key`	[]byte	Public key bytes from signature
`wallet_type`	*WalletType (nullable)	`ethereum`, `solana`
`external_auth_id`	*string (nullable)	Clerk `user_xxxx` ID

Identity combinations:

Scenario	`wallet_address`	`wallet_type`	`external_auth_id`
Clerk user (no wallet)	nil	nil	set
Clerk user + Ethereum wallet	set	`ethereum`	set
Clerk user + Solana wallet	set	`solana`	set

Invariants: Wallet address stored checksummed (EIP-55), lookups case-insensitive (lowercase index). Email optional and unique when provided. external_auth_id unique when not null. Created via Clerk webhook.

API Key

Location: internal/models/apikey.go, internal/models/apikey_scope.go

Field	Type	Description
`id`	UUID	Unique identifier
`user_id`	UUID	Owning user
`name`	string	Human-readable name (max 100 chars)
`key_hash`	string	SHA-256 hash (plaintext never stored)
`scopes`	string[]	Granted permissions
`expires_at`	timestamp (nullable)	Expiration
`revoked_at`	timestamp (nullable)	Revocation
`last_used_at`	timestamp (nullable)	Last validation
`deleted_at`	timestamp (nullable)	Soft-delete

Scopes: nodes:read, nodes:write, subscriptions:read, subscriptions:write, chains:read, keys:export, payments:read, payments:write, api-keys:manage, * (wildcard)

Business rules: At least one scope required. Max 50 active keys per user. Min 1h expiration. Soft-deleted for audit trail. Rotation is atomic (new key + old revoked in one transaction). Plaintext returned only on creation/rotation.
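Two of the rules above lend themselves to a short sketch: storing only a SHA-256 hash of the key, and checking a required scope against the granted list with `*` as a wildcard. The function names are hypothetical, and hex encoding of the digest is an assumption; the document only states that SHA-256 is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKey returns the hex-encoded SHA-256 digest of the plaintext key.
// Only this value would be persisted; hex encoding is an assumption.
func hashKey(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// hasScope reports whether the granted scopes cover the required one,
// treating "*" as a wildcard that matches every scope.
func hasScope(granted []string, required string) bool {
	for _, s := range granted {
		if s == "*" || s == required {
			return true
		}
	}
	return false
}

func main() {
	stored := hashKey("example-plaintext-key")
	fmt.Println(len(stored))                                 // 64
	fmt.Println(stored == hashKey("example-plaintext-key")) // true

	fmt.Println(hasScope([]string{"nodes:read"}, "nodes:write")) // false
	fmt.Println(hasScope([]string{"*"}, "keys:export"))          // true
}
```

Validation then becomes a lookup by hash: the presented key is hashed and compared with `key_hash`, so the plaintext never needs to be stored.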

Node Keys

Location: migrations/001_initial_schema.sql:88

Encryption hierarchy:

Master Key (Vault) --encrypts--> DEK (32-byte AES, stored encrypted) --encrypts--> Chain Key (mnemonic, stored encrypted)

User-provided secrets flow:

Client fetches X25519 public key (GET /api/v1/crypto/public-key)
 -> Client encrypts with NaCl sealed box (tweetnacl)
 -> Sealed ciphertext stored in provisioning_input_secrets
 -> PrepareUserProvidedKeys activity: decrypt with server X25519 private key (Vault)
 -> Re-encrypt with node DEK (AES-256-GCM) -> stored in node_keys

Invariants: Master key never leaves Vault. DEK in plaintext only in agent memory during operation. All keys deleted on subscription expiration.
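The two-level hierarchy above is a form of envelope encryption. The sketch below shows the idea with AES-256-GCM from the Go standard library. It makes one loud simplification: the master key is a local byte slice, whereas in the system described here it never leaves Vault, so wrapping the DEK would happen inside Vault. The `seal` and `unseal` helpers and the nonce-prefix layout are assumptions for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-256-GCM and prepends the random nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// unseal splits off the nonce and authenticates and decrypts the remainder.
func unseal(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	// Stand-in for the Vault-held master key (assumption: local for the demo only).
	masterKey := make([]byte, 32)
	dek := make([]byte, 32) // 32-byte data encryption key
	if _, err := rand.Read(masterKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(dek); err != nil {
		panic(err)
	}

	// Master key wraps the DEK; the DEK wraps the chain key.
	encryptedDEK, err := seal(masterKey, dek)
	if err != nil {
		panic(err)
	}
	encryptedChainKey, err := seal(dek, []byte("example mnemonic"))
	if err != nil {
		panic(err)
	}

	// Recovery walks the hierarchy in the same order.
	recoveredDEK, err := unseal(masterKey, encryptedDEK)
	if err != nil {
		panic(err)
	}
	mnemonic, err := unseal(recoveredDEK, encryptedChainKey)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(mnemonic)) // example mnemonic
}
```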

Auth Nonce

Location: internal/models/auth.go — Single-use challenge for wallet verification. 64-char random, 5-min TTL.
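A nonce with these properties could be produced as sketched below. The sketch assumes the 64 characters are the hex encoding of 32 random bytes, which the document does not confirm; the `AuthNonce` fields and the `newNonce` and `consume` helpers are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

const nonceTTL = 5 * time.Minute

// epoch is a fixed reference time used by the examples.
var epoch = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// AuthNonce is a hypothetical shape for a single-use wallet challenge.
type AuthNonce struct {
	Value     string
	ExpiresAt time.Time
	Used      bool
}

// newNonce draws 32 random bytes, which hex-encode to 64 characters.
func newNonce(now time.Time) (AuthNonce, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return AuthNonce{}, err
	}
	return AuthNonce{Value: hex.EncodeToString(buf), ExpiresAt: now.Add(nonceTTL)}, nil
}

// consume marks the nonce as used and returns false if it was already
// used or has expired, which enforces the single-use rule.
func (n *AuthNonce) consume(now time.Time) bool {
	if n.Used || !now.Before(n.ExpiresAt) {
		return false
	}
	n.Used = true
	return true
}

func main() {
	n, err := newNonce(epoch)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(n.Value))                      // 64
	fmt.Println(n.consume(epoch.Add(time.Minute))) // true
	fmt.Println(n.consume(epoch.Add(time.Minute))) // false
}
```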

Refresh Token

Location: internal/models/auth.go — SHA-256 hashed, 7-day TTL. Atomic rotation on refresh (old revoked, new created in one transaction). Multiple active tokens per user (different devices).

Rollout Group

Location: internal/models/rollout.go

Coordinates multi-binary upgrades (e.g., ethereum-holesky: geth + lighthouse). Executes component rollouts in declared order.

Field	Type	Description
`id`	UUID	Unique identifier
`chain_profile_id`	string	Target chain
`status`	RolloutGroupStatus	Lifecycle state
`failure_policy`	GroupFailurePolicy	`partial_ok`, `rollback_all`, `manual` (REQUIRED, no default)
`component_order`	JSONB	Ordered array of `{binary_name, version, url, checksum, upgrade_id (optional for auto-population)}`
`desired_versions`	JSONB	Target state: `{binary_name: version}`
`strategy`	string	`rolling`, `canary`, `all_at_once`
`batch_size`	int	Nodes per batch
`canary_size`	int	Canary batch size
`failure_threshold`	float	Auto-pause threshold (0.0–1.0)
`health_wait_duration`	interval	Health gate duration between components
`created_by`	string	Operator identifier

Group status values: pending, running, paused, completed, partial, failed, cancelled, rolled_back

Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'running', 'paused'), which allows one active group per chain. Standalone single-component rollouts do not require a group.

Rollout

Location: internal/models/rollout.go

Single-component upgrade rollout. Tracks progress, strategy, and per-node status.

Field	Type	Description
`id`	UUID	Unique identifier
`group_id`	UUID (nullable)	Parent rollout group (null for standalone)
`component_index`	int	Ordering within group
`chain_profile_id`	string	Target chain
`binary_name`	string	Binary to upgrade
`source_version`	string	Current version
`target_version`	string	Target version
`target_binary_url`	string	Fully resolved artifact URL
`target_binary_checksum`	string	SHA256 checksum (REQUIRED)
`strategy`	RolloutStrategy	`rolling`, `canary`, `all_at_once`
`status`	RolloutStatus	Lifecycle state
`batch_size`	int	Nodes per batch
`failure_threshold`	float	Auto-pause threshold
`health_wait_duration`	interval	Post-batch health gate
`total_nodes`	int	Denormalized progress counter
`succeeded_nodes`	int	Denormalized progress counter
`failed_nodes`	int	Denormalized progress counter
`manifest_content_hash`	string	SHA256 of manifest at creation time

Rollout status values: pending, scheduled, running, paused, completed, failed, cancelled, rolled_back

Concurrency: Partial unique index on (chain_profile_id) WHERE status IN ('pending', 'scheduled', 'running', 'paused').
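To make `batch_size` and `failure_threshold` concrete, here is a rough sketch of how a rolling strategy might split nodes into batches and decide when to auto-pause. Both helpers are hypothetical. In particular, the document does not say whether the threshold is compared against total nodes or processed nodes, nor whether the comparison is strict, so this sketch assumes failed/total with a strict greater-than.

```go
package main

import "fmt"

// planBatches splits nodeIDs into consecutive batches of at most batchSize.
// A non-positive batchSize is treated as "everything in one batch".
func planBatches(nodeIDs []string, batchSize int) [][]string {
	if batchSize <= 0 || batchSize > len(nodeIDs) {
		batchSize = len(nodeIDs)
	}
	var batches [][]string
	for start := 0; start < len(nodeIDs); start += batchSize {
		end := start + batchSize
		if end > len(nodeIDs) {
			end = len(nodeIDs)
		}
		batches = append(batches, nodeIDs[start:end])
	}
	return batches
}

// shouldPause reports whether the failure ratio exceeds the threshold (0.0 to 1.0).
// Assumption: ratio is failed_nodes / total_nodes, compared strictly.
func shouldPause(failedNodes, totalNodes int, threshold float64) bool {
	if totalNodes == 0 {
		return false
	}
	return float64(failedNodes)/float64(totalNodes) > threshold
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	fmt.Println(planBatches(nodes, 2)) // [[n1 n2] [n3 n4] [n5]]

	fmt.Println(shouldPause(0, 10, 0.1)) // false
	fmt.Println(shouldPause(2, 10, 0.1)) // true
}
```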

Rollout Status State Machine

Rollout Node

Location: internal/models/rollout.go

Per-node upgrade tracking within a rollout.

Field	Type	Description
`id`	UUID	Unique identifier
`rollout_id`	UUID	Parent rollout
`node_id`	UUID	Target node
`batch_index`	int	Assigned batch
`status`	NodeUpgradeStatus	Lifecycle state
`upgrade_phase`	UpgradePhase	Current execution phase (display state)
`previous_version`	string	Version before upgrade
`previous_binary_url`	string	URL before upgrade (for rollback)
`previous_checksum`	string	Checksum before upgrade
`error_message`	string	Failure details
`attempt_count`	int	Retry counter

Node upgrade status values: pending, in_progress, validating, succeeded, failed, rolled_back, skipped

UpgradePhase Enum

Display states derived from executor progress through the action sequence:

Phase	Meaning
`preparing`	Backup + artifact acquisition + verification
`stopped`	Service stopped
`mutating`	Artifact/config replacement in progress
`starting`	Service start issued
`verifying`	Health validation in progress
`completed`	Health gate passed
`rollbacking`	Compensation in progress
`rolled_back`	All compensations succeeded
`failed_rollback`	Compensation failed — manual intervention needed
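In Go, an enum like this is commonly modeled as a string type with constants. The following sketch lists the phases from the table; the constant names and the `needsOperator` helper are hypothetical and only encode the one rule stated above, that `failed_rollback` requires manual intervention.

```go
package main

import "fmt"

// UpgradePhase is a display state for a node's upgrade progress.
type UpgradePhase string

// Phase values are taken from the table; constant names are illustrative.
const (
	PhasePreparing      UpgradePhase = "preparing"
	PhaseStopped        UpgradePhase = "stopped"
	PhaseMutating       UpgradePhase = "mutating"
	PhaseStarting       UpgradePhase = "starting"
	PhaseVerifying      UpgradePhase = "verifying"
	PhaseCompleted      UpgradePhase = "completed"
	PhaseRollbacking    UpgradePhase = "rollbacking"
	PhaseRolledBack     UpgradePhase = "rolled_back"
	PhaseFailedRollback UpgradePhase = "failed_rollback"
)

// needsOperator reports whether a phase calls for manual intervention.
func needsOperator(p UpgradePhase) bool {
	return p == PhaseFailedRollback
}

func main() {
	fmt.Println(needsOperator(PhaseRolledBack))     // false
	fmt.Println(needsOperator(PhaseFailedRollback)) // true
}
```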

NodeConfigState

Location: internal/models/node_config_state.go

Tracks binary version per node. The pre-existing node_config_state table now has a Go model and repository.

Field	Type	Description
`node_id`	UUID	Node reference
`binary_version`	string (nullable)	Currently installed binary version
`config_version`	string (nullable)	Currently applied config version

Written by UpdateNodeBinaryVersion activity after successful upgrade, and by provisioning workflow after initial CONFIGURE.

See also: Workflows — Upgrade Rollout Workflows for the full workflow hierarchy.

Business Rules

Key Management

Keys exist in plaintext only temporarily for node operation
Encrypted recovery backups only within subscription TTL
No human-accessible key material (no SSH, keys only via encrypted gRPC)
Recovery without backup is impossible by design
User-provided secrets use client-side NaCl sealed box encryption

Migration

Gradual detection: Consecutive failures tracked per node (default: 3 required for DOWN)
Cooldown: 1h between migration attempts (persisted in DB)
Auto-recovery: Heartbeat during DOWN grace period -> back to HEALTHY, counter reset
Steps: Verify backup -> provision new infra -> restore keys -> start node -> destroy old
Non-blocking: No user approval required

Parameter	Default	Purpose
HeartbeatTimeout	60s	Time without heartbeat to count as failure
ConsecutiveFailuresForDown	3	Failures before DOWN state
MigrationGracePeriod	60s	Wait after DOWN before migration
MigrationCooldown	1h	Min time between migrations
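The interaction of these four parameters can be sketched as a per-tick evaluation. This is an illustrative model rather than the heartbeat evaluator itself: the `tracker` type, the `evaluate` method, and the idea of one evaluation per tick are assumptions, and what state a node reports before the failure count reaches the limit is left out.

```go
package main

import (
	"fmt"
	"time"
)

// Config holds the parameters from the table above.
type Config struct {
	HeartbeatTimeout           time.Duration
	ConsecutiveFailuresForDown int
	MigrationGracePeriod       time.Duration
	MigrationCooldown          time.Duration
}

var defaults = Config{60 * time.Second, 3, 60 * time.Second, time.Hour}

// epoch is a fixed reference time used by the examples.
var epoch = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// tracker is a hypothetical per-node health record.
type tracker struct {
	ConsecutiveFailures int
	DownSince           time.Time // zero while the node is not DOWN
	LastMigration       time.Time // zero if the node never migrated
}

// evaluate runs one evaluation tick and reports whether the node counts
// as DOWN and whether a migration should be triggered now.
func (t *tracker) evaluate(lastHeartbeat, now time.Time, cfg Config) (down, migrate bool) {
	// A fresh heartbeat resets the counter (auto-recovery).
	if now.Sub(lastHeartbeat) <= cfg.HeartbeatTimeout {
		t.ConsecutiveFailures = 0
		t.DownSince = time.Time{}
		return false, false
	}
	t.ConsecutiveFailures++
	if t.ConsecutiveFailures < cfg.ConsecutiveFailuresForDown {
		return false, false
	}
	if t.DownSince.IsZero() {
		t.DownSince = now
	}
	graceOver := now.Sub(t.DownSince) >= cfg.MigrationGracePeriod
	cooledDown := t.LastMigration.IsZero() || now.Sub(t.LastMigration) >= cfg.MigrationCooldown
	return true, graceOver && cooledDown
}

func main() {
	tr := &tracker{}
	lastHeartbeat := epoch
	// With ticks 61s apart and no new heartbeat, the node is marked DOWN
	// on tick 3 and migration is requested on tick 4.
	for i := 1; i <= 4; i++ {
		now := epoch.Add(time.Duration(i) * 61 * time.Second)
		down, migrate := tr.evaluate(lastHeartbeat, now, defaults)
		fmt.Printf("tick %d: down=%v migrate=%v\n", i, down, migrate)
	}
}
```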

Subscription Lifecycle

Two-phase creation: POST /subscriptions creates with pending_payment. Idempotent — returns existing if same user/chain/nodeType/duration.
Pending cleanup: Abandoned pending_payment deleted after TTL (default 30m)
Expiration: active -> expiring (24h grace period). Nodes continue running.
Termination: Grace period expired -> terminate all nodes, delete backups -> terminated

Database Schema Summary

Main App Database

Table	Purpose
`users`	User identity (wallet, email, external_auth_id)
`refresh_tokens`	JWT refresh tokens (hashed)
`api_keys`	API authentication (scoped, soft-delete)
`subscriptions`	Subscription lifecycle (status, duration, payment_id, grace_period)
`nodes`	Node lifecycle (state, sync_status, application_health)
`node_health_state`	Per-node health tracking (consecutive_failures, down_since, version)
`health_event_outbox`	Health events pending dispatch
`incidents`	Health incidents (no foreign keys, survives deletion)
`notification_outbox`	Failed notifications queued for retry
`agent_registrations`	Agent connectivity (last_seen_at, running_command_ids)
`node_keys`	Encrypted key material (encrypted_dek, encrypted_chain_key)
`provisioning_inputs`	Provisioning input values (text/proof, subscription-scoped)
`provisioning_input_secrets`	User-provided secrets (sealed box ciphertext)
`node_tokens`	Internal authentication
`node_config_state`	Per-node version tracking (binary_version, config_version)
`rollout_groups`	Multi-component rollout coordination
`rollouts`	Single-component upgrade rollouts
`rollout_nodes`	Per-node upgrade tracking within a rollout

Payment Service Database

Separate PostgreSQL instance. See Payment Service for schema.

Data	Payment Service	Main App
User identity	Mirror (customer_id)	Primary (users)
Payment records	Primary	Reference only
Subscriptions	Reference only	Primary

Key Indexes

Index	Table	Purpose
`wallet_address` (LOWER, unique)	`users`	Case-insensitive lookup
`email` (unique where not null)	`users`	Email uniqueness
`external_auth_id` (partial unique)	`users`	External auth mapping
`token_hash` (unique)	`refresh_tokens`	Token validation
`state`	`nodes`	Health evaluation queries
`subscription_id`	`nodes`	Subscription cleanup
`last_seen_at`	`agent_registrations`	Heartbeat timeout
`expires_at`	`subscriptions`	Expiration cleanup
`(node_id, category) WHERE status NOT IN (resolved, auto_resolved)`	`incidents`	Dedup upsert
`payment_id WHERE status = 'pending_payment'`	`subscriptions`	Payment lookup
`created_at WHERE status = 'pending_payment'`	`subscriptions`	TTL cleanup
`(chain_profile_id) WHERE status IN ('pending', 'running', 'paused')`	`rollout_groups`	One active group per chain
`(chain_profile_id) WHERE status IN ('pending', 'scheduled', 'running', 'paused')`	`rollouts`	One active rollout per chain
`(rollout_id, node_id) UNIQUE`	`rollout_nodes`	Dedup per rollout-node pair
`(rollout_id, status)`	`rollout_nodes`	Batch status queries

State Storage Locations

State	Storage	Access Pattern
Infrastructure state	PostgreSQL (`nodes.state`)	Written via NodeHealthMachine by heartbeat evaluator
Application health	PostgreSQL (`nodes.application_health`)	Written via NodeHealthMachine by policy evaluator
Health tracking	PostgreSQL (`node_health_state`)	Per-node: consecutive_failures, down_since, version
Health events	PostgreSQL (`health_event_outbox`)	Emitted by health machine, consumed by outbox worker
Incidents	PostgreSQL (`incidents`)	No foreign keys (survives deletion)
Notification retry	PostgreSQL (`notification_outbox`)	Exponential backoff retry
Encrypted keys	PostgreSQL (`node_keys`)	Written once, read for injection/migration
Key backups	S3	Created at provisioning, deleted at termination
Terraform state	Files (`/app/terraform-state/{node_id}/`)	Per-node, managed by Terraform
Pending commands	Redis queue	Pushed by orchestrator, consumed by agent
Command progress	Redis (`progress:{commandID}`)	Written by gateway, read by orchestrator
In-flight workflows	Temporal	Managed by Temporal server
Metrics	Victoria Metrics TSDB	30-day retention, queried by policy evaluator
Rollout state	PostgreSQL (`rollouts`, `rollout_nodes`)	Written by workflow activities
Version tracking	PostgreSQL (`node_config_state`)	Updated after successful upgrade/provision
Upgrade backup	Node filesystem (`{data_dir}/.upgrade-backup/{upgrade_id}/`)	Per-upgrade, retained 24h
Metrics transport	NATS JetStream	1-hour retention

Related Documents

Overview: System overview and service descriptions
Workflows: State transition triggers, execution flows
Health and Incidents: Health evaluation pipeline
Extending: Adding wallet types
