
What is HoodCloud?

HoodCloud is a fully managed blockchain node infrastructure service implemented in Go. It provisions, monitors, and maintains non-validator blockchain nodes (full nodes, archive nodes, indexers) without requiring users to manage infrastructure, software upgrades, or monitoring.
Primary Responsibilities:
  • Provision blockchain nodes on cloud infrastructure (Hetzner, OVH, extensible via module registry)
  • Generate and securely manage cryptographic key material
  • Monitor node health and automatically trigger migrations on failure
  • Apply configuration changes and software updates via declarative recipes
  • Apply targeted binary and config upgrades via rollout orchestration
  • Terminate nodes and clean up resources when subscriptions expire
What This Is NOT:
  • Not a validator service
  • Not a public RPC endpoint provider
  • Not a self-service platform (admin-controlled provisioning in v1)
  • Not highly available by default (single-host nodes in v1)

Application Type

HoodCloud is a distributed control plane system consisting of six services:
| Service | Transport | Port | Purpose |
| --- | --- | --- | --- |
| API Server | HTTP | 8080 | Business logic: nodes, subscriptions, chains, payments, key export |
| Auth Server | HTTP | 8081 | Identity, authentication, wallet registration, API keys |
| Agent Gateway | gRPC | 9090 | Ops-agent communication (heartbeat, commands, events) |
| Orchestrator | Temporal worker | - | Durable workflow execution (provision, migrate, terminate, upgrade rollout) |
| Health Evaluator | Background daemon | 9090 (metrics) | Health evaluation, incidents, notifications, cleanup (leader + standby) |
| Migration Runner | One-shot CLI | - | Database schema migrations (cmd/migrate), runs as pre-deploy step |
| Ops Agent | gRPC client | - | On-host agent (one per node VM) |
A separate Payment Service handles payment processing as an isolated microservice. See Payment Service Architecture.

Technology Stack

| Component | Technology |
| --- | --- |
| Language | Go 1.25 |
| Workflow Engine | Temporal (durable workflows) |
| Database | PostgreSQL (node state, subscriptions, keys, users; per-service connection pools and statement timeouts) |
| Payment Database | PostgreSQL (separate instance, payment records) |
| Cache/Queue | Redis (command queue, progress storage, idempotency) |
| Event Streaming | NATS JetStream (agent events, metrics transport, payment events; 3-node cluster, R=3 in production) |
| RPC Protocol | gRPC with Protocol Buffers |
| Service-to-Service Auth | mTLS (TLS 1.3, mutual certificate verification) |
| Infrastructure | Terraform (declarative provisioning via module registry) |
| Encryption | AES-256-GCM (node key material), NaCl sealed box (user-provided secrets) |
| Authentication | Clerk (external auth provider), JWT (RS256) |
| Node Hosting | Hetzner, OVH (VPS, Public Cloud, Dedicated), extensible |
| Cloud Services | AWS (S3), Secrets (HashiCorp Vault), Vault AWS Credential Provider |
| Metrics Storage | Victoria Metrics TSDB |
| Policy Engine | CEL (Common Expression Language) |
| Distributed Tracing | OpenTelemetry SDK -> OTel Collector -> Grafana Tempo |
| Status Page | Gatus (health monitoring dashboard) |

High-Level System Architecture

Provisioning Workflow Sequence

Security Boundaries

Keys exist in plaintext only temporarily for node operation. No permanent key retention — keys and backups are deleted on subscription expiration. No SSH access to nodes; all management via ops-agent. User-provided secrets are client-side encrypted (NaCl sealed box).
See also: CLAUDE.md — Security Boundaries for the canonical security rules.
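
The "plaintext only temporarily" rule for AES-256-GCM-protected key material can be illustrated with Go's standard library alone. This is a minimal sketch under stated assumptions, not the code in internal/crypto/: the function names encryptKeyMaterial, decryptKeyMaterial, and wipe are hypothetical, and the nonce-prefixed ciphertext layout is an assumed convention.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte data encryption key (DEK).
func newGCM(dek []byte) (cipher.AEAD, error) {
	if len(dek) != 32 {
		return nil, errors.New("DEK must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptKeyMaterial (hypothetical name) seals plaintext under the DEK.
// Assumed layout: a fresh random nonce followed by ciphertext and tag.
func encryptKeyMaterial(dek, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext+tag to its first argument, here the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptKeyMaterial (hypothetical name) reverses encryptKeyMaterial and
// fails if the ciphertext was modified.
func decryptKeyMaterial(dek, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// wipe overwrites a buffer in place. This is best effort only: the Go
// runtime may hold other copies of the data that this call cannot reach.
func wipe(b []byte) {
	for i := range b {
		b[i] = 0
	}
}
```

A caller would decrypt just before the key is needed, hand it to the node process, and call wipe on the plaintext buffer as soon as it is no longer required.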

Service Descriptions

API Server (cmd/api-server/)

Purpose: Business logic — nodes, subscriptions, chains, payments, key export.
  • Entry point: cmd/api-server/main.go (thin, delegates to internal/app/bootstrap/api_server.go)
  • Port: 8080 (HTTP)
  • Dependencies: PostgreSQL, Redis, Temporal, Vault, NATS (optional for payment consumer)
  • Auth: DualAuthMiddleware — JWT (primary) + API key (programmatic). See internal/api/middleware.go
  • Key packages: internal/api/ (handlers, routing, middleware), internal/service/ (business logic)
Startup: Load config -> Init telemetry -> Init secrets (Vault) -> Connect PostgreSQL -> Init repos -> Connect Temporal -> Init chain config -> Init services -> Connect Redis -> Start HTTP server -> Wait for signal -> Graceful shutdown.
Note: Database migrations are handled by the dedicated cmd/migrate binary as a pre-deploy step, not at service startup.
See also: CLAUDE.md — Server Separation for endpoint lists.

Auth Server (cmd/auth-server/)

Purpose: Identity, authentication, wallet registration, API key management.
  • Entry point: cmd/auth-server/main.go (thin, delegates to internal/app/bootstrap/auth_server.go)
  • Port: 8081 (HTTP), 9094 (metrics)
  • Dependencies: PostgreSQL, Vault (JWT keys). Minimal — no Temporal, no Redis
  • Key packages: internal/auth/ (service, handlers, JWT), internal/authprovider/ (Clerk adapter)
Key features:
  • Clerk webhook endpoint (POST /webhooks/clerk) for user lifecycle sync
  • JWT session management (RS256, 15m access / 7d refresh, atomic rotation)
  • Chain-agnostic wallet registration via SignatureVerifierRegistry
  • API key CRUD and rotation
  • IP-based rate limiting (20 req/min, Redis-backed for global enforcement across instances)
See also: Clerk Setup for operational configuration.

Agent Gateway (cmd/agent-gateway/)

Purpose: gRPC endpoint for ops-agent communication.
  • Entry point: cmd/agent-gateway/main.go
  • Port: 9090 (gRPC), 9091 (metrics HTTP)
  • Dependencies: PostgreSQL, Redis, NATS
  • Key packages: internal/grpc/ (server), internal/commandqueue/ (Redis queue + progress)
Responsibilities:
  • Agent registration and heartbeat processing
  • Command queue delivery (Redis -> agent via heartbeat response)
  • DEK retrieval for key decryption
  • Progress tracking for long-running commands (Redis progress:{commandID})
  • Event forwarding to NATS
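
Command delivery via heartbeat response means the gateway never pushes to an agent: commands wait in a per-node queue and are handed over the next time that agent checks in. The sketch below illustrates the idea with an in-memory map in place of Redis. The command and commandQueue types, the per-heartbeat limit, and the command kinds used in examples are all hypothetical.

```go
package main

import "sync"

// command is a hypothetical unit of work addressed to one node's agent.
type command struct {
	ID   string
	Kind string
}

// commandQueue holds pending commands per node ID in FIFO order.
// An in-memory map is used here where the real system uses Redis.
type commandQueue struct {
	mu      sync.Mutex
	pending map[string][]command
}

func newCommandQueue() *commandQueue {
	return &commandQueue{pending: make(map[string][]command)}
}

// enqueue appends a command to the node's queue.
func (q *commandQueue) enqueue(nodeID string, c command) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending[nodeID] = append(q.pending[nodeID], c)
}

// handleHeartbeat removes and returns up to limit of the oldest queued
// commands for the node, to be carried in the heartbeat response.
func (q *commandQueue) handleHeartbeat(nodeID string, limit int) []command {
	q.mu.Lock()
	defer q.mu.Unlock()
	queued := q.pending[nodeID]
	if limit < 0 {
		limit = 0
	}
	if limit > len(queued) {
		limit = len(queued)
	}
	// Copy so the caller's slice does not alias the queue's backing array.
	out := append([]command(nil), queued[:limit]...)
	if limit == len(queued) {
		delete(q.pending, nodeID)
	} else {
		q.pending[nodeID] = queued[limit:]
	}
	return out
}
```

Because delivery is pull-based, agents only need outbound connectivity to the gateway, which fits the no-SSH management model.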

Orchestrator (cmd/orchestrator/)

Purpose: Temporal workflow worker that executes provision, migrate, terminate, and upgrade rollout workflows.
  • Entry point: cmd/orchestrator/main.go
  • Dependencies: PostgreSQL, Redis, Temporal, Vault, Terraform, S3
  • Key packages: internal/workflows/ (workflow definitions), internal/activities/ (activity implementations), internal/terraform/ (infrastructure provisioning)
Registered workflows: ProvisionNodeWorkflow, MigrateNodeWorkflow, TerminateNodeWorkflow, RolloutGroupWorkflow, RolloutWorkflow, UpgradeNodeWorkflow
Startup: Load config -> Init telemetry -> Init secrets -> Connect PostgreSQL -> Init repos + crypto -> Init chain config + Terraform -> Connect Redis -> Connect Temporal -> Create worker -> Register workflows + activities -> Start worker -> Wait for signal.

Health Evaluator (cmd/health-evaluator/)

Purpose: Background daemon for health evaluation, incident management, notifications, and cleanup.
  • Entry point: cmd/health-evaluator/main.go
  • Port: 9090 (metrics HTTP)
  • Dependencies: PostgreSQL, Temporal (for migration triggers), NATS (event subscription), S3 (backup cleanup), Victoria Metrics (metrics queries)
  • Key packages: internal/health/ (machine, evaluator, outbox, cleanup), internal/incident/ (service, notifier), internal/observation/ (policy evaluation, metrics ingestion), internal/uptime/ (state log handler, uptime worker)
Leader election: Uses PostgreSQL advisory lock-based leader election. One active leader, N-1 hot standby instances. 9 of 11 goroutine loops are leader-gated; 2 run on all instances (outbox worker, metrics ingester). Leader election uses a dedicated pgx.Conn (not pooled) for advisory lock persistence. On leader failure, a standby acquires the lock within one evaluation interval (15-30s).
Subsystems run concurrently:
| Subsystem | Leader-gated? | Description |
| --- | --- | --- |
| Heartbeat evaluator | YES | 30s cycle, batch evaluation via 3-way JOIN snapshot |
| Policy evaluator | YES | 60s cycle, CEL policy evaluation against Victoria Metrics |
| Outbox worker | NO | Polls health_event_outbox, dispatches to handlers (FOR UPDATE SKIP LOCKED, multi-instance safe) |
| Incident pipeline | YES | Incident service + notification dispatcher (Slack, Telegram, Email, Webhook) |
| Uptime worker | YES | 5min cycle, materializes hourly uptime buckets from state transition log |
| Metrics ingester | NO | NATS -> Victoria Metrics (idempotent writes) |
| Subscription cleanup | YES | Expiration, grace period, pending payment TTL |
| Backup cleanup | YES | S3 orphaned backup removal |
| Terraform cleanup | YES | Orphaned state directory removal |
| Maintenance cleanup | YES | Stuck maintenance node recovery |
See also: Health and Incidents for the full pipeline architecture.

Ops Agent (cmd/ops-agent/)

Purpose: Lightweight on-host agent for node lifecycle management.
  • Entry point: cmd/ops-agent/main.go
  • Runs on: Each node VM (installed via cloud-init)
  • Communication: gRPC client -> Agent Gateway, NATS publisher for metrics
  • Key packages: internal/opsagent/ (agent core, commands, recipes, config, state tracking, observation, upgrade)
Responsibilities:
  • Lifecycle control (start/stop/restart node process via systemd)
  • Upgrade execution via three-layer architecture (actions, runtime adapters, executor)
  • Configuration application via declarative recipes (hoodcloud-chain-configs/recipes/)
  • System and chain metric collection via observation runner
  • Sync status tracking (events forwarded via gRPC -> NATS)
  • Key injection (encrypted key material decrypted with DEK in memory)
  • Progress monitoring for long-running operations (snapshot downloads)
  • Self-update mechanism
Startup: Load config -> Create metrics collector -> Create agent -> Init state tracker -> Init observation runner (if observation.yaml exists) -> Start gRPC server -> Register with control plane -> Fetch DEK -> Start state tracking + observation + heartbeat loops -> Wait for signal -> Stop node -> Clear DEK -> Shutdown.

Key Packages

| Package | Purpose |
| --- | --- |
| internal/contracts/ | Canonical interfaces (repositories, services, notifiers) |
| internal/models/ | Domain model types (Node, Subscription, User, Incident, etc.) |
| internal/database/ | PostgreSQL repositories (auth/, ops/, user/ domains) |
| internal/service/ | Business logic services (NodeService, SubscriptionService, etc.) |
| internal/crypto/ | AES-256-GCM encryption, NaCl sealed box |
| internal/vault/ | HashiCorp Vault client (AppRole, Transit, PKI, circuit breaker) |
| internal/chains/ | Chain profile loading (local filesystem or S3 with version polling) |
| internal/upgrade/manifest/ | Upgrade manifest reader (YAML manifests from chain config directory) |
| internal/provision/input/ | Provisioning input framework (schema, validation, storage) |
| internal/wallet/ | Chain-agnostic wallet verification (Ethereum, Solana) |
| internal/consumers/ | NATS consumers (payment events, idempotency) |
| internal/observation/ | Metrics collection, transport, CEL policy evaluation |
| internal/uptime/ | Rolling uptime calculation (state log handler, uptime worker) |
| internal/terraform/ | Terraform execution and self-describing module registry |
| internal/health/leader.go | PostgreSQL advisory lock leader election (generic, reusable) |
| internal/app/bootstrap/leader_election.go | Leader election bootstrap pattern |
| internal/correlation/ | Correlation ID propagation |
| internal/telemetry/ | OpenTelemetry distributed tracing |