
Adding a New Chain

No code changes required. HoodCloud is chain-agnostic — adding a chain requires only YAML configuration.

Required Files

| File | Purpose |
| --- | --- |
| hoodcloud-chain-configs/chains/{chain}/profile.yaml | Chain identity, node types, resources, providers, sync data, binaries |
| hoodcloud-chain-configs/chains/{chain}/runtime.yaml | Node status provider, ports, key paths |
| hoodcloud-chain-configs/recipes/{chain}/{nodeType}.yaml | Installation and sync recipe |
| hoodcloud-chain-configs/chains/{chain}/observation.yaml | (Optional) Metric collectors and CEL health policies |
| hoodcloud-chain-configs/upgrades/{chain}/ | Upgrade manifests — one YAML per binary × node_type; defines target version, artifact URL, checksum, state compatibility |
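An upgrade manifest might look like the following sketch. The field names here are illustrative assumptions, not the authoritative schema — check existing manifests under hoodcloud-chain-configs/upgrades/ for the real field set:

```yaml
# hoodcloud-chain-configs/upgrades/new-chain/new-chain-full.yaml (hypothetical sketch)
binary: new-chain
node_type: full
target_version: v1.1.0
artifact_url: "https://releases.new-chain.io/v1.1.0/new-chain-linux-amd64"
checksum: "sha256:def456..."
state_compatible: true   # no state migration required from the previous version
```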

profile.yaml

Defines chain identity, node types with resource requirements, provider mappings, sync data, and binary versions.
chain_id: new-chain-testnet-1
display_name: New Chain Testnet
network_type: testnet
chain_type: new-chain

node_types:
  full:
    resources:
      cpu: 4
      memory_gb: 16
      disk_gb: 500
      disk_type: ssd
    default_providers: [hetzner]
    providers:
      hetzner:
        instance_type: cx32
        default_region: fsn1
        regions: [fsn1, nbg1, hel1]
    sync_data:
      snapshot_url: "https://snapshots.new-chain.io/testnet/latest.tar.lz4"

binaries:
  new-chain:
    version: v1.0.0
    url_template: "https://releases.new-chain.io/{{.Version}}/new-chain-linux-amd64"
    checksum: "sha256:abc123..."

runtime.yaml

Declares health check expressions, port mappings, and key paths.
schema_version: "1.0"
chain_profile_id: new-chain-testnet

node_status:
  provider: prometheus
  healthy_expression: 'up == 1'
  synced_expression: 'syncing == 0'
  metrics_port: 9090

ports:
  rpc: 26657
  p2p: 26656
  prometheus: 9090

observation.yaml (Optional)

Defines metric collectors and CEL health policies. Enables the observation system for this chain.
schema_version: "1.0"
chain_profile_id: new-chain-testnet

collectors:
  - type: prometheus
    endpoint: "http://localhost:{{.Ports.prometheus}}/metrics"
    scrape_interval: 15s
    metrics:
      - name: block_height
        source_name: tendermint_consensus_height

policies:
  healthy:
    up:
      expression: 'up == 1'
  synced:
    not_syncing:
      expression: 'syncing == 0'

Recipe

Recipes define installation and sync logic. Sync data from profile.yaml is available via {{.SyncData.xxx}}, and when conditionals enable or disable individual steps. Key concepts:
  • Data in Profile, Logic in Recipe: profile.yaml holds sync URLs/config, recipes define what to do with them
  • when conditionals: Steps execute only when condition is truthy (non-empty string)
  • One recipe per node type: A single full.yaml handles multiple sync methods via conditionals
  • Progress monitoring: metadata.progress_monitor tracks long-running steps (snapshot downloads)
Recipe types:
  • Systemd-based: Binary download + systemd service (e.g., Cosmos chains)
  • Docker-based: docker_container step type with auto-HOME, restart policy, volume/port mapping
See existing recipes in hoodcloud-chain-configs/recipes/ for reference implementations.
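The concepts above can be sketched as a minimal recipe. Step names and fields here are illustrative assumptions — the existing recipes in hoodcloud-chain-configs/recipes/ are the authoritative reference:

```yaml
# recipes/new-chain/full.yaml (hypothetical sketch)
steps:
  - name: download_snapshot
    # Runs only when a snapshot URL is set in profile.yaml sync_data
    when: "{{.SyncData.snapshot_url}}"
    command: "aria2c -x8 {{.SyncData.snapshot_url}} -d /data"
    metadata:
      progress_monitor: download   # track the long-running download
  - name: extract_snapshot
    when: "{{.SyncData.snapshot_url}}"
    command: "lz4 -d /data/latest.tar.lz4 | tar -x -C /data"
```

A single full.yaml like this can carry several sync methods side by side; only the steps whose when condition resolves to a non-empty string execute.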

Adding a New Collector Type

Package: internal/observation/collector/
  1. Implement Collector interface:
    type CustomCollector struct { endpoint string }
    
    func NewCustomCollector(config map[string]any) (Collector, error) { /* ... */ }
    
    func (c *CustomCollector) Collect(ctx context.Context) ([]*io_prometheus_client.MetricFamily, error) {
        // Return Prometheus metric families
    }
    
  2. Register in collector/registry.go inside NewRegistry():
    r.Register("custom", NewCustomCollector)
    
  3. Use in chain observation.yaml:
    collectors:
      - type: custom
        endpoint: "http://localhost:8080/custom-metrics"
    
Existing types: prometheus, http, script, otlp

Adding a Health Event Handler

Use case: React to node state transitions or dimension changes (custom alerting, audit logging, external webhook).
  1. Implement the handler interface:
    // For state transitions (healthy -> down, provisioning -> failed, etc.)
    func (h *MyHandler) OnTransition(ctx context.Context, event *contracts.HealthTransitionEvent) error {
        // event.PreviousState, event.NewState, event.Trigger, event.Metadata
        return nil
    }
    
    // For dimension changes (application_health, sync_status)
    func (h *MyHandler) OnDimensionChange(ctx context.Context, event *contracts.HealthDimensionEvent) error {
        // event.Dimension, event.OldValue, event.NewValue
        return nil
    }
    
  2. Register in bootstrap (internal/app/bootstrap/):
    transitionHandler := &contracts.CompositeTransitionHandler{
        Handlers: []contracts.HealthTransitionHandler{
            incidentService,
            migrationHandler,
            stateLogHandler,
            myHandler, // Add here
        },
    }
    
Semantics: Handler errors are logged but don’t block other handlers. Handlers must be idempotent (outbox guarantees at-least-once delivery).
See also: Health and Incidents — Health Event Outbox for the outbox processing model.

Adding a Runtime Adapter

Package: internal/opsagent/upgrade/adapters/

Runtime adapters implement HOW upgrade actions execute on a specific runtime. Adding a new runtime (e.g., LXC, Podman, Kubernetes) requires only a new adapter — zero changes to actions, executor, or workflows.
  1. Implement RuntimeAdapter interface in internal/opsagent/upgrade/adapters/{runtime}.go:
    type LXCAdapter struct {
        containerName string
        // runtime-specific fields
    }
    
    // Embed runtime.Runtime for Start/Stop/Restart/Status
    func (a *LXCAdapter) AcquireArtifact(ctx context.Context, url, checksum string) (ArtifactHandle, error) { /* ... */ }
    func (a *LXCAdapter) VerifyArtifact(ctx context.Context, handle ArtifactHandle, checksum string) error { /* ... */ }
    func (a *LXCAdapter) IsArtifactInstalled(ctx context.Context, name, checksum string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) BackupState(ctx context.Context, backupDir string) error { /* ... */ }
    func (a *LXCAdapter) InstallArtifact(ctx context.Context, handle ArtifactHandle, target string) error { /* ... */ }
    func (a *LXCAdapter) WriteConfig(ctx context.Context, path string, content []byte) error { /* ... */ }
    func (a *LXCAdapter) IsConfigWritten(ctx context.Context, path string, expectedHash string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) ReloadDaemon(ctx context.Context) error { /* ... */ }
    func (a *LXCAdapter) RestoreState(ctx context.Context, backupDir string) error { /* ... */ }
    func (a *LXCAdapter) IsArtifactAcquired(ctx context.Context, url, checksum string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) IsBackedUp(ctx context.Context, backupDir string) (bool, error) { /* ... */ }
    
  2. Register in the adapter factory based on runtime.type from chain config.
  3. Configure chain profiles with runtime: { type: lxc }.
Existing adapters: Systemd (adapters/systemd.go), Docker (adapters/docker.go). Key design rules:
  • The adapter handles ALL runtime-specific logic — actions never branch on runtime type
  • AcquireArtifact returns an opaque ArtifactHandle (file path for systemd, image ref for Docker)
  • BackupState writes manifest.json last (atomic marker for IsBackedUp())
  • RestoreState must be safe to call during compensation (reverse-order rollback)
Chain-agnostic verification: Adding a new chain that uses an existing runtime (systemd/Docker) requires zero code changes. The same 8 action primitives and same executor work automatically — the adapter handles runtime specifics.
See also: Workflows — Agent Upgrade Execution for the three-layer architecture.

Adding an Upgrade Action Primitive

Package: internal/opsagent/upgrade/actions/

v1 uses compiled action composition (8 built-in actions). Adding a new action primitive:
  1. Implement UpgradeAction interface:
    type MigrateData struct{}
    
    func (a *MigrateData) Name() string { return "migrate_data" }
    func (a *MigrateData) Timeout() time.Duration { return 10 * time.Minute }
    func (a *MigrateData) Execute(ctx context.Context, rt RuntimeAdapter, p ActionParams) error { /* ... */ }
    func (a *MigrateData) IsAlreadyDone(ctx context.Context, rt RuntimeAdapter, p ActionParams) (bool, error) { /* ... */ }
    func (a *MigrateData) Compensate(ctx context.Context, rt RuntimeAdapter, p ActionParams) error { /* ... */ }
    
  2. Add to the action sequence in internal/opsagent/upgrade/sequences.go.
Existing actions: BackupCurrentState, AcquireArtifact, VerifyArtifact, StopNode, InstallArtifact, WriteConfigs, ReloadDaemon, StartNode. Key rules:
  • IsAlreadyDone() must check actual system state via the RuntimeAdapter — no shell commands
  • Compensate() must be safe to call in reverse order
  • Timeout() returns a per-action time budget enforced by the executor

Adding a New Workflow

Package: internal/workflows/
  1. Define workflow function:
    func BackupNodeWorkflow(ctx workflow.Context, req BackupRequest) error {
        err := workflow.ExecuteActivity(ctx, activities.SendStopCommand, req.NodeID).Get(ctx, nil)
        if err != nil {
            return err
        }
        // ... backup logic ...
        return workflow.ExecuteActivity(ctx, activities.StartNode, req.NodeID).Get(ctx, nil)
    }
    
  2. Register in orchestrator (cmd/orchestrator/main.go):
    w.RegisterWorkflow(workflows.BackupNodeWorkflow)
    
  3. Trigger from API or background job via temporalClient.ExecuteWorkflow().
Patterns: Activities are idempotent and retryable. Workflows survive process restarts. Use compensation for cleanup on failure.
See also: Workflows for existing workflow patterns and compensation.

Adding a New Cloud Provider

No code changes required. The Terraform module registry is self-describing.
  1. Create module directory: infrastructure/terraform/modules/node-host-{provider}/
  2. Create interface.yaml (self-describing module):
    name: node-host-aws
    description: AWS EC2 node provisioning module
    billing_model: hourly
    provisioning_timeout_minutes: 15
    
    required_env_vars:
      - name: AWS_ACCESS_KEY_ID
        description: AWS access key
    
    variables:
      host_id:
        type: string
        required: true
        category: core
        source: computed
      instance_type:
        type: string
        required: true
        category: core
        source: chain_profile
    
  3. Create Terraform files: main.tf, variables.tf, outputs.tf, cloud-init.yaml
  4. Store credentials in Vault: vault kv put secret/hoodcloud/providers/{provider} ...
  5. Update chain profiles to include the new provider in node_types.{type}.providers.
Discovery: Modules are discovered automatically via interface.yaml. Paths are derived from the naming convention {modules_dir}/node-host-{provider}, and credentials are loaded from the registry’s required_env_vars.

Adding a Notification Channel

Package: internal/incident/notifier/
  1. Implement contracts.Notifier:
    type PagerDutyChannel struct { /* config, client */ }
    
    func (c *PagerDutyChannel) Send(ctx context.Context, n contracts.Notification) error {
        // Map notification to PagerDuty event
        return nil
    }
    
    func (c *PagerDutyChannel) Name() string { return "pagerduty" }
    
  2. Register in bootstrap (internal/app/bootstrap/incident.go) alongside existing channels.
Existing channels: Slack (incoming webhook), Telegram (bot API), Email (generic HTTP API), Webhook (JSON POST with X-Webhook-Secret).
Notification types: created, escalated, resolved, auto_resolved, flapping, correlated.
Built-in features: rate limiting (per-node, per-user, global sliding window) and persistent retry via notification_outbox apply to all channels automatically.
See also: Health and Incidents — Notification Pipeline for the full dispatch architecture.

Adding a Payment Provider

Package: payment-service/internal/adapters/
  1. Create adapter package at payment-service/internal/adapters/<provider>/adapter.go
  2. Implement the Provider interface (see payment-service/internal/adapters/provider.go)
  3. Add provider config in payment-service/internal/config/config.go
  4. Register in provider factory in payment-service/cmd/payment-service/main.go
  5. Add webhook/event handler if the provider uses callbacks
Reference: Tempo adapter (payment-service/internal/adapters/tempo/) — blockchain event watcher pattern. Stripe adapter (payment-service/internal/adapters/stripe/) — webhook-based pattern.
See also: Payment Service — Multi-Provider Architecture for the provider interface and existing adapters.

Adding Wallet Types

Use case: Support a new blockchain’s wallet signatures (e.g., Cosmos, Polkadot).
  1. Add WalletType constant to internal/models/wallet.go:
    WalletTypeCosmos WalletType = "cosmos"
    
  2. Implement contracts.SignatureVerifier in internal/wallet/<chain>/verifier.go:
    type Verifier struct{}
    
    func (v *Verifier) VerifySignature(message, signature, address string) error { /* ... */ }
    func (v *Verifier) DeriveExportKey(signature string) ([]byte, error) { /* ... */ }
    func (v *Verifier) PublicKeyBytes(message, signature, address string) ([]byte, error) { /* ... */ }
    func (v *Verifier) WalletType() models.WalletType { return models.WalletTypeCosmos }
    
  3. Register in bootstrap (internal/app/bootstrap/services.go):
    return wallet.NewRegistry(
        &ethereum.Verifier{},
        &solana.Verifier{},
        &cosmos.Verifier{}, // Add here
    )
    
  4. Update address detection in internal/wallet/detect.go if auto-detection is needed.
Existing: Ethereum (secp256k1), Solana (Ed25519).
See also: CLAUDE.md — Chain-Agnostic Wallet Operations for the SignatureVerifierRegistry pattern.

Adding a Leader-Gated Component

Package: internal/health/leader.go (generic, reusable)

The leader election implementation is not health-evaluator-specific. Any future singleton component can use it:
  1. Create a dedicated advisory lock connection via internal/app/bootstrap/leader_election.go:
    leaderConn, err := bootstrap.NewLeaderElectionConn(ctx, dbConfig)
    
  2. Create leader elector with a unique lock ID:
    elector := leader.NewElector(leaderConn, leader.Config{
        LockID:        12345, // unique per component
        RetryInterval: 5 * time.Second,
    })
    
  3. Gate goroutine loops using the leader context:
    go elector.Run(ctx, func(leaderCtx context.Context) {
        // This function runs only while this instance is the leader.
        // leaderCtx is canceled when leadership is lost.
        myWorker.Run(leaderCtx)
    })
    
Key rules:
  • The advisory lock connection MUST be a dedicated pgx.Conn, not from the pool
  • Each component needs a unique lock ID to avoid conflicting with the health evaluator’s lock
  • Subsystems that are already multi-instance safe (e.g., FOR UPDATE SKIP LOCKED) should NOT be leader-gated
See also: Health and Incidents — Leader Election for the health evaluator’s implementation.

Adding New API Endpoints

  1. Add handler method in internal/api/ (e.g., handler_node.go)
  2. Register route in internal/api/router.go
  3. Add service layer logic in internal/service/
  4. Apply scope enforcement via RequireScope() middleware
Middleware stack (applied automatically): Correlation ID, Tracing, HTTP Metrics, Logging, Authentication (DualAuthMiddleware), Rate Limiting, Scope Enforcement, CORS.
Scope enforcement (internal/api/scope.go): both JWT and API key requests are subject to scope checks via RequireScope(). A missing scope returns 403 Forbidden. The admin scope is backend-granted only and is required for rollout endpoints and future admin-only operations.
See also: CLAUDE.md — Server Separation for endpoint lists and auth details. For authentication provider setup, see Clerk Setup.

Adding Tracing to New Components

Distributed tracing is automatic for HTTP, gRPC, database, and Temporal activities when OTEL_ENABLED=true. For custom spans:
    tracer := otel.Tracer("hoodcloud/myservice")
    ctx, span := tracer.Start(ctx, "ProcessSomething")
    defer span.End()
    span.SetAttributes(attribute.String("entity.id", id))
Propagate trace context to external services via otel.GetTextMapPropagator().Inject(ctx, ...).

RPC Adapters

Package: internal/adapters/rpc/

Generic JSON-RPC types and chain-specific response structures for health data collection.
| File | Purpose |
| --- | --- |
| types.go | JSONRPCResponse[T] — generic JSON-RPC 2.0 response with typed Result |
| ethereum.go | EthSyncStatus, EthBlockNumberResponse — Ethereum eth_syncing/eth_blockNumber responses |
| cosmos.go | CosmosSyncStatus, CosmosStatusResponse — Cosmos SDK /status response |
Used by the observation collector system to parse chain-specific RPC responses into normalized metrics.