
Adding a New Chain

No code changes required. HoodCloud is chain-agnostic — adding a chain requires only YAML configuration.

Required Files

| File | Purpose |
| --- | --- |
| hoodcloud-chain-configs/chains/{chain}/profile.yaml | Chain identity, node types, resources, providers, sync data, binaries |
| hoodcloud-chain-configs/chains/{chain}/runtime.yaml | Node status provider, ports, key paths |
| hoodcloud-chain-configs/recipes/{chain}/{nodeType}.yaml | Installation and sync recipe |
| hoodcloud-chain-configs/chains/{chain}/observation.yaml | (Optional) Metric collectors and CEL health policies |
| hoodcloud-chain-configs/upgrades/{chain}/ | Upgrade manifests — one YAML per binary × node_type; defines target version, artifact URL, checksum, state compatibility |
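An upgrade manifest might look like the following sketch. The field names here are illustrative assumptions, not the authoritative schema — check existing manifests under hoodcloud-chain-configs/upgrades/ for the real field set:

```yaml
# hoodcloud-chain-configs/upgrades/new-chain/new-chain-full.yaml (hypothetical sketch)
binary: new-chain
node_type: full
target_version: v1.1.0
artifact_url: "https://releases.new-chain.io/v1.1.0/new-chain-linux-amd64"
checksum: "sha256:def456..."
state_compatible: true   # no state migration required from the previous version
```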

profile.yaml

Defines chain identity, node types with resource requirements, provider mappings, sync data, and binary versions.
chain_id: new-chain-testnet-1
display_name: New Chain Testnet
network_type: testnet
chain_type: new-chain

node_types:
  full:
    resources:
      cpu: 4
      memory_gb: 16
      disk_gb: 500
      disk_type: ssd
    default_providers: [hetzner]
    providers:
      hetzner:
        instance_type: cx32
        default_region: fsn1
        regions: [fsn1, nbg1, hel1]
    sync_data:
      snapshot_url: "https://snapshots.new-chain.io/testnet/latest.tar.lz4"

binaries:
  new-chain:
    version: v1.0.0
    url_template: "https://releases.new-chain.io/{{.Version}}/new-chain-linux-amd64"
    checksum: "sha256:abc123..."

runtime.yaml

Declares health check expressions, port mappings, and key paths.
schema_version: "1.0"
chain_profile_id: new-chain-testnet

node_status:
  provider: prometheus
  healthy_expression: 'up == 1'
  synced_expression: 'syncing == 0'
  metrics_port: 9090

ports:
  rpc: 26657
  p2p: 26656
  prometheus: 9090

observation.yaml (Optional)

Defines metric collectors and CEL health policies. Enables the observation system for this chain.
schema_version: "1.0"
chain_profile_id: new-chain-testnet

collectors:
  - type: prometheus
    endpoint: "http://localhost:{{.Ports.prometheus}}/metrics"
    scrape_interval: 15s
    metrics:
      - name: block_height
        source_name: tendermint_consensus_height

policies:
  healthy:
    up:
      expression: 'up == 1'
  synced:
    not_syncing:
      expression: 'syncing == 0'

Recipe

Recipes define installation and sync logic. Sync data from profile.yaml is available via {{.SyncData.xxx}}, and when conditionals enable or disable individual steps. Key concepts:
  • Data in Profile, Logic in Recipe: profile.yaml holds sync URLs/config, recipes define what to do with them
  • when conditionals: Steps execute only when condition is truthy (non-empty string)
  • One recipe per node type: A single full.yaml handles multiple sync methods via conditionals
  • Progress monitoring: metadata.progress_monitor tracks long-running steps (snapshot downloads)
Recipe types:
  • Systemd-based: Binary download + systemd service (e.g., Cosmos chains)
  • Docker-based: docker_container step type with auto-HOME, restart policy, volume/port mapping
See existing recipes in hoodcloud-chain-configs/recipes/ for reference implementations.
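The concepts above can be sketched as a minimal recipe. Step names and fields here are illustrative assumptions — the existing recipes in hoodcloud-chain-configs/recipes/ are the authoritative reference:

```yaml
# recipes/new-chain/full.yaml (hypothetical sketch)
steps:
  - name: download_snapshot
    # Runs only when a snapshot URL is set in profile.yaml sync_data
    when: "{{.SyncData.snapshot_url}}"
    command: "aria2c -x8 {{.SyncData.snapshot_url}} -d /data"
    metadata:
      progress_monitor: download   # track the long-running download
  - name: extract_snapshot
    when: "{{.SyncData.snapshot_url}}"
    command: "lz4 -d /data/latest.tar.lz4 | tar -x -C /data"
```

A single full.yaml like this can carry several sync methods side by side; only the steps whose when condition resolves to a non-empty string execute.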

Adding a New Collector Type

Package: internal/observation/collector/
  1. Implement Collector interface:
    type CustomCollector struct { endpoint string }
    
    func NewCustomCollector(config map[string]any) (Collector, error) { /* ... */ }
    
    func (c *CustomCollector) Collect(ctx context.Context) ([]*io_prometheus_client.MetricFamily, error) {
        // Return Prometheus metric families
    }
    
  2. Register in collector/registry.go inside NewRegistry():
    r.Register("custom", NewCustomCollector)
    
  3. Use in chain observation.yaml:
    collectors:
      - type: custom
        endpoint: "http://localhost:8080/custom-metrics"
    
Existing types: prometheus, http, script, otlp

Adding a Health Event Handler

Use case: React to node state transitions or dimension changes (custom alerting, audit logging, external webhook).
  1. Implement the handler interface:
    // For state transitions (healthy -> down, provisioning -> failed, etc.)
    func (h *MyHandler) OnTransition(ctx context.Context, event *contracts.HealthTransitionEvent) error {
        // event.PreviousState, event.NewState, event.Trigger, event.Metadata
        return nil
    }
    
    // For dimension changes (application_health, sync_status)
    func (h *MyHandler) OnDimensionChange(ctx context.Context, event *contracts.HealthDimensionEvent) error {
        // event.Dimension, event.OldValue, event.NewValue
        return nil
    }
    
  2. Register in bootstrap (internal/app/bootstrap/):
    transitionHandler := &contracts.CompositeTransitionHandler{
        Handlers: []contracts.HealthTransitionHandler{
            incidentService,
            migrationHandler,
            stateLogHandler,
            myHandler, // Add here
        },
    }
    
Semantics: Handler errors are logged but don’t block other handlers. Handlers must be idempotent (outbox guarantees at-least-once delivery).
See also: Health and Incidents — Health Event Outbox for the outbox processing model.

Adding a Runtime Adapter

Package: internal/opsagent/upgrade/adapters/

Runtime adapters implement HOW upgrade actions execute on a specific runtime. Adding a new runtime (e.g., LXC, Podman, Kubernetes) requires only a new adapter — zero changes to actions, executor, or workflows.
  1. Implement RuntimeAdapter interface in internal/opsagent/upgrade/adapters/{runtime}.go:
    type LXCAdapter struct {
        containerName string
        // runtime-specific fields
    }
    
    // Embed runtime.Runtime for Start/Stop/Restart/Status
    func (a *LXCAdapter) AcquireArtifact(ctx context.Context, url, checksum string) (ArtifactHandle, error) { /* ... */ }
    func (a *LXCAdapter) VerifyArtifact(ctx context.Context, handle ArtifactHandle, checksum string) error { /* ... */ }
    func (a *LXCAdapter) IsArtifactInstalled(ctx context.Context, name, checksum string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) BackupState(ctx context.Context, backupDir string) error { /* ... */ }
    func (a *LXCAdapter) InstallArtifact(ctx context.Context, handle ArtifactHandle, target string) error { /* ... */ }
    func (a *LXCAdapter) WriteConfig(ctx context.Context, path string, content []byte) error { /* ... */ }
    func (a *LXCAdapter) IsConfigWritten(ctx context.Context, path string, expectedHash string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) ReloadDaemon(ctx context.Context) error { /* ... */ }
    func (a *LXCAdapter) RestoreState(ctx context.Context, backupDir string) error { /* ... */ }
    func (a *LXCAdapter) IsArtifactAcquired(ctx context.Context, url, checksum string) (bool, error) { /* ... */ }
    func (a *LXCAdapter) IsBackedUp(ctx context.Context, backupDir string) (bool, error) { /* ... */ }
    
  2. Register in the adapter factory based on runtime.type from chain config.
  3. Configure chain profiles with runtime: { type: lxc }.
Existing adapters: Systemd (adapters/systemd.go), Docker (adapters/docker.go). Key design rules:
  • The adapter handles ALL runtime-specific logic — actions never branch on runtime type
  • AcquireArtifact returns an opaque ArtifactHandle (file path for systemd, image ref for Docker)
  • BackupState writes manifest.json last (atomic marker for IsBackedUp())
  • RestoreState must be safe to call during compensation (reverse-order rollback)
Chain-agnostic verification: Adding a new chain that uses an existing runtime (systemd/Docker) requires zero code changes. The same 8 action primitives and same executor work automatically — the adapter handles runtime specifics.
See also: Workflows — Agent Upgrade Execution for the three-layer architecture.

Adding an Upgrade Action Primitive

Package: internal/opsagent/upgrade/actions/

v1 uses compiled action composition (8 built-in actions). Adding a new action primitive:
  1. Implement UpgradeAction interface:
    type MigrateData struct{}
    
    func (a *MigrateData) Name() string { return "migrate_data" }
    func (a *MigrateData) Timeout() time.Duration { return 10 * time.Minute }
    func (a *MigrateData) Execute(ctx context.Context, rt RuntimeAdapter, p ActionParams) error { /* ... */ }
    func (a *MigrateData) IsAlreadyDone(ctx context.Context, rt RuntimeAdapter, p ActionParams) (bool, error) { /* ... */ }
    func (a *MigrateData) Compensate(ctx context.Context, rt RuntimeAdapter, p ActionParams) error { /* ... */ }
    
  2. Add to the action sequence in internal/opsagent/upgrade/sequences.go.
Existing actions: BackupCurrentState, AcquireArtifact, VerifyArtifact, StopNode, InstallArtifact, WriteConfigs, ReloadDaemon, StartNode. Key rules:
  • IsAlreadyDone() must check actual system state via the RuntimeAdapter — no shell commands
  • Compensate() must be safe to call in reverse order
  • Timeout() returns a per-action time budget enforced by the executor

Adding a New Workflow

Package: internal/workflows/
  1. Define workflow function:
    func BackupNodeWorkflow(ctx workflow.Context, req BackupRequest) error {
        err := workflow.ExecuteActivity(ctx, activities.SendStopCommand, req.NodeID).Get(ctx, nil)
        if err != nil {
            return err
        }
        // ... backup logic ...
        return workflow.ExecuteActivity(ctx, activities.StartNode, req.NodeID).Get(ctx, nil)
    }
    
  2. Register in orchestrator (cmd/orchestrator/main.go):
    w.RegisterWorkflow(workflows.BackupNodeWorkflow)
    
  3. Trigger from API or background job via temporalClient.ExecuteWorkflow().
Patterns: Activities are idempotent and retryable. Workflows survive process restarts. Use compensation for cleanup on failure.
See also: Workflows for existing workflow patterns and compensation.

Adding a New Cloud Provider

No code changes required. The Terraform module registry is self-describing.
  1. Create module directory: infrastructure/terraform/modules/node-host-{provider}/
  2. Create interface.yaml (self-describing module):
    name: node-host-aws
    description: AWS EC2 node provisioning module
    billing_model: hourly
    provisioning_timeout_minutes: 15
    
    required_env_vars:
      - name: AWS_ACCESS_KEY_ID
        description: AWS access key
    
    variables:
      host_id:
        type: string
        required: true
        category: core
        source: computed
      instance_type:
        type: string
        required: true
        category: core
        source: chain_profile
    
  3. Create Terraform files: main.tf, variables.tf, outputs.tf, cloud-init.yaml
  4. Store credentials in Vault: vault kv put secret/hoodcloud/providers/{provider} ...
  5. Update chain profiles to include the new provider in node_types.{type}.providers.
Discovery: Modules are discovered automatically via interface.yaml. Paths are derived from the naming convention {modules_dir}/node-host-{provider}, and credentials are loaded from the registry’s required_env_vars.

Adding a Notification Channel

Package: internal/incident/notifier/
  1. Implement contracts.Notifier:
    type PagerDutyChannel struct { /* config, client */ }
    
    func (c *PagerDutyChannel) Send(ctx context.Context, n contracts.Notification) error {
        // Map notification to PagerDuty event
        return nil
    }
    
    func (c *PagerDutyChannel) Name() string { return "pagerduty" }
    
  2. Register in bootstrap (internal/app/bootstrap/incident.go) alongside existing channels.
Existing channels: Slack (incoming webhook), Telegram (bot API), Email (generic HTTP API), Webhook (JSON POST with X-Webhook-Secret).
Notification types: created, escalated, resolved, auto_resolved, flapping, correlated.
Built-in features: rate limiting (per-node, per-user, global sliding window) and persistent retry via notification_outbox apply to all channels automatically.
See also: Health and Incidents — Notification Pipeline for the full dispatch architecture.

Adding a Payment Provider

Package: payment-service/internal/adapters/
  1. Create adapter package at payment-service/internal/adapters/<provider>/adapter.go
  2. Implement the Provider interface (see payment-service/internal/adapters/provider.go)
  3. Add provider config in payment-service/internal/config/config.go
  4. Register in provider factory in payment-service/cmd/payment-service/main.go
  5. Add webhook/event handler if the provider uses callbacks
Reference: Tempo adapter (payment-service/internal/adapters/tempo/) — blockchain event watcher pattern. Stripe adapter (payment-service/internal/adapters/stripe/) — webhook-based pattern.
See also: Payment Service — Multi-Provider Architecture for the provider interface and existing adapters.

Adding Wallet Types

Use case: Support a new blockchain’s wallet signatures (e.g., Cosmos, Polkadot).
  1. Add WalletType constant to internal/models/wallet.go:
    WalletTypeCosmos WalletType = "cosmos"
    
  2. Implement contracts.SignatureVerifier in internal/wallet/<chain>/verifier.go:
    type Verifier struct{}
    
    func (v *Verifier) VerifySignature(message, signature, address string) error { /* ... */ }
    func (v *Verifier) DeriveExportKey(signature string) ([]byte, error) { /* ... */ }
    func (v *Verifier) PublicKeyBytes(message, signature, address string) ([]byte, error) { /* ... */ }
    func (v *Verifier) WalletType() models.WalletType { return models.WalletTypeCosmos }
    
  3. Register in bootstrap (internal/app/bootstrap/services.go):
    return wallet.NewRegistry(
        &ethereum.Verifier{},
        &solana.Verifier{},
        &cosmos.Verifier{}, // Add here
    )
    
  4. Update address detection in internal/wallet/detect.go if auto-detection is needed.
Existing: Ethereum (secp256k1), Solana (Ed25519).
See also: CLAUDE.md — Chain-Agnostic Wallet Operations for the SignatureVerifierRegistry pattern.

Adding a Leader-Gated Component

Package: internal/health/leader.go (generic, reusable)

The leader election implementation is not health-evaluator-specific. Any future singleton component can use it:
  1. Create a dedicated advisory lock connection via internal/app/bootstrap/leader_election.go:
    leaderConn, err := bootstrap.NewLeaderElectionConn(ctx, dbConfig)
    
  2. Create leader elector with a unique lock ID:
    elector := leader.NewElector(leaderConn, leader.Config{
        LockID:        12345, // unique per component
        RetryInterval: 5 * time.Second,
    })
    
  3. Gate goroutine loops using the leader context:
    go elector.Run(ctx, func(leaderCtx context.Context) {
        // This function runs only while this instance is the leader.
        // leaderCtx is canceled when leadership is lost.
        myWorker.Run(leaderCtx)
    })
    
Key rules:
  • The advisory lock connection MUST be a dedicated pgx.Conn, not from the pool
  • Each component needs a unique lock ID to avoid conflicting with the health evaluator’s lock
  • Subsystems that are already multi-instance safe (e.g., FOR UPDATE SKIP LOCKED) should NOT be leader-gated
See also: Health and Incidents — Leader Election for the health evaluator’s implementation.

Adding New API Endpoints

  1. Add handler method in internal/api/ (e.g., handler_node.go)
  2. Register route in internal/api/router.go
  3. Add service layer logic in internal/service/
  4. Apply scope enforcement via RequireScope() middleware
Middleware stack (applied automatically): Correlation ID, Tracing, HTTP Metrics, Logging, Authentication (DualAuthMiddleware), Rate Limiting, Scope Enforcement, CORS.
Scope enforcement (internal/api/scope.go): both JWT and API key requests are subject to scope checks via RequireScope(). A missing scope returns 403 Forbidden. The admin scope is backend-granted only and is required for rollout endpoints and future admin-only operations.
See also: CLAUDE.md — Server Separation for endpoint lists and auth details. For authentication provider setup, see Clerk Setup.

Adding Tracing to New Components

Distributed tracing is automatic for HTTP, gRPC, database, and Temporal activities when OTEL_ENABLED=true. For custom spans:
    tracer := otel.Tracer("hoodcloud/myservice")
    ctx, span := tracer.Start(ctx, "ProcessSomething")
    defer span.End()
    span.SetAttributes(attribute.String("entity.id", id))
Propagate trace context to external services via otel.GetTextMapPropagator().Inject(ctx, ...).

RPC Adapters

Package: internal/adapters/rpc/

Generic JSON-RPC types and chain-specific response structures for health data collection.
| File | Purpose |
| --- | --- |
| types.go | JSONRPCResponse[T] — generic JSON-RPC 2.0 response with typed Result |
| ethereum.go | EthSyncStatus, EthBlockNumberResponse — Ethereum eth_syncing/eth_blockNumber responses |
| cosmos.go | CosmosSyncStatus, CosmosStatusResponse — Cosmos SDK /status response |
Used by the observation collector system to parse chain-specific RPC responses into normalized metrics.