See also: Deployment & Operations for local dev and single-server staging.
From-scratch guide for deploying the full HoodCloud stack across two servers.

Server Topology

Server | Role | Services
Control Plane | Orchestration, API, Auth, Monitoring | docker-compose.yml (18+ containers)
Payment Service | Isolated payment processing | docker-compose.payment.yml (2 containers)
Communication between servers:
  • gRPC + mTLS (control plane -> payment service, port 50051)
  • NATS JetStream (payment service -> control plane, port 4223)
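Both links run over plain TCP underneath, so a quick reachability test can rule out firewall problems before you debug TLS or JWT issues. The following is a minimal sketch that assumes bash and coreutils `timeout` are installed; `check_port` is a hypothetical helper name, not something shipped in the repository.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection can be opened. It says nothing about TLS or auth.
check_port() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (placeholders as in the rest of this guide):
#   check_port <payment-ip> 50051        # run on the control plane
#   check_port nats.hoodcloud.io 4223    # run on the payment server
```

A failure here points at routing or firewall rules rather than certificates.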

Prerequisites

  • Two servers: Ubuntu 22.04+, 4 vCPU / 8GB RAM minimum each
  • DNS records pointing to control plane IP: api., auth., grafana., status. subdomains
  • AWS account with IAM user (S3, DynamoDB)
  • Hetzner Cloud API token
  • GitHub repo access

Part 1: AWS Resources

S3 Buckets

# Key backups
aws s3 mb s3://hoodcloud-prod-keys --region eu-central-1

# Terraform state (with versioning)
aws s3 mb s3://hoodcloud-terraform-state --region eu-central-1
aws s3api put-bucket-versioning \
  --bucket hoodcloud-terraform-state \
  --versioning-configuration Status=Enabled

# Chain configs
aws s3 mb s3://hoodcloud-chain-configs --region eu-central-1

# Block public access + enable encryption on all buckets
for bucket in hoodcloud-prod-keys hoodcloud-terraform-state hoodcloud-chain-configs; do
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    'BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true'
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
done
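
To confirm the hardening actually applied, the settings can be read back with the matching `get-*` calls. This is a sketch only; `verify_bucket` is a hypothetical helper, and it assumes the same AWS credentials and default region as the commands above.

```shell
# verify_bucket BUCKET
# Prints the public-access-block flags, then the default encryption algorithm.
verify_bucket() {
  aws s3api get-public-access-block --bucket "$1" \
    --query 'PublicAccessBlockConfiguration' --output text || return 1
  aws s3api get-bucket-encryption --bucket "$1" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm' \
    --output text || return 1
}

# Usage:
#   for b in hoodcloud-prod-keys hoodcloud-terraform-state hoodcloud-chain-configs; do
#     echo "== $b"; verify_bucket "$b"
#   done
```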

DynamoDB (Terraform Locks)

aws dynamodb create-table \
  --table-name hoodcloud-terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region eu-central-1

IAM Policy

envsubst < infrastructure/iam/hoodcloud-minimal-policy.json > /tmp/policy.json
aws iam create-user --user-name hoodcloud
aws iam create-policy --policy-name hoodcloud-minimal --policy-document file:///tmp/policy.json
aws iam attach-user-policy --user-name hoodcloud \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/hoodcloud-minimal
aws iam create-access-key --user-name hoodcloud
Save the access key credentials for .env configuration. See infrastructure/iam/README.md for full documentation.
GitHub OIDC for Chain Configs (optional): For automated chain config releases via GitHub Actions, set up OIDC federation. See Deployment & Operations - Release Chain Configs for the release workflow.
# Create OIDC provider
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# Create role + S3 policy (replace ACCOUNT_ID and YOUR_ORG)
# See infrastructure/iam/README.md for trust policy template

Part 1b: Vault

Vault is the secrets provider for all application secrets.
Full guide: Vault Operations covers setup, initialization, secret population, and day-to-day operations.
Quick sequence:
  1. Generate TLS certs: VAULT_SERVER_IP=<ip> ./scripts/generate-vault-certs.sh
  2. Deploy Vault: docker compose -f docker-compose.vault.yml --profile vault-prod up -d vault
  3. Initialize: vault operator init -key-shares=5 -key-threshold=3
  4. Unseal (3 of 5 keys)
  5. Run init script: ./scripts/vault-init.sh
  6. Populate secrets: ./scripts/vault-migrate-secrets.sh --manual
  7. Store Cloudflare API token: vault kv put secret/infra/certbot/cloudflare api_token=<cloudflare-api-token>
  8. Deploy AppRole credentials to control plane
  9. Revoke root token
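
Between steps 4 and 5 it helps to confirm that Vault is really unsealed before running the init script. The sketch below relies on the exit codes of `vault status` (0 unsealed, 2 sealed, 1 error). `vault_seal_state` is a hypothetical helper, and it assumes VAULT_ADDR (plus VAULT_CACERT for the self-generated CA) is already exported.

```shell
# vault_seal_state
# Prints "unsealed", "sealed", or "error" based on the exit code of `vault status`.
vault_seal_state() {
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Usage:
#   [ "$(vault_seal_state)" = "unsealed" ] && ./scripts/vault-init.sh
```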

Part 2: Control Plane Server

2.1 Server Setup

apt update && apt upgrade -y
curl -fsSL https://get.docker.com | sh
systemctl enable docker && systemctl start docker
apt install docker-compose-plugin -y

2.2 Clone Repository

git clone https://github.com/hoodrunio/hoodcloud /opt/hoodcloud
cd /opt/hoodcloud && git checkout main

2.3 Generate Certificates

# Ops-agent gRPC certs
./scripts/generate-certs.sh ./certs/prod production

# Payment mTLS certs (run on any machine with openssl)
./scripts/generate-payment-certs.sh ./certs/payment <payment-server-ip>
Deploy payment client certs to control plane:
mkdir -p /opt/hoodcloud/secrets/payment
cp certs/payment/{ca.crt,client.crt,client.key} /opt/hoodcloud/secrets/payment/
chmod 644 /opt/hoodcloud/secrets/payment/*
Note: Key files need 644 permissions because containers run as non-root users. The host directory (/opt/hoodcloud/secrets/, owned by root) provides access control.
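Since access control depends on file modes and directory ownership, a quick check after copying can catch a wrong `chmod` early. A minimal sketch, assuming GNU `stat` (as on Ubuntu); `check_mode` is a hypothetical helper name.

```shell
# check_mode FILE EXPECTED_OCTAL
# Succeeds when FILE's permission bits equal EXPECTED_OCTAL.
check_mode() {
  actual=$(stat -c '%a' "$1") || return 1
  [ "$actual" = "$2" ] || { echo "$1: mode $actual, expected $2" >&2; return 1; }
}

# Usage:
#   for f in /opt/hoodcloud/secrets/payment/*; do check_mode "$f" 644; done
#   stat -c '%U %a' /opt/hoodcloud/secrets    # owner should be root per the note above
```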

2.4 Deploy Vault Credentials

mkdir -p /opt/hoodcloud/secrets
# Deploy AppRole secret_id (from Vault server or admin machine)
vault write -field=secret_id -f auth/approle/role/hoodcloud-control-plane/secret-id | \
  ssh root@<control-plane-ip> 'cat > /opt/hoodcloud/secrets/vault-secret-id && chmod 644 /opt/hoodcloud/secrets/vault-secret-id'

# Copy Vault CA cert
scp infrastructure/docker/config/vault/tls/vault-ca.crt root@<control-plane-ip>:/opt/hoodcloud/secrets/vault-ca.crt

# Deploy certbot-nats AppRole credentials (for NATS TLS certificate renewal)
echo "<certbot-nats-role-id>" > /opt/hoodcloud/secrets/certbot-role-id
vault write -f -field=secret_id auth/approle/role/certbot-nats/secret-id > /opt/hoodcloud/secrets/certbot-secret-id
chmod 0400 /opt/hoodcloud/secrets/certbot-role-id
chmod 0400 /opt/hoodcloud/secrets/certbot-secret-id

2.5 Configure Environment

cd /opt/hoodcloud/infrastructure/docker
cp .env.production.example .env
Edit .env with production values. The file covers: AWS credentials, domain names, database, NATS, authentication (Clerk Setup), chain configs, Terraform state, Vault (Vault Operations), and payment service mTLS.
Full variable reference with defaults and descriptions: Environment Variables

2.5b NATS TLS (Let’s Encrypt)

NATS uses Let’s Encrypt TLS via certbot DNS-01 challenge (Cloudflare). The nats.hoodcloud.io Docker network alias is configured in docker-compose.yml so internal services verify the certificate using the same hostname as the LE certificate’s CN. This is why NATS_URL=tls://nats.hoodcloud.io:4222 works both internally (via Docker network alias) and externally (via DNS).
Install certbot:
apt-get install -y certbot python3-certbot-dns-cloudflare
Deploy scripts from repo to server:
mkdir -p /opt/hoodcloud/scripts
cp infrastructure/docker/scripts/certbot-renew.sh /opt/hoodcloud/scripts/
cp infrastructure/docker/scripts/nats-cert-deploy.sh /opt/hoodcloud/scripts/
chmod 0700 /opt/hoodcloud/scripts/certbot-renew.sh
chmod 0700 /opt/hoodcloud/scripts/nats-cert-deploy.sh
Issue initial certificate:
/opt/hoodcloud/scripts/certbot-renew.sh
Create deploy hook symlink:
ln -sf /opt/hoodcloud/scripts/nats-cert-deploy.sh /etc/letsencrypt/renewal-hooks/deploy/nats-reload.sh
Override certbot systemd timer to use Vault wrapper:
mkdir -p /etc/systemd/system/certbot.service.d
cat > /etc/systemd/system/certbot.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/opt/hoodcloud/scripts/certbot-renew.sh
EOF
systemctl daemon-reload
Verify:
openssl x509 -noout -dates -in /opt/hoodcloud/nats-tls/fullchain.pem

2.6 Deploy Control Plane

cd /opt/hoodcloud/infrastructure/docker

docker compose build

# Infrastructure
docker compose up -d postgres redis nats
sleep 10

# Workflow engine
docker compose up -d temporal temporal-init temporal-ui
sleep 15

# Run migrations BEFORE starting application services
docker compose run --rm migrate

# Observability
docker compose up -d victoria-metrics prometheus loki tempo otel-collector grafana alertmanager

# Application
docker compose up -d auth-server api-server agent-gateway orchestrator health-evaluator

# Reverse proxy + status page
docker compose up -d caddy gatus
Important: The cmd/migrate step must run before application services. It is idempotent and exits non-zero on failure. If migration fails, do NOT start application services — see Runbooks for recovery.
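
The fixed `sleep 10` and `sleep 15` pauses above can be too short on a slow host and needlessly long on a fast one. One alternative is to poll Docker's health status instead. This is a sketch under assumptions: `wait_healthy` is a hypothetical helper, it only works for containers that define a healthcheck, and the container names in the usage lines (for example `hoodcloud-postgres`) are guesses based on the `hoodcloud-<service>` naming used elsewhere in this guide.

```shell
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls the Docker health status once per second until it reports "healthy".
wait_healthy() {
  limit=${2:-60}
  elapsed=0
  status=""
  while [ "$elapsed" -lt "$limit" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null) || status=""
    [ "$status" = "healthy" ] && return 0
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$1 not healthy after ${limit}s (last status: ${status:-unknown})" >&2
  return 1
}

# Usage (container names are assumptions):
#   docker compose up -d postgres redis nats
#   wait_healthy hoodcloud-postgres 60 || exit 1
```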

2.7 Verify Control Plane

docker compose ps                                           # All containers running
curl -s https://api.hoodcloud.io/health | jq .             # {"status":"healthy"}
curl -s https://auth.hoodcloud.io/health | jq .            # {"status":"healthy"}

# Payment client connected
docker logs hoodcloud-api-server 2>&1 | grep "Payment service client initialized"
# Expected: "Payment service client initialized" address=<ip>:50051

# Vault integration
docker compose logs api-server | grep "Vault secrets provider initialized"

Part 3: Payment Service Server

3.1 Server Setup

apt update && apt upgrade -y
curl -fsSL https://get.docker.com | sh
systemctl enable docker && systemctl start docker
apt install docker-compose-plugin -y

3.2 Clone & Deploy Certs

git clone https://github.com/hoodrunio/hoodcloud /opt/hoodcloud
cd /opt/hoodcloud && git checkout main

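# On the payment server: create the cert directory
mkdir -p /opt/hoodcloud/payment-certs
# On the machine where generate-payment-certs.sh ran: copy the server certs
scp certs/payment/{ca.crt,server.crt,server.key} root@<payment-ip>:/opt/hoodcloud/payment-certs/
# Back on the payment server: make the certs readable by the non-root container user
chmod 644 /opt/hoodcloud/payment-certs/*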

3.3 Configure Environment

cd /opt/hoodcloud/infrastructure/docker
cat > .env << 'EOF'
# Database
PAYMENT_DB_USER=payment
PAYMENT_DB_PASSWORD=GENERATE_STRONG_PASSWORD
PAYMENT_DB_NAME=payment
DB_SSL_MODE=require

# NATS (connect to control plane — JWT operator mode)
NATS_URL=tls://nats.hoodcloud.io:4223
NATS_STREAM_NAME=PAYMENTS
NATS_CTRL_ACCOUNT_PUB=<CTRL account public key from nats-jwt-setup>
# NATS_CTRL_SIGNING_SEED loaded from Vault (nats_ctrl_signing_seed)

# mTLS
TLS_INSECURE=false
TLS_CERT_FILE=/certs/server.crt
TLS_KEY_FILE=/certs/server.key
TLS_CA_FILE=/certs/ca.crt
TLS_ALLOWED_CN=main-app.hoodcloud.internal

# Stripe (secrets from Vault at secret/payment-service/credentials)
STRIPE_ENABLED=true

# Domain (for Caddy reverse proxy)
DOMAIN_PAY=pay.hoodcloud.io
EOF
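
The template above still contains values that must be replaced before deploying (GENERATE_STRONG_PASSWORD and the `<...>` entry). The sketch below shows one way to generate a password and to detect leftover placeholders; `env_has_placeholders` is a hypothetical helper, not part of the repository.

```shell
# Generate a 64-character hex password (no characters that need quoting in .env):
#   openssl rand -hex 32

# env_has_placeholders FILE
# Succeeds (exit 0) if a non-comment line in FILE still contains a template
# value such as GENERATE_STRONG_PASSWORD or <...>.
env_has_placeholders() {
  grep -Ev '^[[:space:]]*#' "$1" | grep -Eq 'GENERATE_STRONG_PASSWORD|<[^>]+>'
}

# Usage:
#   env_has_placeholders .env && echo "fill in the remaining placeholders before deploying"
```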

3.4 Deploy

docker compose -f docker-compose.payment.yml up -d --build

3.5 Verify

docker compose -f docker-compose.payment.yml ps
curl -s http://localhost:8085/health

# mTLS active
docker logs hoodcloud-payment-service 2>&1 | grep -E "mTLS|gRPC"
# Expected: "gRPC server configured with mTLS"

# NATS connected
docker logs hoodcloud-payment-service 2>&1 | grep "Connected to NATS"

Part 4: End-to-End Verification

mTLS Handshake

From the control plane server:
echo | openssl s_client \
  -connect <payment-ip>:50051 \
  -servername payment-service.hoodcloud.internal \
  -CAfile /opt/hoodcloud/secrets/payment/ca.crt \
  -cert /opt/hoodcloud/secrets/payment/client.crt \
  -key /opt/hoodcloud/secrets/payment/client.key \
  2>&1 | grep -E "Verify|TLS|Cipher"
# Expected: TLSv1.3, Verify return code: 0 (ok)

Service Health Matrix

# Control Plane
for svc in api-server auth-server agent-gateway orchestrator health-evaluator; do
  echo -n "$svc: "
  # Health status if a healthcheck is defined, otherwise the container state
  docker inspect "hoodcloud-$svc" \
    --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
    2>/dev/null || echo "not found"
done

# Payment Service (on payment server)
for svc in payment-service postgres-payment; do
  echo -n "$svc: "
  docker inspect "hoodcloud-$svc" \
    --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
    2>/dev/null || echo "not found"
done
Pre-deploy checklist: Deployment Checklist

Port Reference

Control Plane

Port | Bind | Service | Access
80, 443 | 0.0.0.0 | Caddy | Public
9090 | 0.0.0.0 | Agent Gateway (gRPC) | Public (mTLS)
4223 | 0.0.0.0 | NATS (external) | Payment service
8080 | 127.0.0.1 | API Server | Internal
8082 | 127.0.0.1 | Auth Server | Internal
5432 | 127.0.0.1 | PostgreSQL | Internal
6379 | 127.0.0.1 | Redis | Internal
7233 | 127.0.0.1 | Temporal | Internal
8088 | 127.0.0.1 | Temporal UI | Internal
4222 | 127.0.0.1 | NATS (internal, TLS) | Internal
3000 | 127.0.0.1 | Grafana | Internal (via Caddy)
8428 | 127.0.0.1 | Victoria Metrics | Internal
9091 | 127.0.0.1 | Prometheus | Internal
3100 | 127.0.0.1 | Loki | Internal
8081 | 127.0.0.1 | Gatus | Internal (via Caddy)

Payment Service

Port | Bind | Service | Access
50051 | 0.0.0.0 | gRPC Server (mTLS) | Control plane only
8085 | 127.0.0.1 | HTTP health check | Internal
5433 | 127.0.0.1 | PostgreSQL (payment) | Internal

Stripe Setup

1. Configure Stripe Dashboard

Create a webhook endpoint in Stripe:
  • URL: https://pay.hoodcloud.io/webhooks/stripe
  • Events: checkout.session.completed, checkout.session.async_payment_succeeded, checkout.session.async_payment_failed, checkout.session.expired

2. Store Secrets in Vault

./scripts/vault-migrate-secrets.sh --manual
# Prompts for: stripe_secret_key, stripe_webhook_secret
# Stored at: secret/payment-service/credentials

3. Enable and Restart

# In payment .env: STRIPE_ENABLED=true
docker compose -f docker-compose.payment.yml up -d --force-recreate payment-service
docker logs hoodcloud-payment-service 2>&1 | grep -i stripe

Operations

Update Deployment

# On both servers
cd /opt/hoodcloud && git pull origin main

# Control plane
cd infrastructure/docker
docker compose build
docker compose run --rm migrate          # Run migrations first
docker compose up -d api-server           # or any changed service

# Payment service
docker compose -f docker-compose.payment.yml up -d --build payment-service
Important: Always run cmd/migrate before restarting services after a code update. Migrations are idempotent.
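
When the update is scripted, the ordering rule can be enforced by stopping as soon as the migration exits non-zero. A minimal sketch; `update_service` is a hypothetical wrapper and assumes it runs from /opt/hoodcloud/infrastructure/docker on the control plane.

```shell
# update_service SERVICE...
# Builds, migrates, and only then recreates the named services.
# Returns at the first failing step, so a failed migration never reaches `up`.
update_service() {
  docker compose build || return 1
  docker compose run --rm migrate || {
    echo "migration failed; application services left untouched" >&2
    return 1
  }
  docker compose up -d "$@"
}

# Usage:
#   update_service api-server orchestrator
```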

NATS TLS Certificate Renewal

NATS TLS certificates (Let’s Encrypt, 90-day validity) renew automatically via the certbot systemd timer, which is overridden to use the Vault-backed wrapper script.
Automatic renewal: The certbot systemd timer runs twice daily and calls /opt/hoodcloud/scripts/certbot-renew.sh. The wrapper authenticates to Vault via AppRole, fetches the Cloudflare API token, and runs certbot renew. If a renewal occurs, the deploy hook copies certs and sends SIGHUP to NATS (zero-downtime reload).
Manual force renewal (testing):
/opt/hoodcloud/scripts/certbot-renew.sh --force-renewal
Verify cert dates:
openssl x509 -noout -dates -in /opt/hoodcloud/nats-tls/fullchain.pem
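
To make the wrapper's flow concrete, here is an illustrative sketch of the three steps described above (AppRole login, token fetch, renew). It is not the contents of certbot-renew.sh, which may differ. `renew_with_vault` and its arguments are hypothetical, VAULT_ADDR (and VAULT_CACERT if needed) are assumed to be exported, and the credentials file path is assumed to be the one recorded in certbot's renewal configuration when the certificate was first issued.

```shell
# renew_with_vault ROLE_ID_FILE SECRET_ID_FILE CREDS_FILE [certbot args...]
renew_with_vault() {
  role_file=$1; secret_file=$2; creds=$3; shift 3

  # 1. AppRole login: exchange role_id + secret_id for a Vault token.
  VAULT_TOKEN=$(vault write -field=token auth/approle/login \
    role_id="$(cat "$role_file")" secret_id="$(cat "$secret_file")") || return 1
  export VAULT_TOKEN

  # 2. Fetch the Cloudflare token from the path populated in Part 1b, step 7.
  cf_token=$(vault kv get -field=api_token secret/infra/certbot/cloudflare) || return 1

  # 3. Write the credentials file read by certbot's Cloudflare plugin, owner-only.
  ( umask 077; printf 'dns_cloudflare_api_token = %s\n' "$cf_token" > "$creds" )

  # 4. Renew; extra arguments such as --force-renewal are passed through.
  rc=0
  certbot renew "$@" || rc=$?

  # 5. Remove the token from disk regardless of the outcome.
  rm -f "$creds"
  unset VAULT_TOKEN
  return "$rc"
}
```

Writing the token to disk only for the duration of the run keeps the Cloudflare credential in Vault the rest of the time, which appears to be the point of overriding the stock certbot service.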

Payment mTLS Certificate Renewal

Payment mTLS certificates expire after 90 days:
# Generate new certs
./scripts/generate-payment-certs.sh ./certs/payment <payment-ip>

# Deploy server certs to payment service
scp certs/payment/{ca.crt,server.crt,server.key} root@<payment-ip>:/opt/hoodcloud/payment-certs/
ssh root@<payment-ip> 'chmod 644 /opt/hoodcloud/payment-certs/* && docker restart hoodcloud-payment-service'

# Deploy client certs to control plane
scp certs/payment/{ca.crt,client.crt,client.key} root@<control-plane-ip>:/opt/hoodcloud/secrets/payment/
ssh root@<control-plane-ip> 'chmod 644 /opt/hoodcloud/secrets/payment/* && cd /opt/hoodcloud/infrastructure/docker && docker compose restart api-server'
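
Unlike the NATS certificates, these are rotated by hand, so it is worth checking the remaining lifetime periodically. A sketch using `openssl x509 -checkend`; `cert_valid_for_days` is a hypothetical helper and the 14-day threshold in the usage line is an arbitrary example.

```shell
# cert_valid_for_days CERT_FILE DAYS
# Succeeds if the certificate will still be valid DAYS days from now.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null
}

# Usage (on the payment server):
#   cert_valid_for_days /opt/hoodcloud/payment-certs/server.crt 14 \
#     || echo "payment mTLS certs expire within 14 days; regenerate and redeploy"
```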

Troubleshooting

Payment service: “permission denied”

chmod 644 /opt/hoodcloud/payment-certs/*       # payment server
chmod 644 /opt/hoodcloud/secrets/payment/*      # control plane

Payment service: NATS “authorization violation”

Verify NATS JWT credentials are configured correctly:
  1. NATS_CTRL_ACCOUNT_PUB must match the CTRL account public key from nats-jwt-setup
  2. The signing seed in Vault (nats_ctrl_signing_seed) must be valid
  3. NATS server must be running in JWT operator mode with the correct operator/account JWTs

mTLS handshake fails

Compare CA fingerprints on both servers:
openssl x509 -fingerprint -noout -in <path-to-ca.crt>
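
The comparison can be done in one step from the control plane. A sketch, assuming root SSH access to the payment server as used elsewhere in this guide; `ca_fingerprint` is a hypothetical helper name.

```shell
# ca_fingerprint CERT_FILE
# Prints the SHA-256 fingerprint without the leading "... Fingerprint=" label.
ca_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}

# Usage (run on the control plane):
#   local_fp=$(ca_fingerprint /opt/hoodcloud/secrets/payment/ca.crt)
#   remote_fp=$(ssh root@<payment-ip> \
#     "openssl x509 -noout -fingerprint -sha256 -in /opt/hoodcloud/payment-certs/ca.crt" | cut -d= -f2)
#   [ "$local_fp" = "$remote_fp" ] && echo "CA matches" || echo "CA mismatch: redeploy certs"
```

A mismatch usually means one side still holds certificates from an earlier run of generate-payment-certs.sh.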

Temporal workflows stuck

docker compose logs temporal | tail -20
# SSH tunnel to Temporal UI:
ssh -L 8088:localhost:8088 root@<control-plane-ip>
For Vault-related issues, see Vault Troubleshooting.