See also: Deployment & Operations for local dev and single-server staging.
From-scratch guide for deploying the full HoodCloud stack across two servers.
Server Topology
| Server | Role | Services |
|---|---|---|
| Control Plane | Orchestration, API, Auth, Monitoring | docker-compose.yml (18+ containers) |
| Payment Service | Isolated payment processing | docker-compose.payment.yml (2 containers) |
- gRPC + mTLS (control plane -> payment service, port 50051)
- NATS JetStream (payment service -> control plane, port 4223)
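The two channels above can be smoke-tested from each side with a plain TCP check. This is a minimal sketch, assuming bash and coreutils `timeout` are available; `port_open`, `PAYMENT_HOST`, and `CONTROL_PLANE_HOST` are illustrative names, not part of the repository. A successful connect only shows the port is reachable, not that mTLS or NATS authentication is configured correctly.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example calls (placeholder hostnames, not values from this guide):
#   port_open "$PAYMENT_HOST" 50051        # run on the control plane (gRPC + mTLS)
#   port_open "$CONTROL_PLANE_HOST" 4223   # run on the payment server (NATS)
```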
Prerequisites
- Two servers: Ubuntu 22.04+, 4 vCPU / 8GB RAM minimum each
- DNS records pointing to control plane IP: api., auth., grafana., status. subdomains
- AWS account with IAM user (S3, DynamoDB)
- Hetzner Cloud API token
- GitHub repo access
Part 1: AWS Resources
S3 Buckets
DynamoDB (Terraform Locks)
IAM Policy
.env configuration. See infrastructure/iam/README.md for full documentation.
GitHub OIDC for Chain Configs (optional):
For automated chain config releases via GitHub Actions, set up OIDC federation. See Deployment & Operations - Release Chain Configs for the release workflow.
Part 1b: Vault
Vault is the secrets provider for all application secrets.
Full guide: Vault Operations covers setup, initialization, secret population, and day-to-day operations.
Quick sequence:
- Generate TLS certs: VAULT_SERVER_IP=<ip> ./scripts/generate-vault-certs.sh
- Deploy Vault: docker compose -f docker-compose.vault.yml --profile vault-prod up -d vault
- Initialize: vault operator init -key-shares=5 -key-threshold=3
- Unseal (3 of 5 keys)
- Run init script: ./scripts/vault-init.sh
- Populate secrets: ./scripts/vault-migrate-secrets.sh --manual
- Store Cloudflare API token: vault kv put secret/infra/certbot/cloudflare api_token=<cloudflare-api-token>
- Deploy AppRole credentials to control plane
- Revoke root token
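Because Vault is initialized with a key threshold of 3, the unseal step has to be repeated with three different key shares. A minimal sketch of that loop, assuming the `vault` CLI is on the PATH and `VAULT_ADDR` (plus any CA settings for the generated TLS certs) is already exported; `unseal_vault` is an illustrative helper, not a script shipped in the repository.

```shell
# Call `vault operator unseal` once per required key share.
# With no key argument the CLI prompts for the share, which keeps
# unseal keys out of shell history.
unseal_vault() {
  local threshold="${1:-3}" i=1
  while [ "$i" -le "$threshold" ]; do
    vault operator unseal || return 1
    i=$((i + 1))
  done
}

# Example: unseal_vault 3 && vault status
```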
Part 2: Control Plane Server
2.1 Server Setup
2.2 Clone Repository
2.3 Generate Certificates
Note: Key files need 644 permissions because containers run as non-root users. The host directory (/opt/hoodcloud/secrets/, owned by root) provides access control.
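The permission requirement in the note can be applied and checked with a small helper. This is a sketch only: it assumes key files end in `.key` and that GNU `stat` is available (as on Ubuntu); `set_key_perms` is a hypothetical name. It does not change ownership of the directory.

```shell
# Set mode 644 on every *.key file directly inside the given directory,
# then print mode, owner, and path so the result can be reviewed.
set_key_perms() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name '*.key' -exec chmod 644 {} +
  find "$dir" -maxdepth 1 -type f -name '*.key' -exec stat -c '%a %U %n' {} +
}

# Example: set_key_perms /opt/hoodcloud/secrets
```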
2.4 Deploy Vault Credentials
2.5 Configure Environment
Fill in .env with production values. The file covers: AWS credentials, domain names, database, NATS, authentication (Clerk Setup), chain configs, Terraform state, Vault (Vault Operations), and payment service mTLS.
Full variable reference with defaults and descriptions: Environment Variables
2.5b NATS TLS (Let’s Encrypt)
NATS uses Let’s Encrypt TLS via certbot DNS-01 challenge (Cloudflare). The nats.hoodcloud.io Docker network alias is configured in docker-compose.yml so internal services verify the certificate using the same hostname as the Let’s Encrypt certificate’s CN. This is why NATS_URL=tls://nats.hoodcloud.io:4222 works both internally (via Docker network alias) and externally (via DNS).
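The hostname match described above can be checked offline against the issued certificate. A minimal sketch, assuming `openssl` is installed; both function names are illustrative, and the certificate path in the example follows certbot's default layout rather than anything stated in this guide.

```shell
# Extract the hostname from a NATS URL such as tls://host:4222.
nats_url_host() {
  local url="$1"
  url="${url#*://}"            # drop the scheme
  url="${url%%/*}"             # drop any path
  printf '%s\n' "${url%%:*}"   # drop the port
}

# Return 0 if the certificate file $1 is valid for hostname $2.
# `openssl x509 -checkhost` prints "does match" or "does NOT match".
cert_covers_host() {
  openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match certificate'
}

# Example (assumed path):
#   cert_covers_host /etc/letsencrypt/live/nats.hoodcloud.io/fullchain.pem \
#     "$(nats_url_host "$NATS_URL")"
```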
Install certbot:
2.6 Deploy Control Plane
Important: The cmd/migrate step must run before application services. It is idempotent and exits non-zero on failure. If migration fails, do NOT start application services — see Runbooks for recovery.
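Since the migration exits non-zero on failure, the ordering rule can be enforced mechanically rather than by convention. A minimal sketch of such a gate; `migrate_then_start` and the compose commands in the example are assumptions for illustration, not the repository's actual deploy commands.

```shell
# Run the migration command; start services only if it exits 0.
migrate_then_start() {
  local migrate_cmd="$1" start_cmd="$2"
  if ! sh -c "$migrate_cmd"; then
    echo "migration failed: application services not started" >&2
    return 1
  fi
  sh -c "$start_cmd"
}

# Example (hypothetical service name "migrate"):
#   migrate_then_start "docker compose run --rm migrate" "docker compose up -d"
```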
2.7 Verify Control Plane
Part 3: Payment Service Server
3.1 Server Setup
3.2 Clone & Deploy Certs
3.3 Configure Environment
3.4 Deploy
3.5 Verify
Part 4: End-to-End Verification
mTLS Handshake
From the control plane server:
Service Health Matrix
Pre-deploy checklist: Deployment Checklist
Port Reference
Control Plane
| Port | Bind | Service | Access |
|---|---|---|---|
| 80, 443 | 0.0.0.0 | Caddy | Public |
| 9090 | 0.0.0.0 | Agent Gateway (gRPC) | Public (mTLS) |
| 4223 | 0.0.0.0 | NATS (external) | Payment service |
| 8080 | 127.0.0.1 | API Server | Internal |
| 8082 | 127.0.0.1 | Auth Server | Internal |
| 5432 | 127.0.0.1 | PostgreSQL | Internal |
| 6379 | 127.0.0.1 | Redis | Internal |
| 7233 | 127.0.0.1 | Temporal | Internal |
| 8088 | 127.0.0.1 | Temporal UI | Internal |
| 4222 | 127.0.0.1 | NATS (internal, TLS) | Internal |
| 3000 | 127.0.0.1 | Grafana | Internal (via Caddy) |
| 8428 | 127.0.0.1 | Victoria Metrics | Internal |
| 9091 | 127.0.0.1 | Prometheus | Internal |
| 3100 | 127.0.0.1 | Loki | Internal |
| 8081 | 127.0.0.1 | Gatus | Internal (via Caddy) |
Payment Service
| Port | Bind | Service | Access |
|---|---|---|---|
| 50051 | 0.0.0.0 | gRPC Server (mTLS) | Control plane only |
| 8085 | 127.0.0.1 | HTTP health check | Internal |
| 5433 | 127.0.0.1 | PostgreSQL (payment) | Internal |
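The Bind column in the tables above can be audited against what a server is actually listening on. The sketch below reads `ss -ltnH` output on stdin and reports any wildcard listener whose port is not in an allowlist; `find_unexpected_public_binds` is an illustrative name, and the allowlists in the example are taken from the 0.0.0.0 rows of the tables.

```shell
# Read `ss -ltnH` output on stdin; $1 is a space-separated list of ports
# that are allowed to listen on all interfaces. Exits 1 if any other
# wildcard listener is found.
find_unexpected_public_binds() {
  awk -v allow="$1" '
    BEGIN { n = split(allow, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    {
      addr = $4                                  # local address:port column
      port = addr; sub(/.*:/, "", port)          # text after the last colon
      host = addr; sub(/:[^:]*$/, "", host)      # text before the last colon
      if ((host == "0.0.0.0" || host == "*" || host == "[::]") && !(port in ok)) {
        print "unexpected public bind: " addr
        bad = 1
      }
    }
    END { exit bad + 0 }
  '
}

# Example:
#   ss -ltnH | find_unexpected_public_binds "80 443 9090 4223"   # control plane
#   ss -ltnH | find_unexpected_public_binds "50051"              # payment service
```

A bind address alone does not enforce the Access column for ports 4223 and 50051; restricting those to the peer server is a firewall concern.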
Stripe Setup
1. Configure Stripe Dashboard
Create a webhook endpoint in Stripe:
- URL: https://pay.hoodcloud.io/webhooks/stripe
- Events: checkout.session.completed, checkout.session.async_payment_succeeded, checkout.session.async_payment_failed, checkout.session.expired
2. Store Secrets in Vault
3. Enable and Restart
Operations
Update Deployment
Important: Always run cmd/migrate before restarting services after a code update. Migrations are idempotent.
NATS TLS Certificate Renewal
NATS TLS certificates (Let’s Encrypt, 90-day validity) renew automatically via the certbot systemd timer, which is overridden to use the Vault-backed wrapper script.
Automatic renewal: The certbot systemd timer runs twice daily and calls /opt/hoodcloud/scripts/certbot-renew.sh. The wrapper authenticates to Vault via AppRole, fetches the Cloudflare API token, and runs certbot renew. If a renewal occurs, the deploy hook copies certs and sends SIGHUP to NATS (zero-downtime reload).
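The wrapper flow described above can be sketched as follows. This is not the contents of certbot-renew.sh: the function names, the `ROLE_ID` and `SECRET_ID` variables, and the assumption that `certbot renew` accepts the credentials path as an override are all illustrative. The Vault path is the one used when the token was stored.

```shell
# Write the Cloudflare token in the format certbot's dns-cloudflare
# plugin reads, in a file only the owner can access.
write_cloudflare_credentials() {
  local token="$1" path="$2"
  ( umask 077; printf 'dns_cloudflare_api_token = %s\n' "$token" > "$path" )
}

# AppRole login -> fetch token -> certbot renew -> clean up.
# ROLE_ID and SECRET_ID are assumed to be supplied by the caller.
renew_nats_cert() {
  local creds rc
  creds="$(mktemp)" || return 1
  VAULT_TOKEN="$(vault write -field=token auth/approle/login \
    role_id="$ROLE_ID" secret_id="$SECRET_ID")" || { rm -f "$creds"; return 1; }
  export VAULT_TOKEN
  write_cloudflare_credentials \
    "$(vault kv get -field=api_token secret/infra/certbot/cloudflare)" "$creds"
  certbot renew --dns-cloudflare-credentials "$creds"
  rc=$?
  rm -f "$creds"   # do not leave the token on disk
  return "$rc"
}

# Deploy-hook side: ask the NATS container to reload its config and certs.
reload_nats() {
  docker kill --signal=HUP "$1"   # $1: NATS container name (assumed)
}
```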
Manual force renewal (testing):
Payment mTLS Certificate Renewal
Payment mTLS certificates expire after 90 days:
Troubleshooting
Payment service: “permission denied”
Payment service: NATS “authorization violation”
Verify NATS JWT credentials are configured correctly:
- NATS_CTRL_ACCOUNT_PUB must match the CTRL account public key from nats-jwt-setup
- The signing seed in Vault (nats_ctrl_signing_seed) must be valid
- NATS server must be running in JWT operator mode with the correct operator/account JWTs
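Before digging into server configuration, a quick format check can catch truncated or mis-pasted values. The sketch below relies on the NKEYS encoding used by NATS (account public keys are 56 base32 characters starting with `A`; account seeds are 58 characters starting with `SA`) and assumes the signing seed is an account-type key. The function names are illustrative, and passing this check does not prove the key belongs to the CTRL account.

```shell
# Return 0 if $1 has the shape of an NKEYS account public key.
is_account_public_key() {
  printf '%s' "$1" | grep -Eq '^A[A-Z2-7]{55}$'
}

# Return 0 if $1 has the shape of an NKEYS account seed.
is_account_seed() {
  printf '%s' "$1" | grep -Eq '^SA[A-Z2-7]{56}$'
}

# Example:
#   is_account_public_key "$NATS_CTRL_ACCOUNT_PUB" || echo "malformed account key" >&2
```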