HoodCloud uses HashiCorp Vault for secrets management.

Overview

Vault provides:
  • KV v2 Engine: Application secrets (DB passwords, API keys, JWT RSA keys)
  • Transit Engine: Encrypts data encryption keys (DEKs) without exposing the master key
  • AppRole Auth: Machine-to-machine authentication
  • Audit Logging: Complete access audit trail

Architecture

Local Development

Running docker compose -f docker-compose.dev.yml up sets up Vault automatically; no manual setup is needed. The dev compose file includes:
  1. vault — Dev mode (in-memory, no TLS, token: dev-token, http://localhost:8200)
  2. vault-init — Runs scripts/vault-dev-init.sh to seed engines, policies, AppRole, keys
Shared volume pattern: the vault-init sidecar writes the AppRole credentials to /vault/config/. Services mount this volume read-only and load the credentials through an entrypoint wrapper:
entrypoint: ["/bin/sh", "-c", "export VAULT_ROLE_ID=$(cat /vault/config/role-id) && exec ./service-name"]
E2E tests (tests/e2e/docker-compose.e2e.yml) use the same init script, with Vault on port 8201.

Production Setup

Prerequisites

  • Docker and Docker Compose v2+
  • OpenSSL (TLS cert generation)
  • jq (migration script)

1. Generate TLS Certificates

VAULT_SERVER_IP=<vault-server-ip> ./scripts/generate-vault-certs.sh
The script creates the following files in infrastructure/docker/config/vault/tls/:
  • vault-ca.crt — CA certificate (distribute to clients)
  • vault-ca.key — CA private key
  • vault.crt — Vault server certificate (includes IP in SAN)
  • vault.key — Vault server private key

2. Prepare Data Volumes

cd infrastructure/docker
docker network create hoodcloud 2>/dev/null || true
docker run --rm \
    -v hoodcloud-vault-data:/vault/data \
    -v hoodcloud-vault-logs:/vault/logs \
    hashicorp/vault:1.15 chown -R vault:vault /vault/data /vault/logs

3. Deploy Vault

# Default: bind to 127.0.0.1:8200 (local only)
docker compose -f docker-compose.vault.yml --profile vault-prod up -d vault

# Remote access (for separate control plane server):
VAULT_BIND_ADDR=0.0.0.0 docker compose -f docker-compose.vault.yml --profile vault-prod up -d vault
To keep remote access across redeploys, persist VAULT_BIND_ADDR=0.0.0.0 in .env.
Note: The container command must be plain server, with no -config= flag. The Docker entrypoint already adds -config=/vault/config; passing -config explicitly as well makes Vault fail with “address already in use”.

4. Configure Firewall

ufw allow ssh
ufw allow from <control-plane-ip> to any port 8200 comment "Control plane -> Vault"
ufw allow from <admin-ip> to any port 8200 comment "Admin -> Vault"
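ufw enable
Note: Ports published by Docker are forwarded by Docker's own iptables rules and are generally not filtered by ufw. If Vault is bound to 0.0.0.0, confirm from a host that is not allowlisted that port 8200 is actually blocked, and add restrictions to the DOCKER-USER chain if it is not.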

5. Initialize Vault

docker exec -it hoodcloud-vault vault operator init -key-shares=5 -key-threshold=3
Save the output securely: it contains the 5 unseal keys and the initial root token.
# Unseal (3 of 5 keys)
docker exec -it hoodcloud-vault vault operator unseal <key-1>
docker exec -it hoodcloud-vault vault operator unseal <key-2>
docker exec -it hoodcloud-vault vault operator unseal <key-3>

6. Run Initialization Script

export VAULT_ADDR=https://<vault-server-ip>:8200
export VAULT_CACERT=infrastructure/docker/config/vault/tls/vault-ca.crt
export VAULT_TOKEN=<root-token>

./scripts/vault-init.sh
The script creates:
  • KV v2 engine at secret/ and Transit engine with the hoodcloud-master key
  • AppRole roles: hoodcloud-control-plane, hoodcloud-auth-server, hoodcloud-payment-service, hoodcloud-admin
  • Policies for each role
  • Audit logging at /vault/logs/audit.log
  • Admin AppRole credentials (printed at the end — save these)

7. Revoke Root Token

vault token revoke <root-token>
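The admin AppRole replaces the root token for day-to-day management. To regenerate a root token in an emergency (requires a quorum of unseal keys):
vault operator generate-root -init  # prints a nonce and a one-time password (OTP)
vault operator generate-root  # run once per key holder; provide unseal keys
vault operator generate-root -decode=<encoded-token> -otp=<otp>  # decode the final encoded token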

8. Generate JWT RS256 Keys

Generate the RSA keypair for JWT signing:
openssl genrsa -out jwt_private.pem 2048
openssl rsa -in jwt_private.pem -pubout -out jwt_public.pem

9. Generate X25519 Sealed Box Keypair

Generate the NaCl sealed box keypair used to encrypt user-provided secrets. These keys are mandatory: api-server and orchestrator fail fast at startup without them.
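// Use crypto.GenerateSealedBoxKeyPair() from internal/crypto/sealedbox.go, or generate directly with
// golang.org/x/crypto/nacl/box (also imports crypto/rand, encoding/base64, fmt, log):
pub, priv, err := box.GenerateKey(rand.Reader)
if err != nil {
	log.Fatal(err)
}
fmt.Println("sealed_box_public_key=" + base64.StdEncoding.EncodeToString(pub[:]))
fmt.Println("sealed_box_private_key=" + base64.StdEncoding.EncodeToString(priv[:]))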
Note: The two values must form a real X25519 keypair (the public key derived from the private key); two independent random 32-byte values will not work. For local dev, vault-dev-init.sh seeds them automatically.

10. Populate Secrets

vault kv put secret/hoodcloud/app-credentials \
    db_password="GENERATE_STRONG_PASSWORD" \
    redis_password="GENERATE_STRONG_PASSWORD" \
    jwt_private_key=@jwt_private.pem \
    jwt_public_key=@jwt_public.pem \
    nats_operator_signing_seed="<operator signing seed from nats-jwt-setup>" \
    nats_agent_account_signing_seed="<AGENT account signing seed from nats-jwt-setup>" \
    nats_ctrl_account_signing_seed="<CTRL account signing seed from nats-jwt-setup>" \
    nats_agent_account_pub="<AGENT account public key from nats-jwt-setup>" \
    nats_ctrl_account_pub="<CTRL account public key from nats-jwt-setup>" \
    grpc_config_signing_key="base64-encoded-ecdsa-key" \
    clerk_secret_key="sk_live_..." \
    clerk_webhook_signing_secret="whsec_..." \
    sealed_box_public_key="<base64-X25519-public-key>" \
    sealed_box_private_key="<base64-X25519-private-key>" \
    incident_slack_webhook_url="https://hooks.slack.com/services/T.../B.../..." \
    incident_telegram_bot_token="123456:ABC-..." \
    incident_telegram_chat_id="-100..." \
    incident_email_api_url="https://api.sendgrid.com/v3/mail/send" \
    incident_email_api_key="SG.xxxx" \
    incident_email_from="<sender-email-address>" \
    incident_email_to="<recipient-email-address>" \
    incident_webhook_url="https://hooks.example.com/incidents" \
    incident_webhook_secret="whsec_..." \
    terraform_env_vars='{"HCLOUD_TOKEN":"your-token","OVH_ENDPOINT":"ovh-eu"}' \
    provider_env_vars='{"ovh_subsidiary":"EU"}'

vault kv put secret/hoodcloud/master-key key="$(openssl rand -base64 32)"

# Cloudflare API token for NATS TLS certificate renewal (DNS-01 challenge)
# Token must have Zone:DNS:Edit + Zone:Zone:Read permissions
vault kv put secret/infra/certbot/cloudflare api_token="<cloudflare-api-token>"
Important: terraform_env_vars and provider_env_vars must be JSON objects, not JSON strings. The application deserializes them as map[string]string.
Incident notification fields are optional: if none are configured, incidents are still tracked in the database, but no external notifications are sent.
Alternatively, populate the secrets with the interactive migration script:
./scripts/vault-migrate-secrets.sh --manual
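The object-versus-string distinction shows up directly when decoding. A Go sketch, assuming the fields are unmarshaled from the secret's data (the .data.data level of a KV v2 read) into map[string]string as described above; the struct and function names are hypothetical, not the application's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appCredentials is a hypothetical subset of the app-credentials secret.
type appCredentials struct {
	TerraformEnvVars map[string]string `json:"terraform_env_vars"`
	ProviderEnvVars  map[string]string `json:"provider_env_vars"`
}

// decodeCredentials decodes the secret's key/value data.
func decodeCredentials(data []byte) (appCredentials, error) {
	var c appCredentials
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	asObject := []byte(`{"terraform_env_vars":{"HCLOUD_TOKEN":"your-token"},"provider_env_vars":{"ovh_subsidiary":"EU"}}`)
	asString := []byte(`{"terraform_env_vars":"{\"HCLOUD_TOKEN\":\"your-token\"}"}`)

	c, err := decodeCredentials(asObject)
	fmt.Println(c.TerraformEnvVars["HCLOUD_TOKEN"], err) // your-token <nil>

	_, err = decodeCredentials(asString)
	fmt.Println(err != nil) // true: a JSON string cannot be unmarshaled into a map
}
```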

11. Deploy AppRole Credentials

# Get role_id (static, reuse across deployments)
ROLE_ID=$(vault read -field=role_id auth/approle/role/hoodcloud-control-plane/role-id)

# Generate and deploy secret_id to control plane
vault write -field=secret_id -f auth/approle/role/hoodcloud-control-plane/secret-id | \
    ssh root@<control-plane-ip> 'mkdir -p /opt/hoodcloud/secrets && cat > /opt/hoodcloud/secrets/vault-secret-id && chmod 644 /opt/hoodcloud/secrets/vault-secret-id'
Note: Use chmod 644 because the containers run as a non-root user and must be able to read the file; the volume is mounted read-only (:ro).

12. Configure Control Plane Services

Copy the Vault CA certificate:
scp infrastructure/docker/config/vault/tls/vault-ca.crt root@<control-plane-ip>:/opt/hoodcloud/secrets/vault-ca.crt
Add to .env:
VAULT_ADDR=https://<vault-server-ip>:8200
VAULT_ROLE_ID=<role-id>
VAULT_SECRET_ID_PATH=/secrets/vault-secret-id
VAULT_CACHE_TTL=5m
VAULT_MASTER_KEY_PATH=hoodcloud/master-key
VAULT_APP_CREDENTIALS_PATH=hoodcloud/app-credentials
VAULT_TLS_CA_FILE=/secrets/vault-ca.crt
All five services (api-server, auth-server, agent-gateway, orchestrator, health-evaluator) need:
  • Vault env vars in docker-compose.yml
  • Volume mount: /opt/hoodcloud/secrets:/secrets:ro
Recreate services:
docker compose up -d api-server auth-server orchestrator health-evaluator agent-gateway
Important: Use up -d, not restart: restart keeps the existing container configuration and does not apply .env changes.

13. Save Credentials and Clean Up

Store securely (password manager):
| Credential | Storage |
|---|---|
| 5 unseal keys | Distribute to separate trusted people |
| Admin AppRole role_id + secret_id | Password manager |
| Vault CA certificate (vault-ca.crt) | Safe to keep in the repo (the gitignored private keys are not committed) |
Delete the leftover init key file from the Vault server:
ssh root@<vault-server-ip> 'rm -f /root/vault-init-keys.json'

Secret Structure

secret/hoodcloud/
├── app-credentials          # Application secrets
│   ├── db_password
│   ├── redis_password
│   ├── jwt_private_key      # RSA private key PEM (auth-server only)
│   ├── jwt_public_key       # RSA public key PEM (all services)
│   ├── nats_operator_signing_seed
│   ├── nats_agent_account_signing_seed
│   ├── nats_ctrl_account_signing_seed
│   ├── nats_agent_account_pub
│   ├── nats_ctrl_account_pub
│   ├── grpc_config_signing_key
│   ├── clerk_secret_key
│   ├── clerk_webhook_signing_secret
│   ├── sealed_box_public_key   # base64 X25519 (32 bytes, mandatory)
│   ├── sealed_box_private_key  # base64 X25519 (32 bytes, mandatory)
│   ├── incident_slack_webhook_url      # optional
│   ├── incident_telegram_bot_token     # optional
│   ├── incident_telegram_chat_id       # optional
│   ├── incident_email_api_url          # optional
│   ├── incident_email_api_key          # optional
│   ├── incident_email_from             # optional
│   ├── incident_email_to               # optional
│   ├── incident_webhook_url            # optional
│   ├── incident_webhook_secret         # optional
│   ├── payment_client_cert  # PEM (optional, mTLS)
│   ├── payment_client_key   # PEM (optional, mTLS)
│   ├── payment_ca_cert      # PEM (optional, mTLS)
│   ├── terraform_env_vars   # JSON object: {"HCLOUD_TOKEN": "..."}
│   └── provider_env_vars    # JSON object: {"ovh_subsidiary": "EU"}
├── master-key               # Master encryption key
│   └── key
└── providers/               # Cloud provider credentials
    ├── hetzner/
    │   └── HCLOUD_TOKEN
    └── ovh/
        ├── OVH_APPLICATION_KEY
        ├── OVH_APPLICATION_SECRET
        └── OVH_CONSUMER_KEY

secret/infra/
└── certbot/
    └── cloudflare               # Cloudflare API token for DNS-01 challenge
        └── api_token            # Zone:DNS:Edit + Zone:Zone:Read scoped

secret/payment-service/
└── credentials              # Payment service secrets
    ├── db_password
    ├── redis_password
    ├── stripe_secret_key
    ├── stripe_webhook_secret
    └── nats_ctrl_signing_seed   # CTRL account signing seed (for NATS JWT auth)

transit/keys/
└── hoodcloud-master         # Transit encryption key (DEK wrapping)
Note: The payment service’s nats_ctrl_account_pub (CTRL account public key) is not stored in Vault. It comes from the NATS_CTRL_ACCOUNT_PUB environment variable or YAML config. Only the signing seed is a secret and must be in Vault.

Configuration

All Vault-related environment variables (control plane and payment service) are documented in Environment Variables.
Note: The payment service uses VAULT_ADDRESS (not VAULT_ADDR). See Environment Variables - Payment Service Vault for details.

Features

Circuit Breaker

| State | Behavior |
|---|---|
| Closed | Normal operation; requests go to Vault |
| Open | Vault unreachable; cached values are returned |
| Half-Open | A test request checks whether Vault has recovered |
  • Failure threshold: 3 consecutive failures
  • Recovery timeout: 30s (exponential backoff up to 5 min)

Secret Caching

  • Default TTL: 5 minutes (VAULT_CACHE_TTL)
  • Expired cache entries are used as a fallback while the circuit is open
  • The cache is cleared when the service restarts

Token Renewal

  • Renewal attempted at 75% of TTL
  • On failure, re-authenticates with AppRole credentials

Credential Expiry Monitoring

hoodcloud_credential_expiry_warning{credential_type="vault_token"} == 1
Warnings fire when < 25% of the original TTL remains. Configure alerts to schedule service restarts before expiry.

Operations

Unseal After Restart

Vault seals on every restart. Unseal with 3 of 5 keys:
docker exec -it hoodcloud-vault vault operator unseal <key-1>
docker exec -it hoodcloud-vault vault operator unseal <key-2>
docker exec -it hoodcloud-vault vault operator unseal <key-3>
While sealed, services use cached secrets (circuit breaker). If cache expires before unseal, services fail.

Authenticate as Admin

After root token is revoked:
export VAULT_ADDR=https://<vault-server-ip>:8200
export VAULT_CACERT=infrastructure/docker/config/vault/tls/vault-ca.crt

VAULT_TOKEN=$(vault write -field=token auth/approle/login \
    role_id=<admin-role-id> \
    secret_id=<admin-secret-id>)
export VAULT_TOKEN

vault kv list secret/hoodcloud/

Rotate Secrets

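# kv put replaces the ENTIRE secret: include every existing field, not only the one being rotated
vault kv put secret/hoodcloud/app-credentials \
    db_password="new-password" \
    redis_password="<current-value>"  # ...and every other existing field, in the same way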

# Restart services to pick up new values
docker compose restart api-server auth-server orchestrator health-evaluator agent-gateway

Rotate AppRole Secret ID

vault write -field=secret_id -f auth/approle/role/hoodcloud-control-plane/secret-id \
    > /opt/hoodcloud/secrets/vault-secret-id
chmod 644 /opt/hoodcloud/secrets/vault-secret-id

docker compose restart api-server auth-server orchestrator health-evaluator agent-gateway

View Audit Logs

docker exec hoodcloud-vault tail -f /vault/logs/audit.log | jq 'select(.request.path | startswith("secret/"))'

# Failed auth attempts
docker exec hoodcloud-vault grep -i "permission denied" /vault/logs/audit.log | jq .

Verify Secret Storage

# Verify terraform_env_vars is stored as JSON object (not string)
vault kv get -format=json secret/hoodcloud/app-credentials | \
    jq '.data.data.terraform_env_vars | type'
# Should output: "object"

Troubleshooting

Service Cannot Authenticate

Error: authenticate with Vault: permission denied
  1. Check Vault status: vault status (sealed?)
  2. Verify role exists: vault read auth/approle/role/hoodcloud-control-plane
  3. Regenerate secret_id:
    vault write -field=secret_id -f auth/approle/role/hoodcloud-control-plane/secret-id \
        > /opt/hoodcloud/secrets/vault-secret-id
    chmod 644 /opt/hoodcloud/secrets/vault-secret-id
    

Secret Not Found

vault kv get secret/hoodcloud/app-credentials  # Verify it exists
echo $VAULT_APP_CREDENTIALS_PATH                # Check path matches

Circuit Breaker Open

Warn: Vault circuit breaker open, using cached value
curl -s --cacert "$VAULT_CACERT" "${VAULT_ADDR}/v1/sys/health" | jq .
vault status  # If sealed, unseal

TLS Connection Errors

Error: x509: certificate signed by unknown authority
Set VAULT_TLS_CA_FILE=/secrets/vault-ca.crt in .env, or for CLI:
export VAULT_CACERT=infrastructure/docker/config/vault/tls/vault-ca.crt

Security Model

AppRole Credential Security (Phase 0)

  • secret_id_num_uses=0 — Reusable for re-authentication after Vault restarts
  • secret_id_ttl=0 — No expiry
  • secret_id_bound_cidrs — Usable only from bound server IP
  • token_bound_cidrs — Tokens valid only from bound server IP
A compromised SecretID is therefore useless from any IP outside the bound CIDRs.
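Vault enforces the bound CIDRs itself; conceptually, the check is a prefix-containment test on the client's source IP. The Go sketch below (net/netip, Go 1.18+) only illustrates that logic, for example to sanity-check a CIDR list before writing it into the role; the function name and the documentation-range example addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allowedFrom reports whether ip falls inside any of the bound CIDRs,
// conceptually what secret_id_bound_cidrs / token_bound_cidrs enforce.
func allowedFrom(ip string, cidrs []string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, err
	}
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return false, fmt.Errorf("bad CIDR %q: %w", c, err)
		}
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	bound := []string{"203.0.113.10/32"} // control plane server (example address)
	for _, ip := range []string{"203.0.113.10", "198.51.100.7"} {
		ok, err := allowedFrom(ip, bound)
		fmt.Println(ip, ok, err)
	}
}
```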

Phase 1 Upgrade: SecretID Rotation

Add a TTL to SecretIDs (secret_id_ttl=7d) and implement rotation via CI/Ansible. See scripts/vault-init.sh for the AppRole configuration. Further options:
  • Vault Agent sidecar for automated token management
  • Response wrapping with single-use SecretIDs (secret_id_num_uses=1) for the strongest security