Deploy to Staging

The staging environment runs on a single EC2 instance with k3s (lightweight Kubernetes), managed data stores (RDS PostgreSQL, ElastiCache Redis), and in-cluster FalkorDB and Typesense. ArgoCD watches the stage branch and auto-syncs manifests. The CD workflow builds and pushes images on every push to stage.

Architecture Overview

| Component | Staging | Notes |
| --- | --- | --- |
| Compute | k3s on EC2 t3.medium | 1 replica per service |
| Database | RDS db.t3.micro (PostgreSQL 16 + pgvector) | Free-tier eligible |
| Cache | ElastiCache cache.t3.micro (Redis 7) | Free-tier eligible |
| Graph DB | FalkorDB StatefulSet (in-cluster) | Redis protocol on port 6379 |
| Search | Typesense StatefulSet (in-cluster) | HTTP on port 8108 |
| Ingress | nginx-ingress controller | Installed via k3s user_data |
| TLS | cert-manager + Let's Encrypt | HTTP-01 solver via nginx |
| GitOps | ArgoCD | Auto-sync with prune + self-heal |
| CI/CD | GitHub Actions (cd-staging.yml) | Triggers on push to stage |

Deployed services: gateway, content, auth, billing, ai, notifications

Services with Dockerfiles but no k8s manifests yet: ingest, plugin-registry

Prerequisites

Complete these one-time setup steps before starting.

1. AWS Account Bootstrapping

The Terraform S3 backend requires a state bucket and DynamoDB lock table. These must exist before terraform init.

aws s3api create-bucket \
--bucket gospelib-terraform-state \
--region us-east-1

aws s3api put-bucket-versioning \
--bucket gospelib-terraform-state \
--versioning-configuration Status=Enabled

aws s3api put-bucket-encryption \
--bucket gospelib-terraform-state \
--server-side-encryption-configuration \
'{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

aws dynamodb create-table \
--table-name gospelib-terraform-lock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
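
Because terraform init fails when the backend resources are absent, a quick preflight check can save a confusing error. The sketch below is one way to do it, assuming the bucket and table names used above; the function name check_tf_backend is illustrative and not part of the repo.

```shell
# Returns 0 only if both the state bucket and the lock table are reachable
# with the current AWS credentials.
check_tf_backend() {
  bucket=$1 table=$2 region=$3
  aws s3api head-bucket --bucket "$bucket" --region "$region" >/dev/null 2>&1 \
    || { echo "missing or inaccessible bucket: $bucket" >&2; return 1; }
  aws dynamodb describe-table --table-name "$table" --region "$region" >/dev/null 2>&1 \
    || { echo "missing or inaccessible table: $table" >&2; return 1; }
  echo "backend ready: s3://$bucket + $table"
}

# Usage:
#   check_tf_backend gospelib-terraform-state gospelib-terraform-lock us-east-1 \
#     && terraform init
```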

2. EC2 SSH Key Pair

Create a key pair in the AWS Console (or CLI) in us-east-1. The name must match the ssh_key_name variable in your terraform.tfvars.

aws ec2 create-key-pair \
  --key-name gospelib-staging \
  --region us-east-1 \
  --query 'KeyMaterial' \
  --output text > ~/.ssh/gospelib-staging.pem

chmod 600 ~/.ssh/gospelib-staging.pem

3. Route53 Hosted Zone

A hosted zone for gospelib.com must exist. Note the zone ID — you'll need it for terraform.tfvars.

aws route53 list-hosted-zones-by-name \
--dns-name gospelib.com \
--query 'HostedZones[0].Id' \
--output text

4. GitHub OIDC Provider for AWS

The CD workflow authenticates to AWS via OIDC (no long-lived credentials). Create the identity provider once per AWS account:

aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

Then create an IAM role that trusts GitHub Actions for this repo. The role needs:

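  • Permissions to push images to ECR: ecr:GetAuthorizationToken, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer, ecr:PutImage, ecr:InitiateLayerUpload, ecr:UploadLayerPart, ecr:CompleteLayerUpload, ecr:BatchCheckLayerAvailability
  • A trust policy whose sub condition is scoped to this repo and branch (e.g., repo:gospelib/main:ref:refs/heads/stage). If the workflow job declares environment: staging, GitHub issues a subject of the form repo:gospelib/main:environment:staging instead, and the condition must match that value.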
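
A trust policy along these lines is one way to express the requirement. This is a sketch, not the project's actual policy: the function name render_trust_policy and the role name in the usage comment are placeholders, and the account ID and subject are passed in by you.

```shell
# Prints an IAM trust policy that lets GitHub Actions assume the role,
# restricted to a single OIDC subject (repo + branch, or repo + environment).
render_trust_policy() {
  account_id=$1 subject=$2
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "${subject}"
        }
      }
    }
  ]
}
EOF
}

# Usage (role name is hypothetical):
#   render_trust_policy 123456789012 "repo:gospelib/main:ref:refs/heads/stage" > trust.json
#   aws iam create-role --role-name gospelib-staging-cd \
#     --assume-role-policy-document file://trust.json
```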

5. GitHub Repository Secrets

Set these in the repo's Settings > Secrets and variables > Actions under the staging environment:

| Secret | Value |
| --- | --- |
| AWS_ROLE_ARN | ARN of the IAM role from step 4 |
| ECR_REGISTRY | <account-id>.dkr.ecr.us-east-1.amazonaws.com |
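
If you prefer the command line to the web UI, the GitHub CLI can set the same secrets. The sketch below assumes gh is installed and authenticated, and that an environment named staging already exists in the repo; both function names are illustrative.

```shell
# Builds the registry hostname from an account ID and region.
ecr_registry_for() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Stores both secrets in the "staging" environment of the current repo.
set_staging_secrets() {
  role_arn=$1 account_id=$2 region=$3
  gh secret set AWS_ROLE_ARN --env staging --body "$role_arn"
  gh secret set ECR_REGISTRY --env staging \
    --body "$(ecr_registry_for "$account_id" "$region")"
}

# Usage:
#   account_id=$(aws sts get-caller-identity --query Account --output text)
#   set_staging_secrets "<role-arn-from-step-4>" "$account_id" us-east-1
```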

6. Local Tooling

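Install on your workstation the command-line tools this guide uses: the AWS CLI, Terraform, kubectl, Docker, and jq. The argocd CLI is optional.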
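
A quick way to confirm the tools are on your PATH is a loop over command -v. The tool list in the usage comment is inferred from the commands in this guide (an assumption, so adjust it as needed), and missing_tools is an illustrative name.

```shell
# Prints the name of every requested tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: an empty result means everything is installed.
#   missing_tools aws terraform kubectl docker jq
```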

Step 1: Provision Infrastructure with Terraform

cd infra/terraform/environments/staging

# Copy the example and fill in real values
cp terraform.tfvars.example terraform.tfvars

Edit terraform.tfvars with your values:

environment = "staging"
aws_region = "us-east-1"
route53_zone_id = "Z0123456789ABCDEF" # Your actual zone ID
ssh_key_name = "gospelib-staging" # Must match the key pair name
k3s_instance_type = "t3.medium"

# REQUIRED: Your IP ranges for SSH and k3s API access
admin_cidr_blocks = ["203.0.113.0/24"]

# Pass the DB password via env var instead of committing it:
# export TF_VAR_db_password="<strong-random-password>"

Danger: Never commit terraform.tfvars. It is gitignored, but double-check before pushing. Pass db_password through the TF_VAR_db_password environment variable rather than writing it to the file.
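
One way to produce a suitable value for TF_VAR_db_password is sketched below. It emits hex characters only, which matters later because the password is embedded in a connection URL where characters such as / or @ would need escaping. The function name gen_db_password is illustrative; openssl rand -hex 24 would give an equivalent result.

```shell
# Prints 24 random bytes as 48 lowercase hex characters.
gen_db_password() {
  head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Usage:
#   export TF_VAR_db_password="$(gen_db_password)"
```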

terraform init
terraform plan -out=plan.tfplan
terraform apply plan.tfplan

This creates:

  • ECR repositories for each service
  • RDS PostgreSQL instance (free-tier db.t3.micro)
  • ElastiCache Redis cluster (free-tier cache.t3.micro)
  • S3 artifacts bucket
  • Secrets Manager entries
  • Route53 DNS records (staging.gospelib.com, api-staging.gospelib.com)
  • EC2 instance with Elastic IP, pre-configured with:
    • k3s (Traefik disabled)
    • nginx-ingress controller
    • gospelib-staging namespace

Note the outputs — you'll need them for secrets:

terraform output

Step 2: Configure kubectl

The EC2 user_data installs k3s and writes kubeconfig to /home/ubuntu/.kube/config. Copy it locally:

K3S_IP=$(terraform output -raw k3s_public_ip)

scp -i ~/.ssh/gospelib-staging.pem \
ubuntu@${K3S_IP}:/home/ubuntu/.kube/config \
~/.kube/gospelib-staging.yaml

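# Update the server address from localhost to the public IP.
# -i.bak works with both GNU and BSD/macOS sed (the bare -i '' form is macOS-only).
sed -i.bak "s|127.0.0.1|${K3S_IP}|g" ~/.kube/gospelib-staging.yaml \
  && rm ~/.kube/gospelib-staging.yaml.bak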

export KUBECONFIG=~/.kube/gospelib-staging.yaml
kubectl get nodes

You should see a single node in Ready state.
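
The address rewrite can also be wrapped in a small function that avoids in-place sed flags altogether, since those differ between GNU and BSD/macOS sed. This is a sketch under the assumption that the kubeconfig contains a server line of the form https://127.0.0.1:<port>; the function name is illustrative. It also tightens the file mode, because the kubeconfig holds cluster-admin credentials.

```shell
# Rewrites the API server address in a kubeconfig and restricts its permissions.
rewrite_kubeconfig_server() {
  file=$1 ip=$2
  tmp=$(mktemp) || return 1
  sed "s|https://127\.0\.0\.1:|https://${ip}:|g" "$file" > "$tmp" && mv "$tmp" "$file"
  chmod 600 "$file"
}

# Usage:
#   rewrite_kubeconfig_server ~/.kube/gospelib-staging.yaml "$K3S_IP"
```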

Step 3: Install ArgoCD

ArgoCD watches the stage branch and auto-deploys when Kustomize manifests change.

kubectl create namespace argocd

kubectl apply -n argocd \
-f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Wait for ArgoCD to be ready
kubectl wait --for=condition=available deployment/argocd-server \
-n argocd --timeout=300s

Get the initial admin password:

kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d

Register the staging ArgoCD Application. The manifest at infra/k8s/argocd/application.yaml points to the stage branch and the infra/k8s/overlays/staging path:

kubectl apply -f infra/k8s/argocd/application.yaml

Tip: To access the ArgoCD UI, port-forward: kubectl port-forward svc/argocd-server -n argocd 8080:443. Then visit https://localhost:8080 and log in with admin and the password from above.
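
For orientation, an Application with the properties described in this guide (stage branch, staging overlay path, auto-sync with prune and self-heal) could look roughly like the output of the function below. This is a sketch and not a copy of infra/k8s/argocd/application.yaml: the project name default, the in-cluster destination, and the repo URL you pass in are assumptions.

```shell
# Prints an ArgoCD Application manifest for the staging overlay.
render_argocd_application() {
  repo_url=$1
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gospelib-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${repo_url}
    targetRevision: stage
    path: infra/k8s/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: gospelib-staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
}

# Usage: compare the sketch with the manifest in the repo.
#   render_argocd_application "<repo-url>" | diff - infra/k8s/argocd/application.yaml
```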

Step 4: Install cert-manager

cert-manager provisions TLS certificates from Let's Encrypt automatically.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Wait for cert-manager to be ready
kubectl wait --for=condition=available deployment/cert-manager \
-n cert-manager --timeout=120s
kubectl wait --for=condition=available deployment/cert-manager-webhook \
-n cert-manager --timeout=120s

The ClusterIssuer (letsencrypt-prod) is included in the base Kustomize resources and will be applied by ArgoCD automatically. It uses HTTP-01 challenges via the nginx ingress class.
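
As a reference point, a ClusterIssuer matching that description might resemble the output below. It is a sketch, not the manifest from the base resources: the contact email and the account-key secret name letsencrypt-prod-account-key are assumptions.

```shell
# Prints a Let's Encrypt production ClusterIssuer using HTTP-01 via nginx.
render_cluster_issuer() {
  email=$1
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
EOF
}

# Usage: once ArgoCD has synced, check that the live issuer exists.
#   kubectl get clusterissuer letsencrypt-prod
```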

Step 5: Create Kubernetes Secrets

The Kustomize overlay injects secrets per-service via envFrom. Each service references specific secrets by name. Create them in the gospelib-staging namespace.

Database credentials

RDS_ENDPOINT=$(terraform output -raw rds_endpoint)
REDIS_ENDPOINT=$(terraform output -raw redis_endpoint)

kubectl create secret generic gospelib-database \
-n gospelib-staging \
--from-literal=DATABASE_URL="postgresql://gospelib:${TF_VAR_db_password}@${RDS_ENDPOINT}:5432/gospelib?sslmode=require" \
--from-literal=REDIS_URL="redis://${REDIS_ENDPOINT}:6379"
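
Two details can trip up this command: a password containing URL-reserved characters (such as @, / or :) breaks DATABASE_URL unless it is percent-encoded, and kubectl create fails if the secret already exists. The sketch below addresses both. The function names are illustrative, and urlencode handles ASCII input only.

```shell
# Percent-encodes every character outside the RFC 3986 unreserved set (ASCII only).
urlencode() {
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Creates or updates the secret; safe to re-run.
apply_database_secret() {
  db_password=$1 rds_endpoint=$2 redis_endpoint=$3
  kubectl create secret generic gospelib-database \
    -n gospelib-staging \
    --from-literal=DATABASE_URL="postgresql://gospelib:$(urlencode "$db_password")@${rds_endpoint}:5432/gospelib?sslmode=require" \
    --from-literal=REDIS_URL="redis://${redis_endpoint}:6379" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage:
#   apply_database_secret "$TF_VAR_db_password" "$RDS_ENDPOINT" "$REDIS_ENDPOINT"
```

The same create, dry-run, apply pipeline works for the other secrets in this step.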

Auth service secrets

kubectl create secret generic gospelib-auth \
-n gospelib-staging \
--from-literal=CLERK_SECRET_KEY="sk_test_xxx"

Billing service secrets

kubectl create secret generic gospelib-billing \
-n gospelib-staging \
--from-literal=STRIPE_SECRET_KEY="sk_test_xxx" \
--from-literal=STRIPE_WEBHOOK_SECRET="whsec_xxx"

AI service secrets

kubectl create secret generic gospelib-ai \
-n gospelib-staging \
--from-literal=ANTHROPIC_API_KEY="sk-ant-xxx" \
--from-literal=OPENAI_API_KEY="sk-xxx"

Search service secrets

kubectl create secret generic gospelib-search \
-n gospelib-staging \
--from-literal=TYPESENSE_API_KEY="$(openssl rand -hex 32)"

Notifications service secrets

kubectl create secret generic gospelib-notifications \
-n gospelib-staging \
--from-literal=RESEND_API_KEY="re_xxx"

Observability (optional)

kubectl create secret generic gospelib-observability \
-n gospelib-staging \
--from-literal=SENTRY_DSN="https://xxx@sentry.io/xxx"

Info: The gospelib-observability secret is referenced with optional: true in the Kustomize patches. Services start without it; add it when you're ready to enable Sentry.
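
Before moving on, it can help to confirm that every required secret exists, since a missing one surfaces later as a failing pod. A minimal sketch, using the secret names from this step; missing_secrets is an illustrative name.

```shell
# Prints the name of each secret that does not exist in the namespace.
missing_secrets() {
  namespace=$1
  shift
  for name in "$@"; do
    kubectl get secret "$name" -n "$namespace" >/dev/null 2>&1 || echo "$name"
  done
}

# Usage: an empty result means all required secrets are present.
#   missing_secrets gospelib-staging \
#     gospelib-database gospelib-auth gospelib-billing \
#     gospelib-ai gospelib-search gospelib-notifications
```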

Step 6: Initial Deployment

Build and push images manually (first time)

Authenticate with ECR and push all service images:

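# Run from the repository root. ecr_repository_urls is a map output, so read it
# as JSON (-raw only supports strings, numbers, and booleans).
ECR_REGISTRY=$(terraform -chdir=infra/terraform/environments/staging \
  output -json ecr_repository_urls | jq -r 'to_entries[0].value' | cut -d/ -f1)

aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "${ECR_REGISTRY}"

# t3.medium is x86_64: on an ARM workstation (e.g. Apple Silicon),
# add --platform linux/amd64 to docker build.
for svc in gateway content auth billing ai notifications; do
  docker build -t "${ECR_REGISTRY}/gospelib-${svc}:latest" "services/${svc}/"
  docker push "${ECR_REGISTRY}/gospelib-${svc}:latest"
  echo "Pushed ${svc}"
done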

Trigger ArgoCD sync

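ArgoCD polls the repository every 3 minutes by default, so it should auto-sync within that window. To force an immediate sync (the argocd CLI inside the pod needs an authenticated session, for example from argocd login, before this works):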

kubectl -n argocd exec deploy/argocd-server -- \
argocd app sync gospelib-staging --force

Or if you have the argocd CLI installed:

argocd app sync gospelib-staging
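
If neither CLI route is available, one alternative is to request a hard refresh by annotating the Application with kubectl. Note that this triggers a refresh, not a sync; it relies on automated sync being enabled (as described in the overview) for ArgoCD to then apply any difference it finds. The function name is illustrative.

```shell
# Asks ArgoCD to re-read the repo and re-render manifests for one Application.
argocd_hard_refresh() {
  app=$1
  kubectl -n argocd patch application "$app" --type merge \
    -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
}

# Usage:
#   argocd_hard_refresh gospelib-staging
```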

Step 7: Run Initial Data Ingest

kubectl apply -f infra/k8s/jobs/ingest-full.yaml -n gospelib-staging

# Follow the logs
kubectl logs -f job/ingest-full -n gospelib-staging
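
To block until the ingest finishes (for example in a script), kubectl wait can watch the Job's Complete condition. A minimal sketch with an illustrative function name; the timeout is yours to choose, and note that kubectl wait does not return early if the Job fails, so check kubectl get job if the wait times out.

```shell
# Applies the Job manifest, then waits for the Job to report completion.
run_ingest_job() {
  manifest=$1 namespace=$2 timeout=$3
  kubectl apply -f "$manifest" -n "$namespace" || return 1
  kubectl wait --for=condition=complete job/ingest-full \
    -n "$namespace" --timeout="$timeout"
}

# Usage:
#   run_ingest_job infra/k8s/jobs/ingest-full.yaml gospelib-staging 30m
```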

Step 8: Verify

# All pods running
kubectl get pods -n gospelib-staging

# Health endpoints
curl https://api-staging.gospelib.com/health
curl https://api-staging.gospelib.com/ready

# Test a passage query
curl https://api-staging.gospelib.com/api/v1/passages/gen.1.1

# Web app
curl -I https://staging.gospelib.com
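
The same checks can be bundled into a script that fails when any endpoint returns an error, which is handy after each deploy. This is a sketch using the URLs listed above; the function names are illustrative.

```shell
# Reports whether one URL answers with a non-error HTTP status.
check_url() {
  # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
  if curl -fsS -o /dev/null --max-time 10 "$1"; then
    echo "ok   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Checks every staging endpoint; returns non-zero if any check failed.
smoke_test() {
  failed=0
  for url in \
    https://api-staging.gospelib.com/health \
    https://api-staging.gospelib.com/ready \
    https://api-staging.gospelib.com/api/v1/passages/gen.1.1 \
    https://staging.gospelib.com
  do
    check_url "$url" || failed=1
  done
  return "$failed"
}

# Usage:
#   smoke_test || echo "staging is not fully healthy"
```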

Continuous Deployment (Automatic)

After the initial setup, deployments are fully automatic:

  1. Code is pushed/merged to the stage branch
  2. GitHub Actions (cd-staging.yml) detects affected services via pnpm nx show projects --affected
  3. Only changed services are built and pushed to ECR (tagged with the commit SHA)
  4. The workflow updates image tags in infra/k8s/overlays/staging/kustomization.yaml via kustomize edit set image and commits the change back to stage
  5. ArgoCD detects the manifest change and syncs the cluster
  6. The workflow polls https://api-staging.gospelib.com/health for up to 5 minutes to confirm the deploy succeeded

No manual intervention is needed after the initial setup.
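
Steps 4 and 6 of the pipeline can be sketched in shell as follows. This is not the contents of cd-staging.yml: the function names, the 10-second polling interval, and the assumption that the overlay refers to each image as gospelib-<service> are all illustrative.

```shell
# Step 4: point one service's image at the freshly pushed commit tag.
# Must run inside infra/k8s/overlays/staging, where kustomization.yaml lives.
set_image_tag() {
  registry=$1 svc=$2 sha=$3
  kustomize edit set image "gospelib-${svc}=${registry}/gospelib-${svc}:${sha}"
}

# Step 6: poll a health URL until it succeeds or the time budget is used up.
# Elapsed time counts sleep intervals only, so it slightly undercounts.
wait_healthy() {
  url=$1 timeout_s=$2 interval_s=$3
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "not healthy after ${timeout_s}s" >&2
  return 1
}

# Usage (5 minutes, checking every 10 seconds):
#   wait_healthy https://api-staging.gospelib.com/health 300 10
```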

Troubleshooting

Pods stuck in CrashLoopBackOff

kubectl logs <pod-name> -n gospelib-staging --previous
kubectl describe pod <pod-name> -n gospelib-staging

Common causes:

  • Missing secrets: A secret referenced in envFrom doesn't exist. Check the exact secret names match what's in the Kustomize overlay.
  • Incorrect DATABASE_URL: Verify the RDS endpoint and password are correct.
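  • Wrong Redis-protocol host: FalkorDB (in-cluster) and ElastiCache Redis both listen on port 6379. They do not conflict because each has its own DNS name, but a service pointed at the wrong host will connect to the wrong store. Check the hostname in each connection URL.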
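
To see exactly which secrets a deployment expects, you can read the envFrom references straight from its spec. A minimal sketch with an illustrative function name; entries that use configMapRef instead of secretRef print as blank lines.

```shell
# Lists the secret names referenced via envFrom by a deployment's containers.
envfrom_secrets() {
  deployment=$1 namespace=$2
  kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[*].envFrom[*]}{.secretRef.name}{"\n"}{end}'
}

# Usage: compare the result with `kubectl get secrets -n gospelib-staging`.
#   envfrom_secrets gateway gospelib-staging
```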

ArgoCD not syncing

# Check application status
kubectl -n argocd get applications

# Check sync status and any errors
kubectl -n argocd describe application gospelib-staging

Common causes:

  • ArgoCD can't reach the GitHub repo (check repo credentials)
  • targetRevision is set to the wrong branch (should be stage)
  • Kustomize build errors in the overlay

Cannot reach FalkorDB

kubectl get svc -n gospelib-staging | grep falkordb
kubectl exec -it <falkordb-pod> -n gospelib-staging -- redis-cli PING

ECR pull failures

Ensure the EC2 instance profile has ECR read permissions:

  • ecr:GetAuthorizationToken
  • ecr:BatchGetImage
  • ecr:GetDownloadUrlForLayer
  • ecr:BatchCheckLayerAvailability

DNS not resolving

Check that Route53 records point to the Elastic IP:

dig staging.gospelib.com
dig api-staging.gospelib.com

# Compare with
terraform output k3s_public_ip
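
The comparison can be automated so a mismatch is obvious at a glance. A sketch with an illustrative function name, assuming dig is installed; it takes the last line of the short answer so that a CNAME chain still resolves to the final address.

```shell
# Compares the A record a hostname resolves to with an expected IP.
dns_matches() {
  host=$1 expected_ip=$2
  resolved=$(dig +short "$host" A | tail -n 1)
  if [ "$resolved" = "$expected_ip" ]; then
    echo "ok   $host -> $resolved"
  else
    echo "FAIL $host -> ${resolved:-<no answer>} (expected $expected_ip)"
    return 1
  fi
}

# Usage:
#   ip=$(terraform output -raw k3s_public_ip)
#   dns_matches staging.gospelib.com "$ip"
#   dns_matches api-staging.gospelib.com "$ip"
```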

TLS certificate not issuing

kubectl get certificates -n gospelib-staging
kubectl describe certificate <name> -n gospelib-staging
kubectl get challenges -n gospelib-staging

Common causes:

  • cert-manager not installed or not ready
  • Ingress class mismatch (ClusterIssuer expects nginx)
  • Port 80 not reachable from the internet (check security group)

Cost Estimate

| Resource | Monthly Cost |
| --- | --- |
| EC2 t3.medium (on-demand) | ~$30 |
| RDS db.t3.micro | Free tier (first 12 months), then ~$15 |
| ElastiCache cache.t3.micro | Free tier (first 12 months), then ~$13 |
| Elastic IP | Free tier covers one public IPv4 address (first 12 months), then ~$3.65 ($0.005/hour) |
| Route53 hosted zone | $0.50 |
| ECR storage | ~$1 (varies with image count) |
| S3 state bucket | < $0.10 |
| Total | ~$32/mo within the free tier, ~$63/mo after it ends |