In the event of emerency, we will need to bypass the usual operation process and restore custom instances in a timely manner.
What events qualify? For example:
- GCP zonal outage and we may need to perform manual failover
- Customer instance is corrupted or deleted somehow - disaster recovery
Make sure you are following our incident playbook to handle those events.
We use Terraform Cloud as a terraform state and remote execution platform using the VCS-driven model.
During daily operation, human operator usually shouldn’t be running terraform manually. However, this should be permited during incident.
config.yaml with the following:
apiVersion: sourcegraph.com/v1 kind: SourcegraphCloud metadata: name: <> spec: debug: tfcRunsMode: cli
Then run the command below to generate the updated terraform modules:
mi2 generate -e $ENVIRONMENT --domain $DOMAIN --slug $SLUG
This will configure the TFC workspaces to the CLI-driven model and permit a human operator to execute terraform locally:
cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID/terraform/stacks/tfc terraform init && terraform apply