Restoring a Cloud instance

SOC2/CI-110

Follow the break glass process to ensure you have the access required to run this playbook.

If cloud.sourcegraph.com/control-plane-mode=true is set in config.yaml, extract the instance from Control Plane first: follow the Extract instance from control plane (break glass) section of the instance's Ops Dashboard, go/cloud-ops.

Once the restore is complete, follow the Backfill instance into control plane section of the instance's Ops Dashboard, go/cloud-ops.
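The control-plane check at the start of this playbook can be scripted. The sketch below is a minimal example, not the team's tooling: the helper name is hypothetical, and the accepted spellings of the marker (key=true, key: true, key: "true") are assumptions about how config.yaml stores the value.

```shell
# Hypothetical helper: succeeds if the instance config carries the
# control-plane-mode marker set to true.
# The spellings matched here are assumptions about the layout of config.yaml.
# $1: path to the instance's config.yaml
is_control_plane_managed() {
  config_file="$1"
  grep -Eq 'cloud\.sourcegraph\.com/control-plane-mode"?[=:][[:space:]]*"?true' "$config_file"
}

# Usage (from the instance's deployment directory):
#   if is_control_plane_managed config.yaml; then
#     echo "extract the instance from Control Plane before restoring"
#   fi
```

If the check succeeds, the extraction and backfill steps apply; otherwise they can be skipped.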

Restoring Cloud SQL

Use cases:

  • Cloud SQL data is corrupted by a broken database migration
  • Cloud SQL data is deleted

Restore from automated backup

The process below is derived from the GCP documentation.

The restoration is performed with gcloud rather than Terraform. Learn more: why not terraform?

List all backups and note the ID of the latest SUCCESSFUL backup (or the last one taken before the database state was corrupted) as SQL_BACKUP_ID:

mi2 instance sql-backup list --slug $SLUG -e $ENVIRONMENT
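To cross-check the backup list against GCP directly, the sketch below uses gcloud to print the ID of the newest SUCCESSFUL backup. The helper name is hypothetical, and the Cloud SQL instance name and project ID are assumed inputs (this playbook does not say how to obtain the instance name). To restore to a point before a corruption, pick the ID by hand from the full list instead.

```shell
# Hypothetical helper: print the ID of the newest SUCCESSFUL Cloud SQL backup.
# $1: Cloud SQL instance name (assumed input)
# $2: GCP project ID (assumed input)
latest_successful_backup_id() {
  gcloud sql backups list \
    --instance="$1" \
    --project="$2" \
    --filter="status=SUCCESSFUL" \
    --sort-by="~windowStartTime" \
    --limit=1 \
    --format="value(id)"
}

# Usage:
#   export SQL_BACKUP_ID=$(latest_successful_backup_id "$SQL_INSTANCE" "$GCP_PROJECT")
```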

Restore the backup to the current instance.

mi2 instance sql-restore create --backup-id $SQL_BACKUP_ID --slug $SLUG -e $ENVIRONMENT

List the restore operations to monitor progress:

mi2 instance sql-restore list --slug $SLUG -e $ENVIRONMENT
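Rather than re-running the list command by hand, the restore can be polled. The sketch below does this with gcloud; the helper name is hypothetical, the instance name and project ID are assumed inputs, and RESTORE_VOLUME is assumed to be the operation type that a backup restore reports.

```shell
# Hypothetical helper: poll the newest restore operation until it is DONE.
# $1: Cloud SQL instance name (assumed input)
# $2: GCP project ID (assumed input)
# Gives up after 60 polls (about 30 minutes) and returns 1.
wait_for_sql_restore() {
  attempt=0
  while [ "$attempt" -lt 60 ]; do
    status=$(gcloud sql operations list \
      --instance="$1" --project="$2" \
      --filter="operationType=RESTORE_VOLUME" \
      --sort-by="~insertTime" --limit=1 \
      --format="value(status)")
    echo "restore operation status: ${status:-none found}"
    if [ "$status" = "DONE" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 30
  done
  return 1
}
```

DONE only means the operation has finished; inspect the operation itself for an error before treating the restore as successful.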

Restore GKE cluster application(s)

Use cases (tested scenarios):

  • GKE cluster was deleted
  • application namespace was deleted
  • single application was deleted
  • PV from single application was deleted

Backup and restore use the native GKE mechanism.

  1. Follow break glass process
  2. List available backups
  3. [Extract the instance from control plane]
  4. Assess the damage
    1. GKE cluster is gone
    2. The namespace is gone
    3. A stateful service is corrupted, or its PV/PVC/disk is gone
    4. A stateless deployment is missing
    5. A stateful service deployment is missing
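The damage assessment above can be started with a few kubectl checks. The sketch below is a minimal example: the helper name is hypothetical and the application namespace is an assumed input, since this playbook does not name it. An unreachable cluster can also mean missing credentials, so confirm in the GCP console before concluding that the cluster is gone.

```shell
# Hypothetical helper: report which of the damage cases applies.
# $1: application namespace (assumed input)
assess_damage() {
  ns="$1"
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "cluster unreachable: the GKE cluster may be gone"
    return 0
  fi
  if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace missing: $ns"
    return 0
  fi
  # Cluster and namespace exist: list workloads and volumes for manual review.
  kubectl get deployments,statefulsets,pvc --namespace "$ns"
}
```

The listing in the last step is for manual review: a missing Deployment, StatefulSet, or PVC points to the stateless or stateful restore sections.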

List backups

mi2 instance backup list --slug $SLUG -e $ENVIRONMENT

Note the backup name; you will need it later.

Restore cluster and applications from backup

cd sourcegraph/cloud
cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID
mi2 instance tfc deploy -auto-approve -e $ENVIRONMENT --slug $SLUG
mi2 instance workon -e $ENVIRONMENT --slug $SLUG
mi2 instance restore create --backup-name $BACKUP_NAME --restore-type full-replace --slug $SLUG -e $ENVIRONMENT

Restore the full namespace

cd sourcegraph/cloud
mi2 instance restore create --backup-name $BACKUP_NAME --restore-type full-replace --slug $SLUG -e $ENVIRONMENT

Note: if a pod hangs with its PVCs pending, run the command below:

kubectl delete sc gce-pd-gkebackup-de && kubectl get sc sourcegraph -o json | jq '.metadata.name = "gce-pd-gkebackup-de"' | kubectl apply -f -
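Before applying the storage class workaround, it can help to confirm that PVCs are in fact stuck in Pending. The sketch below lists them; the helper name is hypothetical and the application namespace is an assumed input.

```shell
# Hypothetical helper: print the names of PVCs in the Pending phase.
# $1: application namespace (assumed input)
pending_pvcs() {
  kubectl get pvc --namespace "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase |
    awk '$2 == "Pending" { print $1 }'
}
```

Empty output means no PVC is pending, in which case the hang likely has another cause.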

Restore stateless application

e.g. sourcegraph-frontend

cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID/kubernetes
kustomize build --load-restrictor LoadRestrictionsNone --enable-helm . | kubectl apply -f -
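After the apply, the rollout can be checked before moving on. The sketch below wraps kubectl rollout status; the helper name is hypothetical, and the namespace and the five-minute timeout are assumptions rather than values from this playbook.

```shell
# Hypothetical helper: wait for a stateless deployment to finish rolling out.
# $1: deployment name, e.g. sourcegraph-frontend
# $2: application namespace (assumed input)
wait_for_deployment() {
  kubectl rollout status "deployment/$1" --namespace "$2" --timeout=5m
}

# Usage:
#   wait_for_deployment sourcegraph-frontend "$NAMESPACE"
```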

Restore stateful application from disk backup

e.g. gitserver, zoekt

cd sourcegraph/cloud
mi2 instance restore create --backup-name $BACKUP_NAME --restore-type [gitserver|indexed-search] --slug $SLUG -e $ENVIRONMENT

Restore stateful application with empty disk

e.g. gitserver, zoekt

cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID/kubernetes
kustomize build --load-restrictor LoadRestrictionsNone --enable-helm . | kubectl apply -f -

Restoring GCP deleted project

Notes:

  • the accidental deletion covered here was performed with the following command, taken from the official GCP documentation: gcloud projects delete <PROJECT_ID>

  • according to the official GCP documentation, a GCP project can be restored within 30 days of deletion

  • export environment variables

export ENVIRONMENT=[dev|prod]
export SLUG=<SLUG>
export GCP_PROJECT=$(mi2 instance get -e $ENVIRONMENT --slug $SLUG | jq -r '.status.gcp.projectId')

  • perform undelete

gcloud projects undelete $GCP_PROJECT

  • verify project is restored

gcloud projects describe $GCP_PROJECT
# should be: lifecycleState: ACTIVE
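The verification step can be turned into a check that fails when the project is not back in the ACTIVE state. The sketch below is a minimal example; the helper name is hypothetical.

```shell
# Hypothetical helper: succeeds only if the project's lifecycleState is ACTIVE.
# $1: GCP project ID
project_is_active() {
  state=$(gcloud projects describe "$1" --format="value(lifecycleState)")
  [ "$state" = "ACTIVE" ]
}

# Usage:
#   project_is_active "$GCP_PROJECT" && echo "project restored"
```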