```shell
export SLUG=company
export ENVIRONMENT=prod
export TARGET_VERSION=4.1.0
gh workflow run mi_upgrade.yml -f customer=$SLUG -f environment=$ENVIRONMENT -f target_src_version=$TARGET_VERSION
```
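After dispatching, you can follow the run from the terminal. A sketch using the `gh run` subcommands (the workflow file name is taken from the dispatch command above):

```shell
# List recent runs of the upgrade workflow
gh run list --workflow=mi_upgrade.yml --limit 5
# Watch the most recent run until it finishes; --exit-status makes the
# command exit non-zero if the run fails
gh run watch $(gh run list --workflow=mi_upgrade.yml --limit 1 --json databaseId --jq '.[0].databaseId') --exit-status
```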
With `mi2` you can generate commands to trigger automated upgrades for all instances:
For internal instances:

```shell
mi2 workflow run -filter ".metadata.labels.\"instance-type\" == \"internal\" and .spec.sourcegraphApplicationVersion != \"$TARGET_VERSION\"" upgrade-instances
```

For production instances:

```shell
mi2 workflow run -filter ".metadata.labels.\"instance-type\" == \"production\" and .spec.sourcegraphApplicationVersion != \"$TARGET_VERSION\"" upgrade-instances
```

For trial instances:

```shell
mi2 workflow run -filter ".metadata.labels.\"instance-type\" == \"trial\" and .spec.sourcegraphApplicationVersion != \"$TARGET_VERSION\"" upgrade-instances
```

Note: the filter is double-quoted so that `$TARGET_VERSION` expands in your shell before it reaches `mi2`.
This automated workflow generates a pull request for each instance to represent the upgrade; each PR:
- Links to full logs of the automated upgrade process (retained for 90 days)
- Embeds a summary of an automated full instance healthcheck (retained permanently)
- Links to the tracking issue associated with the upgrade
Review the currently open PRs for successful instance upgrades using:
```shell
# Sanity check that the PRs correspond to instances you have upgraded
gh pr list --label 'mi-upgrade'
# Save the list of PRs you are going to work with
gh pr list --label 'mi-upgrade' --json number --jq '.[].number' > upgrade-prs.txt
# Review the test plan of each PR (press 'q' to advance to the next one)
cat upgrade-prs.txt | xargs -n1 gh pr view
```
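If you want to sanity-check the `--jq` expression before pointing it at real PRs, you can feed it a hypothetical sample of what `gh pr list --json number` returns:

```shell
# Hypothetical PR numbers standing in for `gh pr list --label 'mi-upgrade' --json number` output
echo '[{"number":101},{"number":102}]' | jq -r '.[].number'
```

This prints one PR number per line, which is the shape `xargs -n1` expects.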
If an upgrade fails, follow the logs to figure out which step went wrong, then follow the manual upgrade process to finish the upgrade.
If all is well, approve and merge each instance upgrade:
```shell
# Approve each PR
cat upgrade-prs.txt | xargs -n1 gh pr review --approve
# Merge each PR
cat upgrade-prs.txt | xargs -n1 gh pr merge --squash
```
Finally, update the tracking issue.
Follow https://github.com/sourcegraph/controller#installation to install `mi2`.
`TF_TOKEN_app_terraform_io` is only temporary; this is expected to change in the future.
`TARGET_VERSION` can be a release version, a release candidate version, or a main-branch build; in the last case `mi2` will automatically figure out the latest main branch tag and persist it in `config.yaml`.
```shell
export SLUG=company
export ENVIRONMENT=dev
export TF_TOKEN_app_terraform_io=$(gcloud secrets versions access latest --project=sourcegraph-secrets --secret=TFC_TEAM_TOKEN)
export TARGET_VERSION=4.1.0
```

or, if you use fish:

```shell
set -x SLUG company
set -x ENVIRONMENT dev
set -x TF_TOKEN_app_terraform_io (gcloud secrets versions access latest --project=sourcegraph-secrets --secret=TFC_TEAM_TOKEN)
set -x TARGET_VERSION 4.1.0
```
git checkout -b $SLUG/upgrade-instance
All commands below are executed under the instance root and use auto-inference to locate the instance, hence the `-slug` flags are omitted.
```shell
export INSTANCE_ID=$(mi2 instance get -e $ENVIRONMENT --slug $SLUG | jq -r '.metadata.name')
cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID/
```
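If the lookup fails (wrong slug or environment), `jq -r` prints the literal string `null` and the `cd` would land in a nonexistent directory. A small guard, sketched as a hypothetical helper function:

```shell
# Sketch: refuse to proceed when the mi2 lookup returned nothing.
# `jq -r` prints the literal string "null" when `.metadata.name` is absent,
# so treat both an empty value and "null" as failure.
instance_dir() {
  id="$1"
  if [ -z "$id" ] || [ "$id" = "null" ]; then
    echo "instance lookup failed" >&2
    return 1
  fi
  echo "environments/$ENVIRONMENT/deployments/$id/"
}
```

Usage: `dir=$(instance_dir "$INSTANCE_ID") && cd "$dir"` only changes directory when the lookup actually succeeded.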
This will:
- perform a health check
- bump the version in `config.yaml` and regenerate deployment artifacts (e.g. kustomize, cdktf)
- back up both the GKE cluster state (e.g. manifests, disks) and Cloud SQL to create a checkpoint
- deploy migrator as a k8s Job to perform database migration
mi2 instance upgrade -target-version $TARGET_VERSION
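The upgrade runs migrator as a Kubernetes Job, so if it stalls on database migration you can inspect the Job directly. A sketch, assuming the Job is named `migrator` and lives in the current namespace (check `kubectl get jobs` for the real name):

```shell
# See which Jobs exist and whether migrator completed
kubectl get jobs
# Follow the migrator logs while it runs
kubectl logs -f job/migrator
# Block until the Job reports completion (non-zero exit on timeout)
kubectl wait --for=condition=complete --timeout=15m job/migrator
```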
Confirm the workloads are healthy:
mi2 instance check pods-health
Switch to the right GKE cluster and configure `kubectl` credentials:
mi2 instance workon -exec
Deploy the new workloads:
```shell
cd kubernetes
kustomize build --load-restrictor LoadRestrictionsNone --enable-helm . | kubectl apply -f -
```
Confirm the workloads are healthy and the remote version matches the target version:
mi2 instance check pods-health src-application-version
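As an extra cross-check you can query the instance over HTTPS; the URL shape is an assumption about how managed instances are exposed, and `/__version` is the endpoint Sourcegraph serves its build version on:

```shell
# Hypothetical instance URL; adjust to the customer's actual domain
curl -sf "https://$SLUG.sourcegraph.com/__version"
```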
Verify that the expected changes were made to `config.yaml` and the regenerated deployment artifacts.
Commit all changes
```shell
git add .
git commit -m "$SLUG: upgrade to $TARGET_VERSION"
gh pr create --title "$SLUG: upgrade to $TARGET_VERSION"
```
In the PR, include the output of:
mi2 instance check pods-health
Make sure to merge the pull request promptly so we can roll out changes to the `executors` terraform module. The module is configured to auto-apply, hence no further action is required after merge.
Depending on the failure scenario, we have different fallback strategies.
Our database schema is supposed to be at least n-1 compatible (where n is the current minor version).
A failing database migration usually has no direct impact on customers, and we do not roll out workloads until the database migration goes through.
For now, follow how to troubleshoot dirty databases to resolve the issue.
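As a starting point for that troubleshooting, you can look at the most recent migration attempts in the database. This is a sketch only: the StatefulSet name (`pgsql`), the database user and name (`sg`), and the `migration_logs` table are assumptions from memory, so verify them against the troubleshooting guide before running anything:

```shell
# Inspect the latest migration attempts (names/table are assumptions, see above)
kubectl exec -it sts/pgsql -- psql -U sg -d sg -c \
  'SELECT * FROM migration_logs ORDER BY id DESC LIMIT 5;'
```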
In the future, we will add more commands to the `mi2` CLI to interact with migrator directly, or even provide an auto-fix solution in the `mi2 instance upgrade` command.
We use rolling updates, and usually the older pods will still serve traffic until the new pods are healthy, with the exception of StatefulSets.
If it is causing breakage to the services (e.g. uptime check is failing), we should aim to rollback as soon as possible.
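Before and after rolling back, `kubectl rollout status` tells you whether a workload has converged; the deployment name here is illustrative:

```shell
# Exits non-zero if the rollout does not converge within the timeout
kubectl rollout status deploy/sourcegraph-frontend --timeout=2m
```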
It could be as simple as reverting the changes to the generated k8s manifests:
```shell
git restore --source=origin/main --worktree kubernetes
kustomize build --load-restrictor LoadRestrictionsNone --enable-helm . | kubectl apply -f -
```
or you can roll back individual Deployments/StatefulSets:
```shell
kubectl rollout undo deploy <name>
kubectl rollout undo sts <name>
```
If there is a data corruption or some other more complicated cases, follow the restore process.