Managed Instance v2.0

This documentation details the significant changes in Managed Instance v2.0 compared to the previous version.

Unless we explicitly call it out, you may assume things are unchanged.

Learn more from:

Architecture

The largest architecture change is the move from a standalone VM to GKE. Learn more from our Cloud v2 diagrams.

Postgres

SOC2/CI-79

The Postgres database now uses a single [Cloud SQL] instance, a fully managed service from GCP. It provides fully automated daily backups with point-in-time recovery and a 7-day retention period. We also take an on-demand backup prior to each upgrade to provide a fallback plan for unanticipated events.
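
As a concrete sketch, an on-demand backup or a point-in-time clone can be taken with standard gcloud commands; the instance, project, and timestamp values below are placeholders:

# take an on-demand backup of the Cloud SQL instance (names are placeholders)
gcloud sql backups create --instance=$CLOUD_SQL_INSTANCE --project=$PROJECT_ID

# recover to a point in time by cloning the instance (timestamp is a placeholder)
gcloud sql instances clone $CLOUD_SQL_INSTANCE $CLOUD_SQL_INSTANCE-recovered \
  --point-in-time="2022-12-01T10:00:00Z" --project=$PROJECT_ID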

GKE

SOC2/CI-79

All services of a Cloud instance run on a dedicated GKE cluster. We utilize Backup for GKE to provide fully automated daily backups with retention set to 90 days. The backup includes all production disks and application state. Additionally, a backup is always taken prior to an upgrade or any other major operation.
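
For reference, Backup for GKE backups can be listed and triggered with gcloud; this is a sketch, and the project, location, and backup plan values are placeholders:

# list existing backups for a backup plan (all values are placeholders)
gcloud beta container backup-restore backups list \
  --project=$PROJECT_ID --location=$LOCATION --backup-plan=$BACKUP_PLAN

# trigger an on-demand backup, e.g. prior to a major operation
gcloud beta container backup-restore backups create pre-upgrade-backup \
  --project=$PROJECT_ID --location=$LOCATION --backup-plan=$BACKUP_PLAN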

Deployment Environments

SOC2/CI-100

Deployment artifacts are stored in a centralized GitHub repository, sourcegraph/cloud. Each environment is namespaced under environments/$env. A centralized repo makes sharing global configuration much easier compared to having multiple repos.
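
Concretely, the repo layout looks roughly like this (instance IDs are illustrative):

cloud/
  environments/
    dev/
      deployments/
        <instance-id>/
          config.yaml
    prod/
      deployments/
        <instance-id>/
          config.yaml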

Learn more from the diagram

Development (dev) environment

All dev projects are created under the Sourcegraph Cloud V2 Dev GCP project folder and the environments/dev directory in the sourcegraph/cloud repo.

This is our internal development environment. All dev deployments should be short-lived, and they should always be torn down when they are no longer needed.

All engineering teammates are allowed to create instances and run experiments in the dev environment. Access in general is unrestricted.

Production (prod) environment

All prod projects are created under the Sourcegraph Cloud V2 Prod GCP project folder and the environments/prod directory in the sourcegraph/cloud repo.

Access to the prod environment is restricted and follows our access policy.

This is our production environment and consists of internal and customer instances. All prod deployments are long-lived.

Below is a list of long-lived internal instances:

Internal instances are created for various testing purposes:

  • testing changes prior to the monthly upgrade on customer instances. Once a new release is made available, the Cloud team will follow the managed instances upgrade tracker (created prior to the monthly upgrade) to proceed with the upgrade process.
  • testing significant operational changes prior to applying them to customer instances
  • long-lived instances for product teams to test important product changes, e.g. scaletesting.

All customer instances are considered part of the prod environment and all changes applied to these customers should be well-tested in the dev environment and internal instances.

Playbook

The following processes only apply to Cloud v2.0:

How to work with Cloud instances?

Below are the bare minimum prerequisites before you can work with Cloud instances.

Let’s walk through the process of accessing a Cloud instance:

First, locate the instance you are looking for:

  • gh repo clone sourcegraph/cloud
  • cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID

Then you can start running various mi2 commands to work with a specific Cloud instance (we will infer the current instance based on the current working directory).

# start a proxy to the database instance
mi2 instance db proxy
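
For example, with the proxy running in another terminal, you can point any Postgres client at the forwarded local port; the exact host, port, database, and user come from the proxy output and your access level, so the values below are placeholders:

# connect through the local proxy (all connection values are placeholders)
psql "host=127.0.0.1 port=<forwarded-port> dbname=<database> user=<user>"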

Learn more from the mi2 cli reference for detailed usage and examples.

How to request access to Cloud instances infrastructure?

We utilize Entitle to provide time-bound access to GCP infrastructure for both production and development environment.

Use the slash command in Slack: type /access_request in any chat window and hit Enter. Fill out the following values:

  • Search permission: One of Cloud V2 Dev Access, Cloud V2 Prod Access
  • Permission duration: Preferably request the minimal amount of time needed
  • Add justification: Add a note to provide context why access is needed

The request will be routed to #cloud, #security, or your direct manager for approval. We will review and approve the request.

Please tag @cloud-support or @security-support in #cloud for immediate attention if the request is time sensitive. If the request is related to an ongoing incident, please page the Cloud on-call engineer using OpsGenie.

How to request access to Cloud instances UI?

Learn more from Request access to Cloud instances UI

How to locate a Cloud instance in the deployment repo?

There are two ways:

If you are not sure about the slug or environment of an instance, go to s2 and run the following search:

repo:^github\.com/sourcegraph/cloud$ file:config.yaml <insert customer name or domain name as keyword to filter>

INSTANCE_ID is the value of .metadata.name in config.yaml
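
If you already have the repo checked out locally, a plain grep over the per-instance config files works as well (the keyword is a placeholder):

# find the deployment directory for a customer by keyword
grep -l '<customer name or domain>' environments/*/deployments/*/config.yaml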

If you know the slug of the instance, run the command below at the root of the sourcegraph/cloud deployment repo to retrieve the instance ID:

mi2 instance get -e $ENVIRONMENT --slug $CUSTOMER | jq -r '.metadata.name'

Then cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID

How do I work with mi2 CLI?

Learn more from CLI reference.

How to work with the k8s deployment of a Cloud instance?

Run the command below to retrieve the credentials and configure the proper kubectl context.

mi2 instance workon -exec

Then run typical kubectl commands to interact with the cluster. Additionally, you can always use the GKE UI in the GCP Console if you prefer.
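
For example (the namespace and deployment names below are assumptions; adjust them to what you actually see in the cluster):

# list pods across all namespaces to locate the Sourcegraph workloads
kubectl get pods -A

# inspect deployments and tail logs in the namespace you found (names are placeholders)
kubectl -n <namespace> get deployments
kubectl -n <namespace> logs deploy/sourcegraph-frontend --tail=100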

How to update & apply terraform modules?

In v2, we use cdktf via the mi2 cli to dynamically generate the cdktf stacks for each module.

In the cloud repo, run the following:

mi2 workflow run -e $ENVIRONMENT -exec -exec.concurrency 4 generate-cdktf

Commit the changes and open a pull request.
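
For example (the branch name and commit message are illustrative):

git checkout -b regenerate-cdktf-stacks
git add environments/
git commit -m "regenerate cdktf stacks"
gh pr create --fill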

The following modules have auto-apply enabled, so when they are changed, no action is required once the changes are merged:

  • monitoring
  • executors
  • security

For other modules, it’s recommended to use the process below.

# retrieve status of the plan
# make sure to run `--help` to learn more about different output format options
mi2 instance tfc check $module_name

# confirm the plan and apply it
mi2 instance tfc confirm

We will add more step-by-step instructions in the future.

Depending on the complexity and blast radius of the change, you may consider sampling plan outputs from a few instances, then use the mi2 workflow command to apply the change across all instances at once. You can also use the mi2 workflow command to aggregate the raw plan output of all instances and perform precise checks on them to ensure the plan output is exactly what you are looking for.
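
As a sketch, this reuses the workflow invocation shown earlier; the workflow name is a placeholder, so check which workflows are available (e.g. via --help) before running it:

# apply a reviewed change across all instances in an environment (workflow name is a placeholder)
mi2 workflow run -e $ENVIRONMENT -exec -exec.concurrency 4 <workflow-name>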

How to use a fork of cdktf-cli?

Sometimes there are bugs (e.g. hashicorp/terraform-cdk#2397, hashicorp/terraform-cdk#2398) in the upstream, and we have to maintain our own fork of cdktf-cli.

To use the fork in GitHub Actions, modify the setup-mi2 action to reference the fork and pin it to a specific commit, branch, or tag.

https://github.com/sourcegraph/cloud/blob/64d3ddfb2ecbff5c1a200aa8ac981ff1a48abf5e/.github/workflows/mi_create.yml#L97-L106

- name: setup mi2 tooling
  uses: ./.github/actions/setup-mi2
  with:
    # Add a comment explaining why a fork is required
    # cdktf-version: 0.13.3
    cdktf-repository: sourcegraph/terraform-cdk
    cdktf-ref: fix/tfc-planned-status

Use the fork locally:

gh repo clone sourcegraph/terraform-cdk
cd terraform-cdk
yarn install
yarn build
# in your shell config file or within the terminal session
alias cdktfl=/abspath-to-terraform-cdk-repo/packages/cdktf-cli/bundle/bin/cdktf

Then replace all cdktf commands with cdktfl.
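
For example, where you would otherwise invoke cdktf directly:

# use the locally built fork instead of the upstream binary
cdktfl diff
cdktfl deploy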