Managed Instance v2.0
This documentation details the significant changes in Managed Instance v2.0 compared to the previous version.
Unless we explicitly call it out, you may assume things are unchanged.
Learn more from:
- Cloud v2 MVP shortlist
- Cloud v2 migration and decision docs
- Cloud v2 Orchestration docs
- Cloud v2 diagrams
- Cloud v2 remaining work
- Cloud v2 SOC2 working docs
Architecture
The largest architectural change is moving from a standalone VM to GKE. Learn more from our Cloud v2 diagrams.
Postgres
SOC2/CI-79
The Postgres database now uses a single Cloud SQL instance, a fully managed service from GCP. It provides fully automated daily backups with point-in-time recovery and a 7-day retention period. We also take an on-demand backup prior to each upgrade to provide a fallback plan for unanticipated events.
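For illustration only, here is a minimal sketch of inspecting those backups or taking an on-demand backup with the standard gcloud CLI; the instance and project values are placeholders, and in practice these operations are normally driven through our tooling:
# list automated and on-demand backups for the Cloud SQL instance (placeholder names)
gcloud sql backups list --instance=$CLOUDSQL_INSTANCE --project=$PROJECT_ID
# take an on-demand backup, e.g. prior to an upgrade
gcloud sql backups create --instance=$CLOUDSQL_INSTANCE --project=$PROJECT_ID --description="pre-upgrade"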
GKE
SOC2/CI-79
All services of a Cloud instance run on a dedicated GKE cluster. We utilize Backup for GKE to provide fully automated daily backups with retention set to 90 days. The backup includes all production disks and application state. Additionally, a backup is always taken prior to an upgrade or other major operation.
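As a hedged sketch only (the exact command surface may differ; verify with --help), Backup for GKE resources can be inspected with the gcloud CLI; the project, region, and backup plan names are placeholders:
# list Backup for GKE backup plans in a project/region (placeholder values)
gcloud beta container backup-restore backup-plans list --project=$PROJECT_ID --location=$REGION
# list backups taken under a given plan
gcloud beta container backup-restore backups list --backup-plan=$BACKUP_PLAN --project=$PROJECT_ID --location=$REGION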
Deployment Environments
SOC2/CI-100
Deployment artifacts are stored in a centralized GitHub repository, sourcegraph/cloud.
Each environment is namespaced under environments/$env. A centralized repo makes sharing global configuration much easier compared to having multiple repos.
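As an illustrative sketch (the instance IDs here are made up), the layout looks roughly like this:
environments/
  dev/
    deployments/
      <instance-id>/
        config.yaml
  prod/
    deployments/
      <instance-id>/
        config.yaml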
Learn more from the diagram.
Development (dev) environment
All dev projects are created under the Sourcegraph Cloud V2 Dev GCP project folder and the environments/dev directory in the sourcegraph/cloud repo.
This is our internal development environment. All dev deployments should be short-lived, and they should always be torn down when they are no longer needed.
All engineering teammates are allowed to create instances and perform experiments in the dev environment. Access in general is unrestricted.
Production (prod) environment
All prod projects are created under the Sourcegraph Cloud V2 Prod GCP project folder and the environments/prod directory in the sourcegraph/cloud repo.
Access to the prod environment is restricted and follows our access policy.
This is our production environment, consisting of internal and customer instances. All prod deployments are long-lived.
Below is a list of long-lived internal instances:
- clouddev.sourcegraph.com
- demo.sourcegraph.com (we will migrate demo to v2 soon)
Internal instances are created for various testing purposes:
- testing changes prior to the monthly upgrade of customer instances: once a new release is made available, the Cloud team follows the managed instances upgrade tracker (created prior to the monthly upgrade) to proceed with the upgrade process
- testing significant operational changes prior to applying them to customer instances
- long-lived instances for product teams to test important product changes, e.g. scaletesting.
All customer instances are considered part of the prod environment, and all changes applied to these customers should be well-tested in the dev environment and on internal instances.
Playbook
The following processes only apply to Cloud v2.0:
- Create a Managed Instance
- Restore a Managed Instance
- Upgrade a Managed instance
- Delete a Managed instance
- Disaster Recovery process for a Managed instance
How to work with Cloud instances?
Below are the bare minimum prerequisites before you can work with Cloud instances:
- sourcegraph/cloud: deployment repo where we persist all Cloud instances config and deployment artifacts
- sourcegraph/controller: mi2 - cloud controller source code
- Install mi2 by following sourcegraph/controller#installation
- Sufficient access to GCP projects; see the FAQ below to learn how to request access
Let’s walk through the process of accessing a Cloud instance.
First, locate the instance you are looking for:
gh repo clone sourcegraph/cloud
cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID
Then you can start running various mi2 commands to work with a specific Cloud instance (the current instance is inferred based on the current working directory).
# start a proxy to the database instance
mi2 instance db proxy
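Once the proxy is running, you can connect with any Postgres client; below is a minimal sketch assuming the proxy listens on localhost, with placeholder port, user, and database values (check the proxy output for the actual ones):
# connect through the local proxy (placeholder port/user/database)
psql -h 127.0.0.1 -p 5432 -U $DB_USER -d $DB_NAME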
Learn more from the mi2 CLI reference for detailed usage and examples.
How to request access to Cloud instances infrastructure?
We utilize Entitle to provide time-bound access to GCP infrastructure for both production and development environments.
Use the slash command in Slack: type /access_request in any chat window and hit enter. Fill out the following values:
- Search permission: One of Cloud V2 Dev Access, Cloud V2 Prod Access
- Permission duration: Preferably request the minimal amount of time
- Add justification: Add a note to provide context on why access is needed
The request will be routed to #cloud, #security, or your direct manager for approval. We will review the request and approve the access request.
Please tag @cloud-support or @security-support in #cloud for immediate attention if it is time sensitive. If the request is related to an ongoing incident, please page the Cloud on-call engineer using OpsGenie.
How to request access to Cloud instances UI?
Learn more from Request access to Cloud instances UI
How to locate a Cloud instance in the deployment repo?
There are two ways.
If you are not sure about the slug or environment of an instance, go to s2 and search:
repo:^github\.com/sourcegraph/cloud$ file:config.yaml <insert customer name or domain name as keyword to filter>
INSTANCE_ID is the value of .metadata.name in config.yaml.
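For example, assuming you have yq v4 installed (not a documented requirement, just one convenient way to read YAML), you can print it from within a deployment directory:
# print the instance ID from a deployment's config.yaml
yq '.metadata.name' config.yaml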
If you know the slug of the instance, run the command below at the root of the sourcegraph/cloud deployment repo to retrieve the instance ID:
mi2 instance get -e $ENVIRONMENT --slug $CUSTOMER | jq -r '.metadata.name'
Then cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID
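Putting it together, a small sketch (the slug here is hypothetical):
# resolve the instance ID from a customer slug and change into its deployment directory
export ENVIRONMENT=prod
export CUSTOMER=acme # hypothetical slug
INSTANCE_ID="$(mi2 instance get -e "$ENVIRONMENT" --slug "$CUSTOMER" | jq -r '.metadata.name')"
cd "environments/$ENVIRONMENT/deployments/$INSTANCE_ID"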
How do I work with the mi2 CLI?
Learn more from CLI reference.
How to work with the k8s deployment of a Cloud instance?
Run the command below to retrieve the credentials and configure the proper kubectl context.
mi2 instance workon -exec
Then run typical kubectl commands to interact with the cluster. Additionally, you can always use the GKE UI in the GCP Console if you prefer.
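For example, once the context is configured, ordinary kubectl commands work as usual; the namespace and pod name below are placeholders:
# inspect workloads in the cluster (placeholder namespace/pod)
kubectl get pods -n $NAMESPACE
kubectl logs -n $NAMESPACE $POD_NAME --tail=100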
How to update & apply terraform modules?
In v2, we use cdktf via the mi2 CLI to dynamically generate the cdktf stacks for each module.
In the cloud repo, run the following:
mi2 workflow run -e $ENVIRONMENT -exec -exec.concurrency 4 generate-cdktf
Commit the changes and open a pull request.
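A minimal sketch of that step with plain git and the GitHub CLI (branch name and commit message are placeholders):
# commit the regenerated cdktf stacks and open a pull request
git checkout -b regenerate-cdktf-stacks
git add environments/
git commit -m "regenerate cdktf stacks"
gh pr create --fill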
The following modules have auto-apply enabled, hence no action is required once changes to them are merged:
- monitoring
- executors
- security
For other modules, it is recommended to use the process below.
# retrieve status of the plan
# make sure to run `--help` to learn more about different output format options
mi2 instance tfc check $module_name
# confirm the plan and apply it
mi2 instance tfc confirm
We will add more step-by-step instructions in the future.
Depending on the complexity and blast radius of the change, you may consider sampling the plan outputs of a few instances, then use the mi2 workflow command to apply across all instances at once.
You can also utilize the mi2 workflow command to aggregate the raw plan output of all instances and perform precise checks on them to ensure the plan output is exactly what you are looking for.
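As a rough illustration only (the real aggregation is done via the mi2 workflow command), you could sample the per-instance plan status by reusing the command shown above in a plain shell loop; the environment and module name are placeholders:
# sample the plan status of each prod deployment for a given module (placeholders)
for dir in environments/prod/deployments/*/; do
  echo "== $dir"
  (cd "$dir" && mi2 instance tfc check $module_name)
done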
How to use a fork of cdktf-cli?
Sometimes there are bugs in the upstream (e.g. hashicorp/terraform-cdk#2397, hashicorp/terraform-cdk#2398) and we have to maintain our own fork of cdktf-cli.
To use the fork in GitHub Actions, modify the setup-mi2 action to reference the fork and pin it to a specific commit, branch, or tag.
- name: setup mi2 tooling
  uses: ./.github/actions/setup-mi2
  with:
    # Add a comment explaining why a fork is required
    # cdktf-version: 0.13.3
    cdktf-repository: sourcegraph/terraform-cdk
    cdktf-ref: fix/tfc-planned-status
Use the fork locally:
gh repo clone sourcegraph/terraform-cdk
cd terraform-cdk
yarn install
yarn build
# in your shell config file or within the terminal session
alias cdktfl=/abspath-to-terraform-cdk-repo/packages/cdktf-cli/bundle/bin/cdktf
Then replace all cdktf commands with cdktfl.