Please review the Managed Instances operations guide for instructions.
Sourcegraph upgrades every test and customer instance according to the SLA.
The release process is performed in steps:
- A new version is released via the release guild.
- A GitHub issue is created in the Sourcegraph Customer repository with the `mi2 env create-tracking-issue -e prod $TARGET_VERSION` command.
- The GitHub issue is labeled with `team/cloud`, and the Cloud Team is automatically notified to perform the Managed Instances upgrade. The label is part of the template.
- The Cloud Team performs the upgrade of all instances in the given order:
| Stage | Working days since release | Action | Condition not met? |
|---|---|---|---|
| 1 | 0-2 | Upgrade internal instances by Cloud Team (incl. demo and rctest) | |
| 2 | 3-4 | Time for verification by Sourcegraph teams | New patch created -> start from 1st stage |
| 3 | 5-6 | Upgrade: 30% trials, 10% customers | New patch created -> upgrade internal in 1 working day and start from 2nd stage |
| 4 | 7-8 | Upgrade: 100% trials, 40% customers | New patch created -> upgrade internal in 1 working day and start from 3rd stage |
| 5 | 9-10 | Upgrade: 100% customers | New patch created -> upgrade internal in 1 working day and start from 3rd stage |
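
The staged schedule above can be read as a simple lookup from "working days since release" to a rollout stage. The following is only an illustrative sketch of that reading, not the Cloud Team's actual tooling: the names `Stage`, `STAGES`, `RESTART_STAGE`, and `stage_for` are hypothetical, and the day ranges and restart rules are copied directly from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    number: int
    first_day: int  # first working day since release (inclusive)
    last_day: int   # last working day since release (inclusive)
    action: str


# Day ranges and actions as listed in the rollout table.
STAGES = [
    Stage(1, 0, 2, "Upgrade internal instances (incl. demo and rctest)"),
    Stage(2, 3, 4, "Time for verification by Sourcegraph teams"),
    Stage(3, 5, 6, "Upgrade: 30% trials, 10% customers"),
    Stage(4, 7, 8, "Upgrade: 100% trials, 40% customers"),
    Stage(5, 9, 10, "Upgrade: 100% customers"),
]

# Stage to restart from when a new patch is created during a given stage
# (stage 1 has no such condition in the table).
RESTART_STAGE = {2: 1, 3: 2, 4: 3, 5: 3}


def stage_for(working_days_since_release: int) -> Optional[Stage]:
    """Return the stage covering the given working day, or None if outside the schedule."""
    for stage in STAGES:
        if stage.first_day <= working_days_since_release <= stage.last_day:
            return stage
    return None
```

For example, `stage_for(5)` returns the stage 3 entry, and `RESTART_STAGE[5]` indicates that a patch created during stage 5 sends the rollout back to stage 3 (after internal instances are upgraded within 1 working day, per the table).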
After each instance is upgraded, its uptime checks are verified. This includes automated monitoring.
With the bi-weekly patch release schedule, the Cloud Team uses a simplified release process to ensure Cloud customers can obtain patches as soon as possible.
Sourcegraph managed instances run on Kubernetes, specifically GKE.
- Isolation: Each managed instance is created in an isolated GCP project with heavy gcloud access ACLs and network ACLs for security reasons.
- Admin access: Both the customer and Sourcegraph personnel will have access to an application-level admin account. Learn more about how we ensure secure access to your instance.
- VM/SSH access: Only Sourcegraph personnel will have access to the actual GCP environment. This is done securely through GCP IAP TCP proxy access only. Sourcegraph personnel can make changes or provide data from the environment upon request by the customer.
- Inbound network access: The customer may choose between having the deployment be accessible via the public internet and protected by their SSO provider, or, for additional security, having the deployment restricted to an allowlist of IP addresses only (such as their corporate VPN). Filtering of the IP allowlist is performed by our WAF provider, Cloudflare. Note: in addition to the customer-provided IP allowlist, traffic from well-known public code hosts (e.g. GitHub.com) is also permitted to access selected Sourcegraph endpoints to ensure functionality of certain features.
- Outbound network access: The Sourcegraph deployment will have unfettered egress TCP/ICMP access, and customers will need to allow the Sourcegraph deployment to contact their code host. This can be done by having their code host be publicly accessible, or by allowing the static IP of the Sourcegraph deployment to access their code host.
- Web Application Firewall (WAF) protections: All managed instances are proxied through Cloudflare and leverage security features such as rate limiting and the Cloudflare WAF.
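
To make the IP allowlist option concrete, here is a minimal sketch of what allowlist matching means in principle. It is an assumption-laden illustration only: the real filtering is performed by Cloudflare, not by code like this, and the function name `is_allowed` and the example ranges (taken from the reserved documentation address blocks) are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in the allowlist."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries written with host bits set, e.g. "203.0.113.5/24".
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowlist
    )


# Hypothetical allowlist: a corporate VPN range and a single static IP.
EXAMPLE_ALLOWLIST = ["203.0.113.0/24", "198.51.100.7/32"]
```

With this example list, `is_allowed("203.0.113.42", EXAMPLE_ALLOWLIST)` is true, while an address outside both ranges, such as `192.0.2.1`, is rejected.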
Access can be requested in #it-tech-ops WITH manager approval.
Each managed instance is created in an isolated GCP project, with exclusive resources.
Metrics are visible to Sourcegraph employees with access in a centralized GCP metrics scoping project. All metrics can be seen in the scoped projects dashboard.
Every customer managed instance has alerts configured:
- a cloud provider-managed uptime check is configured in the dedicated GCP managed instance project
- instance performance metrics alerts are configured in the scoped project for all managed instances
- additional v2.0 infrastructure performance metrics are configured per instance
- application performance metrics, based on application log events
When an alert is triggered, it is sent to the Opsgenie channel.
The on-call Cloud engineer has to determine the alert type, decide whether an incident should be opened, and follow the procedure to handle the incident. The on-call Cloud engineer should use the generated managed instances operations to check, assess, and repair the broken managed instance.
When an alert is closed via incident resolution, post-mortem actions have to be assigned and performed.
Please visit go/cloud-ops