Cloud
The Cloud team is a special-focus team reporting directly to the CEO, modeled on the question “if AWS were to offer ‘Managed Sourcegraph’ like they do Elasticsearch, Redis, PostgreSQL, etc., how would they do it?” The team is responsible for maintaining existing managed instances and building the next generation of them. The Cloud team has no other responsibilities.
Mission statement
Build a fully managed platform for using Sourcegraph that can (by EOFY23) support 200+ customers using dedicated Sourcegraph instances, providing feature compatibility with self-hosted while being cost-efficient for customers and Sourcegraph.
Fully managed
- Observability allowing Sourcegraph to react before user impact is noticed, while respecting user privacy
- Frequent, invisible Sourcegraph upgrades
- Invisible infrastructure updates
- Zero infrastructure access for customers
Platform
- Low customer onboarding cost
- Zero customer maintenance cost
- Secure (SOC 2, documented security posture)
- Reliable (ability to offer SLA, internal SLO of 99.9%)
- Automatable (in due time, feature releases / billing / upgrades / analytics are built-in)
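To put the 99.9% internal SLO from the list above in concrete terms, here is a minimal sketch that computes the downtime such a target permits. The 30-day window and the function name are illustrative assumptions; this page does not define the measurement window.

```shell
# Error budget implied by an availability SLO over a given window.
# Arguments: window length in days, SLO in basis points (99.9% -> 9990).
# ASSUMPTION: the 30-day window used below is illustrative only.
downtime_budget_seconds() {
  local days=$1 slo_bp=$2
  local window_seconds=$(( days * 24 * 60 * 60 ))
  # Allowed downtime = window * (1 - SLO), using integer math in basis points.
  echo $(( window_seconds * (10000 - slo_bp) / 10000 ))
}

budget=$(downtime_budget_seconds 30 9990)
printf '%d minutes %d seconds\n' $(( budget / 60 )) $(( budget % 60 ))
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days.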
Support 200+ customers
- Targeting 200+ customers in to invest in supporting 1000 in
- Support 300 production-grade instances (accommodating trials / testing)
- Compatible with current MI use cases
- Infrastructure / Domain / Isolation boundary per customer
Dedicated Sourcegraph instances
- One Sourcegraph instance serves a single customer
- (/) Dedicated, Sourcegraph-provided Cloud infrastructure
- (/) GCP only
Feature compatibility
- Feature set on-par with self-hosted
- With time, getting more powerful than self-hosted
- Features are opt-in (for a fee)
- New features available on Cloud before self-hosted
- Existing features have higher adoption on Cloud than self-hosted
Cost-efficiency
- Expected to support teams from 50 to 5000 users (EOFY23) at a minimum infrastructure cost of $500/month
- Infrastructure cost covered by Sourcegraph
- Administration / operations provided by Sourcegraph
- () Self service provisioning / release channels for upgrades
Not in scope (for / ):
Roadmap
The Cloud team will define the roadmap in the upcoming weeks.
Q2FY23 goals
- Support up to 100 managed instances without greatly increasing engineering capacity invested in maintenance
- Provision sourcegraph.sourcegraph.com as a production-grade managed instance for the Sourcegraph team
- Support migration of Sourcegraph.com customers to the Cloud
- Support SOC2 audit for managed instances
- Enable managed instance trials (limited capacity)
- Finalize design & implementation plan of RFC 706
Team
- Rafal Leszczynski, Engineering Manager, Cloud
- Dax McDonald, Software Engineer
- Rafal Gajdulewicz, Software Engineer
- Filip Haftek, Software Engineer
- Daniel Dides, Software Engineer
- Michael Lin, Software Engineer
How to contact the team and ask for help
- For emergencies and incidents, alert the team using the Slack command /genie alert [message] for devops.
- For internal Sourcegraph teammates, join us in the #cloud Slack channel to ask questions or request help from our team.
- For special request types or requests for help that require action from Cloud team engineers (e.g. coding, infrastructure changes), please create a GitHub issue and assign the team/cloud label. You can also post a follow-up message in the #cloud Slack channel.
When to offer a Managed Instance
See below for the SLAs and Technical implementation details (including Security) related to managed instances.
Please message #cloud for any answers or information missing from this page.
When offering customers a Managed Instance, CE and Sales should communicate and gather information on the following topics:
- Customers are comfortable with the security implications of using a managed instance
- Customers’ code host should be publicly accessible or able to allow incoming traffic from Sourcegraph-owned static IP addresses. (Note: we do not have proper support for other connectivity methods, e.g. site-to-site VPN.)
Managed Instance Requests
Customer Engineers (CE) or Sales may request to:
- Create a managed instance - [Issue Template]
- For new customers or prospects who currently do not have a managed instance.
- After determining a managed instance is viable for a customer/prospect
- Suspend a managed instance - [Issue Template]
- For customers or prospects who currently have a managed instance and need to pause their journey, but intend to come back within a couple of months.
- Tear down a managed instance - [Issue Template]
- For customers or prospects who have elected to stop their managed instance journey entirely. They accept that they will no longer have access to the data from the instance as it will be permanently deleted.
Workflow
- CE seeks Managed Instance approval from their regional CE Manager
- The Regional CE Manager will review the following criteria:
- Overall, is the deal qualified?
- Is it technically qualified? We have documented POC success criteria and the customer agrees to the criteria. We have documented the basic technical requirements of the customer (languages, repo types, security, etc.)
- If anything is non-standard, it must pass the tech review process
- If approved, then CE proceeds based on whether this is a standard or non-standard managed instance scenario:
- For standard managed instance requests (i.e., new instance, no scale concerns, no additional security requirements), CE submits a request to the Cloud team using the corresponding issue template in the sourcegraph/customer repo.
- For non-standard managed instance requests (i.e., any migrations, special scale or security requirements, or anything considered unusual), CE submits the opportunity to Tech Review before making a request to the Cloud team.
- Message the team in #cloud.
- If denied, the CE/AE can appeal through the CE/AE leadership chain of command.
SLAs for managed instances
Support SLAs for Sev 1 and Sev 2 can be found here. Other engineering SLAs are listed below:
Category | Description | Response time | Resolution time |
---|---|---|---|
New instance Creation | Spin up new instance for a new customer | Within 24 hours of becoming aware of the need | Within 7 working days from agreement |
Existing instance suspension | Suspend an existing managed instance temporarily | Within 24 hours of becoming aware of the need | Within 15 working days from agreement |
Existing instance deletion/teardown | Decommission/delete an existing managed instance | Within 24 hours of becoming aware of the need | Within 15 working days from agreement |
New Feature Request | Feature request from new or existing customers | Within 24 hours of becoming aware of the need | Dependent on the request |
Maintenance: Monthly Update to latest release | Updating an instance to the latest release | NA | Within 1 week after latest release |
Maintenance: patch/emergency release Update | Updating an instance with a patch or emergency release | NA | Within 1 week after patch / emergency release |
Recovery Time Objective and Recovery Point Objective (RTO & RPO)
We have a maximum Recovery Point Objective of 24 hours. Snapshots are performed at least daily on managed instances. Some components may have lower RPOs (e.g. database).
Our maximum Recovery Time Objective is defined by our support SLAs for P1 & P2 incidents.
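As a rough illustration of the 24-hour RPO, the sketch below checks whether the most recent snapshot is still inside that window. The function name and the epoch-second inputs are hypothetical; how snapshot timestamps are actually obtained for a managed instance is not covered on this page.

```shell
# Hypothetical helper: succeeds when the latest snapshot is no older than the RPO.
# Arguments: snapshot time and current time, both as Unix epoch seconds.
snapshot_within_rpo() {
  local snapshot_epoch=$1 now_epoch=$2
  local rpo_seconds=$(( 24 * 60 * 60 ))  # 24-hour maximum RPO stated above
  [ $(( now_epoch - snapshot_epoch )) -le "$rpo_seconds" ]
}

# Example with fixed timestamps: a snapshot taken 6 hours before "now".
now=1700000000
if snapshot_within_rpo $(( now - 6 * 3600 )) "$now"; then
  echo "snapshot is within RPO"
else
  echo "snapshot is older than RPO"
fi
```

Passing both timestamps explicitly keeps the check deterministic and easy to test; a real check would feed in the timestamp of the newest snapshot and the current time.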
Incident Response
Incidents which affect managed instances are handled according to our incidents process.
Accessing/Debugging Managed Instances
Action | Who can do it | Description | How |
---|---|---|---|
Reload config | CE/CS | Reload MI site config (restart frontend) | restart frontend |
View GCP project metrics | Cloud/Security/All SG employees via policy attachment | Access to all MI metrics aggregated in a single project | GCP scoped dashboard |
View GCP project logs | Cloud/Security/All SG employees via policy attachment | Access customer GCP project logs | GCP logs - change to proper customer name |
GCP ssh, tunnel ports | Cloud/CS | Required for troubleshooting the customer environment and performing pre-defined playbooks | install mg cli, ssh to MI, port-forward to MI, gcloud commands |
Access CloudSQL database | Cloud/Security/CS | Log in to the CloudSQL DB | install mg cli, access CloudSQL via mg cli, gcloud commands |
Log in to customer MI web UI | Cloud/CE | Log in to the customer web UI (requires OIDC enabled on the customer instance or access to the 1Password customer instances vault); change the URL to the customer slug | log in with GSuite (OIDC) or user/password from 1Password (if OIDC is not enabled) |
Log in to customer Grafana | Cloud/CE | Log in to the customer Grafana (requires OIDC enabled on the customer instance or access to the 1Password customer instances vault); change the URL to the customer slug | log in with GSuite (OIDC) or user/password from 1Password (if OIDC is not enabled) |
More Managed Instances can be found here
How we work
Issue tracking
The Cloud team GitHub Project is the single source of truth.
How we use GitHub Projects (Beta)
tbd
On-call
We maintain an on-call rotation in Opsgenie. Responsibilities of the teammate who is on-call include:
- Acknowledging incoming alerts
- Initiating incident procedures
- Publishing postmortems
Managed Instance technical documentation
Team slack channels
- #cloud - external channel for the Cloud team where other Sourcegraphers can ask for help or leave questions for the team
- #cloud-internal - internal channel for the Cloud team for all day-to-day communication within the team
FAQ
FAQ: Can customers disable the “Builtin username-password authentication”?
Yes, you may disable the builtin authentication provider and only allow creation of accounts from configured SSO providers.
However, in order to preserve site admin access for Sourcegraph operators, we need to add Sourcegraph’s internal Okta as an authentication provider. Please reach out to our team prior to disabling the builtin provider.
FAQ: How do I restart the frontend after changing the site-config?
Are you a member of our CE & CS teams?
- Visit sourcegraph/deploy-sourcegraph-managed
- Locate the slug of the customer instance from the list of folders
- Visit https://github.com/sourcegraph/deploy-sourcegraph-managed/actions/workflows/reload_frontend.yml
- Click Run workflow and input the slug of the customer instance
- Click the green Run workflow button
- Done! It shouldn’t take more than 2 minutes
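The same workflow could likely be triggered from a terminal with the GitHub CLI instead of the web UI. The sketch below only prints the command rather than running it. The helper name is hypothetical, and the workflow input name slug is an assumption based on the UI steps above; check reload_frontend.yml for the actual input name before using it.

```shell
# Prints (does not run) a gh command that would trigger the reload workflow.
# ASSUMPTION: the workflow input is named "slug"; verify in reload_frontend.yml.
reload_frontend_cmd() {
  local slug=$1
  if [ -z "$slug" ]; then
    echo "usage: reload_frontend_cmd <customer-slug>" >&2
    return 1
  fi
  echo "gh workflow run reload_frontend.yml --repo sourcegraph/deploy-sourcegraph-managed -f slug=$slug"
}

# "example-customer" is a placeholder, not a real instance slug.
reload_frontend_cmd example-customer
```

Printing the command first makes it easy to review the slug before anything is dispatched; running the printed command requires an authenticated gh session with access to the repository.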
FAQ: How to use mg cli for Managed Instances operations?
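# Clone the repository that contains the mg CLI
git clone https://github.com/sourcegraph/deploy-sourcegraph-managed
cd deploy-sourcegraph-managed
# Persist the repository path for mg (no $ before the variable name in an export)
echo "export MG_DEPLOY_SOURCEGRAPH_MANAGED_PATH=$(pwd)" >> ~/.bashrc
# Install Go binaries into ~/.bin and add that directory to PATH for future shells
mkdir -p ~/.bin
export GOBIN=$HOME/.bin
echo "export PATH=\$HOME/.bin:\$PATH" >> ~/.bashrc
# Also update PATH in the current shell so mg is found right after installation
export PATH=$HOME/.bin:$PATH
# Build and install mg, then verify the installation
make install
mg --help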