Data sources

Product data

We collect two different levels of product data. The type of data we collect depends on how an instance is hosted and on the customer’s contract.

  • Pings: we collect pings from Sourcegraph cloud, self-hosted, and managed Sourcegraph instances. These pings contain anonymous and aggregated information. Teams must follow specific guidelines to add ping data.
  • User-level data: we collect anonymous, user-level, event stream data from Sourcegraph.com and Sourcegraph managed instances using our event logger. The event stream is data collected at the time the user does something in the product: what they did, when they did it, and what the outcome was. Soon, we’ll be collecting this data from all customers regardless of hosting type, assuming their contract allows it. See the RFC for details on this project and this chart for details on which customers have contractually allowed this.
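
To make the "what, when, and outcome" structure of an event concrete, here is a minimal sketch of what a single event-stream record could look like. The class name, field names, and example values are assumptions made for illustration; they are not the actual schema used by our event logger.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProductEvent:
    """Hypothetical shape of one event-stream record (all field names are assumptions)."""

    anonymous_user_id: str  # anonymous identifier, never a name or email
    action: str             # what the user did
    occurred_at: datetime   # when they did it
    outcome: str            # what the outcome was


def describe(event: ProductEvent) -> str:
    """Render an event as a one-line, human-readable summary."""
    return (
        f"{event.occurred_at.isoformat()} "
        f"{event.anonymous_user_id} {event.action} -> {event.outcome}"
    )


# Illustrative values only; these are not real event names.
example = ProductEvent(
    anonymous_user_id="anon-1234",
    action="search_submitted",
    occurred_at=datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc),
    outcome="results_returned",
)
```

Calling `describe(example)` produces a single line that combines the timestamp, the anonymous ID, the action, and the outcome.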

Some customers have the contractual right to send us no telemetry at all, or a much smaller subset of telemetry (called critical telemetry). You can check the telemetry status of all customer instances here. The options are as follows:

  • No telemetry: This self-hosted customer has an airgapped instance, or has turned pings off. We don’t have any data about this customer’s product usage.
  • Critical pings: This self-hosted customer has elected to send us only the pings that are required for billing, support, updates, and security notices.
  • Full pings: This self-hosted customer sends us all the aggregated, anonymous data we outline here.
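
When filtering customers for an analysis, the three options above can be treated as a simple enumeration. The sketch below is illustrative only: the enum, the helper function, and the customer names are hypothetical, and it assumes that only full pings carry the aggregated usage data needed for product analysis.

```python
from enum import Enum


class TelemetryLevel(Enum):
    """The three telemetry options listed above."""

    NO_TELEMETRY = "no telemetry"
    CRITICAL_PINGS = "critical pings"
    FULL_PINGS = "full pings"


def has_product_usage_data(level: TelemetryLevel) -> bool:
    """Assumption: only full pings include aggregated product usage data."""
    return level is TelemetryLevel.FULL_PINGS


# Hypothetical customer names, for illustration only.
customers = {
    "customer-a": TelemetryLevel.NO_TELEMETRY,
    "customer-b": TelemetryLevel.CRITICAL_PINGS,
    "customer-c": TelemetryLevel.FULL_PINGS,
}

# Keep only the customers whose telemetry level supports usage analysis.
analyzable = [
    name for name, level in customers.items() if has_product_usage_data(level)
]
```

Here `analyzable` contains only the customer that sends full pings.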

Which tool should I use to find product data?

Data tool flowchart

Tool Specific Resources

Onboarding to Looker

Onboarding to Amplitude

Other data sources

We also load data to BigQuery from:

  • Salesforce: Customer Relationship Management system (CRM)
  • Sourcegraph.com Site-admin pages: customer Enterprise subscriptions and license keys
  • Sourcegraph production database: we query a few particular tables from the production database via Terraform to access data for Sourcegraph.com
  • Google Sheets: There are a number of spreadsheets that get loaded into BigQuery
  • Zendesk: customer support ticketing

We largely do this using Fivetran, but also leverage other tools and system-specific integrations (like BigQuery’s gsheets connector) when necessary.

Support Model

Supporting a Self-Service framework for BI and Analytics is imperative to being a data-driven organization.

This framework consists of three principles:

  1. Give the right people the right set of tools and capabilities
  2. Create a collaborative environment
  3. Create a process to govern how this collaboration leads to continuous improvement of our analytics capabilities


In our Self-Service framework, we need to accommodate multiple personas because not all users will want to self-serve the same way.


  • Viewers are interested in standardized reporting on standardized metrics. They want to see how metrics change over time and may want to drill into specific datasets for those metrics. Viewers will request new standard metrics or new filters on current standard reports, as well as new views and reports. Viewers may also want to create views or dashboards for their own personal use.
  • Examples of use cases of a viewer
    • I want to view standard/trusted reporting on multiple topics
    • I want to filter results with a set of standard filters
    • I have a set of metrics that I am measured against and want to track progress of those metrics
    • I want to automate a set of metrics that I report out on regularly


  • Creators are interested in everything a Viewer is, but will also want to create their own views/reports and perform ad hoc analysis. They are interested in both standard metrics and new/undefined metrics. They may be able to write queries directly against a data warehouse. Creators will use our standard tools, but may also use other tools that are not part of the Data and Analytics tech stack.
  • Examples of use cases of a creator
    • I want to create and share views/reports on multiple topics across multiple standard tools
    • I want to perform ad hoc analysis
    • I want to query the data warehouse (DWH) to understand and analyze data quickly
    • I want to automate analysis