Architecture

Product and Sales

image

Visit the live Miro board for links.

Data Sources

This diagram provides a high-level overview of the data sources we have: image

Pings

Pings are aggregated event telemetry we receive from on-prem and cloud instances.

User Events

User events are event telemetry that we receive from cloud instances.

For more info on our data stack tools, see the tools page.

Transcript Events from DotCom Users

For DotCom users we are permitted to store transcript data. To ensure that this sensitive data is handled safely and that access to it is restricted, the following event pipeline has been built on top of the telemetry-v2 architecture; it routes flagged transcript events separately.

Considerations:

  1. Transcript data can only be collected through v2 telemetry and must be stored within the privateMetadata argument of the event
  2. Transcript data should be stored as top-level fields within privateMetadata, using the keys promptText or responseText
  3. Transcript data can only be collected for DotCom (Free) users
  4. Events carrying transcript data must include recordsPrivateMetadataTranscript:1 in the metadata argument of the event
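
The considerations above can be sketched as a small check. This is only an illustration, not the pipeline's actual code: the event is modelled as a plain dict with metadata and privateMetadata keys, and the function name and the is_dotcom_free_user argument are hypothetical. Only promptText, responseText and recordsPrivateMetadataTranscript come from the list above.

```python
from typing import Any

# Keys and flag named in the considerations above.
TRANSCRIPT_KEYS = ("promptText", "responseText")
TRANSCRIPT_FLAG = "recordsPrivateMetadataTranscript"


def is_valid_transcript_event(event: dict[str, Any], is_dotcom_free_user: bool) -> bool:
    """Check an (assumed) event dict against the transcript considerations."""
    # Consideration 3: only DotCom (Free) users may send transcript data.
    if not is_dotcom_free_user:
        return False
    metadata = event.get("metadata") or {}
    private = event.get("privateMetadata") or {}
    # Consideration 4: the flag must be set to 1 in metadata.
    if metadata.get(TRANSCRIPT_FLAG) != 1:
        return False
    # Considerations 1 and 2: transcript text sits in top-level
    # privateMetadata fields, not in nested objects.
    return any(key in private for key in TRANSCRIPT_KEYS)


# Hypothetical event payload, shaped only for this illustration.
example_event = {
    "metadata": {"recordsPrivateMetadataTranscript": 1},
    "privateMetadata": {"promptText": "How do I ...", "responseText": "You can ..."},
}
```

An event that nests promptText inside another object, or that omits the flag, fails this check, which mirrors the requirement that only flagged events with top-level transcript fields are routed as transcript events.
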

Pub/Sub Topic Subscriptions
DataFlow
  • DataFlow Job that runs on the topic subscription event-telemtry-transcript-to-bq to redact transcripts (responseText, promptText)
  • DataFlow UDF that the DataFlow Job references (a custom JavaScript function that runs on each event)
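
To make the redaction step concrete, here is a minimal sketch of the per-event logic. It is written in Python purely for illustration, whereas the real UDF is a JavaScript function. The event shape, the function name and the "[REDACTED]" placeholder are assumptions; the actual job may drop the fields or redact them differently.

```python
import copy
from typing import Any

# Transcript fields named in this section.
REDACTED_KEYS = ("promptText", "responseText")
# Assumption: redaction replaces the text with a fixed placeholder.
PLACEHOLDER = "[REDACTED]"


def redact_transcript(event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the event with transcript fields redacted."""
    redacted = copy.deepcopy(event)  # leave the caller's event untouched
    private = redacted.get("privateMetadata")
    if isinstance(private, dict):
        for key in REDACTED_KEYS:
            if key in private:
                private[key] = PLACEHOLDER
    return redacted
```

Other privateMetadata fields pass through unchanged, so only the transcript text is affected.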

Below is a system diagram to illustrate the flow of transcript data further: image