Code intelligence provides features and services that help developers better understand and navigate code. This page outlines the vision, strategy and goals of the Code intelligence team.
- Code Graph overall strategy
- Product & Engineering strategy
- Code intelligence backlog
- Latest demo
We generate and process rich metadata that powers compiler-accurate code navigation features such as jumping to a symbol’s definition and finding where it’s referenced across repositories. We help make developers’ lives easier by reducing the time needed to navigate and understand codebases. In the future, we also aim to leverage our metadata to power precise searches, code insights, and batch changes.
In the near term we want Code intelligence to provide seamless, out-of-the-box, precise code navigation for languages that cover 90% of market usage. We see Code intelligence as the glue that holds the product together, providing a platform for features ranging from navigation to precise searches, compiler-accurate batch changes, and insights. We aim to support all widely used languages and, for the ones we don’t, to provide a platform where any developer can add and test their own indexers. We want our code navigation to reach IDE feature parity, while offering the option of plugging into developers’ favorite IDEs. In the longer term, we envision building a global knowledge graph that accurately maps the entire code universe.
We target developers independently of their career level and company size, helping them learn, onboard, find, and understand codebases faster.
We prioritize precise language support based on overall usage and market share, while also taking into account our customers’ appetites. Because supporting a new language requires deep knowledge of its ecosystem, our team’s skill set also affects the order in which we add languages. To mitigate the impact of this factor, we’re putting in place the infrastructure needed to accept community contributions in the future.
Code intelligence lies at the very center of the product as a whole, providing the metadata that will eventually power other product areas.
At the moment maturity varies depending on the area of ownership:
Language indexers: Maturity varies widely per language. See LSIF indexers documentation.
Code intelligence platform: We’re currently focusing on delivering an out-of-the-box precise code intelligence solution to a small number of eager customers. This will allow customers to easily set up precise code intelligence for Go, Java, Scala, Kotlin, and soon TS repositories. This feature will also enable automatic indexing of a repository’s dependencies, which means users will now be able to navigate and find references across repositories and dependencies alike.
Code navigation: Baseline features are implemented but have a considerable amount of debt and room for improvement. We’re currently focusing on researching and implementing features that will make the overall navigation experience faster and more intuitive.
In the last few months we’ve:
- Added precise support for the JVM ecosystem (Java, Scala and Kotlin).
- Added search-based support for Apex.
- Shipped critical platform features necessary to set up and run auto-indexing at on-prem customers that use GCP or AWS.
Recent key learnings:
- We learned that LSIF manual setup is non-trivial and requires various levels of team involvement at enterprise customers. There’s an appetite for an out-of-the-box precise code navigation solution.
- Investing in platform performance and scalability is critical for the Global code graph vision to become a reality. Cloud is already at 2TB of data at ~40k indices. The global code graph for Java alone is somewhere between 800k and 6 million indices, so at the same average index size it would need anywhere from 40TB to 300TB of storage.
- Requests for adding precise support for more languages (often including the addition of cross-repository and cross-dependency navigation features).
- The current manual setup is not straightforward, and customers run into issues while setting up LSIF for their repositories. The difficulty varies with indexer maturity and language ecosystem complexity.
- There have been reports about performance and scaling issues when indexing large monorepos.
- Requests for a feature that helps visualize the code graph and its dependencies.
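The storage estimate in the key learnings above can be reproduced with a quick back-of-the-envelope calculation. The sketch below uses only the figures quoted on this page (2TB across ~40k indices on Cloud, and 800k to 6 million indices for Java). It assumes decimal units and that the average index size seen on Cloud carries over to Java indices; both are simplifying assumptions, and the variable and function names are purely illustrative.

```python
# Back-of-the-envelope storage estimate from the figures quoted above.
# Assumptions: decimal units (1 TB = 10**12 bytes) and an average index
# size for Java equal to the current Cloud-wide average.

TB = 10**12
MB = 10**6

cloud_bytes = 2 * TB      # current Cloud data volume
cloud_indices = 40_000    # approximate number of indices on Cloud

# Average size of a single index under these assumptions.
avg_index_bytes = cloud_bytes / cloud_indices


def projected_storage_tb(index_count: int) -> float:
    """Storage in TB needed for index_count indices at the average size."""
    return index_count * avg_index_bytes / TB


low_tb = projected_storage_tb(800_000)      # lower bound for Java
high_tb = projected_storage_tb(6_000_000)   # upper bound for Java

print(f"average index size: {avg_index_bytes / MB:.0f} MB")
print(f"projected range: {low_tb:.0f} TB to {high_tb:.0f} TB")
```

Under these assumptions the average index is about 50MB, which is where the 40TB to 300TB range comes from. Real index sizes vary by language and repository, so this is an order-of-magnitude estimate rather than a capacity plan.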
Code hosts: Both GitHub and GitLab offer navigation features, mostly powered by fuzzy code intelligence.
GitHub has recently released a new version of their Code Search which includes an enhanced version of our search-based code intelligence approach. This means their chosen technical path prioritizes wider language breadth while sacrificing accuracy. However, if GitHub’s Code Search navigation proves to be “good enough” for most use cases, we’ll have to directly compete for adoption and usage.
Our precise code intelligence approach remains our key competitive advantage as it focuses on returning 100% accurate results and lays a solid foundation that enables implementing advanced navigation features like finding implementations and cross-dependency navigation. None of our competitors have mimicked the approach yet.
IDEs: Code navigation is a core part of a developer’s workflow and most IDEs have advanced navigation features we cannot avoid competing with. Most developers expect Sourcegraph’s code navigation to have feature parity with their favorite IDE.
Other navigation tools: These usually focus on a small set of languages. None of them has reached wide usage or competes directly with our current value proposition yet.
These tie back to Product/Engineering OKRs
Objective: Make the power of our features easier to find and use
- KR: Measure and increase WAU for customer-facing product teams by 15%. For most teams this will require understanding and improving time-to-value as a driver for increasing active users.
Objective: Level up our enterprise-ready features
- KR: Enable out-of-the-box cross-repo and dependency navigation for 3 customers by successfully moving them to on-prem auto-indexing.
- KR: Improve precise code intelligence actions coverage by going from 3 to 5 languages (adding Kotlin and TS/JS), representing the languages in ~50% of all actions.
Cross-repository and dependency navigation
We believe this is the global code graph’s killer feature. It elevates the code navigation experience to a new level of cross-project analysis. It includes enabling precise cross-repository navigation and the ability to navigate to any third-party dependency a repository references. We’re solving this initially on Sourcegraph Cloud and plan to replicate the same functionality for on-premise usage.
Auto-indexing on-prem goes into Beta:
The current setup experience does not scale for customers with a large number of repositories. Enabling auto-indexing would mean a lower barrier to entry, a seamless experience, and more engineers using precise code intelligence.
Building the code graph also means we need to generate and store larger amounts of LSIF data, which will require scaling our infrastructure by one to two orders of magnitude. We expect to run into scaling concerns, and we want to be proactive in identifying and removing bottlenecks.
Once we have validated our Alpha solution and have proven it’s running successfully at three customers, the next step is monitoring and weeding out any issues that might arise from our first trials to move on-prem auto-indexing into Beta. At this point we’ll be aiming to roll it out to a larger number of customers.
Ship precise language support
We’ve historically invested in broadening our span of supported languages. This is an ongoing effort that ties directly back to the Global Code Graph vision. We’re currently focusing on shipping Kotlin, Protobuf, and a revamped version of our JS/TS indexer. Our next step is adding auto-indexing and cross-dependency navigation for JS and TS to our on-prem offering.
Improve the user experience of code navigation and code intelligence
Begin measuring and reporting time-to-value as a metric so that we can understand and improve retention, activation, and sign-ups, and know where the high ROI items are for delivering a unified experience in future quarters.
Conduct research that helps us understand adoption drivers and pain points with the aim of identifying concrete improvements to increase discoverability and enhance the navigation experience.
- Adding precise Python support: Given its extensive usage both at customers and in the market, we intend to add support for this language in the near term.
Incremental indexing for large monorepos: When we ship auto-indexing for enterprise instances, we will likely need to solve incremental indexing to support our customers’ monorepos. This feature has been on our mid-term roadmap for quite some time now, but pain points have been worked through using workarounds like spacing LSIF upload frequencies depending on the customer’s repo size and commit frequency. At this point, incremental indexing could become a clear blocker and would be bumped up on our priority list.
Adding C#, Ruby and other precise language support: Based on our team’s bandwidth, skill set, market share and current associated ARR, we are not planning to work on these languages this year. We do, however, intend to add support for them in the medium term.
Scale the C++ code graph: Given the fragmented nature of the C/C++ environment we won’t be investing in improving C/C++ language support or scaling. We do however intend to revisit our solution in the future.