Cody is an AI coding assistant that lives in your editor and can find, explain, and write code. Cody uses a combination of AI (specifically Large Language Models, or LLMs), Sourcegraph search, and Sourcegraph code intelligence to provide answers that eliminate toil and keep human programmers in flow. You can think of Cody as your programmer buddy who has read through all the code on GitHub, all the questions on StackOverflow, and all your organization’s private code, and is always there to answer questions you might have or suggest ways of doing something based on prior knowledge.
There are two ways to use Cody:
- As an individual dev, use Cody with sourcegraph.com and/or Cody App.
- As a Sourcegraph Enterprise user, connect Cody to your Sourcegraph Enterprise instance.
To provide responses to requests, Cody does the following:
- A user asks Cody a question (or to write some code).
- Cody fetches relevant code snippets.
  - Unlike Copilot, Cody knows about entire codebases and fetches snippets directly relevant to you.
  - Sourcegraph uses a combination of code search, the code graph (SCIP), intelligent ranking, and an AI vector database to respond with snippets that are relevant to the user’s request.
- Sourcegraph passes a selection of these results along with the original question to a Large Language Model like Claude or OpenAI’s ChatGPT.
- The Large Language Model uses the contextual info from Sourcegraph to generate a factual answer and sends it to Cody.
- Cody then validates the output of the Large Language Model and sends the answer back to the user.
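The steps above can be sketched in a few lines. This is an illustrative toy only: the function names are hypothetical, and a naive keyword-overlap score stands in for Sourcegraph’s real mix of code search, the code graph, and embeddings.

```python
import re

def retrieve_context(question, corpus, k=2):
    """Rank snippets by naive keyword overlap with the question.
    (Cody itself combines code search, SCIP, ranking, and embeddings.)"""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        corpus.items(),
        key=lambda item: len(q_words & set(re.findall(r"\w+", item[1].lower()))),
        reverse=True,
    )
    return [snippet for _, snippet in scored[:k]]

def build_prompt(question, snippets):
    """Pass the snippets alongside the question so the LLM answers from
    real code instead of guessing."""
    context = "\n---\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {question}"

corpus = {
    "auth.go": "func Login(user string) error",
    "billing.go": "func Charge(amount int) error",
}
question = "How does Login work?"
prompt = build_prompt(question, retrieve_context(question, corpus, k=1))
# The prompt now contains the Login snippet but not the unrelated Charge one.
```

The prompt built this way is what gets sent to the LLM in the final steps; the LLM’s answer is then validated and returned to the user.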
Cody uses a ChatGPT-like model as a component in its architecture (today we use Claude, but we could alternatively use ChatGPT or a similar Large Language Model). ChatGPT cannot search for contextual code snippets and docs, so its knowledge is limited to the open source code it was trained on. It does not know about recent changes to code or your private codebase. Rather than telling you when it doesn’t know, ChatGPT will confidently make up answers that sound correct but are false. The contextual snippets that Cody fetches from Sourcegraph are crucial to enabling Cody to generate factually accurate responses.
Look for the latest competitive landscape information in Highspot.
See here for the access conditions and exception process around customers using Cody.
In general, it’s a good idea to give use cases a try: LLMs are very powerful and generic, and we add new recipes all the time. Here are a few answers:
Cody Enterprise uses the Sourcegraph API to fetch contextual code snippets and docs that are relevant to answering a user’s query, using embeddings and soon other APIs. One way to think about Cody is that it is a natural language layer on top of Sourcegraph that uses many of the same search and code navigation features a human might, and then synthesizes the results from these features into an answer to the user’s question or code-writing request.
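The embeddings part can be illustrated with a toy sketch. Everything here is an assumption for illustration: the real system uses learned vector embeddings from a model (e.g., OpenAI’s), while this example uses a hand-rolled term-frequency vector so it runs anywhere.

```python
import math
import re

def embed(text):
    """Toy 'embedding': a term-frequency vector over word tokens.
    A real embedding model would map text to a dense learned vector."""
    vec = {}
    for tok in re.findall(r"\w+", text.lower()):
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "parser.py": "parse tokens into an abstract syntax tree",
    "cache.py": "cache results in memory with an LRU eviction policy",
}
query = embed("how does the parser build the syntax tree")
best = max(docs, key=lambda name: cosine(query, embed(docs[name])))
# best is the document whose vector is closest to the query
```

The same nearest-vector lookup, done against precomputed embeddings of a whole repository, is what lets a natural-language question land on the right files.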
(As an analogy, consider how ChatGPT is pretty smart on its own for stuff that happened in public in the past, but it doesn’t know about anything in the present or any specifics about stuff from your email or private documents, for example.)
Cody can read and write code in any major programming language. More esoteric languages might not work out of the box.
Yes, Cody can speak many languages, including Spanish, French, German, Italian, Chinese, Japanese, Korean, Latin, and Esperanto.
Can Cody write code referencing other parts of the codebase (Ex. Write a new function calling an existing function in another repo)?
See the CE Demo page
Cody is on by default for all Sourcegraph customers with managed instances. Self-hosted customers are not able to get access to Cody without a special exemption.
Technically it can be done, but due to the complexities involved it is not supported out of the box.
- Anthropic or OpenAI for the LLM
- OpenAI for embeddings
Having code snippets sent to third-party services will be a problem for [customer]. Do we have a plan to address that? What if a customer or prospect is fully air-gapped?
- Short term
  - We don’t have a short-term plan to provide a completely self-hostable version of Cody, because the LLM we use to generate answers is provided by a third party and costs a few million dollars to train.
  - Customers that ask for this will likely recognize that and ask for options. What frequently works is offering to let them use their own OpenAI (ChatGPT and embeddings) or Anthropic contract, which we support (BYO key).
The snippets that are sent are determined using Sourcegraph search, embeddings, or any other method that maximizes the relevance of the context fed to the LLM.
Does Cody train on customer code? How does Cody answer questions about your code if it does not train on customer’s code?
No, Cody doesn’t train on customer code. See docs
Please reference the Cody notice.
Yes. Cody sends code outside of a customer’s network when creating embeddings and during normal use by end users. Our zero-retention agreements with our third-party LLM providers apply to these requests and responses.
We don’t have a GA date set.
Usually people think of embeddings/vector search as complementary to other strategies. While it matches well semantically (“what is this code about, what does it do?”), it drops syntax and other precise matching information. So the results are often combined with exact/precise approaches to get the “best of both worlds”.
Also see AI reading list
See Cody Marketing page for Cody messaging, one-pagers, and other marketing assets.
See this post.
To disable access to Cody, revoke the Anthropic/OpenAI API keys that were provided to the customer and applied to their instance. Once the API key is revoked, the Cody extension becomes unusable.
For example, to revoke an Anthropic API key: from the Anthropic Console, find the API key specific to the customer and disable it.