Tabby (software)
Last reviewed
Jun 4, 2026
Sources
17 citations
Review status
Source-backed
Revision
v1 · 2,122 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 4, 2026
Sources
17 citations
Review status
Source-backed
Revision
v1 · 2,122 words
Add missing citations, update stale details, or suggest a clearer explanation.
Tabby is an open-source, self-hosted AI code generation assistant developed by TabbyML, Inc. It is positioned as a privacy-preserving, on-premises alternative to GitHub Copilot: organizations run the Tabby server on their own hardware so that source code and prompts never leave their infrastructure. The project is written almost entirely in Rust, distributed under the Apache 2.0 license, and provides code completion, an in-IDE chat, repository-aware retrieval, and a knowledge tool the company calls the Answer Engine. Tabby was created in early 2023 by two former Google engineers, Meng Zhang and Lucy Gao, and its GitHub repository (TabbyML/tabby) has grown to roughly 33,500 stars, making it one of the more widely starred open-source coding assistants.
Tabby packages a complete coding-assistant stack into a single self-contained server. Unlike cloud assistants that send code context to a vendor's servers, a Tabby deployment hosts the large language model, indexes a team's repositories, and serves completions and chat entirely within the customer's environment. The server requires no external database or cloud dependency, exposes an OpenAPI interface for integration with existing tooling, and can run on consumer-grade GPUs or, with smaller models, on CPU. Editor support is delivered through extensions for Visual Studio Code, the IntelliJ/JetBrains platform IDEs, and Vim/Neovim.
The product's core appeal is data control. Because everything runs on-premises, an organization can decide exactly which models are used and where its code is processed, which is the main reason Tabby is adopted by teams in regulated sectors such as banking and financial services, the semiconductor industry, and defense.
Tabby was created by Meng Zhang and Lucy Gao, who had previously worked together at Google. Zhang spent more than eight years at Google, with his final four years focused on generative AI models, particularly image generation. Gao spent about four years at Google working on computer vision and deep learning on the same team as Zhang, then joined TikTok in 2020 as one of the first members of its U.S. product team, where she worked on AI-powered content-creation tools, before spending roughly a year and a half as an entrepreneur in residence at a venture firm. The pair reconnected and, around late 2022, began seriously discussing what became TabbyML, motivated by the 2021 launch of GitHub Copilot and the belief that an open, self-hostable stack could make AI-assisted coding accessible to organizations that could not adopt a closed cloud service.
The TabbyML/tabby repository was created on GitHub in March 2023. Zhang publicly introduced the project on Hacker News on April 6, 2023, in a "Show HN: Tabby, a self-hosted GitHub Copilot" post, describing it as a self-hosted Copilot alternative that runs on the user's own hardware. The earliest version was built on the Hugging Face Transformers and Triton FasterTransformer stack and was demonstrated with small open models based on the GPT-J / GPT-NeoX and Salesforce CodeGen families. On August 31, 2023, the team shipped the first stable release, v0.0.1, which stabilized the Tabby API specification as a foundation for further development. By late August 2023 the repository had passed 11,000 GitHub stars.
On October 10, 2023, TabbyML announced a $3.2 million seed round to develop its open-source code assistant. The round was backed by Yunqi Partners and ZooCap; WVV Capital, a venture firm, also disclosed an investment in the company. At the time of the announcement the project had been starred more than 11,000 times on GitHub, having launched publicly earlier in 2023. The company said it would use the capital for product development.
| Round | Date | Amount | Investors |
|---|---|---|---|
| Seed | October 10, 2023 (announced) | $3.2 million | Yunqi Partners, ZooCap; WVV Capital also disclosed as an investor |
Founders Zhang and Gao framed the bet on a future in which most companies want customization and control over AI coding tools, and on enterprise developers who work with proprietary code that a hosted service like Copilot cannot see. The company has cited the broader market trend that AI coding assistants are expected to be used by a large majority of enterprise software engineers by the end of the decade.
Tabby develops in the open with frequent releases on a 0.x version line. Several releases broadened the product from a pure completion server into a team-oriented platform.
| Version | Date | Notable additions |
|---|---|---|
| v0.0.1 | August 31, 2023 | First stable release; stabilized API specification |
| v0.3.0 | 2023 | Retrieval-augmented code completion enabled by default, using local declarations and recently modified code |
| v0.10.0 | April 22, 2024 | Reports tab with team-wise usage analytics |
| v0.11.0 | May 11, 2024 | Storage usage stats, GitHub and GitLab integration, Activities page, "Ask Tabby" |
| v0.12.0 | June 6, 2024 | GitLab SSO, self-hosted GitHub/GitLab, HTTP API model integration, repo context in the code browser |
| v0.13.0 | July 5, 2024 | Answer Engine, a central knowledge engine for internal engineering teams |
By early 2026 the project had reached the v0.3x release series and roughly 33,500 GitHub stars.
Tabby's original and central feature is inline code completion. The server uses code models with native fill-in-the-middle support and augments suggestions with retrieval-augmented generation over the local codebase. Since v0.3.0, RAG-based completion is on by default: Tabby incorporates locally relevant snippets, such as declarations surfaced through the editor's language server and recently modified code, into the model's context. The IDE extensions apply an adaptive caching strategy to keep latency low, and Tabby parses source into Tree-sitter tags to extract structural context.
The Answer Engine, introduced in v0.13.0 (July 2024), is a question-answering tool surfaced on the Tabby web UI and in the IDE. It uses a chat model together with retrieved context to answer questions about a team's own code and documentation. Users can select a repository or reference internal documents with an @ mention, and the engine retrieves the relevant code and documents from its index to ground its answers. Conversations are organized into threads, which are temporary by default but can be made persistent and shared with the team. An optional, beta web-search capability can be enabled with a Serper API key. Tabby also provides an inline chat experience directly in the editor.
Tabby connects to source repositories from Git, GitHub, GitLab, and similar systems. It fetches the codebase (and, depending on configuration, pull/merge requests, issues, and commits), parses code into an abstract syntax tree, and stores it in a local index. Retrieval combines semantic search (embedding the query and searching a vector index) with BM25 keyword search, merging the two result sets with reciprocal rank fusion so that both chat/search and completion can draw on relevant repository context during inference.
Tabby ships official extensions for VS Code (and VSCodium), the IntelliJ platform IDEs (including IntelliJ IDEA, PyCharm, GoLand, WebStorm, PhpStorm, RubyMine, CLion, Rider, and Android Studio), and Vim/Neovim. The extensions communicate with the self-hosted server over its HTTP API.
On July 18, 2025, TabbyML launched Pochi, which it describes as a "full-stack AI teammate," extending the product beyond completion and chat into autonomous, multi-step task handling. Pochi decomposes a task, plans and executes the work across multiple files, integrates with GitHub (issues, pull requests, code review, and CI/lint/test results), and supports repeatable workflows. Pochi runs on Tabby Cloud and is billed on a usage-based model tied to LLM token consumption, with $20 of free credits included each month.
Tabby is written predominantly in Rust (about 93% of the codebase) and runs as a single self-contained binary or Docker container, with no external DBMS or cloud backend required. It can serve models on consumer-grade GPUs, and smaller models can run on CPU, which supports the project's "no GPU required" deployment options. Installation methods include Docker (with CUDA GPU support), Homebrew, Hugging Face Spaces, and one-click deployments on infrastructure platforms.
Tabby separates the roles of completion, chat, and embedding models, and it is compatible with a range of open code models without custom implementation, including StarCoder, Code Llama, CodeGen, DeepSeek Coder, the Qwen and CodeQwen families, and CodeGemma. The project maintains its own model registry of ready-to-run options. In addition to running models directly, Tabby can connect to external inference servers over HTTP, including OpenAI-compatible endpoints, llama.cpp, vLLM, and Ollama, letting operators point completion, chat, and embedding at whichever backend fits their hardware and policy.
The founders deliberately favored compact models in the range of one to three billion parameters, prioritizing low deployment cost and the ability to run on commodity hardware over matching the raw quality of the largest cloud models.
The bulk of Tabby is released under the Apache 2.0 license. The repository carves out an enterprise directory (ee/) governed by a separate license, and third-party components retain their own licenses; everything outside those carve-outs is Apache 2.0. Because of the ee/ exception, GitHub's license detector classifies the repository's overall license as "Other" rather than a clean Apache 2.0 label, even though the core open-source server is Apache 2.0. This open-core arrangement underpins TabbyML's paid Team and Enterprise tiers.
The self-hosted Community edition is free and open source. TabbyML also sells managed and licensed tiers, and the Pochi agent is billed separately on usage.
| Plan | Price | User limit | Highlights |
|---|---|---|---|
| Community | Free | Up to 5 users | Open source, local deployment, code completion, Answer Engine, inline chat, context providers |
| Team | $19 per seat / month | Up to 50 users | Adds code browser, usage reports/analytics, email support, flexible deployment |
| Enterprise | Custom | Unlimited | Adds SSO, authentication domain, telemetry policy enforcement, bespoke support, dedicated Slack channel, roadmap prioritization |
Tabby Cloud, the hosted option, runs the Pochi agent on usage-based billing with $20 of free monthly credits; tab completion is described as always free without usage limits. Pricing figures reported by third-party reviews have varied over time, so the company's published tiers are the authoritative reference.
Tabby competes in the crowded field of open-source and self-hostable coding assistants. Against GitHub Copilot, its pitch is privacy and control: Copilot is generally regarded as higher-polish and is inexpensive per seat, but it routes code context through GitHub's servers, which rules it out for many regulated teams. Against Continue, an open-source IDE extension that ships no backend of its own and instead connects to whatever model server the user supplies, Tabby differentiates by shipping a full server with repository indexing, a web UI, team administration, and SSO, giving a more complete out-of-the-box experience. Against Sourcegraph's Cody, Tabby offers comparable repository-aware retrieval without requiring the Sourcegraph platform; Cody, for its part, moved to an enterprise-only model in 2025.
Independent reviewers in 2026 generally describe Tabby as the strongest drop-in, self-hosted Copilot replacement for teams that must keep code on their own hardware, citing its swappable open models, lack of per-seat cost when self-hosted, and ability to run on modest GPUs. Commonly noted trade-offs are that it requires GPU hardware and technical setup, that suggestion quality depends heavily on the chosen model and hardware (smaller local models produce weaker completions than the largest cloud services), and that the chat experience is less polished than commercial tools such as Copilot or Cursor. The cost argument for self-hosting strengthens as team size grows, since one server can serve an unlimited number of developers.