Tabby (software)

Updated Jul 17, 2026

Tabby is an open-source, self-hosted AI code generation assistant developed by TabbyML, Inc.^[1] It is positioned as a privacy-preserving, on-premises alternative to GitHub Copilot: organizations run the Tabby server on their own hardware so that source code and prompts never leave their infrastructure.^[2] The project is written almost entirely in Rust, distributed under the Apache 2.0 license, and provides code completion, an in-IDE chat, repository-aware retrieval, and a knowledge tool the company calls the Answer Engine.^[1] Tabby was created in early 2023 by two former Google engineers, Meng Zhang and Lucy Gao,^[3] and its GitHub repository (TabbyML/tabby) has grown to roughly 33,500 stars,^[17] making it one of the more widely starred open-source coding assistants.

Overview

Tabby packages a complete coding-assistant stack into a single self-contained server.^[2] Unlike cloud assistants that send code context to a vendor's servers, a Tabby deployment hosts the large language model, indexes a team's repositories, and serves completions and chat entirely within the customer's environment. The server requires no external database or cloud dependency, exposes an OpenAPI interface for integration with existing tooling, and can run on consumer-grade GPUs or, with smaller models, on CPU.^[1] Editor support is delivered through extensions for Visual Studio Code, the IntelliJ/JetBrains platform IDEs, and Vim/Neovim.
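
As a rough illustration of what integrating with the server's HTTP interface might look like, the sketch below builds (but does not send) a completion request. The `/v1/completions` path, the `language` and `segments` payload fields, the bearer-token header, and the `localhost:8080` address are assumptions made for this example rather than details taken from the sources cited here; the OpenAPI document published by a running server is the authoritative reference.

```python
import json
from urllib import request


def build_completion_request(base_url, token, language, prefix, suffix=""):
    """Build (but do not send) a completion request for a Tabby-style server.

    The endpoint path and payload shape are assumptions for illustration.
    """
    payload = {
        "language": language,
        # Code before and after the cursor position.
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical local deployment and placeholder token.
req = build_completion_request(
    "http://localhost:8080", "YOUR_TOKEN", "python", "def fib(n):\n    "
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running server; the sketch stops short of that so it can be read and run without one.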

The product's core appeal is data control. Because everything runs on-premises, an organization can decide exactly which models are used and where its code is processed, which is the main reason Tabby is adopted by teams in regulated sectors such as banking and financial services, the semiconductor industry, and defense.^[2]

History

Founding

Tabby was created by Meng Zhang and Lucy Gao, who had previously worked together at Google.^[3] Zhang spent more than eight years at Google, with his final four years focused on generative AI models, particularly image generation.^[3] Gao spent about four years at Google working on computer vision and deep learning on the same team as Zhang. She joined TikTok in 2020 as one of the first members of its U.S. product team, where she worked on AI-powered content-creation tools, and then spent roughly a year and a half as an entrepreneur in residence at a venture firm.^[4] The pair reconnected and, around late 2022, began seriously discussing what became TabbyML. They were motivated by the 2021 launch of GitHub Copilot and by the belief that an open, self-hostable stack could make AI-assisted coding accessible to organizations that could not adopt a closed cloud service.

The TabbyML/tabby repository was created on GitHub in March 2023.^[1] Zhang publicly introduced the project on Hacker News on April 6, 2023, in a "Show HN: Tabby, a self-hosted GitHub Copilot" post, describing it as a Copilot alternative that runs on the user's own hardware.^[7] The earliest version was built on the Hugging Face Transformers and Triton FasterTransformer stack and was demonstrated with small open models based on the GPT-J / GPT-NeoX and Salesforce CodeGen families.^[7] On August 31, 2023, the team shipped the first stable release, v0.0.1, which stabilized the Tabby API specification as a foundation for further development.^[6] By the time of the seed announcement in October 2023, the repository had passed 11,000 GitHub stars.^[3]

Funding

On October 10, 2023, TabbyML announced a $3.2 million seed round to develop its open-source code assistant.^[3] The round was backed by Yunqi Partners and ZooCap;^[3] WVV Capital, a venture firm, also disclosed an investment in the company.^[5] At the time of the announcement the project had been starred more than 11,000 times on GitHub, having launched publicly earlier in 2023.^[3] The company said it would use the capital for product development.^[3]

Round	Date	Amount	Investors
Seed	October 10, 2023 (announced)	$3.2 million	Yunqi Partners, ZooCap;^[3] WVV Capital also disclosed as an investor^[5]

Founders Zhang and Gao framed the company as a bet on a future in which most companies want customization and control over their AI coding tools, and on enterprise developers who work with proprietary code that a hosted service like Copilot cannot see. The company has also cited forecasts that a large majority of enterprise software engineers will be using AI coding assistants by the end of the decade.

Release timeline

Tabby is developed in the open, with frequent releases on a 0.x version line. Several releases broadened the product from a pure completion server into a team-oriented platform.

Version	Date	Notable additions
v0.0.1	August 31, 2023	First stable release; stabilized API specification^[6]
v0.3.0	2023	Retrieval-augmented code completion enabled by default, using local declarations and recently modified code^[1]
v0.10.0	April 22, 2024	Reports tab with team-wise usage analytics^[1]
v0.11.0	May 11, 2024	Storage usage stats, GitHub and GitLab integration, Activities page, "Ask Tabby"^[1]
v0.12.0	June 6, 2024	GitLab SSO, self-hosted GitHub/GitLab, HTTP API model integration, repo context in the code browser^[1]
v0.13.0	July 5, 2024	Answer Engine, a central knowledge engine for internal engineering teams^[8]

By early 2026 the project had reached the v0.3x release series and roughly 33,500 GitHub stars.^[17]

Features

Code completion

Tabby's original and central feature is inline code completion. The server uses code models with native fill-in-the-middle support and augments suggestions with retrieval-augmented generation over the local codebase. Since v0.3.0, RAG-based completion is on by default: Tabby incorporates locally relevant snippets, such as declarations surfaced through the editor's language server and recently modified code, into the model's context.^[9] The IDE extensions apply an adaptive caching strategy to keep latency low, and Tabby parses source into Tree-sitter tags to extract structural context.
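
The combination of fill-in-the-middle prompting and retrieved context can be sketched as follows. This is a minimal illustration under stated assumptions, not Tabby's actual prompt builder: the `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` tokens are the StarCoder-style template (other model families use different tokens), and the idea of prepending retrieved snippets as comments with a character budget, along with the function and parameter names, is hypothetical.

```python
def build_fim_prompt(prefix, suffix, snippets, comment="#", max_snippet_chars=400):
    """Assemble a fill-in-the-middle prompt with retrieved snippets as comments.

    snippets is a list of (path, source_text) pairs, most relevant first.
    Snippets are added in order until the character budget is exhausted.
    """
    budget = max_snippet_chars
    blocks = []
    for path, body in snippets:
        lines = [f"{comment} Path: {path}"]
        lines += [f"{comment} {line}" for line in body.splitlines()]
        text = "\n".join(lines)
        if len(text) > budget:
            break  # stop once the next snippet no longer fits
        blocks.append(text)
        budget -= len(text)

    context = "\n".join(blocks)
    if context:
        context += "\n"
    # StarCoder-style FIM template (an assumption; templates vary by model).
    return f"<fim_prefix>{context}{prefix}<fim_suffix>{suffix}<fim_middle>"


snippets = [("utils.py", "def add(a, b):\n    return a + b")]
prompt = build_fim_prompt("total = add(", ")\n", snippets)
print(prompt)
```

The model is then expected to generate the text that belongs between the prefix and the suffix, with the commented snippet giving it the signature of `add` as extra context.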

Answer Engine and chat

The Answer Engine, introduced in v0.13.0 (July 2024), is a question-answering tool available in the Tabby web UI and in the IDE.^[8] It uses a chat model together with retrieved context to answer questions about a team's own code and documentation. Users can select a repository or reference internal documents with an @ mention, and the engine retrieves the relevant code and documents from its index to ground its answers.^[9] Conversations are organized into threads, which are temporary by default but can be made persistent and shared with the team. An optional web-search capability, currently in beta, can be enabled with a Serper API key.^[8] Tabby also provides an inline chat experience directly in the editor.

Repository context and indexing

Tabby connects to source repositories from Git, GitHub, GitLab, and similar systems.^[9] It fetches the codebase (and, depending on configuration, pull/merge requests, issues, and commits), parses code into an abstract syntax tree, and stores it in a local index. Retrieval combines semantic search (embedding the query and searching a vector index) with BM25 keyword search, merging the two result sets with reciprocal rank fusion so that both chat/search and completion can draw on relevant repository context during inference.^[9]
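
Reciprocal rank fusion, the merging step named above, can be sketched in a few lines. Each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed. The constant k = 60 is the value conventionally used for this technique and is an assumption here, as are the file names; the cited documentation does not state the parameters Tabby uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking.

    Each list contributes 1 / (k + rank) per document, with ranks starting at 1.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a BM25 keyword search.
semantic = ["auth.rs", "session.rs", "db.rs"]
keyword = ["session.rs", "config.rs", "auth.rs"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['session.rs', 'auth.rs', 'config.rs', 'db.rs']
```

Because the method uses only rank positions, it does not need the embedding similarities and BM25 scores to be on comparable scales, which is the usual reason for choosing it to combine the two kinds of search.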

IDE and editor extensions

Tabby ships official extensions for VS Code (and VSCodium), the IntelliJ platform IDEs (including IntelliJ IDEA, PyCharm, GoLand, WebStorm, PhpStorm, RubyMine, CLion, Rider, and Android Studio), and Vim/Neovim.^[1] The extensions communicate with the self-hosted server over its HTTP API.

Pochi agent

On July 18, 2025, TabbyML launched Pochi, which it describes as a "full-stack AI teammate," extending the product beyond completion and chat into autonomous, multi-step task handling.^[13] Pochi decomposes a task, plans and executes the work across multiple files, integrates with GitHub (issues, pull requests, code review, and CI/lint/test results), and supports repeatable workflows. Pochi runs on Tabby Cloud and is billed on a usage-based model tied to LLM token consumption, with $20 of free credits included each month.^[13]

Technology

Tabby is written predominantly in Rust (about 93% of the codebase) and runs as a single self-contained binary or Docker container, with no external DBMS or cloud backend required.^[1] It can serve models on consumer-grade GPUs, and smaller models can run on CPU, which supports the project's "no GPU required" deployment options. Installation methods include Docker (with CUDA GPU support), Homebrew, Hugging Face Spaces, and one-click deployments on infrastructure platforms.

Tabby separates the roles of completion, chat, and embedding models, and it is compatible with a range of open code models without custom implementation, including StarCoder, Code Llama, CodeGen, DeepSeek Coder, the Qwen and CodeQwen families, and CodeGemma.^[10] The project maintains its own model registry of ready-to-run options. In addition to running models directly, Tabby can connect to external inference servers over HTTP, including OpenAI-compatible endpoints, llama.cpp, vLLM, and Ollama, letting operators point completion, chat, and embedding at whichever backend fits their hardware and policy.^[11]

The founders deliberately favored compact models in the range of one to three billion parameters, prioritizing low deployment cost and the ability to run on commodity hardware over matching the raw quality of the largest cloud models.

Licensing

The bulk of Tabby is released under the Apache 2.0 license.^[1] The repository carves out an enterprise directory (ee/) governed by a separate license, and third-party components retain their own licenses; everything outside those carve-outs is Apache 2.0. Because of the ee/ exception, GitHub's license detector classifies the repository's overall license as "Other" rather than a clean Apache 2.0 label, even though the core open-source server is Apache 2.0.^[17] This open-core arrangement underpins TabbyML's paid Team and Enterprise tiers.

Pricing

The self-hosted Community edition is free and open source.^[14] TabbyML also sells managed and licensed tiers, and the Pochi agent is billed separately on usage.

Plan	Price	User limit	Highlights
Community	Free	Up to 5 users	Open source, local deployment, code completion, Answer Engine, inline chat, context providers^[14]
Team	$19 per seat / month	Up to 50 users	Adds code browser, usage reports/analytics, email support, flexible deployment^[14]
Enterprise	Custom	Unlimited	Adds SSO, authentication domain, telemetry policy enforcement, bespoke support, dedicated Slack channel, roadmap prioritization^[14]

Tabby Cloud, the hosted option, runs the Pochi agent on usage-based billing with $20 of free monthly credits; tab completion is described as always free without usage limits.^[14] Pricing figures reported by third-party reviews have varied over time, so the company's published tiers are the authoritative reference.
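
For a quick sense of how the published tiers scale, the sketch below maps a seat count to the smallest tier in the table above that accommodates it and computes the monthly license cost. It assumes a team always picks the smallest fitting tier, and it ignores hardware, hosting, and Pochi usage charges; the function name is hypothetical.

```python
def monthly_license_cost(seats):
    """Return (plan, monthly cost in USD) for a given number of users.

    Limits and prices follow the tier table: Community is free for up to
    5 users, Team is $19 per seat for up to 50 users, and Enterprise is
    custom-priced (returned as None).
    """
    if seats < 1:
        raise ValueError("seats must be at least 1")
    if seats <= 5:
        return ("Community", 0)
    if seats <= 50:
        return ("Team", 19 * seats)
    return ("Enterprise", None)


for n in (3, 20, 200):
    print(n, monthly_license_cost(n))
```

A 20-person team, for example, would fall into the Team tier at 20 × $19 = $380 per month before infrastructure costs.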

Positioning and reception

Tabby competes in the crowded field of open-source and self-hostable coding assistants. Against GitHub Copilot, its pitch is privacy and control: Copilot is generally regarded as more polished and is inexpensive per seat, but it routes code context through GitHub's servers, which rules it out for many regulated teams.^[16] Continue is an open-source IDE extension that ships no backend of its own and instead connects to whatever model server the user supplies; Tabby differentiates itself by shipping a full server with repository indexing, a web UI, team administration, and SSO, giving a more complete out-of-the-box experience. Against Sourcegraph's Cody, Tabby offers comparable repository-aware retrieval without requiring the Sourcegraph platform; Cody, for its part, moved to an enterprise-only model in 2025.

Independent reviewers in 2026 generally describe Tabby as the strongest drop-in, self-hosted Copilot replacement for teams that must keep code on their own hardware, citing its swappable open models, lack of per-seat cost when self-hosted, and ability to run on modest GPUs.^[15] Commonly noted trade-offs are that it requires GPU hardware and technical setup, that suggestion quality depends heavily on the chosen model and hardware (smaller local models produce weaker completions than the largest cloud services), and that the chat experience is less polished than commercial tools such as Copilot or Cursor.^[16] The cost argument for self-hosting strengthens as team size grows, since a single server can be shared by many developers.

References

[1] "TabbyML/tabby: Self-hosted AI coding assistant." GitHub, accessed 2026-06-04. https://github.com/TabbyML/tabby
[2] "Tabby, Opensource, self-hosted AI coding assistant." TabbyML, accessed 2026-06-04. https://www.tabbyml.com/
[3] "TabbyML, an open source challenger to GitHub Copilot, raises $3.2 million." TechCrunch, 2023-10-10. https://techcrunch.com/2023/10/10/tabbyml-github-copilot-alternative-raises-3-2-million/
[4] "TabbyML raises $3.2M for its open-source AI coding assistant." SiliconANGLE, 2023-10-10. https://siliconangle.com/2023/10/10/tabbyml-raises-3-2m-open-source-ai-coding-assistant/
[5] "WVV Capital's Investment in TabbyML." WVV Capital (Medium), 2023. https://medium.com/@WVVCapital/wvv-capitals-investment-in-tabbyml-842403708e27
[6] "Introducing First Stable Release: v0.0.1." TabbyML, 2023-08-31. https://www.tabbyml.com/blog/first-stable-release
[7] "Show HN: Tabby, a self-hosted GitHub Copilot." Hacker News, 2023-04-06. https://news.ycombinator.com/item?id=35470915
[8] "Answer Engine." Tabby Documentation, accessed 2026-06-04. https://tabby.tabbyml.com/docs/administration/answer-engine/
[9] "Context Providers." Tabby Documentation, accessed 2026-06-04. https://tabby.tabbyml.com/docs/administration/context/
[10] "Model Configuration." Tabby Documentation, accessed 2026-06-04. https://tabby.tabbyml.com/docs/administration/model/
[11] "Models HTTP API: llama.cpp / vLLM / Ollama / OpenAI." Tabby Documentation, accessed 2026-06-04. https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
[12] "Single Sign-On." Tabby Documentation, accessed 2026-06-04. https://tabby.tabbyml.com/docs/administration/sso/
[13] "Agent (Pochi)." TabbyML, accessed 2026-06-04. https://www.tabbyml.com/agent
[14] "Pricing." TabbyML, accessed 2026-06-04. https://www.tabbyml.com/pricing
[15] "TabbyML Review (2026)." MakerStack, 2026. https://makerstack.co/reviews/tabbyml-review/
[16] "Self-Hosted AI Coding Assistants: Copilot Alternatives for Privacy-First Teams (2026)." DanubeData, 2026. https://danubedata.ro/blog/self-host-ai-coding-assistant-2026
[17] "TabbyML/tabby." GitHub REST API (repository metadata), accessed 2026-06-04. https://api.github.com/repos/TabbyML/tabby
