Dev tools
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 3,991 words
AI dev tools are the software products, libraries, and platforms that developers use to build, ship, and operate applications powered by [[artificial intelligence]] models. The category covers everything from AI-native code editors that complete or rewrite code on demand, to inference engines that serve [[large language models]] on GPUs, to vector databases that store embeddings for retrieval, and to observability platforms that monitor what production agents are doing. The space exploded in size and capital between 2023 and 2026 as foundation models from [[OpenAI]], [[Anthropic]], [[Google DeepMind]], [[Meta AI]], and [[Mistral AI]] became capable enough to write production-quality code and orchestrate multi-step tasks. By early 2026 individual companies in the space, such as [[Anysphere]] (the maker of [[Cursor]]), had crossed the US$2 billion annual recurring revenue mark, and the broader landscape supports millions of professional developers across every major language and runtime.
This gateway page collects the dominant tools in each subcategory, with notes on what each is best at, who builds it, and how the pieces fit together. The categories overlap heavily in practice. A typical AI-native team might write code in [[Cursor]], run an autonomous agent like [[Claude Code]] in the terminal for refactors, host a fine-tuned model on [[Together AI]], retrieve context from [[Pinecone]], orchestrate the pipeline with [[LangGraph]], proxy traffic through [[LiteLLM]], and monitor everything with [[Langfuse]] or [[LangSmith]]. Each layer has its own competitive market and its own learning curve.
A new generation of code editors treats the [[language model]] as a first-class citizen rather than a sidecar. These tools index the entire repository, expose tools to the model so it can read and write files and run shell commands, and provide chat or agent interfaces alongside the traditional editor pane.
[[Cursor]] is the best-known example. Built by [[Anysphere]] as a fork of [[Visual Studio Code]], Cursor combines completions, multi-file edits, an agent mode, and a built-in chat that can reason over the open project. The company raised US$900 million in mid-2025 and another US$2.3 billion in November 2025 at a US$29.3 billion valuation, with reported annual recurring revenue above US$2 billion by early 2026. In November 2024 Cursor acquired [[Supermaven]], a low-latency completion startup, and in December 2025 it acquired [[Graphite]], a code-review tool. Notable 2025 features include Bugbot, an automated review agent that flags bugs in [[GitHub]] pull requests, and a visual web designer.
[[Windsurf]] is the AI-native editor from [[Codeium]], launched in late 2024 around an agentic system called Cascade that can plan, edit across files, and run terminal commands. In May 2025 [[OpenAI]] reportedly agreed to acquire Windsurf for about US$3 billion, but the deal collapsed when its exclusivity period expired in July 2025. [[Google]] then struck a separate licensing arrangement that brought Windsurf's chief executive and several researchers to Google, and days later [[Cognition AI]] acquired the remaining company, reportedly for roughly US$250 million.
[[Claude Code]] is [[Anthropic]]'s terminal-native coding agent, launched as a research preview in February 2025. It runs as a CLI inside any project directory and can read, write, and execute code. By 2026 it had gained sub-agent delegation, persistent hierarchical memory, custom slash commands, scheduled routines, and first-class [[Model Context Protocol]] integration.
[[Devin]], from [[Cognition AI]], was unveiled in March 2024 as the self-described "first AI software engineer." Devin runs in a sandboxed environment with a shell, browser, and editor. On the SWE-bench benchmark it resolved 13.86 percent of issues end to end at launch, far above the prior state of the art. Goldman Sachs publicly described Devin as employee number one of its hybrid workforce in 2025.
[[Codex CLI]] is [[OpenAI]]'s open-source terminal coding agent, written in [[Rust]]. It launches in agent mode by default, reads files, runs commands, and writes patches. Codex CLI installs through `npm i -g @openai/codex` or `brew install --cask codex` and is bundled with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It can script repeatable workflows through `codex exec` and connect to external tools through MCP.
[[Aider]] is a long-running open-source AI pair programmer, distributed as a Python CLI under an Apache 2.0 license with around 41,000 stars on [[GitHub]]. It works in any terminal, edits files in place, and commits changes through Git so every step is reviewable.
[[Cline]] is an open-source coding agent that runs as a [[VS Code]] extension. With around 58,600 stars and an Apache 2.0 license, it can create and modify files, execute shell commands, browse the web through a headless browser, and call MCP tools. [[Roo Code]] is a fork of Cline that introduces a multi-mode workflow with separate Code, Architect, Ask, and Debug personas.
[[Continue]] is a popular open-source IDE extension for [[VS Code]] and [[JetBrains]] editors. It has around 31,000 stars and lets users plug in any model provider. [[Augment Code]] is a coding agent for large enterprise codebases; its proprietary Context Engine semantically indexes hundreds of thousands of files and is also exposed through [[Model Context Protocol]] so agents like Cursor, Claude Code, and Codex can borrow its retrieval.
[[Zed]] is a high-performance multiplayer code editor from the creators of [[Atom]] and [[Tree-sitter]]. Zed open-sourced its codebase in January 2024. Zed AI bundles inline transformations, an agentic editing mode, and Zeta2, an open-source model that predicts the user's next edit. The editor offers real-time collaborative editing built on conflict-free replicated data types.
[[Replit Agent]] is the autonomous build-from-prompt agent inside the [[Replit]] cloud development platform. The agent provisions environments, installs dependencies, writes code, manages databases, deploys applications, and verifies its own work in a headless browser. Replit Agent 3, launched in 2025, runs independent tasks in parallel.
Where AI-native editors rebuild the editor around the model, code assistants slot into existing IDEs through extensions and focus mainly on completions, chat, and inline edits.
[[GitHub Copilot]] is the largest code assistant by paid seats. It started as a completion tool for [[VS Code]] in 2021, expanded into chat, and during 2025 added agent mode and an asynchronous coding agent that runs background tasks through [[GitHub Actions]] and submits results as pull requests. The 2025 generation of Copilot supports multi-file edits, model selection between [[GPT]], [[Claude]], and [[Gemini]] backends, next edit suggestions, custom agents defined in `.github/agents/`, and inline agent mode in JetBrains, Eclipse, and Xcode IDEs. Copilot's coding agent now reviews its own changes, runs code scanning, secret scanning, and dependency vulnerability checks before opening pull requests.
[[Codeium]] (rebranded as Windsurf for the editor product) offers a free completion and chat assistant for almost every IDE, monetized mainly through enterprise self-hosted deployments. The Windsurf Editor is the company's flagship paid product.
[[Tabnine]] focuses on the enterprise market with a hybrid local and cloud architecture. The local model handles short completions on-device, while a cloud model handles longer completions and chat. Tabnine offers air-gapped on-premises deployments, custom model training on private codebases, and strict privacy controls aimed at regulated industries.
[[Cody]], built by [[Sourcegraph]], leans on Sourcegraph's universal code search index to give the model deep, repository-wide context. Cody offers chat, inline edits, and prompt recipes, and uses third-party LLMs from [[OpenAI]], [[Anthropic]], and others rather than a proprietary model.
[[Supermaven]] is a low-latency completion engine that pioneered very long context windows for code completion. Anysphere acquired Supermaven in November 2024 and folded the team into Cursor.
Application frameworks help developers compose calls to LLMs into pipelines, agents, and retrieval workflows without writing the plumbing each time.
[[LangChain]] is the most widely adopted framework. It provides abstractions for prompt templates, output parsers, memory, retrieval, and tool use, and integrates with more than fifty LLM providers and over a hundred vector stores. It is often paired with [[LangGraph]] for stateful workflows and [[LangSmith]] for tracing.
[[LangGraph]] is LangChain's stateful orchestration library. It expresses an agent as a directed graph of nodes and edges, where nodes run tool or model calls and edges determine state transitions. This pattern suits long-running agents, human-in-the-loop workflows, and pipelines where deterministic control flow matters.
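A minimal sketch of the node-and-edge pattern, written in plain Python rather than against LangGraph's own API. The `MiniGraph` class, the `END` sentinel, and the `draft`/`review` nodes are hypothetical names chosen for illustration: each node returns a partial state update, and a conditional edge inspects the merged state to decide whether to loop or stop.

```python
from typing import Any, Callable, Dict

END = "__end__"  # hypothetical sentinel node name that stops execution
State = Dict[str, Any]


class MiniGraph:
    """Toy graph executor: nodes return partial state updates, edges choose the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.edges: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = lambda _state: dst  # unconditional transition

    def add_conditional_edge(self, src: str, router: Callable[[State], str]) -> None:
        self.edges[src] = router  # transition decided from the current state

    def run(self, start: str, state: State, max_steps: int = 50) -> State:
        node = start
        for _ in range(max_steps):
            state = {**state, **self.nodes[node](state)}  # merge the node's update
            node = self.edges[node](state)
            if node == END:
                return state
        raise RuntimeError("step limit reached before END")


# Hypothetical two-node loop: draft, then review until approved.
def draft(state: State) -> State:
    attempt = state["attempts"] + 1
    return {"attempts": attempt, "text": f"draft v{attempt}"}


def review(state: State) -> State:
    return {"approved": state["attempts"] >= 2}


graph = MiniGraph()
graph.add_node("draft", draft)
graph.add_node("review", review)
graph.add_edge("draft", "review")
graph.add_conditional_edge("review", lambda s: END if s["approved"] else "draft")
final = graph.run("draft", {"attempts": 0, "approved": False})
```

The `max_steps` guard reflects a general concern with cyclic agent graphs: a loop that never reaches its exit condition should fail loudly rather than run forever.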
[[LlamaIndex]] specializes in retrieval-augmented generation. It provides data connectors for hundreds of sources, document parsers, indexing strategies, and query engines, and is widely used as the data ingestion layer in production [[RAG]] systems.
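Ingestion pipelines typically split documents into overlapping chunks before embedding them, so that a fact straddling a boundary still appears whole in at least one chunk. The sketch below assumes whitespace-separated words as the unit; LlamaIndex's own splitters are token- and sentence-aware, and `chunk_words` is a hypothetical helper rather than part of its API.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(20))
chunks = chunk_words(document, chunk_size=8, overlap=2)
# Three chunks: w0..w7, w6..w13, w12..w19
```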
[[Haystack]], maintained by [[deepset]], is an open-source Python framework for production LLM applications. Haystack 2.0 reorganized the framework around composable pipelines and agent workflows. It integrates with [[OpenAI]], [[Anthropic]], [[Mistral]], [[Hugging Face]], [[Weaviate]], [[Pinecone]], [[Elasticsearch]], and many others.
[[AutoGen]], originally developed at [[Microsoft Research]], focuses on multi-agent conversations. Agents converse with each other and with humans to solve tasks, each specialized for a role such as planner, coder, or critic.
[[CrewAI]] adopts a role-based model inspired by real-world teams. Developers define a crew of agents with role descriptions, goals, and tools, and CrewAI orchestrates them through sequential or parallel processes.
[[DSPy]], from [[Stanford NLP]], reframes prompting as program optimization. Developers declare modules with input and output signatures, and DSPy automatically optimizes prompts and few-shot examples against labeled data. It is most useful when teams have evaluation datasets.
[[Smol Agents]] (smolagents) is [[Hugging Face]]'s minimalist agent library and the successor to transformers.agents. The core agent logic fits in roughly a thousand lines of code. Smol Agents emphasizes Code Agents, where the agent writes and executes Python rather than emitting JSON tool calls, an approach Hugging Face reports cuts steps and LLM calls by about thirty percent. It supports sandboxed execution through Docker, E2B, Modal, Pyodide and Deno WebAssembly, or Blaxel.
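A rough illustration of the difference between JSON tool calls and code actions, assuming a hypothetical `get_temperature` tool with canned data. It does not use the smolagents API, and plain `exec()` with trimmed builtins is not a security boundary, which is why the library offers the sandboxed executors listed above.

```python
import json
from typing import Any, Dict


# Hypothetical tool with canned data.
def get_temperature(city: str) -> float:
    return {"Paris": 18.0, "Oslo": 7.5}[city]


TOOLS = {"get_temperature": get_temperature}


def run_json_tool_call(message: str) -> Any:
    """JSON style: the model emits one structured call per step."""
    call = json.loads(message)
    return TOOLS[call["name"]](**call["arguments"])


def run_code_action(code: str) -> Any:
    """Code style: the model emits Python that can call several tools and combine results.

    exec() with trimmed builtins is NOT a sandbox; real deployments isolate this step.
    """
    namespace: Dict[str, Any] = {"__builtins__": {"round": round}, **TOOLS}
    exec(code, namespace)
    return namespace["result"]


# Comparing two cities takes two JSON tool calls (two model steps)...
paris = run_json_tool_call('{"name": "get_temperature", "arguments": {"city": "Paris"}}')
oslo = run_json_tool_call('{"name": "get_temperature", "arguments": {"city": "Oslo"}}')

# ...but a single code action.
diff = run_code_action("result = round(get_temperature('Paris') - get_temperature('Oslo'), 1)")
```

Collapsing several tool calls and the arithmetic that combines them into one action is the intuition behind the reported reduction in steps.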
Inference engines turn raw model weights into running services that answer requests fast enough for production use. The choice of engine has large effects on throughput, latency, hardware utilization, and operational complexity.
[[vLLM]] is the production default for many teams. It uses [[PagedAttention]] to keep GPU memory packed and continuous batching to keep accelerators busy, delivering high throughput with low time to first token and low inter-token latency. vLLM ships an [[OpenAI]]-compatible HTTP server and runs on [[NVIDIA]] [[CUDA]], [[AMD]] [[ROCm]], and other backends.
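The bookkeeping idea behind PagedAttention can be sketched without any GPU code: the KV cache is carved into fixed-size blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks allocated on demand. `PagedKVAllocator` is a hypothetical, simplified class, not vLLM's implementation, which also handles block sharing, copy-on-write, and preemption.

```python
from typing import Dict, List, Tuple


class PagedKVAllocator:
    """Bookkeeping for a paged KV cache: fixed-size blocks plus a per-sequence block table."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[str, List[int]] = {}  # sequence id -> physical block ids
        self.lengths: Dict[str, int] = {}  # sequence id -> tokens stored

    def append_token(self, seq_id: str) -> None:
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("out of KV blocks; a scheduler would preempt a sequence")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def slot(self, seq_id: str, position: int) -> Tuple[int, int]:
        """Translate a logical token position into (physical block, offset)."""
        table = self.block_tables[seq_id]
        return table[position // self.block_size], position % self.block_size

    def free(self, seq_id: str) -> None:
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


kv = PagedKVAllocator(num_blocks=4, block_size=16)
for _ in range(20):
    kv.append_token("a")  # 20 tokens -> 2 blocks
for _ in range(5):
    kv.append_token("b")  # 5 tokens -> 1 block
```

Because a sequence only wastes the unused tail of its last block, fragmentation is bounded by the block size rather than by a worst-case maximum sequence length.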
[[SGLang]] shines on workloads with shared context such as chat with long histories, [[RAG]], and agent loops. Its [[RadixAttention]] feature caches the computation for shared prompt prefixes across requests, and some benchmarks have shown SGLang outperforming vLLM by roughly thirty percent on shared-context workloads, although results vary with model, hardware, and release.
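Prefix reuse can be illustrated with a plain token-level trie. SGLang's RadixAttention uses a compressed radix tree with least-recently-used eviction; the `PrefixCache` class and the token ids below are hypothetical and only show how a shared system prompt avoids recomputation.

```python
from typing import Dict, List


class PrefixCache:
    """Token-level trie standing in for a radix tree of cached KV prefixes."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def insert(self, tokens: List[int]) -> None:
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def match(self, tokens: List[int]) -> int:
        """Length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node:
                break
            node, matched = node[tok], matched + 1
        return matched


system_prompt = [101, 7, 8, 9, 10]  # hypothetical token ids shared by all requests
first = system_prompt + [42, 43]
second = system_prompt + [77]

prefixes = PrefixCache()
prefixes.insert(first)  # KV entries for `first` are now cached
reused = prefixes.match(second)  # 5 tokens served from cache
to_prefill = len(second) - reused  # only 1 new token needs computing
```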
[[Hugging Face Text Generation Inference]] (TGI) is the inference server [[Hugging Face]] maintains. TGI emphasizes ease of deployment, broad model support, and strong tooling integration with the rest of the Hugging Face ecosystem.
[[llama.cpp]] is the CPU-first, portable inference engine famous for running [[Llama]] and many open models on laptops, phones, and embedded devices. It supports [[CUDA]], [[ROCm]], [[Metal]], [[Vulkan]], [[SYCL]], pure CPU, and [[WebAssembly]] backends, and is the default backend for many quantized [[GGUF]] workflows.
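GGUF files commonly store weights in block-quantized formats. The sketch below follows the general shape of the 8-bit `Q8_0` scheme, assuming blocks of 32 weights that share one scale equal to the block's largest magnitude divided by 127. It keeps the scale as a Python float, whereas llama.cpp stores it as a 16-bit float, giving 34 bytes per 32 weights.

```python
import math
from typing import List, Tuple

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values

Block = Tuple[float, List[int]]  # (scale, int8 codes)


def quantize_q8_0(weights: List[float]) -> List[Block]:
    blocks: List[Block] = []
    for i in range(0, len(weights), BLOCK):
        chunk = weights[i:i + BLOCK]
        amax = max(abs(w) for w in chunk)
        scale = amax / 127 if amax > 0 else 1.0  # largest magnitude maps to +/-127
        blocks.append((scale, [round(w / scale) for w in chunk]))
    return blocks


def dequantize(blocks: List[Block]) -> List[float]:
    return [scale * q for scale, codes in blocks for q in codes]


weights = [math.sin(i) for i in range(64)]
blocks = quantize_q8_0(weights)
restored = dequantize(blocks)
```

Each restored weight differs from the original by at most half its block's scale, which is why a per-block scale tracks local weight magnitudes better than a single scale for the whole tensor.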
[[MLX]] is [[Apple]]'s open-source array framework optimized for [[Apple Silicon]]. MLX leverages Metal 4 Tensor Operations and the Neural Accelerators introduced in the M5 chip to deliver up to four times faster time to first token than an M4 baseline. [[Ollama]] on Apple Silicon is now built on top of MLX.
[[TensorRT-LLM]] is [[NVIDIA]]'s production inference library for [[NVIDIA GPUs]]. It compiles models to highly optimized engines and supports in-flight batching and speculative decoding. [[Apple]] collaborated with NVIDIA to integrate ReDrafter, an RNN-based speculative decoding approach that can generate up to 3.5 tokens per generation step on open models.
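A minimal sketch of greedy speculative decoding, with two hypothetical toy functions standing in for a small draft model and a large target model. This is the basic draft-then-verify scheme, not ReDrafter itself, which uses an RNN draft head and beam search; with greedy verification the output matches what the target model would produce on its own.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy "model": context -> next token id


def speculative_step(ctx: List[int], draft: NextToken, target: NextToken, k: int = 4) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens accepted this round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    d_ctx = list(ctx)
    for _ in range(k):
        tok = draft(d_ctx)
        proposal.append(tok)
        d_ctx.append(tok)
    # 2. The target model checks each position (a single batched pass in real engines).
    accepted: List[int] = []
    t_ctx = list(ctx)
    for tok in proposal:
        expected = target(t_ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
        t_ctx.append(tok)
    accepted.append(target(t_ctx))  # every draft matched: the target adds one bonus token
    return accepted


def generate(ctx: List[int], draft: NextToken, target: NextToken, n: int, k: int = 4) -> List[int]:
    out = list(ctx)
    while len(out) - len(ctx) < n:
        out.extend(speculative_step(out, draft, target, k))
    return out[len(ctx):len(ctx) + n]


# Toy models: the target counts mod 10; the draft agrees except right after a 5.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


tokens = generate([0], draft_model, target_model, n=8)
```

The speedup comes from the target scoring all k drafted positions in one forward pass, so a good draft model yields several tokens per expensive step.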
[[Ollama]] is the developer-experience champion for running models locally. It bundles model weights, a server, and a CLI into a single install. Ollama is widely used for prototyping, though it is not designed for high-concurrency production traffic. [[LM Studio]] offers a similar local experience with a graphical UI.
Vector databases store the embeddings used in [[RAG]], semantic search, recommendation, and many agent memory patterns. Different products optimize for different points on the cost, scale, and operational-complexity curve.
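Every vector database ultimately answers the same query: which stored embeddings are closest to this one? The exact, brute-force version below uses cosine similarity over hypothetical three-dimensional embeddings. It is the baseline that approximate indexes such as HNSW, IVF, and DiskANN trade a little recall to beat, since a full scan costs time proportional to the number of vectors times their dimension.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exact nearest neighbours: score every stored vector, keep the k best."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional embeddings; real ones have hundreds or thousands of dimensions.
index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "stocks": [0.0, 0.2, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], index, k=2)
```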
[[Pinecone]] is a fully managed serverless vector database. It handles scaling, replication, and indexing automatically, supports billions of vectors, and is widely used by teams that want to build applications rather than operate databases.
[[Weaviate]] is an open-source vector database with strong support for hybrid search that mixes BM25 keyword scoring with vector similarity. It integrates pre-trained text and image vectorizers and is available self-hosted or as Weaviate Cloud Service.
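One common way to combine keyword and vector results is to rescale each score list to a shared range and blend them with a weight. The sketch assumes min-max normalization and an `alpha` weight in which 1 means pure vector search and 0 pure keyword search. Weaviate's fusion algorithms differ in detail, and the scores below are made-up inputs rather than real BM25 output.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid(bm25: Dict[str, float], vector: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """alpha = 1 gives a pure vector ranking, alpha = 0 a pure keyword ranking."""
    kw, vec = min_max(bm25), min_max(vector)
    return {doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
            for doc in set(kw) | set(vec)}


def ranking(scores: Dict[str, float]) -> List[str]:
    return sorted(scores, key=scores.get, reverse=True)


bm25_scores = {"a": 12.0, "b": 3.0, "c": 0.0}  # hypothetical keyword scores
vector_scores = {"a": 0.2, "b": 0.9, "c": 0.8}  # hypothetical cosine similarities
balanced = ranking(hybrid(bm25_scores, vector_scores, alpha=0.5))
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities stay within [-1, 1]; blending them unscaled would let the keyword side dominate.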
[[Qdrant]] is an open-source vector search engine written in [[Rust]]. It supports billions of vectors with rich payload filtering, scalar quantization, and binary quantization, and offers self-hosted and cloud deployments.
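Binary quantization can be sketched in a few lines. Each dimension collapses to one sign bit, candidates are shortlisted by Hamming distance, and the shortlist is rescored with the original vectors. The `oversample` parameter and the helper names are hypothetical. Qdrant documents comparable oversampling and rescoring settings, but this is not its API.

```python
from typing import Dict, List


def binarize(vec: List[float]) -> int:
    """Pack one bit per dimension: 1 where the component is positive."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query: List[float], vectors: Dict[str, List[float]], k: int = 1, oversample: int = 2) -> List[str]:
    q_bits = binarize(query)
    codes = {name: binarize(v) for name, v in vectors.items()}
    # Cheap pass over 1-bit codes: shortlist k * oversample candidates by Hamming distance.
    shortlist = sorted(codes, key=lambda name: hamming(q_bits, codes[name]))[: k * oversample]
    # Rescore the shortlist with the original full-precision vectors.
    return sorted(shortlist, key=lambda name: dot(query, vectors[name]), reverse=True)[:k]


vectors = {  # hypothetical stored embeddings
    "x": [0.9, -0.2, 0.4, -0.7],
    "y": [0.1, -0.1, 0.2, -0.1],
    "z": [-0.5, 0.6, -0.3, 0.8],
}
query = [0.8, -0.3, 0.5, -0.6]
best = search(query, vectors)
```

Here `x` and `y` share identical sign patterns, so the bit codes alone cannot separate them; the full-precision rescoring step does. The 1-bit codes use 32 times less space than 32-bit floats, which is the main appeal.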
[[Chroma]] is a developer-first open-source embedding database focused on simplicity. It runs in process, supports persistence to disk, and exposes a small Python API that pairs cleanly with [[LangChain]] and [[LlamaIndex]].
[[Milvus]], from [[Zilliz]], emphasizes raw performance and indexing flexibility, supporting [[HNSW]], [[IVF]], [[DiskANN]], and other algorithms. The hosted Zilliz Cloud product extends Milvus with managed scaling.
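[[pgvector]] is a [[PostgreSQL]] extension that adds vector columns plus exact and approximate nearest neighbor search (through HNSW and IVFFlat indexes) to a standard relational database. Practitioners commonly report that it works well up to roughly 10 to 100 million vectors, depending on hardware, dimensionality, and index settings, before performance degrades.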
[[LanceDB]] is a serverless vector database built on the [[Lance]] columnar format. It targets multimodal data and runs both as an embedded library and as a managed service.
LLM applications are non-deterministic, expensive, and easy to regress, so observability and evaluation tools have become essential to running them in production.
[[LangSmith]] is the commercial observability and evaluation product from the LangChain team. It captures traces, supports human and automated evaluations, and offers prompt management and dataset tools. It works with any provider.
[[Langfuse]] is an open-source ([[MIT]] license) observability platform. Since version 3 its backend pairs a [[PostgreSQL]] database with [[ClickHouse]] for trace storage. It provides detailed tracing for complex workflows, prompt management, and dataset-driven evaluation, and is popular with teams that self-host.
[[Helicone]] takes a one-line proxy approach: developers point their LLM client at the Helicone endpoint and get traces, caching, rate limiting, and cost analytics. Helicone is open source and can be self-hosted with [[Docker]] or [[Kubernetes]]. Its backend uses [[ClickHouse]] and [[Kafka]].
[[Weights & Biases]] [[Weave]] is the LLM-focused product line from the W&B suite, building on the ML experiment-tracking heritage of the parent platform.
[[Braintrust]] specializes in evaluations with an optimized trace search database, widely used by teams that want to run large evaluation sweeps before shipping prompt or model changes.
[[Promptfoo]] is an open-source CLI and library for prompt evaluation, red-teaming, and security testing. It compares outputs across more than fifty providers and is reportedly used by [[OpenAI]], [[Anthropic]], and 156 of the [[Fortune 500]].
[[Inspect AI]] is an open-source evaluation framework developed by the [[UK AI Security Institute]] and Meridian Labs. It includes more than 200 prebuilt evaluations covering coding, agentic tasks, reasoning, knowledge, behavior, and multimodal understanding.
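Evaluation tools such as Inspect, Promptfoo, and Braintrust share a common shape: a dataset of inputs and expected targets, a solver that produces an output, and a scorer that grades it. The harness below is a generic sketch of that loop in plain Python, not any of those tools' APIs; `stub_solver` is a hypothetical stand-in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    input: str
    target: str


def exact_match(output: str, target: str) -> bool:
    return output.strip().lower() == target.strip().lower()


def evaluate(dataset: List[Sample], solver: Callable[[str], str],
             scorer: Callable[[str, str], bool] = exact_match) -> float:
    """Run each sample through the solver and return the fraction scored correct."""
    results = [scorer(solver(sample.input), sample.target) for sample in dataset]
    return sum(results) / len(results)


# Hypothetical stub standing in for a model call.
KNOWN = {"France": "Paris", "Japan": "Tokyo"}


def stub_solver(prompt: str) -> str:
    country = prompt.rstrip("?").split()[-1]
    return KNOWN.get(country, "unknown")


dataset = [
    Sample("What is the capital of France?", "Paris"),
    Sample("What is the capital of Japan?", "Tokyo"),
    Sample("What is the capital of Peru?", "Lima"),
]
accuracy = evaluate(dataset, stub_solver)  # 2 of 3 correct
```

Swapping in a different solver (a new prompt or model) while holding the dataset and scorer fixed is what makes before-and-after comparisons meaningful.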
Model platforms host model weights, expose them as APIs, and often provide fine-tuning, hardware, and deployment tooling on top.
[[Hugging Face]] is the central hub for open-source machine learning. The Hub hosts hundreds of thousands of models across every domain, plus datasets and Spaces for hosted demos. Hugging Face Inference Providers unify fifteen-plus partners (including [[Replicate]], [[Together AI]], [[Fireworks AI]], and [[SambaNova]]) under a single OpenAI-compatible endpoint.
[[Replicate]] is an API-first serverless inference platform. It excels at generative AI models, especially niche image, video, and audio models, and lets developers run models without provisioning hardware. For text models of 7B parameters and below, Replicate typically averages 2 to 5 seconds of cold-start latency, which suits batch and asynchronous work but not always low-latency chat.
[[Together AI]] focuses on production deployments of open-source models. It offers tuned versions of [[Llama]], [[Mixtral]], [[Qwen]], and other popular families, with optimized infrastructure and fine-tuning options for chat, image, code, and audio models.
[[Fireworks AI]] targets low-latency, high-throughput inference for production. It is one of the fastest hosting options for major open-source models and offers fine-tuning, but typically requires higher minimum spend than the alternatives.
[[Anyscale]], the company behind [[Ray]], offers managed Ray clusters and Ray-based serving for both training and inference at scale. It is favored by teams running custom ML workloads alongside LLM inference.
[[OpenRouter]] is a unified gateway and marketplace for hosted models. It exposes virtually every commercial and open model behind a single OpenAI-compatible API, with provider routing, automatic fallback, model discovery, and consolidated billing. OpenRouter also offers a Free Models Router that selects free models at random and an Auto Router that picks a provider based on the features the request requires.
Gateway and proxy tools sit between applications and model providers to add routing, caching, cost control, and policy enforcement.
[[LiteLLM]], from [[BerriAI]], is the dominant open-source AI gateway. It provides a [[Python]] SDK and a self-hostable proxy server that exposes more than 100 LLM providers behind an OpenAI-compatible interface. LiteLLM adds cost tracking, guardrails, load balancing, retries, fallbacks, virtual API keys, and an admin UI. It is the default choice for teams that want self-hosted routing inside their own infrastructure.
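The retry-then-fallback behavior that gateways provide can be sketched in plain Python. This is not LiteLLM's API: the provider callables, `ProviderError`, and `complete_with_fallback` are hypothetical, and a real gateway would also distinguish retryable errors (rate limits, timeouts) from permanent ones (invalid requests).

```python
import time
from typing import Callable, List, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, call function)


class ProviderError(Exception):
    pass


def complete_with_fallback(prompt: str, providers: List[Provider],
                           retries: int = 2, backoff_s: float = 0.0) -> Tuple[str, str]:
    """Try each provider in order, retrying failures, and return (provider name, text)."""
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between tries
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical providers: the primary is rate limited, the backup answers.
def primary(prompt: str) -> str:
    raise ProviderError("rate limited")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_fallback("hi", [("primary", primary), ("backup", backup)])
```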
[[Helicone]] (also covered under observability) doubles as a gateway when used as a proxy, offering caching and rate limiting alongside tracing.
[[Portkey]] is a managed AI gateway that emphasizes production features such as semantic caching, observability, prompt management, and policy controls. Users commonly cite 30 to 50 percent reductions in LLM costs from semantic caching alone, although actual savings depend on how often similar requests repeat, and the gateway is popular at startups where LLM spend has become significant.
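Semantic caching returns a stored answer when a new prompt is close enough in embedding space to one seen before. The sketch substitutes a bag-of-words vector for a real embedding model and assumes a similarity threshold of 0.8; both are illustrative choices, not Portkey's behavior, and `fake_llm` is a hypothetical stand-in for a model call.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: bag of lowercase words (a real cache would call an embedding model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get_or_call(self, prompt: str, llm: Callable[[str], str]) -> str:
        query = embed(prompt)
        best_score, best_answer = 0.0, None  # type: float, Optional[str]
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            return best_answer  # cache hit: no model call
        answer = llm(prompt)
        self.entries.append((query, answer))
        return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = SemanticCache(threshold=0.8)
a1 = cache.get_or_call("What is the capital of France?", fake_llm)
a2 = cache.get_or_call("what is the capital of france", fake_llm)  # hit
a3 = cache.get_or_call("How do I reverse a list in Python?", fake_llm)  # miss
```

The threshold is the key design choice: set it too low and the cache returns answers to questions that are similar but not equivalent.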
[[Kong AI Gateway]] extends the [[Kong]] API gateway with LLM-specific features such as prompt templating, request transformation, and provider routing for teams that already standardize on Kong.
Notebook environments and IDEs have a rich plugin ecosystem so that AI features can be added on top of familiar surfaces.
[[Jupyter AI]] is the official open-source [[JupyterLab]] extension for AI agents. It provides a chat interface and supports [[Claude]], [[Codex]], [[GitHub Copilot]], [[Gemini]], [[Goose]], [[Kiro]], Mistral Vibe, and [[OpenCode]] through the [[Agent Client Protocol]]. Agents can read and write files, run terminal commands, and interact with notebooks through a built-in [[MCP]] server. It also supports [[IPython]], [[Colab]], and [[VS Code]] through %%ai magic commands.
[[Notebook Intelligence]] is an extensible AI framework for [[JupyterLab]] that uses [[GitHub Copilot]] under the hood. [[Colab AI]] is the [[Google Colab]] integration that surfaces [[Gemini]] inside the hosted notebook product, with code generation, code explanation, and error help. Cursor extensions, JetBrains Copilot plugins, and community VS Code extensions round out the IDE plugin space.
The [[Model Context Protocol]] (MCP) is an open standard introduced by [[Anthropic]] in November 2024 that defines how AI applications connect to data sources and tools. MCP separates the model client from the systems it acts on through MCP servers, which expose tools, resources, and prompts over a structured JSON-RPC channel.
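The shape of MCP's JSON-RPC traffic can be sketched with a toy in-process handler for the `tools/list` and `tools/call` methods. It assumes a single hypothetical `add` tool and omits the `initialize` handshake, capability negotiation, notifications, and the stdio or HTTP transport a real server needs; production servers would normally use one of the official MCP SDKs.

```python
import json
from typing import Any, Dict


# Hypothetical tool exposed by this toy server.
def add(a: int, b: int) -> int:
    return a + b


TOOLS: Dict[str, Dict[str, Any]] = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "fn": add,
    }
}


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request using MCP-style tools/list and tools/call methods."""
    req: Dict[str, Any] = json.loads(raw)
    method, rid = req.get("method"), req.get("id")
    if method == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"],
                             "inputSchema": t["inputSchema"]} for n, t in TOOLS.items()]}
    elif method == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        value = tool["fn"](**req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}], "isError": False}
    else:  # standard JSON-RPC "method not found" error
        return json.dumps({"jsonrpc": "2.0", "id": rid,
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": rid, "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}
response = json.loads(handle(json.dumps(request)))
```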
The ecosystem grew rapidly. By late 2025 there were more than ten thousand active public MCP servers covering developer tools, productivity apps, databases, and Fortune 500 systems. Anthropic shipped reference servers for [[Google Drive]], [[Slack]], [[GitHub]], [[Git]], [[Postgres]], and [[Puppeteer]], and the community built thousands more.
Early adopters included [[Block]] and [[Apollo]], with developer-tool companies such as [[Zed]], [[Replit]], [[Codeium]], and [[Sourcegraph]] integrating MCP into their products. Major AI providers, including [[OpenAI]] and [[Google DeepMind]], adopted the protocol during 2025, making it the default integration channel for agent applications across vendors.
In December 2025 [[Anthropic]] donated MCP to the [[Agentic AI Foundation]] (AAIF), a directed fund under the [[Linux Foundation]] co-founded by [[Anthropic]], [[Block]], and [[OpenAI]]. The donation was intended to make MCP a vendor-neutral standard with broad governance.
The following table summarizes representative tools in each subcategory.
| Category | Representative tools | Typical use |
|---|---|---|
| AI-native editors | [[Cursor]], [[Windsurf]], [[Zed]], [[Replit Agent]] | Day-to-day coding with model in the loop |
| CLI and autonomous coding agents | [[Claude Code]], [[Codex CLI]], [[Aider]], [[Devin]] | Long-running automated edits and refactors |
| IDE coding agents | [[Cline]], [[Roo Code]], [[Continue]], [[Augment Code]] | Agents and assistants inside [[VS Code]] and [[JetBrains]] IDEs |
| Code assistants | [[GitHub Copilot]], [[Codeium]], [[Tabnine]], [[Cody]], [[Supermaven]] | Completions and chat in any IDE |
| App frameworks | [[LangChain]], [[LangGraph]], [[LlamaIndex]], [[Haystack]], [[AutoGen]], [[CrewAI]], [[DSPy]], [[Smol Agents]] | Building agents and RAG pipelines |
| Inference engines | [[vLLM]], [[SGLang]], [[TGI]], [[llama.cpp]], [[MLX]], [[TensorRT-LLM]], [[Ollama]], [[LM Studio]] | Serving open-source models |
| Vector databases | [[Pinecone]], [[Weaviate]], [[Qdrant]], [[Chroma]], [[Milvus]], [[pgvector]], [[LanceDB]] | Embedding storage and retrieval |
| Observability | [[LangSmith]], [[Langfuse]], [[Helicone]], [[Weights & Biases]] [[Weave]], [[Braintrust]], [[Promptfoo]], [[Inspect AI]] | Tracing, evaluation, and monitoring |
| Model platforms | [[Hugging Face]], [[Replicate]], [[Together AI]], [[Fireworks AI]], [[Anyscale]], [[OpenRouter]] | Hosted inference and fine-tuning |
| AI gateways | [[LiteLLM]], [[Helicone]], [[Portkey]], [[Kong AI Gateway]] | Routing, caching, cost control |
| Notebook plugins | [[Jupyter AI]], [[Notebook Intelligence]], [[Colab AI]] | AI assistance inside notebooks |
| Protocol | [[Model Context Protocol]] | Standard for tool and data integration |
Most production AI systems combine tools across categories. A retrieval-augmented chat application might ingest documents through [[LlamaIndex]], embed them with an [[OpenAI]] or [[Cohere]] embedding model, store them in [[Pinecone]] or [[pgvector]], orchestrate the chat flow through [[LangGraph]], call models through [[LiteLLM]] for cost control, and trace everything to [[Langfuse]] or [[LangSmith]]. A coding agent product might be built on top of [[Claude]] or [[GPT]] APIs, use [[MCP]] servers to access GitHub, files, and the shell, route through [[OpenRouter]], and run evaluations with [[Inspect AI]] or [[Promptfoo]] before each release.
The boundaries blur quickly. [[Helicone]] is both a gateway and an observability platform. [[Hugging Face]] is a model hub, an inference provider, and a framework author through [[Smol Agents]]. [[Cursor]] is an editor that has acquired both a completion engine ([[Supermaven]]) and a code-review tool ([[Graphite]]).
Many of the tools above ship in both open-source and commercial forms. Open-source options such as [[LangChain]], [[LlamaIndex]], [[vLLM]], [[Aider]], [[Cline]], [[Continue]], [[Langfuse]], [[Promptfoo]], and [[LiteLLM]] give teams full control, the ability to self-host, and predictable costs that scale with hardware rather than usage. Commercial options such as [[Cursor]], [[GitHub Copilot]], [[Devin]], [[Pinecone]], [[LangSmith]], and [[Together AI]] offer turnkey experiences, hosted infrastructure, vendor support, and faster onboarding at the cost of usage-based fees and vendor dependence.
A common pattern is to prototype with managed services, then migrate hot paths to self-hosted alternatives once cost or latency justifies the operational burden. A team might start on [[Pinecone]], then move to self-hosted [[Qdrant]] or [[Weaviate]] when monthly spend crosses a threshold, while continuing to use [[OpenAI]] or [[Anthropic]] APIs for the highest-quality model calls.