Dev tools
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 ยท 3,991 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 ยท 3,991 words
Add missing citations, update stale details, or suggest a clearer explanation.
AI dev tools are the software products, libraries, and platforms that developers use to build, ship, and operate applications powered by artificial intelligence models. The category covers everything from AI-native code editors that complete or rewrite code on demand, to inference engines that serve large language models on GPUs, to vector databases that store embeddings for retrieval, and to observability platforms that monitor what production agents are doing. The space exploded in size and capital between 2023 and 2026 as foundation models from OpenAI, Anthropic, Google DeepMind, Meta AI, and Mistral AI became capable enough to write production-quality code and orchestrate multi-step tasks. By early 2026 individual companies in the space, such as Anysphere (the maker of Cursor), had crossed the US$2 billion annual recurring revenue mark, and the broader landscape supports millions of professional developers across every major language and runtime.
This gateway page collects the dominant tools in each subcategory, with notes on what each is best at, who builds it, and how the pieces fit together. The categories overlap heavily in practice. A typical AI-native team might write code in Cursor, run an autonomous agent like Claude Code in the terminal for refactors, host a fine-tuned model on Together AI, retrieve context from Pinecone, orchestrate the pipeline with LangGraph, proxy traffic through LiteLLM, and monitor everything with Langfuse or LangSmith. Each layer has its own competitive market and its own learning curve.
A new generation of code editors treats the language model as a first-class citizen rather than a sidecar. These tools index the entire repository, expose tools to the model so it can read and write files and run shells, and provide chat or agent interfaces alongside the traditional editor pane.
Cursor is the best-known example. Built by Anysphere as a fork of Visual Studio Code, Cursor combines completions, multi-file edits, an agent mode, and a built-in chat that can reason over the open project. The company raised US$900 million in mid-2025 and another US$2.3 billion in November 2025 at a US$29.3 billion valuation, with reported annual recurring revenue above US$2 billion by early 2026. In November 2024 Cursor acquired Supermaven, a low-latency completion startup, and in December 2025 it acquired Graphite, a code-review tool. Notable 2025 features include Bugbot, a debugging agent that integrates with GitHub, and a visual web designer.
Windsurf is the AI-native editor from Codeium, launched in late 2024 around an agentic system called Cascade that can plan, edit across files, and run terminal commands. In May 2025 OpenAI announced a US$3 billion deal to acquire Windsurf. The exclusivity expired in July 2025 and the deal collapsed. In December 2025 Cognition AI acquired Windsurf for roughly US$250 million, with Google taking a separate licensing arrangement.
Claude Code is Anthropic's terminal-native coding agent, launched as a research preview in February 2025. It runs as a CLI inside any project directory and can read, write, and execute code. By 2026 it had grown sub-agent delegation, persistent hierarchical memory, custom slash commands, scheduled routines, and first-class Model Context Protocol integration.
Devin, from Cognition AI, was unveiled in March 2024 as the self-described "first AI software engineer." Devin runs in a sandboxed environment with a shell, browser, and editor. On the SWE-bench benchmark it resolved 13.86 percent of issues end to end at launch, far above the prior state of the art. Goldman Sachs publicly described Devin as employee number one of its hybrid workforce in 2025.
Codex CLI is OpenAI's open-source terminal coding agent, written in Rust. It launches in agent mode by default, reads files, runs commands, and writes patches. Codex CLI installs through npm i -g @openai/codex or brew install --cask codex and is bundled with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It can script repeatable workflows through codex exec and connect to external tools through MCP.
Aider is a long-running open-source AI pair programmer, distributed as a Python CLI under an Apache 2.0 license with around 41,000 stars on GitHub. It works in any terminal, edits files in place, and commits changes through Git so every step is reviewable.
Cline is an open-source coding agent that runs as a VS Code extension. With around 58,600 stars and an Apache 2.0 license, it can create and modify files, execute shell commands, browse the web through a headless browser, and call MCP tools. Roo Code is a fork of Cline that introduces a multi-mode workflow with separate Code, Architect, Ask, and Debug personas.
Continue is a popular open-source IDE extension for VS Code and JetBrains editors. It has around 31,000 stars and lets users plug in any model provider. Augment Code is a coding agent for large enterprise codebases; its proprietary Context Engine semantically indexes hundreds of thousands of files and is also exposed through Model Context Protocol so agents like Cursor, Claude Code, and Codex can borrow its retrieval.
Zed is a high-performance multiplayer code editor from the creators of Atom and Tree-sitter. Zed open-sourced its codebase in January 2024. Zed AI bundles an in-line transformation feature, an agentic editing mode, and Zeta2, an open-source prediction model for next-token suggestions. The editor offers real-time collaborative editing built on conflict-free replicated data types.
Replit Agent is the autonomous build-from-prompt agent inside the Replit cloud development platform. Agent provisions environments, installs dependencies, writes code, manages databases, deploys applications, and verifies its own work in a headless browser. Replit Agent 3, launched in 2025, runs independent tasks in parallel.
Where AI-native editors rebuild the editor around the model, code assistants slot into existing IDEs through extensions and focus mainly on completions, chat, and inline edits.
GitHub Copilot is the largest code assistant by paid seats. It started as a completion tool for VS Code in 2021, expanded into chat, and during 2025 added agent mode and an asynchronous coding agent that runs background tasks through GitHub Actions and submits results as pull requests. The 2025 generation of Copilot supports multi-file edits, model selection between GPT, Claude, and Gemini backends, next edit suggestions, custom agents defined in .github/agents/, and inline agent mode in JetBrains, Eclipse, and Xcode IDEs. Copilot's coding agent now reviews its own changes, runs code scanning, secret scanning, and dependency vulnerability checks before opening pull requests.
Codeium (rebranded as Windsurf for the editor product) offers a free completion and chat assistant for almost every IDE, monetized mainly through enterprise self-hosted deployments. The Windsurf Editor is the company's flagship paid product.
Tabnine focuses on the enterprise market with a hybrid local and cloud architecture. The local model handles short completions on-device, while a cloud model handles longer completions and chat. Tabnine offers air-gapped on-premises deployments, custom model training on private codebases, and strict privacy controls aimed at regulated industries.
Cody, built by Sourcegraph, leans on Sourcegraph's universal code search index to give the model deep, repository-wide context. Cody offers chat, inline edits, and prompt recipes, and uses third-party LLMs from OpenAI, Anthropic, and others rather than a proprietary model.
Supermaven is a low-latency completion engine that pioneered very long context windows for code completion. Anysphere acquired Supermaven in November 2024 and folded the team into Cursor.
Application frameworks help developers compose calls to LLMs into pipelines, agents, and retrieval workflows without writing the plumbing each time.
LangChain is the most widely adopted framework. It provides abstractions for prompt templates, output parsers, memory, retrieval, and tool use, and integrates with more than fifty LLM providers and over a hundred vector stores. It is often paired with LangGraph for stateful workflows and LangSmith for tracing.
LangGraph is LangChain's stateful orchestration library. It expresses an agent as a directed graph of nodes and edges, where nodes run tool or model calls and edges determine state transitions. This pattern suits long-running agents, human-in-the-loop workflows, and pipelines where deterministic control flow matters.
LlamaIndex specializes in retrieval-augmented generation. It provides data connectors for hundreds of sources, document parsers, indexing strategies, and query engines, and is widely used as the data ingestion layer in production RAG systems.
Haystack, maintained by deepset, is an open-source Python framework for production LLM applications. Haystack 2.0 reorganized the framework around composable pipelines and agent workflows. It integrates with OpenAI, Anthropic, Mistral, Hugging Face, Weaviate, Pinecone, Elasticsearch, and many others.
AutoGen, originally developed at Microsoft Research, focuses on multi-agent conversations. Agents converse with each other and with humans to solve tasks, each specialized for a role such as planner, coder, or critic.
CrewAI adopts a role-based model inspired by real-world teams. Developers define a crew of agents with role descriptions, goals, and tools, and CrewAI orchestrates them through sequential or parallel processes.
DSPy, from Stanford NLP, reframes prompting as program optimization. Developers declare modules with input and output signatures, and DSPy automatically optimizes prompts and few-shot examples against labeled data. It is most useful when teams have evaluation datasets.
Smol Agents (smolagents) is Hugging Face's minimalist agent library and the successor to transformers.agents. The core agent logic fits in roughly a thousand lines of code. Smol Agents emphasizes Code Agents, where the agent writes and executes Python rather than emitting JSON tool calls, an approach Hugging Face reports cuts steps and LLM calls by about thirty percent. It supports sandboxed execution through Docker, E2B, Modal, Pyodide and Deno WebAssembly, or Blaxel.
Inference engines turn raw model weights into running services that answer requests fast enough for production use. The choice of engine has large effects on throughput, latency, hardware utilization, and operational complexity.
vLLM is the production default for many teams. It uses PagedAttention to keep GPU memory packed and continuous batching to keep accelerators busy, delivering near-instant time to first token and low inter-token latency. vLLM ships an OpenAI-compatible HTTP server and runs on NVIDIA CUDA, AMD ROCm, and other backends.
SGLang shines on workloads with shared context such as chat with long histories, RAG, and agent loops. Its RadixAttention feature caches shared prefix computation across requests, and benchmarks have shown SGLang outperforming vLLM by roughly thirty percent on shared-context workloads.
Hugging Face Text Generation Inference (TGI) is the inference server Hugging Face maintains. TGI emphasizes ease of deployment, broad model support, and strong tooling integration with the rest of the Hugging Face ecosystem.
llama.cpp is the CPU-first, portable inference engine famous for running Llama and many open models on laptops, phones, and embedded devices. It supports CUDA, ROCm, Metal, Vulkan, SYCL, pure CPU, and WebAssembly backends, and is the default backend for many quantized GGUF workflows.
MLX is Apple's open-source array framework optimized for Apple Silicon. MLX leverages Metal 4 Tensor Operations and the Neural Accelerators introduced in the M5 chip to deliver up to four times the time-to-first-token of an M4 baseline. Ollama on Apple Silicon is now built on top of MLX.
TensorRT-LLM is NVIDIA's production inference library for NVIDIA GPUs. It compiles models to highly optimized engines and supports in-flight batching and speculative decoding. Apple collaborated with NVIDIA to integrate ReDrafter, an RNN-based speculative decoding approach, achieving up to 3.5 extra tokens per generation step on open models.
Ollama is the developer-experience champion for running models locally. It bundles model weights, a server, and a CLI into a single install. Ollama is widely used for prototyping, though it is not designed for high-concurrency production traffic. LM Studio offers a similar local experience with a graphical UI.
Vector databases store the embeddings used in RAG, semantic search, recommendation, and many agent memory patterns. Different products optimize for different points on the cost, scale, and operational-complexity curve.
Pinecone is a fully managed serverless vector database. It handles scaling, replication, and indexing automatically, supports billions of vectors, and is widely used by teams that want to build applications rather than operate databases.
Weaviate is an open-source vector database with strong support for hybrid search that mixes BM25 keyword scoring with vector similarity. It integrates pre-trained text and image vectorizers and is available self-hosted or as Weaviate Cloud Service.
Qdrant is an open-source vector search engine written in Rust. It supports billions of vectors with rich payload filtering, scalar quantization, and binary quantization, and offers self-hosted and cloud deployments.
Chroma is a developer-first open-source embedding database focused on simplicity. It runs in process, supports persistence to disk, and exposes a small Python API that pairs cleanly with LangChain and LlamaIndex.
Milvus, from Zilliz, emphasizes raw performance and indexing flexibility, supporting HNSW, IVF, DiskANN, and other algorithms. The hosted Zilliz Cloud product extends Milvus with managed scaling.
pgvector is a PostgreSQL extension that adds vector columns and approximate nearest neighbor search to a standard relational database. It works well up to roughly 10 to 100 million vectors before performance degrades.
LanceDB is a serverless vector database built on the Lance columnar format. It targets multimodal data and runs both as an embedded library and as a managed service.
LLM applications are non-deterministic, expensive, and easy to regress, so observability and evaluation tools have become essential to running them in production.
LangSmith is the commercial observability and evaluation product from the LangChain team. It captures traces, supports human and automated evaluations, and offers prompt management and dataset tools. It works with any provider.
Langfuse is an open-source (MIT license) observability platform that uses a centralized PostgreSQL backend. It provides detailed tracing for complex workflows, prompt management, and dataset-driven evaluation, and is popular with self-hosted teams.
Helicone takes a one-line proxy approach: developers point their LLM client at the Helicone endpoint and get traces, caching, rate limiting, and cost analytics. Helicone is open source and can be self-hosted with Docker or Kubernetes. Its backend uses ClickHouse and Kafka.
Weights & Biases Weave is the LLM-focused product line from the W&B suite, building on the ML experiment-tracking heritage of the parent platform.
Braintrust specializes in evaluations with an optimized trace search database, widely used by teams that want to run large evaluation sweeps before shipping prompt or model changes.
Promptfoo is an open-source CLI and library for prompt evaluation, red-teaming, and security testing. It compares outputs across more than fifty providers and is reportedly used by OpenAI, Anthropic, and 156 of the Fortune 500.
Inspect AI is an open-source evaluation framework developed by the UK AI Security Institute and Meridian Labs. It includes more than 200 prebuilt evaluations covering coding, agentic tasks, reasoning, knowledge, behavior, and multimodal understanding.
Model platforms host model weights, expose them as APIs, and often provide fine-tuning, hardware, and deployment tooling on top.
Hugging Face is the central hub for open-source machine learning. The Hub hosts hundreds of thousands of models across every domain, plus datasets and Spaces for hosted demos. Hugging Face Inference Providers unify fifteen-plus partners (including Replicate, Together AI, Fireworks AI, and SambaNova) under a single OpenAI-compatible endpoint.
Replicate is an API-first serverless inference platform. It excels at generative AI models, especially niche image, video, and audio models, and lets developers run models without provisioning hardware. For text models 7B and below, Replicate typically averages 2 to 5 seconds of cold latency, suitable for batch and async work but not always for low-latency chat.
Together AI focuses on production deployments of open-source models. It offers tuned versions of Llama, Mixtral, Qwen, and other popular families, with optimized infrastructure and fine-tuning options for chat, image, code, and audio models.
Fireworks AI targets low-latency, high-throughput inference for production. It is one of the fastest hosting options for major open-source models and offers fine-tuning, but typically requires higher minimum spend than the alternatives.
Anyscale, the company behind Ray, offers managed Ray clusters and Ray-based serving for both training and inference at scale. It is favored by teams running custom ML workloads alongside LLM inference.
OpenRouter is a unified gateway and marketplace for hosted models. It exposes virtually every commercial and open model behind a single OpenAI-compatible API, with provider routing, automatic fallback, model discovery, and consolidated billing. OpenRouter also offers a Free Models Router that selects free models at random and an Auto Router that picks a provider based on the features the request requires.
Gateway and proxy tools sit between applications and model providers to add routing, caching, cost control, and policy enforcement.
LiteLLM, from BerriAI, is the dominant open-source AI gateway. It provides a Python SDK and a self-hostable proxy server that exposes more than 100 LLM providers behind an OpenAI-compatible interface. LiteLLM adds cost tracking, guardrails, load balancing, retries, fallbacks, virtual API keys, and an admin UI. It is the default choice for teams that want self-hosted routing inside their own infrastructure.
Helicone (also covered under observability) doubles as a gateway when used as a proxy, offering caching and rate limiting alongside tracing.
Portkey is a managed AI gateway that emphasizes production features such as semantic caching, observability, prompt management, and policy controls. Companies using Portkey commonly report 30 to 50 percent reductions in LLM costs from semantic caching alone, making it popular at startups where LLM spend has become significant.
Kong AI Gateway extends the Kong API gateway with LLM-specific features such as prompt templating, request transformation, and provider routing for teams that already standardize on Kong.
Notebook environments and IDEs have a rich plugin ecosystem so that AI features can be added on top of familiar surfaces.
Jupyter AI is the official open-source JupyterLab extension for AI agents. It provides a chat interface and supports Claude, Codex, GitHub Copilot, Gemini, Goose, Kiro, Mistral Vibe, and OpenCode through the Agent Client Protocol. Agents can read and write files, run terminal commands, and interact with notebooks through a built-in MCP server. It also supports IPython, Colab, and VS Code through %%ai magic commands.
Notebook Intelligence is an extensible AI framework for JupyterLab that uses GitHub Copilot under the hood. Colab AI is the Google Colab integration that surfaces Gemini inside the hosted notebook product, with code generation, code explanation, and error help. Cursor extensions, JetBrains Copilot plugins, and community VS Code extensions round out the IDE plugin space.
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that defines how AI applications connect to data sources and tools. MCP separates the model client from the systems it acts on through MCP servers, which expose tools, resources, and prompts over a structured JSON-RPC channel.
The ecosystem grew rapidly. By late 2025 there were more than ten thousand active public MCP servers covering developer tools, productivity apps, databases, and Fortune 500 systems. Anthropic shipped reference servers for Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer, and the community built thousands more.
Early adopters included Block and Apollo, with developer-tool companies such as Zed, Replit, Codeium, and Sourcegraph integrating MCP into their products. Major AI providers, including OpenAI and Google DeepMind, adopted the protocol during 2025, making it the default integration channel for agent applications across vendors.
In December 2025 Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI. The donation was intended to make MCP a vendor-neutral standard with broad governance.
The following table summarizes representative tools in each subcategory.
| Category | Representative tools | Typical use |
|---|---|---|
| AI-native editors | Cursor, Windsurf, Zed, Replit Agent | Day-to-day coding with model in the loop |
| CLI coding agents | Claude Code, Codex CLI, Aider, Devin | Long-running automated edits and refactors |
| IDE coding agents | Cline, Roo Code, Continue, Augment Code | Open-source agents inside VS Code |
| Code assistants | GitHub Copilot, Codeium, Tabnine, Cody, Supermaven | Completions and chat in any IDE |
| App frameworks | LangChain, LangGraph, LlamaIndex, Haystack, AutoGen, CrewAI, DSPy, Smol Agents | Building agents and RAG pipelines |
| Inference engines | vLLM, SGLang, TGI, llama.cpp, MLX, TensorRT-LLM, Ollama, LM Studio | Serving open-source models |
| Vector databases | Pinecone, Weaviate, Qdrant, Chroma, Milvus, pgvector, LanceDB | Embedding storage and retrieval |
| Observability | LangSmith, Langfuse, Helicone, Weights & Biases Weave, Braintrust, Promptfoo, Inspect AI | Tracing, evaluation, and monitoring |
| Model platforms | Hugging Face, Replicate, Together AI, Fireworks AI, Anyscale, OpenRouter | Hosted inference and fine-tuning |
| AI gateways | LiteLLM, Helicone, Portkey, Kong AI Gateway | Routing, caching, cost control |
| Notebook plugins | Jupyter AI, Notebook Intelligence, Colab AI | AI assistance inside notebooks |
| Protocol | Model Context Protocol | Standard for tool and data integration |
Most production AI systems combine tools across categories. A retrieval-augmented chat application might ingest documents through LlamaIndex, embed them with an OpenAI or Cohere embedding model, store them in Pinecone or pgvector, orchestrate the chat flow through LangGraph, call models through LiteLLM for cost control, and trace everything to Langfuse or LangSmith. A coding agent product might be built on top of Claude or GPT APIs, use MCP servers to access GitHub, files, and the shell, route through OpenRouter, and run evaluations with Inspect AI or Promptfoo before each release.
The boundaries blur quickly. Helicone is both a gateway and an observability platform. Hugging Face is a model hub, an inference provider, and a framework author through Smol Agents. Cursor is an editor that has acquired both a completion engine (Supermaven) and a code-review tool (Graphite).
Many of the tools above ship in both open-source and commercial forms. Open-source options such as LangChain, LlamaIndex, vLLM, Aider, Cline, Continue, Langfuse, Promptfoo, and LiteLLM give teams full control, the ability to self-host, and predictable costs that scale with hardware rather than usage. Commercial options such as Cursor, GitHub Copilot, Devin, Pinecone, LangSmith, and Together AI offer turnkey experiences, hosted infrastructure, vendor support, and faster onboarding at the cost of usage-based fees and vendor dependence.
A common pattern is to prototype with managed services, then migrate hot paths to self-hosted alternatives once cost or latency justifies the operational burden. A team might start on Pinecone, then move to self-hosted Qdrant or Weaviate when monthly spend crosses a threshold, while continuing to use OpenAI or Anthropic APIs for the highest-quality model calls.