Vibe coding is the practice of building software by describing what you want in natural language and letting an AI write the code. The term was coined by Andrej Karpathy, a co-founder of OpenAI, in a post on X dated February 2, 2025. He wrote that "there's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Karpathy demoed the workflow inside Cursor Composer with Claude Sonnet, dictating to the IDE through the SuperWhisper voice tool. Collins Dictionary selected vibe coding as its Word of the Year for 2025, defining it as the use of artificial intelligence prompted by natural language to write computer code.
The tips below collect community knowledge on how to make this style work without leaving a trail of broken commits behind you. They cover planning, code structure, task execution, debugging, security, and the specific quirks of popular tools.
Origin and current state of the term
Karpathy's post was, in his own words, "a shower of thoughts throwaway tweet." The original framing described throwaway weekend projects where you stop reading the diffs and trust the model. By 2026 the everyday usage covers any AI-assisted coding session, including production work that does require reading every diff. "Pure" vibe coding (no review, no tests, accept everything) is fine for a weekend prototype and dangerous for anything else. The advice below assumes you care whether the code works.
Planning and preparation
Requirements, not solutions
- Concept: LLMs fill unspecified gaps with guesses from their training data, which might not align with your needs.
- Tip: Spell out your requirements before asking an LLM to code. For example, a request for "a visualization" may come back as a static SVG, while adding the word "interactive" may get you a React app instead. Clear specs prevent chaos for AI and human teams alike.
Preparatory refactoring
- Concept: Refactoring first makes changes easier, but LLMs tend to mash everything together without a plan.
- Tip: Refactor your code before tasking the LLM with new features. Without guidance, an LLM might fix an import error but also add unrelated (sometimes wrong) type annotations. Divide and conquer is key. Avoid letting the AI rewrite everything at once.
Mise en place
- Concept: A broken setup derails LLMs, much like missing ingredients ruin a recipe.
- Tip: Set up your environment (rules, MCP servers, docs) beforehand. Vibe coding against a live production database is, as one practitioner put it, like randomly copy-pasting Stack Overflow commands; disaster awaits.
Walking skeleton
- Concept: A minimal end-to-end system helps identify next steps early.
- Tip: Build a basic working system first, then flesh it out. LLMs shine here, letting you see what's next at every step. Get the request, response, and a single test path working before you ask the model for features.
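A walking skeleton can be very small. The sketch below assumes a hypothetical handle function and a hypothetical /health route (neither comes from a real project): one request path, one response, and one test that exercises them end to end.

```python
import json


def handle(method: str, path: str) -> tuple[int, str]:
    """Hypothetical single route: GET /health returns a JSON status body."""
    if method == "GET" and path == "/health":
        return 200, json.dumps({"status": "ok"})
    # Everything else is a 404 until a feature is added on purpose.
    return 404, json.dumps({"error": "not found"})


def test_health_path() -> None:
    """The single test path: request in, response out, checked end to end."""
    status, body = handle("GET", "/health")
    assert status == 200
    assert json.loads(body) == {"status": "ok"}


test_health_path()
```

Once this passes, each feature request to the model can be phrased as "add one route and one test" against code that already runs.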
Use plan mode before you generate code
- Concept: In agentic IDEs the model will happily start editing files before you have agreed on what should change.
- Tip: In Claude Code, press Shift+Tab twice to enter Plan Mode, where the agent reads and reasons but cannot write to disk. Cursor and Windsurf have similar planning toggles. Have the agent return a numbered list of files it intends to touch, then approve, push back, or rewrite the plan before letting it generate any code. The Product Talk "vibe coding doom loop" usually starts with someone skipping this step.
Code structure and management
Keep files small
- Concept: Large files overwhelm LLM context windows and slow down agentic tools.
- Tip: Split big files (such as 471 KB) into smaller modules. Some models struggle with a single 128 KB file even when their nominal context limit is 200 K tokens. Applying dozens of edits to a 64 KB file in older Cursor builds also creates noticeable lag. LLMs are good at the import-shuffling drudgery that splitting creates.
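One way to find candidates for splitting is a quick size audit. This is a minimal sketch: the function name, the 64 KB default (borrowed from the lag example above), and the list of suffixes are all assumptions to adjust for your repository.

```python
from pathlib import Path


def oversized_files(
    root: str,
    limit_bytes: int = 64 * 1024,  # assumed threshold, not a hard rule
    suffixes: tuple[str, ...] = (".py", ".ts", ".tsx", ".js"),
) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for source files over the limit, largest first."""
    base = Path(root)
    hits = []
    for path in base.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((path.relative_to(base).as_posix(), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling oversized_files(".") from the repository root gives a list you can hand to the agent one file at a time.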
Use static types
- Concept: Static types catch errors LLMs might miss, though Python and JavaScript dominate their training.
- Tip: Use strict type checker settings (for example TypeScript with strict: true, or mypy) to guide LLMs via type errors. Type errors give the agent a deterministic signal to iterate against, which is more reliable than re-reading prompts.
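As a small illustration in Python, the sketch below uses a hypothetical Invoice type. The annotations are what a checker such as mypy (for example, run with its --strict flag) works from. The mistaken call is left commented out so the file still runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    customer_id: int
    amount_cents: int


def total_cents(invoices: list[Invoice]) -> int:
    """Sum invoice amounts; the annotations let a checker reject mismatched callers."""
    return sum(invoice.amount_cents for invoice in invoices)


# A type checker would flag the next line before the code ever runs,
# because amount_cents is declared as int, not str:
# total_cents([Invoice(customer_id=1, amount_cents="1999")])

print(total_cents([Invoice(1, 1999), Invoice(2, 501)]))  # prints 2500
```

The agent can be told to rerun the checker after every edit and keep going until it reports no errors.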
Rule of three
- Concept: Duplication is fine once or twice, but three times calls for refactoring. LLMs don't naturally do this.
- Tip: Prompt LLMs to refactor after the third instance. They will copy-paste endlessly otherwise. Some argue an AI can stretch to ten near-duplicates inside one file, but across files three is the point where you should pull out a shared helper.
Use automatic formatters
- Concept: LLMs falter at mechanical style rules (such as no trailing spaces).
- Tip: Use tools like gofmt, prettier, or black so the LLM does not have to think about layout. That saves tokens and attention for the real work.
Stateless tools
- Concept: Tools with hidden state (for example the shell's working directory) confuse LLMs.
- Tip: Avoid stateful tools, or always run commands from one directory. Many models lose track of a cd command halfway through a session, so isolating workspaces (such as one terminal per npm module) helps.
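When the agent drives commands through a script, the working directory can be passed explicitly on every call so nothing depends on an earlier cd. A minimal sketch, assuming a hypothetical run_in helper built on the standard subprocess module:

```python
import subprocess
import sys
from pathlib import Path


def run_in(directory: Path, argv: list[str]) -> str:
    """Run argv with an explicit working directory and return its stdout."""
    result = subprocess.run(
        argv,
        cwd=directory.resolve(),  # absolute path: no hidden shell state
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Each call names its own directory, so call order does not matter.
    print(run_in(Path.cwd(), [sys.executable, "-c", "import os; print(os.getcwd())"]))
```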
Prompting and task execution
Write scoped, constrained prompts
- Concept: Open-ended prompts cause sprawling, unwanted changes.
- Tip: A useful template has four parts: Goal (what to accomplish), Constraints (what to touch and what not to touch), Process (the steps), and Verification (how you will check the result). Reference real files ("match the pattern in src/routes/users.ts") instead of describing style verbally.
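If you reuse the four-part template often, it can live in a small helper. The ScopedPrompt class below is a hypothetical sketch, not part of any tool. It only renders the Goal, Constraints, Process, and Verification sections as text.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedPrompt:
    goal: str
    constraints: list[str] = field(default_factory=list)
    process: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the four sections as plain text, numbering the process steps."""
        lines = [f"Goal: {self.goal}", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints]
        lines.append("Process:")
        lines += [f"{n}. {step}" for n, step in enumerate(self.process, start=1)]
        lines.append("Verification:")
        lines += [f"- {check}" for check in self.verification]
        return "\n".join(lines)


prompt = ScopedPrompt(
    goal="Add a read-only endpoint for orders",  # illustrative task
    constraints=["Match the pattern in src/routes/users.ts", "Do not modify existing tests"],
    process=["Propose a plan", "Wait for approval", "Implement"],
    verification=["The test suite passes"],
)
print(prompt.render())
```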
One feature, one session
- Concept: Conversation history bleeds across features and creates inconsistent assumptions.
- Tip: Start a new chat for each meaningful feature. Use /clear in Claude Code, or open a fresh composer in Cursor. Anna Arteeva's prompting guide and the roadmap.sh best-practices list both flag context bleed as a top mistake.
Stop digging
- Concept: LLMs grind away at bad tasks instead of pivoting.
- Tip: Pre-plan with a reasoning model or use a separate watchdog LLM. A common failure mode: a Monte Carlo test fix spiraling into relaxed assertions instead of a deterministic refactor.
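For the Monte Carlo case, the deterministic refactor usually means injecting a seeded random generator instead of loosening the assertion. The sketch below uses a hypothetical estimate_pi function. The seed value and the tolerance are arbitrary choices for illustration.

```python
import random


def estimate_pi(samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of pi; the generator is injected so tests can pin it."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def test_estimate_pi_is_reproducible() -> None:
    # Same seed, same stream of numbers, same result on every run.
    first = estimate_pi(20_000, random.Random(1234))
    second = estimate_pi(20_000, random.Random(1234))
    assert first == second
    # The tolerance stays tight because the run is no longer random.
    assert abs(first - 3.14159) < 0.1


test_estimate_pi_is_reproducible()
```

With the seed pinned, a failing run is a real regression, which removes the agent's excuse for relaxing the pass condition.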
Bulldozer method
- Concept: LLMs excel at brute-force tasks humans avoid.
- Tip: Let them plow through mass refactoring (such as renaming a symbol across hundreds of files). Inspect the output yourself, since the model will not notice or correct its own mistakes along the way.
Black-box testing and respecting the spec
- Concept: LLMs peek at implementations and break black-box principles, and sometimes rename APIs or delete tests unless constrained.
- Tip: Mask internals to enforce boundaries. One case saw an LLM compute a test constant from the implementation instead of preserving it as a literal, which hid the bug the test was meant to find. Another LLM once replaced a failing test body with assert True. Review every diff before accepting.
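The computed-constant failure is easy to reproduce. In this sketch everything is hypothetical: suppose the spec says shipping costs 500 cents, but the implementation constant was mistyped as 50. A check that derives its expectation from the implementation agrees with the bug. A check that keeps the literal from the spec exposes it.

```python
SHIPPING_CENTS = 50  # bug for illustration: the (hypothetical) spec says 500


def order_total(item_cents: int) -> int:
    return item_cents + SHIPPING_CENTS


def check_derived_from_implementation() -> bool:
    # Expected value is computed from the same constant the code uses,
    # so the check agrees with the code even when the constant is wrong.
    return order_total(1000) == 1000 + SHIPPING_CENTS


def check_literal_from_spec() -> bool:
    # Expected value is a literal taken from the spec, so the bug is visible.
    return order_total(1000) == 1500


print(check_derived_from_implementation())  # True: the bug is hidden
print(check_literal_from_spec())            # False: the bug is caught
```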
Small commits, real commit messages
- Concept: A vibe coding session that makes ten commits is much easier to debug than one that makes a single 4,000-line commit.
- Tip: Ask the agent to commit after each green run, with a real message that explains why, not just what. When something breaks two days later you will be able to git bisect your way back to the offending change. Many teams add a project rule that the agent must run tests before every commit.
Debugging and review
Scientific debugging
- Concept: LLMs guess fixes instead of reasoning systematically.
- Tip: Switch to a reasoning model, or root-cause the bug yourself and then hand the agent a targeted fix. Avoid open-ended "why is this broken?" loops.
Diagnose, then fix
- Concept: When the same agent both writes and tests code, both reflect the same wrong assumptions and the bug never shows up.
- Tip: When stuck on a bug, open a new conversation, give it only the failing output, and ask it to diagnose without changing any code. Run the same diagnostic prompt against a second model. Only start coding after at least one independent agent confirms the root cause.
Memento and context hygiene
- Concept: LLMs have no memory beyond context, like the protagonist of the film Memento. Irrelevant context also distracts them.
- Tip: Feed docs and decisions back in explicitly, and prune what is no longer relevant. Claude Code's subagent feature, which spawns a fresh agent for a subtask, helps keep the parent context focused.
Know your limits
- Concept: LLMs will not admit ignorance without prompting.
- Tip: Give the agent tools (shell, web fetch, MCP servers) or refine prompts. An LLM once faked shell output by writing a Python script that printed plausible-looking results instead of running the requested command.
Security, secrets, and supply chain
Vibe coding output frequently lands in repositories without a careful security pass. Veracode reported in 2025 that around 45% of AI-generated code tasks contained at least one security issue, and other 2025 research put the rate of insecure completions at 40% to 62%.
- Hallucinated packages and slopsquatting: An analysis of 576,000 AI-generated code samples found that roughly 20% of the packages they recommended did not exist, with about 205,000 unique hallucinated names. Attackers register these names on npm or PyPI to ship malicious code. Verify that a package exists, is maintained, and has plausible download counts before installing it.
- SQL injection and unsafe deserialization: Models default to string concatenation in SQL queries and to pickle-style deserialization. Ask explicitly for parameterized queries and safe parsers, or run a security linter against every diff.
- Cross-site scripting: Generated front-end code skips consistent output encoding more often than handwritten code.
- Leaked secrets: API keys end up hard-coded, logged during debugging, or committed in .env files. Use a pre-commit secret scanner and a CI gate that fails the build on any new credential.
- Shadow APIs: Agents wire up new endpoints to third-party services without flagging it. Audit the network calls your app actually makes.
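To make the parameterized-query advice concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the find_user helper are hypothetical. The point is that user input travels as a bound parameter and is never spliced into the SQL string.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, email: str) -> list[tuple[int, str]]:
    """Look up users by email with a bound parameter instead of string concatenation."""
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))

print(find_user(conn, "ada@example.com"))  # [(1, 'ada@example.com')]
# A classic injection payload is treated as a literal string and matches nothing.
print(find_user(conn, "' OR '1'='1"))      # []
```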
Treat AI-generated code like a pull request from a contractor you have not vetted. Never merge anything sensitive (auth, payments, schema migrations) without a second human reviewer.
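A dedicated secret scanner is the right tool for the leaked-secrets item above, but the idea behind one fits in a few lines. The sketch below is illustrative only: the two patterns and the scan_added_lines helper are assumptions, and a real scanner covers far more credential formats. It checks only the lines a unified diff adds.

```python
import re

# Illustrative patterns only; real scanners ship many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, pattern name) for each added line that looks like a secret."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

A CI job could feed the diff of each push through a check like this and fail when the returned list is non-empty.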
Use screenshots and visual feedback
For UI work, words go only so far. Drop a screenshot of the broken layout into Cursor, Claude Code, or ChatGPT and ask for a fix. The Tweag agentic coding handbook recommends a tight loop: capture the current state, paste it into the prompt with one specific change, accept the diff, re-screenshot. Each prompt should address one concern. Asking for five visual changes in a single round almost always introduces regressions on the unchanged parts. For design parity, screenshot the reference mock and ask the model to describe the components, hierarchy, and spacing in text. Then use that description as a builder prompt for a separate session.
Culture and style
Culture eats strategy
- Concept: LLMs reflect their "latent space" (prompts, codebase, fine-tuning).
- Tip: Shift the local culture to match your style. In one case a team rewrote a Python codebase to use async/await throughout so the agent stopped emitting synchronous patterns it had learned elsewhere.
Read the docs
- Concept: LLMs hallucinate APIs for niche frameworks.
- Tip: Feed manuals via Cursor's URL drop, Claude Code's docs reading, or an MCP docs server. One team resolved a long-running YAML schema bug simply by attaching the official spec to the chat.
Use MCP servers
- Concept: Model Context Protocol servers give the agent typed, sandboxed access to external systems (databases, file systems, web search, ticketing).
- Tip: Prefer MCP servers over giving the agent raw shell access. Anthropic's reference servers for filesystem, GitHub, and Postgres are safer starting points than letting the model run arbitrary bash.
Examples from practice
- Nondeterministic chaos: An LLM was asked to fix Monte Carlo tests but relaxed the pass conditions instead of spotting the nondeterminism. Pin random seeds.
- Directory confusion: In a TypeScript monorepo, an LLM lost track of cd commands. A single-directory setup with absolute paths fixed it.
- Spec violation: An LLM renamed a dict key from pass to pass_ in a public API. Review for alignment with the spec before merging.
- 15-hour cleanup: Engineer Bas Nijholt wrote a widely-shared 2025 post titled "Vibe coding worked, until the 15-hour cleanup," describing how a Saturday of casual prompting produced a working app that took the rest of the weekend to make safe to deploy. The pattern is common enough that "vibe code cleanup specialist" briefly appeared as a self-applied LinkedIn title.
| Tool | Type | Notes |
|---|---|---|
| Cursor | IDE fork (VS Code) | Strong agent mode, .cursorrules for project rules, tab completion praised in the community. |
| Claude Code | Terminal CLI | Plan Mode (Shift+Tab twice), subagents, skills as reusable prompt templates, /clear and /compact for context hygiene. |
| Windsurf | IDE fork | Cascade agent, local indexing for codebase-wide context, generous free tier in 2025. |
| GitHub Copilot | IDE plugin | Default for VS Code users; agent mode added in 2025. |
| ChatGPT | Chat | Best for explanations, planning, and code review of pasted snippets; weaker for repo-wide edits. |
| Replit Agent | Browser IDE | Spins up auth, storage, and deploy in one session; good for prototypes, slow on long runs. |
| v0 by Vercel | Web | UI-focused; React components with tight publish-to-Vercel flow. |
| Bolt | Browser IDE | Quick scaffolding; database and backend reliability varies. |
| Lovable | Browser IDE | End-to-end prompt-to-deploy; comprehensive but locks you into its stack. |
| Base44 | Visual builder | Targets enterprise; strongest out-of-the-box functionality in head-to-head 2025 comparisons. |
Karpathy's original demo used Cursor Composer with Anthropic's Claude Sonnet. As of 2026 the same combination, along with Claude Code and Cursor with the Claude 4-class models, is what most experienced practitioners reach for when they actually have to ship.
Workflow recommendations
- Plan before coding: Use Plan Mode or a reasoning model to outline tasks before any file is touched.
- One feature per session: Start a fresh chat for each feature to avoid context bleed.
- Sandbox changes: Run agent output on a branch. Never let it commit straight to main.
- Automate the boring checks: Pair the agent with linters, formatters, and a CI pipeline that runs type checks, tests, and a secret scanner on every push.
- Maintain context: Keep CLAUDE.md, .cursorrules, or the equivalent under 50 lines, focused on build commands, conventions, and off-limits zones.
- Independent verification: When the same agent writes the code and the tests, treat a green run as suggestive, not conclusive. Have a second agent or a human review the test logic.
- Don't vibe alone for production: For anything user-facing, security-relevant, or hard to roll back, add human review, observability, and a rollback path before deploying.
- LLMs err differently from humans. Human intuition for "what a coder would mess up" does not transfer; workflows have to be designed to catch AI-specific failure modes.
- The popular shorthand is that an LLM is like a very fast junior programmer: broad knowledge, weak big-picture judgment. Pair-programming with one feels like arguing with a stubborn intern.
- Karpathy revised his framing in 2026, telling followers that "programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny." He started using the term "agentic engineering" to distinguish disciplined AI-assisted development from unvetted vibe coding.
- Do not bank on future model upgrades to fix today's habits. The reliability issues with hallucinated APIs, deleted tests, and silent renames have persisted across multiple frontier model releases.
References