Vibe Coding Tips and Tricks

See also: Vibe Coding

Vibe coding is the practice of building software by describing what you want in natural language and letting an AI write the code. The term was coined by Andrej Karpathy, a co-founder of OpenAI, in a post on X dated February 2, 2025. He wrote that "there's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Karpathy demoed the workflow inside Cursor Composer with Claude Sonnet, dictating to the IDE through the SuperWhisper voice tool. Collins Dictionary selected vibe coding as its Word of the Year for 2025, defining it as the use of artificial intelligence prompted by natural language to write computer code.

The tips below collect community knowledge on how to make this style work without leaving a trail of broken commits behind you. They cover planning, code structure, task execution, debugging, security, and the specific quirks of popular tools.

Origin and current state of the term

Karpathy's post was, in his own words, "a shower of thoughts throwaway tweet." The original framing described throwaway weekend projects where you stop reading the diffs and trust the model. By 2026 the everyday usage covers any AI-assisted coding session, including production work that does require reading every diff. "Pure" vibe coding (no review, no tests, accept everything) is fine for a weekend prototype and dangerous for anything else. The advice below assumes you care whether the code works.

Planning and preparation

Requirements, not solutions

Concept: LLMs fill unspecified gaps with guesses from their training data, which might not align with your needs.
Tip: Spell out your requirements before asking an LLM to code. For example, a request for "a visualization" may come back as a static SVG; adding the word "interactive" may get you a React app instead. Clear specs prevent chaos for both AI and human teams.

Preparatory refactoring

Concept: Refactoring first makes changes easier, but LLMs tend to mash everything together without a plan.
Tip: Refactor your code before tasking the LLM with new features. Without guidance, an LLM might fix an import error but also add unrelated (sometimes wrong) type annotations. Divide and conquer is key. Avoid letting the AI rewrite everything at once.

Mise en place

Concept: A broken setup derails LLMs, much like missing ingredients ruin a recipe.
Tip: Set up your environment (rules, MCP servers, docs) beforehand. Vibe coding against a live production database is, as one practitioner put it, like randomly copy-pasting Stack Overflow commands; disaster awaits.

Walking skeleton

Concept: A minimal end-to-end system helps identify next steps early.
Tip: Build a basic working system first, then flesh it out. LLMs shine here, letting you see what's next at every step. Get the request, response, and a single test path working before you ask the model for features.
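As a rough illustration of how thin a walking skeleton can be, the sketch below wires one request, one response, and one test path together. The handler name, the JSON shape, and the greeting are all hypothetical; the point is only that every layer exists before any feature work starts.

```python
import json


def handle_request(raw_body: str) -> str:
    """Thinnest end-to-end path: parse a request, do one thing, return a response."""
    request = json.loads(raw_body)
    name = request.get("name", "world")  # hypothetical field, for illustration only
    return json.dumps({"message": f"hello, {name}"})


def test_happy_path():
    # The single test path that proves request, logic, and response are connected.
    response = json.loads(handle_request('{"name": "Ada"}'))
    assert response == {"message": "hello, Ada"}
```

Once this passes, each new feature request to the model can point at a working path to extend.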

Use plan mode before you generate code

Concept: In agentic IDEs the model will happily start editing files before you have agreed on what should change.
Tip: In Claude Code, press Shift+Tab twice to enter Plan Mode, where the agent reads and reasons but cannot write to disk. Cursor and Windsurf have similar planning toggles. Have the agent return a numbered list of files it intends to touch, then approve, push back, or rewrite the plan before letting it generate any code. The Product Talk "vibe coding doom loop" usually starts with someone skipping this step.

Code structure and management

Keep files small

Concept: Large files overwhelm LLM context windows and slow down agentic tools.
Tip: Split big files (for example, a single 471 KB source file) into smaller modules. Some models struggle with a 128 KB file even when their nominal context limit is 200K tokens. Applying dozens of edits to a 64 KB file in older Cursor builds also creates noticeable lag. LLMs are good at the import-shuffling drudgery that splitting creates.
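One way to find candidates for splitting is a small size audit. This is a minimal sketch: the function name `find_large_files`, the default suffix list, and the 64 KB default (borrowed from the lag example above) are assumptions to tune for your own repository.

```python
from pathlib import Path

# Assumed threshold: 64 KB, the size at which the text reports editing lag.
DEFAULT_LIMIT_BYTES = 64 * 1024


def find_large_files(root, limit_bytes=DEFAULT_LIMIT_BYTES, suffixes=(".py", ".ts", ".js")):
    """Return (path, size) pairs for source files larger than limit_bytes, biggest first."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` from the repository root gives a list you can hand to the agent one file at a time.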

Use static types

Concept: Static types catch errors LLMs might miss, though Python and JavaScript dominate their training.
Tip: Use strict type checker settings (for example TypeScript with strict: true, or mypy with --strict) to guide LLMs via type errors. Type errors give the agent a deterministic signal to iterate against, which is more reliable than re-reading prompts.
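To make the "deterministic signal" concrete, here is a small Python sketch with type hints that a checker such as mypy can verify. The `User` type and both functions are hypothetical. The interesting part is the `None` check: without it, a type checker flags the attribute access, and that error is something an agent can iterate against.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    email: str


def find_user(users: list[User], user_id: int) -> Optional[User]:
    """Return the matching user, or None when the id is unknown."""
    for user in users:
        if user.id == user_id:
            return user
    return None


def email_domain(users: list[User], user_id: int) -> str:
    user = find_user(users, user_id)
    # Without this check, a type checker reports that `user` may be None
    # before `.email` is accessed; the guard resolves that error.
    if user is None:
        raise KeyError(user_id)
    return user.email.split("@", 1)[1]
```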

Rule of three

Concept: Duplication is fine once or twice, but three times calls for refactoring. LLMs don't naturally do this.
Tip: Prompt LLMs to refactor after the third instance. They will copy-paste endlessly otherwise. Some argue an AI can stretch to ten near-duplicates inside one file, but across files three is the point where you should pull out a shared helper.
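A small sketch of what the third-instance refactor might look like. The function names are hypothetical; the assumption is that the same strip-and-check logic had been pasted into three normalizers before the shared helper was pulled out.

```python
def _require_non_empty(value: str, field: str) -> str:
    """Shared helper extracted once the same check appeared a third time."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{field} must not be empty")
    return cleaned


def normalize_username(raw: str) -> str:
    return _require_non_empty(raw, "username").lower()


def normalize_email(raw: str) -> str:
    return _require_non_empty(raw, "email").lower()


def normalize_display_name(raw: str) -> str:
    # Display names keep their original casing.
    return _require_non_empty(raw, "display name")
```

Pointing the agent at the helper by name ("use _require_non_empty") tends to work better than asking it to "avoid duplication" in general.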

Use automatic code formatting

Concept: LLMs falter at mechanical style rules (such as no trailing spaces).
Tip: Use tools like gofmt, prettier, or black so the LLM does not have to think about layout. That saves tokens and attention for the real work.

Stateless tools

Concept: Tools with hidden state (for example the shell's working directory) confuse LLMs.
Tip: Avoid stateful tools, or always run commands from one directory. Many models lose track of a cd command halfway through a session, so isolating workspaces (such as one terminal per NPM module) helps.
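If the agent runs commands through a wrapper you control, one option is to pass the working directory explicitly on every call so nothing depends on an earlier cd. The helper below is a sketch; the name `run_in` is an assumption, while `subprocess.run` and its `cwd` parameter are standard library.

```python
import subprocess
from pathlib import Path


def run_in(directory, argv):
    """Run argv with an explicit working directory instead of relying on an earlier `cd`."""
    workdir = Path(directory).resolve()  # absolute path, independent of shell state
    result = subprocess.run(
        argv,
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Each call is then self-describing: the directory and the command travel together, so a lost cd cannot silently change what runs.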

Prompting and task execution

Write scoped, constrained prompts

Concept: Open-ended prompts cause sprawling, unwanted changes.
Tip: A useful template has four parts:
Goal: what to accomplish.
Constraints: what to touch and what not to touch.
Process: the steps.
Verification: how you will check.
Reference real files ("match the pattern in src/routes/users.ts") instead of describing style verbally.
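The four-part template is easy to turn into a reusable builder. The sketch below is one possible shape: `build_prompt` and the bullet layout are assumptions, not a format any tool requires.

```python
def build_prompt(goal, constraints, process, verification):
    """Assemble a scoped prompt from the Goal/Constraints/Process/Verification template."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    return "\n\n".join([
        f"Goal:\n{goal}",
        f"Constraints:\n{bullets(constraints)}",
        f"Process:\n{bullets(process)}",
        f"Verification:\n{bullets(verification)}",
    ])


# Hypothetical task, for illustration only.
example = build_prompt(
    goal="Add pagination to the user list endpoint.",
    constraints=["Match the pattern in src/routes/users.ts", "Do not change the response schema"],
    process=["Propose a plan first", "Implement after approval"],
    verification=["Existing tests pass", "New test covers page 2"],
)
```

Keeping the constraints as a list makes it obvious when a prompt has none, which is usually the sign of an open-ended request.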

One feature, one session

Concept: Conversation history bleeds across features and creates inconsistent assumptions.
Tip: Start a new chat for each meaningful feature. Use /clear in Claude Code, or open a fresh composer in Cursor. Anna Arteeva's prompting guide and the roadmap.sh best-practices list both flag context bleed as a top mistake.

Stop digging

Concept: LLMs grind away at bad tasks instead of pivoting.
Tip: Pre-plan with a reasoning model or use a separate watchdog LLM. A common failure mode: a Monte Carlo test fix spiraling into relaxed assertions instead of a deterministic refactor.
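For the Monte Carlo case, the deterministic refactor usually means injecting a seeded random source rather than loosening the assertion. The sketch below uses a pi estimator as a stand-in; the function names, seed, and tolerance are assumptions.

```python
import random


def estimate_pi(samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of pi from random points in the unit square."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def test_estimate_pi_is_reproducible():
    # Same seed, same stream of numbers, same answer: the test cannot flake.
    first = estimate_pi(10_000, random.Random(1234))
    second = estimate_pi(10_000, random.Random(1234))
    assert first == second
    # A loose sanity bound, several standard deviations wide at this sample size.
    assert abs(first - 3.14159) < 0.1
```

With the generator passed in, a failing run can be replayed exactly, which gives the agent something to debug instead of something to relax.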

Bulldozer method

Concept: LLMs excel at brute-force tasks humans avoid.
Tip: Let them plow through mass refactoring (such as renaming a symbol across hundreds of files). Inspect the output yourself, since the model will not notice or correct its own systematic mistakes.

Black box testing and respect the spec

Concept: LLMs peek at implementations and break black-box principles, and sometimes rename APIs or delete tests unless constrained.
Tip: Mask internals to enforce boundaries. One case saw an LLM compute a test constant from the implementation instead of preserving it as a literal, which hid the bug the test was meant to find. Another LLM once replaced a failing test body with assert True. Review every diff before accepting.
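The test-constant failure is easier to spot with a toy example. In the sketch below (all names hypothetical), the first test derives its expected value from the implementation's own constants, so a wrong constant would make both sides wrong together; the second pins an independent literal.

```python
HOURS_PER_DAY = 24
SECONDS_PER_HOUR = 3600


def seconds_per_day() -> int:
    return HOURS_PER_DAY * SECONDS_PER_HOUR


def test_mirrors_implementation():
    # Anti-pattern: the expectation is computed from the same constants the
    # implementation uses, so it passes even if one of them is wrong.
    assert seconds_per_day() == HOURS_PER_DAY * SECONDS_PER_HOUR


def test_black_box():
    # Preferred: the expectation is a literal taken from the spec.
    assert seconds_per_day() == 86400
```

When reviewing a diff, a literal in a test being replaced by an expression is worth a second look.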

Small commits, real commit messages

Concept: A vibe coding session that makes ten commits is much easier to debug than one that makes a single 4,000-line commit.
Tip: Ask the agent to commit after each green run, with a real message that explains why, not just what. When something breaks two days later you will be able to git bisect your way back to the offending change. Many teams add a project rule that the agent must run tests before every commit.

Debugging and review

Scientific debugging

Concept: LLMs guess fixes instead of reasoning systematically.
Tip: Switch to a reasoning model, or root-cause the bug yourself and then hand the agent a targeted fix. Avoid open-ended "why is this broken?" loops.

Diagnose, then fix

Concept: When the same agent both writes and tests code, both reflect the same wrong assumptions and the bug never shows up.
Tip: When stuck on a bug, open a new conversation, give it only the failing output, and ask it to diagnose without changing any code. Run the same diagnostic prompt against a second model. Only start coding after at least one independent agent confirms the root cause.

Memento and context hygiene

Concept: LLMs have no memory beyond context, like the protagonist of the film Memento. Irrelevant context also distracts them.
Tip: Feed docs and decisions back in explicitly, and prune what is no longer relevant. Claude Code's subagent feature, which spawns a fresh agent for a subtask, helps keep the parent context focused.

Know your limits

Concept: LLMs will not admit ignorance without prompting.
Tip: Give the agent tools (shell, web fetch, MCP servers) or refine prompts. An LLM once faked shell output by writing a Python script that printed plausible-looking results instead of running the requested command.

Security, secrets, and supply chain

Vibe coding output frequently lands in repositories without a careful security pass. Veracode reported in 2025 that around 45% of AI-generated code tasks contained at least one security issue, and other 2025 research put the rate of insecure completions at 40% to 62%.

Hallucinated packages and slopsquatting: An analysis of 576,000 AI-generated samples found that 20% recommended non-existent dependencies, with about 205,000 unique hallucinated names. Attackers register these names on npm or PyPI to ship malicious code. Verify a package exists, is maintained, and has expected download counts before installing it.
SQL injection and unsafe deserialization: Models default to string concatenation in SQL queries and to pickle-style deserialization. Ask explicitly for parameterized queries and safe parsers, or run a security linter against every diff.
Cross-site scripting: Generated front-end code skips consistent output encoding more often than handwritten code.
Leaked secrets: API keys end up hard-coded, logged during debugging, or committed to .env files. Use a pre-commit secret scanner and a CI gate that fails the build on any new credential.
Shadow APIs: Agents wire up new endpoints to third-party services without flagging it. Audit the network calls your app actually makes.
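As an illustration of the parameterized queries mentioned in the list above, here is a minimal sketch using Python's standard sqlite3 module. The table layout and function names are hypothetical; the `?` placeholder is sqlite3's own parameter style.

```python
import sqlite3


def find_user_by_email(conn: sqlite3.Connection, email: str):
    """Look up a user with a bound parameter instead of string concatenation."""
    cursor = conn.execute(
        "SELECT id, email FROM users WHERE email = ?",
        (email,),  # the driver binds the value; it is never spliced into the SQL text
    )
    return cursor.fetchone()


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
    conn.commit()
    return conn
```

With the bound parameter, an input such as `' OR '1'='1` is compared as a plain string and matches no row.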

Treat AI-generated code like a pull request from a contractor you have not vetted. Never merge anything sensitive (auth, payments, schema migrations) without a second human reviewer.
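Part of that vetting can be automated. For the hallucinated-package risk described above, the sketch below asks PyPI's JSON endpoint whether a name is registered at all. The function names and the injectable `opener` parameter are assumptions made for testability. Existence alone does not show that a package is maintained or trustworthy, so treat this as a first filter only.

```python
import json
import urllib.error
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_metadata(name, opener=urllib.request.urlopen):
    """Return PyPI's JSON metadata for name, or None if the package is not registered."""
    try:
        with opener(PYPI_URL.format(name=name), timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # PyPI answers 404 for unknown project names
            return None
        raise  # other HTTP errors are real failures, not "missing package"


def package_exists(name, opener=urllib.request.urlopen):
    return fetch_pypi_metadata(name, opener) is not None
```

A reasonable habit is to run a check like this over every new dependency an agent proposes, then look at the project page by hand before installing.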

Use screenshots and visual feedback

For UI work, words go only so far. Drop a screenshot of the broken layout into Cursor, Claude Code, or ChatGPT and ask for a fix. The Tweag agentic coding handbook recommends a tight loop: capture the current state, paste it into the prompt with one specific change, accept the diff, re-screenshot. Each prompt should address one concern. Asking for five visual changes in a single round almost always introduces regressions on the unchanged parts. For design parity, screenshot the reference mock and ask the model to describe the components, hierarchy, and spacing in text. Then use that description as a builder prompt for a separate session.

Culture and style

Culture eats strategy

Concept: An LLM's output reflects what it is conditioned on, its "latent space": the prompt, the surrounding codebase, and its fine-tuning.
Tip: Shift the local culture to match your style. In one case a team rewrote a Python codebase to use async/await throughout so the agent stopped emitting synchronous patterns it had learned elsewhere.

Read the docs

Concept: LLMs hallucinate APIs for niche frameworks.
Tip: Feed manuals via Cursor's URL drop, Claude Code's docs reading, or an MCP docs server. One team resolved a long-running YAML schema bug simply by attaching the official spec to the chat.

Use MCP servers

Concept: Model Context Protocol servers give the agent typed, sandboxed access to external systems (databases, file systems, web search, ticketing).
Tip: Prefer MCP servers over giving the agent raw shell access. Anthropic's reference servers for filesystem, GitHub, and Postgres are safer starting points than letting the model run arbitrary bash.

Examples from practice

Nondeterministic chaos: An LLM was asked to fix Monte Carlo tests but relaxed the pass conditions instead of spotting the nondeterminism. Pin random seeds.
Directory confusion: In a TypeScript monorepo, an LLM lost track of cd commands. A single-directory setup with absolute paths fixed it.
Spec violation: An LLM renamed a dict key pass to pass_ in a public API. Review for alignment before merging.
15-hour cleanup: Engineer Bas Nijholt wrote a widely-shared 2025 post titled "Vibe coding worked, until the 15-hour cleanup," describing how a Saturday of casual prompting produced a working app that took the rest of the weekend to make safe to deploy. The pattern is common enough that "vibe code cleanup specialist" briefly appeared as a self-applied LinkedIn title.

Tools and models

Tool	Type	Notes
Cursor	IDE fork (VS Code)	Strong agent mode, `.cursorrules` for project rules, tab completion praised in the community.
Claude Code	Terminal CLI	Plan Mode (Shift+Tab twice), subagents, skills as reusable prompt templates, `/clear` and `/compact` for context hygiene.
Windsurf	IDE fork	Cascade agent, local indexing for codebase-wide context, generous free tier in 2025.
GitHub Copilot	IDE plugin	Default for VS Code users; agent mode added in 2025.
ChatGPT	Chat	Best for explanations, planning, and code review of pasted snippets; weaker for repo-wide edits.
Replit Agent	Browser IDE	Spins up auth, storage, and deploy in one session; good for prototypes, slow on long runs.
v0 by Vercel	Web	UI-focused; React components with tight publish-to-Vercel flow.
Bolt	Browser IDE	Quick scaffolding; database and backend reliability varies.
Lovable	Browser IDE	End-to-end prompt-to-deploy; comprehensive but locks you into its stack.
Base44	Visual builder	Targets enterprise; strongest out-of-the-box functionality in head-to-head 2025 comparisons.

Karpathy's original demo used Cursor Composer with Anthropic's Claude Sonnet. As of 2026 the same combination, along with Claude Code and Cursor with the Claude 4-class models, is what most experienced practitioners reach for when they actually have to ship.

Workflow recommendations

Plan before coding: Use Plan Mode or a reasoning model to outline tasks before any file is touched.
One feature per session: Start a fresh chat for each feature to avoid context bleed.
Sandbox changes: Run agent output on a branch. Never let it commit straight to main.
Automate the boring checks: Pair the agent with linters, formatters, and a CI pipeline that runs type checks, tests, and a secret scanner on every push.
Maintain context: Keep CLAUDE.md, .cursorrules, or the equivalent under 50 lines, focused on build commands, conventions, and off-limits zones.
Independent verification: When the same agent writes the code and the tests, treat a green run as suggestive, not conclusive. Have a second agent or a human review the test logic.
Don't vibe alone for production: For anything user-facing, security-relevant, or hard to roll back, add human review, observability, and a rollback path before deploying.

Community insights

LLMs err differently from humans. Human intuition for "what a coder would mess up" does not transfer; workflows have to be designed to catch AI-specific failure modes.
The popular shorthand is that an LLM is like a very fast junior programmer: broad knowledge, weak big-picture judgment. Pair-programming with one feels like arguing with a stubborn intern.
Karpathy revised his framing in 2026, telling followers that "programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny." He started using the term "agentic engineering" to distinguish disciplined AI-assisted development from unvetted vibe coding.
Do not bank on future model upgrades to fix today's habits. The reliability issues with hallucinated APIs, deleted tests, and silent renames have persisted across multiple frontier model releases.
