Programming with ChatGPT
Last reviewed
May 13, 2026
Sources
30 citations
Review status
Source-backed
Revision
v2 · 5,407 words
Programming with ChatGPT is the practice of using OpenAI's conversational chatbot to read, write, refactor, document, debug, test, and explain source code. Since ChatGPT was released as a research preview on November 30, 2022, it has become one of the most widely used tools for writing software, sitting alongside specialized assistants like GitHub Copilot, Cursor, and Claude Code. Developers paste snippets into the chat, ask questions in plain English, and receive working code, explanations, or fixes. The pattern is informal compared to traditional IDE tooling, but for many tasks it has proven faster than a search engine and more flexible than autocomplete.
ChatGPT can act as a junior collaborator that has read most of the public internet. It will happily explain a regex, port a Python function to Rust, sketch the skeleton of a CRUD app, or guess at what an unfamiliar error message means. It is not a compiler and it is not a tester, so the burden of verification stays with the human. The most common workflow is short and iterative: ask for something, read the result, run it, paste back the error, and repeat.
What ChatGPT does well for programming tasks:
What it does poorly:
ChatGPT launched on November 30, 2022 as a free research preview built on a fine-tuned version of GPT-3.5. OpenAI's announcement framed it as a dialogue model that could answer follow-up questions, admit mistakes, and refuse inappropriate requests. Within five days it had over a million users, which OpenAI CEO Sam Altman noted on Twitter, and it quickly set the record for the fastest-growing consumer product up to that point.
The coding use case caught on within days. Developers posted screenshots of ChatGPT writing scripts, explaining unfamiliar code, and answering questions that would normally have gone to Stack Overflow. The traffic to Stack Overflow itself started to slip, and on December 5, 2022, the site's moderators announced a temporary ban on ChatGPT-generated answers. They wrote that the rate of correct answers was too low, but that the answers looked plausible enough to swamp the volunteer review system. The temporary ban became one of the first formal institutional reactions to a generative AI tool.
GPT-4 arrived on March 14, 2023 and was substantially better at code than GPT-3.5. OpenAI's technical report emphasized large gains on reasoning, retention, and programming benchmarks. The same launch demoed GPT-4 turning a hand-drawn website mockup into working HTML. Coding then improved in steps with every model release that followed:
| Model | Release date | Notes for coding |
|---|---|---|
| GPT-3.5 (powering original ChatGPT) | November 30, 2022 | Useful for snippets, brittle on longer programs |
| GPT-4 | March 14, 2023 | Large jump in reasoning and code quality |
| GPT-4o | May 13, 2024 | Multimodal, faster, and 50% cheaper via the API than GPT-4 Turbo |
| o1 preview and mini | September 12, 2024 | First reasoning model, ranked in the 89th percentile on Codeforces |
| o1 full release | December 5, 2024 | Better long-chain reasoning for code |
| o3 (announced) | December 20, 2024 | 71.7% on SWE-bench Verified, 2727 Elo on Codeforces |
| o3-mini | January 31, 2025 | Cheaper reasoning model for STEM |
| o3 and o4-mini (full) | April 16, 2025 | Tool use plus reasoning |
| GPT-5 | August 7, 2025 | 74.9% on SWE-bench Verified, 88% on Aider Polyglot |
GPT-5 introduced a router that hands a prompt to either a fast model or a deeper reasoning model based on the kind of question being asked. For programming, that mostly means harder problems get more thinking time without the user having to switch models manually.
Several ChatGPT features matter specifically for programming.
Code Interpreter began as a plugin in March 2023 and went into wider beta on July 6, 2023 for ChatGPT Plus users. It is a sandboxed Python environment inside the chat. Users can upload files, run code, and have the model iterate on results without leaving the conversation. OpenAI renamed it Advanced Data Analysis in August 2023 alongside the launch of ChatGPT Enterprise, and the feature was later folded into the default model behavior so the name became less prominent. The underlying capability remains: you can ask ChatGPT to clean a CSV, plot a histogram, fit a model, or run unit tests and it will write and execute Python in the background.
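As a rough illustration of the kind of Python such a session writes and runs, here is a minimal sketch for a request like "clean this CSV and summarize the ages". The sample data, column names, and function names are hypothetical, and the sketch sticks to the standard library so that it runs anywhere; an actual session would more likely reach for pandas and a plotting library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for an uploaded file.
RAW_CSV = """name,age
Ada,36
Grace,
Linus,52
Guido,not available
Margaret,41
"""


def load_valid_ages(text: str) -> list[int]:
    """Parse the CSV and keep only rows whose age is a whole number."""
    ages = []
    for row in csv.DictReader(io.StringIO(text)):
        value = (row.get("age") or "").strip()
        if value.isdigit():
            ages.append(int(value))
    return ages


def histogram_by_decade(ages: list[int]) -> dict[int, int]:
    """Count ages per decade bucket (30 means 30 to 39)."""
    counts = Counter((age // 10) * 10 for age in ages)
    return dict(sorted(counts.items()))


ages = load_valid_ages(RAW_CSV)
print(histogram_by_decade(ages))
```

The two rows with missing or non-numeric ages are dropped rather than guessed at, which is the kind of cleaning decision worth checking in the generated code before trusting the summary.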
Canvas launched in open beta on October 3, 2024 as a side panel for longer writing and coding sessions. It opens a document next to the chat that ChatGPT can edit in place rather than reprinting the whole file every turn. For code it includes shortcut buttons for reviewing code with inline suggestions, adding print statements for debugging, adding comments, fixing bugs, and porting code into JavaScript, TypeScript, Python, Java, C++, or PHP. Canvas was first built on top of GPT-4o.
OpenAI rolled out Custom Instructions on July 20, 2023, starting with ChatGPT Plus and expanding to free users in August 2023. Custom Instructions are two short text fields, with a combined 1,500 character limit, that ChatGPT reads at the start of every conversation. A developer can use them to say "I write Python, prefer typing, hate trailing commas, do not use f-strings inside docstrings" once and have those preferences apply across every chat.
Custom GPTs let anyone build a pre-configured ChatGPT for a specific use case, with its own system prompt, files, and tools. The GPT Store opened to ChatGPT Plus, Team, and Enterprise users on January 10, 2024. By that point, users had already built more than three million custom GPTs. Many of the most-used ones are coding focused, including Code Tutor by Khan Academy, ScholarGPT, Diagram and MindMap GPT, and tools for specific frameworks.
The ChatGPT desktop app for macOS shipped on May 13, 2024 alongside the GPT-4o announcement. It supports a global keyboard shortcut (Option + Space) and lets users send screenshots into the chat, which is useful when explaining what a tool's output looks like. The Windows app followed later that year. OpenAI launched ChatGPT Atlas, a Chromium-based browser with ChatGPT in the sidebar and an Agent Mode that can take actions in the page, on October 21, 2025.
The simplest use case is also the one that hooked many developers first. Paste a snippet, ask "what does this do?" and read the explanation. For a piece of unfamiliar code, including legacy code at a new job or an obscure utility from a GitHub repository, ChatGPT will normally describe the flow line by line and call out tricky parts like recursion, closures, or implicit type coercions.
Good prompts for understanding code tend to be specific about what kind of explanation you want:
- "… `x = [1, 2, 3]`."
For short, self-contained code, ChatGPT is reliable. For longer code that references other files, it does better when you paste in the relevant imports or type signatures alongside the function you actually care about. Without that context, it will sometimes invent plausible behavior for symbols it has never seen.
The pattern works in the other direction too. If you have a high-level description and want to know what the corresponding code would look like in a particular library, ChatGPT will usually produce a runnable sketch. The catch, again, is that it will sometimes invent a function name that does not exist.
Refactoring is one of the use cases that benefits most from ChatGPT, because the model can be told what shape the new code should have without being asked to invent the logic from scratch.
Common refactor prompts:
For stylistic refactors, results are normally clean. For deeper structural refactors, the output should be treated as a draft. ChatGPT is happy to reorganize code in ways that look right but quietly break behavior, especially around error handling, default arguments, and edge cases. Running the existing tests after every refactor catches most of this, which is the usual reason ChatGPT-assisted refactors work best in codebases that have decent test coverage to begin with.
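A small sketch of how a refactor can look right while quietly changing behavior around default arguments, and how a quick before-and-after check catches it. The functions are hypothetical examples written for this illustration, not output captured from ChatGPT.

```python
def add_tag(tag, tags=None):
    """Original: creates a fresh list when no list is passed in."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


def add_tag_refactored(tag, tags=[]):
    """A 'simplified' version. It looks equivalent, but the default list
    is created once and shared between calls (intentional bug)."""
    tags.append(tag)
    return tags


def behaves_the_same(old, new) -> bool:
    """Call both versions twice with the default argument and compare."""
    old_results = [list(old("a")), list(old("b"))]
    new_results = [list(new("a")), list(new("b"))]
    return old_results == new_results


print(behaves_the_same(add_tag, add_tag_refactored))
```

The check reports a mismatch because the second call to the refactored version still carries the tag from the first call. An existing test suite plays the same role at larger scale, which is why test coverage matters so much here.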
For large changes, Canvas helps. Without Canvas, every refactor reprints the entire file, which makes diffs hard to read. With Canvas, the model edits in place and you can see exactly what changed.
Documenting an existing codebase is tedious and lends itself well to chat-based AI. Paste a function, ask for docstrings, and ChatGPT will return the same function with a comment block describing parameters, return values, and side effects. For larger files, it can produce Google-style, NumPy-style, JSDoc, or Sphinx-style docstrings on request.
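For reference, this is roughly what a Google-style docstring looks like on a small function. The function itself is a hypothetical example chosen for illustration; the point is the shape of the comment block (arguments, return value, raised errors), each line of which still has to be checked against what the code actually does.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Compute the simple moving average of a sequence.

    Args:
        values: Numbers to average, in order.
        window: Number of consecutive items in each average. Must be
            between 1 and ``len(values)``.

    Returns:
        A list of ``len(values) - window + 1`` averages, one per full window.

    Raises:
        ValueError: If ``window`` is smaller than 1 or larger than
            ``len(values)``.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A generated docstring that claimed, say, that an oversized window returns an empty list would read just as smoothly, which is why the `Raises` section is worth comparing against the first two lines of the body.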
A few prompt patterns that work well:
The quality of comments depends on whether the function name and parameter names already communicate intent. Well-named functions get accurate comments. Cryptically named functions get plausible but wrong comments, because the model fills in the gaps based on what similar-looking code usually does. The honest move is to read every generated comment and either keep it or rewrite it.
For a whole-file pass, the Canvas "Add comments" shortcut button does this in one click. For a library, asking ChatGPT to draft a README is a fast way to bootstrap documentation, although the resulting README needs editing to remove generic filler.
ChatGPT is often used as a faster, friendlier version of reading official documentation. Ask "how do I do X in library Y" and you usually get a working snippet plus an explanation, where searching the docs might mean clicking through three pages.
This works well for popular libraries that the training data has seen a lot of, like Pandas, NumPy, React, Express, requests, FastAPI, and the standard libraries of Python, JavaScript, Java, and Go. It works less well for newer or niche libraries. Two failure modes show up:
The workaround for both is the same: paste a snippet of the actual library documentation into the chat and ask ChatGPT to base its answer on that. Some users keep a small file of canonical examples for the libraries they use most often and paste it as context.
The Atlas browser takes this further. If you have the docs page open, the sidebar ChatGPT can read it directly without you having to copy and paste.
Scaffolding a new project is a strong fit for chat-based AI. Most projects start with a directory layout, a package.json or pyproject.toml, a basic test runner, and a hello-world entry point. ChatGPT can produce all of that in a single response.
Usable scaffolding prompts:
- "… `pyproject.toml` for a Python library using uv, with pytest, ruff, and mypy configured."
- "… `/health` endpoint, plus Pydantic models, plus a Dockerfile."
- "… `users`, `posts`, and `comments` tables."
The generated scaffolding tends to follow the most common conventions, which is usually what you want for a new project. Where it falls down is on combinations: "a Django project using Vite for the frontend and tRPC for the API" might produce code that looks plausible but does not actually wire those tools together correctly. For unusual stacks, asking ChatGPT to describe the architecture first, then implementing it in passes, is more reliable than asking for everything at once.
For production scaffolding, dedicated tools like create-next-app, cargo new, or django-admin startproject remain the safer baseline. ChatGPT is most useful for the work that those tools do not cover: adding a second service to a monorepo, sketching a webhook handler, or scaffolding a one-off script.
Debugging is one of the highest value use cases. The standard recipe is:
With all three pieces of context, ChatGPT can often point at the bug or at least narrow the search. Common patterns it catches well include off-by-one errors, missing awaits on async functions, incorrect array indexing, type mismatches, missing null checks, and import path mistakes.
Where it struggles:
One practical pattern is to use ChatGPT to add logging rather than to find the bug directly. Asking "add print statements that would help me figure out where this is going wrong" produces an instrumented version of the code that often reveals the answer on the next run.
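A minimal sketch of what such an instrumented version can look like. The chunking function, its off-by-one bug, and the logger name are hypothetical and written for this illustration; the logging calls use Python's standard `logging` module.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("chunk_debug")  # hypothetical logger name


def chunk_buggy(items: list[int], size: int) -> list[list[int]]:
    """Intentionally buggy: the range stops early and can drop the tail."""
    chunks = []
    for start in range(0, len(items) - 1, size):
        piece = items[start:start + size]
        # Instrumentation: show which slice each iteration produced.
        log.debug("start=%d piece=%r", start, piece)
        chunks.append(piece)
    # Instrumentation: compare how many items went in and came out.
    log.debug("in=%d out=%d", len(items), sum(len(c) for c in chunks))
    return chunks


def chunk_fixed(items: list[int], size: int) -> list[list[int]]:
    """Same function after the log pointed at the range bound."""
    return [items[start:start + size] for start in range(0, len(items), size)]


chunk_buggy([1, 2, 3, 4, 5], 2)  # the final log line reports in=5 out=4
```

The mismatch between items in and items out narrows the search to the loop bounds without anyone having to spot the bug by reading alone.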
Generating tests is a strong fit for chat-based AI. You paste a function, ask for unit tests, and ChatGPT writes a suite. For pure functions, the tests are usually decent. For functions with side effects, the model needs help knowing how to mock the side effects, which the developer must provide.
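As a sketch of the help the model typically needs, the example below fakes a network call with `unittest.mock.patch`. The function under test and the `api.example.com` URL are hypothetical; the key detail a developer usually has to supply is where the side effect lives (here `urllib.request.urlopen`) and how it is used (as a context manager).

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_username(user_id: int) -> str:
    """Fetch a user record over HTTP and return its name.

    Side effect: performs a network request (hypothetical endpoint).
    """
    url = f"https://api.example.com/users/{user_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["name"]


def test_fetch_username_without_network():
    fake_response = MagicMock()
    fake_response.read.return_value = b'{"name": "Ada"}'
    with patch("urllib.request.urlopen") as fake_urlopen:
        # The function uses `with urlopen(...) as response`, so the fake
        # response has to be what __enter__ returns.
        fake_urlopen.return_value.__enter__.return_value = fake_response
        assert fetch_username(42) == "Ada"
        fake_urlopen.assert_called_once_with("https://api.example.com/users/42")
```

Telling ChatGPT "the function calls `urllib.request.urlopen` as a context manager, mock that" tends to be enough for it to produce a test of this shape, where a bare "write tests" request might hit the real network.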
Prompt patterns that work:
- "… `fetch` so the test does not hit the network."
A non-obvious limitation: ChatGPT tends to write tests that pass. If it gets the implementation wrong, it will also get the test wrong in a matching way, and both will pass together. The defensive habit is to read each test, ask whether it would catch a realistic bug, and write the implementation and tests in separate prompts.
For TDD-style work, the pattern is reversed: paste the desired behavior or examples, ask ChatGPT to write only the tests first, then write or generate the implementation that makes them pass. This tends to produce sharper tests because the model is not just mirroring the implementation it just wrote.
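The tests-first pattern can be sketched with a deliberately small example. The leap-year functions and the case table are hypothetical and written for this illustration; the expected values come from the Gregorian calendar rule rather than from any implementation.

```python
# Step 1: cases written from the specification (the Gregorian leap-year
# rule), before any implementation exists.
SPEC_CASES = {
    2024: True,   # divisible by 4
    2023: False,  # not divisible by 4
    1900: False,  # divisible by 100 but not by 400
    2000: True,   # divisible by 400
}


def check(implementation) -> list[int]:
    """Return the years for which the implementation disagrees with the spec."""
    return [
        year
        for year, expected in SPEC_CASES.items()
        if implementation(year) != expected
    ]


# Step 2a: a plausible first draft. A test derived from this code
# (for example, only checking 2024) would pass alongside it.
def is_leap_year_draft(year: int) -> bool:
    return year % 4 == 0


# Step 2b: an implementation that satisfies the spec-derived cases.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print(check(is_leap_year_draft))  # lists the years the draft gets wrong
print(check(is_leap_year))
```

Because the cases were fixed before either implementation was written, the draft's mistake on 1900 shows up immediately instead of being mirrored into the tests.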
"Translate this Python to Rust" is a use case that barely existed before LLMs. ChatGPT will do it. Quality varies by language pair. Python to JavaScript and JavaScript to TypeScript work very well. Python to Rust or Go works decently for self-contained functions but needs human review when ownership, lifetimes, or error handling come into play. Translating from older to newer versions of the same language, like Python 2 to 3 or callback-style JavaScript to async/await, is reliable.
Useful prompts:
- "… `Result<T, E>` for error handling. Do not use `unwrap()`."
- "… `re` syntax."
The Canvas coding shortcuts include a one-click port between JavaScript, TypeScript, Python, Java, C++, and PHP, which is the same operation surfaced as a button.
SQL and regular expressions are perennial favorites because both are hard to write and easy to verify by running. Paste a schema, describe what you want, and ChatGPT returns a query.
Things that work:
- … `ROW_NUMBER()` and `RANK()`.
Things to watch:
- … `LIMIT` syntax without `OFFSET`. State the dialect in the prompt.
- … `EXPLAIN ANALYZE` and tune by hand.
For regex, asking ChatGPT to also produce a few example matches and a few example non-matches makes the verification step almost mechanical.
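A minimal sketch of that mechanical check, using Python's `re` module. The pattern and both example lists are hypothetical stand-ins for what a chat session might return when asked to match ISO-style dates.

```python
import re

# Hypothetical pattern of the kind a chat assistant might return for
# "match dates like 2024-05-13".
ISO_DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

SHOULD_MATCH = ["2024-05-13", "1999-12-31", "2000-01-01"]
SHOULD_NOT_MATCH = [
    "2024-13-01",        # month out of range
    "2024-00-10",        # month zero
    "2024-05-32",        # day out of range
    "24-05-13",          # two-digit year
    "2024-05-13T00:00",  # trailing time component
]


def failures(pattern, should_match, should_not_match) -> list[str]:
    """List every example the pattern classifies incorrectly."""
    wrong = [s for s in should_match if not pattern.fullmatch(s)]
    wrong += [s for s in should_not_match if pattern.fullmatch(s)]
    return wrong


print(failures(ISO_DATE, SHOULD_MATCH, SHOULD_NOT_MATCH))
```

An empty list means the pattern agrees with every example, not that it is correct in general: this one still accepts `2024-02-31`, so the example lists are only as good as the edge cases someone thought to include.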
Using ChatGPT as a code reviewer works for two kinds of feedback. The first is style and convention: "is this idiomatic Go?" "would this code be considered Pythonic?" "are these variable names clear?" The model is good at this kind of comment because the training data is full of code review feedback.
The second is shallow correctness: spotting obvious bugs, missing error handling, or unreachable code paths. ChatGPT is decent at this for small diffs.
What it cannot do is review a pull request in the context of a whole codebase. It does not know your team's conventions, your existing helper functions, or the bug you fixed last month that this change is about to reintroduce. For deeper review, dedicated tools like Claude Code, Sourcegraph Cody, or GitHub Copilot's PR review feature, which have access to the whole repository, do better.
A reasonable middle ground is to paste a diff into ChatGPT, list the kinds of issues you want it to look for ("security, off-by-one, missing null checks, unhandled errors"), and let it generate a checklist that a human reviewer can confirm or dismiss.
A few habits make ChatGPT more useful for code work.
Give it enough context. A function in isolation is harder to reason about than the same function with its caller, the imports, and the relevant type definitions. If the model is guessing, more context usually cuts the guesswork.
State the language, version, and constraints up front. "Python 3.11, no external dependencies, must work on Windows" is a better start than "write me a function." The same applies to SQL dialect, regex flavor, framework version, and runtime environment.
Use Custom Instructions for permanent preferences. Things like "always use TypeScript with strict mode" or "prefer pytest over unittest" or "do not add docstrings unless I ask" do not need to be repeated every time.
Ask for code only when you need code. For exploratory questions, asking for an explanation first and then the code lets you catch design issues before the model commits to a particular implementation.
Verify by running. ChatGPT does not actually run most code unless you are in Code Interpreter mode. The fact that it returns code does not mean the code works.
Use Canvas for multi-turn coding. Long sessions are easier when you can see the file as it evolves and edit in place.
Treat package and function names with suspicion. If the model imports a library you have never heard of, search for it before installing. If it calls a function that does not show up in the official docs, the function probably does not exist.
Keep secrets out of the chat. API keys, private code, and sensitive data sent to ChatGPT.com are processed by OpenAI servers and, on the free and Plus tiers, may be used for training unless you turn off data sharing. Enterprise and Edu plans have stronger privacy guarantees. For commercial code, that distinction matters.
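One way to reduce the risk is to scrub obvious secrets from a snippet before pasting it. The sketch below is an assumption-heavy illustration: the two patterns are examples made up for this article, not an exhaustive or official list of key formats, and a real project would be better served by a dedicated secret scanner.

```python
import re

# Illustrative pattern: anything shaped like "sk-" plus a long token.
TOKEN = re.compile(r"sk-[A-Za-z0-9_-]{20,}")

# Illustrative pattern: assignments to secret-looking names, such as
# API_KEY = "..." or password: ...
ASSIGNMENT = re.compile(
    r"((?:api[_-]?key|secret|token|password)\s*[:=]\s*)(['\"]?)([^'\"\s]+)(\2)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace likely secrets with a placeholder before sharing text."""
    text = TOKEN.sub("[REDACTED]", text)
    # Keep the variable name and quotes, replace only the value.
    return ASSIGNMENT.sub(r"\g<1>\g<2>[REDACTED]\g<4>", text)


snippet = (
    'OPENAI_API_KEY = "sk-abcdefghijklmnopqrstuvwxyz123456"\n'
    "password: hunter2\n"
    'print("hello")\n'
)
print(redact(snippet))
```

Pattern-based redaction misses anything it was not written for, so it complements rather than replaces the habit of reading a snippet before sending it.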
The failure modes of ChatGPT for code are well documented.
The most common failure is the model inventing a function, method, parameter, or whole package that does not exist. The function name sounds plausible. The arguments make sense given the rest of the code. The library being called is real. The function being called from that library is not. This pattern is common enough that it has its own name: package hallucination. In a March 2025 paper that analyzed roughly 576,000 generated Python and JavaScript code samples, the recommended packages did not exist in around 20% of cases, with rates ranging from 5% to 38% across different leading LLMs.
Security researchers have shown that this is exploitable. On March 28, 2024, Bar Lanyado at Lasso Security demonstrated the attack by asking ChatGPT how to upload a model to Hugging Face, getting a hallucinated package name (huggingface-cli), registering an empty package under that name on PyPI, and watching it get downloaded over 30,000 times in three months. Big companies, including ones in the AI space, had pulled the package into their builds. The class of attack has been nicknamed "slopsquatting" by analogy with typosquatting.
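A cheap first line of defense is to list the imports in generated code that do not resolve in the current environment, so each unfamiliar name gets looked up by hand before anyone runs an install command. The sketch below uses only the standard `ast` and `importlib.util` modules; the function name and the sample snippet are hypothetical.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be
    found in the current environment."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


# Hypothetical model output: one real stdlib import, one invented package.
generated = (
    "import json\n"
    "from os import path\n"
    "import totally_made_up_pkg_12345\n"
)
print(unresolved_imports(generated))
```

A name on this list is not necessarily hallucinated, since it may simply not be installed yet, and a name that does exist on a registry may itself be a squatted package. The list only marks where manual checking against the official project page is needed.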
Every ChatGPT model has a knowledge cutoff. After that date, it knows nothing. For fast-moving libraries that release a new major version every six months, the cutoff matters. Code that the model writes against an old version may not work against the current one. Browsing-enabled models and the Atlas browser can compensate by reading current docs, but only when explicitly asked to.
Research on AI-generated code consistently finds security issues. In a 2021 study from NYU titled "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions," Hammond Pearce and colleagues had GitHub Copilot produce 1,689 programs across 89 scenarios. Roughly 40% contained bugs or design flaws exploitable by an attacker. Subsequent studies of ChatGPT-generated code reach similar conclusions, with the rate depending heavily on the prompt. The most common categories include SQL injection, missing input validation, weak cryptography choices, and overly permissive defaults.
Even when code runs and tests pass, ChatGPT-generated code can contain subtle errors. The 2024 GitClear report, which analyzed 211 million changed lines of code from 2020 through 2024, found that the share of copy-and-pasted lines rose from 8.3% in 2020 to 12.3% in 2024, while refactored ("moved") lines fell from 24.1% to 9.5%. The share of new code that was revised within two weeks rose from 5.5% in 2020 to 7.9% in 2024. The authors argue that AI tools are pushing teams toward speed at the cost of maintainability.
A July 2025 randomized controlled trial from METR found that 16 experienced open-source developers, when given access to AI coding tools (mostly Cursor with Claude 3.5 and 3.7 Sonnet), took 19% longer to complete tasks on mature codebases than when working without AI assistance. The same developers had estimated they were 20% faster with AI. The mismatch between perceived and actual productivity is one of the most interesting findings in the AI coding literature so far.
On the more optimistic side, Peng et al. (2023) ran an earlier study using GitHub Copilot. Ninety-five professional programmers were asked to implement an HTTP server in JavaScript. The treatment group with Copilot finished 55.8% faster than the control group. The two studies are not contradictory; they measure different things, on different tasks, in different codebases, at different stages of developer experience with the tools.
ChatGPT is not the only AI coding tool, and for serious software work it is often not the best one. The main alternatives sit in a couple of categories.
| Tool | Format | Strengths | Weaknesses |
|---|---|---|---|
| ChatGPT | Web chat, app, Atlas browser | General purpose, fast, great for explanations | Cannot see your repo by default |
| GitHub Copilot | IDE plugin | Inline autocomplete inside the editor | Less useful for long explanations |
| Claude | Web chat, API | Long context, strong at long files, careful about hallucinations | Smaller ecosystem of plugins |
| Claude Code | CLI agent | Reads and edits whole repos, runs shell commands | Requires comfort with the terminal |
| Cursor | Fork of VS Code | Repo-aware chat and edits inside the editor | Subscription on top of model fees |
| Windsurf | Editor | Agent flow with terminal access | Smaller user base |
| Aider | CLI tool | Git-aware, works with multiple model providers | Less polished UI |
| JetBrains AI Assistant | IDE plugin | Native in JetBrains IDEs | Limited to JetBrains stack |
| Sourcegraph Cody | IDE plugin | Strong at large codebases | Heavier setup |
| Tabnine | IDE plugin | Local model option | Smaller models, less capable |
| Codeium | IDE plugin | Free tier for individuals | Less repo-aware than Cursor |
The simplified rule of thumb that most developers settle on:
Many developers use two or three of these together: ChatGPT for thinking out loud, Cursor or Copilot for typing, and Claude Code or Aider for repo-wide changes.
Five days after ChatGPT launched, on December 5, 2022, Stack Overflow's moderators announced a temporary ban on ChatGPT-generated answers. The post said the accuracy rate was too low and the volume of plausible-looking wrong answers was overwhelming the volunteer curation process. The temporary ban remained in effect afterwards as Stack Overflow worked on a permanent policy. The site's overall traffic declined notably through 2023 and 2024, with developers shifting their question-asking habits toward AI chatbots.
In January 2024, the UK parcel delivery firm DPD ran a customer service chatbot built on LLM technology. A frustrated customer, musician Ashley Beauchamp, prompted it to swear and to write a poem criticizing DPD. It complied, producing lines like "There was once a chatbot called DPD, who was useless at providing help." The screenshots went viral, the company disabled the offending element, and the incident became a textbook case of a chatbot deployed without enough guardrails. While not strictly about programming, it shaped industry caution about deploying LLMs in production-facing roles, including in IDE-based coding assistants.
Lasso Security's Bar Lanyado published an empty huggingface-cli PyPI package in March 2024, named after a package ChatGPT had hallucinated, and it racked up over 30,000 downloads in three months. The demonstration pushed package registries and AI vendors to take the hallucinated-package risk more seriously.
The most common paid tier for individuals doing programming work is ChatGPT Plus, which costs $20 per month and unlocks access to the better models and features (Canvas, faster responses, longer context, more usage). The full tier list, as of late 2025, is:
| Plan | Price | Notes |
|---|---|---|
| Free | $0 | Limited access to GPT-5 and reasoning models |
| ChatGPT Plus | $20 per month | Full GPT-5 access, Canvas, Code Interpreter |
| ChatGPT Pro | $200 per month | Higher rate limits, access to o1 Pro mode |
| ChatGPT Team / Business | $25 per user per month (monthly) or $20 per user per month (annual) | Shared workspace, admin controls |
| ChatGPT Enterprise | Custom | Estimated $40 to $75 per seat per month with around 150 seat minimum |
| ChatGPT Edu | Custom | Discounted plan for universities |
The ChatGPT Team plan was renamed ChatGPT Business on August 29, 2025. Enterprise and Edu plans include stronger privacy guarantees, which matters for teams pasting proprietary code into chats.
Developers who only need code via the API can use the OpenAI API directly. Per-token pricing varies by model and changes regularly. For a typical coding session with frequent long pastes, the API can be cheaper or more expensive than Plus depending on usage.
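The comparison comes down to simple arithmetic, sketched below. Every number in the example is a placeholder assumption chosen for illustration: the per-million-token prices are not real OpenAI prices, and the usage figures are invented. Only the $20 Plus price comes from the table above.

```python
PLUS_MONTHLY_USD = 20.0  # ChatGPT Plus price from the table above


def monthly_api_cost(
    requests_per_day: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million: float,
    output_price_per_million: float,
    working_days: int = 22,
) -> float:
    """Estimate a month of API spend in USD from per-request token counts."""
    requests = requests_per_day * working_days
    input_cost = requests * input_tokens_per_request * input_price_per_million
    output_cost = requests * output_tokens_per_request * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


# Placeholder prices (NOT actual pricing): $2 per million input tokens,
# $8 per million output tokens.
light = monthly_api_cost(40, 3_000, 800, 2.0, 8.0)
heavy = monthly_api_cost(200, 6_000, 1_000, 2.0, 8.0)

for label, cost in (("light", light), ("heavy", heavy)):
    verdict = "cheaper" if cost < PLUS_MONTHLY_USD else "more expensive"
    print(f"{label}: ${cost:.2f} per month, {verdict} than Plus")
```

Under these made-up numbers the light month lands around $11 and the heavy month around $88, so the same developer can fall on either side of the $20 line depending on how much gets pasted. Substituting current prices from the pricing page and real token counts is what makes the estimate meaningful.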