Homoiconicity (from the Greek homo, "same," and eikon, "image") is a property of certain programming languages in which the primary representation of programs is itself a data structure in a primitive type of the language. In a homoiconic language, source code can be inspected, generated, and transformed at runtime using the same operations applied to ordinary data. Programs are values, and the boundary between code and data effectively disappears [1][2].
The property is sometimes summarised as "code as data." Homoiconicity is the foundation of Lisp's macro system, which has made the Lisp family well suited for metaprogramming, the construction of domain-specific languages, and symbolic artificial intelligence research for more than six decades. Beyond Lisp, homoiconicity informs the design of Clojure, Julia, Elixir, Mathematica, Prolog, Tcl, REBOL, and several other languages [3].
The word "homoiconic" was coined by Calvin Mooers and L. Peter Deutsch in connection with TRAC, the Text Reckoning and Compiling language that Mooers designed at Rockford Research, Inc. beginning in 1959. Mooers's 1966 paper in Communications of the ACM, "TRAC, A Procedure-Describing Language for the Reactive Typewriter," describes TRAC as a language whose internal character-string representation of stored procedures is identical to the strings the user types at the keyboard [4]. Mooers argued that the homoiconic property was essential for a system programmed interactively at a typewriter.
Alan Kay's 1969 PhD dissertation at the University of Utah, The Reactive Engine, picked up the term. Kay used "homoiconic" to describe languages in which the data structures representing programs are the same data structures programs manipulate at runtime, tracing the lineage through TRAC, Lisp, and the early work that became Smalltalk [5].
The term predates the modern "code as data" slogan but is largely synonymous with it in the Lisp tradition. Paul Graham prefers the "code as data" phrasing, considering "homoiconicity" too narrow [6]. Earlier work by Douglas McIlroy on macro extensions and by Christopher Strachey on formal semantics foreshadowed the program-as-data idea [17][18].
A modern working definition has three components. First, the language must have a primitive data structure rich enough to represent the abstract syntax tree of any program. In Lisp this is the cons-list; in Julia the Expr object; in Clojure the persistent list, vector, and map; in Mathematica the general expression head[arg1, arg2, ...]. Second, the language must provide a way to obtain the program-as-data form of any source fragment, typically a quote operator. Third, the language must provide a way to evaluate such a data structure as code, conventionally an eval procedure.
Given these three pieces, a programmer can write functions that consume code, transform it, and emit new code, all using the language's ordinary manipulation tools. Transformations can run at compile time, producing macros, or at runtime, producing dynamically generated programs. They compose freely, since input and output share the same type.
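The three ingredients can be sketched in a few lines of Python (which is not itself homoiconic) by modeling programs as nested tuples. The names s_eval and OPS, and the tuple encoding, are illustrative, not a real library:

```python
# A toy "code as data" core: programs are ordinary nested tuples.
OPS = {
    "+": lambda args: sum(args),
    "*": lambda args: args[0] * args[1],
}

def s_eval(expr):
    """Evaluate a program that is represented as ordinary data."""
    if not isinstance(expr, tuple):
        return expr                     # literals evaluate to themselves
    if expr[0] == "quote":
        return expr[1]                  # quote suppresses evaluation
    op, *args = expr
    return OPS[op]([s_eval(a) for a in args])

program = ("+", 1, 2, 3)                # four elements of data ...
assert s_eval(program) == 6             # ... and a runnable program
assert s_eval(("quote", ("+", 1, 2, 3))) == ("+", 1, 2, 3)
```

Because the program is a tuple, any tuple-manipulating function can transform it before it is handed to the evaluator, which is exactly the property the definition above demands.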
It is sometimes claimed that homoiconicity requires the surface syntax to look like the data structure. This is the strict, Lisp-style reading: the parens-and-symbols you type really are a list of symbols. A looser reading drops the surface-syntax requirement and asks only that programs be available as ordinary values. The distinction matters when classifying borderline cases such as Tcl, REBOL, and Prolog [7].
Lisp is the canonical homoiconic language. John McCarthy's 1960 paper "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I" introduced S-expressions, nested lists of atoms and other lists, as a uniform notation for both programs and data [8]. McCarthy intended S-expressions only as the internal representation, with programmers writing a separate surface syntax of M-expressions, but after his student Steve Russell implemented eval directly on S-expressions, programmers found writing them so convenient that the M-expression syntax was never adopted.
In Lisp, the expression (+ 1 2 3) is at one moment a list of four elements (the symbol + and the integers 1, 2, and 3) and at the next moment a function call that returns 6. Whether it is treated as data or as code depends only on whether the evaluator is asked to evaluate it. The quote operator suppresses evaluation: '(+ 1 2 3) yields the four-element list. The eval procedure performs the inverse step: (eval '(+ 1 2 3)) returns 6.
The quasiquote-unquote pair, written `(...) and ,x, supplies a template form that fills in pieces of code from values computed at expansion time. Splicing unquote ,@x splices lists into the surrounding template. These three operators, quote, quasiquote, and unquote, constitute the basic toolkit for code construction in Common Lisp, Scheme, and Clojure [9].
Macros, defined in Common Lisp with defmacro, are functions whose arguments are unevaluated source code and whose return value is also source code. The macro system runs at compile time. Because both input and output are ordinary lists, the full power of the language is available to the transformer.
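The "functions from code to code" idea can be sketched in Python over the same tuple encoding of source; expand_unless and the tuple shapes are illustrative, not any real macro API:

```python
# A macro is a function whose input and output are both source code.
# Here code is modeled as nested tuples, in the style of Lisp lists.

def expand_unless(form):
    """Rewrite (unless test body...) into (if (not test) (progn body...))."""
    _, test, *body = form
    return ("if", ("not", test), ("progn", *body))   # *body splices, like ,@

expanded = expand_unless(("unless", ("zerop", "x"), ("print", "x")))
assert expanded == ("if", ("not", ("zerop", "x")), ("progn", ("print", "x")))
```

Python's `*body` unpacking plays the role of splicing unquote: the body forms are inlined into the surrounding template rather than nested as a single element.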
Homoiconicity has several practical consequences that explain why the property has retained adherents long after the rise of languages with more conventional syntax [6][9].
First, macros and domain-specific languages become natural. A library author who wants to add a new control structure, a pattern matcher, or an embedded query language does not need a parser generator or special tooling. The new construct is just a macro that rewrites code into existing code. Common Lisp's loop facility, Clojure's core.async, and Racket's match are substantial sublanguages built entirely as macros.
Second, arbitrary computation can run at compile time. Because macros are ordinary functions, they can read files, query databases at build time, and run optimisers. This collapses the traditional distinction between language, compiler, and program into a single namespace.
Third, the compiler is exposed as a library. The same eval and compile procedures the system uses to run user code are available to user code, which means optimisers and partial evaluators can be written in the language itself rather than as external preprocessors.
Fourth, symbolic AI has historically benefited. Early systems such as MACSYMA (later Maxima), Cyc, the Boyer-Moore theorem prover, and a long line of planning and natural-language systems represented their domain knowledge as Lisp expressions and reasoned over those expressions using the same machinery they used to reason over their own implementation [10].
Fifth, live programming environments such as SLIME for Common Lisp, CIDER for Clojure, and the Smalltalk image-based workflow rely on reading, editing, recompiling, and replacing code at runtime. Homoiconicity is not strictly required for a live programming environment or a REPL, as Smalltalk and Erlang show, but it makes implementation simpler.
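Python offers a faint analogue of the third point: the built-in compile and eval functions expose the interpreter's own compiler as a library, even though Python's surface syntax is not a data literal. A hedged sketch:

```python
# The same compiler the interpreter uses is available as a library:
# compile() turns source into a code object, eval() runs it.
source = "sum(range(10))"
code_obj = compile(source, "<generated>", "eval")
assert eval(code_obj) == 45

# Programs can therefore generate and run new programs at runtime.
generated = " + ".join(str(i) for i in range(1, 4))   # "1 + 2 + 3"
assert eval(compile(generated, "<generated>", "eval")) == 6
```

In a homoiconic language the generated fragment would be structured data rather than a string, which is what makes composing such transformations safe and routine.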
The table below lists languages commonly described as homoiconic, with the data structure used to represent programs and the level of support for code-as-data programming.
| Language | Code data structure | Quote and eval | Macros | Notes |
|---|---|---|---|---|
| Common Lisp | Cons-cell list | quote, eval | defmacro, reader macros | Reference homoiconic language |
| Scheme | Cons-cell list | quote, eval | syntax-rules, syntax-case | Hygienic macros standardised |
| Clojure | Persistent list, vector, map | quote, eval | defmacro, syntax-quote | JVM and JavaScript hosts |
| Racket | List, syntax objects | quote, eval | Hygienic, define-syntax | Language laboratory |
| Emacs Lisp | Cons-cell list | quote, eval | defmacro | Embedded in the editor |
| AutoLISP | Cons-cell list | quote, eval | defun-based | AutoCAD scripting |
| TRAC | Character string | #, ## | Procedure-describing | Mooers's original |
| Prolog | Term | =.., clause/2 | term_expansion/2 | Clauses as terms |
| Tcl | Command string | eval, subst | proc, uplevel | Strings as commands |
| REBOL | Block | do, reduce | Block rewriting | Strong code-as-data |
| Red | Block | do, reduce | Block rewriting | REBOL successor |
| Smalltalk | Method AST, message | Reflective access | Limited | Image-based |
| PostScript | Executable array | exec | Token arrays | Stack-based |
| Mathematica | Expression f[x,y] | Hold, Evaluate | Pattern rewriting | Wolfram Language |
| Julia | Expr object | quote, eval | macro, @generated | Scientific computing |
| Elixir | Quoted tuple | quote, unquote | defmacro | Erlang VM |
| Erlang | Abstract form | erl_parse, erl_eval | parse_transform | Compile-time only; less seamless |
| Nim | NimNode AST | quote do: | AST macros, templates | Static typing |
| Crystal | AST node | macro | Compile-time | Ruby-style syntax |
Languages sometimes claimed to be homoiconic but whose status is debated include Forth and APL [3].
Writers disagree about how strict a definition the term should bear. Three positions are common.
The strict position holds that a language is homoiconic only if its surface syntax is itself a literal expression of a primitive data type, as in Lisp's parenthesised lists. By this standard the only fully homoiconic languages are the Lisp family, Mathematica, REBOL, Red, and a handful of others.
The moderate position drops the surface-syntax requirement but keeps the requirement that the AST be a value of a primitive data type, available without special imports. This admits Julia, Elixir, Nim, Crystal, and arguably Prolog.
The loose position counts any language with adequate runtime access to its own AST or source, including Python through its ast module, JavaScript through Function and eval, and Haskell through Template Haskell. Critics argue that this reading drains the term of content.
For most practical purposes, the moderate reading is the most useful: it captures the family of languages in which macros are routine rather than exotic.
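Python's ast module, cited under the loose position, shows what that reading admits: the AST is available as a value, though not as a primitive literal of the language. A sketch of a toy transformation pass (MulToAdd is an illustrative name):

```python
import ast

class MulToAdd(ast.NodeTransformer):
    """A toy pass that rewrites every multiplication into an addition."""
    def visit_BinOp(self, node):
        self.generic_visit(node)        # transform children first
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()
        return node

tree = MulToAdd().visit(ast.parse("2 * 3 * 4", mode="eval"))
code = compile(ast.fix_missing_locations(tree), "<demo>", "eval")
assert eval(code) == 9                  # 2 + 3 + 4, not 2 * 3 * 4
```

The pass works, but it requires a dedicated class hierarchy and an explicit parse/compile round trip; in a homoiconic language the same rewrite is ordinary list manipulation.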
The practical payoff of homoiconicity is the macro system. Macro systems divide into two broad categories [11][12].
Unhygienic macros, of which Common Lisp's defmacro is the archetype, give the macro author full control over expansion. The author can introduce new bindings, capture variables from the surrounding scope, and rewrite arbitrary subforms. This freedom is powerful and dangerous. The most famous hazard is variable capture: a macro that introduces a temporary binding under a name the user happened to choose for a different purpose will quietly clobber the user's binding. The standard remedy is gensym, which generates fresh symbol names.
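The gensym discipline can be sketched in Python over tuple-encoded code; gensym, expand_swap, and the "#" naming convention are all illustrative, not any real library:

```python
import itertools

# A minimal gensym: fresh names that user code cannot collide with.
_counter = itertools.count()

def gensym(prefix="tmp"):
    # '#' is illegal in identifiers, so no user variable can shadow this
    return f"{prefix}#{next(_counter)}"

def expand_swap(form):
    """Expand (swap! a b) with a temporary bound to a fresh name."""
    _, a, b = form
    tmp = gensym()
    return ("let", ((tmp, a),), ("begin", ("set!", a, b), ("set!", b, tmp)))

expansion = expand_swap(("swap!", "x", "y"))
assert expansion[1][0][0] != "tmp"      # the binding uses a fresh name,
                                        # so a user variable 'tmp' survives
```

Had the macro hard-coded the name tmp, a caller who swapped a variable actually named tmp would have its value silently clobbered; that is the capture hazard gensym exists to prevent.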
Hygienic macros, introduced for Scheme by Eugene Kohlbecker, Daniel Friedman, Matthias Felleisen, and Bruce Duba in 1986, automate the freshness discipline. The macro system tracks the lexical scope in which each identifier was introduced and rewrites references so captures cannot occur by accident. Scheme's syntax-rules provides a pattern-template form, and syntax-case extends this to procedural macros. Racket generalises both into a comprehensive language-construction toolkit [13].
Reader macros in Common Lisp extend the parser: installing a function on a particular character changes how surface text is read into s-expressions. Compiler macros are optimisation rewrites that transform a function call into a more efficient equivalent at compile time, with the option of falling back to the original call.
Several familiar Lisp constructs are themselves macros rather than primitives: defun, cond, loop, case, when, and unless all expand to combinations of a smaller set of special forms. User-level macros routinely add for-loops, pattern matchers, ORM query languages, and configuration languages, with expansions visible through macroexpand [12].
Homoiconicity has costs as well as benefits.
Tooling complexity is the most-cited drawback. Refactoring tools, code completion, and IDE features generally rely on a fixed grammar. Macro-heavy code can defeat such tools because the meaning of a form depends on the macro that processes it. Racket and Clojure have invested in macro-aware tooling, but the problem is intrinsic to the design.
Compile-time complexity can produce expansions that are large, opaque, or surprising. A macro nesting several layers of helpers may expand to thousands of lines of generated code, complicating debugging.
Macro hygiene is a persistent hazard in unhygienic systems. Even disciplined authors miss subtle captures, and the resulting bugs can be hard to diagnose.
Macro overuse is a cultural risk. Abundant macros let each codebase grow its own dialect, raising the cost of reading unfamiliar code.
Greenspun's tenth rule observes, with intentional sting, that any sufficiently complicated C or Fortran program contains an ad hoc, informally specified, bug-ridden, slow implementation of half of Common Lisp [14]. The rule is part joke and part serious claim about the gravitational pull of metaprogramming features.
Homoiconicity remained a niche concern through the 1990s, but several developments since 2005 have brought it back into wider use [15][16].
Clojure, released by Rich Hickey in 2007, revived practical interest in Lisp-style homoiconicity for the JVM ecosystem. Its emphasis on persistent data structures, software transactional memory, and Java interoperability gave Lisp macros a modern delivery vehicle.
Julia, launched in 2012, brought homoiconicity to scientific computing. Julia's Expr type, multiple dispatch, and @macro syntax have allowed library authors to build differentiable programming, GPU compilation, and probabilistic modelling on a shared metaprogramming substrate.
Elixir, created by José Valim in 2011, layered Lisp-influenced macros over the Erlang VM. The Phoenix router and the Ecto query language are macro-driven sublanguages built on quoted expressions.
Rust's declarative macro_rules! and procedural macros operate over token streams rather than over a primitive AST, so Rust is not classically homoiconic. The procedural macro system nevertheless provides much of the practical power of Lisp-style macros, and serde, tokio, and many other Rust libraries depend on it.
In AI and machine learning, several modern frameworks demonstrate the continued utility of code-as-data design even in non-homoiconic host languages. PyTorch FX traces Python code into a graph IR that downstream passes transform; JAX traces pure Python functions into XLA HLO; Turing.jl, Edward, and Pyro implement probabilistic programming by transforming user-written model code into samplers. Each system is homoiconicity-adjacent: it pays a cost in tracing or AST inspection precisely because the host language is not directly homoiconic.
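The tracing approach these frameworks share can be sketched with a few lines of operator overloading: run the user's function on proxy objects, record a graph, then transform or interpret it. Proxy, trace, and user_model are illustrative names, not any framework's API:

```python
# A miniature tracer in the spirit of PyTorch FX and JAX.
class Proxy:
    def __init__(self, name, graph):
        self.name, self.graph = name, graph
    def _record(self, op, other):
        out = Proxy(f"v{len(self.graph)}", self.graph)
        self.graph.append((op, self.name, getattr(other, "name", other), out.name))
        return out
    def __mul__(self, other):
        return self._record("mul", other)
    def __add__(self, other):
        return self._record("add", other)

def trace(fn):
    graph = []
    out = fn(Proxy("x", graph))     # fn runs once, on a proxy input
    return graph, out.name

def user_model(x):
    return x * 2 + 1                # ordinary Python; never sees the graph

graph, result = trace(user_model)
assert [op for op, *_ in graph] == ["mul", "add"]
```

The graph is now plain data that later passes can rewrite, which is the code-as-data payoff; the tracing step itself is the tax a non-homoiconic host imposes.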
The following examples illustrate the homoiconic idiom in three languages.
Common Lisp, defining unless as a macro:

```lisp
(defmacro unless (cond &body body)
  `(if (not ,cond)
       (progn ,@body)))

(unless (zerop x)
  (format t "x is ~a~%" x))
```
The macro receives cond and body as unevaluated source. The backquote builds a new list with ,cond and ,@body spliced in.
Clojure:

```clojure
(defmacro unless [test & body]
  `(if (not ~test)
     (do ~@body)))
```
Clojure's syntax-quote uses ~ and ~@ for unquote and splicing-unquote and resolves symbols to fully qualified names, mitigating capture.
Julia:

```julia
macro unless(cond, body)
    quote
        if !$(esc(cond))
            $(esc(body))
        end
    end
end
```
The macro returns an Expr built by Julia's quote ... end form. The $ operator interpolates the input AST into the template, and esc marks the interpolated code so that macro hygiene does not rename the caller's variables.
Most mainstream languages are not homoiconic in the strict or moderate sense. C, C++, Java, JavaScript, Go, and Python all keep programs and data in distinct universes. C and C++ macros operate on tokens through the preprocessor and cannot inspect types or scopes. Java has reflection but no macro system. JavaScript can call eval on source strings but does not expose its AST as a value. Python's ast module exposes the AST as a class hierarchy, but Python's surface syntax is not a literal Python data structure and macros are not a normal part of the language.
Rust sits in a middle ground: its procedural macros operate on TokenStream and, with helper crates, on a parsed AST, but the AST type is not part of the language's primitive data and macro expansion is restricted to compile time.
Forth is sometimes claimed as homoiconic because words are sequences of executable primitives, but Forth programs at the source level are not first-class data. APL and J are data-oriented in a different sense: their primary objects are arrays, but programs are not arrays.