Cyc (pronounced /ˈsaɪk/, like "psych", and derived from "encyclopedia") is a long-running symbolic AI project that aims to assemble a comprehensive ontology and knowledge base of common-sense facts and reasoning rules. It was started by Douglas Lenat in 1984 at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas, and has continued from 1994 onward at Cycorp, the company Lenat founded to commercialize the work. Cyc is one of the largest and longest-running symbolic AI efforts in history, with development spanning more than four decades [1][2].
The central thesis behind Cyc is that intelligent behavior in everyday situations depends on a vast, mostly unspoken body of background knowledge that humans acquire through living, and that machines need this same substrate to reason robustly about the physical and social world. Lenat estimated this body of "consensus reality" knowledge at millions of assertions and proposed to encode it by hand in a formal language called CycL [2][3]. According to public statements from the 2010s and 2020s, the full Cyc knowledge base contains hundreds of thousands of concept names and millions of assertions, arranged into hundreds of microtheories [4][5].
Cyc is unusual for its scale, longevity, insistence on hand curation by trained ontologists, and commitment to formal logic. Since Lenat's death from cancer on August 31, 2023, Cycorp has continued operating, and the research conversation around Cyc has been reshaped by the rise of large language models, which produce common-sense-sounding text without an explicit symbolic substrate [6][7].
Cyc grew out of Doug Lenat's earlier work on heuristic search. His 1976 Stanford doctoral thesis, supervised by Edward Feigenbaum and Cordell Green, described AM, the Automated Mathematician, a program that explored elementary set theory and number theory by mutating and evaluating concepts [8]. AM appeared to rediscover concepts such as prime numbers and Goldbach's conjecture. Lenat extended the approach in Eurisko (1981 to 1984), which applied heuristic mutation to its own heuristics across several domains, including the design of three-dimensional integrated circuits and the Traveller Trillion Credit Squadron (TCS) wargame, in which Eurisko-designed fleets won the U.S. national championship in 1981 and 1982 [9].
Despite these surface successes, Lenat concluded that AM and Eurisko had hit a hard ceiling. Both were powerful within narrow notational worlds but could not transfer to new domains because they lacked the broad background knowledge that human researchers take for granted. In a 1983 retrospective, Lenat argued that without a foundation of common sense, heuristic search would always be brittle [9][10].
The institutional opportunity to act on this conclusion arrived through MCC. The Microelectronics and Computer Technology Corporation was founded in 1982 in Austin, Texas, as a U.S. industry consortium intended to coordinate research on advanced computing in response to Japan's Fifth Generation Computer Systems project, announced in 1981 [11]. Bobby Ray Inman, MCC's first chief executive, recruited Lenat from Stanford in 1984 to lead an AI program, and Lenat proposed Cyc as its flagship effort.
The original plan was famously ambitious. Lenat estimated that an adult's basic stock of common sense involved on the order of one to two million assertions and budgeted roughly 200 person-years of effort to encode them, with a 10 year horizon to a working baseline [2][3]. MCC funding for Cyc through the late 1980s and early 1990s has been reported as on the order of tens of millions of dollars over a decade, depending on which member companies' contributions are counted [4][12]. By 1989, the knowledge base contained around 30,000 assertions about a few thousand concepts; by 1995 the figure was reportedly above one million [2][3].
In 1994, Lenat and a small team spun the project out of MCC into a private company called Cycorp, based in Austin. MCC's consortium model was poorly suited to long-term commercial development of a knowledge product, and several MCC member companies were winding down their AI investments after the AI winter of the late 1980s. Cycorp licensed the technology and continued the encoding effort while pursuing application contracts with U.S. government agencies and selected commercial customers [13].
Cycorp has remained closely held, with the Lenat family continuing to hold significant ownership and management roles. The company has reported a workforce of dozens of ontologists, engineers, and applied researchers at any given time, with most revenue coming from project contracts rather than packaged software [13][14]. Major funders have included DARPA, the Department of Defense, the intelligence community, the National Institutes of Health, and named commercial partners. After Lenat's death in 2023, Cycorp stated that the company would continue under existing leadership; long-time technical staff such as Dave Schneider and Michael Witbrock have been associated with the project at various points [6][14].
The core technical artifact of the Cyc project is CycL, a knowledge representation language that combines frames with a higher-order extension of first-order predicate logic. CycL was originally implemented in Common Lisp and uses a Lisp-like prefix syntax. A typical assertion is written as a logical sentence in parentheses, such as (isa Fido Dog) to express that the individual Fido is an instance of the collection Dog, or (implies (and (isa ?X Person) (isa ?Y Cup)) (canDrinkFrom ?X ?Y)) for a more involved rule [2][3].
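A few assertions convey the flavor of the syntax. The sketch below is illustrative and slightly simplified: production CycL writes constants with a #$ prefix (e.g. #$isa, #$Dog), and canDrinkFrom is the made-up predicate from the example above rather than a confirmed constant in the real knowledge base.

```lisp
;; Ground atomic assertions: Fido is an instance of the collection
;; Dog, and Dog is a specialization (subcollection) of Mammal.
(isa Fido Dog)
(genls Dog Mammal)

;; A quantified rule: variables are written with a leading "?" and
;; are implicitly universally quantified at the top level.
(implies
  (and (isa ?X Person)
       (isa ?Y Cup))
  (canDrinkFrom ?X ?Y))
```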
The representation goes beyond textbook first-order logic in several ways. CycL supports higher-order quantification over predicates and collections, modal operators for belief and time, default reasoning with exceptions, and reified events and situations. Categories in CycL are first-class objects, which allows the language to express statements about classes themselves rather than only about their instances. Numbers, intervals, units, and arithmetic are represented through a layer of mathematical microtheories.
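Because predicates and collections are first-class terms, CycL can assert properties of the vocabulary itself. The examples below follow published descriptions of Cyc's upper ontology but are written in the simplified, unprefixed style used in this article:

```lisp
;; A statement about a predicate: declaring genls transitive is what
;; licenses the specialized taxonomic module to chain genls links.
(isa genls TransitiveBinaryPredicate)

;; A statement about a collection rather than about its instances:
;; Dog, the collection of all dogs, is itself an instance of the
;; collection BiologicalSpecies.
(isa Dog BiologicalSpecies)
```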
A distinctive feature of CycL is its use of microtheories, abbreviated Mt. A microtheory is a labeled bundle of assertions that share a context, such as background assumptions for a particular domain, time period, naive theory, or work setting. Microtheories form a partial order under a genlMt relation, and queries are evaluated relative to a microtheory, with assertions inherited from more general microtheories unless overridden. This allows Cyc to hold mutually contradictory beliefs without global inconsistency. For example, a folk-physics microtheory may say that solid objects rest on the ground, while an orbital mechanics microtheory allows objects to remain at rest in free fall, and a fairy-tale microtheory permits flying carpets [2][3][5].
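A minimal sketch of the mechanism follows, using illustrative microtheory and predicate names (NaivePhysicsMt, restsOnSomething, and inFreeFall are stand-ins, though genlMt, ist, and BaseKB correspond to vocabulary described in the Cyc literature):

```lisp
;; Each microtheory inherits, via genlMt, from more general contexts.
(genlMt NaivePhysicsMt BaseKB)
(genlMt OrbitalMechanicsMt BaseKB)

;; ist ("is true in") relativizes an assertion to a microtheory, so
;; the two theories below can disagree without global inconsistency.
(ist NaivePhysicsMt
  (implies (isa ?OBJ SolidObject)
           (restsOnSomething ?OBJ)))
(ist OrbitalMechanicsMt
  (implies (and (isa ?OBJ SolidObject) (inFreeFall ?OBJ))
           (not (restsOnSomething ?OBJ))))
```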
The Cyc inference engine combines forward chaining, backward chaining, and specialized reasoning modules for arithmetic, set membership, transitive relations, temporal reasoning, and taxonomic reasoning. Forward chaining materializes commonly needed consequences as new assertions are added; backward chaining constructs proofs on demand during query answering. Cyc also supports abductive reasoning, planning, and the generation of natural language paraphrases of assertions. By the late 2010s, Cycorp described the system as supporting tens of thousands of distinct predicate and function symbols, with built-in ways to talk about events, agents, beliefs, evidence, and uncertainty [4][5].
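The division of labor is easiest to see on a taxonomic query. In the hypothetical fragment below, the transitivity module rather than generic resolution does the work, and backward chaining produces the answers only when the query is asked:

```lisp
;; Assertions in the knowledge base:
(genls Dog CanineAnimal)
(genls CanineAnimal Mammal)
(genls Mammal Animal)

;; A query posed against the KB:
;;   (genls Dog ?WHAT)
;; Because genls is declared transitive, the taxonomic module
;; enumerates ?WHAT = CanineAnimal, Mammal, Animal on demand,
;; instead of materializing every consequence in advance.
```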
The Cyc knowledge base has been described in many published statements over the decades, and the figures vary with the date of reporting and with which subset is being measured; all such counts are approximate and should be read relative to their year of publication.
The domains covered are intentionally broad. Naive physics covers solids, liquids, support, containment, motion, and basic causation. Biology covers organisms, anatomy, and life cycles. Geography covers political and physical regions, transportation, and climate. Social knowledge covers families, organizations, professions, money, ownership, contracts, and law. Specialized layers cover engineering, medicine, finance, and the parts of mathematics needed to reason about the rest [3][5]. Common-sense rules include the defaults that people have one head and two arms, that water flows downhill, that an unsupported solid object falls when released, that a glass breaks on a hard floor unless cushioned, and that working adults are usually paid for their labor. The point of these rules is not that they are universally true; they are the defaults a competent reasoner uses unless something in the situation overrides them.
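Written as CycL-style rules, such defaults might look like the sketch below. The predicate names (anatomicalParts, supportedBy, falls) are illustrative stand-ins; in the actual knowledge base, rules like these are asserted as default-true and remain overridable through Cyc's exception machinery:

```lisp
;; Every person has, by default, a head among their anatomical parts.
(relationAllExists anatomicalParts Person Head)

;; An unsupported solid object falls.
(implies
  (and (isa ?OBJ SolidObject)
       (not (thereExists ?SUP (supportedBy ?OBJ ?SUP))))
  (falls ?OBJ))
```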
From the early 2000s onward, Cycorp made several efforts to expose parts of the knowledge base to outside researchers.
OpenCyc, the open-source subset, was first released as version 0.6 in 2002. Subsequent releases included OpenCyc 0.9, 1.0, 2.0, 3.0, and the largest, OpenCyc 4.0 in 2012, which exposed roughly 239,000 concepts and about two million assertions, along with mappings to WordNet and DBpedia. OpenCyc was distributed under the Apache 2.0 license starting with version 1.0. Critics argued that it represented mainly the upper taxonomic layer and excluded the deeper rule and microtheory content that gave Cyc its distinctive reasoning ability. Cycorp discontinued OpenCyc around 2017, citing the difficulty of maintaining a public release alongside the evolving internal product [15][16].
ResearchCyc was offered to academic researchers under a free non-commercial license starting in 2006. It exposed substantially more rule and microtheory content than OpenCyc and included a Java-based browser, an inference engine, and a natural language toolkit. Several university groups produced papers using ResearchCyc through the 2000s and 2010s, including work on question answering, machine translation, and integration with Semantic Web data [16][17].
EnterpriseCyc, sometimes branded the Cyc Knowledge Base, is the commercial product, licensed to enterprise and government customers with consulting from Cycorp, and is the only release that includes the full current knowledge base and reasoning machinery [13][14].
Cyc has been applied to a range of practical problems, although Cycorp's commercial agreements often prevent disclosure of customers. The examples that have surfaced publicly appear in case studies, press releases, or government program documents.
Cycorp has positioned Cyc as a substrate for reasoning problems where shallow machine learning is insufficient and where the cost of curating knowledge is justified by correct, explainable outputs.
Douglas Bruce Lenat was born in 1950 and earned his PhD from Stanford University in 1976 with the thesis that produced AM, supervised by Edward Feigenbaum and Cordell Green. After Stanford, Lenat held faculty positions at Carnegie Mellon and then returned to Stanford as an assistant professor before joining MCC in 1984 [8][9]. He was elected a Fellow of the AAAI and received the Computers and Thought Award in 1977 for his work on AM [19].
Lenat spent more than 38 years working on Cyc, longer than most academic researchers' entire careers. He served as Cycorp's chief executive officer until his death and remained personally involved in the design of CycL and the curation of high-level microtheories. He published widely over the decades, including the 1990 book Building Large Knowledge-Based Systems, written with Ramanathan V. Guha, which remains the most detailed published description of CycL and Cyc's reasoning architecture [2]. His 1995 Communications of the ACM paper "CYC: A Large-Scale Investment in Knowledge Infrastructure" is among the most cited statements of the project's rationale [3].
Lenat was known for being unusually willing to defend the unfashionable bet that hand-curated symbolic knowledge would eventually pay off, and he frequently engaged with critics in print. In a final coauthored paper with Gary Marcus, "Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc", posted to arXiv in July 2023 about a month before his death, Lenat argued that the path forward lay in combining the breadth of large language models with the explicit, inspectable knowledge and reasoning of Cyc [7].
Lenat died of cancer on August 31, 2023. His passing was noted in obituaries and tributes from across the AI community, including statements from researchers such as Stuart Russell and Yann LeCun, and from the Cycorp team [6][14][26].
Cyc has been one of the most discussed projects in AI partly because it has been one of the most contested. The criticisms broadly fall into several categories.
Hand coding versus learning. From the 1990s onward, machine learning researchers argued that the only feasible way to assemble knowledge at Cyc's intended scale was to learn it from data rather than encode it by hand. Pedro Domingos discussed Cyc in The Master Algorithm (2015) as a paradigm of the symbolic tradition and used it as a foil for his argument that probabilistic and connectionist methods were the more productive long-term path [20]. By the 2010s, web-mined knowledge bases such as Wikidata, DBpedia, and YAGO had reached scales much larger than Cyc at a tiny fraction of the labor cost, although their assertions were typically much shallower than CycL rules.
The bottom-up versus analogical critique. Douglas Hofstadter, in Fluid Concepts and Creative Analogies (1995) and Le Ton beau de Marot (1997), argued that Cyc's bottom-up approach to common sense missed what he saw as the central feature of human cognition, namely fluid analogy making. From this view, no amount of explicit propositions could capture how humans see one situation as another, and Cyc's commitment to formal predicates was a category error [21]. Marvin Minsky was more sympathetic, since his Society of Mind framework shared Cyc's appetite for many small pieces of knowledge, but argued that knowledge had to be entangled with diverse representations rather than uniformly logical [22].
Underdelivery. Stuart Russell, in Human Compatible (2019) and in editions of Artificial Intelligence: A Modern Approach coauthored with Peter Norvig, treated Cyc as a cautionary tale about the difficulty of building general intelligence by codifying common sense in formal logic. Russell observed that despite four decades of effort, Cyc had not produced a transformative capability comparable to those from deep learning [23][24].
Opacity. Because the full Cyc knowledge base is proprietary and the public OpenCyc release was a thin slice, outside researchers have had limited ability to evaluate Cyc's actual breadth and consistency. The discontinuation of OpenCyc in 2017 sharpened this concern, since claims about Cyc's size and capabilities could not be independently checked at the level of the rule layer [15][16].
Generative AI era. From around 2020 onward, the rise of large language models reshaped the debate. Models such as GPT-3, GPT-4, Claude, and Gemini produced text displaying apparent common sense across a wide range of tasks, learned implicitly from huge corpora rather than encoded by hand. Some commentators argued this vindicated the data-driven critics of Cyc. Others, including Lenat in his final writings with Marcus, argued that LLM hallucinations and the absence of an explicit world model showed exactly the weakness that Cyc-style symbolic substrates were designed to address, and called for hybrid neuro-symbolic systems [7][25].
Cyc is the largest and most ambitious of the hand-built knowledge bases, but it is far from the only large knowledge resource. The table below summarizes several of the most widely used systems.
| Resource | Origin | Started | Approach | Approximate scale | Focus |
|---|---|---|---|---|---|
| Cyc | Cycorp / MCC | 1984 | Hand curated CycL with microtheories | Hundreds of thousands of concepts, millions of assertions (tens of millions in late-2010s reports) | Common-sense reasoning across domains |
| WordNet | Princeton | 1985 | Hand curated lexical database | About 117,000 synsets | English lexical semantics |
| ConceptNet | MIT Open Mind Common Sense | 2004 | Crowd sourced and merged | About 8 million nodes and over 20 million edges across more than 80 languages | Common-sense relations |
| DBpedia | FU Berlin, Leipzig, et al. | 2007 | Extracted from Wikipedia infoboxes | Tens of millions of triples | Encyclopedic facts |
| YAGO | Max Planck Saarbrücken | 2007 | Wikipedia plus WordNet plus GeoNames | Hundreds of millions of facts | Encyclopedic facts with type hierarchy |
| NELL | Carnegie Mellon | 2010 | Never-ending web learning | Tens of millions of beliefs | Web-scale categories and relations |
| Google Knowledge Graph | Google | 2012 | Proprietary, learned and curated | Hundreds of billions of facts (per Google statements) | Search and assistant grounding |
| Wikidata | Wikimedia | 2012 | Community curated | Over 100 million items | General-purpose structured data |
| ATOMIC | Allen AI / UW | 2019 | Crowd-collected if-then knowledge | About 877,000 if-then tuples | Inferential common sense |
Cyc differs from these resources in several ways. Its assertions are richer, since CycL allows quantifiers, defaults, and modal operators that simple triple stores cannot directly express. Its content is organized around microtheories, allowing inconsistent local theories to coexist. And its provenance is a small team of trained ontologists rather than a crowd or a web crawler, which is both its principal cost and the source of its consistency.
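To make the difference concrete, the hedged example below shows a single CycL rule with nested quantification (owns and keyOpens are illustrative names). A triple store would have to decompose or reify it across many triples; in CycL it is one assertion:

```lisp
;; For every person who owns a car, there exists a key that opens it.
(implies
  (and (isa ?PER Person)
       (owns ?PER ?CAR)
       (isa ?CAR Automobile))
  (thereExists ?KEY
    (and (isa ?KEY Key)
         (keyOpens ?KEY ?CAR))))
```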
The rise of large language models has reframed the discussion of Cyc, since LLMs trained on internet-scale text appear to absorb common-sense knowledge implicitly, with no explicit ontology and no microtheories. Several positions have emerged.
One position is that LLMs have made Cyc-style hand engineering obsolete, on the grounds that statistical learning from massive corpora is the only practical route to general knowledge at scale. A contrasting position, advocated by Lenat and Gary Marcus in their 2023 paper, is that LLMs and Cyc-style knowledge bases are complementary: LLMs supply broad coverage and natural language fluency but hallucinate because they lack a structured, inspectable world model, while Cyc supplies precise, auditable rules but cannot scale alone to the breadth of everyday discourse. They argued that hybrid neuro-symbolic systems would deliver more trustworthy AI than either alone [7][25].
A third position holds that the most useful elements of Cyc may turn out to be the methodological lessons rather than the artifact. The Cyc team's experience cataloguing what humans know and where formal logic struggles is itself a contribution to knowledge representation and commonsense reasoning, even if future systems are largely learned. Cycorp has signaled openness to integration with statistical methods, with talks and white papers from the late 2010s and early 2020s describing ways to use machine learning to suggest candidate assertions for human review, to map natural language into CycL, and to use Cyc as a checker on top of language model output.
Cycorp has confirmed that work on Cyc continues after Lenat's death. Public messaging has emphasized continuity of mission, ongoing customer engagements, and maintenance and expansion of the knowledge base. As of public statements through 2024 and 2025, the company remains privately held in Austin with its longstanding senior staff in place [6][14]. Whether Cyc becomes a central thread of AI research, a niche industrial product, or a piece of the field's history is not yet settled. The next few years are likely to be decisive both for Cycorp's commercial trajectory and for the wider question of whether explicit, hand-curated knowledge remains a useful component of AI systems whose behavior is increasingly produced by neural networks.