Cognitive robotics is the subfield of robotics and artificial intelligence concerned with endowing robots with cognitive capabilities. The capabilities typically targeted include perception, attention, memory, reasoning, learning, planning, action selection, and social interaction. Cognitive robotics overlaps substantially with embodied AI, developmental robotics, social robotics, and bio-inspired robotics, but it is distinguished by its focus on the high-level mental processes that turn a moving machine into something that can be said to know, decide, and adapt.
The term was coined by the research group of Yves Lesperance, Hector Levesque, Fangzhen Lin, Daniel Marcu, Ray Reiter, and Richard Scherl at the University of Toronto in 1994 and was put forward more programmatically in the 1998 "Cognitive Robotics Manifesto" by Levesque and Reiter. In Levesque and Lakemeyer's 2008 chapter in the Handbook of Knowledge Representation, the field is defined as "the study of the knowledge representation and reasoning problems faced by an autonomous robot (or an agent) in a dynamic and incompletely known world."
Since that early Toronto work, cognitive robotics has expanded well beyond logical knowledge representation. It now spans developmental robotics in the tradition of Asada, Cangelosi, Pfeifer and Sandini; social robotics in the lineage of Brooks and Breazeal at MIT; symbolic cognitive architectures like Soar, ACT-R, and ICARUS being grounded on physical platforms; and the new wave of foundation-model robotics where vision-language-action systems such as RT-2, OpenVLA, π0, Helix, and Gemini Robotics drive humanoids using large pre-trained models.
Cognitive robotics sits at the intersection of robotics, artificial intelligence, cognitive science, and neuroscience. It can be contrasted with each of these neighbours along the following axes.
| Comparison | Cognitive robotics emphasises | The other field emphasises |
|---|---|---|
| Vs. classical robotics | High-level cognition: reasoning, knowledge, language, social behaviour | Low-level control, mechanics, kinematics, motion planning |
| Vs. AI | Embodied agents acting in the physical world; perception-action loops | Abstract symbol manipulation, disembodied algorithms, software agents |
| Vs. cognitive psychology / neuroscience | Engineering perspective; build artefacts that can act | Empirical study of biological cognition |
| Vs. cognitive science | Synthetic methodology: "understanding by building" | Theory and behavioural experiment |
| Vs. behaviour-based robotics | Internal representation, deliberation, language understanding | Reactive layered behaviours without explicit world models |
| Vs. developmental robotics | Often takes adult-like cognition as the design target | Models the developmental trajectory from infancy |
The boundary with developmental robotics is the most fluid. Asada and colleagues set out the programme of "cognitive developmental robotics" (CDR) in their 2009 survey specifically to bridge the two: CDR uses physical embodiment and interaction to build up cognitive functions from body representation through to social behaviour, with the goal of understanding the development of human higher cognition through synthesis.
Classical AI included robotics from the start. The most influential early system was Shakey, built at the Stanford Research Institute (SRI) between 1966 and 1972 under Charles Rosen, Nils Nilsson, Bertram Raphael, and Peter Hart. Shakey was the first mobile robot to reason about its actions: it integrated logical reasoning, autonomous plan creation, plan execution with error recovery, computer vision, navigation, and natural-language communication in a single physical system. The project produced the A* search algorithm, the Hough transform, and the visibility graph method as direct by-products. Shakey defined what "a robot that thinks" looked like for a generation.
In the 1980s, dedicated cognitive robotics groups began to form. The Toronto cognitive robotics group around Hector Levesque and Ray Reiter started developing logical foundations for action and change, using Reiter's reformulation of the situation calculus. The ATR Cognitive Robotics group in Japan worked on perception and learning for autonomous robots.
In 1991, Rodney Brooks at MIT published "Intelligence Without Representation" in Artificial Intelligence (volume 47, pages 139 to 159). The paper argued that classical AI had foundered on representation, and that intelligence approached incrementally through perception and action need not require explicit symbolic models. Brooks's subsumption architecture, demonstrated on robots like Genghis and later Cog, organised behaviour into layers of simple competences (wander, avoid obstacles, follow walls) without a central world model. The paper became one of the most cited critiques of symbolic AI.
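The layered idea can be sketched in a few lines. This is a toy illustration only, not Brooks's actual implementation: the layer names, the sensor dictionary, and the first-non-None priority scheme (which approximates subsumption's suppression of lower layers by higher ones) are all invented for this example.

```python
def wander(sensors):
    # Lowest competence: always proposes moving forward.
    return "forward"

def avoid_obstacles(sensors):
    # Higher competence: overrides wandering when something is too close.
    if sensors["range_cm"] < 30:
        return "turn_left"
    return None  # no opinion; defer to lower layers

def follow_wall(sensors):
    # Highest competence here: hugs a wall when one is sensed on the right.
    if sensors.get("wall_right") and sensors["range_cm"] >= 30:
        return "forward_along_wall"
    return None

# Layers ordered from highest to lowest priority; the first non-None wins,
# approximating subsumption-style suppression without a central world model.
LAYERS = [follow_wall, avoid_obstacles, wander]

def step(sensors):
    for layer in LAYERS:
        command = layer(sensors)
        if command is not None:
            return command

print(step({"range_cm": 20}))                       # avoid layer fires
print(step({"range_cm": 100, "wall_right": True}))  # wall-following fires
print(step({"range_cm": 100}))                      # default wandering
```

Note what is absent: no map, no plan, no symbolic model of obstacles or walls; each layer couples sensing to action directly, which is precisely the design stance the paper argued for.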
At the same time, the Toronto group went the other way. In the mid-1990s Levesque, Reiter, Lesperance, Lin, and Scherl introduced GOLOG, a high-level programming language built on the situation calculus and designed specifically for cognitive robots that need to reason about the effects of their actions. GOLOG was later extended to ConGolog (concurrent) and IndiGolog (incremental, supporting interleaved planning, sensing, and action) in collaboration with Giuseppe De Giacomo.
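Reiter's treatment of action and change rests on successor state axioms, which specify exactly when each fluent holds after an action. A standard blocks-world-style example (illustrative, not quoted from the GOLOG papers) is:

```latex
% "Holding x after doing a in s" holds iff a picked x up,
% or x was already held and a did not drop it:
\mathit{Holding}(x,\,do(a,s)) \;\equiv\;
  a = \mathit{pickup}(x) \;\lor\;
  \bigl(\mathit{Holding}(x,s) \,\land\, a \neq \mathit{drop}(x)\bigr)
```

Axioms of this form give GOLOG programs a principled way to project the consequences of candidate action sequences before committing to them.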
The MIT humanoid robotics group under Brooks, with Cynthia Breazeal as a graduate student, built Cog (an upper-torso humanoid with 21 degrees of freedom and visual, auditory, vestibular, kinesthetic, and tactile senses) and Kismet (an expressive head designed for face-to-face social interaction). Kismet, completed in the late 1990s, is widely cited as the first social robot and as the founding artefact of social robotics.
The 2000s saw the consolidation of developmental robotics as a named field, driven by Max Lungarella, Giorgio Metta, Rolf Pfeifer, Giulio Sandini, Minoru Asada, Yasuo Kuniyoshi, and others. The signature artefact was the iCub: a one-metre humanoid the size of a 3.5-year-old child, designed by the RobotCub consortium and built at the Istituto Italiano di Tecnologia (IIT) in Genoa. The RobotCub project ran for 65 months from 1 September 2004 to 31 January 2010 with EUR 8.5 million from Unit E5 of the European Commission's Sixth Framework Programme. The "cub" in iCub stands for Cognitive Universal Body, and the platform was explicitly motivated by the embodied cognition hypothesis: that human-like manipulation is essential for human-like cognition. About thirty iCubs are in research labs, mostly in the European Union with one in the United States.
In 2007, Pfeifer and Bongard published How the Body Shapes the Way We Think: A New View of Intelligence (MIT Press), arguing that the structure of cognition is constrained and enabled by the morphology and material properties of the body. The book popularised a research methodology of "understanding by building" and a concrete agenda around morphological computation.
In 2009, Asada, Hosoda, Kuniyoshi, Ishiguro, Inui, Yoshikawa, Ogino, and Yoshida published "Cognitive Developmental Robotics: A Survey" in IEEE Transactions on Autonomous Mental Development, volume 1, issue 1, pages 12 to 34. The survey defined CDR's research agenda: physical embodiment as the foundation, then body representation, then motor and perceptual development, then social behaviour.
In the 2010s, classic cognitive architectures from cognitive science were applied to physical robots more systematically. Soar (Laird), ACT-R (Anderson), ICARUS (Langley), CLARION (Sun), LIDA (Franklin), Sigma (Rosenbloom), GLAIR, and Verschure's biologically-inspired Distributed Adaptive Control (DAC) all saw robotic implementations. KnowRob, introduced by Moritz Tenorth and Michael Beetz in 2009 and described in the International Journal of Robotics Research in 2013, became the most widely used knowledge-processing framework for cognition-enabled robots; it uses ontologies and "virtual knowledge bases" computed on demand from the robot's perception and planning components.
David Vernon's 2014 textbook Artificial Cognitive Systems: A Primer (MIT Press) consolidated the field into the cognitivist, emergent, and hybrid paradigms, with chapters on autonomy, embodiment, learning, memory, knowledge, and social cognition.
The biggest single change to cognitive robotics since the 1990s arrived with large pre-trained models. Google's PaLM-SayCan paper (Ahn et al., 2022) paired the PaLM language model with a learned affordance function and a library of low-level skills: the LLM proposed candidate actions, and the affordance function pruned them to those physically feasible in the current state. PaLM-SayCan reported 84% planning success and 74% execution success on a real mobile manipulator.

RT-1 (Brohan et al., December 2022) introduced the Robotics Transformer, trained on 130,000 episodes covering 700+ tasks. RT-2 (Brohan et al., July 2023) extended the idea by treating actions as language tokens, co-fine-tuned with a vision-language model on Internet-scale data. The Open X-Embodiment / RT-X effort (Padalkar et al., 2024) pooled 60 datasets from 34 labs into one corpus of 1M+ trajectories across 22 embodiments and trained cross-embodiment policies.

OpenVLA (Kim et al., 2024) released a 7B-parameter open-source VLA built on Llama 2 plus DINOv2 and SigLIP visual encoders, beating RT-2-X by 16.5% with seven times fewer parameters. Octo (Octo Model Team, 2024) added a fully open transformer diffusion policy trained on 800K episodes.

Gemini Robotics from Google DeepMind, released in 2025 and updated as Gemini Robotics 1.5, brought the Gemini family directly into robot control. Physical Intelligence's pi0 (also written π0, Levine et al., October 2024) introduced a flow-matching action expert on top of a vision-language model and demonstrated long-horizon tasks like folding laundry; the company has raised more than USD 400 million and open-sourced the model.
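The SayCan-style selection step can be sketched in a few lines. Everything concrete here is invented for illustration: the skill names, the hand-set scores, and the state dictionary. The real system uses PaLM log-likelihoods for the language side and learned value functions as affordances; only the multiplicative combination is the technique itself.

```python
import math

SKILLS = ["pick up the sponge", "go to the table", "open the drawer"]

def llm_score(instruction, skill):
    # Stand-in for the LLM's log-probability that `skill` is a useful
    # next step toward `instruction` (hand-set here for illustration).
    table = {"pick up the sponge": -0.5,
             "go to the table": -2.0,
             "open the drawer": -3.0}
    return table[skill]

def affordance(skill, state):
    # Stand-in for a learned value function: estimated probability that
    # the skill can succeed from the current physical state.
    return state.get(skill, 0.0)

def select_skill(instruction, state):
    # SayCan combines the two scores multiplicatively:
    # usefulness (language model) times feasibility (affordance).
    scored = [(math.exp(llm_score(instruction, s)) * affordance(s, state), s)
              for s in SKILLS]
    return max(scored)[1]

state = {"pick up the sponge": 0.1,  # sponge currently out of reach
         "go to the table": 0.9,     # navigation is feasible
         "open the drawer": 0.8}
print(select_skill("clean the table", state))  # → "go to the table"
```

The example shows the key behaviour: the language model alone prefers "pick up the sponge", but the low affordance in the current state vetoes it, so the feasible navigation skill wins.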
NVIDIA's Project GR00T, announced 18 March 2024 at GTC, is a foundation model targeting humanoid robots, with a dual-system architecture (System 1 reflexive, System 2 deliberative) and a Jetson Thor on-board computer; partners include 1X Technologies, Agility Robotics, Apptronik, Boston Dynamics, Figure AI, Fourier Intelligence, Sanctuary AI, Unitree Robotics, and XPENG. Figure AI's Helix, released in February 2025 and updated as Helix 02 in early 2026, is a VLA controlling first the full humanoid upper body and now the whole body.
A cognitive architecture is a specification of the fixed structure of a mind: its memories, processes, and how they interact. Several have been applied to robotic platforms.
| Architecture | Originator | Year | Style | Robotic uses |
|---|---|---|---|---|
| Soar | John Laird, Allen Newell, Paul Rosenbloom | 1983 | Symbolic with reinforcement learning, episodic and semantic memory | Robo-Soar (1991) on a Puma arm; mobile robots; REEM service robots; unmanned underwater vehicles |
| ACT-R | John Anderson | 1993 | Declarative and procedural memory, modular cognitive science model | Mobile robots, human-robot teams |
| ICARUS | Pat Langley | 1991 | Concepts and skills hierarchies | Indoor mobile robots, manipulation |
| CLARION | Ron Sun | 1997 | Dual-process: explicit symbolic plus implicit subsymbolic | Cognitive simulations and robot agents |
| LIDA | Stan Franklin | 2006 | Global Workspace consciousness model | Cognitive software and robots |
| Sigma | Paul Rosenbloom | 2011 | Graphical-model unification | Limited robotic deployments |
| GLAIR | Stuart Shapiro | 1990s | Grounded layered architecture with integrated reasoning | Cassie / FEVAHR, manipulation |
| DAC | Paul Verschure | 1992 onwards | Distributed Adaptive Control, biologically inspired layers | Mobile robots, the Ada robot, neuroprosthetics |
| Subsumption | Rodney Brooks | 1986 | Reactive layers without world model | Genghis, Allen, Herbert, Cog |
Soar and ACT-R are the oldest and most widely used. Similarities among Soar, ACT-R, and Sigma prompted the Common Model of Cognition initiative to articulate a shared abstract specification.
| System | Origin | Year | Role |
|---|---|---|---|
| GOLOG, ConGolog, IndiGolog | Levesque, Reiter, Lesperance, De Giacomo (Toronto, York, Sapienza) | 1994 onwards | Situation-calculus-based programming languages for cognitive robots |
| KnowRob | Moritz Tenorth and Michael Beetz, Munich and Bremen | 2009 | Ontology-based knowledge processing framework for everyday manipulation |
| iCub | RobotCub consortium, IIT Genoa | 2004 | Open-source humanoid testbed for embodied cognition |
| Cog | Rodney Brooks, MIT | 1993 to 2003 | Upper-torso humanoid for developmental and social cognition |
| Kismet | Cynthia Breazeal, MIT | Late 1990s | Pioneer expressive social robot |
| Nico | Brian Scassellati, Yale | 2005 onwards | Child-like humanoid for cognitive science |
| HUMANOID series | Atsuo Takanishi, Waseda | 1980s onwards | Bipedal and emotional humanoids |
| Pepper | SoftBank Robotics, Aldebaran | 2014 | Mass-produced sociable humanoid; over 27,000 units at peak; deployed in retail, healthcare, hospitality, banking, education |
| REEM-C | PAL Robotics | 2013 | Research humanoid used with Soar and ROS |
| Robonaut | NASA and General Motors | 2000s onwards | Humanoid for orbital and ground tasks |
| Atlas | Boston Dynamics | 2013 onwards | Bipedal platform with limited explicit cognition, increasingly paired with foundation models |
| iRobot Roomba | iRobot, founded by Brooks, Greiner, Angle | 2002 | Minimal cognition: simple mapping and behaviour-based control |
Many of these platforms run on the Robot Operating System (ROS) for middleware, which has become the default communication layer for cognitive-robot research stacks.
Perception. Object recognition, scene understanding, multimodal integration of vision, audio, touch, and proprioception. Modern cognitive robotics increasingly uses pretrained vision encoders (DINOv2, SigLIP, CLIP) as front-ends for higher-level reasoning.
Attention and saliency. Top-down and bottom-up attention models that direct sensors and computation to task-relevant regions. Joint attention, where two agents attend to the same object, is a particular research focus in social cognitive robotics.
Knowledge representation. Ontologies (KnowRob), semantic maps, scene graphs, and situation calculus. These provide the structured background a robot needs to reason about objects, places, capabilities, and norms.
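A toy scene-graph fragment makes the idea concrete. The data layout and query function here are invented for this example; real frameworks such as KnowRob use full ontologies and Prolog-style queries rather than plain dictionaries.

```python
# Objects carry type information; relations are (subject, predicate, object)
# triples, the common backbone of scene graphs and semantic maps.
scene = {
    "objects": {"cup1": {"type": "Cup"}, "table1": {"type": "Table"}},
    "relations": [("cup1", "on", "table1")],
}

def objects_on(support, scene):
    # Query: which objects currently stand on `support`?
    return [subj for (subj, rel, obj) in scene["relations"]
            if rel == "on" and obj == support]

print(objects_on("table1", scene))  # → ['cup1']
```

Even this minimal structure supports the kind of question a manipulation planner needs answered before acting, which is what the heavier ontology machinery scales up.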
Reasoning. Situation-calculus-based reasoning in the GOLOG family; classical and probabilistic planning; commonsense reasoning over everyday objects and situations; causal reasoning about why an action will or will not work.
Memory. Episodic memory of specific past experiences, semantic memory of general facts, and procedural memory of motor skills. Soar, ACT-R, and LIDA all distinguish these subsystems explicitly.
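The three-way split can be sketched as a data structure. The class and field layout are invented for illustration; architectures like Soar and ACT-R implement these as distinct architectural memories with their own retrieval mechanisms.

```python
from dataclasses import dataclass, field

@dataclass
class RobotMemory:
    episodic: list = field(default_factory=list)    # timestamped experiences
    semantic: dict = field(default_factory=dict)    # general facts about kinds
    procedural: dict = field(default_factory=dict)  # skill name -> executable policy

mem = RobotMemory()
mem.episodic.append({"t": 12.4, "event": "grasped cup1"})   # a specific past event
mem.semantic["Cup"] = {"graspable": True}                   # a general fact
mem.procedural["grasp"] = lambda obj: f"closing gripper on {obj}"  # a motor skill

print(mem.procedural["grasp"]("cup1"))  # → "closing gripper on cup1"
```

The point of the separation is retrieval: episodic entries are queried by time and context, semantic entries by kind, and procedural entries are not inspected at all but executed.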
Learning. Developmental learning that ramps up complexity over time, imitation learning and learning from demonstration (including teleoperated trajectories), reinforcement learning, and the self-supervised pretraining that drives modern VLA models.
Social cognition. Theory of mind, gaze following, joint attention, empathy, and turn-taking. Kismet was the first artefact built explicitly to engage in face-to-face social interaction, and the line continues through Nao, Pepper, and modern humanoids.
Language and dialogue. From early work on instruction following with parsers and grammars to today's LLM-grounded dialogue systems that interpret "please load the dishwasher" and decompose it into a feasible plan.
Embodiment and morphology. Pfeifer and Bongard 2007 is the canonical reference. The argument is that cognitive abilities are shaped by what the body can sense and do; "morphological computation" exploits passive dynamics and material properties to offload work that would otherwise have to be computed.
Tool use, manipulation, and metacognition. Tool use is a long-standing benchmark for cognitive ability and a current frontier for VLA models. Metacognition (robots that monitor their own state, recognise their limitations, and decide when to ask for help) draws on uncertainty estimation, meta-reasoning, and explicit self-models.
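The ask-for-help pattern can be sketched as a confidence gate. The function, threshold, and message format are invented for this example; real systems derive the confidence value from ensembles or calibrated model uncertainty rather than receiving it as an argument.

```python
def act_or_ask(action, confidence, threshold=0.7):
    # Metacognitive gate: execute only when the robot trusts its own
    # estimate; otherwise defer to a human rather than risk failure.
    if confidence >= threshold:
        return f"execute:{action}"
    return f"ask_human:unsure about {action} (p={confidence:.2f})"

print(act_or_ask("open_drawer", 0.91))  # → "execute:open_drawer"
print(act_or_ask("pour_coffee", 0.42))  # → "ask_human:unsure about pour_coffee (p=0.42)"
```

The design question hiding in the threshold is the hard part: a miscalibrated confidence estimate makes the gate either paralysingly cautious or dangerously overconfident.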
The period since 2022 has seen a rapid succession of vision-language-action (VLA) and robot foundation models that fold parts of cognitive robotics into a single learned system.
| Model | Authors | Year | Contribution |
|---|---|---|---|
| PaLM-SayCan | Ahn et al. (Google, Everyday Robots) | April 2022 | First widely cited LLM-plus-affordance system; PaLM 540B as planner, value-function affordances as filter |
| PaLM-E | Driess et al. (Google) | March 2023 | Embodied multimodal language model |
| RT-1 | Brohan et al. (Google) | December 2022 | Robotics Transformer trained on 130K episodes, 700+ tasks |
| RT-2 | Brohan et al. (Google DeepMind) | July 2023 | Vision-language-action model treating actions as language tokens; chain-of-thought planning |
| Open X-Embodiment / RT-X | Padalkar et al. (34 labs) | 2024 | 1M+ trajectories across 22 embodiments |
| OpenVLA | Kim et al. (Stanford, UC Berkeley, Google DeepMind, TRI) | June 2024 | 7B open-source VLA on Llama 2 plus DINOv2 plus SigLIP |
| Octo | Octo Model Team | May 2024 | Open transformer-diffusion generalist policy on 800K episodes |
| pi0 | Physical Intelligence (Levine et al.) | October 2024 | VLA flow-matching policy with action expert; folds laundry; later open-sourced |
| Gemini Robotics, Gemini Robotics-ER, 1.5 | Google DeepMind | 2025 | VLA built on Gemini 2.0; embodied reasoning variant; cross-embodiment transfer |
| Project GR00T, GR00T N1 | NVIDIA | March 2024 onwards | Humanoid foundation model with dual-system architecture; GR00T N1 first openly released |
| Helix, Helix 02 | Figure AI | 2025 to 2026 | Full upper-body and then full-body humanoid VLA with System 1 / System 2 split |
These systems bring cognitive abilities like instruction following, novel-object generalisation, and long-horizon planning into the robot stack without explicit symbolic engineering. They also import the open problems of large models: hallucinations, distribution shift, dataset bias, and limited interpretability.
Cognitive robotics has always been in conversation with cognitive science.
Embodied cognition. The view associated with George Lakoff, Francisco Varela, Evan Thompson, and Eleanor Rosch holds that cognition is grounded in the body's interactions with the environment. iCub was built around this hypothesis, and Pfeifer and Bongard 2007 is its robotic manifesto.
Predictive processing and active inference. Karl Friston's free energy principle has been picked up by cognitive roboticists, including Paul Verschure with DAC, as a unifying account of perception, action, and learning.
Mirror neurons and imitation. The discovery of mirror neurons in macaque area F5 by Rizzolatti and colleagues shaped a generation of imitation-learning research in robotics.
Joint attention. Developmental psychology of joint attention in infants directly inspired robotic gaze-following and shared-attention systems on Cog, Kismet, Nico, and iCub.
Real-world generalisation. The dominant problem: today's policies still fail on long tails of object shapes, lighting conditions, and clutter that humans handle easily. Bridging the sim-to-real gap and broadening dataset diversity remain active.

Long-horizon autonomy. Maintaining coherent behaviour over hours or days is largely unsolved outside curated demos.

Sample efficiency. Robots lag far behind humans; an infant learns to grasp from far fewer trials than a current VLA.

Safety and verification. Verifying cognitive behaviour, particularly in human-shared spaces, has no general solution.

Interpretability. Foundation-model decisions are poorly interpretable, which complicates debugging and certification.

Neuro-symbolic integration. Combining symbolic and connectionist methods, the perennial neuro-symbolic question, has new urgency now that LLMs supply much of the symbolic-style competence end-to-end.

Common-sense knowledge. Robots' everyday knowledge remains incomplete despite efforts like KnowRob and large LLMs.

Energy efficiency. On humanoids, a battery-powered onboard compute budget meets real-time control, driving architectural choices like NVIDIA's Jetson Thor and Figure's onboard accelerators.
| Venue | Type | Focus |
|---|---|---|
| ICDL (International Conference on Development and Learning) | Conference | Developmental and epigenetic robotics, cognitive development |
| HRI (ACM/IEEE International Conference on Human-Robot Interaction) | Conference | Social cognitive robotics, interaction design |
| ICRA (IEEE International Conference on Robotics and Automation) | Conference | Broad robotics including cognitive themes |
| IROS (IEEE/RSJ International Conference on Intelligent Robots and Systems) | Conference | Intelligent and cognitive robots |
| AAAI Cognitive Robotics Symposium | Symposium | Knowledge representation and reasoning for robots |
| IJCAI Cognitive Robotics Workshop | Workshop | Continuation of the Toronto manifesto tradition |
| RSS (Robotics: Science and Systems) | Conference | Algorithms and learning, increasingly VLA-heavy |
| CoRL (Conference on Robot Learning) | Conference | Robot learning, dominant venue for VLA work since 2017 |
| IEEE Transactions on Cognitive and Developmental Systems (TCDS) | Journal | Cognition and development in natural and artificial systems |
| Cognitive Systems Research | Journal | Multidisciplinary cognitive systems |
| Frontiers in Robotics and AI | Journal | Open-access cognitive and developmental robotics |
IEEE TCDS, formerly IEEE Transactions on Autonomous Mental Development (which published the Asada et al. 2009 survey in its inaugural issue), is the field's flagship journal and is closely tied to ICDL through joint special issues.