Human-Robot Interaction (HRI) is an interdisciplinary field of study dedicated to understanding, designing, and evaluating systems in which humans and robots work, communicate, or share space. HRI draws on robotics, artificial intelligence, human-computer interaction, cognitive science, psychology, design, anthropology, and ethics, and it sits at the intersection of how machines perceive people and how people perceive machines. The field emerged in the late 1990s as robots began to leave isolated industrial cages and enter laboratories, hospitals, classrooms, and homes, and it has expanded rapidly since the early 2020s as large language models and vision-language-action models began to give robots the ability to understand natural speech and follow open-ended instructions [1][2].
Whereas human-computer interaction is mostly concerned with screens, keyboards, and other disembodied input devices, HRI must contend with the physical embodiment of the robot, the safety implications of moving mass, the uncertainty of perception in unstructured environments, and the social expectations that people bring to anything that looks or moves like a living agent. These additional dimensions make HRI a distinct discipline rather than a subfield of HCI, and they are reflected in dedicated venues such as the ACM/IEEE International Conference on Human-Robot Interaction (HRI), the IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), the journal ACM Transactions on Human-Robot Interaction, and the journal International Journal of Social Robotics [3][4].
Human-robot interaction is most commonly defined as the study of the dynamics, communication, and joint behavior of humans and robots that share an environment, a task, or both. The field investigates everything from the millisecond-scale physics of contact between a robotic arm and a human elbow to the months-long social bond a person may form with a humanoid robot companion. A widely cited working definition from the textbook by Bartneck, Belpaeme, Eyssel, Kanda, Keijsers, and Sabanovic frames HRI as "the science of studying people's behavior and attitudes towards robots in relationship to the physical, technological, and interactive features of the robots, with the goal to develop robots that facilitate the emergence of human-robot interactions that are at the same time efficient, effective, intuitive, safe, and accepted" [5].
Researchers usually divide HRI into two broad branches: physical HRI (pHRI), which deals with the mechanics, control, and safety of contact between robots and people, and social HRI (sHRI), which deals with communication, cognition, and the social and emotional dimensions of human-robot interaction. The two branches frequently overlap, particularly in domains such as rehabilitation robotics, where a robot must be both biomechanically safe and emotionally encouraging, and the boundary between them is more methodological than substantive.
| Aspect | Physical HRI (pHRI) | Social HRI (sHRI) |
|---|---|---|
| Primary concern | Safe and effective physical contact and proximity | Communication, perception, and social acceptance |
| Core disciplines | Control theory, mechanical design, biomechanics, safety standards | Cognitive science, psychology, communication, design |
| Key variables | Force, torque, impedance, stiffness, velocity, separation distance | Gaze, gesture, prosody, language, proxemics, trust |
| Typical platforms | Cobots, exoskeletons, surgical robots, rehabilitation devices | Social robots, companion robots, museum guides, kiosks |
| Representative venues | IEEE Transactions on Robotics, ICRA, IROS, IJRR | ACM/IEEE HRI, RO-MAN, International Journal of Social Robotics |
| Example metrics | Maximum contact force, settling time, energy injected, ISO/TS 15066 thresholds | Likability, trust, task success, conversational fluency |
The phrase "human-robot interaction" appeared in the engineering literature as early as the 1980s, but the field consolidated around three converging developments in the late 1990s and early 2000s. The first was the maturation of safe, lightweight robot arms, exemplified by the DLR Light-Weight Robot III at the German Aerospace Center, which made it conceivable to operate manipulators in close quarters with people. The second was the appearance of expressive humanoid platforms such as MIT's Kismet, designed by Cynthia Breazeal and built between 1997 and 2000, which demonstrated that even a simple robot face could elicit rich social responses from human partners [6]. The third was the rise of consumer service robots, beginning with iRobot's Roomba in 2002, which forced researchers to take seriously the question of how non-experts would understand, trust, and live with autonomous machines.
The first ACM/IEEE International Conference on Human-Robot Interaction was held in Salt Lake City in March 2006 and was co-sponsored by the IEEE Robotics and Automation Society, the ACM Special Interest Group on Computer-Human Interaction, and the ACM Special Interest Group on Artificial Intelligence. The conference grew steadily, and by the mid-2020s it received several hundred full-paper submissions per year and accepted roughly a quarter of them. The 20th annual edition of HRI took place in Melbourne, Australia in 2025 and accepted around 100 full papers along with hundreds of late-breaking reports, demonstrations, and workshop contributions [3].
A parallel community grew up around the IEEE RO-MAN conference, which was first held in Tokyo in 1992 and originally focused on telepresence and command-and-control interfaces for industrial robots. RO-MAN broadened its scope through the 2000s to include social robotics, affective computing, and assistive technology, and in 2025 it convened in Eindhoven, Netherlands as the 34th edition of the meeting [4].
Physical HRI studies the mechanics, control, and safety of robots that share workspace or make direct contact with human users. It encompasses collaborative robots on factory floors, exoskeletons that augment human strength, surgical assistants that move through delicate tissue, and rehabilitation devices that guide a patient's limb through a therapeutic motion.
The safety regime for industrial pHRI is anchored by ISO 10218-1 and ISO 10218-2, which set the broad safety requirements for industrial robots, and by the technical specification ISO/TS 15066, which provides detailed guidance for collaborative operation. ISO/TS 15066 specifies four collaborative operating modes (safety-rated monitored stop, hand guiding, speed and separation monitoring, and power and force limiting) and tabulates allowable contact forces and pressures for 29 zones of the human body, derived from biomechanical pain-onset studies. These thresholds are the basis for the certification of cobot installations from manufacturers such as Universal Robots, FANUC, KUKA, and ABB.
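The speed and separation monitoring mode can be made concrete with the standard's protective separation distance, which adds up the distance the human and the robot can cover before the robot comes to rest. The sketch below follows that structure; the function name and all numeric inputs are illustrative assumptions rather than normative values from the specification.

```python
# Minimal sketch of an ISO/TS 15066-style speed-and-separation-monitoring
# check.  The structure mirrors the standard's protective separation
# distance; every number below is an illustrative assumption.

def protective_separation_distance(
    v_human: float,      # operator speed toward the robot [m/s]
    v_robot: float,      # robot speed toward the operator [m/s]
    t_reaction: float,   # system reaction time [s]
    t_stop: float,       # robot stopping time [s]
    d_stop: float,       # robot stopping distance [m]
    c_intrusion: float = 0.1,    # intrusion distance allowance [m]
    z_uncertainty: float = 0.05  # position-sensing uncertainty [m]
) -> float:
    s_human = v_human * (t_reaction + t_stop)  # operator motion while robot reacts and stops
    s_reaction = v_robot * t_reaction          # robot motion before braking begins
    return s_human + s_reaction + d_stop + c_intrusion + z_uncertainty

# Illustrative check: command a protective stop if the measured
# separation falls below the protective distance (1.6 m/s is the
# walking speed assumed when operator speed is not monitored).
if __name__ == "__main__":
    s_p = protective_separation_distance(
        v_human=1.6, v_robot=0.5, t_reaction=0.1, t_stop=0.3, d_stop=0.15)
    measured = 1.2  # current human-robot separation from a safety sensor [m]
    print(f"protective distance {s_p:.2f} m ->",
          "protective stop" if measured < s_p else "continue")
```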
Classical position-controlled industrial robots are stiff: they will track a planned trajectory regardless of obstacles, and a collision with a person can deliver kilojoules of energy in milliseconds. Safe pHRI therefore relies on compliant control schemes that let the robot yield to unexpected forces. The two foundational frameworks are impedance control, formalized by Neville Hogan in 1985, and admittance control, which can be viewed as its dual.
Impedance control treats the robot as a virtual spring-damper system attached to a desired trajectory. The controller measures the deviation of the actual end-effector pose from a reference and commands a force proportional to that deviation. When a person pushes against the robot, the apparent stiffness and damping of the joint determine how much it yields and how quickly it returns. Admittance control inverts the relationship: the controller measures external forces with a force-torque sensor and commands a velocity proportional to the residual force, making the robot easy to push when its admittance is high. Both schemes underpin contemporary cobots and surgical assistants, and modern variable-impedance approaches modulate the apparent stiffness in real time based on task phase, predicted contact, and inferred human intent [7].
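The duality is easiest to see in a one-degree-of-freedom sketch. The gains and the 20 N push below are illustrative assumptions; real controllers operate on full six-dimensional poses and wrenches.

```python
# Minimal 1-DOF sketch contrasting the two schemes described above.

def impedance_force(x, x_ref, v, v_ref, k=400.0, d=40.0):
    """Impedance: measure motion deviation, command a restoring force."""
    return k * (x_ref - x) + d * (v_ref - v)

def admittance_velocity(f_ext, f_ref=0.0, admittance=0.01):
    """Admittance: measure external force, command a proportional velocity."""
    return admittance * (f_ext - f_ref)

# A person pushes the end effector with 20 N: the impedance-controlled
# robot yields f/k = 0.05 m at steady state, while the admittance-
# controlled robot streams a 0.2 m/s velocity away from the push.
f_push = 20.0
print("impedance steady-state deflection:", f_push / 400.0, "m")
print("admittance commanded velocity:", admittance_velocity(f_push), "m/s")
```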
Compliant control is most effective when paired with compliant hardware. Series elastic actuators, introduced by Gill Pratt and Matthew Williamson in 1995, place a spring in series between the motor output and the joint, effectively low-pass filtering shock loads and providing a high-fidelity force measurement through the spring's deflection. Variable stiffness actuators extend this idea by allowing the spring rate itself to be tuned, trading off speed against safety. Both approaches appear in machines as diverse as the Sawyer cobot from Rethink Robotics, the Baxter dual-arm research platform, and NASA's Valkyrie humanoid.
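The force-sensing principle of a series elastic actuator reduces to Hooke's law across the spring: with encoders on both sides of the elastic element, joint torque falls out of the measured deflection. The spring rate below is an illustrative value.

```python
# Sketch of the series-elastic-actuator measurement principle described
# above: a spring of known rate sits between motor and joint, so output
# torque can be read from the spring's wind-up instead of a dedicated
# force sensor.  The spring rate is an illustrative assumption.

SPRING_RATE = 350.0  # N*m/rad, illustrative

def sea_joint_torque(theta_motor: float, theta_joint: float) -> float:
    """Estimate output torque from motor-side vs joint-side encoder angles."""
    return SPRING_RATE * (theta_motor - theta_joint)

# 0.02 rad of spring wind-up corresponds to roughly 7 N*m at the joint.
print(f"{sea_joint_torque(theta_motor=1.00, theta_joint=0.98):.2f} N*m")
```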
pHRI extends beyond stationary manipulators to wearable systems. Lower-limb exoskeletons such as Ekso Bionics' EksoNR and ReWalk Robotics' devices help people with spinal cord injury walk again, and upper-limb exoskeletons assist stroke survivors with arm rehabilitation. These systems require the robot to share a kinematic chain with the user's body, and a misalignment between robot and human joint axes can cause discomfort or injury. Cooperative pHRI control of exoskeletons typically combines force sensing at the cuffs with electromyographic estimation of user intent, so the device acts as an extension of the wearer rather than as a separate agent.
Social HRI studies the perception, communication, and emotional dimensions of human-robot interaction. Where physical HRI worries about newtons and millimeters, social HRI worries about gaze, gesture, voice, attention, and trust. The branch grew out of work by Cynthia Breazeal at MIT in the late 1990s and has since expanded into a sprawling community spanning education, healthcare, customer service, and entertainment.
A handful of concepts recur throughout social HRI research and provide the vocabulary for most empirical studies in the field.
Anthropomorphism is the human tendency to attribute human-like traits, intentions, and emotions to non-human agents. Robots designed with eyes, voices, or human-like motion patterns reliably trigger anthropomorphic perception in their users, which in turn shapes expectations about competence, trustworthiness, and moral status.
The uncanny valley, proposed by Japanese roboticist Masahiro Mori in 1970, is the empirical observation that human affinity toward robots increases with their human-likeness up to a certain point, then drops sharply when the resemblance is close but imperfect. The dip is most pronounced for robots that are nearly human in appearance but betray their non-human nature through subtle movement glitches, plastic skin texture, or expressionless eyes. The valley is one of the most studied and contested ideas in HRI, and recent work has shown that its shape varies with the perceived mental capacity of the agent in addition to its physical appearance [8].
Proxemics, originally developed by anthropologist Edward T. Hall to describe interpersonal distance norms among humans, has been adapted to HRI to characterize how close a robot may approach a person before evoking discomfort. Studies have shown that approach distance preferences depend on the robot's height, speed, gaze direction, and prior interaction history, and on the cultural background, age, and personality of the human partner.
Trust calibration is the process by which a user develops an internal model of when a robot's outputs can and cannot be relied on. Both over-trust, which leads to misuse and accidents, and under-trust, which leads to disuse, are problematic. Trust calibration is especially salient in safety-critical settings such as automated vehicles, where a driver's willingness to override the autonomy depends on a well-calibrated mental model of the system's competence boundary.
Mental models describe the user's representation of how the robot perceives, decides, and acts. A mismatch between the user's mental model and the robot's actual behavior is a leading cause of errors in HRI. Designers therefore work to make robots legible: their motion, gaze, and gesture should communicate their intent in advance so the user can predict what they will do next.
Joint attention is the coordinated focus of two agents on the same object or event. In humans, joint attention is established through gaze following, pointing gestures, and verbal references, and it is foundational to language acquisition and cooperative task performance. Social robots that participate in joint attention with their users (by directing gaze toward referenced objects, following human pointing, or producing referential gestures of their own) are perceived as more competent and natural partners [9].
Turn-taking is the orderly exchange of speaking and acting between conversation partners. Human conversation is governed by tightly timed signals (gaze aversion, intonation contours, gestural beats) that mark turn boundaries, and replicating these timings is one of the open challenges of conversational HRI.
Humans communicate through speech, prosody, facial expression, gaze, gesture, posture, and proxemics, often simultaneously. Social robots therefore aim for multimodal interaction: speech recognition and synthesis combined with vision-based gaze and gesture tracking, with body and head movements that act as nonverbal signals of attention, agreement, or uncertainty. Research has shown that multimodal output (a robot that gestures while it speaks) significantly improves user comprehension and engagement compared to speech alone, and that the precise timing between speech and gesture matters as much as the content [10].
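Because gesture-speech timing matters, multimodal output pipelines typically schedule the gesture stroke against word-level timestamps rather than firing it when speech begins. The sketch below illustrates the idea; the stubbed speak() and pointing_gesture() routines and the hard-coded word timings are assumptions standing in for a speech synthesizer's timing output.

```python
import asyncio

# Illustrative sketch of multimodal output timing: the gesture stroke
# is scheduled to land on a chosen word rather than fired when speech
# starts.  All names, durations, and timings here are assumptions.

WORD_TIMINGS = {"put": 0.0, "the": 0.25, "cup": 0.4, "there": 0.8}  # s, assumed

async def speak(utterance: str) -> None:
    print(f"[speech] {utterance}")
    await asyncio.sleep(1.2)  # assumed utterance duration

async def pointing_gesture(stroke_at: float, prep_time: float = 0.3) -> None:
    # Start preparation early so the stroke co-occurs with the target word.
    await asyncio.sleep(max(0.0, stroke_at - prep_time))
    print("[gesture] preparation")
    await asyncio.sleep(prep_time)
    print("[gesture] stroke")

async def main():
    # Align the pointing stroke with "there", the deictic word.
    await asyncio.gather(
        speak("put the cup there"),
        pointing_gesture(stroke_at=WORD_TIMINGS["there"]),
    )

asyncio.run(main())
```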
A handful of robot platforms have come to dominate empirical work in social HRI because they provide a standard, reproducible substrate for experiments.
| Platform | Manufacturer | First released | Form factor | Typical role |
|---|---|---|---|---|
| Kismet | MIT (Breazeal) | 2000 | Disembodied head with expressive face | Foundational study of affective interaction |
| ASIMO | Honda | 2000 | Bipedal humanoid, 120 cm (later 130 cm) | Public demonstration, motion research |
| NAO | Aldebaran (later SoftBank) | 2008 | Bipedal humanoid, 58 cm | Education, autism therapy, research |
| PARO | AIST (Shibata) | 2004 | Robotic harp seal | Dementia and elder care |
| Baxter | Rethink Robotics | 2012 | Dual-arm humanoid torso | Collaborative manufacturing, HRI research |
| Jibo | Jibo Inc. (Breazeal) | 2017 | Stationary social companion | Home companionship, family use |
| Pepper | SoftBank Robotics | 2014 | Wheeled humanoid, 120 cm | Retail, hospitality, eldercare |
| Sophia | Hanson Robotics | 2016 | Realistic humanoid head and torso | Public engagement, dialogue research |
| Misty II | Misty Robotics | 2019 | Small mobile platform | Developer research |
| Spot | Boston Dynamics | 2019 | Quadruped | Industrial inspection, research |
The NAO platform, developed by the French company Aldebaran Robotics in 2006 and shipped commercially from 2008 onward, has become the workhorse of academic HRI laboratories thanks to its programmable behavior, modest size, and repertoire of twenty-five degrees of freedom. Its larger sibling Pepper, introduced by SoftBank Robotics in 2014, is a 120-centimeter wheeled humanoid that has been deployed in retail stores, hotels, and Japanese eldercare facilities and is one of the most studied platforms in real-world social HRI deployments.
PARO, a robotic harp seal developed by Takanori Shibata at Japan's National Institute of Advanced Industrial Science and Technology and first released in 2004, is the canonical example of a socially assistive robot. Its tactile sensors, microphone, and small repertoire of seal-like sounds and motions allow it to act as a calming presence for people with dementia, and randomized studies have reported reductions in agitation and the use of psychotropic medication when PARO is part of the care plan [11].
HRI research is organized as much around application domains as around theoretical concepts, because the requirements for each domain are distinct. Five application areas dominate contemporary research and deployment.
Collaborative robots, or cobots, are the largest commercial expression of pHRI. Universal Robots, founded in Odense, Denmark in 2005 and the first company to ship a true cobot in 2008, has by the mid-2020s installed more than 100,000 arms in factories worldwide. Cobots typically perform pick-and-place, light assembly, machine tending, polishing, and inspection tasks alongside human operators, and they have transformed the economics of small-batch manufacturing by lowering the capital and integration cost of automation. Rethink Robotics, founded by Rodney Brooks in 2008, popularized the dual-arm Baxter platform and its single-arm successor Sawyer; although Rethink ceased operations in 2018, both robots remain widely used in academic HRI labs.
The global aging trend, particularly in Japan, has driven sustained investment in healthcare and eldercare robots. Pepper has been deployed in hundreds of Japanese nursing homes to lead group exercise, host conversation circles, and play simple games. PARO is used in dementia care across Japan, the United States, and Europe. Surgical robots such as Intuitive Surgical's da Vinci platform deliver dexterous teleoperation for minimally invasive procedures, although their HRI is mediated through a master console rather than direct contact. Assistive and rehabilitation robots such as Kinova's wheelchair-mountable Jaco arm and the Lokomat gait trainer combine pHRI control with motivating social feedback, blurring the boundary between physical and social HRI.
Social robots have been studied as tutors and learning companions for school-age children, particularly in language learning and STEM enrichment. NAO and Pepper have been programmed to deliver second-language vocabulary practice, mathematics tutoring, and storytelling sessions, with empirical results suggesting modest but consistent learning gains compared to screen-based equivalents. Robot-assisted intervention for children on the autism spectrum is a particularly active subfield, building on early work by Brian Scassellati at Yale and others; the predictability and reduced social complexity of robot partners can make them effective scaffolds for practicing communication skills.
The handover between an automated vehicle and its human driver is a quintessential HRI problem. In Society of Automotive Engineers Level 3 automation, the vehicle handles the dynamic driving task in defined conditions but the driver must resume control within a small number of seconds when the system reaches its operational boundary. Empirical work has shown that take-over request lead times shorter than about six seconds degrade handover quality, and that calibrated trust in the automation is essential for safe behavior in both nominal and degraded conditions [12]. Modern HRI research extends to multimodal warnings, gaze-tracking driver-monitoring systems, and the legibility of automated vehicle behavior to other road users.
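A minimal take-over request policy can be read directly off the lead-time finding: estimate the time remaining until the operational boundary and escalate a multimodal warning once it falls below the required margin. The function names and escalation ladder below are illustrative assumptions; only the six-second figure comes from the literature cited above.

```python
# Sketch of a Level 3 take-over request (TOR) policy based on the
# lead-time finding cited above.  The sensing inputs and the
# escalation ladder are illustrative assumptions.

TOR_LEAD_TIME = 6.0  # seconds of warning the driver should receive

def time_to_boundary(distance_to_boundary_m: float, speed_mps: float) -> float:
    """Seconds until the vehicle reaches its operational design domain limit."""
    return float("inf") if speed_mps <= 0 else distance_to_boundary_m / speed_mps

def plan_takeover_request(distance_to_boundary_m: float, speed_mps: float) -> str:
    ttb = time_to_boundary(distance_to_boundary_m, speed_mps)
    if ttb <= TOR_LEAD_TIME:
        # Multimodal escalation: visual icon, auditory chime, haptic seat.
        return "issue TOR now (visual + auditory + haptic)"
    return f"monitor: {ttb - TOR_LEAD_TIME:.1f} s until TOR must be issued"

print(plan_takeover_request(distance_to_boundary_m=400.0, speed_mps=30.0))
```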
Robots in airports, hotels, and shopping malls increasingly perform reception, wayfinding, and information lookup, often through voice and tablet-based interaction. Field studies have shown that humanoids such as Pepper elicit substantially more and longer interactions than equivalent stationary kiosks, although novelty effects can be large. Search and rescue robots developed by groups such as the Center for Robot-Assisted Search and Rescue at Texas A&M (long directed by Robin Murphy, whose teams fielded robots at the 2001 World Trade Center collapse) demonstrate a different style of HRI, in which a remote operator must form an accurate mental model of the robot's situation through a constrained sensor feed.
The rise of large language models, vision-language-action models, and other foundation models has redrawn the landscape of HRI in the 2020s. These systems endow robots with two capabilities that were previously out of reach: open-ended natural language dialogue grounded in the physical scene, and the ability to follow free-form instructions without task-specific programming.
The first widely publicized integration of a modern large language model into a commercial robot was Boston Dynamics' coupling of OpenAI's GPT-4 with the Spot quadruped in late 2023. The integration combined GPT-4 with visual question-answering models and speech-to-text software, allowing Spot to act as a conversational tour guide that could answer free-form questions about its surroundings while pursuing its programmed inspection tasks. Boston Dynamics demonstrated Spot adopting personas (a 1920s archaeologist, a Shakespearean traveler, an English butler) and reported emergent behaviors that had not been explicitly coded, such as Spot proposing to walk toward a help desk to answer a question it could not resolve on its own [13].
Figure AI's Helix model, introduced in February 2025, takes the idea further by integrating speech recognition, language understanding, and learned visuomotor control in a single architecture. Helix uses a two-system design: an internet-pretrained vision-language model running at 7-9 Hz handles scene understanding and language comprehension while a fast reactive policy running at 200 Hz translates the slow system's latent representations into continuous joint commands. The on-board microphones and speakers support natural conversation, and the underlying audio stack was rebuilt to handle full-duplex dialogue rather than the click-to-talk pattern of earlier systems.
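The two-system pattern amounts to a slow loop refreshing a shared latent goal while a fast loop always consumes the freshest latent. The sketch below mirrors only the published 7-9 Hz / 200 Hz rate split; the stub computations and loop counts are assumptions, not Figure AI's implementation.

```python
import threading, time

# Sketch of the two-rate architecture described above: a slow
# vision-language model refreshes a latent goal at a few hertz while a
# fast reactive policy reads the newest latent at a high rate.

latest_latent = None  # most recent output of the slow system
done = False

def slow_vlm_loop():
    global latest_latent, done
    for step in range(20):
        latest_latent = f"latent-{step}"  # stand-in for a VLM forward pass
        time.sleep(1 / 8)                 # ~8 Hz scene/language understanding
    done = True

def fast_policy_loop():
    while not done:
        goal = latest_latent                  # always read the freshest latent
        _joint_command = (goal, time.time())  # stand-in for a 200 Hz policy step
        time.sleep(1 / 200)

threads = [threading.Thread(target=slow_vlm_loop),
           threading.Thread(target=fast_policy_loop)]
for t in threads: t.start()
for t in threads: t.join()
```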
Apptronik's Apollo, a 1.7-meter humanoid first shown in 2023 and developed in partnership with Google DeepMind from 2024 onward, exposes a conversational interface backed by Google's robotics foundation models. The Google partnership lets Apollo inherit improvements to Google DeepMind's Gemini Robotics models without requiring Apptronik to train its own large multimodal systems from scratch. Figure AI and Apptronik are the two most visible examples of a broader pattern in which humanoid robot vendors couple bespoke hardware with rapidly improving language and perception models.
The deeper change is the emergence of vision-language-action (VLA) models, which bypass the traditional planner-and-controller stack in favor of an end-to-end policy that maps an image and a natural-language instruction directly to robot actions. Google DeepMind's RT-2, announced in July 2023, was the first widely publicized VLA: it co-fine-tuned a vision-language model on internet image-text pairs and on robot trajectories from the RT-1 dataset, and it expressed actions as discrete text tokens that fit alongside language tokens in the model vocabulary. RT-2 generalized to novel objects, unseen instructions, and rudimentary chain-of-thought reasoning over physical tasks, and it became the template for a wave of subsequent models [14].
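RT-2's action-as-token scheme discretizes each continuous action dimension into 256 bins so that an action becomes a short string of vocabulary tokens. The round-trip sketch below illustrates the encoding; the normalized action range and the seven-dimensional action layout are assumptions for the example.

```python
import numpy as np

# Sketch of the action-as-text-tokens idea behind RT-2: continuous
# action dimensions are discretized into 256 bins so each dimension
# maps to one token.  Ranges and layout here are illustrative.

BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range per dimension

def action_to_tokens(action: np.ndarray) -> list[int]:
    """Map each continuous action dimension to an integer token in [0, 255]."""
    clipped = np.clip(action, LOW, HIGH)
    return list(np.round((clipped - LOW) / (HIGH - LOW) * (BINS - 1)).astype(int))

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    """Invert the discretization when decoding the model's output."""
    return np.array(tokens) / (BINS - 1) * (HIGH - LOW) + LOW

# An assumed 7-D action (xyz translation, rpy rotation, gripper)
# round-trips with quantization error below one bin width.
a = np.array([0.10, -0.25, 0.05, 0.0, 0.3, -0.1, 1.0])
print(action_to_tokens(a))
print(tokens_to_action(action_to_tokens(a)))
```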
Physical Intelligence's pi0 (introduced in October 2024) and pi0.5 (April 2025) extend the VLA pattern with a flow-matching architecture and large-scale cross-embodiment training data. The pi0 base model was pretrained on data from seven different robot configurations covering 68 tasks, and pi0.5 added co-training on web data, verbal instructions, and high-level subtask predictions to enable open-world generalization to entirely new homes and tasks. Physical Intelligence open-sourced the pi0 weights and inference code in early 2025 through the openpi repository on GitHub, lowering the barrier to entry for smaller laboratories.
OpenVLA, released by Stanford and collaborators in June 2024, is a 7-billion-parameter open-source VLA trained on the Open X-Embodiment dataset, a collaboration of 21 institutions that aggregated more than one million robot trajectories from 22 distinct embodiments. Google DeepMind's Gemini Robotics model, announced in 2025, extends the Gemini multimodal foundation model into the physical world and powers dexterous tasks such as folding origami and dealing playing cards.
A related thread of research uses natural language as the channel for teaching new skills. Language-conditioned imitation learning treats verbal annotations attached to demonstrations as additional training signal, allowing the resulting policy to be steered at runtime by free-form instructions. The approach has the practical advantage that a non-expert user can teach a robot a new task by demonstrating it once and naming the skill, without writing code or designing a reward function. Combined with VLA backbones, language-conditioned imitation lets robots inherit the broad world knowledge of pretrained language models while being grounded in specific physical capabilities through a small number of demonstrations.
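In its simplest form, language-conditioned imitation learning is behavioral cloning with an instruction embedding concatenated to the observation. The sketch below uses synthetic tensors in place of real annotated demonstrations; the network sizes, embedding dimension, and training hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of language-conditioned behavioral cloning: the policy
# conditions on an instruction embedding as well as the observation, so
# the same network can be steered by free-form language at runtime.

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=64, lang_dim=384, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs, lang_embedding):
        return self.net(torch.cat([obs, lang_embedding], dim=-1))

policy = LanguageConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Synthetic stand-in for annotated demonstrations: each batch is
# (observation, instruction embedding, expert action); the verbal
# annotation attached to the demo is the extra training signal.
demo_loader = [(torch.randn(32, 64), torch.randn(32, 384), torch.randn(32, 7))
               for _ in range(10)]

for obs, lang, expert_action in demo_loader:
    pred = policy(obs, lang)
    loss = nn.functional.mse_loss(pred, expert_action)  # behavioral cloning
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```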
The HRI community is small enough that a handful of investigators and their laboratories have shaped the field's intellectual agenda. The list below is not exhaustive but identifies several pioneers whose work appears repeatedly across the citations of HRI papers.
| Researcher | Affiliation | Notable contributions |
|---|---|---|
| Cynthia Breazeal | MIT Media Lab | Kismet, Leonardo, Jibo; foundational work on affective and social robotics |
| Brian Scassellati | Yale University | Cog (with Rodney Brooks), socially assistive robots for autism intervention |
| Bilge Mutlu | University of Wisconsin-Madison | Design-led HRI methodology, gaze and gesture in social robots |
| Selma Sabanovic | Indiana University Bloomington | R-House lab, social and cultural shaping of robots, organizational HRI |
| Andrea Thomaz | UT Austin / Diligent Robotics | Socially guided machine learning, Moxi healthcare robot |
| Manuela Veloso | Carnegie Mellon University | CoBot mobile service robots, RoboCup, symbiotic autonomy |
| Robin Murphy | Texas A&M University | Disaster robotics and search-and-rescue HRI |
| Maja Mataric | University of Southern California | Coined "socially assistive robotics," stroke and autism applications |
| Holly Yanco | UMass Lowell | HRI for assistive and field robotics, interface design |
| Takayuki Kanda | Kyoto University / ATR | Long-term studies of social robots in Japanese public spaces |
Cynthia Breazeal pioneered the modern field of social robotics with the Kismet head and went on to direct MIT's Personal Robots Group, where she developed the Leonardo robot and later founded Jibo, a startup that produced one of the first commercially shipped home companion robots. Brian Scassellati earned his PhD at MIT under Rodney Brooks while contributing to the Cog humanoid project, then founded the Social Robotics Laboratory at Yale and has produced influential work on robot-assisted intervention for children on the autism spectrum. Bilge Mutlu at the University of Wisconsin-Madison brings a design-led approach to HRI, blending engineering with rigorous human-subjects experimentation. Selma Sabanovic at Indiana University Bloomington founded the R-House HRI lab and has championed a science-and-technology-studies perspective on social robotics. Manuela Veloso developed the CoBot service robots that operated for years inside CMU's Gates-Hillman Center and pioneered the idea of "symbiotic autonomy," in which the robot enlists human help to overcome its limitations.
HRI maintains a healthy ecosystem of conferences and journals that span its physical and social branches.
| Venue | Type | First held / founded | Focus |
|---|---|---|---|
| ACM/IEEE HRI | Conference | 2006 | Flagship interdisciplinary HRI venue |
| IEEE RO-MAN | Conference | 1992 | Robot-human interactive communication |
| IEEE ICRA | Conference | 1984 | Broad robotics, including pHRI |
| IEEE/RSJ IROS | Conference | 1988 | Intelligent robots and systems |
| Robotics: Science and Systems (RSS) | Conference | 2005 | Single-track research conference |
| IEEE Transactions on Robotics (T-RO) | Journal | 2004 | Premier robotics journal |
| International Journal of Robotics Research (IJRR) | Journal | 1982 | First scholarly robotics journal |
| ACM Transactions on Human-Robot Interaction (THRI) | Journal | 2012 | Open-access HRI journal |
| International Journal of Social Robotics | Journal | 2009 | Springer journal for social robotics |
| Autonomous Robots | Journal | 1994 | Methods and applications |
The ACM/IEEE HRI conference is the flagship venue and combines a competitive full-paper track with late-breaking reports, demonstrations, video sessions, and workshops. RO-MAN has historically had a stronger European and Asian presence and a slightly more applied flavor. ICRA, IROS, and RSS are general robotics conferences but routinely host HRI tracks and workshops. ACM Transactions on Human-Robot Interaction, founded in 2012, provides an open-access journal home for the field, while International Journal of Social Robotics is the principal Springer outlet for social-HRI work. IEEE Transactions on Robotics and the International Journal of Robotics Research remain the prestige journal venues for technical HRI contributions, particularly on the physical side of the field.
HRI research has always carried a strong ethical thread because robots that share human spaces raise questions about safety, privacy, autonomy, employment, and the moral status of artificial agents. Early work on robot ethics built on Isaac Asimov's fictional Three Laws and on the European EURON Roboethics Roadmap (2006), but the field has since matured into a substantial subdiscipline addressing the use of social robots with vulnerable populations, the deception inherent in anthropomorphic design, the labor implications of cobots and humanoids, and the data-protection requirements of always-on home robots. Sherry Turkle's critiques of social robot deployment in eldercare and education, articulated in Alone Together (2011), continue to shape the debate over whether social robots can ever provide genuine companionship or only a simulation of it.
The arrival of large foundation models has added new ethical pressure points. Robots that hallucinate facts in conversation, inherit biases from internet training data, or behave unpredictably because of opaque model internals raise different concerns than the comparatively narrow industrial robots of the previous generation. The HRI community has responded with a growing literature on transparency, explainability, and robot legibility, and standards bodies have begun to draft guidance on the responsible deployment of AI-driven robots in public spaces.
Despite rapid progress, several long-standing problems remain open. Long-horizon manipulation in unstructured environments, the persistent gap between laboratory studies and real-world deployment, the brittleness of social interaction over long time spans (the so-called novelty-effect problem), and the difficulty of evaluating HRI rigorously across cultures all continue to attract substantial research effort. The integration of foundation models has solved some problems while creating new ones: VLA-driven robots can now follow far broader instructions than five years ago, but they can also fail in surprising ways that are hard to anticipate or explain.
A second cluster of open problems concerns measurement. The HRI community has produced numerous validated questionnaires (the Godspeed series for perceived anthropomorphism, animacy, likeability, intelligence, and safety; the Robotic Social Attributes Scale; the Negative Attitudes Toward Robots Scale) but disagreement persists about which constructs matter most.
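Questionnaire instruments of this kind are scored by averaging their items. As an illustration, the sketch below scores the Godspeed anthropomorphism subscale from its five semantic differential items; the responses are invented and the item labels abbreviated from the published instrument.

```python
from statistics import mean

# Sketch of scoring one Godspeed subscale.  Godspeed items are 5-point
# semantic differential pairs; the responses below are invented.

# Anthropomorphism subscale: 1 = machinelike pole, 5 = humanlike pole.
responses = {
    "fake_vs_natural": 3,
    "machinelike_vs_humanlike": 2,
    "unconscious_vs_conscious": 2,
    "artificial_vs_lifelike": 3,
    "moving_rigidly_vs_elegantly": 4,
}

anthropomorphism_score = mean(responses.values())  # subscale = item mean
print(f"Anthropomorphism: {anthropomorphism_score:.1f} / 5")
```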
A third cluster concerns scaling. Most published HRI studies involve small numbers of participants in single-session laboratory encounters, while the most interesting research questions about long-term human-robot relationships require deployments lasting weeks or months.