Robot navigation is the ability of a mobile robot to determine its own position in its environment, plan a path toward a goal, and execute that path while avoiding obstacles. It is one of the central problems in robotics and autonomous systems, with applications ranging from warehouse logistics and household vacuum cleaners to autonomous driving and planetary exploration. A complete navigation system typically combines localization, mapping, path planning, and motion control into a pipeline that lets the robot move safely and efficiently through the world.
Robot navigation answers three questions that any mobile agent must solve: "Where am I?", "Where do I want to go?", and "How do I get there?" The first question is addressed by localization, the second by goal specification or mission planning, and the third by path planning and trajectory execution. In practice these problems are tightly coupled. A robot that cannot localize itself accurately will plan poor paths; a robot that cannot plan around obstacles will collide, no matter how good its localization is.
Modern navigation systems rely on a layered architecture. A perception layer processes raw sensor data (cameras, LiDAR, IMU, wheel encoders) into a representation of the environment. A planning layer uses that representation to compute a collision-free route. A control layer converts the planned route into motor commands. Between these layers, a localization module continuously estimates the robot's pose (its position and orientation) within a map.
The choice of sensor hardware heavily influences the capabilities and limitations of a navigation system.
| Sensor | Principle | Typical range | Strengths | Weaknesses |
|---|---|---|---|---|
| LiDAR | Laser time-of-flight | 10 m to 200+ m | High accuracy, works in darkness | Expensive, sparse in vertical resolution |
| RGB Camera | Passive light capture | Varies with lens | Low cost, rich texture and color | Sensitive to lighting, no direct depth |
| RGB-D / Depth Camera | Structured light or ToF | 0.3 m to 10 m | Direct depth at low cost | Limited range, poor in sunlight |
| IMU (Inertial Measurement Unit) | Accelerometer + gyroscope | N/A (ego-motion) | High update rate, no external dependency | Drift accumulates over time |
| Wheel Encoders | Rotary counting | N/A (ego-motion) | Simple, inexpensive | Slip on rough or wet surfaces |
| GPS / GNSS | Satellite trilateration | Global | Absolute global position | Poor indoors, meter-level accuracy |
| Ultrasonic | Sound time-of-flight | 0.02 m to 5 m | Very low cost | Narrow beam, low resolution |
| Radar | Radio wave reflection | 1 m to 200+ m | Works in rain, fog, dust | Lower spatial resolution than LiDAR |
Because every sensor has blind spots, most robots use sensor fusion, combining data from multiple sensors to compensate for individual weaknesses. Common fusion combinations include LiDAR with IMU, camera with IMU (visual-inertial odometry), and LiDAR with camera for both geometric and semantic understanding. The Extended Kalman Filter (EKF) is frequently used to merge these heterogeneous measurements into a single, consistent state estimate.
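As a minimal illustration of Kalman-style fusion, the one-dimensional sketch below fuses an odometry-based prediction with an absolute position fix. All state values and noise variances are illustrative; a real EKF tracks a multidimensional state and linearizes nonlinear models with Jacobians.

```python
def kf_predict(x, P, u, Q):
    """Motion update: shift the estimate by the odometry increment u;
    process noise Q inflates the variance."""
    return x + u, P + Q

def kf_update(x, P, z, R):
    """Measurement update: fuse an absolute position fix z with variance R."""
    K = P / (P + R)              # Kalman gain weighs prediction vs. measurement
    x_new = x + K * (z - x)      # correct the estimate toward the measurement
    P_new = (1.0 - K) * P        # fused variance shrinks below both sources
    return x_new, P_new

# Illustrative numbers: prior estimate 10.0 m (variance 3.0), no commanded
# motion, process noise 1.0; then a GPS-like fix of 12.0 m (variance 1.0).
x, P = kf_predict(10.0, 3.0, 0.0, 1.0)
x, P = kf_update(x, P, 12.0, 1.0)
```

Note that the fused variance (0.8) is smaller than both the predicted variance (4.0) and the measurement variance (1.0): combining independent sources always reduces uncertainty.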
SLAM is the computational problem of constructing a map of an unknown environment while simultaneously tracking the robot's position within that map. It is sometimes described as a chicken-and-egg problem: accurate localization requires a good map, but building a good map requires accurate localization. SLAM algorithms solve both problems jointly.
The probabilistic formulation of SLAM originated at the 1986 IEEE Robotics and Automation Conference in San Francisco. Randall Smith and Peter Cheeseman showed that the correlations between landmark position estimates and the robot pose estimate grow without bound as the robot explores, and proposed using the Extended Kalman Filter to maintain a joint state vector of the robot pose and all landmark positions. Hugh Durrant-Whyte, John Leonard, and others developed this idea through the late 1980s and 1990s. The acronym "SLAM" was coined by Durrant-Whyte in 1995.
Major subsequent milestones include FastSLAM (Montemerlo, Thrun, Koller, and Wegbreit, 2002), which used particle filters to factorize the SLAM posterior into per-particle maps, and GraphSLAM (Thrun and Montemerlo, 2006), which recast SLAM as a sparse graph optimization problem.
SLAM can be stated as follows. At each time step, the robot executes a control command u and receives a sensor observation z. The robot's pose at time t is x_t and the map consists of a set of landmark positions m. The goal is to estimate the posterior probability P(x_{1:t}, m | z_{1:t}, u_{1:t}), the joint distribution over the trajectory and map given all controls and observations. Different SLAM algorithms approximate this posterior in different ways.
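The recursive structure of this posterior can be illustrated with a one-dimensional Bayes (histogram) filter. The sketch below covers only the localization slice of the problem, assuming the map (a row of doors and walls) is already known; the sensor and motion probabilities are illustrative.

```python
def predict(bel, move_prob):
    """Motion update: shift belief one cell right with probability move_prob
    (the remainder models wheel slip: the robot stayed put)."""
    n = len(bel)
    new = [0.0] * n
    for i, p in enumerate(bel):
        new[(i + 1) % n] += p * move_prob
        new[i] += p * (1.0 - move_prob)
    return new

def update(bel, world, z, hit_prob):
    """Measurement update: reweight each cell by the likelihood of seeing z
    there, then renormalize to obtain a proper distribution."""
    new = [p * (hit_prob if world[i] == z else 1.0 - hit_prob)
           for i, p in enumerate(bel)]
    s = sum(new)
    return [p / s for p in new]

world = ['door', 'wall', 'door', 'wall', 'wall']   # known map (illustrative)
bel = [0.2] * 5                                    # uniform prior
bel = update(bel, world, 'door', hit_prob=0.9)     # robot observes a door
bel = predict(bel, move_prob=1.0)                  # robot moves one cell
```

After one observation the belief concentrates on the two door cells; the motion update then shifts that belief one cell to the right, which is exactly the predict/correct cycle that full SLAM filters run over the joint trajectory-and-map state.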
The earliest SLAM solutions represented the robot pose and all landmark locations in a single augmented state vector, updated at each step by an Extended Kalman Filter. EKF-SLAM is simple to implement and works well when the number of landmarks is small. It has two main drawbacks: the state covariance matrix grows quadratically with the number of landmarks, making updates impractical for large environments, and linearization errors accumulate over long trajectories, which can render the filter's estimates inconsistent.
FastSLAM, introduced by Michael Montemerlo and Sebastian Thrun in 2002, uses a Rao-Blackwellized particle filter. Each particle carries a sampled robot trajectory and a set of independent landmark filters. This factorization breaks the quadratic scaling of EKF-SLAM, making the algorithm O(M log N) per update where M is the number of particles and N is the number of landmarks.
Graph-based SLAM (also called GraphSLAM or pose-graph SLAM) represents the problem as a factor graph. Nodes correspond to robot poses at different time steps (and optionally landmark positions). Edges encode constraints from odometry measurements between consecutive poses and from observations of landmarks. The full trajectory and map are recovered by solving a nonlinear least-squares optimization problem over the graph. This is the dominant SLAM paradigm as of 2025. Commonly used optimization libraries include g2o (developed by Rainer Kummerle et al., 2011) and GTSAM (developed by Frank Dellaert at Georgia Tech, 2012).
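A toy version of this optimization can be written for one-dimensional poses. The sketch below uses plain gradient descent rather than the Gauss-Newton or Levenberg-Marquardt solvers that g2o and GTSAM employ, and the edge values are illustrative: three odometry steps of +1.0 conflict with a loop closure measuring a total displacement of 2.7, so the optimizer spreads the 0.3 of accumulated drift across the chain.

```python
def optimize_pose_graph(n, edges, iters=500, lr=0.1):
    """Minimize the sum of squared residuals (x_j - x_i - z)^2 over a 1D
    pose graph by gradient descent; pose 0 is fixed as the datum."""
    x = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for i, j, z in edges:
            r = x[j] - x[i] - z           # residual of constraint i -> j
            grad[j] += 2 * r
            grad[i] -= 2 * r
        grad[0] = 0.0                     # anchor the first pose
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

# Edges are (i, j, measured displacement). Odometry says each step is +1.0;
# a loop closure observes that x3 - x0 = 2.7, revealing drift.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.7)]
x = optimize_pose_graph(4, edges)
```

The least-squares solution places the poses at 0.0, 0.925, 1.85, 2.775: each of the four constraints absorbs an equal 0.075 share of the drift, which is the characteristic behavior of pose-graph optimization.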
Visual SLAM (vSLAM) uses cameras as the primary sensor. It extracts visual features (corners, edges, or learned descriptors) from images and tracks them across frames to estimate motion and build a 3D map. Visual SLAM systems are popular because cameras are inexpensive and provide rich appearance information.
Notable visual SLAM systems include MonoSLAM (Davison et al., 2007), the first real-time monocular SLAM system, built on an EKF; PTAM (Klein and Murray, 2007), which split tracking and mapping into parallel threads; the ORB-SLAM family (Mur-Artal et al., 2015 onward), feature-based systems with loop closure and relocalization whose latest version, ORB-SLAM3, supports monocular, stereo, RGB-D, and visual-inertial operation; and LSD-SLAM (Engel et al., 2014), a direct method that operates on image intensities rather than extracted features.
LiDAR-based SLAM uses laser scanners to produce accurate range measurements. In 2D, algorithms like GMapping (Grisetti, Stachniss, and Burgard, 2007) use Rao-Blackwellized particle filters on occupancy grids. In 3D, point cloud registration methods such as ICP (Iterative Closest Point) and NDT (Normal Distributions Transform) align successive scans. Google Cartographer (Hess et al., 2016) is a widely used open-source system that performs real-time 2D and 3D SLAM using scan matching and loop closure on a pose graph.
LiDAR SLAM produces highly accurate geometric maps and works in complete darkness, but the sensor cost is higher than cameras and the resulting maps lack the color and texture information that visual SLAM provides.
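The core of scan registration can be illustrated with a minimal two-dimensional ICP sketch: alternate nearest-neighbor matching with a closed-form SVD alignment step. Production systems add outlier rejection, downsampling, and point-to-plane error metrics; the point sets and transform below are illustrative.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest destination
    point, then solve for the rigid transform (R, t) from the SVD of the
    cross-covariance of the matched, mean-centered point sets."""
    # brute-force nearest-neighbor correspondences (fine for small clouds)
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    mu_s, mu_d = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_d)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Illustrative: a small "scan" rotated by 30 degrees and translated
theta = np.radians(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
scan = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = scan @ R_true.T + np.array([0.5, -0.2])
src = scan.copy()
for _ in range(10):                             # iterate match-and-align
    R, t = icp_step(src, target)
    src = src @ R.T + t
```

Because ICP converges only to a local optimum, real pipelines seed it with an odometry or IMU initial guess before refining the alignment.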
Semantic SLAM adds object detection and scene understanding on top of geometric SLAM. By recognizing objects and assigning semantic labels to map elements, the robot can build maps that encode not just geometry but also the meaning of different regions (e.g., "door", "table", "road"). Advances in deep learning and computer vision have accelerated semantic SLAM research, with systems leveraging convolutional neural networks and, more recently, vision-language models for richer scene representations.
State-of-the-art SLAM systems increasingly fuse data from multiple sensors. Common configurations include LiDAR-IMU SLAM, visual-inertial SLAM (camera + IMU), LiDAR-visual SLAM, and three-way LiDAR-IMU-visual SLAM. Fusion can be loosely coupled (each sensor runs its own front-end and results are merged) or tightly coupled (raw measurements are combined in a single optimization). RTAB-Map (Labbe and Michaud, 2019) is a notable open-source system that can fuse LiDAR and camera data with loop closure detection and multi-session mapping.
| System | Sensor input | Approach | Key feature |
|---|---|---|---|
| ORB-SLAM3 | Mono, stereo, RGB-D camera + optional IMU | Feature-based visual(-inertial) SLAM | Multi-map, loop closure, relocalization |
| LSD-SLAM | Monocular camera | Direct (intensity-based) visual SLAM | Semi-dense depth maps |
| Google Cartographer | 2D/3D LiDAR + IMU | Scan matching + pose graph optimization | Real-time 2D and 3D mapping |
| GMapping | 2D LiDAR | Rao-Blackwellized particle filter | Occupancy grid maps |
| RTAB-Map | LiDAR, stereo, RGB-D | Graph-based with appearance-based loop closure | Multi-session, multi-sensor |
| Hector SLAM | 2D LiDAR | Scan matching (no odometry required) | Fast indoor 2D mapping |
Path planning is the problem of computing a collision-free route from a start position to a goal position, given a map of the environment. Planning algorithms are typically divided into global planners, which compute a full path before execution begins, and local planners (controllers), which adjust the path in real time to respond to new obstacles.
Planning is often formulated in configuration space (C-space), where each point represents a complete specification of the robot's degrees of freedom. The subset of C-space that is free of collisions is called the free space. A path is a continuous curve through free space connecting the start configuration to the goal configuration.
Dijkstra's algorithm (conceived by Edsger Dijkstra in 1956 and published in 1959) finds the shortest path in a weighted graph by expanding nodes in order of increasing cost from the start. It is complete and optimal but explores in all directions, making it slow for large search spaces.
A* (Hart, Nilsson, and Raphael, 1968) improves on Dijkstra's algorithm by adding a heuristic function h(n) that estimates the cost from node n to the goal. A* expands nodes in order of f(n) = g(n) + h(n), where g(n) is the cost from the start. When h(n) is admissible (never overestimates), A* is both complete and optimal. It is one of the most widely used planners in robotics and is the basis for many grid-based navigation systems.
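A compact grid-based A* can be sketched as follows, using the Manhattan distance heuristic, which is admissible on a 4-connected unit-cost grid; the map is illustrative.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])  # admissible
    open_set = [(h(start), 0, start, None)]    # entries: (f, g, node, parent)
    came_from, g_best = {}, {start: 0}
    while open_set:
        f, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                           # already expanded with lower g
        came_from[node] = parent
        if node == goal:                       # walk parents back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_best.get(nxt, float('inf')):
                    g_best[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, node))
    return None                                # goal unreachable

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
path = astar(grid, (0, 0), (3, 3))
```

On this map the returned path has cost 6, which matches the Manhattan lower bound, illustrating the optimality guarantee for admissible heuristics.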
D* (Dynamic A*), developed by Anthony Stentz at Carnegie Mellon University in 1994, is designed for environments that change during execution. When the robot discovers new obstacles, D* repairs the existing solution incrementally rather than replanning from scratch.
D* Lite (Koenig and Likhachev, 2002) achieves the same incremental replanning behavior with a simpler algorithm built on top of Lifelong Planning A* (LPA*). It maintains a priority queue of inconsistent nodes and updates only the portion of the search space affected by map changes. D* Lite is the preferred variant in most modern implementations, as it is easier to implement and analyze while producing identical behavior to Focused D*.
Theta* (Nash, Daniel, Koenig, and Felner, 2007) extends A* by allowing any-angle paths. At each node expansion, it checks whether a straight-line path to the parent's parent is collision-free, producing smoother paths than grid-constrained A*.
PRM (Kavraki, Svestka, Latombe, and Overmars, 1996) operates in two phases. In the construction phase, it samples random configurations in free space and connects nearby samples with straight-line paths, building a roadmap graph. In the query phase, it connects the start and goal to the roadmap and finds the shortest path using Dijkstra's algorithm. PRM is probabilistically complete: as the number of samples grows, the probability of finding a solution (if one exists) approaches 1.
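The two phases can be sketched for a point robot in a unit-square workspace with circular obstacles; the sample count, connection radius, and obstacle layout are illustrative. The function returns the length of the shortest roadmap path, or None if the roadmap is disconnected.

```python
import math, random, heapq

def collision_free(p, q, circles):
    """True if the segment p-q avoids all circular obstacles (center, radius)."""
    for (cx, cy), r in circles:
        px, py = p
        dx, dy = q[0] - px, q[1] - py
        L2 = dx * dx + dy * dy
        t = 0.0 if L2 == 0 else max(0.0, min(1.0,
            ((cx - px) * dx + (cy - py) * dy) / L2))
        nx, ny = px + t * dx, py + t * dy       # closest point on the segment
        if (nx - cx) ** 2 + (ny - cy) ** 2 <= r * r:
            return False
    return True

def prm(start, goal, circles, n_samples=200, radius=0.35, seed=1):
    """Construction phase: sample free configurations and link nearby pairs.
    Query phase: run Dijkstra from start (node 0) to goal (node 1)."""
    rng = random.Random(seed)
    nodes = [start, goal]
    while len(nodes) < n_samples + 2:
        p = (rng.random(), rng.random())
        if collision_free(p, p, circles):       # sample itself must be free
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and collision_free(nodes[i], nodes[j], circles):
                edges[i].append((j, d))
                edges[j].append((i, d))
    dist, pq, seen = {0: 0.0}, [(0.0, 0)], set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == 1:
            return d
        for v, w in edges[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return None                                 # roadmap disconnected

# Illustrative query: unit square with one circular obstacle in the middle
length = prm((0.05, 0.05), (0.95, 0.95), [((0.5, 0.5), 0.2)])
```

Because the obstacle sits on the straight line between start and goal, any roadmap path must detour around it and is strictly longer than the Euclidean distance.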
RRT (LaValle, 1998) incrementally builds a tree rooted at the start configuration by repeatedly sampling random points and extending the nearest tree node toward them. RRT is efficient at exploring high-dimensional configuration spaces and can handle non-holonomic constraints. Its main weakness is that it does not optimize path quality.
RRT* (Karaman and Frazzoli, 2011) is an asymptotically optimal variant of RRT. After adding a new node, it searches for nearby nodes that could serve as better parents and rewires the tree to reduce total path cost. Given enough time, RRT* converges to the optimal path.
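The basic RRT extension loop can be sketched for a point robot in a unit square with circular obstacles. For brevity this version collision-checks only the new node rather than the full extension segment, and adds a small goal bias to speed convergence; the step size, iteration budget, and obstacle layout are illustrative.

```python
import math, random

def rrt(start, goal, circles, step=0.05, goal_tol=0.05, iters=2000, seed=2):
    """Grow a tree from start: sample a point, find the nearest tree node,
    and extend one step toward the sample if the new node is free."""
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        # goal biasing: aim directly at the goal 5% of the time
        sample = goal if rng.random() < 0.05 else (rng.random(), rng.random())
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        if any(math.dist(new, c) <= r for c, r in circles):
            continue                            # extension hits an obstacle
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) <= goal_tol:    # reached the goal region
            path, k = [], len(nodes) - 1
            while k is not None:                # walk parents back to root
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

path = rrt((0.05, 0.05), (0.95, 0.95), [((0.5, 0.5), 0.2)])
```

The returned path is feasible but typically jagged and far from shortest, which is exactly the weakness that RRT*'s rewiring step addresses.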
| Algorithm | Type | Completeness | Optimality | Handles dynamic environments | Typical use case |
|---|---|---|---|---|---|
| Dijkstra's | Graph search | Complete | Optimal | No (static) | Small graphs, baseline |
| A* | Graph search | Complete | Optimal (admissible h) | No (static) | Grid-based 2D/3D planning |
| D* Lite | Graph search | Complete | Optimal | Yes (incremental) | Unknown or changing environments |
| Theta* | Graph search | Complete | Near-optimal | No (static) | Any-angle grid planning |
| PRM | Sampling-based | Probabilistically complete | Asymptotically optimal (PRM*) | No (static) | Multi-query, high-dimensional |
| RRT | Sampling-based | Probabilistically complete | No | No (static) | Single-query, high-dimensional |
| RRT* | Sampling-based | Probabilistically complete | Asymptotically optimal | No (static) | High-dimensional, quality paths |
Obstacle avoidance is the reactive layer of navigation that prevents collisions during path execution. Even with a planned global path, the robot needs real-time collision avoidance to handle dynamic obstacles (people, other robots) and map inaccuracies.
The Artificial Potential Field method (Khatib, 1986) models the goal as an attractive potential and obstacles as repulsive potentials. The robot follows the negative gradient of the combined field. APF is computationally simple and generates smooth trajectories. Its main limitation is susceptibility to local minima, where attractive and repulsive forces cancel out and the robot gets stuck. Various escape strategies have been proposed, including random walks, virtual obstacles, and harmonic potential functions.
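A single gradient-descent step on the combined field can be sketched as follows, using a quadratic attractive potential and a Khatib-style repulsive potential that is active only within an influence radius d0. The gains, radii, and positions are illustrative; the small lateral offset in the start position breaks the symmetry that would otherwise trap the robot in a local minimum directly behind the obstacle.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=0.05, d0=0.4, lr=0.02):
    """Descend the gradient of U_att + U_rep, where
    U_att = 0.5 * k_att * ||pos - goal||^2 and
    U_rep = 0.5 * k_rep * (1/d - 1/d0)^2 for obstacle distance d < d0."""
    gx = k_att * (pos[0] - goal[0])          # attractive gradient
    gy = k_att * (pos[1] - goal[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < d0:
            # repulsive gradient points toward the obstacle, so subtracting
            # it pushes the robot away
            coef = k_rep * (1.0 / d - 1.0 / d0) / (d ** 3)
            gx -= coef * dx
            gy -= coef * dy
    return (pos[0] - lr * gx, pos[1] - lr * gy)

pos, goal = (0.0, 0.05), (1.0, 0.0)
for _ in range(400):
    pos = apf_step(pos, goal, obstacles=[(0.5, 0.0)])
```

The repulsive term grows without bound as d approaches zero, so the robot maintains clearance from the obstacle while the attractive term draws it around toward the goal.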
VFH (Borenstein and Koren, 1991) builds a polar histogram of obstacle density around the robot using range sensor data. It identifies openings (valleys) in the histogram that are wide enough for the robot to pass through, and selects the opening closest to the goal direction. VFH+ and VFH* add trajectory simulation and cost functions to handle wider robots and more complex environments.
DWA (Fox, Burgard, and Thrun, 1997) works in velocity space. It considers only velocities reachable within one control cycle (the "dynamic window") and simulates short trajectories for each candidate velocity. An objective function scores each trajectory based on progress toward the goal, clearance from obstacles, and velocity. DWA is widely used in the ROS Navigation Stack.
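A simplified DWA sketch for a differential-drive robot: enumerate a small grid of (v, w) candidates inside the dynamic window, forward-simulate each arc, and score by goal progress, clearance, and speed. The window discretization, cost weights, and scenario below are illustrative; production implementations use finer sampling and footprint-aware collision checks.

```python
import math

def dwa_choose(pose, v_cur, w_cur, goal, obstacles,
               v_max=0.5, w_max=1.5, a_v=0.5, a_w=2.0, dt=0.1, steps=10):
    """Pick the best (v, w) command from the dynamic window by simulating
    short arc trajectories and scoring progress, clearance, and velocity."""
    # dynamic window: velocities reachable within one control cycle
    v_lo, v_hi = max(0.0, v_cur - a_v * dt), min(v_max, v_cur + a_v * dt)
    w_lo, w_hi = max(-w_max, w_cur - a_w * dt), min(w_max, w_cur + a_w * dt)
    vs = [v_lo + i * (v_hi - v_lo) / 4 for i in range(5)]
    ws = [w_lo + i * (w_hi - w_lo) / 6 for i in range(7)]
    best_score, best_cmd = -float('inf'), (0.0, 0.0)
    for v in vs:
        for w in ws:
            x, y, th = pose
            clearance = float('inf')
            for _ in range(steps):              # forward-simulate the arc
                th += w * dt
                x += v * math.cos(th) * dt
                y += v * math.sin(th) * dt
                for ox, oy in obstacles:
                    clearance = min(clearance, math.hypot(x - ox, y - oy))
            if clearance < 0.1:
                continue                        # trajectory collides; skip
            score = (-math.dist((x, y), goal)   # progress toward the goal
                     + 0.2 * min(clearance, 0.5)
                     + 0.1 * v)                 # prefer faster motion
            if score > best_score:
                best_score, best_cmd = score, (v, w)
    return best_cmd

# Robot at the origin facing +x, goal straight ahead, obstacle off to one side
v, w = dwa_choose((0.0, 0.0, 0.0), 0.3, 0.0,
                  goal=(2.0, 0.0), obstacles=[(0.6, 0.15)])
```

Because only velocities inside the window are considered, the resulting command is always dynamically feasible for the robot's acceleration limits.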
MPC formulates trajectory tracking as an optimization problem over a receding time horizon. At each step, it solves for the optimal control sequence that minimizes a cost function (tracking error, control effort, obstacle proximity) subject to dynamics and constraint equations, then executes only the first control command before re-optimizing. MPC can handle complex dynamics and constraints but requires more computation than reactive methods.
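The receding-horizon idea can be sketched without a gradient-based solver by using random shooting: sample candidate control sequences, roll out the dynamics, score each by tracking error plus control effort, and apply only the first control of the best sequence. The one-dimensional double-integrator model, horizon, candidate count, and weights below are illustrative.

```python
import random

def mpc_step(x, v, target, rng, horizon=10, n_candidates=200, dt=0.1, u_max=1.0):
    """Random-shooting MPC for a 1D point mass (state: position x, velocity v).
    Returns only the FIRST control of the best-scoring candidate sequence."""
    best_cost, best_u0 = float('inf'), 0.0
    for _ in range(n_candidates):
        seq = [rng.uniform(-u_max, u_max) for _ in range(horizon)]
        cx, cv, cost = x, v, 0.0
        for u in seq:
            cv += u * dt                 # double-integrator dynamics rollout
            cx += cv * dt
            cost += (cx - target) ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

rng = random.Random(0)
x, v = 0.0, 0.0
for _ in range(80):                      # closed loop: re-optimize every step
    u = mpc_step(x, v, target=1.0, rng=rng)
    v += u * 0.1
    x += v * 0.1
```

Re-optimizing at every step is what makes the scheme robust to disturbances; gradient-based solvers (or sampling methods like MPPI) replace the naive random search in practical controllers.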
Bug algorithms (Lumelsky and Stepanov, 1987) are simple reactive strategies. In Bug1, the robot moves toward the goal until it hits an obstacle, circumnavigates the entire obstacle, and then departs from the point closest to the goal. Bug2 follows a straight line to the goal and, upon hitting an obstacle, follows the boundary until it can rejoin the original line. These algorithms guarantee convergence in 2D but produce long, inefficient paths.
The Robot Operating System (ROS) provides the most widely used open-source navigation framework. The original ROS 1 Navigation Stack, and its successor Nav2 for ROS 2, implement a full pipeline from sensor data to motor commands.
Nav2 (Navigation2) is the production-grade navigation framework for ROS 2, trusted by over 100 companies worldwide as of 2025. Its architecture is built on ROS 2 action servers, lifecycle-managed nodes, and behavior trees (using BehaviorTree.CPP V4) for task orchestration.
The core components of Nav2 include the planner server (global path planning), the controller server (local trajectory following), the behavior server (recovery behaviors such as spinning, backing up, and waiting), the smoother server (path refinement), the behavior-tree navigator that orchestrates these servers, layered costmaps for obstacle representation, AMCL for map-based localization, and the map server. Lifecycle manager nodes provide deterministic startup and shutdown of the whole stack.
Nav2 supports differential-drive, holonomic, Ackermann (car-like), and legged robot kinematics through its plugin architecture.
Indoor navigation presents specific challenges that distinguish it from outdoor operation.
GPS signals are unreliable or unavailable inside buildings due to walls and ceilings, so indoor robots depend entirely on onboard sensors (LiDAR, cameras, wheel odometry) and pre-built maps or SLAM for localization. Indoor environments tend to be geometrically structured (walls, corridors, rooms) but highly dynamic, with people, furniture, and doors that open and close. The robot must navigate socially, yielding to pedestrians and respecting personal space.
Common indoor navigation applications include warehouse autonomous mobile robots (AMRs), hospital delivery robots, cleaning robots (e.g., Roomba and its successors), and service robots in hotels and airports. These systems typically use 2D LiDAR SLAM on an occupancy grid, with A* or Dijkstra for global planning and DWA or TEB for local control.
Indoor positioning can also be augmented with infrastructure-based systems such as ultra-wideband (UWB) beacons, Wi-Fi fingerprinting, or Bluetooth Low Energy (BLE) beacons, which provide absolute position references to correct odometry drift.
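Given ranges to three or more beacons at known positions, the robot's position can be recovered by linearizing the range equations (subtracting the first equation from the others) and solving the resulting least-squares problem. The sketch below solves the 2D case via the normal equations in closed form; the beacon layout and noiseless ranges are illustrative.

```python
import math

def trilaterate(beacons, ranges):
    """Estimate a 2D position from ranges to known beacons. Subtracting the
    first range equation from the others yields a linear system A p = b,
    solved here through the 2x2 normal equations (A^T A) p = A^T b."""
    (x0, y0), r0 = beacons[0], ranges[0]
    A, b = [], []
    for (xi, yi), ri in zip(beacons[1:], ranges[1:]):
        A.append((2 * (xi - x0), 2 * (yi - y0)))
        b.append(r0 ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2)
    s_xx = sum(ax * ax for ax, ay in A)
    s_xy = sum(ax * ay for ax, ay in A)
    s_yy = sum(ay * ay for ax, ay in A)
    t_x = sum(ax * bi for (ax, ay), bi in zip(A, b))
    t_y = sum(ay * bi for (ax, ay), bi in zip(A, b))
    det = s_xx * s_yy - s_xy * s_xy
    return ((s_yy * t_x - s_xy * t_y) / det,
            (s_xx * t_y - s_xy * t_x) / det)

beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
ranges = [math.dist(true_pos, bcn) for bcn in beacons]
est = trilaterate(beacons, ranges)
```

Real UWB or BLE ranges are noisy, so deployed systems feed such fixes into a filter (for example an EKF) alongside odometry rather than trusting a single solve.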
Outdoor navigation introduces challenges related to scale, terrain variability, weather, and GPS availability.
Outdoor robots can use GPS/GNSS for coarse global localization, but centimeter-level accuracy requires RTK (Real-Time Kinematic) corrections or fusion with other sensors. Terrain can vary from paved roads to gravel, grass, mud, and uneven ground, requiring the robot to assess traversability, not just detect obstacles. Lighting conditions change dramatically between sun and shade, and weather (rain, fog, snow) can degrade both camera and LiDAR performance.
Autonomous vehicles are the highest-profile outdoor navigation application. Self-driving cars from companies like Waymo, Cruise, and others combine high-definition maps, LiDAR, cameras, radar, and IMUs in a tightly integrated perception and planning stack. Agricultural robots, delivery robots (e.g., Starship Technologies, Nuro), and planetary rovers (NASA's Mars rovers Curiosity and Perseverance) also rely on outdoor navigation. Unstructured outdoor environments, such as forests and disaster sites, remain among the most challenging domains for autonomous navigation.
Machine learning and deep learning approaches are increasingly used alongside or in place of classical navigation algorithms.
Reinforcement learning (RL) trains a navigation policy by letting the agent interact with an environment (real or simulated) and learn from rewards. The agent receives sensor observations as state input and outputs velocity commands. Deep RL methods such as Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) have been applied to mapless navigation, where the robot learns to reach goals and avoid obstacles directly from raw sensor data without building an explicit map.
The main advantage of RL-based navigation is adaptability: a well-trained policy can generalize to new environments without manual tuning. The main challenges are sample efficiency (training requires millions of interactions), sim-to-real transfer (policies trained in simulation may fail on real hardware), and safety during training.
End-to-end approaches train a single neural network to map directly from sensor input (e.g., camera images) to control commands. Early work includes ALVINN (Pomerleau, 1989), which trained a neural network to steer a vehicle from camera images, and NVIDIA's PilotNet (Bojarski et al., 2016), which used convolutional neural networks for lane following. These methods bypass the traditional perception-planning-control pipeline but are difficult to interpret and verify.
Visual navigation uses camera images as the primary input for navigation decisions. Recent systems combine computer vision with learned representations to navigate in previously unseen environments. Point-goal navigation ("go to coordinates X, Y") and image-goal navigation ("go to the place shown in this photo") are standard benchmarks in the embodied AI community, evaluated in simulation environments like Habitat (Meta AI) and AI2-THOR (Allen Institute for AI).
Object-goal navigation tasks the robot with finding a specific object category (e.g., "find a refrigerator") in an unexplored environment. This requires the robot to combine exploration, object detection, and commonsense reasoning about where objects are likely to be found. Recent approaches (2024-2025) use scene graphs and large language models for high-level reasoning. CogNav (Cao et al., ICCV 2025) uses LLM-driven cognitive maps combining semantic scene graphs, landmark graphs, and occupancy maps. FOM-Nav (2025) maintains frontier-object maps with 3D object point clouds to guide exploration efficiently. The Open Scene Graph Navigator (2025) organizes spatial information hierarchically for zero-shot generalization across environments and robot platforms.
Vision-language navigation (VLN) requires a robot to follow natural language instructions (e.g., "Go past the kitchen and turn left at the hallway") while navigating through a 3D environment. VLN integrates natural language processing, visual perception, and spatial reasoning. The Room-to-Room (R2R) dataset (Anderson et al., 2018) is a widely used benchmark. Recent work leverages vision-language models and transformer architectures to improve instruction following and generalization.
As of 2025, foundation models are being integrated into robot navigation at multiple levels. Large language models serve as high-level planners that decompose natural language goals into sequences of navigation sub-tasks. Vision-language models provide zero-shot object recognition and scene understanding for semantic navigation. Vision-Language-Action (VLA) models attempt to map visual observations and language instructions directly to robot actions. While these approaches show strong results on benchmarks, transferring them reliably to physical robots in uncontrolled settings remains an active research problem.
When multiple robots share an environment, navigation must account for inter-robot coordination to avoid collisions and deadlocks. Centralized approaches plan paths for all robots jointly but scale poorly. Decentralized approaches let each robot plan independently while following collision avoidance protocols such as ORCA (Optimal Reciprocal Collision Avoidance, van den Berg et al., 2011). Multi-robot SLAM systems extend single-robot SLAM by merging maps built by different robots, requiring robust place recognition and inter-robot communication.
Several trends define the current state of robot navigation research and deployment.
Neural SLAM and learned representations. Neural network-based SLAM systems are replacing hand-crafted features with learned descriptors and end-to-end differentiable pipelines. Systems using the Mamba state-space architecture for efficient visual feature processing, combined with transformer-based attention, represent one frontier.
Event cameras for SLAM. Event-based cameras, which output asynchronous per-pixel brightness changes rather than full frames, offer high dynamic range, microsecond-level temporal resolution, and low power consumption. These properties make them attractive for high-speed navigation and environments with extreme lighting variation.
Multi-sensor fusion as default. Modern production systems almost universally combine LiDAR, cameras, and IMUs in tightly coupled fusion frameworks. The single-sensor SLAM system is increasingly a research prototype rather than a deployed product.
LLM-powered planning. Large language models are being used as high-level task planners and commonsense reasoning engines for navigation. They decompose goals like "bring me a glass of water" into navigation sub-goals and manipulation actions, though reliability in open-ended real-world scenarios remains limited.
Sim-to-real transfer. Photorealistic simulators (Habitat, Isaac Sim, AI2-THOR) allow training navigation policies in simulation and transferring them to real robots. Domain randomization and domain adaptation techniques help bridge the gap between simulated and real sensor data.
Continuous and lifelong mapping. Robots deployed for months or years in the same environment need maps that update incrementally as the environment changes, handling new furniture, seasonal changes, or construction. Lifelong SLAM and experience-based navigation address this requirement.