Control theory
Last reviewed
May 1, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,981 words
Control theory is the mathematical and engineering discipline concerned with designing and analysing systems that achieve desired behaviour through measurement and feedback. It studies how to manipulate the inputs (controls) to a dynamical system in order to make its outputs follow reference trajectories or stabilise around equilibria, in the presence of uncertainty and disturbances. The discipline is foundational for robotics, aerospace, the process industries, biology, economics, and increasingly for machine learning. Modern fields such as reinforcement learning, autonomous driving, and humanoid manipulation all build on a control-theoretic substrate.
The canonical modern reference is Karl Johan Åström and Richard M. Murray's Feedback Systems: An Introduction for Scientists and Engineers (Princeton University Press, 2nd ed. 2021). The book won the 2011 IFAC Harold Chestnut Control Engineering Textbook Prize and is freely available online. Other standard texts include Katsuhiko Ogata's Modern Control Engineering (5th ed., Pearson 2010) and Hassan K. Khalil's Nonlinear Systems (3rd ed., Prentice Hall 2002).
For a long time, control engineering and AI evolved on parallel tracks: control theorists assumed access to a model of the plant, while machine-learning researchers assumed access to data. The two fields have converged. Benjamin Recht's 2019 survey A Tour of Reinforcement Learning: The View from Continuous Control (Annual Review of Control, Robotics, and Autonomous Systems; arXiv:1806.09460) makes the case that reinforcement learning is best understood as data-driven optimal control, and that many open RL questions are restatements of long-studied control problems.
Several concrete reasons make control theory essential for modern AI:
Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction (2nd ed., MIT Press 2018) opens with an explicit acknowledgement that the dynamic-programming and optimal-control tradition is one of the three intellectual sources of RL, alongside trial-and-error learning and temporal-difference learning.
Control theory has a history that long predates electronic computers, let alone AI. Mechanical feedback regulators were used in antiquity (water clocks, float regulators), but the modern engineering discipline took shape in the late nineteenth century and matured in the twentieth.
| Year | Event |
|---|---|
| ca. 250 BCE | Ktesibios of Alexandria designs a float-regulated water clock, an early example of mechanical feedback. |
| 1788 | James Watt and Matthew Boulton apply the centrifugal (flyball) governor to steam engines, regulating speed via feedback. |
| 1868 | James Clerk Maxwell publishes "On Governors" in the Proceedings of the Royal Society, the first formal stability analysis of a feedback regulator. |
| 1892 | Aleksandr Lyapunov's doctoral dissertation establishes the general theory of stability of motion that bears his name. |
| 1922 | Nicholas Minorsky publishes the first analysis of three-term (PID) control for ship steering. |
| 1932 | Harry Nyquist at Bell Labs publishes "Regeneration Theory", giving the Nyquist stability criterion for feedback amplifiers. |
| 1942 | John G. Ziegler and Nathaniel B. Nichols publish PID tuning rules at Taylor Instrument Companies. |
| 1945 | Hendrik W. Bode publishes Network Analysis and Feedback Amplifier Design, introducing Bode plots, gain margin, and phase margin. |
| 1948 | Norbert Wiener publishes Cybernetics: Or Control and Communication in the Animal and the Machine (MIT Press), tying control to information theory and biology. |
| 1957 | Richard Bellman publishes Dynamic Programming (Princeton), introducing the principle of optimality. |
| 1960 | Rudolf E. Kalman publishes "A New Approach to Linear Filtering and Prediction Problems" in the ASME Journal of Basic Engineering, introducing the Kalman filter. The same year he formalises state-space optimal control (LQR) in Boletin de la Sociedad Matematica Mexicana. |
| 1962 | Lev Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko publish The Mathematical Theory of Optimal Processes (Interscience), formalising the maximum principle. |
| 1969 | Apollo 11 lunar module uses an onboard Kalman filter for descent navigation. |
| 1980s | Robust control develops: H-infinity, structured singular value (mu-synthesis), Quantitative Feedback Theory. Adaptive control matures (Åström and Wittenmark). |
| 1989 | John Doyle, Keith Glover, Pramod Khargonekar, and Bruce Francis publish "State-space solutions to standard H-2 and H-infinity control problems" in IEEE Transactions on Automatic Control. |
| 1990s | Nonlinear control consolidates: feedback linearisation, sliding-mode (Utkin), backstepping (Krstic, Kanellakopoulos, Kokotovic). |
| 2000 | David Q. Mayne, James B. Rawlings, Christopher V. Rao, and Pierre O. M. Scokaert publish the survey "Constrained model predictive control: Stability and optimality" in Automatica. |
| 2005 | Stanford's Stanley wins the DARPA Grand Challenge, demonstrating autonomous driving with classical estimation and control fused with machine learning. |
| 2010s | Intersection with machine learning intensifies: data-driven control, learning-based model predictive control, differentiable MPC, model-free and model-based RL applied to robotics. |
| 2015 | SpaceX lands the Falcon 9 first stage using convex-optimisation guidance, an industrial demonstration of real-time optimal control. |
| 2020s | Whole-body MPC running at hundreds of hertz becomes standard on quadrupeds and humanoids (Boston Dynamics Atlas, ANYmal, Unitree). Foundation models start to appear inside control stacks. |
Norbert Wiener's revival of Maxwell's 1868 paper in Cybernetics (1948) is what cemented Maxwell's status as the founder of modern stability analysis. Before Wiener pointed to it, the paper had been largely forgotten because contemporaries found it hard to read.
Most of the vocabulary used in control papers and robotics codebases comes from a small set of core ideas. The table below collects the most important ones; full treatments are in Åström and Murray (2021) and Ogata (2010).
| Concept | Meaning |
|---|---|
| Plant | The physical system being controlled (motor, aircraft, chemical reactor, robot arm). |
| Controller | The algorithm that computes the input to the plant from measurements. |
| Sensor | Device that produces a measurement of the plant state or output. |
| Actuator | Device that applies the control input (motor, valve, thruster). |
| Open-loop control | Control input is computed without using any measurement of the output. Sensitive to disturbances and model error. |
| Closed-loop (feedback) control | Control input is a function of measured output, allowing rejection of disturbances and tolerance to model error. |
| Feedforward | Control input is computed from a known reference or measured disturbance, often combined with feedback. |
| Reference tracking | Goal of making the output follow a desired trajectory. |
| Disturbance rejection | Goal of keeping the output near a setpoint despite external disturbances. |
| LTI system | Linear, time-invariant system; the most studied class, where superposition holds and frequency-domain methods apply. |
| Transfer function | Ratio of Laplace-transformed output to input, describing an LTI system in the frequency domain. |
| State-space representation | Description of a system as x_dot = f(x, u), y = h(x, u), generalising to nonlinear and multi-input multi-output systems. |
| Laplace transform | Continuous-time integral transform used to convert ODEs into algebraic equations in the variable s. |
| z-transform | Discrete-time analogue of the Laplace transform, used for digital control. |
| Frequency response | The steady-state response of an LTI system to sinusoidal inputs, visualised via Bode and Nyquist plots. |
| BIBO stability | Bounded-input bounded-output stability of LTI systems; equivalent to all transfer-function poles being in the open left half-plane (continuous time). |
| Lyapunov stability | A solution is Lyapunov-stable if trajectories starting near it stay near it; asymptotic stability adds convergence. Proven via Lyapunov functions. |
| Controllability | A state-space system (A, B) is controllable if its state can be driven from any initial value to any final value in finite time. Kalman's rank condition checks rank[B AB ... A^(n-1)B] = n for LTI systems. |
| Observability | A system (A, C) is observable if the initial state can be reconstructed from input and output histories. Dual to controllability. |
| Stabilisability and detectability | Weaker conditions than controllability and observability that suffice for stabilising controllers and stable observers. |
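The Kalman rank condition in the table can be checked numerically. The following is a minimal sketch using NumPy; the double-integrator matrices and the helper names (`controllability_matrix`, `is_controllable`, `is_observable`) are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    """Kalman rank test: rank [B AB ... A^(n-1)B] == n."""
    n = A.shape[0]
    return bool(np.linalg.matrix_rank(controllability_matrix(A, B)) == n)

def is_observable(A, C):
    """Duality: (A, C) is observable iff (A^T, C^T) is controllable."""
    return is_controllable(A.T, C.T)

# Hypothetical example: double integrator with state (position, velocity)
# and a force input acting on the velocity.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
C_pos = np.array([[1.0, 0.0]])  # measure position only
C_vel = np.array([[0.0, 1.0]])  # measure velocity only

print("controllable:", is_controllable(A, B))
print("observable from position:", is_observable(A, C_pos))
print("observable from velocity:", is_observable(A, C_vel))
```

In this example the velocity-only measurement fails the test, because the initial position never influences the measured output. Numerical rank tests are sensitive to scaling, so for poorly conditioned systems the result should be treated as indicative rather than conclusive.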
The field has accumulated many design paradigms, each suited to particular structure and assumptions. The table below summarises the main families.
| Paradigm | Key idea | Typical use |
|---|---|---|
| PID (Proportional-Integral-Derivative) | Output is a weighted sum of error, its integral, and its derivative. Three tunable gains. | The workhorse of industrial control. Surveys consistently report that around 90 to 97 percent of industrial control loops use PID, with a Honeywell survey in 2000 estimating 97 percent. |
| State-feedback / pole placement | Design u = -Kx so that the closed-loop matrix A - BK has desired eigenvalues. | Multivariable LTI design when full state is measured or estimated. |
| Linear-quadratic regulator (LQR) | Optimal state feedback for linear dynamics and quadratic cost; gain K obtained by solving an algebraic Riccati equation. Introduced by Kalman (1960b). | Aerospace, helicopter and inverted-pendulum control, baseline for RL benchmarks. |
| Linear-quadratic-Gaussian (LQG) | LQR combined with a Kalman filter; satisfies a separation principle. | Standard for stochastic LTI systems with Gaussian noise. |
| Pontryagin's maximum principle | Necessary conditions for optimal control via a Hamiltonian; introduced by Pontryagin et al. (1962). | Trajectory optimisation, aerospace guidance. |
| Hamilton-Jacobi-Bellman (HJB) | Sufficient conditions and value-function PDE for continuous-time optimal control. | Theoretical foundation for dynamic programming and RL. |
| Robust control (H-infinity, mu-synthesis) | Minimise worst-case sensitivity to bounded disturbances and model uncertainty. State-space H-infinity solution by Doyle, Glover, Khargonekar, Francis (1989). | Flight control, hard-disk read-write heads, applications with structured uncertainty. |
| Adaptive control | Controller parameters are adjusted online based on observed plant behaviour. | Plants whose parameters change slowly or are unknown, e.g. tail-rotor failures, fuel-burn shifts in aircraft. |
| Nonlinear control | Techniques such as feedback linearisation, sliding-mode (Utkin 1977), backstepping (Krstic et al. 1995). | Robotic manipulators, electric drives, automotive subsystems. |
| Model predictive control (MPC) | At each step, solve a finite-horizon optimal control problem and apply only the first input; repeat. Mayne et al. (2000) provide the canonical stability survey. | Refineries since the 1980s, autonomous driving, quadrupeds, humanoids, rocket landings. |
| Sliding-mode control | Force the system state onto a designed sliding surface and keep it there using high-frequency switching. | Power converters, electric drives, automotive ABS. |
| Fuzzy control | Use linguistic if-then rules and fuzzy-set inference instead of explicit dynamics. | Consumer appliances, expert-elicited rule bases. |
| Reinforcement learning control | Learn a policy directly from interaction, model-free or model-based. | Game-playing, dexterous manipulation, locomotion, recently combined with classical control as warm starts and safety filters. |
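To make the LQR row concrete, the sketch below computes a state-feedback gain for u = -Kx by iterating a Riccati recursion. It assumes a discrete-time formulation (the table describes the algebraic Riccati equation in general terms), and the plant, the weights Q and R, and the function name `dlqr` are illustrative choices rather than a reference implementation.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Approximate the discrete-time LQR gain by iterating the Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        # Gain associated with the current cost-to-go matrix P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P <- Q + A'PA - A'PB (R + B'PB)^-1 B'PA
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical plant: double integrator sampled with period dt
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.eye(2)            # state cost (assumed)
R = np.array([[0.1]])    # input cost (assumed)

K, P = dlqr(A, B, Q, R)
spectral_radius = max(abs(np.linalg.eigvals(A - B @ K)))
print("K =", K)
print("closed-loop spectral radius =", spectral_radius)
```

A spectral radius below one indicates that the closed-loop matrix A - BK is stable in discrete time. Production code would normally call a dedicated Riccati solver instead of a fixed number of iterations.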
PID dominates for a reason: it is robust, easy to tune, requires no plant model, and works well on near-linear loops. The more sophisticated paradigms become attractive when constraints, multivariable coupling, or strong nonlinearity break PID's assumptions.
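A minimal sketch of a discrete PID loop illustrates the point. The first-order plant, the constant disturbance, and the gains below are assumptions picked for illustration; the controller itself never uses the plant model.

```python
class PID:
    """Textbook discrete PID: u = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative estimate on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, steps=2000, dt=0.01, tau=1.0, disturbance=0.5, setpoint=1.0):
    """Hypothetical plant tau*dy/dt = -y + u + d, integrated with forward Euler."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += dt * (-y + u + disturbance) / tau
    return y


final_p = simulate(PID(kp=2.0, ki=0.0, kd=0.0, dt=0.01))   # proportional only
final_pi = simulate(PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01))  # with integral action
print("P only:", final_p)
print("PI:    ", final_pi)
```

With proportional action alone the output settles away from the setpoint, while adding the integral term removes the steady-state offset caused by the constant disturbance. A practical implementation would also need anti-windup and derivative filtering, which this sketch omits.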
Most real systems do not give direct access to their internal state. Estimation theory bridges this gap by reconstructing state from noisy partial measurements. This is also the basis for sensor fusion.
The Kalman filter, introduced by Rudolf E. Kalman in 1960, provides the optimal linear minimum-mean-square-error estimator for linear-Gaussian state-space systems. It is recursive, updating its estimate as each new measurement arrives, and propagates a covariance matrix that quantifies uncertainty. The filter has been used continuously in aerospace since NASA's Stanley F. Schmidt adapted it for the Apollo program; every Apollo lunar mission, including the Apollo 11 descent in 1969, used a Kalman filter to fuse inertial measurement unit, optical, and ground-tracking data.
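The recursive predict-and-update structure can be sketched in the scalar case. The model below (a nearly constant state observed through additive Gaussian noise) and all parameter values are assumptions chosen for illustration.

```python
import random

def kalman_step(x_est, p_est, z, a=1.0, h=1.0, q=1e-4, r=0.25):
    """One predict/update cycle for x[k+1] = a*x[k] + w, z[k] = h*x[k] + v.

    q and r are the assumed process and measurement noise variances.
    """
    # Predict: propagate the estimate and its variance through the model
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: blend prediction and measurement using the Kalman gain
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new

rng = random.Random(0)
true_x = 1.5             # hypothetical constant quantity being measured
x_est, p_est = 0.0, 1.0  # deliberately poor initial guess, large uncertainty

for _ in range(200):
    z = true_x + rng.gauss(0.0, 0.5)  # noisy measurement, std 0.5 (variance 0.25)
    x_est, p_est = kalman_step(x_est, p_est, z)

print("estimate:", x_est, "variance:", p_est)
```

The variance `p_est` shrinks as measurements arrive, which is the scalar counterpart of the covariance matrix described above. The multivariable filter has the same two steps with matrices in place of scalars.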
For nonlinear systems, the extended Kalman filter (EKF) linearises the dynamics and measurement equations around the current estimate. It is widely used despite well-known divergence problems on highly nonlinear systems. The unscented Kalman filter (UKF), introduced by Simon Julier and Jeffrey Uhlmann in 1997, propagates a deterministic set of sigma points through the nonlinear functions and reconstructs mean and covariance, often outperforming the EKF without computing Jacobians. Particle filters use a Monte-Carlo set of weighted samples and apply to general non-Gaussian distributions; they are standard in robotics for problems such as global localisation. Moving-horizon estimation is the dual of model predictive control on the estimation side, solving a constrained optimisation over a sliding window of past measurements.
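The sigma-point idea behind the UKF can be sketched as a standalone unscented transform. The version below assumes the simple single-parameter (kappa) weighting; the scaled variants with additional tuning parameters are not shown, and the polar-to-Cartesian example is an illustrative choice.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through f using 2n+1 deterministic sigma points."""
    n = mean.size
    # Columns of S satisfy S @ S.T = (n + kappa) * cov
    S = np.linalg.cholesky((n + kappa) * cov)
    sigma = [mean]
    sigma += [mean + S[:, i] for i in range(n)]
    sigma += [mean - S[:, i] for i in range(n)]
    weights = np.array([kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n))
    # Push every sigma point through the nonlinear function
    ys = np.array([f(s) for s in sigma])
    y_mean = weights @ ys
    diffs = ys - y_mean
    y_cov = (weights[:, None] * diffs).T @ diffs
    return y_mean, y_cov

def polar_to_cartesian(p):
    """Nonlinear map from (range, bearing) to (x, y)."""
    rng_, bearing = p
    return np.array([rng_ * np.cos(bearing), rng_ * np.sin(bearing)])

# Hypothetical sensor reading: range 10 m, bearing 45 degrees, with uncertainty
mean = np.array([10.0, np.pi / 4])
cov = np.diag([0.5**2, 0.1**2])

y_mean, y_cov = unscented_transform(mean, cov, polar_to_cartesian)
print("mean:", y_mean)
print("cov:\n", y_cov)
```

No Jacobian of `polar_to_cartesian` is needed, which is the practical advantage over the EKF mentioned above. For a linear function the transform reproduces the exact mean and covariance, which is a useful sanity check.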
In modern robot systems, the line between estimation and learning blurs. EKFs and UKFs sit inside SLAM systems such as MonoSLAM and FastSLAM; particle filters power Monte-Carlo localisation in mobile robots; learned visual feature detectors feed measurements into Kalman or factor-graph back ends.
Control algorithms run, often invisibly, inside almost every autonomous system. The table below collects representative deployments.
| Domain | Typical control approach | Notes |
|---|---|---|
| Industrial process control | PID, MPC | Refineries, chemical plants, pulp and paper. MPC has been deployed on thousands of refinery columns since the 1980s (Aspen DMCplus, Honeywell Profit Controller). |
| HVAC and building automation | PID, supervisory MPC | Setpoint regulation for temperature, humidity, and CO2. |
| Power converters and grids | PID, sliding-mode, MPC | Inverters, DC-DC converters, frequency regulation. |
| Automotive subsystems | PID, sliding-mode, gain-scheduled control | Engine control, anti-lock braking (ABS), electronic stability programme (ESP), cruise control. |
| Aircraft autopilots | PID, gain-scheduled, robust H-infinity | Boeing 777, Airbus A320 family, drone autopilot stacks such as PX4 and ArduPilot. |
| Spacecraft attitude and navigation | LQG, Kalman filter, sliding-mode | Apollo lunar module (Kalman filter), Mars rovers, ISS attitude control. |
| Rocket landing | Convex-optimisation guidance, MPC | SpaceX Falcon 9 first-stage powered descent uses real-time convex programming. |
| Industrial robot arms | PID for joint loops, computed-torque, MPC for trajectory | KUKA, ABB, FANUC; ROS Control framework. |
| Surgical robots | Cartesian impedance control, motion-scaling control | Intuitive Surgical's da Vinci system. |
| Drone flight control | Cascaded PID, geometric control on SO(3), MPC | PX4 and ArduPilot use cascaded PID by default; research stacks use geometric or MPC. |
| Autonomous driving | MPC for trajectory tracking, LQR for lateral control, EKF or UKF for state estimation | Stanford's Stanley (2005 DARPA Grand Challenge winner) used classical control plus learned terrain classifiers. Modern systems from Waymo, Cruise, and Tesla embed MPC for motion planning and trajectory tracking. |
| Quadrupeds | Whole-body MPC, convex MPC, RL policies | Boston Dynamics Spot, ANYbotics ANYmal, MIT Cheetah, Unitree Go and B series. |
| Humanoids | Whole-body MPC, model-based optimisation | Boston Dynamics Atlas uses a model-predictive controller for whole-body motion; Tesla Optimus and Figure use similar internal control loops. |
| Manipulation | Operational-space control, impedance control, RL | Inner loops of dexterous-manipulation research stacks. |
| SLAM | EKF, UKF, particle filters, factor graphs | EKF-SLAM, MonoSLAM, FastSLAM use estimation-theory machinery for joint pose and map estimation. |
A particularly clean industrial example is the Falcon 9 booster landing. The vehicle solves a convex-optimisation problem in real time during powered descent, computing thrust profiles that satisfy fuel, attitude, and glide-slope constraints. The mathematics is descended directly from optimal-control theory developed in the 1960s.
The relationship between control and reinforcement learning is often summarised as: RL is control without a known model. Both fields care about choosing actions to optimise a long-horizon objective in a dynamical environment. Both rely on the Bellman equation. The differences are in what is assumed and how solutions are computed.
Classical optimal control assumes a known dynamics model f and cost function g, then solves either Pontryagin's necessary conditions (open-loop trajectory optimisation) or the Hamilton-Jacobi-Bellman equation (closed-loop value function). Reinforcement learning assumes that f and possibly g are unknown, but that the agent can interact with the environment and observe rewards. Value-function methods such as Q-learning and SARSA derive directly from the discrete-time Bellman equation; policy-gradient methods perform stochastic gradient ascent on the expected return; actor-critic methods combine the two.
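The link between Q-learning and the discrete-time Bellman equation can be seen in a tabular sketch. The five-state chain environment, the reward of 1 for reaching the last state, and the hyperparameters are all assumptions made for illustration.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4, state 4 is terminal
GAMMA, ALPHA = 0.9, 0.5    # discount factor and learning rate (assumed)
ACTIONS = (-1, +1)         # index 0 = move left, index 1 = move right

def step(state, action):
    """Deterministic chain: reward 1 on entering the goal, 0 otherwise."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Uniform random exploration; Q-learning is off-policy, so the
            # behaviour policy need not match the greedy policy being learned.
            a = rng.randrange(2)
            s2, r, done = step(s, ACTIONS[a])
            # Sampled Bellman optimality backup
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = q_learning()
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
print("Q-table:", q)
print("greedy policy (1 = right):", policy)
```

The agent never sees the transition function inside `step`; it only observes next states and rewards, yet the update is a sampled version of the same Bellman backup that dynamic programming applies with a known model.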
Recht's 2019 survey is the cleanest bridge between the two literatures. It analyses LQR with unknown dynamics as the simplest non-trivial RL benchmark, derives sample-complexity bounds, and shows experimentally that model-based methods often dominate model-free methods on continuous-control problems where good models can be learned. Sutton and Barto's 2018 textbook makes the inverse point: trial-and-error RL existed before dynamic programming, and the synthesis of the two traditions defines modern RL.
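One simple model-based approach to LQR with unknown dynamics is to fit (A, B) by least squares from a rollout and then design the controller as if the estimate were exact. The sketch below illustrates that idea under assumed values for the plant, noise level, and cost weights; it is not a reproduction of the survey's experiments.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR gain via Riccati iteration (approximate)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

rng = np.random.default_rng(0)
dt = 0.1
# "True" plant, unknown to the learner (hypothetical double integrator)
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.0], [dt]])

# Collect one rollout driven by random inputs and small process noise
T = 200
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for t in range(T):
    noise = 0.01 * rng.normal(size=2)
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + noise

# Least squares: X[t+1] ~ [X[t], U[t]] @ Theta, with Theta = [A B]^T
Z = np.hstack([X[:-1], U])
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = Theta[:2].T, Theta[2:].T

# Design on the estimated model, then evaluate on the true plant
K = dlqr(A_hat, B_hat, np.eye(2), np.array([[1.0]]))
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K)))
print("A_hat =\n", A_hat)
print("B_hat =\n", B_hat)
print("true closed-loop spectral radius =", rho)
```

This approach ignores the estimation error when designing the gain, so it carries no robustness guarantee by itself; the sample-complexity analyses mentioned above quantify how much data is needed before such a controller can be trusted.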
Model-based RL (such as PILCO, Dyna, and Dreamer) is essentially MPC plus learned dynamics, with the planning step replaced by simulation in a learned world model. Differentiable MPC, introduced by Brandon Amos and colleagues in 2018, runs MPC inside the computation graph so that controller parameters can be trained by gradient descent. Safe RL builds barrier-function or robust-MPC layers around learned policies to enforce constraints with provable guarantees.
Control-system design is supported by mature toolchains in both academia and industry.
| Tool | Use |
|---|---|
| MATLAB Control System Toolbox | Industry standard for transfer-function and state-space design, root locus, Bode and Nyquist plots, LQR, Kalman filter design. |
| Simulink | Block-diagram simulation and code generation; widely used in automotive and aerospace. |
| Python python-control package | Open-source MATLAB-style control library. |
| scipy.signal and scipy.linalg | LTI building blocks (state-space, transfer functions, Riccati solvers). |
| CasADi | Symbolic framework for nonlinear optimisation and optimal control. |
| ACADO and acados | Real-time MPC and nonlinear optimal control. |
| do-mpc | Python framework for MPC and moving-horizon estimation. |
| IPOPT | Interior-point nonlinear programming solver, often used as the inner solver for trajectory optimisation. |
| ROS Control framework | Standard hardware-abstraction and controller-manager layer in the Robot Operating System. |
| Drake (Toyota Research Institute) | Robotics-oriented modelling, simulation, optimisation, and control library written in C++ with Python bindings. |
| Mathematica | Symbolic control design and analysis. |
The last decade has been defined by the convergence of classical control with machine learning. Several lines of work stand out:
What unifies these trends is the realisation that control and learning are not competing solutions to the same problem; they are complementary tools. Modern systems blend them at every layer of the stack.
A few demonstrations have shaped how researchers and the public understand the field:
Control theory does not stand alone. It draws on and feeds into several adjacent fields. Signal processing supplies filtering, sampling, and frequency-domain tools. Systems engineering supplies the methodology for integrating controllers into larger products. Optimisation supplies the algorithms (interior-point methods, sequential quadratic programming, convex programming) that power MPC and trajectory optimisation. Dynamical systems theory supplies the mathematics of stability, bifurcation, and chaos. Mechatronics fuses control with mechanical and electrical design for actuators and sensors. Information theory and statistics supply the foundations of estimation and Kalman filtering. The boundaries between these fields are porous, and modern researchers in robotics or autonomous systems generally need fluency in several of them.
For a modern engineer building autonomous driving software, an autonomous mobile robot, or a humanoid platform, control theory is not optional background. It is the language in which the inner loops of the system are written, the mathematics in which safety arguments are framed, and the literature against which any new RL or learning-based method is benchmarked.