Control theory

Control theory is the mathematical and engineering discipline concerned with designing and analysing systems that achieve desired behaviour through measurement and feedback. It studies how to manipulate the inputs (controls) to a dynamical system in order to make its outputs follow reference trajectories or stabilise around equilibria, in the presence of uncertainty and disturbances. The discipline is foundational for robotics, aerospace, the process industries, biology, economics, and increasingly for machine learning. Modern fields such as reinforcement learning, autonomous driving, and humanoid manipulation all build on a control-theoretic substrate.

The canonical modern reference is Karl Johan Åström and Richard M. Murray's Feedback Systems: An Introduction for Scientists and Engineers (Princeton University Press, 2nd ed. 2021), whose first edition (2008) won the 2011 IFAC Harold Chestnut Control Engineering Textbook Prize; the book is freely available online. Other standard texts include Katsuhiko Ogata's Modern Control Engineering (5th ed., Pearson 2010) and Hassan K. Khalil's Nonlinear Systems (3rd ed., Prentice Hall 2002).

Why control theory matters for AI

For a long time, control engineering and AI evolved on parallel tracks: control theorists assumed access to a model of the plant, while machine-learning researchers assumed access to data. The two fields have converged. Benjamin Recht's 2019 survey A Tour of Reinforcement Learning: The View from Continuous Control (Annual Review of Control, Robotics, and Autonomous Systems; arXiv:1806.09460) makes the case that reinforcement learning is best understood as data-driven optimal control, and that many open RL questions are restatements of long-studied control problems.

Several concrete reasons make control theory essential for modern AI:

Reinforcement learning is, in many formulations, data-driven control. The Bellman equation that underlies value-function methods comes directly from Richard Bellman's dynamic programming (set out in his 1957 book), which was originally developed for optimal control.
Robotics tightly intertwines control with perception and learning. Every quadruped, manipulator, and humanoid contains low-level control loops underneath any higher-level learned policy.
Safety-critical AI systems often need control-theoretic guarantees such as Lyapunov stability, robust performance, or input-to-state stability that cannot be established by sample-based learning alone.
Optimal control techniques such as the linear-quadratic regulator (LQR), the Hamilton-Jacobi-Bellman equation, and Pontryagin's maximum principle directly motivated the design of policy-gradient and actor-critic algorithms.
Autonomous driving, drones, and humanoids embed control inside their stacks: even when a reinforcement learning policy chooses a high-level action, the low-level joint torques or wheel torques are still computed by a controller.

Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction (2nd ed., MIT Press 2018) acknowledges explicitly in its opening chapter that optimal control and dynamic programming form one of the three threads from which RL emerged, alongside trial-and-error learning and temporal-difference learning.

A brief history

Control theory has a history that long predates electronic computers, let alone AI. Mechanical feedback regulators were used in antiquity (water clocks, float regulators), but the modern engineering discipline took shape in the late nineteenth century and matured in the twentieth.

Year	Event
ca. 250 BCE	Ktesibios of Alexandria designs a float-regulated water clock, an early example of mechanical feedback.
1788	James Watt and Matthew Boulton apply the centrifugal (flyball) governor to steam engines, regulating speed via feedback.
1868	James Clerk Maxwell publishes "On Governors" in the Proceedings of the Royal Society, the first formal stability analysis of a feedback regulator.
1892	Aleksandr Lyapunov's doctoral dissertation establishes the general theory of stability of motion that bears his name.
1922	Nicholas Minorsky publishes the first analysis of three-term (PID) control for ship steering.
1932	Harry Nyquist at Bell Labs publishes "Regeneration Theory", giving the Nyquist stability criterion for feedback amplifiers.
1942	John G. Ziegler and Nathaniel B. Nichols publish PID tuning rules at Taylor Instrument Companies.
1945	Hendrik W. Bode publishes Network Analysis and Feedback Amplifier Design, systematising frequency-response design with Bode plots, gain margin, and phase margin.
1948	Norbert Wiener publishes Cybernetics: Or Control and Communication in the Animal and the Machine (MIT Press), tying control to information theory and biology.
1957	Richard Bellman publishes Dynamic Programming (Princeton), introducing the principle of optimality.
1960	Rudolf E. Kalman publishes "A New Approach to Linear Filtering and Prediction Problems" in the ASME Journal of Basic Engineering, introducing the Kalman filter. The same year he formalises state-space optimal control (LQR) in Boletín de la Sociedad Matemática Mexicana.
1962	Lev Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko publish The Mathematical Theory of Optimal Processes (Interscience), formalising the maximum principle.
1969	Apollo 11 lunar module uses an onboard Kalman filter for descent navigation.
1980s	Robust control develops: H-infinity, structured singular value (mu-synthesis), Quantitative Feedback Theory. Adaptive control matures (Åström and Wittenmark).
1989	John Doyle, Keith Glover, Pramod Khargonekar, and Bruce Francis publish "State-space solutions to standard H-2 and H-infinity control problems" in IEEE Transactions on Automatic Control.
1990s	Nonlinear control consolidates: feedback linearisation, sliding-mode (Utkin), backstepping (Krstic, Kanellakopoulos, Kokotovic).
2000	David Q. Mayne, James B. Rawlings, Christopher V. Rao, and Pierre O. M. Scokaert publish the survey "Constrained model predictive control: Stability and optimality" in Automatica.
2005	Stanford's Stanley wins the DARPA Grand Challenge, demonstrating autonomous driving with classical estimation and control fused with machine learning.
2010s	Intersection with machine learning intensifies: data-driven control, learning-based model predictive control, differentiable MPC, model-free and model-based RL applied to robotics.
2015	SpaceX lands the Falcon 9 first stage using convex-optimisation guidance, an industrial demonstration of real-time optimal control.
2020s	Whole-body MPC running at hundreds of hertz becomes standard on quadrupeds and humanoids (Boston Dynamics Atlas, ANYmal, Unitree). Foundation models start to appear inside control stacks.

Norbert Wiener's revival of Maxwell's 1868 paper in Cybernetics (1948) helped cement Maxwell's status as the founder of modern stability analysis. Before Wiener pointed to it, the paper had been largely forgotten because contemporaries found it hard to read.

Foundational concepts

Most of the vocabulary used in control papers and robotics codebases comes from a small set of core ideas. The table below collects the most important ones; full treatments are in Åström and Murray (2021) and Ogata (2010).

Concept	Meaning
Plant	The physical system being controlled (motor, aircraft, chemical reactor, robot arm).
Controller	The algorithm that computes the input to the plant from measurements.
Sensor	Device that produces a measurement of the plant state or output.
Actuator	Device that applies the control input (motor, valve, thruster).
Open-loop control	Control input is computed without using any measurement of the output. Sensitive to disturbances and model error.
Closed-loop (feedback) control	Control input is a function of measured output, allowing rejection of disturbances and tolerance to model error.
Feedforward	Control input is computed from a known reference or measured disturbance, often combined with feedback.
Reference tracking	Goal of making the output follow a desired trajectory.
Disturbance rejection	Goal of keeping the output near a setpoint despite external disturbances.
LTI system	Linear, time-invariant system; the most studied class, where superposition holds and frequency-domain methods apply.
Transfer function	Ratio of the Laplace-transformed output to the Laplace-transformed input (with zero initial conditions), describing an LTI system in the frequency domain.
State-space representation	Description of a system as x_dot = f(x, u), y = h(x, u), generalising to nonlinear and multi-input multi-output systems.
Laplace transform	Continuous-time integral transform used to convert linear ODEs into algebraic equations in the complex variable s.
z-transform	Discrete-time analogue of the Laplace transform, used for digital control.
Frequency response	The steady-state response of an LTI system to sinusoidal inputs, visualised via Bode and Nyquist plots.
BIBO stability	Bounded-input bounded-output stability of LTI systems; equivalent to all transfer-function poles being in the open left half-plane (continuous time).
Lyapunov stability	A solution is Lyapunov-stable if trajectories starting near it stay near it; asymptotic stability additionally requires that they converge to it. Typically established by constructing a Lyapunov function (Lyapunov's direct method).
Controllability	A state-space system (A, B) is controllable if its state can be driven from any initial value to any final value in finite time. Kalman's rank condition checks rank[B AB ... A^(n-1)B] = n for LTI systems.
Observability	A system (A, C) is observable if the initial state can be reconstructed from input and output histories. Dual to controllability.
Stabilisability and detectability	Weaker conditions than controllability and observability that suffice for stabilising controllers and stable observers.
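
The Kalman rank conditions in the table can be checked mechanically. The sketch below is a minimal pure-Python version, assuming small matrices stored as lists of rows; the helper names (rank, is_controllable, is_observable) are illustrative, not a library API. It uses exact rational arithmetic so that round-off cannot distort the rank; for real, ill-conditioned systems a numerical library and a tolerance-based rank test would be used instead.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rank(M):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def controllability_matrix(A, B):
    """Place [B, AB, ..., A^(n-1)B] side by side as an n x (n*m) matrix."""
    n = len(A)
    blocks, AkB = [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = matmul(A, AkB)
    return [[entry for blk in blocks for entry in blk[i]] for i in range(n)]


def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)


def is_observable(A, C):
    """Duality: (A, C) is observable iff (A^T, C^T) is controllable."""
    At = [list(col) for col in zip(*A)]
    Ct = [list(col) for col in zip(*C)]
    return is_controllable(At, Ct)


# Double integrator: state = (position, velocity), input = force.
A = [[0, 1], [0, 0]]
print(is_controllable(A, [[0], [1]]))  # → True  (force enters the velocity equation)
print(is_observable(A, [[1, 0]]))      # → True  (position is measured)
print(is_observable(A, [[0, 1]]))      # → False (velocity alone cannot recover position)
```

The last case shows why observability matters in practice: integrating a velocity measurement recovers position only up to an unknown constant.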

Major control paradigms

The field has accumulated many design paradigms, each suited to particular structure and assumptions. The table below summarises the main families.

Paradigm	Key idea	Typical use
PID (Proportional-Integral-Derivative)	Output is a weighted sum of error, its integral, and its derivative. Three tunable gains.	The workhorse of industrial control. Surveys consistently report that around 90 to 97 percent of industrial control loops use PID, with a Honeywell survey in 2000 estimating 97 percent.
State-feedback / pole placement	Design u = -Kx so that the closed-loop matrix A - BK has desired eigenvalues.	Multivariable LTI design when full state is measured or estimated.
Linear-quadratic regulator (LQR)	Optimal state feedback for linear dynamics and quadratic cost; gain K obtained by solving an algebraic Riccati equation. Introduced by Kalman (1960b).	Aerospace, helicopter and inverted-pendulum control, baseline for RL benchmarks.
Linear-quadratic-Gaussian (LQG)	LQR combined with a Kalman filter; satisfies a separation principle.	Standard for stochastic LTI systems with Gaussian noise.
Pontryagin's maximum principle	Necessary conditions for optimal control via a Hamiltonian; introduced by Pontryagin et al. (1962).	Trajectory optimisation, aerospace guidance.
Hamilton-Jacobi-Bellman (HJB)	Sufficient conditions and value-function PDE for continuous-time optimal control.	Theoretical foundation for dynamic programming and RL.
Robust control (H-infinity, mu-synthesis)	Minimise worst-case sensitivity to bounded disturbances and model uncertainty. State-space H-infinity solution by Doyle, Glover, Khargonekar, Francis (1989).	Flight control, hard-disk read-write heads, applications with structured uncertainty.
Adaptive control	Controller parameters are adjusted online based on observed plant behaviour.	Plants whose parameters are unknown or vary over time, e.g. aircraft whose mass changes as fuel burns, or actuator damage such as tail-rotor failures.
Nonlinear control	Techniques such as feedback linearisation, sliding-mode (Utkin 1977), backstepping (Krstic et al. 1995).	Robotic manipulators, electric drives, automotive subsystems.
Model predictive control (MPC)	At each step, solve a finite-horizon optimal control problem and apply only the first input; repeat. Mayne et al. (2000) provide the canonical stability survey.	Refineries since the 1980s, autonomous driving, quadrupeds, humanoids, rocket landings.
Sliding-mode control	Force the system state onto a designed sliding surface and keep it there using high-frequency switching.	Power converters, electric drives, automotive ABS.
Fuzzy control	Use linguistic if-then rules and fuzzy-set inference instead of explicit dynamics.	Consumer appliances, expert-elicited rule bases.
Reinforcement learning control	Learn a policy directly from interaction, model-free or model-based.	Game-playing, dexterous manipulation, locomotion, recently combined with classical control as warm starts and safety filters.
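
The PID row can be made concrete with a short discrete-time implementation. The sketch below is one common textbook variant rather than a reference design: it assumes a fixed sample time, takes the derivative of the measurement instead of the error (so a setpoint step does not produce a "derivative kick"), and uses conditional integration as a simple anti-windup scheme. The class name, the first-order plant, and the gains are hypothetical values chosen for illustration.

```python
class PID:
    """Discrete-time PID controller in positional form with fixed sample time dt."""

    def __init__(self, kp, ki, kd, dt, u_min=float("-inf"), u_max=float("inf")):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0   # running integral of the error
        self.prev_y = None    # previous measurement, for the derivative term

    def update(self, setpoint, y):
        error = setpoint - y
        # Derivative on measurement: -kd * dy/dt instead of kd * de/dt.
        dy = 0.0 if self.prev_y is None else (y - self.prev_y) / self.dt
        self.prev_y = y
        candidate = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate - self.kd * dy
        # Conditional integration: commit the new integral only when the
        # output is not saturated, so the integrator cannot wind up.
        if self.u_min <= u <= self.u_max:
            self.integral = candidate
        return min(max(u, self.u_min), self.u_max)


def simulate_first_order(pid, setpoint=1.0, tau=1.0, dt=0.01, steps=2000):
    """Forward-Euler simulation of tau * dy/dt = -y + u under PID control."""
    y, history = 0.0, []
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += dt * (-y + u) / tau
        history.append(y)
    return history


ys = simulate_first_order(PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01))
print(round(ys[-1], 3))
```

The integral term is what removes the steady-state error here: with proportional action alone, this unity-gain plant would settle at kp / (1 + kp) of the setpoint.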

PID dominates for a reason: it is robust, easy to tune, requires no plant model, and works well on near-linear loops. The more sophisticated paradigms become attractive when constraints, multivariable coupling, or strong nonlinearity break PID's assumptions.
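
The receding-horizon idea from the MPC row, and the way it handles input constraints, can be seen in a deliberately tiny example. The sketch assumes a hypothetical scalar unstable plant x[k+1] = a x[k] + b u[k] with |u| <= u_max and a quadratic cost, and it solves each finite-horizon problem by brute-force enumeration over a grid of admissible inputs. That solver is a stand-in chosen only so the code has no dependencies; practical MPC uses a QP or nonlinear-programming solver, and the function names, horizon, and weights here are illustrative.

```python
import itertools


def mpc_first_input(x0, a, b, horizon=3, u_max=1.0, r=0.1, levels=11):
    """Minimise sum(x[k]^2 + r*u[k]^2) over the horizon; return only u[0].

    The constraint |u| <= u_max is enforced by enumerating inputs on a grid
    inside [-u_max, u_max]; every candidate sequence is simulated exactly.
    """
    grid = [-u_max + 2.0 * u_max * i / (levels - 1) for i in range(levels)]
    best_cost, best_u0 = float("inf"), 0.0
    for seq in itertools.product(grid, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = a * x + b * u
            cost += x * x + r * u * u
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0


def run_mpc(x0=3.0, a=1.2, b=1.0, steps=20, **mpc_kwargs):
    """Receding horizon: re-solve from the current state, apply the first input, repeat."""
    x, xs, us = x0, [x0], []
    for _ in range(steps):
        u = mpc_first_input(x, a, b, **mpc_kwargs)
        x = a * x + b * u
        xs.append(x)
        us.append(u)
    return xs, us


xs, us = run_mpc()
print([round(v, 3) for v in xs[:6]])
```

With a = 1.2 and u_max = 1, the state can only be brought back from |x| < u_max / (a - 1) = 5; outside that set no admissible input sequence, from MPC or any other controller, can stop the state from growing.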

Estimation and state observers

Most real systems do not give direct access to their internal state. Estimation theory bridges this gap by reconstructing state from noisy partial measurements. This is also the basis for sensor fusion.

The Kalman filter, introduced by Rudolf E. Kalman in 1960, is the minimum-mean-square-error estimator for linear state-space systems with Gaussian noise, and remains the best linear estimator when the noise is white but non-Gaussian. It is recursive, updating its estimate as each new measurement arrives, and propagates a covariance matrix that quantifies uncertainty. The filter has been used continuously in aerospace since NASA's Stanley F. Schmidt adapted it for the Apollo program; every Apollo lunar mission, including the Apollo 11 descent in 1969, used a Kalman filter to fuse inertial measurement unit, optical, and ground-tracking data.
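
The predict-update recursion described above fits in a few lines for a scalar state. The sketch assumes a model x[k+1] = f x[k] + w[k], z[k] = h x[k] + v[k] with known noise variances q and r, and treats (x0, p0) as the prior before the first measurement; the function name is illustrative. Practical filters work with matrices, and numerically careful implementations often use the Joseph form or square-root forms for the covariance update.

```python
def kalman_1d(measurements, x0, p0, f=1.0, h=1.0, q=0.0, r=1.0):
    """Scalar Kalman filter; returns (estimate, variance) after each measurement."""
    x, p, history = x0, p0, []
    for z in measurements:
        # Predict: push the estimate and its variance through the dynamics.
        x = f * x
        p = f * p * f + q
        # Update: the gain weighs the prediction against the new measurement.
        k = p * h / (h * p * h + r)
        x = x + k * (z - h * x)
        p = (1.0 - k * h) * p
        history.append((x, p))
    return history


# Estimating a constant (f = 1, q = 0) from four readings of 5.0, prior N(0, 1), r = 1.
for x_hat, var in kalman_1d([5.0, 5.0, 5.0, 5.0], x0=0.0, p0=1.0):
    print(round(x_hat, 4), round(var, 4))
```

With q = 0 the filter reduces to a precision-weighted average: after k readings the estimate is 5k / (k + 1) and the variance 1 / (k + 1), so the loop prints estimates 2.5, 3.3333, 3.75, 4.0 with variances 0.5, 0.3333, 0.25, 0.2.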

For nonlinear systems, the extended Kalman filter (EKF) linearises the dynamics and measurement equations around the current estimate. It is widely used despite well-known divergence problems on highly nonlinear systems. The unscented Kalman filter (UKF), introduced by Simon Julier and Jeffrey Uhlmann in 1997, propagates a deterministic set of sigma points through the nonlinear functions and reconstructs mean and covariance, often outperforming the EKF without computing Jacobians. Particle filters use a Monte-Carlo set of weighted samples and apply to general non-Gaussian distributions; they are standard in robotics for problems such as global localisation. Moving-horizon estimation is the dual of model predictive control on the estimation side, solving a constrained optimisation over a sliding window of past measurements.
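
To illustrate the particle-filter idea, the sketch below is a minimal bootstrap (sampling-importance-resampling) filter with systematic resampling. It assumes a one-dimensional random-walk state observed with Gaussian noise; for that linear-Gaussian model a Kalman filter would be exact, so the model is chosen only to keep the mechanics visible. The noise levels, particle count, seeds, and function names are hypothetical.

```python
import math
import random


def bootstrap_pf(measurements, n=500, q_std=0.1, r_std=0.5, seed=0):
    """Bootstrap particle filter for x[k+1] = x[k] + w, z[k] = x[k] + v."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]  # prior N(0, 1)
    estimates = []
    for z in measurements:
        # 1. Propagate every particle through the stochastic dynamics.
        particles = [p + rng.gauss(0.0, q_std) for p in particles]
        # 2. Weight by the Gaussian likelihood p(z | x), computed in log space
        #    and shifted by the maximum so the weights cannot all underflow.
        log_w = [-0.5 * ((z - p) / r_std) ** 2 for p in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * p for wi, p in zip(w, particles)))
        # 3. Systematic resampling: one random offset, n evenly spaced pointers.
        u, cum, i, resampled = rng.random() / n, w[0], 0, []
        for _ in range(n):
            while u > cum and i < n - 1:
                i += 1
                cum += w[i]
            resampled.append(particles[i])
            u += 1.0 / n
        particles = resampled
    return estimates


def simulate_random_walk(steps=50, q_std=0.1, r_std=0.5, seed=1):
    """Generate a true trajectory and noisy measurements of it."""
    rng = random.Random(seed)
    x, xs, zs = 0.0, [], []
    for _ in range(steps):
        x += rng.gauss(0.0, q_std)
        xs.append(x)
        zs.append(x + rng.gauss(0.0, r_std))
    return xs, zs


def mean_abs_error(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


xs, zs = simulate_random_walk()
est = bootstrap_pf(zs)
print("raw measurements:", mean_abs_error(zs, xs), "filter:", mean_abs_error(est, xs))
```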

In modern robot systems, the line between estimation and learning blurs. EKFs sit inside SLAM systems such as MonoSLAM, and FastSLAM pairs a particle filter over robot poses with per-landmark EKFs; particle filters power Monte-Carlo localisation in mobile robots; learned visual feature detectors feed measurements into Kalman or factor-graph back ends.

Applications in robotics and AI

Control algorithms run, often invisibly, inside almost every autonomous system. The table below collects representative deployments.

Domain	Typical control approach	Notes
Industrial process control	PID, MPC	Refineries, chemical plants, pulp and paper. MPC has been deployed on thousands of refinery columns since the 1980s (Aspen DMCplus, Honeywell Profit Controller).
HVAC and building automation	PID, supervisory MPC	Setpoint regulation for temperature, humidity, and CO2.
Power converters and grids	PID, sliding-mode, MPC	Inverters, DC-DC converters, frequency regulation.
Automotive subsystems	PID, sliding-mode, gain-scheduled control	Engine control, anti-lock braking (ABS), electronic stability programme (ESP), cruise control.
Aircraft autopilots	PID, gain-scheduled, robust H-infinity	Boeing 777, Airbus A320 family, and drone autopilot stacks such as PX4 and ArduPilot.
Spacecraft attitude and navigation	LQG, Kalman filter, sliding-mode	Apollo lunar module (Kalman filter), Mars rovers, ISS attitude control.
Rocket landing	Convex-optimisation guidance, MPC	SpaceX Falcon 9 first-stage powered descent uses real-time convex programming.
Industrial robot arms	PID for joint loops, computed-torque, MPC for trajectory	KUKA, ABB, FANUC; ROS Control framework.
Surgical robots	Cartesian impedance control, motion-scaling control	Intuitive Surgical's da Vinci system.
Drone flight control	Cascaded PID, geometric control on SO(3), MPC	PX4 and ArduPilot use cascaded PID by default; research stacks use geometric or MPC.
Autonomous driving	MPC for trajectory tracking, LQR for lateral control, EKF or UKF for state estimation	Stanford's Stanley (2005 DARPA Grand Challenge winner) used classical control plus learned terrain classifiers. Modern systems from Waymo, Cruise, and Tesla embed MPC for motion planning and trajectory tracking.
Quadrupeds	Whole-body MPC, convex MPC, RL policies	Boston Dynamics Spot, ANYbotics ANYmal, MIT Cheetah, Unitree Go and B series.
Humanoids	Whole-body MPC, model-based optimisation	Boston Dynamics Atlas uses a model-predictive controller for whole-body motion; Tesla Optimus and Figure use similar internal control loops.
Manipulation	Operational-space control, impedance control, RL	Inner loops of dexterous-manipulation research stacks.
SLAM	EKF, UKF, particle filters, factor graphs	EKF-SLAM, MonoSLAM, FastSLAM use estimation-theory machinery for joint pose and map estimation.

A particularly clean industrial example is the Falcon 9 booster landing. The vehicle solves a convex-optimisation problem in real time during powered descent, computing thrust profiles that satisfy fuel, attitude, and glide-slope constraints. The mathematics is descended directly from optimal-control theory developed in the 1960s.

Connection with reinforcement learning

The relationship between control and reinforcement learning is often summarised as: RL is control without a known model. Both fields care about choosing actions to optimise a long-horizon objective in a dynamical environment. Both rely on the Bellman equation. The differences are in what is assumed and how solutions are computed.

Classical optimal control assumes a known dynamics model f and cost function g, then solves either Pontryagin's necessary conditions (open-loop trajectory optimisation) or the Hamilton-Jacobi-Bellman equation (closed-loop value function). Reinforcement learning assumes that f and possibly g are unknown, but that the agent can interact with the environment and observe rewards. Value-function methods such as Q-learning and SARSA derive directly from the discrete-time Bellman equation; policy-gradient methods perform stochastic gradient ascent on the expected return; actor-critic methods combine the two.
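
When the model is known, the discrete-time Bellman equation can be solved directly by dynamic programming. The sketch below runs value iteration on a hypothetical five-state corridor: each move earns reward -1, the last state is an absorbing goal with reward 0, and the discount factor is 0.9. Q-learning would estimate the same action values from sampled transitions instead of calling the known step model.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-12):
    """Solve V(s) = max_a [r(s, a) + gamma * V(s')] for a deterministic corridor."""
    goal = n_states - 1
    moves = {"left": -1, "right": +1}

    def step(s, a):
        """Known model: next state and reward. The goal is absorbing."""
        if s == goal:
            return s, 0.0
        return min(max(s + moves[a], 0), goal), -1.0

    V = [0.0] * n_states

    def q_value(s, a):
        s_next, reward = step(s, a)
        return reward + gamma * V[s_next]

    while True:
        # One Bellman optimality backup applied to every state.
        new_V = [max(q_value(s, a) for a in moves) for s in range(n_states)]
        delta = max(abs(v_new - v_old) for v_new, v_old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(moves, key=lambda a: q_value(s, a)) for s in range(n_states)]
    return V, policy


V, policy = value_iteration()
print([round(v, 3) for v in V])  # → [-3.439, -2.71, -1.9, -1.0, 0.0]
print(policy[:-1])               # → ['right', 'right', 'right', 'right']
```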

Recht's 2019 survey is the cleanest bridge between the two literatures. It analyses LQR with unknown dynamics as the simplest non-trivial RL benchmark, derives sample-complexity bounds, and shows experimentally that model-based methods often dominate model-free methods on continuous-control problems where good models can be learned. Sutton and Barto's 2018 textbook approaches the connection from the other side: trial-and-error learning existed before dynamic programming, and the synthesis of the two traditions defines modern RL.
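
The LQR-with-unknown-dynamics setting can be illustrated with the simplest model-based recipe: fit the dynamics by least squares from exploratory rollouts, then compute the LQR gain for the fitted model as if it were exact (certainty equivalence). The sketch assumes a scalar plant x[k+1] = a x[k] + b u[k] + w[k] with hypothetical true values a = 1.1 and b = 0.5, and solves the scalar discrete-time algebraic Riccati equation by fixed-point iteration. It shows the mechanics only and does not reproduce the survey's experiments or bounds.

```python
import random


def collect_rollouts(a, b, n_rollouts=50, horizon=10, noise_std=0.1, seed=0):
    """Short rollouts from x = 0 with random inputs; returns (x, u, x_next) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_rollouts):
        x = 0.0
        for _ in range(horizon):
            u = rng.gauss(0.0, 1.0)  # exploratory input
            x_next = a * x + b * u + rng.gauss(0.0, noise_std)
            data.append((x, u, x_next))
            x = x_next
    return data


def fit_ab(data):
    """Least-squares fit of x_next ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def dlqr_scalar(a, b, q=1.0, r=1.0, iters=10_000, tol=1e-12):
    """Iterate P <- q + a^2 P - (a b P)^2 / (r + b^2 P); return (K, P) for u = -K x."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    return a * b * p / (r + b * b * p), p


a_true, b_true = 1.1, 0.5  # hypothetical plant, unknown to the designer
a_hat, b_hat = fit_ab(collect_rollouts(a_true, b_true))
K, _ = dlqr_scalar(a_hat, b_hat)
print(f"a_hat={a_hat:.3f} b_hat={b_hat:.3f} K={K:.3f} pole={a_true - b_true * K:.3f}")
```

With these settings the fitted parameters land close to the true ones and the resulting closed-loop pole a_true - b_true K should sit well inside the unit circle; with less data or poorer excitation the certainty-equivalent gain can fail to stabilise the real plant.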

Model-based RL (such as PILCO, Dyna, and Dreamer) is closely related to MPC with learned dynamics: it fits a world model from data and uses simulated rollouts in that model either to plan actions or to improve a policy. Differentiable MPC, introduced by Brandon Amos and colleagues in 2018, places the MPC solver inside the computation graph and differentiates through it, so that cost and dynamics parameters can be trained by gradient descent. Safe RL builds barrier-function or robust-MPC layers around learned policies to enforce constraints, with guarantees that hold under the modelling assumptions of those layers.
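
The barrier-function idea can be shown on the simplest possible model. The sketch assumes a single integrator dx/dt = u that must keep x <= x_max, uses the barrier h(x) = x_max - x, and enforces the control-barrier-function condition dh/dt >= -alpha h. For this model the usual minimum-deviation quadratic program has the closed-form solution u = min(u_nominal, alpha (x_max - x)). The constant "aggressive" policy stands in for a learned policy; all names and numbers are hypothetical, and the guarantee holds only insofar as the model is correct.

```python
def cbf_filter(u_nominal, x, x_max, alpha):
    """Input closest to u_nominal that satisfies the CBF condition."""
    # dh/dt >= -alpha*h with h = x_max - x and dh/dt = -u reduces to
    # u <= alpha * (x_max - x); clipping is the solution of the QP.
    return min(u_nominal, alpha * (x_max - x))


def simulate(policy, x0=0.0, x_max=1.0, alpha=5.0, dt=0.01, steps=500):
    """Forward-Euler rollout with the safety filter between policy and plant."""
    x, xs = x0, [x0]
    for _ in range(steps):
        u = cbf_filter(policy(x), x, x_max, alpha)
        x += dt * u
        xs.append(x)
    return xs


def aggressive_policy(x):
    return 2.0  # always pushes toward the constraint


xs = simulate(aggressive_policy)
print(max(xs) <= 1.0)  # → True
```

With forward Euler the update gives x_max - x[k+1] >= (1 - alpha dt)(x_max - x[k]), so the constraint also holds in discrete time as long as alpha dt <= 1; the state approaches the boundary geometrically instead of crossing it.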

Software

Control-system design is supported by mature toolchains in both academia and industry.

Tool	Use
MATLAB Control System Toolbox	Industry standard for transfer-function and state-space design, root locus, Bode and Nyquist plots, LQR, Kalman filter design.
Simulink	Block-diagram simulation and code generation; widely used in automotive and aerospace.
Python `python-control` package	Open-source MATLAB-style control library.
`scipy.signal` and `scipy.linalg`	LTI building blocks (state-space, transfer functions, Riccati solvers).
CasADi	Symbolic framework for nonlinear optimisation and optimal control.
ACADO and acados	Real-time MPC and nonlinear optimal control.
do-mpc	Python framework for MPC and moving-horizon estimation.
IPOPT	Interior-point nonlinear programming solver, often used as the inner solver for trajectory optimisation.
ROS Control framework	Standard hardware-abstraction and controller-manager layer in the Robot Operating System.
Drake (Toyota Research Institute)	Robotics-oriented modelling, simulation, optimisation, and control library written in C++ with Python bindings.
Mathematica	Symbolic control design and analysis.

Recent trends

The last decade has been defined by the convergence of classical control with machine learning. Several lines of work stand out:

Learning-based MPC, where parts of the dynamics, cost, or terminal set are learned from data while preserving the receding-horizon structure that gives MPC its safety properties.
Differentiable MPC and differentiable simulators, which let controllers be trained end-to-end with backpropagation.
Safe RL with control-theoretic guarantees, using control-barrier functions, robust MPC safety filters, or Lyapunov-based shielding.
Neural ODEs and other continuous-time learned models, in which neural networks parameterise the right-hand side of an ODE and are trained against trajectory data for use in estimation and control.
Meta-learning for adaptive control, learning controllers that adjust to new dynamics from a few seconds of interaction.
Foundation models for control: early research is exploring whether large pretrained models can be used as policies or world models for embodied agents.
Verification and certification of learned controllers, applying formal-methods tools (HJ reachability, sum-of-squares, abstract interpretation) to RL policies.

What unifies these trends is the realisation that control and learning are not competing solutions to the same problem; they are complementary tools. Modern systems blend them at every layer of the stack.

Notable demonstrations

A few demonstrations have shaped how researchers and the public understand the field:

The Apollo program's lunar-module guidance and navigation, which combined an inertial platform, a Kalman filter, and explicit-guidance algorithms compact enough to run on the Apollo Guidance Computer, a machine of about 32 kilograms (70 pounds) with 2,048 words of erasable memory.
Stanford's Stanley winning the 2005 DARPA Grand Challenge by completing a 132-mile desert course in 6 hours 53 minutes 58 seconds, fusing classical control and estimation with machine-learned terrain classifiers.
Boston Dynamics' Atlas performing parkour, backflips, and dynamic manipulation, all driven by a model-predictive controller with a learned add-on layer in newer versions.
SpaceX's first successful Falcon 9 first-stage landing in December 2015, repeated routinely since, demonstrating real-time convex-optimisation guidance at industrial scale.
DeepMind's data-centre cooling controllers, which used model-based optimisation against a learned model to cut Google data-centre cooling energy by around 40 percent.

Connection to other engineering disciplines

Control theory does not stand alone. It draws on and feeds into several adjacent fields. Signal processing supplies filtering, sampling, and frequency-domain tools. Systems engineering supplies the methodology for integrating controllers into larger products. Optimisation supplies the algorithms (interior-point methods, sequential quadratic programming, convex programming) that power MPC and trajectory optimisation. Dynamical systems theory supplies the mathematics of stability, bifurcation, and chaos. Mechatronics fuses control with mechanical and electrical design for actuators and sensors. Information theory and statistics supply the foundations of estimation and Kalman filtering. The boundaries between these fields are porous, and modern researchers in robotics or autonomous systems generally need fluency in several of them.

For a modern engineer building autonomous driving software, an autonomous mobile robot, or a humanoid platform, control theory is not optional background. It is the language in which the inner loops of the system are written, the mathematics in which safety arguments are framed, and the literature against which any new RL or learning-based method is benchmarked.
