Convex Set
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v4 · 4,935 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v4 · 4,935 words
Add missing citations, update stale details, or suggest a clearer explanation.
A convex set is a subset of a Euclidean space (or more generally, a vector space) in which the line segment connecting any two points in the set lies entirely within the set. Convex sets are foundational objects in convex optimization, machine learning, and mathematical analysis. Their geometric simplicity leads to strong theoretical guarantees: optimization problems defined over convex sets avoid the pitfalls of local minima, making them computationally tractable.
The study of convex sets sits at the crossroads of geometry, linear algebra, and analysis. Modern treatments trace back to the work of Hermann Minkowski in the late 19th century, the foundational text Convex Analysis by R. Tyrrell Rockafellar (Princeton, 1970), and the widely used graduate textbook Convex Optimization by Stephen Boyd and Lieven Vandenberghe (Cambridge, 2004). These references continue to shape how convex sets are taught and applied across mathematics, statistics, control theory, and machine learning.
A set C in R^n is convex if, for every pair of points x and y in C and every scalar λ with 0 ≤ λ ≤ 1, the point
λx + (1 − λ)y ∈ C
holds. In other words, the weighted average of any two points in the set also belongs to the set. The scalar λ traces out the line segment from y (at λ = 0) to x (at λ = 1), and the definition requires every point along that segment to remain inside C.
This definition extends naturally to convex combinations of more than two points. A point of the form
θ₁x₁ + θ₂x₂ + ... + θ_kx_k, where θ₁ + θ₂ + ... + θ_k = 1 and each θ_i ≥ 0
is called a convex combination of x₁, ..., x_k. A set is convex if and only if it contains all convex combinations of its points. By induction on k, a two-point definition is enough to imply the general k-point version, which is one of the reasons the original line-segment definition is so compact.
The definition is purely algebraic: it depends only on addition and scalar multiplication, not on a metric or inner product. As a result, the concept of convexity carries over to any real or rational vector space and underpins much of convex analysis in infinite-dimensional settings such as Hilbert spaces and locally convex topological vector spaces.
Convex sets appear throughout mathematics and applied fields. The table below lists the most commonly encountered families.
| Convex Set | Definition | Notes |
|---|---|---|
| Hyperplane | {x ∈ R^n : aᵀx = b}, where a ≠ 0 | A flat surface of dimension n − 1; divides R^n into two half-spaces |
| Half-space | {x ∈ R^n : aᵀx ≤ b} | The region on one side of a hyperplane; always convex and closed |
| Euclidean ball | {x ∈ R^n : ‖x − x₀‖₂ ≤ r} | A sphere and its interior centered at x₀ with radius r |
| Ellipsoid | {x : (x − x₀)ᵀ P⁻¹ (x − x₀) ≤ 1}, P symmetric positive definite | A stretched or compressed ball; reduces to a ball when P = r²I |
| Polyhedron | {x : Ax ≤ b, Cx = d} | Intersection of finitely many half-spaces and hyperplanes; includes polytopes |
| Polytope | Bounded polyhedron, equivalently the convex hull of a finite point set | Studied extensively in linear programming and combinatorics |
| Norm ball | {x : ‖x − x₀‖ ≤ r} for any norm ‖·‖ | Convex for every valid norm (L1, L2, L-infinity, etc.) |
| Probability simplex | {x ∈ R^n : x_i ≥ 0, Σx_i = 1} | The set of all discrete probability distributions on n outcomes |
| Standard simplex | conv({0, e₁, ..., e_n}) | Convex hull of the origin and standard basis vectors |
| Positive semidefinite (PSD) cone | {X ∈ S^n : X ≽ 0} (all eigenvalues ≥ 0) | A convex cone in the space of symmetric matrices; central to semidefinite programming |
| Second-order cone | {(x, t) ∈ R^n × R : ‖x‖₂ ≤ t} | Also called the Lorentz cone or ice-cream cone; used in SOCP |
| Nonnegative orthant | {x ∈ R^n : x_i ≥ 0 for all i} | The natural feasible region for nonnegativity constraints |
| Affine subspace | {x : Ax = b} | Solution set of a system of linear equations; a special case of a polyhedron |
Several trivial but important cases deserve mention: the empty set, any single point, and the entire space R^n are all convex. Lines, line segments, rays, and any subspace of R^n are also convex.
Recognizing non-convex sets is just as useful as knowing the convex ones. A set is non-convex whenever you can find two points inside it whose connecting line segment passes outside the set.
Common non-convex examples include:
The intersection of any collection of convex sets (finite or infinite) is itself convex. This property is used constantly in optimization: the feasible region of a problem with multiple convex constraints is the intersection of the individual constraint sets, and that intersection is guaranteed to be convex.
Note that the union of convex sets is generally not convex. For example, the union of two disjoint line segments is non-convex. A useful exception is when one set is contained in another: if A ⊆ B and both are convex, then A ∪ B = B is convex.
If C is convex, then both its closure (adding boundary points) and its interior (removing boundary points) are also convex. This is useful in analysis when switching between open and closed formulations of a problem. The relative interior of a convex set, defined as the interior taken inside the affine hull of the set, is also convex and plays a central role in convex analysis when the set has no full-dimensional interior in R^n.
Convexity is preserved under affine mappings. If C is convex and f(x) = Ax + b is an affine function, then the image f(C) = {Ax + b : x ∈ C} is convex. Likewise, the preimage f⁻¹(C) = {x : Ax + b ∈ C} is convex. Projections, linear transformations, and translations therefore all preserve convexity.
The Minkowski sum of two sets S₁ and S₂ is defined as S₁ + S₂ = {x₁ + x₂ : x₁ ∈ S₁, x₂ ∈ S₂}. If both S₁ and S₂ are convex, their Minkowski sum is also convex. This operation is used to model uncertainty (where the second set represents a perturbation), to compute reachable sets in control theory, and to define convolution structures in geometry.
The perspective function P: R^(n+1) → R^n defined by P(z, t) = z / t with domain t > 0 preserves convexity in both directions. As a consequence, linear-fractional functions of the form f(x) = (Ax + b) / (cᵀx + d) on the half-space {x : cᵀx + d > 0} also preserve convexity. Boyd and Vandenberghe devote a section of their textbook to these maps because they appear in many engineering applications.
A central practical question in convex analysis is: which operations on sets keep them convex? The table below summarizes the most useful operations and the conditions under which they preserve convexity.
| Operation | Preserves Convexity? | Notes |
|---|---|---|
| Intersection | Yes, always | Works for any collection, finite or infinite |
| Union | No, in general | Exception: nested or chained collections |
| Affine image | Yes | f(x) = Ax + b maps convex to convex |
| Affine preimage | Yes | The preimage of a convex set under an affine map is convex |
| Cartesian product | Yes | C₁ × C₂ is convex if both C₁ and C₂ are convex |
| Minkowski sum | Yes | The pointwise sum of convex sets stays convex |
| Scalar multiplication | Yes | αC = {αx : x ∈ C} is convex for any α ∈ R |
| Closure | Yes | The closure of a convex set is convex |
| Interior | Yes | The interior of a convex set is convex |
| Perspective map | Yes | P(z, t) = z / t with t > 0 |
| Linear-fractional map | Yes | (Ax + b) / (cᵀx + d) on its domain |
| Convex hull | Yes (always produces a convex set) | The smallest convex set containing the input |
The convex hull of a set S, denoted conv(S), is the smallest convex set containing S. Equivalently, it is the set of all convex combinations of points in S:
conv(S) = {θ₁x₁ + ... + θ_kx_k : x_i ∈ S, θ_i ≥ 0, Σθ_i = 1}
It can also be described as the intersection of all convex sets that contain S.
Geometrically, the convex hull of a finite set of points in 2D is the shape formed by stretching a rubber band around the points and letting it snap tight. In computational geometry, efficient algorithms exist for computing convex hulls:
| Algorithm | Time Complexity | Description |
|---|---|---|
| Graham scan | O(n log n) | Sorts points by angle, then processes them in order |
| Jarvis march (gift wrapping) | O(nh) | Wraps around the point set; h = number of hull vertices |
| Divide and conquer | O(n log n) | Splits points, computes hulls recursively, then merges |
| Chan's algorithm | O(n log h) | Output-sensitive; optimal when h is small |
| Quickhull | O(n log n) average | Practical for higher-dimensional inputs |
| Kirkpatrick-Seidel | O(n log h) | The first ultimate convex hull algorithm in 2D |
The convex hull operation is important in machine learning for tasks such as outlier detection, shape analysis, and feature space visualization. In computational geometry, the convex hull is used in collision detection, path planning, and geographic information systems.
Caratheodory's theorem sharpens the convex hull construction. It states that any point in the convex hull of a set S ⊆ R^n can be written as a convex combination of at most n + 1 points of S. Formally, if x ∈ conv(S), then there exist points x₁, ..., x_(n+1) ∈ S and weights θ_i ≥ 0 with Σθ_i = 1 such that x = Σ θ_i x_i.
The bound n + 1 is tight: the simplex with n + 1 affinely independent vertices in R^n requires all n + 1 points to express its barycenter. Caratheodory's theorem is the basic dimensional result of convex geometry. It is what makes the simplex method for linear programming work efficiently in finite dimensions, since vertex solutions can always be described with at most n + 1 active points.
A point x ∈ C is an extreme point of a convex set C if it cannot be written as a non-trivial convex combination of two distinct points of C. In other words, if x = λy + (1 − λ)z with y, z ∈ C and 0 < λ < 1, then y = z = x. Geometrically, extreme points are the corners of the set.
The Krein-Milman theorem (1940) states that any compact convex set in a Hausdorff locally convex topological vector space equals the closed convex hull of its extreme points. In finite dimensions, this means a compact convex set is fully determined by its corners. For polytopes, the extreme points are exactly the vertices, and Krein-Milman recovers the familiar fact that a polytope is the convex hull of its vertices.
The Krein-Milman theorem has direct optimization consequences: a continuous linear function attains its maximum and minimum over a compact convex set at extreme points. This is the geometric reason why the simplex method searches only vertices when solving a linear programming problem.
The supporting hyperplane theorem states that for any convex set C and any point x₀ on the boundary of C, there exists a hyperplane that passes through x₀ and has the entire set C on one side. Formally, there exists a nonzero vector a such that
aᵀx ≤ aᵀx₀ for all x ∈ C
This hyperplane supports the set at x₀, touching it without cutting into it. The theorem guarantees that at every boundary point of a convex set, you can place a flat surface tangent to the set. This result, originally due to Hermann Minkowski, is the geometric basis for duality theory in convex optimization. Lagrange multipliers and KKT conditions can be derived as algebraic shadows of supporting hyperplanes at the optimum.
The separating hyperplane theorem addresses two disjoint convex sets. If A and B are nonempty convex subsets of R^n with A ∩ B = ∅, then there exists a hyperplane that separates them. That is, there exists a nonzero vector v and a scalar c such that
vᵀx ≥ c for all x ∈ A, and vᵀy ≤ c for all y ∈ B
When additional conditions hold (for instance, both sets are closed and at least one is compact), a strict separating hyperplane exists with a positive gap between the two sets.
This theorem has direct practical applications. In support vector machines (SVMs), the goal is to find the maximum-margin separating hyperplane between two classes of data points, each forming a convex set of training examples. The separating hyperplane theorem guarantees that such a separator exists when the classes are linearly separable. When the classes overlap, soft-margin SVMs use a relaxed formulation that still leans on the convex geometry of the underlying feasible set.
A family of classical results in convex geometry connects intersection patterns and combinatorial structure.
Helly's theorem (1913). Let X₁, X₂, ..., X_k be a finite collection of convex sets in R^n with k > n. If every n + 1 of these sets have a common point, then all k sets share a common point. The result also extends to infinite collections of compact convex sets. Helly's theorem connects a local property (every n + 1 sets intersect) to a global property (all sets intersect), which is rare in geometry. It plays a key role in convex feasibility analysis: when checking whether finitely many constraints have a feasible solution, it suffices to check intersection patterns on small subsets.
Radon's theorem (1921). Any n + 2 points in R^n can be partitioned into two sets whose convex hulls intersect. The intersection point is called a Radon point. For example, in the plane (n = 2), any four points can either be split into a triangle that contains the fourth point, or into two pairs whose connecting line segments cross. Radon's theorem is the standard ingredient in one of the cleanest proofs of Helly's theorem.
Tverberg's theorem (1966). A generalization of Radon's theorem due to Helge Tverberg states that any (n + 1)(r − 1) + 1 points in R^n can be partitioned into r subsets whose convex hulls all share a common point. This recovers Radon's theorem when r = 2.
These theorems are central to combinatorial geometry. They also appear in computational learning theory: Radon's theorem can be used to compute the VC dimension of linear classifiers in R^n, which is exactly n + 1.
A convex cone is a set K in a vector space such that for every x, y ∈ K and every α, β ≥ 0, the point αx + βy belongs to K. Equivalently, K is a convex cone if it is closed under nonnegative linear combinations. Convex cones are convex sets, but they have additional scaling structure that makes them useful in optimization and duality theory.
| Cone | Definition | Role |
|---|---|---|
| Nonnegative orthant R^n_+ | {x ∈ R^n : x_i ≥ 0} | Feasible region for nonnegativity constraints in linear programming |
| Second-order (Lorentz) cone | {(x, t) : ‖x‖₂ ≤ t} | Feasible region for second-order cone programs (SOCP) |
| Positive semidefinite cone S^n_+ | {X ∈ S^n : X ≽ 0} | Feasible region for semidefinite programming |
| Polyhedral cone | {x : Ax ≤ 0} | Solution set of homogeneous linear inequalities |
| Tangent cone T_C(x) | Closure of all directions feasible from x in C | Used in optimality conditions |
| Normal cone N_C(x) | {v : vᵀ(y − x) ≤ 0 for all y ∈ C} | Dual object for variational inequalities |
The nonnegative orthant, the second-order cone, and the PSD cone are all self-dual under their canonical inner products, which makes the optimization problems built on them especially clean.
For a cone K ⊆ R^n, the dual cone is defined as K* = {y ∈ R^n : yᵀx ≥ 0 for all x ∈ K}. The dual cone K* is always closed and convex, even when K is not. If K is a closed convex cone, then K** = K, recovering the original cone by double duality. Dual cones are at the heart of conic duality theory in convex optimization, where the dual of a conic program is itself a conic program over the dual cone.
Given a closed convex set C and a point x ∈ C, the normal cone to C at x is defined as N_C(x) = {v : vᵀ(y − x) ≤ 0 for all y ∈ C}. The normal cone collects all directions that point away from C at x. Normal cones characterize first-order optimality conditions: a point x* minimizes a convex function f over a convex set C if and only if 0 ∈ ∂f(x*) + N_C(x*), where ∂f is the subdifferential of f.
Convex sets are central to convex optimization, where the goal is to minimize a convex function over a convex feasible region. The key advantages come from the geometry of convex sets:
Common convex feasible regions in optimization include polyhedra (in linear programs), ellipsoids (in robust optimization), the PSD cone (in semidefinite programs), and second-order cones (in SOCP).
Conic optimization is a unifying framework that minimizes a linear objective over the intersection of an affine subspace and a convex cone. It includes linear programming (with the nonnegative orthant), second-order cone programming, and semidefinite programming as special cases. Conic optimization problems can be solved efficiently with interior-point methods, and they form the practical core of many modern solvers such as MOSEK, ECOS, and SCS. Boyd's CVX modeling system, widely used in research and teaching, builds problems by combining convex sets and operations that preserve convexity.
Given a closed convex set C and a point z ∈ R^n, the projection of z onto C is the unique point P_C(z) ∈ C closest to z in Euclidean distance. The projection map P_C is well defined, single valued, and non-expansive, meaning ‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖ for all x, y.
Projection is the key building block of projected gradient descent: at each step, the algorithm takes a gradient step and then projects the iterate back into the feasible set. When projection onto C has a closed-form solution (as for boxes, balls, or the simplex), projected gradient methods are extremely efficient.
The projections onto convex sets (POCS) algorithm, also called the alternating projection method, finds a point in the intersection of two or more closed convex sets by repeatedly projecting onto each set in turn. For two convex sets with non-empty intersection, classical results show that POCS converges linearly to a point in the intersection. POCS underlies many practical algorithms for image reconstruction, signal recovery, and constrained learning. Variants such as Dykstra's algorithm modify the iterations so that the limit is the projection of the starting point onto the intersection rather than an arbitrary feasible point.
Convex sets appear throughout machine learning in several roles:
Deep neural networks built from ReLU activations partition input space into a finite collection of polytopes on which the network acts as an affine map. This polytope structure is itself a convex-set object: each linear region is a convex polyhedron, and the global behavior of the network is determined by how these convex pieces fit together. Researchers in robustness and verification use convex relaxations of ReLU outputs, such as triangle relaxations and zonotope abstractions, to certify that a network's prediction is stable on a convex input region. Recent results from 2024 to 2026 have characterized when ReLU networks are themselves convex, mapped out the limits of multi-neuron convex relaxations, and shown that two-layer ReLU networks admit polynomial-time convex training formulations under certain conditions.
In reinforcement learning, the set of policies for a fixed Markov decision process can often be described as a convex set, and the set of state-action visitation distributions for a finite MDP forms a polytope. Linear programming formulations of MDPs exploit this polytope structure, and dual variables in these LPs correspond to value functions. The convex geometry of the visitation polytope underpins occupancy-measure methods, dual reinforcement learning, and convex regularization of policies.
Convex sets and convex functions are closely linked through three connections:
These links explain why textbooks like Boyd and Vandenberghe spend an entire chapter on convex sets before introducing convex functions. The geometry of the set comes first; the inequality structure of the function follows.
The systematic study of convex sets began with Hermann Minkowski in the 1890s. His work on the geometry of numbers led to convex bodies, supporting hyperplanes, and Minkowski sums. Constantin Caratheodory's 1907 paper introduced what is now known as Caratheodory's theorem on convex hulls. Eduard Helly proved his intersection theorem in 1913, with proofs published over the next decade by Radon and others. Werner Fenchel's lectures at Princeton in 1951 laid out duality between convex sets and convex functions in a form that influenced Rockafellar's 1970 textbook Convex Analysis. Modern convex optimization, as taught in courses like Stanford's CVX-101 led by Stephen Boyd, brings these classical results into a computational framework grounded in numerical algorithms and software.
Imagine you have a blob of play-dough on a table. Pick any two spots inside the play-dough and stretch a piece of string between them. If the string always stays inside the play-dough no matter which two spots you pick, then the play-dough shape is a convex set. A circle, a square, and a triangle all work this way.
Now imagine a crescent moon shape. If you pick one tip of the crescent and the other tip, the string between them goes through empty air outside the moon. That means the crescent is not a convex set.
In machine learning, convex sets are helpful because they make finding the best answer to a problem much easier. If the search space where you are looking for an answer is shaped like a nice convex blob, you will never get tricked into thinking you found the best answer when a better one is hiding somewhere else. That is why many ML algorithms are designed so that their search spaces are convex.