A convex set is a subset of a Euclidean space (or more generally, a real vector space) in which the line segment connecting any two points in the set lies entirely within the set. Convex sets are foundational objects in convex optimization, machine learning, and mathematical analysis. Their geometric simplicity leads to strong theoretical guarantees: when a convex function is minimized over a convex set, every local minimum is a global minimum, which makes such problems computationally tractable.
A set C in R^n is convex if, for every pair of points x and y in C and every scalar λ with 0 ≤ λ ≤ 1,
λx + (1 − λ)y ∈ C.
In other words, the weighted average of any two points in the set also belongs to the set. As λ runs from 0 to 1, the point λx + (1 − λ)y traces out the line segment from y (at λ = 0) to x (at λ = 1), and the definition requires every point along that segment to remain inside C.
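As a minimal numerical illustration of this definition, the sketch below samples points along the segment between two members of a set and checks each against a membership test. The names `segment_test` and `unit_disk` are illustrative, not standard; a failed check proves non-convexity, while a passing check is only sampling-based evidence.

```python
import numpy as np

def segment_test(indicator, x, y, num_samples=50):
    """Check that sampled points on the segment from x to y pass the
    membership test. Returns False as soon as one sampled point escapes."""
    for lam in np.linspace(0.0, 1.0, num_samples):
        if not indicator(lam * x + (1 - lam) * y):
            return False
    return True

# Example: the closed unit disk in R^2 is convex.
unit_disk = lambda p: np.linalg.norm(p) <= 1.0
x, y = np.array([0.9, 0.0]), np.array([0.0, 0.9])
print(segment_test(unit_disk, x, y))  # True: the segment stays inside
```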
This definition extends naturally to convex combinations of more than two points. A point of the form
θ₁x₁ + θ₂x₂ + ... + θ_kx_k, where θ₁ + θ₂ + ... + θ_k = 1 and each θ_i ≥ 0
is called a convex combination of x₁, ..., x_k. A set is convex if and only if it contains all convex combinations of its points.
Convex sets appear throughout mathematics and applied fields. The table below lists the most commonly encountered families.
| Convex Set | Definition | Notes |
|---|---|---|
| Hyperplane | {x ∈ R^n : aᵀx = b}, where a ≠ 0 | A flat surface of dimension n − 1; divides R^n into two half-spaces |
| Half-space | {x ∈ R^n : aᵀx ≤ b} | The region on one side of a hyperplane; always convex and closed |
| Euclidean ball | {x ∈ R^n : ‖x − x₀‖₂ ≤ r} | A sphere and its interior centered at x₀ with radius r |
| Ellipsoid | {x : (x − x₀)ᵀ P⁻¹ (x − x₀) ≤ 1}, P symmetric positive definite | A stretched or compressed ball; reduces to a ball when P = r²I |
| Polyhedron | {x : Ax ≤ b, Cx = d} | Intersection of finitely many half-spaces and hyperplanes; includes polytopes |
| Norm ball | {x : ‖x − x₀‖ ≤ r} for any norm ‖·‖ | Convex for every valid norm (L1, L2, L-infinity, etc.) |
| Positive semidefinite (PSD) cone | {X ∈ S^n : X ≽ 0} (all eigenvalues ≥ 0) | A convex cone in the space of symmetric matrices; central to semidefinite programming |
| Second-order cone | {(x, t) ∈ R^n × R : ‖x‖₂ ≤ t} | Also called the Lorentz cone or ice-cream cone; used in SOCP |
| Affine subspace | {x : Ax = b} | Solution set of a system of linear equations; a special case of a polyhedron |
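The membership conditions for several of these families reduce to one-line checks. The following sketch, with arbitrary example parameters, illustrates the half-space, Euclidean ball, PSD cone, and second-order cone tests from the table using NumPy:

```python
import numpy as np

a, b = np.array([1.0, -2.0]), 3.0      # half-space parameters (example values)
in_halfspace = lambda x: a @ x <= b    # {x : a^T x <= b}

x0, r = np.array([0.0, 0.0]), 2.0      # ball center and radius (example values)
in_ball = lambda x: np.linalg.norm(x - x0) <= r

def in_psd_cone(X, tol=1e-10):
    """Symmetric X is in the PSD cone iff all eigenvalues are >= 0."""
    return np.all(np.linalg.eigvalsh(X) >= -tol)

def in_second_order_cone(x, t):
    """(x, t) lies in the Lorentz cone iff ||x||_2 <= t."""
    return np.linalg.norm(x) <= t

print(in_halfspace(np.array([1.0, 1.0])))                # True: 1 - 2 <= 3
print(in_psd_cone(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True: eigenvalues 1, 3
print(in_second_order_cone(np.array([3.0, 4.0]), 5.0))   # True: ||x|| = 5 <= 5
```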
Several trivial but important cases deserve mention: the empty set, any single point, and the entire space R^n are all convex.
Recognizing non-convex sets is just as useful as knowing the convex ones. A set is non-convex whenever you can find two points inside it whose connecting line segment passes outside the set.
Common non-convex examples include:
- A crescent: the segment between its two tips leaves the set.
- An annulus (a disk with a hole): segments crossing the hole exit the set.
- The union of two disjoint convex sets, such as two separate balls.
- Discrete sets such as the integer lattice Z^n: the midpoint of two lattice points is generally not a lattice point.
The intersection of any collection of convex sets (finite or infinite) is itself convex. This property is used constantly in optimization: the feasible region of a problem with multiple convex constraints is the intersection of the individual constraint sets, and that intersection is guaranteed to be convex.
Note that the union of convex sets is generally not convex. For example, the union of two disjoint line segments is non-convex.
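A small numeric illustration of both facts, assuming sets are represented as membership predicates: intersecting convex constraints is a logical AND and stays convex, while the union of two disjoint balls fails the segment test at the midpoint of their centers.

```python
import numpy as np

# Sets represented as membership predicates (illustrative representation).
ball1 = lambda p: np.linalg.norm(p - np.array([-2.0, 0.0])) <= 1.0
ball2 = lambda p: np.linalg.norm(p - np.array([+2.0, 0.0])) <= 1.0

# Intersection of convex constraints (a feasible region) is a logical AND
# of the individual tests and is guaranteed to remain convex:
feasible = lambda p: ball1(p) and np.linalg.norm(p) <= 3.0

# Union is a logical OR and is generally NOT convex. The midpoint of the
# two ball centers lands in the gap between them:
union = lambda p: ball1(p) or ball2(p)
x, y = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
mid = 0.5 * x + 0.5 * y                  # the origin
print(union(x), union(y), union(mid))    # True True False -> non-convex
```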
If C is convex, then both its closure (adding boundary points) and its interior (removing boundary points) are also convex. This is useful in analysis when switching between open and closed formulations of a problem.
Convexity is preserved under affine mappings. If C is convex and f(x) = Ax + b is an affine function, then the image f(C) = {Ax + b : x ∈ C} is convex. Likewise, the preimage f⁻¹(C) = {x : Ax + b ∈ C} is convex. Projections, linear transformations, and translations therefore all preserve convexity.
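For preimages this composes directly: to test whether x lies in f⁻¹(C), apply the affine map and test Ax + b against C. A small sketch with arbitrary example values of A and b:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 0.5]])   # example affine map f(x) = Ax + b
b = np.array([1.0, -1.0])
in_unit_ball = lambda y: np.linalg.norm(y) <= 1.0

# Preimage of a convex set under an affine map: test f(x) against C.
in_preimage = lambda x: in_unit_ball(A @ x + b)

# The preimage passes the segment test (convexity is preserved):
x1, x2 = np.array([-0.5, 2.0]), np.array([-0.25, 2.0])
mids = [lam * x1 + (1 - lam) * x2 for lam in np.linspace(0, 1, 11)]
print(in_preimage(x1), in_preimage(x2), all(in_preimage(m) for m in mids))
# True True True
```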
The Minkowski sum of two sets S₁ and S₂ is defined as S₁ + S₂ = {x₁ + x₂ : x₁ ∈ S₁, x₂ ∈ S₂}. If both S₁ and S₂ are convex, their Minkowski sum is also convex.
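For convex polygons, the Minkowski sum can be computed as the convex hull of all pairwise vertex sums. A short sketch using SciPy's ConvexHull; the square and triangle here are arbitrary examples:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Vertices of two convex polygons (a unit square and a triangle).
S1 = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
S2 = np.array([[0, 0], [1, 0], [0, 1]], dtype=float)

# For convex polygons, the Minkowski sum is the convex hull of all
# pairwise vertex sums.
sums = np.array([p + q for p in S1 for q in S2])
hull = ConvexHull(sums)
print(sums[hull.vertices])  # vertices of S1 + S2 in counterclockwise order
```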
The convex hull of a set S, denoted conv(S), is the smallest convex set containing S. Equivalently, it is the set of all convex combinations of points in S:
conv(S) = {θ₁x₁ + ... + θ_kx_k : x_i ∈ S, θ_i ≥ 0, Σθ_i = 1}
It can also be described as the intersection of all convex sets that contain S.
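This characterization suggests a practical membership test: deciding whether a point p lies in conv(S) is a linear feasibility problem in the weights θ. A sketch using scipy.optimize.linprog; the helper name in_convex_hull is illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, p):
    """Decide whether p is a convex combination of the rows of `points`
    by checking feasibility of: theta >= 0, sum(theta) = 1,
    points^T theta = p. A feasibility LP with a zero objective."""
    n, d = points.shape
    A_eq = np.vstack([points.T, np.ones((1, n))])  # stack sum(theta*x_i)=p, sum(theta)=1
    b_eq = np.append(p, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(in_convex_hull(square, np.array([0.5, 0.5])))  # True: center of the square
print(in_convex_hull(square, np.array([1.5, 0.5])))  # False: outside
```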
Geometrically, the convex hull of a finite set of points in 2D is the shape formed by stretching a rubber band around the points and letting it snap tight. In computational geometry, efficient algorithms exist for computing convex hulls:
| Algorithm | Time Complexity | Description |
|---|---|---|
| Graham scan | O(n log n) | Sorts points by angle, then processes them in order |
| Jarvis march (gift wrapping) | O(nh) | Wraps around the point set; h = number of hull vertices |
| Divide and conquer | O(n log n) | Splits points, computes hulls recursively, then merges |
| Chan's algorithm | O(n log h) | Output-sensitive; optimal when h is small |
| Quickhull | O(n log n) expected; O(n²) worst case | Practical for higher-dimensional inputs |
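As a concrete reference implementation, here is Andrew's monotone chain, a common O(n log n) variant of Graham scan that sorts by coordinates rather than by angle:

```python
def convex_hull(points):
    """Andrew's monotone chain, an O(n log n) Graham-scan variant.
    Takes 2D points as (x, y) tuples; returns hull vertices in
    counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()   # drop points that create a clockwise turn
            chain.append(p)
        return chain

    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]  # omit endpoints repeated in both chains

print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 0.5)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)] -- interior points are discarded
```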
The convex hull operation is important in machine learning for tasks such as outlier detection, shape analysis, and feature space visualization.
The supporting hyperplane theorem states that for any convex set C and any point x₀ on the boundary of C, there exists a hyperplane that passes through x₀ and has the entire set C on one side. Formally, there exists a nonzero vector a such that
aᵀx ≤ aᵀx₀ for all x ∈ C
This hyperplane "supports" the set at x₀, touching it without cutting into it. The theorem guarantees that at every boundary point of a convex set you can rest a flat surface against the set; at a smooth boundary point this is the tangent hyperplane, while at a corner many supporting hyperplanes may exist. This result, originally due to Hermann Minkowski, is the geometric basis for duality theory in convex optimization.
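For a Euclidean ball, the supporting hyperplane at a boundary point x₀ is easy to write down: its normal is the outward radius direction a = x₀ − c. The sketch below samples points in an example ball and verifies the inequality numerically; the small tolerance guards against floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
c, r = np.array([1.0, 2.0]), 3.0              # example ball: center and radius
x0 = c + np.array([r, 0.0])                   # a point on the boundary
a = x0 - c                                    # outward normal at x0

# Sample points in the ball and verify a^T x <= a^T x0 for all of them.
samples = c + rng.uniform(-1, 1, size=(10_000, 2)) * r
samples = samples[np.linalg.norm(samples - c, axis=1) <= r]
print(np.all(samples @ a <= a @ x0 + 1e-9))   # True
```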
The separating hyperplane theorem addresses two disjoint convex sets. If A and B are nonempty convex subsets of R^n with A ∩ B = ∅, then there exists a hyperplane that separates them. That is, there exists a nonzero vector v and a scalar c such that
vᵀx ≥ c for all x ∈ A, and vᵀy ≤ c for all y ∈ B
When additional conditions hold (for instance, both sets are closed and at least one is compact), a strict separating hyperplane exists with a gap between the two sets.
This theorem has direct practical applications. In support vector machines (SVMs), the goal is to find the maximum-margin hyperplane separating two classes of data points. When the classes are linearly separable, their convex hulls are disjoint convex sets, and the separating hyperplane theorem guarantees that a separator exists.
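The classical perceptron algorithm makes this constructive: on linearly separable data it provably finds some separating hyperplane, though unlike an SVM it makes no attempt to maximize the margin. A minimal sketch on toy data:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Find some separating hyperplane w^T x + b = 0 for labels y in {-1, +1}.
    Converges whenever the classes are linearly separable (the situation the
    separating hyperplane theorem describes); it does not maximize the margin."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # xi is on the wrong side
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:
            return w, b
    raise RuntimeError("no separator found; data may not be separable")

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])
w, b = perceptron(X, y)
print(w, b)   # one valid separator (not necessarily maximum-margin)
```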
Convex sets are central to convex optimization, where the goal is to minimize a convex function over a convex feasible region. The key advantages come from the geometry of convex sets:
- Every local minimum is a global minimum, so algorithms cannot get trapped in suboptimal basins.
- The Euclidean projection onto a closed convex set exists and is unique, which underpins projection-based methods.
- Duality theory, built on supporting and separating hyperplanes, provides certificates of optimality.
- Efficient algorithms such as gradient and interior-point methods come with provable convergence guarantees.
Common convex feasible regions in optimization include polyhedra (in linear programs), ellipsoids (in robust optimization), the PSD cone (in semidefinite programs), and second-order cones (in SOCP).
Convex sets appear throughout machine learning in several roles:
- Constraint sets: norm balls serve as constraint regions corresponding to L1 and L2 regularization.
- Projected gradient methods repeatedly project iterates back onto a convex feasible set.
- SVMs search for a separating hyperplane between classes, as described above.
- Convex hulls support outlier detection, shape analysis, and feature space visualization.
Convex sets and convex functions are closely linked through two connections:
- A function f is convex if and only if its epigraph {(x, t) : f(x) ≤ t} is a convex set.
- Every sublevel set {x : f(x) ≤ α} of a convex function is convex, though the converse fails: non-convex quasiconvex functions also have convex sublevel sets.
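The sublevel-set connection is easy to probe numerically with the same segment test used earlier, here on the convex function f(x) = ‖x‖₂² with an example threshold α = 4:

```python
import numpy as np

# Sublevel set {x : f(x) <= alpha} of the convex function f(x) = ||x||_2^2.
f = lambda x: float(x @ x)
alpha = 4.0
in_sublevel = lambda x: f(x) <= alpha

# Segment test on two points of the sublevel set:
x, y = np.array([2.0, 0.0]), np.array([0.0, -2.0])
mids = [lam * x + (1 - lam) * y for lam in np.linspace(0, 1, 21)]
print(all(in_sublevel(m) for m in mids))   # True: the sublevel set is convex
```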
Imagine you have a blob of play-dough on a table. Pick any two spots inside the play-dough and stretch a piece of string between them. If the string always stays inside the play-dough no matter which two spots you pick, then the play-dough shape is a convex set. A circle, a square, and a triangle all work this way.
Now imagine a crescent moon shape. If you pick one tip of the crescent and the other tip, the string between them goes through empty air outside the moon. That means the crescent is not a convex set.
In machine learning, convex sets are helpful because they make finding the best answer to a problem much easier. If the "search space" where you are looking for an answer is shaped like a nice convex blob, you will never get tricked into thinking you found the best answer when a better one is hiding somewhere else. That is why many ML algorithms are designed so that their search spaces are convex.