Convex Set

A convex set is a subset of a Euclidean space (or more generally, a vector space) in which the line segment connecting any two points in the set lies entirely within the set. Convex sets are foundational objects in convex optimization, machine learning, and mathematical analysis. Their geometric simplicity leads to strong theoretical guarantees: optimization problems defined over convex sets avoid the pitfalls of local minima, making them computationally tractable.

The study of convex sets sits at the crossroads of geometry, linear algebra, and analysis. Modern treatments trace back to the work of Hermann Minkowski in the late 19th century, the foundational text Convex Analysis by R. Tyrrell Rockafellar (Princeton, 1970), and the widely used graduate textbook Convex Optimization by Stephen Boyd and Lieven Vandenberghe (Cambridge, 2004). These references continue to shape how convex sets are taught and applied across mathematics, statistics, control theory, and machine learning.

mathematical definition

A set C in R^n is convex if, for every pair of points x and y in C and every scalar λ with 0 ≤ λ ≤ 1, the point

λx + (1 − λ)y ∈ C

holds. In other words, the weighted average of any two points in the set also belongs to the set. The scalar λ traces out the line segment from y (at λ = 0) to x (at λ = 1), and the definition requires every point along that segment to remain inside C.

This definition extends naturally to convex combinations of more than two points. A point of the form

θ₁x₁ + θ₂x₂ + ... + θ_kx_k, where θ₁ + θ₂ + ... + θ_k = 1 and each θ_i ≥ 0

is called a convex combination of x₁, ..., x_k. A set is convex if and only if it contains all convex combinations of its points. By induction on k, a two-point definition is enough to imply the general k-point version, which is one of the reasons the original line-segment definition is so compact.

The definition is purely algebraic: it depends only on addition and scalar multiplication, not on a metric or inner product. As a result, the concept of convexity carries over to any real or rational vector space and underpins much of convex analysis in infinite-dimensional settings such as Hilbert spaces and locally convex topological vector spaces.

examples of convex sets

Convex sets appear throughout mathematics and applied fields. The table below lists the most commonly encountered families.

Convex Set	Definition	Notes
Hyperplane	{x ∈ R^n : aᵀx = b}, where a ≠ 0	A flat surface of dimension n − 1; divides R^n into two half-spaces
Half-space	{x ∈ R^n : aᵀx ≤ b}	The region on one side of a hyperplane; always convex and closed
Euclidean ball	{x ∈ R^n : ‖x − x₀‖₂ ≤ r}	A sphere and its interior centered at x₀ with radius r
Ellipsoid	{x : (x − x₀)ᵀ P⁻¹ (x − x₀) ≤ 1}, P symmetric positive definite	A stretched or compressed ball; reduces to a ball when P = r²I
Polyhedron	{x : Ax ≤ b, Cx = d}	Intersection of finitely many half-spaces and hyperplanes; includes polytopes
Polytope	Bounded polyhedron, equivalently the convex hull of a finite point set	Studied extensively in linear programming and combinatorics
Norm ball	{x : ‖x − x₀‖ ≤ r} for any norm ‖·‖	Convex for every valid norm (L1, L2, L-infinity, etc.)
Probability simplex	{x ∈ R^n : x_i ≥ 0, Σx_i = 1}	The set of all discrete probability distributions on n outcomes
Standard simplex	conv({0, e₁, ..., e_n})	Convex hull of the origin and standard basis vectors
Positive semidefinite (PSD) cone	{X ∈ S^n : X ≽ 0} (all eigenvalues ≥ 0)	A convex cone in the space of symmetric matrices; central to semidefinite programming
Second-order cone	{(x, t) ∈ R^n × R : ‖x‖₂ ≤ t}	Also called the Lorentz cone or ice-cream cone; used in SOCP
Nonnegative orthant	{x ∈ R^n : x_i ≥ 0 for all i}	The natural feasible region for nonnegativity constraints
Affine subspace	{x : Ax = b}	Solution set of a system of linear equations; a special case of a polyhedron

Several trivial but important cases deserve mention: the empty set, any single point, and the entire space R^n are all convex. Lines, line segments, rays, and any subspace of R^n are also convex.

non-convex examples

Recognizing non-convex sets is just as useful as knowing the convex ones. A set is non-convex whenever you can find two points inside it whose connecting line segment passes outside the set.

Common non-convex examples include:

Annulus (ring shape): the region between two concentric circles. Two points on opposite sides of the inner hole can be connected by a segment that passes through the hole.
Star shape (non-convex polygon): a five-pointed star has dents where the line segment between two tips passes outside the star.
Union of disjoint intervals: on the real line, the set [0, 1] ∪ [2, 3] is non-convex because the segment between 0.5 and 2.5 includes points not in the set.
Discrete point sets: any finite set of two or more isolated points is non-convex.
Crescent (moon shape): the difference of two overlapping disks creates a concavity where segments can escape the set.
Punctured disk: a closed ball with a single interior point removed is not convex, because any chord through the missing point exits the set.
Graph of a non-convex function (epigraph counterexample): the region above the graph of sin(x) on the real line is non-convex.

properties of convex sets

intersection

The intersection of any collection of convex sets (finite or infinite) is itself convex. This property is used constantly in optimization: the feasible region of a problem with multiple convex constraints is the intersection of the individual constraint sets, and that intersection is guaranteed to be convex.

Note that the union of convex sets is generally not convex. For example, the union of two disjoint line segments is non-convex. A useful exception is when one set is contained in another: if A ⊆ B and both are convex, then A ∪ B = B is convex.

closure and interior

If C is convex, then both its closure (adding boundary points) and its interior (removing boundary points) are also convex. This is useful in analysis when switching between open and closed formulations of a problem. The relative interior of a convex set, defined as the interior taken inside the affine hull of the set, is also convex and plays a central role in convex analysis when the set has no full-dimensional interior in R^n.

affine transformations

Convexity is preserved under affine mappings. If C is convex and f(x) = Ax + b is an affine function, then the image f(C) = {Ax + b : x ∈ C} is convex. Likewise, the preimage f⁻¹(C) = {x : Ax + b ∈ C} is convex. Projections, linear transformations, and translations therefore all preserve convexity.

Minkowski sum

The Minkowski sum of two sets S₁ and S₂ is defined as S₁ + S₂ = {x₁ + x₂ : x₁ ∈ S₁, x₂ ∈ S₂}. If both S₁ and S₂ are convex, their Minkowski sum is also convex. This operation is used to model uncertainty (where the second set represents a perturbation), to compute reachable sets in control theory, and to define convolution structures in geometry.

perspective and linear-fractional maps

The perspective function P: R^(n+1) → R^n defined by P(z, t) = z / t with domain t > 0 preserves convexity in both directions. As a consequence, linear-fractional functions of the form f(x) = (Ax + b) / (cᵀx + d) on the half-space {x : cᵀx + d > 0} also preserve convexity. Boyd and Vandenberghe devote a section of their textbook to these maps because they appear in many engineering applications.

operations preserving convexity

A central practical question in convex analysis is: which operations on sets keep them convex? The table below summarizes the most useful operations and the conditions under which they preserve convexity.

Operation	Preserves Convexity?	Notes
Intersection	Yes, always	Works for any collection, finite or infinite
Union	No, in general	Exception: nested or chained collections
Affine image	Yes	f(x) = Ax + b maps convex to convex
Affine preimage	Yes	The preimage of a convex set under an affine map is convex
Cartesian product	Yes	C₁ × C₂ is convex if both C₁ and C₂ are convex
Minkowski sum	Yes	The pointwise sum of convex sets stays convex
Scalar multiplication	Yes	αC = {αx : x ∈ C} is convex for any α ∈ R
Closure	Yes	The closure of a convex set is convex
Interior	Yes	The interior of a convex set is convex
Perspective map	Yes	P(z, t) = z / t with t > 0
Linear-fractional map	Yes	(Ax + b) / (cᵀx + d) on its domain
Convex hull	Yes (always produces a convex set)	The smallest convex set containing the input

convex hull

The convex hull of a set S, denoted conv(S), is the smallest convex set containing S. Equivalently, it is the set of all convex combinations of points in S:

conv(S) = {θ₁x₁ + ... + θ_kx_k : x_i ∈ S, θ_i ≥ 0, Σθ_i = 1}

It can also be described as the intersection of all convex sets that contain S.

Geometrically, the convex hull of a finite set of points in 2D is the shape formed by stretching a rubber band around the points and letting it snap tight. In computational geometry, efficient algorithms exist for computing convex hulls:

Algorithm	Time Complexity	Description
Graham scan	O(n log n)	Sorts points by angle, then processes them in order
Jarvis march (gift wrapping)	O(nh)	Wraps around the point set; h = number of hull vertices
Divide and conquer	O(n log n)	Splits points, computes hulls recursively, then merges
Chan's algorithm	O(n log h)	Output-sensitive; optimal when h is small
Quickhull	O(n log n) average	Practical for higher-dimensional inputs
Kirkpatrick-Seidel	O(n log h)	The first ultimate convex hull algorithm in 2D

The convex hull operation is important in machine learning for tasks such as outlier detection, shape analysis, and feature space visualization. In computational geometry, the convex hull is used in collision detection, path planning, and geographic information systems.

Caratheodory's theorem

Caratheodory's theorem sharpens the convex hull construction. It states that any point in the convex hull of a set S ⊆ R^n can be written as a convex combination of at most n + 1 points of S. Formally, if x ∈ conv(S), then there exist points x₁, ..., x_(n+1) ∈ S and weights θ_i ≥ 0 with Σθ_i = 1 such that x = Σ θ_i x_i.

The bound n + 1 is tight: the simplex with n + 1 affinely independent vertices in R^n requires all n + 1 points to express its barycenter. Caratheodory's theorem is the basic dimensional result of convex geometry. It is what makes the simplex method for linear programming work efficiently in finite dimensions, since vertex solutions can always be described with at most n + 1 active points.

extreme points and the Krein-Milman theorem

A point x ∈ C is an extreme point of a convex set C if it cannot be written as a non-trivial convex combination of two distinct points of C. In other words, if x = λy + (1 − λ)z with y, z ∈ C and 0 < λ < 1, then y = z = x. Geometrically, extreme points are the corners of the set.

The Krein-Milman theorem (1940) states that any compact convex set in a Hausdorff locally convex topological vector space equals the closed convex hull of its extreme points. In finite dimensions, this means a compact convex set is fully determined by its corners. For polytopes, the extreme points are exactly the vertices, and Krein-Milman recovers the familiar fact that a polytope is the convex hull of its vertices.

The Krein-Milman theorem has direct optimization consequences: a continuous linear function attains its maximum and minimum over a compact convex set at extreme points. This is the geometric reason why the simplex method searches only vertices when solving a linear programming problem.

supporting hyperplane theorem

The supporting hyperplane theorem states that for any convex set C and any point x₀ on the boundary of C, there exists a hyperplane that passes through x₀ and has the entire set C on one side. Formally, there exists a nonzero vector a such that

aᵀx ≤ aᵀx₀ for all x ∈ C

This hyperplane supports the set at x₀, touching it without cutting into it. The theorem guarantees that at every boundary point of a convex set, you can place a flat surface tangent to the set. This result, originally due to Hermann Minkowski, is the geometric basis for duality theory in convex optimization. Lagrange multipliers and KKT conditions can be derived as algebraic shadows of supporting hyperplanes at the optimum.

separating hyperplane theorem

The separating hyperplane theorem addresses two disjoint convex sets. If A and B are nonempty convex subsets of R^n with A ∩ B = ∅, then there exists a hyperplane that separates them. That is, there exists a nonzero vector v and a scalar c such that

vᵀx ≥ c for all x ∈ A, and vᵀy ≤ c for all y ∈ B

When additional conditions hold (for instance, both sets are closed and at least one is compact), a strict separating hyperplane exists with a positive gap between the two sets.

This theorem has direct practical applications. In support vector machines (SVMs), the goal is to find the maximum-margin separating hyperplane between two classes of data points, each forming a convex set of training examples. The separating hyperplane theorem guarantees that such a separator exists when the classes are linearly separable. When the classes overlap, soft-margin SVMs use a relaxed formulation that still leans on the convex geometry of the underlying feasible set.

Helly, Radon, and Tverberg theorems

A family of classical results in convex geometry connects intersection patterns and combinatorial structure.

Helly's theorem (1913). Let X₁, X₂, ..., X_k be a finite collection of convex sets in R^n with k > n. If every n + 1 of these sets have a common point, then all k sets share a common point. The result also extends to infinite collections of compact convex sets. Helly's theorem connects a local property (every n + 1 sets intersect) to a global property (all sets intersect), which is rare in geometry. It plays a key role in convex feasibility analysis: when checking whether finitely many constraints have a feasible solution, it suffices to check intersection patterns on small subsets.

Radon's theorem (1921). Any n + 2 points in R^n can be partitioned into two sets whose convex hulls intersect. The intersection point is called a Radon point. For example, in the plane (n = 2), any four points can either be split into a triangle that contains the fourth point, or into two pairs whose connecting line segments cross. Radon's theorem is the standard ingredient in one of the cleanest proofs of Helly's theorem.

Tverberg's theorem (1966). A generalization of Radon's theorem due to Helge Tverberg states that any (n + 1)(r − 1) + 1 points in R^n can be partitioned into r subsets whose convex hulls all share a common point. This recovers Radon's theorem when r = 2.

These theorems are central to combinatorial geometry. They also appear in computational learning theory: Radon's theorem can be used to compute the VC dimension of linear classifiers in R^n, which is exactly n + 1.

convex cones

A convex cone is a set K in a vector space such that for every x, y ∈ K and every α, β ≥ 0, the point αx + βy belongs to K. Equivalently, K is a convex cone if it is closed under nonnegative linear combinations. Convex cones are convex sets, but they have additional scaling structure that makes them useful in optimization and duality theory.

important named cones

Cone	Definition	Role
Nonnegative orthant R^n_+	{x ∈ R^n : x_i ≥ 0}	Feasible region for nonnegativity constraints in linear programming
Second-order (Lorentz) cone	{(x, t) : ‖x‖₂ ≤ t}	Feasible region for second-order cone programs (SOCP)
Positive semidefinite cone S^n_+	{X ∈ S^n : X ≽ 0}	Feasible region for semidefinite programming
Polyhedral cone	{x : Ax ≤ 0}	Solution set of homogeneous linear inequalities
Tangent cone T_C(x)	Closure of all directions feasible from x in C	Used in optimality conditions
Normal cone N_C(x)	{v : vᵀ(y − x) ≤ 0 for all y ∈ C}	Dual object for variational inequalities

The nonnegative orthant, the second-order cone, and the PSD cone are all self-dual under their canonical inner products, which makes the optimization problems built on them especially clean.

dual cone

For a cone K ⊆ R^n, the dual cone is defined as K* = {y ∈ R^n : yᵀx ≥ 0 for all x ∈ K}. The dual cone K* is always closed and convex, even when K is not. If K is a closed convex cone, then K** = K, recovering the original cone by double duality. Dual cones are at the heart of conic duality theory in convex optimization, where the dual of a conic program is itself a conic program over the dual cone.

normal cone

Given a closed convex set C and a point x ∈ C, the normal cone to C at x is defined as N_C(x) = {v : vᵀ(y − x) ≤ 0 for all y ∈ C}. The normal cone collects all directions that point away from C at x. Normal cones characterize first-order optimality conditions: a point x* minimizes a convex function f over a convex set C if and only if 0 ∈ ∂f(x*) + N_C(x*), where ∂f is the subdifferential of f.

convex sets in optimization

Convex sets are central to convex optimization, where the goal is to minimize a convex function over a convex feasible region. The key advantages come from the geometry of convex sets:

No local minima traps. in a convex optimization problem, every local minimum is a global minimum. This eliminates the need for global search heuristics.
Feasible region structure. since the feasible region is an intersection of convex constraint sets, it is itself convex. Algorithms such as interior-point methods and projected gradient descent can exploit this structure to find solutions efficiently.
Strong duality. under mild conditions (such as Slater's constraint qualification), convex optimization problems satisfy strong duality, meaning the primal and dual optimal values coincide. The supporting hyperplane theorem provides the geometric intuition behind this duality.
Polynomial-time solvability. many classes of convex optimization problems, including linear programming, quadratic programming, and semidefinite programming, can be solved in polynomial time.

Common convex feasible regions in optimization include polyhedra (in linear programs), ellipsoids (in robust optimization), the PSD cone (in semidefinite programs), and second-order cones (in SOCP).

conic optimization

Conic optimization is a unifying framework that minimizes a linear objective over the intersection of an affine subspace and a convex cone. It includes linear programming (with the nonnegative orthant), second-order cone programming, and semidefinite programming as special cases. Conic optimization problems can be solved efficiently with interior-point methods, and they form the practical core of many modern solvers such as MOSEK, ECOS, and SCS. Boyd's CVX modeling system, widely used in research and teaching, builds problems by combining convex sets and operations that preserve convexity.

projection onto a convex set

Given a closed convex set C and a point z ∈ R^n, the projection of z onto C is the unique point P_C(z) ∈ C closest to z in Euclidean distance. The projection map P_C is well defined, single valued, and non-expansive, meaning ‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖ for all x, y.

Projection is the key building block of projected gradient descent: at each step, the algorithm takes a gradient step and then projects the iterate back into the feasible set. When projection onto C has a closed-form solution (as for boxes, balls, or the simplex), projected gradient methods are extremely efficient.

projections onto convex sets (POCS)

The projections onto convex sets (POCS) algorithm, also called the alternating projection method, finds a point in the intersection of two or more closed convex sets by repeatedly projecting onto each set in turn. For two convex sets with non-empty intersection, classical results show that POCS converges linearly to a point in the intersection. POCS underlies many practical algorithms for image reconstruction, signal recovery, and constrained learning. Variants such as Dykstra's algorithm modify the iterations so that the limit is the projection of the starting point onto the intersection rather than an arbitrary feasible point.

convex sets in machine learning

Convex sets appear throughout machine learning in several roles:

Feasible regions for model parameters. in regularization techniques such as Lasso (L1) and Ridge (L2), the constraint set for the parameter vector is a convex norm ball. Lasso constrains parameters to an L1 ball (a diamond shape in 2D), while Ridge uses an L2 ball (a circle in 2D). The corners of the L1 ball encourage sparse solutions, which is the geometric reason Lasso produces models with many zero coefficients.
SVM classification. Support vector machines find the optimal separating hyperplane between convex hulls of data points from different classes. The dual SVM formulation is itself a convex quadratic program over a simplex-like polytope.
Loss function level sets. for convex loss functions such as cross-entropy and squared error, the sublevel sets {x : f(x) ≤ α} are convex sets, ensuring well-behaved optimization landscapes.
Probability simplex constraints. the probability simplex {x : x_i ≥ 0, Σx_i = 1} is the natural feasible set for categorical probabilities, mixture model weights, and discrete policies in reinforcement learning. Methods such as exponentiated gradient descent and entropic mirror descent are tailored to optimization over the simplex.
Constraint sets in constrained optimization. many ML formulations add constraints (non-negativity, probability simplex, bounded parameters) that define convex feasible regions.
Convex relaxations. when the original problem is non-convex (as in many deep learning settings), researchers often study convex relaxations that replace non-convex constraint sets with their convex hulls to obtain tractable approximations. Linear programming relaxations of integer programs, semidefinite relaxations of MAXCUT, and nuclear-norm relaxations of low-rank matrix recovery are all convex-set replacements for non-convex feasible regions.
Online learning. in online convex optimization, the learner repeatedly chooses a point from a convex set while suffering convex losses chosen by an adversary. Algorithms such as online gradient descent and follow-the-regularized-leader exploit convexity of the decision set to obtain regret bounds that grow only like the square root of the number of rounds.

convex sets and neural network verification

Deep neural networks built from ReLU activations partition input space into a finite collection of polytopes on which the network acts as an affine map. This polytope structure is itself a convex-set object: each linear region is a convex polyhedron, and the global behavior of the network is determined by how these convex pieces fit together. Researchers in robustness and verification use convex relaxations of ReLU outputs, such as triangle relaxations and zonotope abstractions, to certify that a network's prediction is stable on a convex input region. Recent results from 2024 to 2026 have characterized when ReLU networks are themselves convex, mapped out the limits of multi-neuron convex relaxations, and shown that two-layer ReLU networks admit polynomial-time convex training formulations under certain conditions.

convex sets in reinforcement learning

In reinforcement learning, the set of policies for a fixed Markov decision process can often be described as a convex set, and the set of state-action visitation distributions for a finite MDP forms a polytope. Linear programming formulations of MDPs exploit this polytope structure, and dual variables in these LPs correspond to value functions. The convex geometry of the visitation polytope underpins occupancy-measure methods, dual reinforcement learning, and convex regularization of policies.

relationship to convex functions

Convex sets and convex functions are closely linked through three connections:

Epigraph characterization. a function f is convex if and only if its epigraph, defined as {(x, t) : f(x) ≤ t}, is a convex set. This allows results about convex sets to be translated into results about convex functions and vice versa.
Sublevel sets. if f is a convex function, then every sublevel set {x : f(x) ≤ α} is a convex set. The converse does not hold: a function can have convex sublevel sets without being convex; such functions are called quasiconvex.
Domain requirement. by definition, a convex function must have a convex domain. The function's convexity inequality, f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y), only makes sense when the domain is convex, so that the left-hand side is well-defined.

These links explain why textbooks like Boyd and Vandenberghe spend an entire chapter on convex sets before introducing convex functions. The geometry of the set comes first; the inequality structure of the function follows.

brief history

The systematic study of convex sets began with Hermann Minkowski in the 1890s. His work on the geometry of numbers led to convex bodies, supporting hyperplanes, and Minkowski sums. Constantin Caratheodory's 1907 paper introduced what is now known as Caratheodory's theorem on convex hulls. Eduard Helly proved his intersection theorem in 1913, with proofs published over the next decade by Radon and others. Werner Fenchel's lectures at Princeton in 1951 laid out duality between convex sets and convex functions in a form that influenced Rockafellar's 1970 textbook Convex Analysis. Modern convex optimization, as taught in courses like Stanford's CVX-101 led by Stephen Boyd, brings these classical results into a computational framework grounded in numerical algorithms and software.

explain like i'm 5 (eli5)

Imagine you have a blob of play-dough on a table. Pick any two spots inside the play-dough and stretch a piece of string between them. If the string always stays inside the play-dough no matter which two spots you pick, then the play-dough shape is a convex set. A circle, a square, and a triangle all work this way.

Now imagine a crescent moon shape. If you pick one tip of the crescent and the other tip, the string between them goes through empty air outside the moon. That means the crescent is not a convex set.

In machine learning, convex sets are helpful because they make finding the best answer to a problem much easier. If the search space where you are looking for an answer is shaped like a nice convex blob, you will never get tricked into thinking you found the best answer when a better one is hiding somewhere else. That is why many ML algorithms are designed so that their search spaces are convex.

references

Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. Chapter 2: Convex Sets.
Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press.
"Convex set." Wikipedia. https://en.wikipedia.org/wiki/Convex_set
"Hyperplane separation theorem." Wikipedia. https://en.wikipedia.org/wiki/Hyperplane_separation_theorem
"Supporting hyperplane." Wikipedia. https://en.wikipedia.org/wiki/Supporting_hyperplane
Tibshirani, R. (2013). Convex Optimization, Lecture Notes. Carnegie Mellon University. https://www.stat.cmu.edu/~ryantibs/convexopt-F13/scribes/lec4.pdf
Vandenberghe, L. EE236B Lecture Notes: Convex Sets. UCLA. http://seas.ucla.edu/~vandenbe/ee236b/lectures/sets.pdf
Bertsekas, D. P. (2009). Convex Optimization Theory. Athena Scientific.
"Convex hull." Wikipedia. https://en.wikipedia.org/wiki/Convex_hull
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
"Carathéodory's theorem (convex hull)." Wikipedia. https://en.wikipedia.org/wiki/Carath%C3%A9odory's_theorem_(convex_hull)
"Helly's theorem." Wikipedia. https://en.wikipedia.org/wiki/Helly's_theorem
"Radon's theorem." Wikipedia. https://en.wikipedia.org/wiki/Radon's_theorem
"Krein-Milman theorem." Wikipedia. https://en.wikipedia.org/wiki/Krein%E2%80%93Milman_theorem
"Convex cone." Wikipedia. https://en.wikipedia.org/wiki/Convex_cone
"Conic optimization." Wikipedia. https://en.wikipedia.org/wiki/Conic_optimization
"Projections onto convex sets." Wikipedia. https://en.wikipedia.org/wiki/Projections_onto_convex_sets
Chok, J., & Vasil, G. M. (2025). Convex optimization over a probability simplex. Journal of Machine Learning Research, 26. https://www.jmlr.org/papers/volume26/23-1166/23-1166.pdf
Pilanci, M., & Ergen, T. (2020). Neural networks are convex regularizers: Exact polynomial-time convex optimization formulations for two-layer networks. International Conference on Machine Learning. https://arxiv.org/abs/2002.10553
Ferrari-Trecate, G., et al. (2025). Convexity in ReLU neural networks: Beyond ICNNs? Journal of Mathematical Imaging and Vision. https://link.springer.com/article/10.1007/s10851-025-01253-x
Boyd, S. CVX-101: Stanford Convex Optimization Course. https://web.stanford.edu/class/ee364a/
MIT OpenCourseWare. 6.253 Convex Analysis and Optimization, Spring 2012. https://ocw.mit.edu/courses/6-253-convex-analysis-and-optimization-spring-2012/

mathematical definition

examples of convex sets

non-convex examples

properties of convex sets

intersection

closure and interior

affine transformations

Minkowski sum

perspective and linear-fractional maps

operations preserving convexity

convex hull

Caratheodory's theorem

extreme points and the Krein-Milman theorem

supporting hyperplane theorem

separating hyperplane theorem

Helly, Radon, and Tverberg theorems

convex cones

important named cones

dual cone

normal cone

convex sets in optimization

conic optimization

projection onto a convex set

projections onto convex sets (POCS)

convex sets in machine learning

convex sets and neural network verification

convex sets in reinforcement learning

relationship to convex functions

brief history

explain like i'm 5 (eli5)

see also

references

Improve this article

Related Articles

ARC-AGI 2

Convergence

Convex Function

Convex Optimization

Gradient

Bellman Equation

mathematical definition

examples of convex sets

non-convex examples

properties of convex sets

intersection

closure and interior

affine transformations

Minkowski sum

perspective and linear-fractional maps

operations preserving convexity

convex hull

Caratheodory's theorem

extreme points and the Krein-Milman theorem

supporting hyperplane theorem

separating hyperplane theorem

Helly, Radon, and Tverberg theorems

convex cones

important named cones

dual cone

normal cone

convex sets in optimization

conic optimization

projection onto a convex set

projections onto convex sets (POCS)

convex sets in machine learning

convex sets and neural network verification

convex sets in reinforcement learning

relationship to convex functions

brief history

explain like i'm 5 (eli5)

see also

references

Related Articles

ARC-AGI 2

Convergence

Convex Function

Convex Optimization

Gradient

Bellman Equation