Landmarks are reference points used as anchors in two largely separate areas of machine learning. In manifold learning and dimension reduction, landmarks are a small subset of data points selected to represent the geometry of a much larger dataset, so that expensive operations on a full N x N distance or kernel matrix can be replaced by cheaper operations on an N x L matrix with L much smaller than N. In computer vision, landmarks are predefined semantic points on objects, faces, hands, or bodies, for example the corners of the eyes, the tip of the nose, or the joints of a finger. The two uses share the same underlying intuition, namely that a sparse set of well-chosen anchor points can summarize a structure that is too expensive to model in full, but the algorithms, datasets, and applications are otherwise distinct.
This article covers both meanings. The first half explains landmark-based dimension reduction, including Landmark MDS and the Nystrom approximation. The second half covers facial, body, and hand landmarks in computer vision, the canonical landmark sets used by datasets and libraries, and the major detection methods.
The word landmark shows up in machine learning in three related senses.
| Sense | What it means | Typical context |
|---|---|---|
| Landmarks in dimension reduction | A small subset of data points used to approximate distances or kernel evaluations on the full dataset | Landmark MDS, Landmark Isomap, Nystrom approximation, spectral clustering at scale |
| Anatomical landmarks | Predefined semantic points on a face, body, hand, or organ | dlib face alignment, MediaPipe Face Mesh, medical image registration |
| Object-pose landmarks | Fixed semantic points on a rigid object used for 6-DoF pose estimation | Robot grasping, AR object tracking, satellite docking |
In classical computer vision the difference between keypoints and landmarks matters: keypoints are detector-driven, found wherever the local image structure is distinctive, while landmarks are predefined semantic positions that the model is trained to localize. The two terms do leak into each other, however, and many recent papers use them interchangeably.
Many dimension reduction and clustering algorithms scale poorly with the number of data points. Classical multidimensional scaling, kernel PCA, spectral clustering, Gaussian process regression, and Isomap all require the eigendecomposition of an N x N matrix, which costs O(N^3) time and O(N^2) memory. Even storing the full pairwise distance matrix becomes infeasible once N exceeds a few tens of thousands. Landmark methods solve this by choosing a smaller set of L landmark points, performing the expensive computation on a reduced L x L or L x N matrix, then projecting the remaining points into the same low-dimensional space using cheap linear operations.
The asymptotic gain is large. Doing classical MDS on N points costs O(N^3). Doing Landmark MDS with L landmarks costs roughly O(L^3 + L N d), which is linear in N for fixed L and d. For datasets with millions of points this is the difference between an algorithm that runs in seconds and one that does not run at all.
Landmark Multidimensional Scaling, introduced by Vin de Silva and Joshua B. Tenenbaum in a 2004 Stanford technical report titled Sparse multidimensional scaling using landmark points, was the first algorithm to make the landmark idea explicit for multidimensional scaling. The procedure has three steps. First, pick L landmarks from the N data points. Second, run classical MDS on the L x L distance matrix between landmarks to obtain a d-dimensional embedding of the landmarks. Third, embed each remaining point by a distance-based triangulation, which uses only the L distances from that point to the landmarks.
The triangulation step has a clean linear algebra interpretation. If the landmark embedding has coordinates Y in R^(L x d) and the squared distances from a new point to the landmarks are stored in a vector delta, then the new point embeds at x = -(1/2) pinv(Y) (delta - delta_mean), where delta_mean is the column mean of the squared landmark distance matrix and pinv is the Moore-Penrose pseudoinverse. The cost of embedding a new point is O(L d), so the algorithm scales linearly with N once the landmarks have been chosen and the landmark embedding has been computed.
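The whole pipeline fits in a few lines of NumPy. The sketch below is a minimal illustration under the definitions above, with a hypothetical helper lmds_embed; D_landmarks is the L x L landmark distance matrix and D_new holds, row by row, the distances from each remaining point to the landmarks.

```python
import numpy as np

def lmds_embed(D_landmarks, D_new, d=2):
    """Minimal Landmark MDS sketch: classical MDS on the landmarks,
    then distance-based triangulation for the remaining points."""
    L = D_landmarks.shape[0]
    D2 = D_landmarks ** 2
    # Classical MDS: double-center the squared distances, keep the top-d eigenpairs.
    J = np.eye(L) - np.ones((L, L)) / L
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]
    Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))  # L x d landmark embedding
    # Triangulation: x = -1/2 * pinv(Y) (delta - delta_mean) for each new point.
    delta_mean = D2.mean(axis=0)
    X_new = -0.5 * (D_new ** 2 - delta_mean) @ np.linalg.pinv(Y).T
    return Y, X_new
```

Each additional point costs one length-L dot product per output dimension, which is the O(L d) per-point figure quoted above.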
De Silva and Tenenbaum showed that LMDS reproduces classical MDS exactly when L equals N, and that the approximation error stays small even at small ratios L / N, provided the landmarks span the data well. In practice L on the order of a few hundred to a few thousand is enough for most natural datasets, even when N runs into the millions.
Isomap, the nonlinear dimension reduction algorithm by Tenenbaum, de Silva, and Langford published in Science in 2000, replaces Euclidean distances with geodesic distances measured along a k-nearest-neighbor graph and then applies classical MDS. The bottleneck is again the N x N distance matrix and its eigendecomposition. Landmark Isomap, also called L-Isomap, uses the same landmark trick. It computes geodesic distances only between landmarks and the rest of the dataset, builds an L x N distance matrix instead of N x N, and embeds the remaining points by Landmark MDS. The result is a manifold embedding that scales to datasets where full Isomap would be impractical, with a small accuracy cost in regions where landmarks are sparsely distributed.
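The geodesic step is where the savings come from: running Dijkstra only from the landmarks yields an L x N matrix in L passes over the graph. A sketch using SciPy and scikit-learn, with landmark_idx as a hypothetical array of chosen landmark indices:

```python
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

def l_isomap_distances(X, landmark_idx, k=10):
    """Geodesic distances from each landmark to every point along a k-NN graph."""
    G = kneighbors_graph(X, k, mode="distance")             # sparse neighbor graph
    D = dijkstra(G, directed=False, indices=landmark_idx)   # L x N geodesic matrix
    return D
```

The L x L block D[:, landmark_idx] and D.T can then be fed to the LMDS sketch above.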
The Nystrom approximation, brought into machine learning by Christopher Williams and Matthias Seeger in their 2001 NeurIPS paper Using the Nystrom method to speed up kernel machines, is the spectral counterpart to Landmark MDS. Given an N x N kernel matrix K that is too large to store or factorize, the Nystrom method picks L landmark columns of K, computes the small L x L block W from those columns, and forms a low-rank approximation
K approx C W^+ C^T,
where C is the N x L matrix of kernel values between all points and the landmarks and W^+ is the pseudoinverse of W. The approximate eigenvectors of K can then be recovered from the eigendecomposition of W, which is an O(L^3) operation, plus matrix multiplications that scale linearly in N. This brings kernel PCA, spectral clustering, and Gaussian process regression from O(N^3) down to O(L^3 + N L^2), and makes kernel methods practical on datasets with hundreds of thousands of points.
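A NumPy sketch of the factorization, never materializing the full K; the Gaussian kernel and the uniform random landmark choice are illustrative assumptions:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
idx = rng.choice(len(X), size=200, replace=False)  # L = 200 random landmarks

C = rbf(X, X[idx])                             # N x L cross-kernel block
W = C[idx]                                     # L x L landmark block
vals, vecs = np.linalg.eigh(W)                 # O(L^3), the only eigendecomposition
pos = vals > 1e-8                              # drop numerically null directions
F = C @ (vecs[:, pos] / np.sqrt(vals[pos]))    # N x r factor with K approx F F^T
```

Because C W^+ C^T = F F^T, downstream kernel PCA or spectral clustering becomes ordinary linear algebra on the thin factor F.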
Nystrom and Landmark MDS are closely related. Both choose L anchor points, both build a small matrix on those anchors, and both extend the result to the rest of the data by a linear operation. The main difference is that Nystrom works on a positive semidefinite kernel matrix and approximates its spectrum, while Landmark MDS works on a squared distance matrix and approximates the inner product matrix derived from it.
The quality of any landmark method depends on which points are chosen. The literature has converged on a small set of practical strategies, summarized below.
| Selection strategy | How it works | Notes |
|---|---|---|
| Uniform random sampling | Pick L points uniformly at random from the dataset | The default in Williams and Seeger (2001); cheap and surprisingly competitive |
| MaxMin (farthest-point sampling) | Iteratively add the point farthest from the current set | Ensures landmarks span the data; default in many Isomap implementations |
| K-means centers | Run k-means with L clusters and use the cluster centers as landmarks | Empirically strong; first justified theoretically by Zhang, Tsang, and Kwok (2008) and later by Oglic and Gartner (2017) |
| Leverage-score sampling | Sample columns with probability proportional to their statistical leverage | Provides theoretical error bounds via random matrix theory |
| Greedy column selection | Choose columns that minimize a Frobenius-norm error directly | Used in column subset selection and CUR decompositions |
| Active selection | Pick landmarks that resolve the most current uncertainty in the embedding | Useful when distance computations are expensive |
In practice random sampling is hard to beat for moderate L, k-means landmarks help when the data has clear clusters, and leverage-score sampling shines when theoretical guarantees matter. Most production code for Nystrom-based spectral clustering or Gaussian processes ships with all three options.
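As an illustration of the MaxMin row above, here is a short farthest-point sampler in NumPy; the random seed point and Euclidean metric are assumptions:

```python
import numpy as np

def maxmin_landmarks(X, L, seed=0):
    """Farthest-point sampling: greedily add the point farthest
    from the current landmark set."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]              # random initial landmark
    d = np.linalg.norm(X - X[idx[0]], axis=1)      # distance to nearest landmark
    for _ in range(L - 1):
        nxt = int(d.argmax())                      # current farthest point
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)
```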
The landmark idea has been ported to most other algorithms that depend on N x N matrices: sparse Gaussian processes summarize the training set with a small set of inducing points (Snelson and Ghahramani, 2006), spectral clustering scales up through Nystrom-sampled affinity columns (Fowlkes, Belongie, Chung, and Malik, 2004), and the original t-SNE paper describes a landmark variant for large datasets. These methods do not all use the term landmark explicitly, but they share the same trick: replace an O(N^2) or O(N^3) computation with an O(L N) or O(L^2 N) one by introducing a small set of representative anchors.
In computer vision, landmarks are predefined semantic points on an object whose locations have a fixed meaning across instances. The 31st point in the dlib 68-point face model is always the tip of the nose. The 12th point in the COCO body skeleton is always the left hip. This consistency lets downstream models reason about pose, identity, expression, or shape using a compact, interpretable representation rather than dense pixel data.
The earliest computer-vision landmarks came from anatomy and medical imaging, where radiologists had been marking corresponding points on X-rays for decades. The transition to automated detection started in the 1990s with statistical shape models and matured in the 2010s with deep regression and heatmap networks.
The distinction between keypoints and landmarks is mostly a matter of where the points come from.
| Aspect | Detector-style keypoints | Landmarks |
|---|---|---|
| Origin | Found by a detector wherever the image is locally distinctive | Predefined positions in a labeling protocol |
| Number | Variable, depends on image content | Fixed, defined by the model or dataset |
| Identity | Anonymous, matched by descriptor similarity | Semantic, point i always means the same thing |
| Examples | SIFT, ORB, SuperPoint corners and blobs | 68-point face model, COCO body 17, MediaPipe hand 21 |
| Typical task | Image matching, SfM, SLAM | Face alignment, pose estimation, AR filters |
In modern multi-task networks the boundary blurs. RetinaFace, for example, regresses both an anonymous bounding box and five fixed semantic landmarks in the same forward pass, and many human-pose papers swap landmarks and keypoints in successive sentences.
Facial landmarks are by far the most studied category. The standard sets, in order of granularity, are the following.
| Set | Points | Definition / source | Typical use |
|---|---|---|---|
| 5-point | 5 | Eye centers, nose tip, mouth corners; used by MTCNN, RetinaFace, ArcFace pipelines | Face alignment for face recognition |
| 21-point | 21 | Older face SDKs, AAM-style models | Coarse expression and pose |
| 29-point | 29 | LFPW protocol (Belhumeur et al., 2011) | Cascaded regression research |
| 68-point | 68 | Multi-PIE / iBUG 300-W protocol (Sagonas et al., 2013) | dlib, scikit-image, classical pipelines |
| 98-point | 98 | WFLW dataset (Wu et al., CVPR 2018) | Boundary-aware alignment, occlusion robustness |
| 106-point | 106 | JD AI Grand Challenge dataset, Chinese commercial SDKs | Mobile beauty filters |
| 194-point | 194 | HELEN dataset (Le et al., 2012) | Dense facial parts segmentation |
| 468-point (Face Mesh) | 468 | MediaPipe Face Mesh (Kartynnik et al., 2019) | AR filters, virtual try-on, full mesh fitting |
| 5023-vertex (FLAME) | 5023 | FLAME 3D head model (Li et al., 2017) | 3D face reconstruction, avatar driving |
The 5-point set is the bare minimum needed to align a face for recognition. The eye centers fix in-plane rotation and scale, and the nose tip and mouth corners pin down the remaining degrees of freedom of a similarity transform. Face recognition systems such as ArcFace and AdaFace assume their input has been similarity-warped to a canonical pose using exactly these five points, which is one reason MTCNN and RetinaFace, the dominant face detectors, both regress them.
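That warp is a two-line OpenCV operation once the five points are in hand. In the sketch below, align_face is a hypothetical helper and the template values are the reference coordinates commonly quoted for 112 x 112 ArcFace-style crops:

```python
import cv2
import numpy as np

# Reference layout for a 112 x 112 crop: left eye, right eye,
# nose tip, left mouth corner, right mouth corner.
TEMPLATE = np.array([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041],
], dtype=np.float32)

def align_face(img, five_points):
    """Similarity-warp a face to the canonical pose from its 5 landmarks."""
    src = np.asarray(five_points, dtype=np.float32)
    M, _ = cv2.estimateAffinePartial2D(src, TEMPLATE)  # rotation + scale + translation
    return cv2.warpAffine(img, M, (112, 112))
```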
The 68-point set is the most influential research protocol. Sagonas, Tzimiropoulos, Zafeiriou, and Pantic introduced it at the iBUG 300 Faces in-the-Wild Challenge held at ICCV 2013, by re-annotating the LFPW, AFW, HELEN, XM2VTS, and FRGC datasets with the 68-point Multi-PIE markup and adding a new 135-image set of difficult faces. The result, often called iBUG 300-W, became the standard benchmark for face alignment for nearly a decade, and the 68-point layout is still the default in dlib, scikit-image, and many academic baselines.
The 98-point WFLW set, from the Look at Boundary paper by Wu, Qian, Yang, Wang, and Loy (CVPR 2018), adds points along the eyebrow, eye, mouth, and jawline contours and tags each face with attributes such as occlusion, blur, and pose. It is the standard for evaluating dense alignment under challenging conditions.
MediaPipe Face Mesh, described by Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, and Matthias Grundmann in Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs (arXiv 2019), takes the dense end of the spectrum. Its 468-point output is not a flat sparse landmark set; it is a triangulated mesh that approximates the full surface of the face, regressed in 3D from a single RGB camera in real time on a phone. The mesh is what makes Snapchat-style AR filters, beauty effects, and virtual try-on work without a depth sensor.
FLAME, by Tianye Li, Timo Bolkart, Michael Black, Hao Li, and Javier Romero (SIGGRAPH Asia 2017), goes further still. It is a 3D morphable model with 5023 vertices, learned from over 33,000 head scans, that parameterizes identity, expression, and pose using a small number of latent codes. Many modern face avatars and reconstruction pipelines fit FLAME parameters to a sparse set of detected landmarks as initialization, then refine against image evidence.
Human body landmarks define a kinematic skeleton. The two dominant labeling conventions are MPII and COCO.
| Skeleton | Joints | Source | Notes |
|---|---|---|---|
| MPII | 16 | MPII Human Pose dataset (Andriluka et al., CVPR 2014) | Tree rooted at pelvis; standard for single-person 2D pose |
| COCO Keypoints | 17 | Microsoft COCO (Lin et al., ECCV 2014) | Dominant benchmark since 2016; OKS metric |
| OpenPose BODY_25 | 25 | OpenPose (Cao et al., CMU, 2017) | Body plus feet |
| MediaPipe Pose (BlazePose) | 33 | BlazePose (Bazarevsky et al., 2020) | Includes wrist and finger reference points |
| COCO-WholeBody | 133 | Jin et al., ECCV 2020 | Body, face, hands, feet in one model |
| Halpe Full-Body | 136 | AlphaPose Halpe model | Used in action recognition pipelines |
COCO 17 is now the de facto standard for multi-person 2D pose. The 17 points are nose, left and right eye, left and right ear, left and right shoulder, elbow, wrist, hip, knee, and ankle. Each point carries an x, y position and a visibility flag. The COCO keypoint challenge has driven most progress in 2D pose estimation since 2016.
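The annotation layout is simple enough to parse by hand: each person carries one flat list of 17 [x, y, v] triplets, where v = 0 means not labeled, 1 labeled but occluded, and 2 visible. The helper below is a hypothetical illustration of that format:

```python
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_coco_keypoints(flat):
    """Turn the flat 51-number keypoint list into a name -> (x, y, visible) dict."""
    triplets = [flat[i:i + 3] for i in range(0, len(flat), 3)]
    return {name: (x, y, v == 2)
            for name, (x, y, v) in zip(COCO_KEYPOINT_NAMES, triplets)
            if v > 0}  # v == 0: point not labeled for this person
```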
Hand pose models have settled on a 21-point skeleton. MediaPipe Hands, from Zhang, Bazarevsky, and colleagues at Google in 2020, predicts 21 3D landmarks per hand: one wrist point and four points per finger (knuckle, two finger joints, and fingertip). The full MediaPipe Holistic model combines 33 body landmarks, 21 per hand, and 468 face mesh landmarks for a total of 543 points per person, all tracked in real time on a mobile device.
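A minimal sketch with MediaPipe's legacy Python solutions API; the file name and parameters are illustrative:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)
img = cv2.cvtColor(cv2.imread("hand.jpg"), cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
results = hands.process(img)
if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # 21 landmarks with normalized x, y and relative depth z; index 0 is
        # the wrist and indices 4, 8, 12, 16, 20 are the fingertips.
        pts = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
```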
Facial and body landmark detection has gone through three major waves: statistical shape models, cascaded regression, and deep heatmap or coordinate regression.
| Method | Year | Authors | Approach |
|---|---|---|---|
| Active Shape Models (ASM) | 1995 | Cootes, Taylor, Cooper, Graham | PCA shape model plus local intensity profiles, iterative fit |
| Active Appearance Models (AAM) | 2001 | Cootes, Edwards, Taylor | Joint shape and texture PCA, fit by minimizing texture residual |
| Constrained Local Models (CLM) | 2008 | Cristinacce and Cootes | Local patch experts plus a shape prior |
| Explicit Shape Regression (ESR) | 2012 | Cao, Wei, Wen, Sun | Cascaded regression on shape-indexed pixel differences |
| Robust Cascaded Pose Regression (RCPR) | 2013 | Burgos-Artizzu, Perona, Dollar | Cascaded regression with occlusion handling |
| Ensemble of Regression Trees (ERT) | 2014 | Kazemi and Sullivan | One-millisecond face alignment used by dlib |
| MTCNN | 2016 | Zhang, Zhang, Li, Qiao | Cascaded CNN for joint face detection and 5-point alignment |
| Face Alignment Network (FAN) | 2017 | Bulat and Tzimiropoulos | Stacked hourglass for 2D and 3D landmarks |
| 3DDFA | 2017 | Zhu, Liu, Lei, Li | Cascaded CNN that fits a dense 3DMM |
| PFLD | 2019 | Guo and colleagues | Lightweight MobileNet backbone for mobile devices |
| Face Mesh | 2019 | Kartynnik et al. | Real-time 468-point 3D mesh on mobile GPU |
| BlazeFace | 2019 | Bazarevsky et al. | Mobile-first single-shot detector with 6 keypoints |
| RetinaFace | 2020 | Deng, Guo, Zhou, Yu, Zafeiriou | Single-shot multi-level detector with built-in 5-point alignment |
Active Shape Models, introduced by Cootes, Taylor, Cooper, and Graham in Computer Vision and Image Understanding (1995), built a Point Distribution Model from PCA on aligned training shapes and fit it to new images by alternating between local intensity-profile search at each landmark and global shape regularization. Active Appearance Models (Cootes, Edwards, and Taylor, 2001) extended this by jointly modeling shape and texture, fitting both at once.
The cascaded regression era started with Cao, Wei, Wen, and Sun's Face alignment by Explicit Shape Regression at CVPR 2012. ESR initializes from a mean shape and refines it through a series of regressors that operate on shape-indexed pixel differences, with no explicit shape model in the loop. Burgos-Artizzu, Perona, and Dollar followed in 2013 with RCPR, which added robust regression that handles occlusion, and Kazemi and Sullivan's 2014 One Millisecond Face Alignment with an Ensemble of Regression Trees gave the field the speed jump that made real-time alignment practical on commodity CPUs. The dlib library's well-known 68-point shape predictor is a direct implementation of this ERT approach, trained on iBUG 300-W.
The deep learning era began with MTCNN by Zhang, Zhang, Li, and Qiao (IEEE Signal Processing Letters, 2016), which cascades three small CNNs (PNet, RNet, ONet) to do face detection and 5-point alignment in one pass. Bulat and Tzimiropoulos's 2017 ICCV paper How far are we from solving the 2D and 3D Face Alignment problem? introduced the Face Alignment Network, a stacked hourglass that regresses landmark heatmaps and ships with both 2D and 3D variants. Their accompanying LS3D-W dataset, with 230,000 3D landmark annotations, became a standard 3D benchmark. PFLD by Guo and colleagues (arXiv 2019) made the same task run at 140 fps on a phone using a MobileNet backbone of just 2.1 megabytes.
For 3D and dense landmarks, Zhu, Liu, Lei, and Li's 3DDFA (TPAMI 2017) fits a 3DMM in full pose range using cascaded CNNs, and the MediaPipe Face Mesh model produces a 468-point 3D mesh in real time. Single-shot multi-task detectors such as RetinaFace and BlazeFace fold landmark regression into the detection head, eliminating the need for a separate alignment stage.
Facial and body landmarks underlie a wide range of applications: face alignment for recognition, expression analysis, AR filters and virtual try-on, avatar driving and 3D reconstruction, pose-based action recognition, and medical image registration.
Most landmark detectors have well-tested open-source implementations.
| Library | Languages | Landmarks provided |
|---|---|---|
| dlib | C++, Python | 68-point face landmarks via the Kazemi-Sullivan ERT model; 5-point variant |
| MediaPipe | C++, Python, JS, Android, iOS | 468-point face mesh, 33-point pose, 21-point hand, holistic |
| face-alignment (1adrianb) | PyTorch | 2D and 3D Face Alignment Network from Bulat and Tzimiropoulos |
| InsightFace | MXNet, PyTorch | RetinaFace detector and 5-point alignment, 106-point dense alignment |
| OpenCV Facemark API | C++, Python | LBF, AAM, and Kazemi facemark models |
| MMPose | PyTorch | HRNet, ViTPose, RTMPose for body, face, and hand landmarks |
| PIPNet | PyTorch | Pixel-in-pixel regression for 68/98/19-point face alignment |
| AlphaPose | PyTorch | Halpe 136-point full-body landmarks |
| OpenPose | C++ | 25-point body, 21-point hand, 70-point face from CMU |
| 3DDFA_V2 | PyTorch | Real-time 3DMM fitting and dense face landmarks |
A minimal dlib face-landmark workflow looks like the following.
```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Model trained on iBUG 300-W; downloaded separately from dlib.net.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # dlib expects RGB, OpenCV loads BGR
for det in detector(rgb, 1):                # 1 = upsample once to catch small faces
    shape = predictor(rgb, det)
    points = [(p.x, p.y) for p in shape.parts()]  # 68 (x, y) pixel coordinates
```
The split between detection and landmark regression has narrowed in the last few years. RetinaFace, BlazeFace, and YOLOv8-face all regress face landmarks in the same head as the bounding box, so a single forward pass returns both. On the dense end, neural rendering and 3D head models such as DECA, EMOCA, and SPECTRE replace the small landmark set with full 3D mesh parameters fit directly from images, often supervised by sparse landmarks as auxiliary targets. The 5-point or 68-point landmark output is rarely the final goal anymore; it is usually a stepping stone toward face recognition, expression analysis, avatar driving, or 3D reconstruction.
For body landmarks, models such as ViTPose (Xu et al., NeurIPS 2022) and RTMPose (Jiang et al., 2023) push COCO keypoint AP into the low 80s on test-dev while running at real-time speeds, and dense human-mesh recovery models such as PIXIE, SPIN, and HMR2 fit a full SMPL or SMPL-X body using landmarks as one supervision signal among many.
Face landmarks are biometric identifiers in most legal frameworks, even when no original image is stored. The European Union's General Data Protection Regulation classifies biometric data, including face geometry, as a special category under Article 9, requiring explicit consent or another lawful basis for processing. Illinois's Biometric Information Privacy Act (BIPA, 2008) imposes notice-and-consent requirements and has led to large class-action settlements against Facebook, TikTok, and others. Several U.S. states have followed Illinois with similar statutes, and city-level bans on government use of face recognition (San Francisco 2019, Boston 2020, others) effectively constrain landmark-based identification systems as well. Practitioners building face-landmark systems should assume the resulting embeddings are regulated personal data even when the source pixels are deleted.
The two senses of landmark sit in different branches of machine learning, but they share a clean idea. In dimension reduction, a few well-chosen anchor points let you summarize the geometry of a million-point dataset without paying the full O(N^2) cost. In computer vision, a few well-chosen anchor points on a face let you summarize identity, pose, or expression without working with the full pixel grid. Both uses gain leverage from sparsity: most of the information that a downstream system needs lives at a small number of carefully selected positions, not in the dense interior of the data.