Intersection over Union (IoU), also known as the Jaccard index or Jaccard similarity coefficient, is a statistic used to compare the similarity of two sets. In computer vision and machine learning, it is the dominant metric for measuring how well a predicted region (a bounding box or a segmentation mask) overlaps with a ground truth region. IoU is computed as the area of overlap between two regions divided by the area of their union, producing a value between 0 (no overlap) and 1 (perfect overlap).
The metric appears across nearly every modern visual recognition pipeline. It defines the matching criterion in benchmarks such as PASCAL VOC, Microsoft COCO, Cityscapes, and KITTI. It serves as the suppression criterion inside non-maximum suppression. It is the basis of a large family of regression losses used to train object detectors, including IoU loss, GIoU, DIoU, CIoU, SIoU, EIoU, and Wise-IoU. The same quantity is also used outside vision, for example in document deduplication, set similarity in databases, and ecological community comparison, which was the original setting in which Paul Jaccard introduced the index in 1901.[^jaccard1901]
For two finite sets $A$ and $B$, the Jaccard index is defined as
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}.$$
The value satisfies $0 \leq J(A, B) \leq 1$, with $J(A, B) = 1$ if and only if $A = B$ (and both are non-empty), and $J(A, B) = 0$ when $A \cap B = \emptyset$. By convention $J(\emptyset, \emptyset) = 1$.[^jaccard_wiki]
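As a quick illustration of the definition, the index can be evaluated directly on small Python sets; the values below are arbitrary and the helper name is just for this article.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two finite sets; by convention J(empty, empty) = 1."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared / 4 total -> 0.5
print(jaccard({1, 2}, {3, 4}))        # disjoint sets -> 0.0
```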
In object detection, $A$ and $B$ are typically axis-aligned rectangles in image coordinates. Letting $A = (x_1^A, y_1^A, x_2^A, y_2^A)$ and $B = (x_1^B, y_1^B, x_2^B, y_2^B)$ denote the top-left and bottom-right corners, the intersection is itself a rectangle with corners
$$x_1^I = \max(x_1^A, x_1^B), \quad y_1^I = \max(y_1^A, y_1^B),$$
$$x_2^I = \min(x_2^A, x_2^B), \quad y_2^I = \min(y_2^A, y_2^B).$$
If $x_2^I > x_1^I$ and $y_2^I > y_1^I$, the intersection area is $(x_2^I - x_1^I)(y_2^I - y_1^I)$; otherwise it is zero. The IoU is then
$$\text{IoU}(A, B) = \frac{\text{Area}(A \cap B)}{\text{Area}(A) + \text{Area}(B) - \text{Area}(A \cap B)}.$$
For binary segmentation, $A$ and $B$ are sets of foreground pixels. With true positives $TP$, false positives $FP$, and false negatives $FN$ counted at the pixel level,
$$\text{IoU} = \frac{TP}{TP + FP + FN}.$$
This is the form used in PASCAL VOC's segmentation evaluation and in Cityscapes' per-class IoU.[^pascal_voc_2010][^cityscapes]
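The TP/FP/FN form can be computed on binary masks in a few lines of NumPy. The sketch below is illustrative, not any benchmark's reference implementation, and assumes two boolean masks of the same shape.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape, as TP / (TP + FP + FN)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = tp + fp + fn
    return float(tp) / denom if denom > 0 else 1.0  # empty vs empty -> 1 by convention

pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:3] = True  # 4 foreground pixels
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:4] = True      # 6 foreground pixels
print(mask_iou(pred, gt))  # intersection 4, union 6 -> 0.666...
```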
IoU answers a single question: of all the area covered by the prediction and the ground truth combined, how much do they share? Two boxes that overlap completely score 1. Two boxes that touch at a single edge score 0. A prediction that is twice as large as the ground truth and fully contains it scores 0.5, because the union is twice the intersection. The same logic applies to pixel masks: doubling the prediction's footprint while keeping the same true positive area halves the IoU.
A useful property is that IoU is symmetric: $\text{IoU}(A, B) = \text{IoU}(B, A)$. It is also scale-aware in a relative sense: shifting the predicted box a few pixels relative to the ground truth has a much larger effect when the boxes are small than when they are large, so a 5-pixel localisation error on a 20-pixel-wide pedestrian lowers the IoU far more than the same error on a 200-pixel-wide bus.
The IoU has several properties that explain why it is preferred over older similarity measures such as the simple overlap coefficient or pixel accuracy:
| Property | Description |
|---|---|
| Range | $[0, 1]$ for any pair of non-empty sets |
| Symmetry | $J(A, B) = J(B, A)$ |
| Identity | $J(A, A) = 1$ for any non-empty $A$ |
| Triangle-related | $1 - J(A, B)$ is the Jaccard distance, a true metric on the space of finite sets |
| Scale-invariance | $J(\lambda A, \lambda B) = J(A, B)$ for any positive scaling $\lambda$ |
| Class imbalance robustness | Unlike pixel accuracy, IoU does not become trivially high when one class dominates the image |
The Jaccard distance $d_J(A, B) = 1 - J(A, B)$ satisfies the triangle inequality and is a proper metric, which is why it is used in clustering and retrieval contexts.[^jaccard_wiki]
The relationship between IoU and the Dice coefficient (also called F1 score in the binary case) is monotonic:
$$\text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}}, \quad \text{IoU} = \frac{\text{Dice}}{2 - \text{Dice}}.$$
Because both metrics are monotonically related, ranking models by IoU and ranking them by Dice yields the same ordering. The two are not equal, however: for any prediction with $0 < \text{IoU} < 1$, Dice is strictly larger than IoU.
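The conversion is easy to check numerically; the sketch below simply applies the two formulas above.

```python
def dice_from_iou(iou: float) -> float:
    return 2 * iou / (1 + iou)

def iou_from_dice(dice: float) -> float:
    return dice / (2 - dice)

for iou in (0.25, 0.5, 0.75):
    dice = dice_from_iou(iou)
    print(f"IoU={iou:.2f} -> Dice={dice:.3f} -> IoU={iou_from_dice(dice):.2f}")
# Dice is always >= IoU on (0, 1), and the round trip recovers the original value.
```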
The quantity now called IoU was first published in its modern form by the Swiss botanist Paul Jaccard in 1901, in a study comparing alpine flora. He called it the coefficient de communauté and used it to compare which plant species occurred together in different mountain plots.[^jaccard1901] An equivalent ratio had been described earlier by the geologist Grove Karl Gilbert in 1884 as a "ratio of verification" for weather forecasts. The same statistic was independently rediscovered by Taffee Tadashi Tanimoto at IBM in the 1950s, and is therefore sometimes called the Tanimoto coefficient, especially in cheminformatics.[^jaccard_wiki]
The Jaccard index entered modern computer vision through the PASCAL VOC challenge. The first VOC paper that codified the evaluation procedure was published in the International Journal of Computer Vision in 2010 by Mark Everingham and colleagues. They specified that, for an object detection result to count as a true positive, the predicted bounding box must overlap a ground truth box with IoU greater than 0.5.[^pascal_voc_2010] This 0.5 threshold became the de facto standard for detection evaluation throughout the early deep learning era and is still the default for many practical reports.
When the Microsoft COCO dataset was released in 2014, its authors argued that a single threshold rewards loose localisation. They introduced the now-standard mAP@[0.5:0.95], averaged over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05.[^coco_paper] This stricter protocol forced subsequent detectors to produce tighter, better-fitting boxes.
In the object detection setting, a predicted box is matched to a ground truth box if their IoU exceeds a threshold $\tau$. The match defines true positives, false positives (predictions that do not match) and false negatives (ground truth boxes with no matching prediction). Average precision (AP) is then computed from the resulting precision-recall curve.
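A minimal sketch of this matching step for a single image and class might look like the following. It assumes the scalar `box_iou` helper defined in the NumPy example near the end of this article; real evaluators additionally sort predictions by confidence across the whole dataset and handle ignored or crowd annotations.

```python
def match_detections(pred_boxes, pred_scores, gt_boxes, iou_thr=0.5):
    """Greedy matching: each ground truth box can be claimed by at most one prediction."""
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched_gt = set()
    tp, fp = 0, 0
    for i in order:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            iou = box_iou(pred_boxes[i], gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thr:
            tp += 1
            matched_gt.add(best_j)
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched_gt)  # unmatched ground truth boxes
    return tp, fp, fn
```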
Different benchmarks pick different IoU thresholds:
| Benchmark | Threshold | Notes |
|---|---|---|
| PASCAL VOC | 0.5 | Single fixed threshold; mAP averaged over 20 classes |
| COCO | 0.5 to 0.95 in steps of 0.05 | Average of 10 IoU values, called mAP@[.5:.95] or simply AP |
| LVIS | 0.5 to 0.95 in steps of 0.05 | Same as COCO but emphasises long-tail classes |
| KITTI (cars, 2D and 3D) | 0.7 | Stricter threshold reflects driving-safety requirements |
| KITTI (pedestrians, cyclists) | 0.5 | Smaller objects allow looser matching |
| Open Images | 0.5 | Hierarchical class structure with relaxed matching |
A single value such as AP@0.5 reports how often a model finds objects loosely; AP@0.75, often abbreviated AP75, measures stricter localisation; AP@[.5:.95] averages over the full range of thresholds and so rewards both detection and tight localisation.[^coco_metrics]
For semantic segmentation, the standard metric is mean Intersection over Union (mIoU), sometimes also written Jaccard index. The IoU is computed for each class across the entire test set, treating the predicted and ground truth pixel masks as sets, and then averaged over classes:
$$\text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}.$$
The mean is taken across $C$ classes so that minority classes such as poles or traffic signs are not drowned out by dominant classes such as road or sky. The Cityscapes benchmark reports both IoU per class and IoU per category (a coarser grouping), and Cityscapes' leaderboard ranks models by mean class IoU.[^cityscapes] PASCAL VOC's segmentation track uses the same per-class IoU formulation.[^pascal_voc_2010]
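One common way to compute this over a whole test set is to accumulate a $C \times C$ confusion matrix and derive the per-class IoU from it. The sketch below assumes integer label maps and omits void-label handling.

```python
import numpy as np

def miou_from_confusion(conf: np.ndarray) -> float:
    """Mean IoU from a C x C confusion matrix, conf[i, j] = pixels of class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp          # labelled class c but predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))       # classes absent from both pred and gt are skipped

# The confusion matrix can be accumulated over the test set with, per image:
#   conf += np.bincount(gt.ravel() * C + pred.ravel(), minlength=C * C).reshape(C, C)
```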
In instance segmentation (for example, the Mask R-CNN style of output), each predicted mask is matched to a ground truth mask using mask IoU rather than box IoU, and the COCO-style AP@[.5:.95] is reported on the resulting matches. COCO maintains separate AP scores for bounding boxes and masks.[^coco_paper]
Object detectors usually emit hundreds of overlapping candidate boxes per object. Non-maximum suppression (NMS) trims these to a single best box per object using IoU as the suppression criterion. A typical NMS routine sorts candidates by class score, picks the highest-scoring box, and removes any other box whose IoU with the chosen box exceeds a threshold (commonly 0.45 or 0.5). The procedure is repeated until no candidates remain. Soft-NMS, introduced by Bodla et al., reduces (rather than removes) the score of overlapping boxes proportionally to their IoU, which preserves recall when objects of the same class genuinely overlap.[^soft_nms]
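A minimal greedy NMS for a single class can be sketched as follows, assuming a `pairwise_iou` helper such as the vectorised NumPy example near the end of this article; libraries such as torchvision.ops.nms provide optimised versions of the same idea.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5) -> list:
    """Greedy NMS for one class. boxes: (N, 4) in xyxy format; returns kept indices."""
    order = scores.argsort()[::-1]           # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        ious = pairwise_iou(boxes[i:i + 1], boxes[order[1:]])[0]  # IoU of i vs the rest
        order = order[1:][ious <= iou_thr]    # drop boxes that overlap too much with i
    return keep
```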
IoU is also used as a matching cost in tracking: simple trackers such as the IOU tracker rely on it exclusively, and DeepSORT and ByteTrack use IoU-based association stages to link a track and a detection across consecutive frames when their boxes overlap by more than a chosen IoU threshold.
For most of the 2010s, object detectors were trained with smooth L1 or L2 loss applied to the four box coordinates. This is suboptimal because two boxes can have a small coordinate-wise distance and still a poor IoU, and vice versa. UnitBox, introduced by Yu et al. at ACM Multimedia in 2016, was the first detector to back-propagate directly through the IoU.[^unitbox] The IoU loss is
$$\mathcal{L}_{\text{IoU}} = 1 - \text{IoU}(A, B).$$
This loss has two well-known shortcomings. First, when the predicted and ground truth boxes do not overlap, the IoU is zero regardless of how far apart they are, so the loss is flat and provides no gradient to pull the prediction closer. Second, the IoU value alone does not distinguish between different ways two boxes can overlap: predictions with quite different alignments or aspect ratios can yield the same IoU.
A family of variants, summarised below, has been proposed to address these issues.
| Variant | Year and venue | Authors | Key idea |
|---|---|---|---|
| IoU loss (UnitBox) | 2016, ACM MM | Yu et al. | Back-propagate through IoU directly instead of L2 on coordinates |
| GIoU | 2019, CVPR | Rezatofighi, Tsoi, Gwak, Sadeghian, Reid, Savarese | Adds a penalty for the empty area of the smallest enclosing box, providing gradient when boxes do not overlap |
| DIoU | 2020, AAAI | Zheng, Wang, Liu, Li, Ye, Ren | Adds a normalised distance term between box centers; converges faster than GIoU |
| CIoU | 2020, AAAI | Zheng et al. | Adds a third term for aspect-ratio consistency on top of DIoU |
| EIoU and Focal-EIoU | 2022, Neurocomputing | Zhang, Ren, Zhang, Jia, Wang, Tan | Replaces CIoU's aspect-ratio term with explicit width and height differences and adds focal weighting |
| SIoU | 2022, arXiv | Gevorgyan | Adds an angle cost between the line connecting box centers and the image axes, plus a redefined distance and shape cost |
| Wise-IoU (WIoU) | 2023, arXiv | Tong, Chen and others | Dynamic, non-monotonic focusing weight based on outlier degree of each anchor |
| Probabilistic IoU (ProbIoU) | 2021, arXiv | Llerena, Zeni, Kristen, Jung | Models boxes as 2D Gaussians, uses Hellinger or Bhattacharyya distance for differentiable IoU on rotated boxes |
| PIoU (Pixels-IoU) | 2020, ECCV | Chen, Yang, Zhang and others | Pixel-wise approximation suitable for oriented bounding boxes |
GIoU was introduced by Hamid Rezatofighi and colleagues at CVPR 2019. It addresses the vanishing-gradient problem of plain IoU when boxes do not overlap. Letting $C$ be the smallest axis-aligned box that contains both $A$ and $B$, GIoU is defined as
$$\text{GIoU} = \text{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}.$$
The additional term penalises the empty area inside the enclosing box, so that even when boxes do not overlap, smaller enclosing boxes give higher GIoU values. GIoU lies in the range $[-1, 1]$. The authors reported consistent gains over smooth-L1 and plain IoU on PASCAL VOC and COCO when GIoU was plugged into Faster R-CNN, Mask R-CNN, and YOLOv3.[^giou]
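For axis-aligned boxes in xyxy format, the definition above can be turned into a short sketch; this is illustrative rather than the authors' reference code.

```python
def giou(box_a, box_b):
    """Generalised IoU for two axis-aligned boxes in xyxy format."""
    # Plain IoU
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area if c_area > 0 else iou

print(giou([0, 0, 2, 2], [3, 3, 5, 5]))  # disjoint boxes: IoU = 0, GIoU < 0
```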
DIoU and CIoU were proposed in the same paper by Zhaohui Zheng and colleagues at AAAI 2020. DIoU adds the normalised squared distance between the centers of the two boxes:
$$\text{DIoU} = \text{IoU} - \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2},$$
where $\rho$ denotes Euclidean distance, $\mathbf{b}$ and $\mathbf{b}^{gt}$ are the centers of the predicted and ground truth boxes, and $c$ is the diagonal of the smallest enclosing box. CIoU adds a further consistency term $\alpha v$ that penalises differences in aspect ratio:
$$\text{CIoU} = \text{IoU} - \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} - \alpha v,$$
where $v$ measures aspect-ratio dissimilarity and $\alpha$ is a positive trade-off coefficient. The authors showed that DIoU also improves NMS by replacing IoU as the suppression criterion (DIoU-NMS), preserving boxes that overlap but have distinct centers (for example, two adjacent pedestrians).[^diou]
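The DIoU and CIoU penalties can likewise be sketched on top of a plain IoU value, following the formulas from the AAAI 2020 paper; the helper below assumes non-degenerate xyxy boxes and a precomputed IoU.

```python
import math

def diou_ciou(box_a, box_b, iou):
    """DIoU and CIoU given a precomputed IoU for two xyxy boxes (illustrative sketch)."""
    # Squared distance between box centers
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (ax - bx) ** 2 + (ay - by) ** 2
    # Squared diagonal of the smallest enclosing box
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c2 = cw ** 2 + ch ** 2
    diou = iou - rho2 / c2
    # Aspect-ratio consistency term and its trade-off weight
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) > 0 else 0.0
    ciou = diou - alpha * v
    return diou, ciou
```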
The SCYLLA-IoU (SIoU) loss, introduced by Zhora Gevorgyan in 2022, adds an angle cost. The intuition is that earlier IoU losses do not consider the direction of the offset between the predicted and ground truth box centers, so the predicted box can wander before converging. SIoU's combined loss includes an angle cost, a distance cost re-weighted by that angle, a shape cost on width and height differences, and the IoU term itself.[^siou]
Wise-IoU (WIoU) was proposed by Zanjia Tong, Yuhang Chen and colleagues in 2023. Earlier focal-style losses such as Focal-EIoU upweight harder examples monotonically. WIoU uses a dynamic, non-monotonic focusing function based on each anchor's outlier degree, so that very low-quality anchors (likely mislabelled or extremely poor crops) are also down-weighted. The authors reported AP gains on YOLOv7 trained on MS-COCO.[^wise_iou]
For oriented bounding boxes used in aerial imagery and text detection, the standard rectangular IoU is hard to differentiate. The Pixels-IoU loss (Chen et al., ECCV 2020) approximates IoU pixel by pixel.[^piou] Llerena and colleagues took a different approach, modelling each box as a 2D Gaussian distribution and using Hellinger or Bhattacharyya distance as a probabilistic analogue of IoU, giving the ProbIoU family of losses.[^probiou]
A practical distinction: IoU as an evaluation metric is computed on hard predictions and requires no gradient. IoU as a training loss must be differentiable with respect to the predicted box parameters. This is why training and evaluation often use slightly different IoU variants. Models commonly train with CIoU or GIoU and are then evaluated using vanilla IoU at one or several thresholds.
| Benchmark | Domain | IoU usage |
|---|---|---|
| PASCAL VOC (2007 to 2012) | 2D detection, segmentation | mAP@0.5 for detection; per-class IoU for segmentation |
| Microsoft COCO (2014 onwards) | Detection, instance segmentation, keypoints | mAP@[.5:.95], AP@0.5, AP@0.75; mask IoU for segmentation |
| Cityscapes (2016) | Urban driving scenes | Mean IoU per class and per category |
| ADE20K (2017) | Scene parsing | Mean IoU over 150 classes |
| KITTI (2012, 2017) | Autonomous driving 2D and 3D | IoU thresholds 0.7 (cars) and 0.5 (pedestrians, cyclists) |
| LVIS (2019) | Long-tail detection | COCO-style AP@[.5:.95] |
| Open Images (2018) | Large-scale detection | mAP@0.5 with hierarchy-aware matching |
| nuScenes (2019) | 3D detection in autonomous driving | Uses center distance instead of IoU for matching, but reports IoU as a secondary metric |
| Waymo Open Dataset (2020) | 3D detection and tracking | 3D IoU thresholds 0.7 and 0.5 |
The torchvision.ops module provides differentiable and non-differentiable IoU utilities. The basic call computes a pairwise IoU matrix between two sets of boxes in xyxy format:[^torchvision_box_iou]
```python
import torch
from torchvision.ops import box_iou, generalized_box_iou_loss, complete_box_iou_loss

boxes1 = torch.tensor([[0, 0, 100, 100], [50, 50, 150, 150]], dtype=torch.float32)
boxes2 = torch.tensor([[10, 10, 110, 110]], dtype=torch.float32)

iou_matrix = box_iou(boxes1, boxes2)
# iou_matrix has shape (2, 1)

# Loss variants for training
pred = torch.tensor([[10.0, 10.0, 90.0, 90.0]], requires_grad=True)
target = torch.tensor([[0.0, 0.0, 100.0, 100.0]])
loss_giou = generalized_box_iou_loss(pred, target, reduction="mean")
loss_ciou = complete_box_iou_loss(pred, target, reduction="mean")
loss_giou.backward()
```
Beyond box_iou, torchvision exposes generalized_box_iou, distance_box_iou, and complete_box_iou, together with the corresponding generalized_box_iou_loss, distance_box_iou_loss, and complete_box_iou_loss functions.
The tf.keras.metrics.MeanIoU metric maintains a confusion matrix and computes the mean per-class IoU. It is most often used for semantic segmentation:[^tf_meaniou]
```python
import tensorflow as tf

m = tf.keras.metrics.MeanIoU(num_classes=2)
m.update_state([0, 0, 1, 1], [0, 1, 0, 1])
print(m.result().numpy())  # 0.33333334

# As a compiled metric on an existing segmentation model. Note that MeanIoU
# expects integer class IDs, so with a softmax output you typically wrap
# predictions in an argmax or use tf.keras.metrics.OneHotMeanIoU instead.
model.compile(
    optimizer="sgd",
    loss="categorical_crossentropy",
    metrics=[tf.keras.metrics.MeanIoU(num_classes=21)],
)
```
For bounding boxes, the TensorFlow Models library and TF Addons historically provided tfa.losses.GIoULoss, and Keras CV provides keras_cv.losses.IoULoss and keras_cv.losses.CIoULoss for modern detector training.
A bare implementation of box IoU in NumPy is short enough to be illustrative:
```python
import numpy as np

def box_iou(box_a, box_b):
    """Compute IoU between two boxes in xyxy format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```
Vectorised implementations replace the scalar max and min with element-wise NumPy or PyTorch operations and broadcast across all $N \times M$ pairs of boxes.
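A broadcast NumPy version along those lines might look like this; the helper name pairwise_iou is just for this article.

```python
import numpy as np

def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """IoU matrix of shape (N, M) for two arrays of xyxy boxes with shapes (N, 4) and (M, 4)."""
    tl = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])  # (N, M, 2) intersection top-left
    br = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])  # (N, M, 2) intersection bottom-right
    wh = np.clip(br - tl, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-9), 0.0)
```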
IoU is dominant in computer vision because it is simple, scale-aware, and aligned with human notions of overlap, but it has well-known weaknesses. A single IoU value says nothing about where the error lies, so a prediction that misses a thin boundary and one that misses a compact blob of the same area can score identically; it collapses to zero for every non-overlapping pair regardless of how far apart the regions are; and it is harsh on small objects, where a few pixels of misalignment cost a large fraction of the score.
IoU is therefore usually combined with other metrics: precision, recall, F1, average precision, and confusion-matrix style breakdowns.
| Metric | Relationship to IoU |
|---|---|
| Dice coefficient (F1 in the binary case) | $\text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}}$; monotonically related to IoU |
| Pixel accuracy | Fraction of correctly classified pixels; biased by class imbalance, so IoU is preferred |
| Frequency-weighted IoU (FWIoU) | Weighted average of per-class IoU, weighted by class frequency |
| Boundary IoU | Computed only on a thin band around object boundaries; emphasises edge accuracy |
| Tversky index | Generalisation of Jaccard with separately weighted FP and FN; reduces to IoU when both weights are 1 |
| Hausdorff distance | Distance-based dissimilarity, complementary to IoU for boundary quality in medical segmentation |
| Average Precision (AP) | Computed over precision-recall pairs derived from IoU-based matching |
The Jaccard form of IoU is widely used beyond images, for example in near-duplicate document detection, set-similarity joins in databases, and the ecological community comparisons for which Jaccard originally devised it.
IoU is the workhorse spatial-overlap metric in computer vision. It is simple to compute, intuitive to interpret, and aligned with the way humans judge whether two regions match. Its limitations have driven a productive line of research into gradient-friendly variants (GIoU, DIoU, CIoU, EIoU, SIoU, WIoU, ProbIoU) and matching-aware NMS schemes (Soft-NMS, DIoU-NMS). It defines the matching criterion in PASCAL VOC, COCO, KITTI, Cityscapes, and most modern benchmarks, and it sits inside both the training loss and the post-processing of essentially every contemporary object detector.
Imagine you and your friend both draw a square around a cat in the same picture. You want to check how well the two squares match. Put one drawing on top of the other. The piece of paper that is covered by both squares is the intersection. The piece of paper that is covered by either square is the union. Divide the small (intersection) by the big (union). If the squares match perfectly you get 1. If they do not overlap at all you get 0. Computers use that same number, called IoU, to grade themselves when they try to find objects in pictures.