Datasets
19 articles
C4 (Colossal Clean Crawled Corpus)
Common Crawl, NLP, Pretraining Corpora
CIFAR-10
Benchmarks, Computer Vision
COCO dataset
Computer Vision, Machine Learning, Object Detection
Common Crawl
Machine Learning, Natural Language Processing, Open Data
Dolma
Large Language Models, Open Source AI
FineWeb
Machine Learning, Natural Language Processing, Open Source AI
HotpotQA
2018 in AI, Multi-hop reasoning, NLP benchmarks
Iris dataset
AI Benchmarks, Machine Learning, Statistics
LAION
Computer Vision, Machine Learning, Open Data
LAION-5B
Generative AI, Image-Text Data, Open AI Data
LVIS (Large Vocabulary Instance Segmentation)
Computer Vision, Instance Segmentation, Long-Tailed Recognition
MNIST
Computer Vision, Machine Learning
Open X-Embodiment
Google DeepMind, Robot Learning, Robotics
PASCAL VOC
AI Benchmarks, Computer Vision, Machine Learning
RedPajama
Machine Learning, Natural Language Processing, Open Source AI
RefinedWeb
Large Language Models, Open Source AI
Segment Anything Model and Dataset (SAM and SA-1B)
Computer Vision, Foundation Models, Meta AI
SuperGLUE
Benchmarks, Natural Language Processing
The Pile (dataset)
Machine Learning, Natural Language Processing, Open Source AI