Datasets
22 articles
C4 (Colossal Clean Crawled Corpus)
Common Crawl, NLP, Pretraining Corpora
CIFAR-10
Benchmarks, Computer Vision
COCO dataset
Computer Vision, Machine Learning, Object Detection
Common Crawl
Machine Learning, Natural Language Processing, Open Data
Cosmopedia
Large Language Models, Open Source AI
Dolma
Large Language Models, Open Source AI
FineWeb
Machine Learning, Natural Language Processing, Open Source AI
FineWeb-Edu
Large Language Models, Machine Learning
HotpotQA
2018 in AI, Multi-hop reasoning, NLP benchmarks
Iris dataset
AI Benchmarks, Machine Learning, Statistics
LAION
Computer Vision, Machine Learning, Open Data
LAION-5B
Generative AI, Image-Text Data, Open AI Data
LVIS (Large Vocabulary Instance Segmentation)
Computer Vision, Instance Segmentation, Long-Tailed Recognition
MNIST
Computer Vision, Machine Learning
Open X-Embodiment
Google DeepMind, Robot Learning, Robotics
PASCAL VOC
AI Benchmarks, Computer Vision, Machine Learning
RedPajama
Machine Learning, Natural Language Processing, Open Source AI
RefinedWeb
Large Language Models, Open Source AI
Segment Anything Model and Dataset (SAM and SA-1B)
Computer Vision, Foundation Models, Meta AI
SuperGLUE
Benchmarks, Natural Language Processing
The Pile (dataset)
Machine Learning, Natural Language Processing, Open Source AI
TxT360
Large Language Models, Open Source AI