Data & Datasets
104 articles
Bucketing
Machine Learning
C4 (Colossal Clean Crawled Corpus)
Natural Language Processing
CIFAR-10
AI Benchmarks, Computer Vision
COCO Dataset
Computer Vision, Machine Learning
Categorical Data
Machine Learning, Statistics
CharXiv
AI Benchmarks
Class-Imbalanced Dataset
Machine Learning
Common Corpus
Natural Language Processing, Open Source AI
Common Crawl
Machine Learning, Natural Language Processing
Common Pile
Natural Language Processing, Open Source AI
Continuous Feature
Machine Learning, Statistics
Convenience Sampling
Machine Learning, Statistics
Cosmopedia
Large Language Models, Open Source AI
Coverage Bias
AI Ethics, Machine Learning
DCLM (DataComp for Language Models)
AI Benchmarks, Natural Language Processing, Open Source AI
Data Augmentation
Deep Learning, Machine Learning
Data Provenance Initiative
Machine Learning
Data Set or Dataset
Machine Learning
Data Preprocessing
Machine Learning
Data-Centric AI (DCAI)
MLOps
DatologyAI
AI Companies
Dense Feature
Machine Learning
Derived Label
Machine Learning
Dimension Reduction
Machine Learning
Discrete Feature
Machine Learning
Dolma
Large Language Models, Open Source AI
Downsampling
Deep Learning, Machine Learning
Ego-Exo4D
Computer Vision, Meta AI
Ego4D
Computer Vision, Meta AI
Feature
Machine Learning
Feature Cross
Machine Learning
Feature Engineering
Machine Learning
Feature Extraction
Machine Learning
Feature Selection
Algorithms, Machine Learning
Feature Set
Data Science, Machine Learning
Feature Vector
Machine Learning
FineWeb
Machine Learning, Natural Language Processing, Open Source AI
FineWeb-2
Machine Learning
FineWeb-Edu
Large Language Models, Machine Learning
Ground Truth
Machine Learning
HotpotQA
AI Benchmarks, Artificial Intelligence, Natural Language Processing
How to Prevent OpenAI and Google From Training Their LLMs on Your Website's Data
Large Language Models
Imbalanced Dataset
Machine Learning
Instance
Machine Learning
Inter-Rater Agreement
Model Evaluation, Statistics
Iris Dataset
AI Benchmarks, Machine Learning, Statistics
LAION
Computer Vision, Machine Learning
LAION-5B
Generative AI
LVIS (Large Vocabulary Instance Segmentation)
Computer Vision
Label
Machine Learning
MMMLU
AI Benchmarks
MNIST
Computer Vision, Machine Learning
Mecka
Robotics Companies
MetaCLIP
Meta AI, Multimodal AI
MimicGen
Embodied AI, NVIDIA, Robotics
Nemotron-CC
NVIDIA, Natural Language Processing
Noise
Machine Learning
Non-Response Bias
Machine Learning, Statistics
Normalization
Machine Learning
Numerical Data
Machine Learning