Data-centric AI (DCAI)

Revision as of 08:53, 28 February 2023 by Elegant angel

Introduction

Model-centric AI is the paradigm taught in most ML classes and revolves around producing the best model given a clean, well-curated dataset. In contrast, Data-centric AI involves systematically engineering data to build better AI systems.

Data-centric AI can come in two forms:

  • algorithms that understand data and use that information to improve models
  • algorithms that modify data to improve models

Examples include curriculum learning (training on ‘easy’ data first), an instance of the first form, and confident learning (filtering mislabeled data), an instance of the second.
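As a rough illustration of the first form, the sketch below ranks examples by a difficulty score and returns nested training subsets that grow linearly from the easiest examples to the full dataset, in the spirit of curriculum learning. The function name `curriculum_stages`, the word-count difficulty proxy, and the linear growth schedule are illustrative assumptions; a real curriculum might instead score difficulty with a trained model's loss.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(data: Sequence[T],
                      difficulty: Callable[[T], float],
                      num_stages: int = 3) -> List[List[T]]:
    """Split data into nested training subsets, easiest examples first.

    Stage k (1-based) holds the easiest ceil(k * n / num_stages) examples,
    so the last stage is the whole dataset.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(data, key=difficulty)  # lowest difficulty score first
    n = len(ordered)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = (k * n + num_stages - 1) // num_stages  # integer ceiling
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical difficulty proxy: longer sentences count as harder.
sentences = ["the dog ran home", "hi", "a cat", "we walked to the old mill"]
for k, stage in enumerate(curriculum_stages(sentences, lambda s: len(s.split()), 2), 1):
    print(f"stage {k}: {stage}")  # a model would be trained on each stage in turn
```

Note that the data itself is never changed; only the order in which the model sees it is, which is what places curriculum learning in the first form.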

Steps to Supervised Learning

Recall that model-centric AI focuses on creating the best model for a particular dataset, while data-centric AI focuses on producing the best dataset to feed an ML model. To deploy the most effective supervised learning systems, you need to do both. A data-centric AI pipeline can look something like this:

  1. Analyze the data and fix any fundamental problems. Then transform it into an ML-appropriate format.
  2. Train a baseline ML model on the correctly formatted dataset.
  3. Use this model to improve the dataset (techniques covered in this class).
  4. Try different modeling techniques on the improved data to get the best possible model.
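Step 3 can be made concrete with a simplified version of the confident-learning idea mentioned earlier: set a per-class threshold equal to the average predicted probability of that class among examples given that label, then flag an example when the class it is confidently predicted to belong to differs from its given label. This is a minimal sketch, assuming `pred_probs` holds out-of-sample predicted probabilities (for example from cross-validation with the Step 2 model); `flag_label_issues` is a hypothetical helper written for this page, not any library's API. The open-source cleanlab library implements the full method.

```python
from typing import List, Sequence


def flag_label_issues(labels: Sequence[int],
                      pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of examples whose given label looks wrong.

    labels[i] is the given (possibly noisy) class of example i, and
    pred_probs[i][j] is an out-of-sample predicted probability that
    example i belongs to class j.
    """
    if not pred_probs:
        return []
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class j among
    # the examples that were given label j ("self-confidence").
    thresholds = []
    for j in range(num_classes):
        scores = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(scores) / len(scores) if scores else float("inf"))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes predicted with at least that class's threshold.
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no confident guess, so the example is not flagged
        guess = max(confident, key=lambda j: p[j])
        if guess != y:
            issues.append(i)
    return issues


# Hypothetical two-class example: examples 2 and 5 look mislabeled.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
              [0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
print(flag_label_issues(labels, pred_probs))  # [2, 5]
```

Flagged examples can then be relabeled or removed before retraining in Step 4.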

However tempting it may be, don't skip Steps 2 through 4. Steps 3 and 4 can be repeated multiple times to deploy the most effective ML system.
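The repeated Steps 3 and 4 can be sketched as a small driver that takes each step as a function and keeps the model with the best held-out score. This is a minimal sketch under stated assumptions: `data_centric_loop` and the toy helpers (a 1-D threshold classifier whose Step 3 simply drops training points the model contradicts) are hypothetical, and a real Step 3 would use out-of-sample predictions rather than this naive in-sample filter.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple, TypeVar

D = TypeVar("D")  # dataset type
M = TypeVar("M")  # model type


def data_centric_loop(
    raw: D,
    clean: Callable[[D], D],         # Step 1: fix problems, make ML-ready
    train: Callable[[D], M],         # Steps 2 and 4: fit (or select) a model
    improve: Callable[[M, D], D],    # Step 3: use the model to improve the data
    evaluate: Callable[[M], float],  # held-out score, higher is better
    rounds: int = 2,
) -> Tuple[M, float]:
    data = clean(raw)                          # Step 1
    model = train(data)                        # Step 2: baseline model
    best_model, best_score = model, evaluate(model)
    for _ in range(rounds):                    # repeat Steps 3 and 4
        data = improve(model, data)            # Step 3
        model = train(data)                    # Step 4
        score = evaluate(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Toy demo with hypothetical data: 1-D points, label 1 means "large x".
Point = Tuple[float, int]
raw_data: List[Optional[Point]] = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1),
                                   (9.0, 1), (10.0, 1), (9.5, 0), None]
held_out: List[Point] = [(5.5, 1), (4.5, 0)]


def drop_missing(rows: List[Optional[Point]]) -> List[Point]:
    return [r for r in rows if r is not None]


def fit_threshold(rows: List[Point]) -> float:
    # Decision threshold halfway between the two class means.
    return (mean(x for x, y in rows if y == 0) + mean(x for x, y in rows if y == 1)) / 2


def predict(t: float, x: float) -> int:
    return int(x > t)


def drop_disagreements(t: float, rows: List[Point]) -> List[Point]:
    # Naive stand-in for Step 3: discard points the model contradicts.
    return [(x, y) for x, y in rows if predict(t, x) == y]


def accuracy(t: float) -> float:
    return sum(predict(t, x) == y for x, y in held_out) / len(held_out)


model, score = data_centric_loop(raw_data, drop_missing, fit_threshold,
                                 drop_disagreements, accuracy)
print(model, score)
```

Keeping the best held-out score, rather than the latest one, guards against a round of data edits that makes the model worse.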

Examples of Data-centric AI