Model-centric AI is the paradigm taught in most ML classes and revolves around producing the best model given a clean, well-curated dataset. In contrast, data-centric AI involves systematically engineering the data to build better AI systems.
Data-centric AI can come in two forms:
Examples include curriculum learning (training on ‘easy’ data first) and confident learning (filtering out mislabeled data).
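To make these two ideas concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not the full confident learning algorithm: it assumes you already have predicted class probabilities for each example (ideally out-of-sample, for instance from cross-validation), and it flags an example as likely mislabeled when the model confidently prefers a class other than the given label. The curriculum ordering uses the model's confidence in the given label as a stand-in for "easy", which is one possible heuristic among many. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def class_thresholds(pred_probs, labels):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for probs, y in zip(pred_probs, labels):
        sums[y] += probs[y]
        counts[y] += 1
    return {k: sums[k] / counts[k] for k in counts}


def find_label_issues(pred_probs, labels):
    """Return indices of examples whose confidently predicted class
    differs from the given label (simplified confident-learning-style filter)."""
    thresholds = class_thresholds(pred_probs, labels)
    issues = []
    for i, (probs, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model is "confident" about for this example.
        confident = [k for k, t in thresholds.items() if probs[k] >= t]
        if not confident:
            continue  # the model is not confident about any class
        likely = max(confident, key=lambda k: probs[k])
        if likely != y:
            issues.append(i)
    return issues


def curriculum_order(pred_probs, labels):
    """Order example indices from 'easy' to 'hard', assuming that
    'easy' means high model confidence in the given label."""
    return sorted(range(len(labels)), key=lambda i: -pred_probs[i][labels[i]])


# Toy data (hypothetical): two classes, the last example looks mislabeled.
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.95, 0.05],
]
labels = [0, 0, 1, 1, 1]

issues = find_label_issues(pred_probs, labels)
print(issues)  # [4]

# Filter the flagged examples, then present the rest easy-first.
keep = [i for i in curriculum_order(pred_probs, labels) if i not in issues]
print(keep)
```

The first function pair changes the dataset (it removes suspect examples), while the ordering function leaves the data untouched and only changes how it is presented to the model during training.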
Recall that model-centric AI is focused on creating the best model for a particular dataset, while data-centric AI is focused on producing the best dataset to feed an ML model. To deploy the most effective supervised learning systems, it is important that you do both. A data-centric AI pipeline can look something like this:
However tempting it is, don't skip Steps 2 through 4. You can repeat Steps 3-4 multiple times to deploy the most effective ML systems.
This field covers the following methods:
OpenAI has openly stated that errors in data and labels were among the biggest problems in training famous ML models such as DALL-E, GPT-3 and ChatGPT. These are stills from the demo of DALL-E 2.
Tesla was able to produce autonomous driving systems that are far more advanced than those of comparable competitors by using model-assisted data improvement (Step 3). The key to this success is the Data Engine. These slides are from Andrej Karpathy (Tesla's Director of AI, 2021).
In the US, data quality problems cost an estimated $3 trillion per year. It is difficult to guarantee data quality in large datasets without using algorithms. ML systems such as ChatGPT rely on human feedback to correct shortcomings that arise from low-quality training data, but automated methods are required to ensure that ML models are trained on clean data. Recent research has highlighted the importance of data-centric AI, an approach that uses simple methods to change the dataset and thereby produce more accurate models. This course will teach you how to improve any ML model through its data. The methods it covers apply to training supervised ML models.