'''Reporting bias''' in machine learning is a systematic distortion of the information used to train and evaluate machine learning models. It arises when the data used to train a model is shaped by factors that are not representative of the true underlying phenomenon, leading the model to overestimate or underestimate certain predictions and ultimately degrading its performance and reliability. This article discusses the causes and implications of reporting bias, as well as strategies to mitigate its effects.


==Causes==
Reporting bias can stem from several sources, including data collection methods, sample selection, and data preprocessing.


===Data collection methods===
Data collection methods can introduce reporting bias if they selectively capture certain types of information. For example, in the context of [[sentiment analysis]], a model trained on product reviews may be biased if users are more likely to write reviews when they have strong opinions, either positive or negative, about the product.
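This self-selection effect can be illustrated with a minimal simulation. The sentiment scale, review probabilities, and population size below are all made-up illustrative numbers, not measurements from any real review platform: customers hold opinions on a 1–5 scale centred near 3, but the chance of writing a review depends on how strong (and how negative) the opinion is, so the collected reviews misrepresent average sentiment.

```python
import random

random.seed(0)

# Hypothetical review probabilities (illustrative numbers only):
# strongly negative customers review most often, moderate ones rarely.
def review_probability(sentiment):
    if sentiment <= 2.0:
        return 0.8   # strongly negative: very likely to review
    if sentiment >= 4.0:
        return 0.3   # strongly positive: somewhat likely to review
    return 0.05      # moderate opinion: rarely reviews

# True sentiment: roughly normal around 3, clipped to the 1-5 scale.
population = [min(5.0, max(1.0, random.gauss(3.0, 1.0)))
              for _ in range(100_000)]

# Only a biased subset of opinions ever becomes training data.
reviews = [s for s in population if random.random() < review_probability(s)]

true_mean = sum(population) / len(population)
review_mean = sum(reviews) / len(reviews)
print(f"true mean sentiment:  {true_mean:.2f}")
print(f"mean of reviews only: {review_mean:.2f}")
```

Under these assumptions the mean of the collected reviews sits noticeably below the true population mean, even though no individual review is wrong: the distortion comes entirely from who chose to report.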


===Sample selection===
Sample selection can lead to reporting bias when the data used to train a model is not representative of the target population. This may occur if the data is collected from a biased subset of the population, or if the data is subject to [[sampling bias]] in which certain instances are overrepresented or underrepresented in the sample.
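A small sketch can show how over-representing one subgroup distorts the statistics a model learns. The two groups, their sizes, and their positive-label rates below are hypothetical illustrative numbers: group A makes up 70% of the population with a 10% positive rate, group B 30% with a 60% positive rate, but the assumed collection channel reaches group A 95% of the time.

```python
import random

random.seed(1)

# Hypothetical setup (illustrative numbers only): draw one labelled
# instance, where p_group_a controls how often the draw comes from group A.
def draw(p_group_a):
    group = "A" if random.random() < p_group_a else "B"
    positive_rate = 0.10 if group == "A" else 0.60
    return 1 if random.random() < positive_rate else 0

# True population mix vs. a biased collection channel that
# over-represents group A.
population = [draw(0.70) for _ in range(50_000)]
sample = [draw(0.95) for _ in range(50_000)]

pop_rate = sum(population) / len(population)
sample_rate = sum(sample) / len(sample)
print(f"population positive rate: {pop_rate:.3f}")
print(f"sampled positive rate:    {sample_rate:.3f}")
```

A model trained on the biased sample would learn a base rate far below the true one, and this error would persist no matter how much data is collected through the same skewed channel.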


===Data preprocessing===
Data preprocessing, such as data cleaning, feature extraction, and data transformation, can also introduce reporting bias. For instance, the removal of outliers or the imputation of missing values may lead to a distortion of the underlying data distribution.
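The outlier-removal case can be sketched concretely. The log-normal data and the two-standard-deviation cutoff below are assumptions chosen for illustration: on a right-skewed distribution, such a symmetric cutoff discards only the long right tail, so the "cleaned" data the model sees has a systematically lower mean than the phenomenon being modelled.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical right-skewed measurements (log-normal), e.g. something
# like transaction amounts. Illustrative parameters only.
data = [math.exp(random.gauss(0.0, 1.0)) for _ in range(20_000)]

mean = statistics.mean(data)
sd = statistics.pstdev(data)

# A common cleaning rule: drop points more than 2 standard deviations
# from the mean. On skewed data this removes only the upper tail.
cleaned = [x for x in data if abs(x - mean) <= 2 * sd]

print(f"mean before cleaning: {mean:.3f}")
print(f"mean after cleaning:  {statistics.mean(cleaned):.3f}")
print(f"points discarded:     {len(data) - len(cleaned)}")
```

Every discarded point here is a legitimate observation, yet the preprocessing step shifts the distribution the model is trained on, which is precisely the kind of distortion this section describes.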

