{{see also|Machine learning terms}}
==Introduction==
[[Z-score normalization]] is a type of [[data scaling]] that transforms [[data]] values to have a [[mean]] of zero and a [[standard deviation]] of one. The transformation subtracts the feature's mean from each value and divides by the feature's standard deviation. The results are known as [[Z-score]]s, which indicate how far from the mean each data point lies.
Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s.
==Simple Example==
Consider a [[feature]] with a mean of 500 and a standard deviation of 100.
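As a small sketch, the stated mean of 500 and standard deviation of 100 turn raw values into Z-scores as follows (the raw values below are made up for illustration):

```python
# Z-score for a single value, given the feature's mean and standard deviation.
def z_score(x, mean, std):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std

# Feature described above: mean 500, standard deviation 100.
# A value of 600 is one standard deviation above the mean, so its Z-score is 1.0.
for raw in (400, 500, 600, 700):
    print(raw, "->", z_score(raw, 500, 100))
```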
==Why is Z-score normalization used?==
Z-score normalization is a technique commonly used in machine learning to address the issue of [[feature scaling]]. When features in a dataset have different scales or units, it can cause problems for machine learning algorithms that rely on distance-based calculations, such as [[k-nearest neighbors]] (KNN) or [[support vector machine]]s (SVM), which implicitly weight all features equally. Z-score normalization standardizes these features so that each contributes equally to the analysis.
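A toy sketch makes the scale problem concrete (the feature values, means, and standard deviations below are assumed for illustration): before normalization, the large-scale feature dominates the Euclidean distance; afterwards, both features contribute comparably.

```python
import math

# Two points with features on very different scales:
# height in cm (~150-180) and income in dollars (~30000-31000).
a = [180.0, 30000.0]
b = [150.0, 31000.0]

def euclidean(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Without scaling, the income feature dominates: the raw distance is
# roughly 1000, even though the height difference may matter just as much.
raw = euclidean(a, b)

# After Z-score normalization (means and standard deviations assumed
# for illustration), both features contribute on comparable scales.
means, stds = [165.0, 30500.0], [10.0, 500.0]
normalize = lambda p: [(x - m) / s for x, m, s in zip(p, means, stds)]
scaled = euclidean(normalize(a), normalize(b))
```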
==How is Z-score normalization performed?==
Z-score normalization is a straightforward formula that can be applied to each feature within an array. It consists of:
Z = (x - µ) / σ

where:
*Z is the Z-score for a particular data value
*x is the original data value
*µ is the mean of all data values in that feature
*σ is the standard deviation of the data values in that feature
To apply Z-score normalization to a dataset, we must perform the following steps:
#Calculate the mean and standard deviation for each feature in the [[dataset]].
#For each data value within a feature, subtract the feature's mean and divide by the feature's standard deviation.
#The resulting values are the Z-scores for each data point.
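The steps above can be sketched in plain Python, one feature (column) at a time. The population standard deviation is used here, which is one reasonable choice; the feature values are illustrative:

```python
from statistics import mean, pstdev

def z_score_normalize(feature):
    """Steps above for one feature: compute the mean and standard
    deviation, then subtract the mean from each value and divide
    by the standard deviation."""
    mu = mean(feature)
    sigma = pstdev(feature)  # population standard deviation
    return [(x - mu) / sigma for x in feature]

# Illustrative feature values (not from the article):
# mean is 5, population standard deviation is 2.
normalized = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# The normalized feature now has mean 0 and standard deviation 1.
```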
==Real-life Example==
Let us assume we have a dataset with two features, height (in cm) and weight (in kg), that we would like to apply Z-score normalization to. The data values for these features can be seen in the following table:
{| class="wikitable"
! Height (cm)
! Weight (kg)
|-
| 180 || 85
|-
| 150 || 55
|}
Before applying Z-score normalization to the dataset, we must first calculate the mean and standard deviation for each feature. These values can be found in the following table:
{| class="wikitable"
! Feature
! Mean
! Standard Deviation
|-
| Height (cm) || 166 || 10.954
|-
| Weight (kg) || 65.6 || 14.834
|}
By applying the formula for Z-score normalization to each data value in our dataset, we can calculate the Z-scores individually. The results are displayed in the following table:
{| class="wikitable"
! Height Z-score
! Weight Z-score
|-
| 1.27807 || 1.30781
|-
| -1.46065 || -0.71457
|}
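The tabulated Z-scores follow directly from the formula and the means and standard deviations above; a quick sketch reproduces them:

```python
# Recompute the table's Z-scores from the tabulated means and
# standard deviations, rounding to five decimals as in the table.
def z(x, mu, sigma):
    return round((x - mu) / sigma, 5)

height_mu, height_sigma = 166, 10.954
weight_mu, weight_sigma = 65.6, 14.834

print(z(180, height_mu, height_sigma))  # 1.27807
print(z(85,  weight_mu, weight_sigma))  # 1.30781
print(z(150, height_mu, height_sigma))  # -1.46065
print(z(55,  weight_mu, weight_sigma))  # -0.71457
```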
==Explain Like I'm 5 (ELI5)==
Imagine your class takes a test. Instead of only looking at your score, you ask: was it higher or lower than the class average, and by how much compared with everyone else? A Z-score answers exactly that. A Z-score of 0 means you scored exactly average, a positive Z-score means above average, and a negative one means below average. Z-score normalization does this for every number in the data, so numbers measured in very different units (like heights and weights) can be compared fairly.
[[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]]