{{see also|Machine learning terms}}
==Introduction==
[[Z-score normalization]] (also referred to as standardization) is a type of [[data scaling]] that transforms [[data]] values to have a [[mean]] of zero and a [[standard deviation]] of one. The transformation subtracts the feature's mean from each value and divides the result by the feature's standard deviation. The results are known as [[Z-score]]s, which indicate how many standard deviations each data point lies from the mean.


==What is Z-score normalization?==
Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s.
 
==Simple Example==
Consider a [[feature]] with a mean of 500 and a standard deviation of 100. The following table shows the Z-scores of three raw values:
{| class="wikitable"
! Raw value
! Z-score
|-
| 500 || 0
|-
| 600 || 1
|-
| 355 || -1.45
|-
|}
The [[model]] then trains on the Z-scores instead of the raw values.
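
The values in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name <code>z_score</code> is chosen here for illustration and is not taken from any library.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std


# Mean and standard deviation of the feature, as stated in the example.
MEAN = 500
STD = 100

raw_values = [500, 600, 355]
z_scores = [z_score(x, MEAN, STD) for x in raw_values]

for raw, z in zip(raw_values, z_scores):
    print(raw, z)
```

A raw value equal to the mean maps to 0, and a value one standard deviation above the mean maps to 1.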


==Why is Z-score normalization used?==
Z-score normalization is commonly used in machine learning for [[feature scaling]]. When the features in a dataset have different scales or units, algorithms that rely on distance calculations, such as [[k-nearest neighbors]] (KNN) or [[support vector machine]]s (SVM), can be dominated by the features with the largest numeric ranges. Z-score normalization puts every feature on a common scale so that each one contributes comparably to the analysis.
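
The effect of scale on a distance calculation can be illustrated with a small sketch. The ages and incomes below are made-up numbers, and the helper names <code>standardize</code> and <code>age_share</code> are hypothetical; the population standard deviation is assumed.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return the Z-scores of one feature (population standard deviation assumed)."""
    mean, std = fmean(column), pstdev(column)
    return [(x - mean) / std for x in column]


def age_share(p, q):
    """Fraction of the squared Euclidean distance between p and q that comes from age (index 0)."""
    d_age = (p[0] - q[0]) ** 2
    d_income = (p[1] - q[1]) ** 2
    return d_age / (d_age + d_income)


# Made-up data: three people described by age (years) and income.
ages = [25, 60, 27]
incomes = [50_000, 51_000, 52_000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardize(ages), standardize(incomes)))

# Compare the first two people before and after standardization.
raw_share = age_share(raw[0], raw[1])
scaled_share = age_share(scaled[0], scaled[1])

print(f"age share of squared distance (raw): {raw_share:.4f}")
print(f"age share of squared distance (standardized): {scaled_share:.4f}")
```

On the raw values, the 35-year age gap accounts for only about 0.1% of the squared distance because income is measured in much larger numbers. After standardization it accounts for roughly three quarters.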


==How is Z-score normalization performed?==
Z-score normalization applies one simple formula to every value of a feature:


Z = (x - µ) / σ

Where:
*Z is the Z-score for a particular data value
*x is the original data value
*µ is the mean of all data values in that feature
*σ is the standard deviation of the data values in that feature


To apply Z-score normalization to a dataset, perform the following steps:

#Calculate the mean and standard deviation for each feature in the [[dataset]].
#For each data value within a feature, subtract the feature's mean and divide by the feature's standard deviation.
#The resulting values are the Z-scores for each data point.
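
The steps above can be sketched in Python using only the standard library. The function name <code>z_score_normalize</code> and the sample values are made up for illustration, and the use of the population standard deviation (<code>statistics.pstdev</code>) is an assumption, since the text does not say whether the population or the sample formula is meant.

```python
from statistics import fmean, pstdev


def z_score_normalize(values):
    """Return the Z-scores of a list of numbers representing one feature."""
    mean = fmean(values)   # step 1: mean of the feature
    std = pstdev(values)   # step 1: population standard deviation (assumption)
    if std == 0:
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    # steps 2 and 3: subtract the mean, divide by the standard deviation
    return [(x - mean) / std for x in values]


# Hypothetical feature values, made up for illustration (mean 5, standard deviation 2).
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = z_score_normalize(feature)
print(z_scores)
```

For a dataset with several features, the same function would be applied to each feature (column) separately. Using the sample standard deviation (<code>statistics.stdev</code>) instead gives slightly different Z-scores, especially on small datasets.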


==Real-life Example==
Assume we have a dataset with two features, height (in cm) and weight (in kg), to which we would like to apply Z-score normalization. Data values for these features are shown in the following table:


{| class="wikitable"
! Height (cm)
! Weight (kg)
|-
| 180 || 85
|-
| 150 || 55
|-
|}


Before applying Z-score normalization to the dataset, we must first calculate the mean and standard deviation for each feature. The values used in this example are listed in the following table. Note that they are not the mean and standard deviation of the two rows shown above alone, so they are taken as given here:


{| class="wikitable"
! Feature
! Mean
! Standard deviation
|-
| Height (cm) || 166 || 10.954
|-
| Weight (kg) || 65.6 || 14.834
|-
|}


Applying the Z-score formula to each data value, using the mean and standard deviation of its feature, gives the Z-scores. For example, a height of 180 cm gives (180 - 166) / 10.954 ≈ 1.27807. The results are displayed in the following table:


{| class="wikitable"
! Height Z-score
! Weight Z-score
|-
| 1.27807 || 1.30781
|-
| -1.46065 || -0.71457
|-
|}
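
The Z-scores in this example can be checked with a short Python sketch. The means and standard deviations are copied from the table above rather than recomputed, and the variable names are chosen here for illustration.

```python
# Statistics as stated in the example (taken as given, not recomputed).
HEIGHT_MEAN, HEIGHT_STD = 166, 10.954
WEIGHT_MEAN, WEIGHT_STD = 65.6, 14.834

# (height in cm, weight in kg) rows from the example.
rows = [(180, 85), (150, 55)]

z_rows = [
    ((height - HEIGHT_MEAN) / HEIGHT_STD, (weight - WEIGHT_MEAN) / WEIGHT_STD)
    for height, weight in rows
]

for height_z, weight_z in z_rows:
    # Print with five decimal places, as in the table.
    print(f"{height_z:.5f} {weight_z:.5f}")
```

Each printed row corresponds to one row of the Z-score table.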


==Explain Like I'm 5 (ELI5)==




[[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]]