Z-score normalization: Difference between revisions

(4 intermediate revisions by the same user not shown)
Line 5: Line 5:
Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s.


==Simple Example==
A [[feature]] with a mean of 500 and a standard deviation of 100.
{| class="wikitable"
|
Line 40: Line 40:
#These values correspond to Z-scores for each data point.
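
As a minimal sketch of the calculation for a feature with a mean of 500 and a standard deviation of 100, the Z-score formula <code>z = (x - mean) / std</code> can be applied to each value in turn. The sample values and the helper name <code>z_score</code> are hypothetical and chosen only for illustration.

```python
# Mean and standard deviation of the feature, as stated in the example.
MEAN = 500.0
STD = 100.0

# Hypothetical sample values, chosen only for illustration.
values = [400.0, 500.0, 650.0]


def z_score(x, mean, std):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std


z_scores = [z_score(x, MEAN, STD) for x in values]
print(z_scores)  # [-1.0, 0.0, 1.5]
```

A value equal to the mean maps to 0, and a value one standard deviation below the mean maps to -1.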


==Real-life Example==
Suppose we have a dataset with two features, height (in cm) and weight (in kg), to which we want to apply Z-score normalization. The data values for these features are shown in the following table:


{| class="wikitable"
! Height (cm)
! Weight (kg)
|-
| 180 || 85
|-
| 150 || 55
|}


Before applying Z-score normalization to the dataset, we must first calculate the mean and standard deviation for each feature. These values can be found in the following table:


{| class="wikitable"
! Features
! Mean
! Standard Deviation
|-
| Height (cm) || 166 || 10.954
|-
| Weight (kg) || 65.6 || 14.834
|}


By applying the formula for Z-score normalization, z = (x - mean) / standard deviation, to each data value, we can calculate the Z-scores individually. The results for the two rows shown above are displayed in the following table:


{| class="wikitable"
! Height Z-score
! Weight Z-score
|-
| 1.27807 || 1.30781
|-
| -1.46065 || -0.71457
|}
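
The Z-scores in the table can be reproduced with a short sketch like the one below. It takes the means and standard deviations exactly as stated in the summary table rather than recomputing them from the rows shown, and the names <code>stats</code>, <code>rows</code> and <code>normalize_row</code> are hypothetical, used only for illustration.

```python
# Summary statistics as stated in the table (taken as given, not recomputed).
stats = {
    "height_cm": {"mean": 166.0, "std": 10.954},
    "weight_kg": {"mean": 65.6, "std": 14.834},
}

# The two data rows shown in the first table.
rows = [
    {"height_cm": 180.0, "weight_kg": 85.0},
    {"height_cm": 150.0, "weight_kg": 55.0},
]


def normalize_row(row, stats):
    """Apply z = (x - mean) / std to every feature in one row."""
    return {
        name: (value - stats[name]["mean"]) / stats[name]["std"]
        for name, value in row.items()
    }


z_rows = [normalize_row(row, stats) for row in rows]

for z in z_rows:
    # Print five decimal places, matching the precision used in the table.
    print(f"{z['height_cm']:.5f}  {z['weight_kg']:.5f}")
```

Each feature is normalized with its own mean and standard deviation, so height and weight end up on a comparable, unitless scale.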


==Explain Like I'm 5 (ELI5)==
Line 69: Line 86:




[[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]]