Z-score normalization: Difference between revisions
No edit summary |
m (Text replacement - "Category:Machine learning terms" to "Category:Machine learning terms Category:not updated") |
||
(3 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s. | Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s. | ||
==Example== | ==Simple Example== | ||
A [[feature]] with the mean of 500 and a standard deviation of 100. | A [[feature]] with the mean of 500 and a standard deviation of 100. | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 40: | Line 40: | ||
#These values correspond to Z-scores for each data point. | #These values correspond to Z-scores for each data point. | ||
==Example== | ==Real-life Example== | ||
Let us assume we have a dataset with two features, height (in cm) and weight (in kg), that we would like to apply Z-score normalization to. The data values for these features can be seen in the following table: | Let us assume we have a dataset with two features, height (in cm) and weight (in kg), that we would like to apply Z-score normalization to. The data values for these features can be seen in the following table: | ||
{| class="wikitable" | {| class="wikitable" | ||
! Height (cm) | ! Height (cm) | ||
! Weight (kg | ! Weight (kg) | ||
|- | |- | ||
| 180 || 85 | | 180 || 85 | ||
Line 58: | Line 56: | ||
{| class="wikitable" | {| class="wikitable" | ||
! Features | ! Features | ||
! Mean | ! Mean | ||
Line 72: | Line 68: | ||
By applying the formula for Z-score normalization to each data value in our dataset, we can calculate Z-scores individually. The results are displayed in the following table: | By applying the formula for Z-score normalization to each data value in our dataset, we can calculate Z-scores individually. The results are displayed in the following table: | ||
Height | {| class="wikitable" | ||
| | ! Height Z-score | ||
| | ! Weight Z-score | ||
| 1. | |- | ||
| -1. | | 1.27807 || 1.30781 | ||
| - | |- | ||
| -1.46065 || -0.71457 | |||
|- | |||
|} | |||
==Explain Like I'm 5 (ELI5)== | ==Explain Like I'm 5 (ELI5)== | ||
Line 87: | Line 86: | ||
[[Category:Terms]] [[Category:Machine learning terms]] | [[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]] |
Latest revision as of 21:16, 17 March 2023
- See also: Machine learning terms
Introduction
Z-score normalization is a type of data scaling that transforms data values to have a mean of zero and standard deviation of one. This transformation occurs by subtracting the mean from each value and dividing by its standard deviation. The results are known as Z-scores, which indicate how far away from the mean each data point is.
Data normalization in machine learning is a critical preprocessing step that helps boost the performance of many algorithms. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of features.
Simple Example
A feature with the mean of 500 and a standard deviation of 100.
raw value | Z-score |
---|---|
500 | 0 |
600 | 1 |
355 | -1.45 |
Model will train on the Z-score instead of raw values
Why is Z-score normalization used?
Z-score normalization is a technique commonly used in machine learning to address the issue of feature scaling. When features in a dataset have different scales or units, it can cause issues for certain machine learning algorithms that rely on distance-based calculations such as k-nearest neighbors (KNN) or support vector machines (SVM), which require equal weighting across all features in the analysis. With Z-score normalization, however, we can standardize these dimensions so that each contributes equally to our analysis.
How is Z-score normalization performed?
Z-score normalization is a straightforward formula that can be applied to each feature within an array. It consists of:
Z = (x - µ) / σ
- Z is the Z-score for a particular data value
- x is its original data value
- µ stands for mean of all data values in that feature
- σ stands for standard deviation of those data values for the feature
To apply Z-score normalization to a dataset, we must perform the following steps:
- Calculate the mean and standard deviation for each feature in the dataset.
- For each data value within a feature, subtract its mean value and divide by its standard deviation.
- These values correspond to Z-scores for each data point.
Real-life Example
Let us assume we have a dataset with two features, height (in cm) and weight (in kg), that we would like to apply Z-score normalization to. The data values for these features can be seen in the following table:
Height (cm) | Weight (kg) |
---|---|
180 | 85 |
150 | 55 |
Before applying Z-score normalization to the dataset, we must first calculate the mean and standard deviation for each feature. These values can be found in the following table:
Features | Mean | Standard Deviation |
---|---|---|
Height (cm) | 166 | 10.954 |
Weight (kg) | 65.6 | 14.834 |
By applying the formula for Z-score normalization to each data value in our dataset, we can calculate Z-scores individually. The results are displayed in the following table:
Height Z-score | Weight Z-score |
---|---|
1.27807 | 1.30781 |
-1.46065 | -0.71457 |
Explain Like I'm 5 (ELI5)
Hey there! Have you ever noticed how when having different-sized toys, it can be challenging to compare them? Well, in machine learning it works similarly - sometimes numbers may appear huge while other times they may appear tiny. Comparing and using these numbers in calculations can be tricky since the larger ones often have greater effects than their smaller counterparts.
Z-score normalization can help! It's a way of making all numbers the same size so we can compare them more easily. Imagine having an automated machine that could shrink or grow each toy so they were all the same size, no matter their original dimensions!
To do this, we utilize some math to calculate the average size and spread of all numbers. Then we adjust each one so it fits within this average and spread, creating similar-sized numbers so we can compare and use them more conveniently.