Data [[normalization]] in [[machine learning]] is a critical preprocessing step that helps boost the performance of many [[algorithm]]s. Normalization involves scaling data to a specified range or distribution to reduce the impact of differences in scale or units of [[feature]]s.
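As a minimal sketch of the two scaling schemes this describes, the snippet below rescales a small sample to a fixed range (min-max scaling) and to a standard distribution (Z-scores). The data and helper names are illustrative, not from this article:

```python
# Illustrative sketch: two common normalization schemes.
# The sample data and function names are hypothetical.

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on their mean and divide by their standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population std
    return [(v - mean) / std for v in values]

data = [2.0, 4.0, 6.0, 8.0]
print(min_max_scale(data))  # scaled values span exactly [0, 1]
print(z_score(data))        # scaled values have mean 0 and std 1
```

Either way, the features end up on a comparable scale regardless of their original units.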
==Simple Example==
Consider a [[feature]] with a mean of 500 and a standard deviation of 100.
{| class="wikitable"
|
#These values correspond to Z-scores for each data point. | #These values correspond to Z-scores for each data point. | ||
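The computation above can be sketched in a few lines of Python. Only the mean of 500 and standard deviation of 100 come from the example; the raw data points below are hypothetical placeholders:

```python
# Z-score normalization for the feature described above:
# mean = 500, standard deviation = 100.
MEAN = 500.0
STD = 100.0

def z_score(value, mean=MEAN, std=STD):
    """How many standard deviations a value lies from the mean."""
    return (value - mean) / std

for raw in [400.0, 500.0, 650.0]:  # hypothetical data points
    print(raw, "->", z_score(raw))
# prints: 400.0 -> -1.0, 500.0 -> 0.0, 650.0 -> 1.5
```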
==Real-life Example==
Let us assume we have a dataset with two features, height (in cm) and weight (in kg), to which we would like to apply Z-score normalization. The data values for these features can be seen in the following table:
{| class="wikitable"
! Height (cm)
! Weight (kg)
|-
| 180 || 85
|-
| 150 || 55
|}
Before applying Z-score normalization to the dataset, we must first calculate the mean and standard deviation for each feature. These values can be found in the following table:
{| class="wikitable"
! Feature
! Mean
! Standard Deviation
|-
| Height (cm) || 166 || 10.954
|-
| Weight (kg) || 65.6 || 14.834
|}
By applying the formula for Z-score normalization to each data value in our dataset, we can calculate Z-scores individually. The results are displayed in the following table:
{| class="wikitable"
! Height Z-score
! Weight Z-score
|-
| 1.27807 || 1.30781
|-
| -1.46065 || -0.71457
|}
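These Z-scores can be reproduced with a short script. This is a sketch that takes the rounded means and standard deviations from the statistics table above as given:

```python
# Reproduce the Z-scores in the table from the stated statistics.
# Means and standard deviations are taken from the article's table.
stats = {"height": (166.0, 10.954), "weight": (65.6, 14.834)}
rows = [(180.0, 85.0), (150.0, 55.0)]  # (height cm, weight kg)

def z_score(value, mean, std):
    """Standardize a single value: (value - mean) / std."""
    return (value - mean) / std

for height, weight in rows:
    hz = z_score(height, *stats["height"])
    wz = z_score(weight, *stats["weight"])
    print(round(hz, 5), "||", round(wz, 5))  # matches the table rows above
```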
==Explain Like I'm 5 (ELI5)==
[[Category:Terms]] [[Category:Machine learning terms]]