{{see also|Machine learning terms}}
==Introduction==
A [[feature]] is an [[input]] variable to a [[machine learning model]]. An [[example]] consists of one or more features.


==Example==
 
{| class="wikitable"
|+ Two [[example]]s, each with three features and a label
|-
! colspan="3"| Features
! colspan="1"| Label
|-
! Location
! Bedrooms
! Bathrooms
! Price
|-
| Midtown || 3 || 2 || $800,000
|-
| Uptown || 2 || 2 || $500,000
|-
|}
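In code, the two examples above could be represented as feature sets paired with labels (a sketch in Python; the field names are illustrative, and the location would typically be encoded numerically before training):

```python
# Each example: three features (location, bedrooms, bathrooms) and a
# label (price in dollars). Field names here are illustrative.
examples = [
    {"features": {"location": "Midtown", "bedrooms": 3, "bathrooms": 2},
     "label": 800_000},
    {"features": {"location": "Uptown", "bedrooms": 2, "bathrooms": 2},
     "label": 500_000},
]

for ex in examples:
    print(ex["features"], "->", ex["label"])
```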


==Types of Features==


===Numerical Features===
Numerical features are variables that take on numerical values, such as age, height, weight, or temperature. These features may be either [[continuous]] or [[discrete]]: continuous features can take any value within a range, while discrete features take only specific values.


Machine learning often employs standardization or [[normalization]] to bring numerical features onto a common scale, which can help improve the [[accuracy]] of the resulting models. Standardization involves subtracting the mean and dividing by the standard deviation, while normalization involves scaling values to the range between 0 and 1.
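Both transformations can be sketched in a few lines of plain Python (illustrative only; real pipelines typically use library utilities for this):

```python
# Standardization: subtract the mean, divide by the (population)
# standard deviation, so the result has mean 0 and unit spread.
def standardize(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Normalization: scale values linearly into the range [0, 1].
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 50]
print(normalize(ages))    # smallest value maps to 0.0, largest to 1.0
print(standardize(ages))  # centered on 0
```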


===Categorical Features===
Categorical features are variables that take on a value from a fixed set of categories, such as gender, color, or occupation. Usually represented as strings or integers, categorical features are often [[one-hot encoding|one-hot encoded]], meaning each category is represented as a binary variable.


One-hot encoding involves creating a new binary variable for each category, setting it to 1 if the data point belongs to that category and 0 otherwise. This enables machine learning algorithms to treat each category as its own feature, potentially improving model accuracy.
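A minimal one-hot encoder can be sketched in plain Python (illustrative; libraries such as scikit-learn and pandas provide production versions):

```python
# One-hot encode a list of category values: one binary column per
# distinct category, with columns in sorted category order.
def one_hot(values):
    categories = sorted(set(values))
    return [[1 if value == cat else 0 for cat in categories]
            for value in values]

colors = ["red", "green", "blue", "green"]
# Columns correspond to ['blue', 'green', 'red']:
print(one_hot(colors))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```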


===Text Features===
Text features are variables containing [[natural language]] text, such as product reviews, customer feedback, and news articles. They require a different approach to feature engineering due to their often unstructured nature and high noise content.


Machine learning typically preprocesses text features and transforms them into numerical representations such as a [[bag-of-words]] matrix or a [[TF-IDF]] matrix. A bag-of-words matrix represents each document as a vector of word counts, while a TF-IDF matrix weights each document's term frequencies by the terms' importance within the corpus.
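Both representations can be built in a few lines of plain Python (a toy sketch; real pipelines use library implementations, and TF-IDF variants differ in smoothing and normalization details):

```python
import math

# Bag-of-words: one column per vocabulary word, one row per document,
# each cell holding the word's count in that document.
def bag_of_words(docs):
    vocab = sorted({word for doc in docs for word in doc.split()})
    counts = [[doc.split().count(word) for word in vocab] for doc in docs]
    return vocab, counts

# TF-IDF: weight each count by log(N / document frequency), so words
# that appear in every document get weight 0.
def tf_idf(docs):
    vocab, counts = bag_of_words(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = [[row[j] * math.log(n / df[j]) for j in range(len(vocab))]
               for row in counts]
    return vocab, weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, bow = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'ran', 'sat', 'the']
print(bow[0])  # [1, 0, 0, 1, 1]
```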


==Feature Selection==
[[Feature selection]] is the process of identifying and selecting the most important features for a machine learning problem. Its aim is to reduce data dimensionality while retaining the information most pertinent to the target variable.


There are three primary feature selection methods: filter methods, wrapper methods, and embedded methods. Filter methods rank features by statistical significance or correlation with the target variable and select those with the highest ranking. Wrapper methods evaluate different subsets of features with a machine learning algorithm and select the subset that produces the best performance. Embedded methods incorporate feature selection into the training of the machine learning algorithm itself.
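As an illustration of a filter method (a sketch in plain Python; the helper names are made up for this example): rank features by absolute correlation with the target and keep the top k.

```python
# Pearson correlation between one feature column and the target.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Filter method: keep the k features most correlated with the target.
def select_top_k(feature_columns, target, k):
    scores = [(abs(pearson(col, target)), i)
              for i, col in enumerate(feature_columns)]
    return sorted(i for _, i in sorted(scores, reverse=True)[:k])

features = [[1, 2, 3, 4],   # perfectly correlated with the target
            [5, 5, 5, 4],   # weakly correlated
            [2, 4, 6, 8]]   # also perfectly correlated
target = [10, 20, 30, 40]
print(select_top_k(features, target, 2))  # [0, 2]
```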


==Feature Engineering==
[[Feature engineering]] is the process of creating new features from existing ones in order to enhance the accuracy and usefulness of the resulting models. It involves applying domain-specific knowledge to transform or combine raw features into more informative representations.
 
Feature engineering can involve several techniques, such as [[scaling]], [[normalization]], [[binning]], [[one-hot encoding]], polynomial features, and interaction terms. The purpose is to extract the most informative signal from the data while reducing noise and redundancy within the features.
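Two of these techniques can be sketched in a few lines of Python (illustrative helper names, not a standard API):

```python
# Binning: map a numeric value to the index of the interval it falls
# into, given a sorted list of bin edges.
def bin_feature(value, edges):
    return sum(value >= edge for edge in edges)

# Interaction term: append the product of two existing features as a
# new feature.
def add_interaction(row, i, j):
    return row + [row[i] * row[j]]

print(bin_feature(37, [18, 35, 65]))      # 2 (falls between 35 and 65)
print(add_interaction([3.0, 2.0], 0, 1))  # [3.0, 2.0, 6.0]
```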
 
==Explain Like I'm 5 (ELI5)==
In machine learning, features are like clues that help the computer figure something out.
 
Imagine playing a guessing game with your friend where they must guess which animal you are thinking of. You might provide some clues such as "it has fur" or "it's really big." These features act like cues to help them identify which animal it is you have in mind.
 
Machine learning involves providing features to a computer so it can learn about something, like pictures of animals. We might give the computer cues such as "it has four legs" or "it has pointy ears." These cues help the machine recognize which animal is in a picture quickly and accurately.
 
By giving the computer numerous features, it can learn to recognize patterns in data and make predictions on its own. It's like how you can guess what animal your friend is thinking of based on clues provided.


[[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]]