Feature

Introduction

A feature is an input variable to a machine learning model. An example consists of one or more features.

In machine learning, features are quantifiable properties or characteristics of a data point that are used to build predictive models. Also referred to as predictors or independent variables, they are selected based on their capacity to explain variation in the dependent variable, that is, the target variable that the model seeks to predict, also known as the label.

Features are integral to machine learning algorithms because their quality influences the accuracy and effectiveness of the resulting models. Given relevant features and a suitable model, a learning algorithm can pick up patterns and relationships in the data and use them to make predictions on new, unseen data.

Example

Two examples, each with three features and a label
Features			Label
Location	Bedrooms	Bathrooms	Price
Midtown	3	2	$800,000
Uptown	2	2	$500,000
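
As an illustration, the two examples in the table can be written down as data, with the features separated from the label. This is only a sketch: the key names (location, bedrooms, bathrooms, price) are hypothetical spellings of the table's column headers, not a required format.

```python
# The two examples from the table; key names are illustrative assumptions.
examples = [
    {"location": "Midtown", "bedrooms": 3, "bathrooms": 2, "price": 800_000},
    {"location": "Uptown", "bedrooms": 2, "bathrooms": 2, "price": 500_000},
]

label_name = "price"

# Features are every column except the label; the label is what a model predicts.
features = [{k: v for k, v in ex.items() if k != label_name} for ex in examples]
labels = [ex[label_name] for ex in examples]
```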

Types of Features

Features can be broadly divided into three types: numerical, categorical and text. Each requires a distinct approach to preprocessing and feature engineering.

Numerical Features

Numerical features are variables that take on numerical values, such as age, height, weight or temperature. They may be continuous or discrete: a continuous feature can take any value within a range, while a discrete feature can take only specific values, such as whole-number counts.

Numerical features are often standardized or normalized to a common scale, which can improve the accuracy of the resulting models, particularly for algorithms that are sensitive to feature magnitudes. Standardization involves subtracting the mean and dividing by the standard deviation, so that the feature has zero mean and unit variance. Normalization, in the min-max sense used here, rescales values to lie between 0 and 1.
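
A minimal sketch of both transformations, assuming NumPy is available. The age values are made up for illustration, and the function names are hypothetical. The sketch uses the population standard deviation (NumPy's default) and does not guard against a constant feature, which would cause a division by zero.

```python
import numpy as np

# Hypothetical ages for five data points (illustrative values only).
ages = np.array([20.0, 30.0, 40.0, 50.0, 60.0])

def standardize(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    return (x - x.mean()) / x.std()

def min_max_normalize(x):
    """Rescale linearly so the minimum maps to 0 and the maximum maps to 1."""
    return (x - x.min()) / (x.max() - x.min())

standardized = standardize(ages)      # mean 0, standard deviation 1
normalized = min_max_normalize(ages)  # 0.0, 0.25, 0.5, 0.75, 1.0
```

In practice the mean, standard deviation, minimum and maximum are usually computed on the training data only and then reused to transform new data.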

Categorical Features

Categorical features are variables that take on one of a limited set of values or categories, such as gender, color or occupation. They are usually represented as strings or integers and can be one-hot encoded, meaning that each category is represented as a binary variable.

One-hot encoding involves creating a new binary variable for each category, setting it to 1 if the data point belongs to that category and 0 otherwise. This enables machine learning algorithms to treat each category as its own feature, potentially improving model accuracy.
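
The encoding described above can be sketched in plain Python. The function name one_hot_encode and the color values are hypothetical, and ordering the categories alphabetically is just one possible convention.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    # Each row has a 1 in the column of the value's category and 0 elsewhere.
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```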

Text Features

Text features are variables containing natural language text, such as product reviews, customer feedback, news articles and more. They require a different approach to feature engineering due to their often unstructured nature and high noise content.

Text features are typically preprocessed and transformed into numerical representations such as a bag-of-words matrix or a TF-IDF matrix. A bag-of-words matrix represents each document as a vector of word counts, while a TF-IDF matrix weights each term's frequency in a document by how rare the term is across the corpus, so that words appearing in nearly every document contribute little.
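
The following sketch builds both matrices for a tiny made-up corpus using only the standard library. It assumes whitespace tokenization and the basic weighting count * log(N / document frequency); libraries commonly apply smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical three-document corpus (illustrative text only).
documents = ["great phone great battery", "poor battery", "great screen"]

# Assumption: split on whitespace; real pipelines usually do more cleaning.
tokenized = [doc.split() for doc in documents]
vocabulary = sorted({word for doc in tokenized for word in doc})

# Bag of words: one row per document, one column per vocabulary word.
bag_of_words = [[Counter(doc)[word] for word in vocabulary] for doc in tokenized]

# Inverse document frequency: rarer words across the corpus get larger weights.
n_docs = len(documents)
doc_freq = {w: sum(1 for doc in tokenized if w in doc) for w in vocabulary}
idf = {w: math.log(n_docs / doc_freq[w]) for w in vocabulary}

# TF-IDF: raw term count scaled by the word's inverse document frequency.
tf_idf = [[count * idf[word] for count, word in zip(row, vocabulary)]
          for row in bag_of_words]
```

Here the vocabulary is battery, great, phone, poor, screen, so the first document's bag-of-words row is [1, 2, 1, 0, 0].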

Feature Selection

Feature selection is the process of identifying and selecting the most relevant features for a machine learning problem. Its aim is to reduce the dimensionality of the data while retaining the features that are informative about the target variable.

There are three primary approaches to feature selection: filter methods, wrapper methods and embedded methods. Filter methods rank features by a statistical measure, such as significance or correlation with the target variable, and select those with the highest rank. Wrapper methods evaluate different subsets of features using a machine learning algorithm and select the subset that produces the best performance. Embedded methods incorporate feature selection into the training of the machine learning algorithm itself.
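
As a sketch of a filter method, the code below ranks features by the absolute value of their Pearson correlation with the target and keeps the top two. It assumes NumPy; the data, the feature names and the function rank_features_by_correlation are all hypothetical, and correlation is only one of several possible ranking criteria.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Rank feature columns of X by |Pearson correlation| with the target y."""
    scores = {}
    for j, name in enumerate(names):
        scores[name] = abs(np.corrcoef(X[:, j], y)[0, 1])
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Hypothetical data: five houses, three candidate features.
names = ["bedrooms", "bathrooms", "noise"]
X = np.array([
    [1.0, 1.0, 7.0],
    [2.0, 1.0, 3.0],
    [3.0, 2.0, 9.0],
    [4.0, 2.0, 1.0],
    [5.0, 3.0, 5.0],
])
y = np.array([200.0, 300.0, 400.0, 500.0, 600.0])  # target, e.g. price in $1000s

ranking, scores = rank_features_by_correlation(X, y, names)
top_two = ranking[:2]  # keep the two highest-ranked features
```

Because each feature is scored on its own, a filter like this is cheap to compute but cannot detect features that are only useful in combination, which is the case wrapper methods are designed to handle.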

Feature Engineering

Feature engineering is the process of creating new features from existing ones in order to improve the accuracy and usefulness of the resulting models. It involves applying domain-specific knowledge to transform or combine raw features into more informative representations.

Feature engineering can involve several techniques, such as scaling, normalization, binning, one-hot encoding, polynomial features and interaction terms. The purpose is to extract the most informative signal from the data while reducing noise and redundancy within the features.
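
Three of these techniques can be sketched with NumPy. The values, the variable names and the bin edges (18 and 65) are illustrative assumptions rather than recommended choices.

```python
import numpy as np

# Hypothetical raw features for three houses.
bedrooms = np.array([3.0, 2.0, 4.0])
bathrooms = np.array([2.0, 2.0, 1.0])

# Interaction term: the product of two features.
bedrooms_x_bathrooms = bedrooms * bathrooms  # 6, 4, 4

# Polynomial feature: a power of a single feature.
bedrooms_squared = bedrooms ** 2  # 9, 4, 16

# Binning: map a numerical feature to discrete ranges.
# Assumed edges: bin 0 is below 18, bin 1 is 18 up to 64, bin 2 is 65 and above.
ages = np.array([5, 17, 42, 70])
bin_edges = np.array([18, 65])
age_bins = np.digitize(ages, bin_edges)  # 0, 0, 1, 2
```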