{{see also|Machine learning terms}}
==Introduction==
[[Binary classification]] is a type of [[machine learning]] problem where the goal is to classify [[input]] [[data]] into one of two [[classes]] or categories. The two classes are mutually exclusive and are commonly labeled positive (1) and negative (0), or true (1) and false (0).
==What is Binary Classification?==
In binary classification, a machine learning algorithm learns to classify input data into one of two classes based on [[label]]ed [[training data]]. Given a new input, the trained algorithm predicts which of the two classes it belongs to.
Typical examples include spam versus not spam, fraud versus not fraud, and disease versus no disease. The goal is to classify new input data into the correct class using the patterns learned from the training data.
Binary classification is a [[supervised learning]] task, meaning the [[algorithm]] is trained using labeled data in which each data point has been assigned a label indicating its class membership.
==How is Binary Classification Accomplished?==
Binary classification is usually accomplished by training a machine learning algorithm on labeled training data. Many such algorithms output a score or a probability for the positive class rather than a label directly, and a decision threshold (often 0.5 for probabilities) converts that output into one of the two classes.
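As an illustration of training on labeled data and thresholding the output, the following is a minimal sketch of [[logistic regression]] fitted by plain batch gradient descent on the log loss. The function names, the learning rate, the epoch count and the toy data are all hypothetical choices for this example; production libraries use more robust optimizers and regularization.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Estimated probability that x belongs to the positive class (1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict(w, b, x, threshold=0.5):
    """Turn the probability into a hard 0/1 class label using a threshold."""
    return 1 if predict_proba(w, b, x) >= threshold else 0

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log loss.

    X is a list of feature vectors and y a list of 0/1 labels
    (hypothetical helper for illustration only).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy labeled data: one feature whose sign separates the two classes.
X_train = [[-3.0], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [3.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(X_train, y_train)
print(predict(w, b, [-2.5]), predict(w, b, [2.5]))  # prints: 0 1
```

Lowering the threshold below 0.5 labels more inputs as positive, which typically trades precision for recall.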
The algorithm's performance is evaluated on a separate [[test set]] that was not used for training. Because the test data is also labeled, performance is judged by comparing the algorithm's predicted labels with the true labels.
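The held-out evaluation described above can be sketched as: shuffle the labeled examples, reserve a fraction as the test set, train only on the remainder, and compare predictions with the true test labels. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the 25% test fraction is arbitrary, and the model is a deliberately simple nearest-class-mean classifier swapped in so the example stays self-contained.

```python
import random

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle the labeled examples and hold out a fraction of them as a test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def fit_nearest_mean(X, y):
    """Stand-in model: store the mean feature vector of each class (0 and 1).

    Assumes both classes appear in the training split.
    """
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def predict_nearest_mean(means, x):
    """Predict the class whose mean is closest to x (squared Euclidean distance)."""
    def sq_dist(a, c):
        return sum((ai - ci) ** 2 for ai, ci in zip(a, c))
    return min((0, 1), key=lambda label: sq_dist(x, means[label]))

def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labeled data: class 0 has small values, class 1 has large values.
X = [[v] for v in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5)]
y = [0] * 6 + [1] * 6

X_tr, y_tr, X_te, y_te = split_train_test(X, y)
model = fit_nearest_mean(X_tr, y_tr)  # trained on the training split only
y_pred = [predict_nearest_mean(model, x) for x in X_te]
print(f"test accuracy: {accuracy(y_te, y_pred)}")  # prints: test accuracy: 1.0
```

Keeping the test examples out of training is what makes the score an estimate of performance on unseen data rather than a measure of memorization.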
Algorithms commonly used for binary classification include [[logistic regression]], [[decision tree]]s, [[random forest]]s and [[support vector machine]]s (SVMs). The best choice depends on the problem being solved and the characteristics of the data.
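To make one of the listed algorithm families concrete, the smallest possible decision tree is a one-level tree, often called a decision stump: it tests a single feature against a single threshold. The sketch below fits one by brute force, trying every feature and every midpoint threshold and keeping the split with the fewest training errors. The function names, the "polarity" encoding and the toy data are hypothetical; full decision-tree learners grow deeper trees and usually choose splits by impurity measures rather than raw error counts.

```python
def predict_stump(stump, x):
    """Apply a stump (feature index, threshold, polarity) to one feature vector.

    polarity 1 predicts class 1 when x[feature] > threshold;
    polarity 0 predicts class 1 when x[feature] <= threshold.
    """
    j, thr, polarity = stump
    return 1 if (x[j] > thr) == bool(polarity) else 0

def fit_stump(X, y):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best, best_errors = None, len(y) + 1
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        # Candidate thresholds: one below the minimum (constant prediction)
        # plus midpoints between consecutive distinct values.
        candidates = [values[0] - 1.0] + [(a + c) / 2 for a, c in zip(values, values[1:])]
        for thr in candidates:
            for polarity in (0, 1):
                stump = (j, thr, polarity)
                errors = sum(predict_stump(stump, x) != t for x, t in zip(X, y))
                if errors < best_errors:
                    best, best_errors = stump, errors
    return best

# Toy data: feature 0 is noisy, feature 1 cleanly separates the classes.
X = [[5.0, 1.0], [3.0, 1.5], [4.0, 2.0], [1.0, 6.0], [6.0, 6.5], [2.0, 7.0]]
y = [0, 0, 0, 1, 1, 1]

stump = fit_stump(X, y)
print(stump)  # prints: (1, 4.0, 1)
```

Ensembles such as random forests combine many such trees, which is one reason the choice of algorithm depends on the data.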
==Evaluation Metrics for Binary Classification==
The performance of a binary classification model is evaluated using various [[metric]]s such as [[accuracy]], [[precision]], [[recall]] and [[F1 score]].
Accuracy is the proportion of correct predictions the model makes on a set of test data. Precision is the proportion of the model's positive predictions that are actually positive (true positives divided by all predicted positives). Recall is the proportion of actual positive samples in the test data that the model correctly predicts as positive (true positives divided by all actual positives). The F1 score is the harmonic mean of precision and recall, F1 = 2 × precision × recall / (precision + recall), so it is high only when both precision and recall are high.
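These definitions translate directly into counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The sketch below assumes 0/1 labels with 1 as the positive class; the function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is one convention among several (other tools may handle that case differently).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    # Harmonic mean of precision and recall (not the arithmetic mean).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Ten test samples: four actual positives, six actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here TP = 2, FP = 1, FN = 2 and TN = 5, giving accuracy 0.7, precision 2/3, recall 0.5 and F1 = 4/7 (about 0.571), slightly below the arithmetic mean of precision and recall (about 0.583).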