See also: Decision tree, Oblique condition, Threshold for decision trees, Categorical feature
An in-set condition is a type of split condition used inside a decision tree node. It tests whether the value of a single categorical feature belongs to a specified subset of that feature's possible values. Google's Machine Learning Glossary defines it as "a condition that tests for the presence of one item in a set of items," and gives the example house-style in [tudor, colonial, cape]. During inference, if the example's house-style value is one of those three, the condition evaluates to Yes and the example follows the positive branch; otherwise it goes the other way.
In-set conditions sit alongside the other two condition families that decision-forest libraries normally recognise: axis-aligned (numerical threshold) conditions and oblique conditions. They are the standard tool for categorical splits because they encode group membership directly inside one tree node, which is far more compact and usually more accurate than splitting on a one-hot encoded version of the same feature.
The formal shape of an in-set condition is
is feature_x in S?
where feature_x is a single categorical feature and S is a subset of that feature's vocabulary. Some examples:
species in {"cat", "dog", "bird"} (from Google's decision-forests course)country in {France, Germany, Spain}userAgent in {"Mozilla/5.0", "InternetExplorer/10.0"}fruit in {apple, banana}If the example's value lies in S, the example is sent down one child; otherwise it is sent down the other. Because there are only two children, the in-set condition is binary, just like a numerical threshold split. A multiway categorical split (one branch per category) is a different construct used by C4.5 and is not an in-set condition.
Google's decision-forests documentation groups split conditions into three families. The table below summarises how they differ.
| Condition type | Example | Features used | Feature kind | Typical libraries |
|---|---|---|---|---|
| Axis-aligned (numerical threshold) | area > 200 | One | Numerical | All decision-tree libraries |
| In-set | species in {cat, dog, bird} | One | Categorical | LightGBM, XGBoost (>=1.5), TensorFlow Decision Forests, YDF, CART |
| Oblique | 5 * num_legs + 2 * num_eyes >= 10 | Multiple | Numerical | YDF, TF-DF, oblique-tree research libraries |
An axis-aligned condition uses a threshold for decision trees to compare a single numerical feature against a learned cut-point. An oblique condition draws a hyperplane that combines several numerical features at once, which can capture diagonal decision boundaries that axis-aligned trees need many splits to approximate. An in-set condition is the categorical analogue of the axis-aligned condition: one feature, two children, but the test is set membership instead of inequality.
A categorical feature has no natural ordering. "Tudor" is not less than or greater than "colonial." Standard numerical splits do not apply directly, so trees need a different mechanism. There are three common approaches, each with different consequences for tree shape and accuracy.
| Approach | What it does | Trade-off |
|---|---|---|
| One-hot encoding plus axis-aligned splits | Convert each category to a 0/1 column, then use threshold conditions | Each split isolates a single category, leading to deep, imbalanced trees, especially for high-cardinality features |
| Multiway categorical split | One child per category | Used by C4.5 and ID3; fragments the data into many small-sample children, which inflates variance |
| In-set condition (binary partition) | One child for categories in S, another for the rest | Compact, supports balanced trees, used by CART, LightGBM, TF-DF, and modern XGBoost |
Google's glossary notes that "in-set conditions usually lead to more efficient decision trees than conditions that test one-hot encoded features." The intuition: with one-hot, the tree can only ever pick one category at a time per split, so it takes many levels to express "the colour is red, blue, or green." An in-set condition expresses the same logic in a single node.
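To see the difference concretely, compare the two ways of expressing the same predicate. The one-hot column names below are invented for illustration:

```python
# One example row, carrying both the raw category and a one-hot encoding.
row = {"colour": "blue", "colour_red": 0, "colour_blue": 1, "colour_green": 0}

# In-set: a single tree node answers the question.
in_set = row["colour"] in {"red", "blue", "green"}

# One-hot: a tree needs a chain of three threshold nodes to say the same
# thing (colour_red > 0.5, then colour_blue > 0.5, then colour_green > 0.5).
one_hot = (row["colour_red"] > 0.5
           or row["colour_blue"] > 0.5
           or row["colour_green"] > 0.5)

assert in_set == one_hot
```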
Finding the best in-set condition is harder than finding the best numerical threshold. With K distinct categories, the number of non-trivial bipartitions of the value set is 2^(K-1) - 1, so exhaustive search becomes intractable around K = 30.
Luckily there is a classic shortcut. Walter D. Fisher's 1958 paper "On Grouping for Maximum Homogeneity" (Journal of the American Statistical Association) shows that for a binary classification target with a concave splitting criterion such as Gini or entropy, the optimal partition can be found by:
1. sorting the categories by the proportion of positive examples in each, then
2. evaluating only the K - 1 prefix splits of that ordering.
This reduces the search from O(2^K) to O(K log K) and gives the same answer as the brute-force search. Breiman et al. proved an analogous result for regression with squared error in the original CART book. Most modern gradient-boosting libraries use Fisher's trick or a gradient-based extension of it.
| Setting | Optimal-split complexity |
|---|---|
| Brute force | O(2^(K-1)) partitions |
| Binary classification, Gini or entropy (Fisher 1958) | O(K log K) after sorting |
| Regression, squared error | O(K log K) after sorting by mean target |
| Multiclass, generic criteria | NP-hard in general; libraries use heuristics |
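To make the procedure concrete, here is a minimal sketch of the sorted prefix search for a binary 0/1 target with Gini impurity. It is a didactic implementation under those assumptions, not code from any of the libraries discussed here, and all names are invented:

```python
# Sketch of Fisher's (1958) sorting trick: for a binary target and a
# concave criterion, only the K - 1 prefix splits of the categories
# sorted by positive rate can be optimal.
from collections import defaultdict

def best_in_set_split(categories, labels):
    """Return (best_subset, best_gini_gain) for a binary 0/1 target."""
    pos = defaultdict(int)  # positives per category
    tot = defaultdict(int)  # examples per category
    for c, y in zip(categories, labels):
        pos[c] += y
        tot[c] += 1

    def gini(p, n):  # impurity of a node with p positives out of n
        q = p / n
        return 2 * q * (1 - q)

    order = sorted(tot, key=lambda c: pos[c] / tot[c])  # by positive rate
    P, N = sum(pos.values()), sum(tot.values())
    parent = gini(P, N)

    best_subset, best_gain = None, -1.0
    left_p = left_n = 0
    for i, c in enumerate(order[:-1]):  # the K - 1 non-trivial prefixes
        left_p += pos[c]
        left_n += tot[c]
        gain = parent - (left_n / N) * gini(left_p, left_n) \
                      - ((N - left_n) / N) * gini(P - left_p, N - left_n)
        if gain > best_gain:
            best_subset, best_gain = frozenset(order[: i + 1]), gain
    return best_subset, best_gain
```

The sort dominates the running time, giving the O(K log K) bound from the table; the prefix loop itself is linear because each step updates the left-child counts incrementally.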
Not every decision-tree library implements in-set conditions, and those that do use slightly different splitting strategies. The table below covers the libraries most practitioners encounter.
| Library | Native in-set support | Splitting strategy | Notes |
|---|---|---|---|
| CART (Breiman et al., 1984) | Yes | Fisher-style sort for binary targets, exhaustive otherwise | The original native-categorical implementation |
| C4.5 (Quinlan, 1993) | Multiway, not in-set | One branch per category | Different family of categorical split |
| scikit-learn DecisionTreeClassifier / Regressor | No | Requires one-hot or ordinal encoding upstream | HistGradientBoosting* does support native categoricals |
| LightGBM | Yes, via categorical_feature | Sorts histogram by sum_gradient / sum_hessian, picks best partition; Fisher 1958 + gradient extension | O(K log K) |
| XGBoost (>=1.5) | Yes, via enable_categorical=True with tree_method='hist' or 'approx' | Sorts gradient histogram, then partitions; max_cat_to_onehot controls the cutoff | Native support added 2021, still marked experimental in some versions |
| CatBoost | Native categorical support, but uses ordered target statistics rather than in-set conditions | Encodes categoricals as numerical target stats, then uses standard threshold splits in symmetric trees | Different mechanism, similar end goal |
| TensorFlow Decision Forests and YDF | Yes, first-class | Three different categorical-split algorithms, including optimal Fisher-style | YDF documents "optimal categorical splits" as a headline feature |
The scikit-learn case is worth flagging. The classic DecisionTreeClassifier and RandomForestClassifier cannot consume categorical features directly; you have to encode them yourself. For high-cardinality features this often means one-hot encoding, which scales badly. Practitioners who care about categorical performance in scikit-learn often switch to HistGradientBoostingClassifier, which does support native categoricals, or move to LightGBM or XGBoost.
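As a rough usage sketch, the snippets below show how each library is asked to treat a column as categorical. The parameter names (categorical_feature, enable_categorical, max_cat_to_onehot, categorical_features) are the documented ones, but supported dtypes and defaults shift between releases, so verify against your installed versions; the data frame and target here are stand-ins:

```python
import pandas as pd

# Toy data: one categorical column, one numerical column.
X = pd.DataFrame({
    "device_type": pd.Categorical(["phone", "tablet", "desktop", "tv"] * 100),
    "age": list(range(400)),
})
y = (X["device_type"] == "phone").astype(int)  # stand-in target

# LightGBM: pandas 'category' columns are detected automatically;
# categorical_feature can also name them explicitly.
import lightgbm as lgb
lgb_model = lgb.LGBMClassifier().fit(X, y)

# XGBoost: needs enable_categorical plus a histogram-based tree method;
# max_cat_to_onehot sets the cardinality below which it one-hots instead.
import xgboost as xgb
xgb_model = xgb.XGBClassifier(
    tree_method="hist", enable_categorical=True, max_cat_to_onehot=5
).fit(X, y)

# scikit-learn: the classic DecisionTreeClassifier cannot do this, but
# HistGradientBoostingClassifier can. Older releases expect the categories
# ordinal-encoded as integers; newer ones can also infer them from the
# 'category' dtype via categorical_features="from_dtype".
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.preprocessing import OrdinalEncoder

X_enc = X.copy()
X_enc["device_type"] = OrdinalEncoder().fit_transform(X_enc[["device_type"]])
skl_model = HistGradientBoostingClassifier(
    categorical_features=[0]  # column index of device_type
).fit(X_enc, y)
```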
Suppose you are predicting whether a customer will click an ad and one of your features is device_type with four possible values: phone, tablet, desktop, tv. Click-through rates in your training data look like this:
| Device type | Examples | Clicks | Click rate |
|---|---|---|---|
| phone | 4000 | 800 | 20% |
| tablet | 1000 | 150 | 15% |
| desktop | 3000 | 300 | 10% |
| tv | 500 | 25 | 5% |
Using Fisher's trick, you sort the categories by click rate descending: phone, tablet, desktop, tv. The candidate in-set conditions then reduce to the three prefix splits:
| Candidate condition | Left (Yes) child | Right (No) child |
|---|---|---|
| device_type in {phone} | phone | tablet, desktop, tv |
| device_type in {phone, tablet} | phone, tablet | desktop, tv |
| device_type in {phone, tablet, desktop} | phone, tablet, desktop | tv |
You compute the Gini impurity reduction for each candidate and keep the best one. This is exactly three evaluations instead of 2^(4-1) - 1 = 7. The saving looks small here but grows fast: with K = 50 categories, 2^49 - 1 is more than 5 * 10^14, while K log K is roughly 280.
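The gains are easy to check in a few lines. This sketch recomputes the Gini impurity reduction of each prefix candidate from the counts in the table above; the code itself is illustrative:

```python
# Recompute the Gini gain of the three Fisher prefix splits for the
# device_type example. Counts are taken from the table above.
counts = {  # category: (examples, clicks)
    "phone": (4000, 800),
    "tablet": (1000, 150),
    "desktop": (3000, 300),
    "tv": (500, 25),
}

def gini(clicks, n):
    rate = clicks / n
    return 2 * rate * (1 - rate)

N = sum(n for n, _ in counts.values())  # 8500 examples
P = sum(c for _, c in counts.values())  # 1275 clicks
parent = gini(P, N)

order = ["phone", "tablet", "desktop", "tv"]  # click rate, descending
left_n = left_p = 0
for i, cat in enumerate(order[:-1]):
    n, c = counts[cat]
    left_n += n
    left_p += c
    gain = parent - (left_n / N) * gini(left_p, left_n) \
                  - ((N - left_n) / N) * gini(P - left_p, N - left_n)
    print(f"device_type in {set(order[:i + 1])}: gain = {gain:.4f}")
```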
In-set conditions tend to dominate one-hot baselines in two situations:
- high-cardinality features, where one-hot encoding produces many sparse columns and forces deep, imbalanced trees;
- splits whose informative grouping puts several categories on each side, which one in-set node expresses directly but a one-hot tree must approximate with a chain of single-category splits.
For low-cardinality features (say, K <= 4), the difference is usually small. XGBoost's max_cat_to_onehot parameter exists for exactly this reason: below the threshold it falls back to one-hot, since the simpler representation is typically as good and slightly cheaper.
In-set conditions are powerful but not free. The per-split search space is far larger than for a single numerical threshold, which makes overfitting on high-cardinality features a real risk; each node must store its category subset, which adds model size; and categories never seen during training need a designated default branch at inference time. Libraries counter the overfitting risk with categorical-specific regularizers such as LightGBM's min_data_per_group and cat_smooth.
The choice of split-condition family has a direct effect on tabular model quality. On benchmarks dominated by categorical features such as Criteo, Avazu, or KDD Cup data, libraries with native in-set support consistently outperform pipelines that one-hot encode then run a categorical-blind tree learner. That is one reason LightGBM, XGBoost, and CatBoost have largely displaced scikit-learn's classic tree implementations in industry tabular workflows.
In-set conditions also matter for interpretability work. A condition like country in {US, CA, MX} is more readable than "country_US == 1 OR country_CA == 1 OR country_MX == 1" reconstructed across three tree levels, even if the two encode the same logic.