In-set condition

An in-set condition is a type of split condition used inside a decision tree node. It tests whether the value of a single categorical feature belongs to a specified subset of that feature's possible values. Google's Machine Learning Glossary defines it as "a condition that tests for the presence of one item in a set of items," and gives the example house-style in [tudor, colonial, cape]. During inference, if the example's house-style value is one of those three, the condition evaluates to Yes and the example follows the positive branch; otherwise it goes the other way.

In-set conditions sit alongside the other two condition families that decision-forest libraries normally recognise: axis-aligned (numerical threshold) conditions and oblique conditions. They are the standard tool for categorical splits because they encode group membership directly inside one tree node, which is far more compact and usually more accurate than splitting on a one-hot encoded version of the same feature.

What the condition actually looks like

The formal shape of an in-set condition is

is feature_x in S?

where feature_x is a single categorical feature and S is a subset of that feature's vocabulary. Some examples:

species in {"cat", "dog", "bird"} (from Google's decision-forests course)
country in {France, Germany, Spain}
userAgent in {"Mozilla/5.0", "InternetExplorer/10.0"}
fruit in {apple, banana}

If the example's value lies in S, the example is sent down one child; otherwise it is sent down the other. Because there are only two children, the in-set condition is binary, just like a numerical threshold split. A multiway categorical split (one branch per category) is a different construct used by C4.5 and is not an in-set condition.

How in-set conditions compare to other condition types

Google's decision-forests documentation groups split conditions into three families. The table below summarises how they differ.

Condition type	Example	Features used	Feature kind	Typical libraries
Axis-aligned (numerical threshold)	`area > 200`	One	Numerical	All decision-tree libraries
In-set	`species in {cat, dog, bird}`	One	Categorical	LightGBM, XGBoost (>=1.5), TensorFlow Decision Forests, YDF, CART
Oblique	`5 * num_legs + 2 * num_eyes >= 10`	Multiple	Numerical	YDF, TF-DF, oblique-tree research libraries

An axis-aligned condition uses a threshold for decision trees to compare a single numerical feature against a learned cut-point. An oblique condition draws a hyperplane that combines several numerical features at once, which can capture diagonal decision boundaries that axis-aligned trees need many splits to approximate. An in-set condition is the categorical analogue of the axis-aligned condition: one feature, two children, but the test is set membership instead of inequality.

Why categorical features need a special condition type

A categorical feature has no natural ordering. "Tudor" is not less than or greater than "colonial." Standard numerical splits do not apply directly, so trees need a different mechanism. There are three common approaches, each with different consequences for tree shape and accuracy.

Approach	What it does	Trade-off
One-hot encoding plus axis-aligned splits	Convert each category to a 0/1 column, then use threshold conditions	Each split isolates a single category, leading to deep, imbalanced trees, especially for high-cardinality features
Multiway categorical split	One child per category	Used by C4.5 and ID3; leaves with very few samples, hurts variance
In-set condition (binary partition)	One child for categories in `S`, another for the rest	Compact, supports balanced trees, used by CART, LightGBM, TF-DF, and modern XGBoost

Google's glossary notes that "in-set conditions usually lead to more efficient decision trees than conditions that test one-hot encoded features." The intuition: with one-hot, the tree can only ever pick one category at a time per split, so it takes many levels to express "the colour is red, blue, or green." An in-set condition expresses the same logic in a single node.

Computational complexity

Finding the best in-set condition is harder than finding the best numerical threshold. With K distinct categories, the number of non-trivial bipartitions of the value set is 2^(K-1) - 1. Searching this space exhaustively becomes intractable around K = 30 or so.

Luckily there is a classic shortcut. Walter D. Fisher's 1958 paper "On Grouping for Maximum Homogeneity" (Journal of the American Statistical Association) shows that for a binary classification target with a concave splitting criterion such as Gini or entropy, the optimal partition can be found by:

Sorting the categories by the proportion of positive labels they each carry.
Treating the sorted categories as if they were ordered numerical values.
Picking the best prefix split.

This reduces the search from O(2^K) to O(K log K) and gives the same answer as the brute-force search. Breiman et al. proved an analogous result for regression with squared error in the original CART book. Most modern gradient-boosting libraries use Fisher's trick or a gradient-based extension of it.

Setting	Optimal-split complexity
Brute force	`O(2^(K-1))` partitions
Binary classification, Gini or entropy (Fisher 1958)	`O(K log K)` after sorting
Regression, squared error	`O(K log K)` after sorting by mean target
Multiclass, generic criteria	NP-hard in general; libraries use heuristics

Library support

Not every decision-tree library implements in-set conditions, and those that do use slightly different splitting strategies. The table below covers the libraries most practitioners encounter.

Library	Native in-set support	Splitting strategy	Notes
CART (Breiman et al., 1984)	Yes	Fisher-style sort for binary targets, exhaustive otherwise	The original native-categorical implementation
C4.5 (Quinlan, 1993)	Multiway, not in-set	One branch per category	Different family of categorical split
scikit-learn `DecisionTreeClassifier` / `Regressor`	No	Requires one-hot or ordinal encoding upstream	`HistGradientBoosting*` does support native categoricals
LightGBM	Yes, via `categorical_feature`	Sorts histogram by `sum_gradient / sum_hessian`, picks best partition; Fisher 1958 + gradient extension	`O(K log K)`
XGBoost (>=1.5)	Yes, via `enable_categorical=True` with `tree_method='hist'` or `'approx'`	Sorts gradient histogram, then partitions; `max_cat_to_onehot` controls the cutoff	Native support added 2021, still marked experimental in some versions
CatBoost	Native categorical support, but uses ordered target statistics rather than in-set conditions	Encodes categoricals as numerical target stats, then uses standard threshold splits in symmetric trees	Different mechanism, similar end goal
TensorFlow Decision Forests and YDF	Yes, first-class	Three different categorical-split algorithms, including optimal Fisher-style	YDF documents "optimal categorical splits" as a headline feature

The scikit-learn case is worth flagging. The classic DecisionTreeClassifier and RandomForestClassifier cannot consume categorical features directly; you have to encode them yourself. For high-cardinality features this often means one-hot encoding, which scales badly. Practitioners who care about categorical performance in scikit-learn often switch to HistGradientBoostingClassifier, which does support native categoricals, or move to LightGBM or XGBoost.

A worked example

Suppose you are predicting whether a customer will click an ad and one of your features is device_type with four possible values: phone, tablet, desktop, tv. Click-through rates in your training data look like this:

Device type	Examples	Clicks	Click rate
phone	4000	800	20%
tablet	1000	150	15%
desktop	3000	300	10%
tv	500	25	5%

Using Fisher's trick, you sort the categories by click rate descending: phone, tablet, desktop, tv. The candidate in-set conditions then reduce to the three prefix splits:

Candidate condition	Left (Yes) child	Right (No) child
`device_type in {phone}`	phone	tablet, desktop, tv
`device_type in {phone, tablet}`	phone, tablet	desktop, tv
`device_type in {phone, tablet, desktop}`	phone, tablet, desktop	tv

You compute the Gini impurity reduction for each candidate and keep the best one. This is exactly three evaluations instead of 2^(4-1) - 1 = 7. The saving looks small here but grows fast: with K = 50 categories, 2^49 - 1 is more than 5 * 10^14, while K log K is roughly 280.

When in-set conditions help most

In-set conditions tend to dominate one-hot baselines in two situations:

High-cardinality features. Things like ZIP code, product SKU, user ID, or city name can have thousands of distinct values. One-hot encoding produces thousands of nearly-empty columns and forces the tree to split them off one at a time, which wastes capacity and inflates depth. A single in-set condition can carve the value set into two meaningful groups in one node.
Imbalanced category frequencies. If most examples land in a handful of categories and the rest are rare, axis-aligned splits on one-hot columns push the tree toward extreme imbalance, since splitting off a rare category leaves a tiny child. In-set conditions can group rare categories together against frequent ones in a single test.

For low-cardinality features (say, K <= 4), the difference is usually small. XGBoost's max_cat_to_onehot parameter exists for exactly this reason: below the threshold it falls back to one-hot, since the simpler representation is typically as good and slightly cheaper.

Limitations and caveats

In-set conditions are powerful but not free.

For some criteria (notably multiclass log-loss without simplifying assumptions), finding the optimal in-set partition is NP-hard, so libraries use greedy or gradient-based heuristics. The result is not always the global optimum.
Native categorical handling can overfit on rare categories. Practitioners often set a minimum number of examples per category, or merge rare values into an "other" bucket, to keep splits stable.
Models that use in-set conditions are slightly less portable. A serialised tree carries the categorical vocabularies it was trained with, so feeding the model an unseen category at inference time has to be handled explicitly. Both LightGBM and YDF treat unknown categories as out-of-dictionary and route them down a default branch.
Some downstream tooling assumes purely numerical splits, which can complicate exporting an in-set tree to formats like ONNX or PMML.

Why this matters

The choice of split-condition family has a direct effect on tabular model quality. On benchmarks dominated by categorical features such as Criteo, Avazu, or KDD Cup data, libraries with native in-set support consistently outperform pipelines that one-hot encode then run a categorical-blind tree learner. That is one reason LightGBM, XGBoost, and CatBoost have largely displaced scikit-learn's classic tree implementations in industry tabular workflows.

In-set conditions also matter for interpretability work. A condition like country in {US, CA, MX} is more readable than "country_US == 1 OR country_CA == 1 OR country_MX == 1" reconstructed across three tree levels, even if the two encode the same logic.

References

Google for Developers. "Machine Learning Glossary: Decision Forests, In-set condition." https://developers.google.com/machine-learning/glossary/df
Google for Developers. "Types of conditions." Decision Forests course. https://developers.google.com/machine-learning/decision-forests/conditions
Google for Developers. "Growing decision trees." Decision Forests course. https://developers.google.com/machine-learning/decision-forests/growing
Fisher, W. D. (1958). "On Grouping for Maximum Homogeneity." Journal of the American Statistical Association, 53(284), 789-798.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). "Classification and Regression Trees." Wadsworth.
Quinlan, J. R. (1993). "C4.5: Programs for Machine Learning." Morgan Kaufmann.
LightGBM documentation. "Optimal Split for Categorical Features." https://lightgbm.readthedocs.io/en/latest/Features.html
LightGBM documentation. "Advanced Topics: Categorical Feature Support." https://lightgbm.readthedocs.io/en/latest/Advanced-Topics.html
XGBoost documentation. "Categorical Data." https://xgboost.readthedocs.io/en/stable/tutorials/categorical.html
CatBoost documentation. "Transforming categorical features to numerical features." https://catboost.ai/docs/en/concepts/algorithm-main-stages_cat-to-numberic
TensorFlow Decision Forests and YDF documentation. https://ydf.readthedocs.io/
scikit-learn documentation. "Categorical Feature Support in Gradient Boosting." https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_categorical.html

In-set condition

What the condition actually looks like

How in-set conditions compare to other condition types

Why categorical features need a special condition type

Computational complexity

Library support

A worked example

When in-set conditions help most

Limitations and caveats

Why this matters

See also

References

Improve this article

What the condition actually looks like

How in-set conditions compare to other condition types

Why categorical features need a special condition type

Computational complexity

Library support

A worked example

When in-set conditions help most

Limitations and caveats

Why this matters

See also

References

What the condition actually looks like

How in-set conditions compare to other condition types

Why categorical features need a special condition type

Computational complexity

Library support

A worked example

When in-set conditions help most

Limitations and caveats

Why this matters

See also

References

Improve this article

Related Articles

Axis-aligned condition

Binary condition

Leaf

Oblique condition

Threshold (for decision trees)

Gradient boosted (decision) trees (GBT)

What the condition actually looks like

How in-set conditions compare to other condition types

Why categorical features need a special condition type

Computational complexity

Library support

A worked example

When in-set conditions help most

Limitations and caveats

Why this matters

See also

References

Related Articles

Axis-aligned condition

Binary condition

Leaf

Oblique condition

Threshold (for decision trees)

Gradient boosted (decision) trees (GBT)