Uplift modeling (also called incremental modeling, true lift modeling, or net modeling) is a set of machine learning and statistical techniques that predict the incremental impact of a treatment or action on an individual's outcome. Rather than asking "Will this customer buy?" (the question addressed by a standard classification model), uplift modeling asks "Will this customer buy because of our intervention?" This reframing moves the problem from pure prediction into the domain of causal inference, making uplift modeling one of the most practically important bridges between predictive analytics and causal reasoning.
The quantity that uplift models estimate goes by several names in the academic literature: individual treatment effect (ITE), heterogeneous treatment effect (HTE), and conditional average treatment effect (CATE). All of these refer to the same core idea: measuring how much a treatment changes an outcome for a specific individual or subgroup, conditional on their observed characteristics.
Imagine you run a lemonade stand and you have coupons to give out. Some kids will buy lemonade whether or not they get a coupon. Other kids will never buy lemonade no matter what. A few kids will only buy lemonade if they get a coupon. And there might even be kids who get annoyed by the coupon and walk away.
Uplift modeling is like a magic sorting hat that tells you which kids are which. You want to give coupons only to the kids who would buy because of the coupon, and skip everyone else. That way you do not waste coupons on kids who would have bought anyway, and you do not bother the kids who get annoyed.
In grown-up terms, companies send promotions, doctors prescribe treatments, and politicians run ad campaigns. Uplift modeling helps them figure out who will actually change their behavior because of the action, so resources go where they matter most.
Uplift modeling is built on the potential outcomes framework (also called the Rubin Causal Model, after Donald Rubin). For each individual i, there are two potential outcomes: Y_i(1), the outcome the individual would experience if treated, and Y_i(0), the outcome if not treated.
The individual treatment effect is defined as:
ITE_i = Y_i(1) - Y_i(0)
The fundamental problem is that we can only ever observe one of these two outcomes for any given individual. A person either received the marketing email or they did not; we cannot rewind time and observe both realities. This impossibility of observing both potential outcomes simultaneously is sometimes called the "fundamental problem of causal inference," a term attributed to Paul Holland (1986).
Because the ITE is never directly observable, uplift modeling relies on estimating the conditional average treatment effect (CATE), defined as:
CATE(x) = E[Y(1) - Y(0) | X = x]
where X represents the vector of observed covariates for an individual. The CATE captures how the average treatment effect varies across subgroups defined by different covariate values.
For CATE estimates to have a causal interpretation, several assumptions must hold:
| Assumption | Also called | What it requires |
|---|---|---|
| Stable Unit Treatment Value Assumption (SUTVA) | Consistency + no interference | Each individual's outcome depends only on their own treatment assignment, and there is a single well-defined version of each treatment level |
| Conditional ignorability | Unconfoundedness, selection on observables | Given the observed covariates X, treatment assignment is independent of potential outcomes: (Y(0), Y(1)) is independent of W given X |
| Positivity | Overlap, common support | Every individual has a strictly positive probability of receiving each treatment level: 0 < P(W=1 given X=x) < 1 for all x |
In randomized experiments and A/B tests, these assumptions hold by design (though SUTVA must still be verified). In observational studies, the ignorability and positivity assumptions must be justified on substantive grounds, and violations can lead to biased treatment effect estimates.
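The positivity assumption can be probed empirically by estimating propensity scores and checking how many units fall near 0 or 1. Below is a minimal sketch on synthetic data, using a hand-rolled Newton-method logistic fit so the example stays dependency-free; the 0.05/0.95 cutoffs are illustrative, not a standard.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Hypothetical observational assignment: treatment depends on the first covariate.
p_true = 1 / (1 + np.exp(-X[:, 0]))
W = rng.binomial(1, p_true)

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton's method (intercept included)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ beta))
        grad = Xb.T @ (y - p)
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

beta = fit_logistic(X, W)
e_hat = 1 / (1 + np.exp(-np.column_stack([np.ones(len(X)), X]) @ beta))

# Flag units whose estimated propensity is too extreme for reliable weighting.
violations = ((e_hat < 0.05) | (e_hat > 0.95)).mean()
print(f"fraction outside [0.05, 0.95]: {violations:.3f}")
```

If a large fraction of units sits outside the chosen band, overlap is questionable and CATE estimates in those regions rest on extrapolation.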
A framework introduced by Nicholas Radcliffe and Patrick Surry (1999) divides the population into four segments based on how they respond to treatment. This segmentation is central to understanding why uplift modeling is valuable.
| Segment | Also called | Behavior without treatment | Behavior with treatment | Treatment effect |
|---|---|---|---|---|
| Persuadables | Movable customers | Would not convert | Convert after treatment | Positive uplift |
| Sure Things | Certain responders | Would convert anyway | Still convert | Zero uplift |
| Lost Causes | Never responders | Would not convert | Still do not convert | Zero uplift |
| Sleeping Dogs | Do-not-disturbs | Would convert on their own | Stop converting when treated | Negative uplift |
The entire goal of uplift modeling is to identify the Persuadables and target only them. Sure Things waste marketing budget since they would have converted regardless. Lost Causes also waste budget. Sleeping Dogs are the most dangerous group: targeting them actually decreases conversions. Traditional response models cannot distinguish Persuadables from Sure Things because both groups show high predicted response rates.
The existence of Sleeping Dogs is well documented in retention marketing. For example, a customer who receives an unexpected retention offer may interpret it as a signal that the company expects them to leave, which paradoxically triggers churn.
Uplift modeling methods can be grouped into several families: meta-learner approaches, direct uplift modeling (tree-based methods), transformed outcome methods, and neural network-based methods. Each family has distinct strengths and trade-offs.
Meta-learners are strategies that decompose the uplift estimation problem into one or more standard supervised learning tasks. They are called "meta" learners because they can wrap around any base machine learning algorithm (such as random forest, gradient boosting, or neural networks). The most widely used meta-learners were formalized by Kunzel, Sekhon, Bickel, and Yu in their influential 2019 PNAS paper.
| Meta-learner | Models fitted | Key idea | Best when | Main weakness |
|---|---|---|---|---|
| S-Learner | 1 | Include treatment indicator as a regular feature | Treatment effect is strong; quick baseline | Regularization biases treatment effect toward zero |
| T-Learner | 2 | Fit separate models for treated and control groups | Treatment and control groups are roughly equal in size | Struggles with imbalanced treatment/control splits |
| X-Learner | 2 + 2 + propensity | Two-stage imputation with propensity score weighting | One group is much larger than the other | More complex; requires propensity score estimation |
| R-Learner | Nuisance + CATE | Residualize outcome and treatment, then regress residuals | Observational data with confounding | Sensitive to quality of nuisance parameter estimates |
| DR-Learner | Nuisance + CATE | Regress doubly robust pseudo-outcomes on covariates | Robustness to model misspecification is needed | Requires accurate estimation of at least one nuisance function |
The simplest approach fits a single model on the entire dataset, including the treatment indicator W as just another input feature alongside the covariates X. The estimated CATE is the difference in the model's predictions when the treatment variable is set to 1 versus 0:
CATE_hat(x) = mu_hat(x, W=1) - mu_hat(x, W=0)
where mu_hat is the fitted model for E[Y | X, W].
The main drawback is that regularization in the base learner (such as L1 or L2 penalties) may shrink the treatment variable's coefficient toward zero, effectively underestimating the true effect. If the treatment signal is weak relative to the covariates, the model may ignore the treatment variable entirely. Despite this limitation, the S-learner serves as a useful and fast baseline. Victor Lo's 2002 "True Lift Model" paper described an early version of this approach.
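The S-learner idea can be sketched in a few lines. This toy example uses plain least squares as the base learner on invented synthetic data; including X-by-W interaction columns lets even a linear base model express heterogeneous effects.

```python
import numpy as np

# Synthetic data (all numbers illustrative): randomized treatment W,
# heterogeneous effect tau that depends on the first covariate.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]
Y = X @ np.array([0.5, -0.3]) + W * tau + rng.normal(scale=0.5, size=n)

# S-learner design: one model over (1, X, W, X*W), with the treatment
# indicator entering as just another feature.
def design(X, W):
    return np.column_stack([np.ones(len(X)), X, W, X * W[:, None]])

beta, *_ = np.linalg.lstsq(design(X, W), Y, rcond=None)

# CATE_hat(x) = mu_hat(x, W=1) - mu_hat(x, W=0)
cate_hat = design(X, np.ones(n)) @ beta - design(X, np.zeros(n)) @ beta
print("corr(cate_hat, tau):", round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```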
This approach trains two completely separate models: one on treated observations (mu_1) and one on control observations (mu_0). The CATE estimate is the difference between the two predictions:
CATE_hat(x) = mu_1_hat(x) - mu_0_hat(x)
Because each model only sees its own group, T-learners avoid the regularization bias of S-learners. However, when one group is much smaller than the other, that group's model may overfit or suffer from high variance. Another weakness is that each model is optimized independently for prediction accuracy, not for the difference between them, so prediction errors in the two models can compound rather than cancel.
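A T-learner sketch on the same kind of invented synthetic data, again with ordinary least squares standing in for an arbitrary base learner:

```python
import numpy as np

# Synthetic randomized data; tau varies with the second covariate (illustrative).
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.5, size=n)
tau = 0.5 * X[:, 1]
Y = X[:, 0] + W * tau + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict(X, beta):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Fit mu_1 on treated observations only and mu_0 on controls only.
beta1 = fit_ols(X[W == 1], Y[W == 1])
beta0 = fit_ols(X[W == 0], Y[W == 0])

# CATE_hat(x) = mu_1_hat(x) - mu_0_hat(x)
cate_hat = predict(X, beta1) - predict(X, beta0)
print("corr(cate_hat, tau):", round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```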
Proposed by Kunzel et al. (2019), the X-learner addresses the T-learner's weakness with imbalanced groups through a staged procedure:

1. Fit outcome models mu_0_hat and mu_1_hat on the control and treated observations separately, as in the T-learner.
2. Impute individual treatment effects: D_i = Y_i - mu_0_hat(X_i) for treated units and D_i = mu_1_hat(X_i) - Y_i for control units, then fit a CATE model tau_1_hat to the treated imputations and tau_0_hat to the control imputations.
3. Combine the two CATE models with a propensity-score weight: CATE_hat(x) = e_hat(x) * tau_0_hat(x) + (1 - e_hat(x)) * tau_1_hat(x).
The X-learner is provably efficient when one treatment group is much larger than the other, because it leverages the larger group's model to impute counterfactuals for the smaller group. It can also adapt to structural properties of the CATE function, such as sparsity or approximate linearity.
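The staged procedure can be sketched as follows on deliberately imbalanced synthetic data (roughly 10% treated); least squares stands in for arbitrary base learners, and because assignment is randomized the propensity weight is just the treated fraction.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p_treat = 3000, 0.1               # deliberately imbalanced arms (illustrative)
X = rng.normal(size=(n, 2))
W = rng.binomial(1, p_treat, size=n)
tau = 1.0 + X[:, 0]
Y = X @ np.array([1.0, 0.5]) + W * tau + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict(X, beta):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Stage 1: per-arm outcome models, as in the T-learner.
b1 = fit_ols(X[W == 1], Y[W == 1])
b0 = fit_ols(X[W == 0], Y[W == 0])

# Stage 2: impute effects using the *other* arm's model, then regress on X.
d1 = Y[W == 1] - predict(X[W == 1], b0)   # imputed effects for treated units
d0 = predict(X[W == 0], b1) - Y[W == 0]   # imputed effects for control units
t1 = fit_ols(X[W == 1], d1)
t0 = fit_ols(X[W == 0], d0)

# Stage 3: propensity-weighted combination; randomization makes e(x) constant.
e = W.mean()
cate_hat = e * predict(X, t0) + (1 - e) * predict(X, t1)
print("corr(cate_hat, tau):", round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```

The weighting gives more influence to tau_1_hat here, since the large control arm makes the treated-side imputations (which lean on mu_0_hat) the more reliable ones.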
Developed by Nie and Wager (2021), the R-learner is based on Robinson's (1988) partially linear model. The key insight is a decomposition that isolates the treatment effect from confounding:

Y_i - m(X_i) = (W_i - e(X_i)) * tau(X_i) + epsilon_i

where m(x) = E[Y | X = x] is the conditional mean outcome and e(x) = P(W = 1 | X = x) is the propensity score. After estimating m_hat and e_hat (typically with cross-fitting), the CATE is estimated by minimizing the residual-on-residual loss: the sum over i of [(Y_i - m_hat(X_i)) - (W_i - e_hat(X_i)) * tau(X_i)]^2.
The residualization step removes the confounding signal, leaving only the causal component. The R-learner has the Neyman orthogonality property, which makes the CATE estimate insensitive to small errors in the nuisance function estimates and yields quasi-oracle error bounds: the estimator behaves almost as if the nuisance functions were known. This makes it well suited for observational data where confounding is present. The name "R-learner" comes from its reliance on Robinson's decomposition.
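The residual-on-residual idea can be sketched on confounded synthetic data. For clarity this toy example plugs in the true nuisance functions m(x) and e(x); a real R-learner would cross-fit machine learning estimates of both.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
X = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-X[:, 0]))        # confounded assignment (illustrative)
W = rng.binomial(1, e)
tau = 1.0 + 0.5 * X[:, 1]
m = X[:, 0] + e * tau                  # m(x) = E[Y | X = x] for this DGP
Y = X[:, 0] + W * tau + rng.normal(scale=0.5, size=n)

# Residualize outcome and treatment (oracle nuisances, for clarity).
y_res = Y - m                          # Y - m_hat(X)
w_res = W - e                          # W - e_hat(X)

# Minimize sum_i (y_res_i - w_res_i * tau(X_i))^2 with tau linear in X:
# equivalent to OLS of y_res on the w_res-scaled design matrix.
Z = np.column_stack([np.ones(n), X]) * w_res[:, None]
theta = np.linalg.lstsq(Z, y_res, rcond=None)[0]
cate_hat = np.column_stack([np.ones(n), X]) @ theta
print("corr(cate_hat, tau):", round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```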
Formalized by Kennedy (2023), this approach constructs "doubly robust pseudo-outcomes" that combine inverse propensity weighting with outcome regression. The pseudo-outcome for each individual is:
phi_i = mu_1_hat(X_i) - mu_0_hat(X_i) + W_i * (Y_i - mu_1_hat(X_i)) / e_hat(X_i) - (1 - W_i) * (Y_i - mu_0_hat(X_i)) / (1 - e_hat(X_i))
The CATE is then estimated by regressing these pseudo-outcomes on covariates X using any flexible regression method.
The key advantage is double robustness: the CATE estimate remains consistent as long as either the outcome models (mu_0, mu_1) or the propensity model (e) is correctly specified, though not necessarily both. Kennedy showed that the DR-learner can achieve minimax optimal rates for CATE estimation under appropriate conditions, making it theoretically attractive. In practice, cross-fitting (sample splitting) is used when estimating the nuisance models to avoid overfitting bias.
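The pseudo-outcome construction and second-stage regression can be sketched directly from the formula above. Again the nuisance functions are the true ones for clarity; in practice they would be cross-fitted estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
X = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-X[:, 0]))        # propensity score (illustrative DGP)
W = rng.binomial(1, e)
tau = 1.0 + X[:, 1]
mu0 = X[:, 0]
mu1 = mu0 + tau
Y = np.where(W == 1, mu1, mu0) + rng.normal(scale=0.5, size=n)

# Doubly robust pseudo-outcome phi_i, term by term as in the formula above.
phi = (mu1 - mu0
       + W * (Y - mu1) / e
       - (1 - W) * (Y - mu0) / (1 - e))

# Second stage: regress phi on X with any flexible learner; plain OLS here.
Xb = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xb, phi, rcond=None)[0]
cate_hat = Xb @ beta
print("corr(cate_hat, tau):", round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```

With oracle nuisances, E[phi | X = x] equals the CATE exactly, so the second-stage regression is an ordinary supervised learning problem.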
Several methods estimate uplift directly using modified decision trees and ensemble approaches. These methods modify the splitting criteria and estimation procedures of classical tree algorithms to target treatment effect heterogeneity rather than prediction accuracy.
Rzepakowski and Jaroszewicz (2010) proposed decision trees that split nodes to maximize the difference in treatment effect between child nodes, rather than maximizing prediction accuracy. The splitting criterion is based on distributional divergence measures between the treatment and control outcome distributions in child nodes. Three divergence measures are commonly used:
| Divergence measure | Formula basis | Properties |
|---|---|---|
| Kullback-Leibler (KL) divergence | Sum of P_T log(P_T / P_C) | Asymmetric; generalizes information gain |
| Squared Euclidean distance | Sum of (P_T - P_C)^2 | Symmetric; computationally simple |
| Chi-squared divergence | Sum of (P_T - P_C)^2 / P_C | Asymmetric; related to the chi-squared test statistic |
The tree is grown by selecting the split that maximizes the chosen divergence measure, and leaf nodes provide uplift estimates as the difference in average outcomes between treated and control units in that leaf.
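The splitting logic can be illustrated with the squared Euclidean criterion on a single covariate. This toy version scores candidate thresholds by the gain in weighted child divergence over the parent; it omits the normalization terms used in the original paper, and the data-generating process (treatment helps only when x > 0.5) is invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x = rng.uniform(size=n)                       # single covariate
W = rng.binomial(1, 0.5, size=n)
# Assumed DGP: treatment lifts conversion by 0.3 only when x > 0.5.
p = 0.2 + 0.3 * W * (x > 0.5)
Y = rng.binomial(1, p)

def sq_euclid(y, w):
    """Squared Euclidean divergence between treated and control outcome dists."""
    pt, pc = y[w == 1].mean(), y[w == 0].mean()
    return (pt - pc) ** 2 + ((1 - pt) - (1 - pc)) ** 2

def split_gain(threshold):
    """Weighted child divergence minus parent divergence for split x <= t."""
    left = x <= threshold
    children = (left.mean() * sq_euclid(Y[left], W[left])
                + (~left).mean() * sq_euclid(Y[~left], W[~left]))
    return children - sq_euclid(Y, W)

best = max(np.linspace(0.1, 0.9, 17), key=split_gain)
print("best split threshold:", round(float(best), 2))
```

The chosen threshold lands near 0.5, the point where the true uplift changes, even though a split there does little for raw outcome prediction.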
Athey and Imbens (2016) introduced causal trees, which adapt standard decision tree algorithms for heterogeneous treatment effect estimation. A key innovation is the honesty property: the training data is split into two disjoint subsamples. One subsample (the "splitting" sample) is used to determine the tree structure, and the other (the "estimation" sample) is used to estimate treatment effects within each leaf. This separation ensures that the treatment effect estimates are unbiased and have valid confidence intervals, because the estimates do not depend on the same data used to select the tree structure.
Wager and Athey (2018) extended the causal tree idea to forests by building many honest causal trees and averaging their predictions. Each tree is grown on a bootstrap subsample, and honesty is maintained within each tree. Causal forests provide several theoretical guarantees:

- Pointwise consistency of the CATE estimates
- Asymptotic normality of the estimates, enabling hypothesis tests
- Valid confidence intervals, with variance estimated via the infinitesimal jackknife
The Generalized Random Forest (GRF) framework, developed by Athey, Tibshirani, and Wager (2019), generalizes causal forests to a broader class of estimands defined as solutions to local moment equations. The GRF software package (available in R and C++) has become a standard tool in applied causal machine learning, supporting not only CATE estimation but also quantile regression, instrumental variables estimation, and local linear forests.
Hill (2011) adapted Bayesian Additive Regression Trees (BART) to the causal setting. BART fits a sum-of-trees model embedded in a Bayesian inferential framework, with priors that regularize tree complexity. For causal inference, the approach works by fitting a flexible nonparametric model to the response surface E[Y | X, W] and then predicting counterfactual outcomes by toggling the treatment variable. The ITE is estimated as mu_hat(x, W=1) - mu_hat(x, W=0).
Causal BART naturally provides uncertainty quantification through its Bayesian posterior, yielding credible intervals for individual treatment effects without additional bootstrapping or asymptotic arguments. However, Hill's approach can exhibit bias under strong confounding, which led Hahn, Murray, and Carvalho (2020) to develop Bayesian Causal Forests (BCF), a variant that explicitly models the propensity score to reduce regularization-induced confounding bias.
Proposed by Athey and Imbens (2015), the transformed outcome approach converts the uplift estimation problem into a standard regression problem. For binary treatment with known propensity score e(x), the transformed outcome is defined as:
Y* = W * Y / e(x) - (1 - W) * Y / (1 - e(x))
where Y is the observed outcome, W is the treatment indicator (0 or 1), and e(x) is the propensity score (probability of treatment). The key property is that the conditional expectation of Y* equals the true CATE:
E[Y* | X = x] = CATE(x)
This means that any standard regression method (linear regression, random forest, gradient boosting, neural networks) can be applied to model Y* as a function of covariates, turning uplift estimation into a familiar supervised learning problem.
The main downside is that the transformed outcome can have high variance, especially when propensity scores are close to 0 or 1. In randomized experiments with equal treatment/control allocation (e(x) = 0.5), the variance is more manageable. Wayfair's "pylift" Python package was built around this approach.
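The key property E[Y* | X = x] = CATE(x) is easy to verify on a simulated 50/50 experiment. The effect pattern below (treatment helps only individuals with x > 0) is assumed for the demo.

```python
import numpy as np

# Randomized 50/50 experiment with a binary outcome.
rng = np.random.default_rng(7)
n = 10000
X = rng.normal(size=(n, 1))
e = 0.5                                    # known propensity under randomization
W = rng.binomial(1, e, size=n)
tau = 0.4 * (X[:, 0] > 0)                  # assumed effect pattern
Y = rng.binomial(1, 0.2 + W * tau)

# Transformed outcome: its conditional mean equals the CATE.
Y_star = W * Y / e - (1 - W) * Y / (1 - e)

# Any regression of Y* on X now estimates uplift; subgroup means suffice here.
m_neg = Y_star[X[:, 0] <= 0].mean()
m_pos = Y_star[X[:, 0] > 0].mean()
print(f"mean Y* for x <= 0: {m_neg:.3f} (true uplift 0.0)")
print(f"mean Y* for x  > 0: {m_pos:.3f} (true uplift 0.4)")
```

Note that Y* itself takes the noisy values {-2, 0, 2} for each individual; only its conditional mean is well behaved, which is the variance problem described above.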
Deep learning has been increasingly applied to treatment effect estimation, motivated by the ability of neural networks to learn complex nonlinear functions and flexible representations.
Shalit, Johansson, and Sontag (2017) proposed learning a shared representation of the covariates that is balanced between treatment and control groups. TARNet consists of a shared network body that maps raw features X into a learned representation Z, followed by separate prediction heads for each treatment level. The model is trained to minimize prediction error while also minimizing the distributional distance (measured by Wasserstein distance or Maximum Mean Discrepancy) between the treated and control groups in representation space. This balance encourages the learned representation to capture prognostic information while discouraging reliance on features that merely predict treatment assignment.
Shi, Blei, and Veitch (2019) extended TARNet by adding a third output head that predicts the propensity score e(x). This additional head acts as a form of targeted regularization: it forces the shared representation to encode information sufficient for estimating the probability of treatment, which, by the sufficiency of the propensity score for adjustment (Rosenbaum and Rubin, 1983), ensures the representation captures the information needed for unbiased treatment effect estimation. DragonNet also incorporates a regularization procedure inspired by Targeted Maximum Likelihood Estimation (TMLE) that encourages the model to have non-parametrically optimal asymptotic properties.
Louizos et al. (2017) proposed a deep generative model for causal inference that can handle settings with hidden confounders. CEVAE uses a variational autoencoder to model a latent variable Z that represents unobserved confounders, using observed covariates X as noisy proxies. The generative model specifies distributions for X, treatment W, and outcome Y given Z, and variational inference is used to approximate the posterior distribution of Z. CEVAE can estimate causal effects even when not all confounders are directly observed, provided that the observed covariates carry information about the latent confounders.
Evaluating uplift models is fundamentally harder than evaluating standard predictive models because the ground-truth individual treatment effect is never observed. Standard metrics like accuracy, AUC, or RMSE do not apply directly. Instead, the field relies on specialized metrics that evaluate how well the model ranks individuals by their estimated treatment effect.
The uplift curve plots the cumulative incremental effect as a function of the fraction of the population targeted, with individuals ordered by the model's predicted uplift from highest to lowest. If the model ranks individuals correctly, targeting the top-k% of the population should capture more incremental effect than targeting a random k%.
The Area Under the Uplift Curve (AUUC) summarizes the curve into a single number. A model with a higher AUUC assigns higher uplift scores to individuals who truly benefit from treatment. The AUUC is analogous to AUC-ROC in standard classification but measures ranking quality in terms of treatment effect rather than outcome prediction.
The Qini curve, introduced by Radcliffe (2007), is closely related to the uplift curve. It is a generalization of the Lorenz curve traditionally used in direct marketing for response models. The Qini curve plots the number of incremental positive outcomes as a function of the number of individuals targeted (rather than the fraction). The Qini coefficient is the area between the model's Qini curve and the random targeting diagonal. A model with a higher Qini coefficient is better at identifying individuals who benefit most from treatment.
| Metric | What it measures | Analogy to standard ML |
|---|---|---|
| Uplift curve | Cumulative incremental effect vs. fraction targeted | ROC curve |
| AUUC | Area under the uplift curve | AUC-ROC |
| Qini curve | Incremental positive outcomes vs. number targeted | Precision-recall curve |
| Qini coefficient | Area between Qini curve and random baseline | Gini coefficient |
| AUTOC / TOC (Targeting Operating Characteristic) | Average treatment effect among the top-ranked individuals as a function of the fraction targeted (Yadlowsky et al., 2021) | Alternative ranking metric |
| Cumulative gain | Incremental gain per decile | Lift chart |
Because true individual-level uplift is unobservable, practitioners typically evaluate models using held-out A/B testing data. The standard procedure is:

1. Score every individual in a held-out randomized sample with the model's predicted uplift.
2. Sort individuals by predicted uplift and divide them into deciles.
3. Within each decile, compute the observed uplift as the difference in outcome rates between treated and control units.
4. Check that observed uplift declines from the top decile to the bottom.
A well-calibrated uplift model should show high observed uplift in the top deciles and low or negative uplift in the bottom deciles. If the observed treatment effect is roughly constant across all deciles, the model has failed to capture meaningful heterogeneity.
An additional validation technique is the uplift by decile bar chart, where bars represent the observed uplift within each decile. This visualization is intuitive for business stakeholders and makes it easy to determine the optimal targeting threshold.
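The decile evaluation can be sketched as follows. The "model score" here is a stand-in, and the synthetic ground truth is rigged so true uplift grows with the score; on real data the scores would come from a trained uplift model applied to held-out experimental data.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20000
score = rng.uniform(size=n)                 # predicted uplift (stand-in)
W = rng.binomial(1, 0.5, size=n)            # randomized held-out assignment
# Assumed DGP: true uplift is proportional to the score, so ranking is informative.
Y = rng.binomial(1, 0.1 + 0.2 * W * score)

# Sort by predicted uplift (descending) and split into deciles.
order = np.argsort(-score)
deciles = np.array_split(order, 10)

observed_uplift = []
for idx in deciles:
    yt = Y[idx][W[idx] == 1].mean()         # treated conversion rate in decile
    yc = Y[idx][W[idx] == 0].mean()         # control conversion rate in decile
    observed_uplift.append(yt - yc)

print([round(u, 3) for u in observed_uplift])
```

A well-ranked model produces a list that decreases from the first decile to the last; a flat list would indicate no captured heterogeneity.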
Uplift modeling and standard predictive modeling differ in fundamental ways. Understanding these differences is important because applying a standard response model to a targeting problem can waste resources or, worse, cause harm by targeting Sleeping Dogs.
| Aspect | Standard predictive model | Uplift model |
|---|---|---|
| Target variable | Observed outcome (Y) | Unobserved individual treatment effect (ITE) |
| Question answered | "Will this person convert?" | "Will this person convert because of the intervention?" |
| Training data | Can use any labeled dataset | Requires randomized experiment or valid quasi-experiment |
| Evaluation | Accuracy, AUC, F1 score, RMSE | AUUC, Qini coefficient, uplift curves |
| Risk of misuse | May target Sure Things (wastes resources) | Specifically avoids Sure Things and Sleeping Dogs |
| Relation to causality | Correlational | Causal |
| Feature interpretation | Features predict outcome | Features predict treatment effect heterogeneity |
A standard classification model trained to predict purchase probability will give high scores to both Persuadables and Sure Things, since both groups have high purchase rates. An uplift model, by contrast, only gives high scores to Persuadables because it specifically estimates the incremental effect of treatment.
Uplift modeling has found applications across a wide range of industries. In each case, the core question is the same: which individuals will change their behavior because of an intervention?
The most common application of uplift modeling is in targeted marketing campaigns. Companies use uplift models to decide which customers should receive a promotional email, discount code, or phone call. By targeting only Persuadables, organizations have reported marketing efficiency gains of 15 to 30 percent compared to traditional targeting that uses response models.
In churn prevention, uplift models identify customers who will stay only if given a retention offer, avoiding spending on customers who would stay regardless or who cannot be retained. This application is especially valuable because churn campaigns are prone to the Sleeping Dog effect: some loyal customers interpret a retention offer as a signal that the company expects them to leave.
In healthcare, uplift modeling helps identify which patients will benefit most from a specific treatment. This is central to precision medicine, where the goal is to match patients with therapies that are effective for their individual profile rather than relying on average treatment effects from clinical trials. For example, uplift models can help determine which cancer patients benefit from an aggressive chemotherapy regimen versus a less intensive protocol, or which patients with depression respond better to cognitive behavioral therapy versus medication.
E-commerce platforms use uplift models for personalized pricing, estimating how much a discount will increase each customer's purchase probability. This allows companies to offer discounts only to price-sensitive customers (Persuadables in pricing terms) while charging full price to customers who would buy at any price (Sure Things). The net value formulation of uplift modeling explicitly incorporates heterogeneous treatment costs, enabling constrained policy optimization under budget limits.
Political campaigns have used uplift modeling to target voter outreach efforts, identifying voters who are persuadable on specific issues and directing canvassing or advertising resources toward them rather than toward voters whose minds are already made up. The 2008 and 2012 U.S. presidential campaigns were early high-profile adopters of these techniques.
In digital advertising, uplift modeling is used to measure incrementality: the fraction of conversions that are genuinely caused by an ad impression. Because many users who see an ad would have converted anyway (Sure Things), raw conversion rates overstate ad effectiveness. Uplift models estimate the incremental conversions attributable to the ad, providing a more accurate measure of return on ad spend. Criteo, a major ad-tech company, has published a large-scale benchmark dataset specifically for uplift modeling in the advertising context.
Many real-world scenarios involve more than two treatment options. For example, a marketing campaign might offer different discount levels (10%, 20%, 30%) or different communication channels (email, SMS, phone call). Extending uplift modeling to multiple treatments requires estimating the CATE for each treatment versus control, and then selecting the optimal treatment for each individual.
Zhao et al. (2019) extended the meta-learner framework (including X-learner and R-learner) to the multiple treatment setting and introduced a net value optimization framework. This framework accounts for:

- The monetary value of an incremental conversion (Value_i)
- Costs triggered only when a treated individual converts, such as a redeemed discount (TriggeredCost)
- Costs incurred for every individual who receives the treatment, such as the cost of sending a message (ImpressionCost)
The net value CATE for treatment t and individual i is:
NetCATE_i(t) = CATE_i(t) * Value_i - TriggeredCost_i(t) - ImpressionCost_i(t)
The optimal treatment for each individual is the one with the highest net value CATE, subject to budget constraints. This formulation turns the problem into a constrained optimization problem that can be solved with standard techniques.
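The per-individual selection step can be sketched directly from the net value formula. Every CATE, value, and cost below is invented for illustration; in practice the CATE matrix would come from a multi-treatment uplift model.

```python
import numpy as np

treatments = ["control", "10% off", "20% off"]

# Hypothetical estimated CATEs (vs. control) for 5 users x 3 treatments.
cate = np.array([[0.0, 0.02,  0.05],
                 [0.0, 0.10,  0.12],
                 [0.0, 0.00, -0.01],
                 [0.0, 0.06,  0.06],
                 [0.0, 0.01,  0.15]])
value = np.array([50.0, 30.0, 80.0, 40.0, 20.0])      # value per conversion
triggered_cost = np.array([0.0, 1.0, 2.0])            # cost if the offer is used
impression_cost = np.array([0.0, 0.1, 0.1])           # cost of contacting at all

# NetCATE_i(t) = CATE_i(t) * Value_i - TriggeredCost(t) - ImpressionCost(t)
net = cate * value[:, None] - triggered_cost[None, :] - impression_cost[None, :]

# Greedy (unconstrained) policy: best net value per user.
best = np.argmax(net, axis=1)
for i, t in enumerate(best):
    print(f"user {i}: {treatments[t]} (net value {net[i, t]:.2f})")
```

Under a budget constraint, this greedy rule is replaced by a constrained optimization over the same net value matrix.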
Uplift modeling sits at the intersection of machine learning and causal inference. While traditional machine learning focuses on prediction (estimating E[Y|X]), uplift modeling focuses on causal estimation (estimating E[Y(1) - Y(0)|X]). This connection has driven a productive exchange between the econometrics, statistics, and machine learning communities over the past two decades.
Key foundational frameworks that underpin uplift modeling include:

- The Neyman-Rubin potential outcomes framework, which defines causal effects as contrasts between potential outcomes
- Pearl's structural causal models and do-calculus, which formalize identification through causal graphs (the foundation of tools such as DoWhy)
- Semiparametric efficiency theory and double/debiased machine learning (Chernozhukov et al., 2018), which supply orthogonal estimating equations that tolerate slow nuisance estimation
The development of methods like causal forests, R-learners, and DR-learners reflects a broader trend toward combining the flexibility of modern machine learning algorithms with the rigor of causal identification theory.
Several open-source libraries make uplift modeling accessible to practitioners. The table below summarizes the most widely used packages.
| Library | Developer | Language | Key features |
|---|---|---|---|
| CausalML | Uber | Python | Meta-learners (S, T, X, R, DR), uplift trees, DragonNet, CEVAE, policy optimization, sensitivity analysis |
| EconML | Microsoft Research | Python | DML, causal forests, DR-learner, IV methods, SHAP integration, metalearner API |
| scikit-uplift | Open source | Python | scikit-learn-compatible API, Solo/Two Model approaches, Qini/AUUC metrics, visualization |
| grf (Generalized Random Forests) | Stanford | R / C++ | Causal forests with valid confidence intervals, local linear forests, quantile regression, IV estimation |
| pylift | Wayfair | Python | Transformed outcome approach, Qini-based evaluation |
| DoWhy | Microsoft Research | Python | Causal graph specification, identification, estimation, and refutation testing |
| UpliftML | Booking.com | Python | Scalable uplift modeling on PySpark, meta-learners, evaluation |
| DoubleML | Open source | Python / R | Double machine learning framework, cross-fitting, various nuisance estimators |
A typical uplift modeling workflow proceeds as follows:

1. Run a randomized experiment (or identify a valid quasi-experiment) that assigns the treatment of interest.
2. Collect covariates, treatment assignments, and outcomes, and split the data into training and held-out evaluation sets.
3. Train one or more uplift models, such as meta-learners or uplift trees.
4. Compare models on the held-out set using AUUC, the Qini coefficient, and uplift-by-decile charts.
5. Choose a targeting threshold or per-treatment policy, optionally maximizing net value under budget constraints.
6. Deploy the policy and monitor its incremental impact, ideally keeping an ongoing holdout group.
Several publicly available datasets are commonly used for evaluating uplift models:
| Dataset | Source | Size | Treatment | Outcome | Notes |
|---|---|---|---|---|---|
| Hillstrom Email Marketing | Kevin Hillstrom / MineThatData | ~64,000 | Email campaign (men's/women's) | Visit, conversion | Classic benchmark; small size limits power |
| Criteo Uplift | Criteo AI Lab | ~25 million | Ad exposure | Visit, conversion | Large-scale; from real incrementality tests |
| Criteo Large-Scale ITE | Criteo Research | ~13.9 million | Ad exposure | Visit, conversion | Multiple RCTs combined; 210x larger than prior benchmarks |
| Lenta | Lenta (Russian retail) | ~690,000 | Promotional offer | Purchase | Retail marketing dataset |
| X5 RetailHero | X5 Retail Group | ~250,000 | SMS campaign | Purchase | Retail uplift competition |
| Starbucks | Udacity | ~120,000 | Promotional offer | Purchase | Simulated dataset for educational use |
The Criteo datasets are particularly valuable because of their scale and because they come from real randomized experiments, providing a realistic evaluation setting.
Despite its practical value, uplift modeling comes with several challenges:

- Weak signal: the ITE is never observed, so models learn from noisy group-level contrasts, and treatment effects are usually small relative to outcome variance.
- Data requirements: training requires a randomized experiment or a credible quasi-experiment, which can be costly to run at scale.
- Indirect evaluation: metrics like AUUC and the Qini coefficient measure ranking quality, not individual-level accuracy.
- Instability: different methods, hyperparameters, or random seeds can produce substantially different rankings, so careful validation is essential.
- Drift: treatment effects estimated on one campaign may not transfer to the next, requiring periodic re-experimentation.
The development of uplift modeling spans several decades and academic communities. The field grew from the intersection of database marketing, statistics, econometrics, and machine learning.
| Year | Milestone |
|---|---|
| 1986 | Holland formalizes the "fundamental problem of causal inference" in the potential outcomes framework |
| 1988 | Robinson proposes the partially linear model, later used as the basis for the R-learner |
| 1999 | Radcliffe and Surry publish the first paper on "differential response analysis," introducing the term uplift modeling and the four customer segments |
| 2002 | Victor Lo introduces the "True Lift Model" in SIGKDD Explorations, describing an early version of the S-learner |
| 2007 | Radcliffe introduces the Qini curve and Qini coefficient for uplift model evaluation |
| 2010 | Rzepakowski and Jaroszewicz develop decision trees specifically designed for uplift estimation using divergence-based splitting criteria |
| 2011 | Hill demonstrates the use of BART for causal inference and heterogeneous treatment effect estimation |
| 2016 | Athey and Imbens propose causal trees with honest estimation, providing valid confidence intervals for tree-based CATE estimates |
| 2017 | Shalit, Johansson, and Sontag propose TARNet and the counterfactual regression framework for neural network-based treatment effect estimation |
| 2017 | Louizos et al. introduce CEVAE for causal inference with latent confounders |
| 2018 | Chernozhukov et al. publish the double/debiased machine learning framework |
| 2018 | Wager and Athey publish the causal forest method in JASA with asymptotic normality results |
| 2019 | Kunzel, Sekhon, Bickel, and Yu formalize the meta-learner framework (S, T, X-learners) in PNAS |
| 2019 | Athey, Tibshirani, and Wager publish the Generalized Random Forest framework in the Annals of Statistics |
| 2019 | Shi, Blei, and Veitch introduce DragonNet with targeted regularization at NeurIPS |
| 2019 | Zhao et al. extend uplift modeling to multiple treatments with cost optimization |
| 2020 | Uber releases CausalML, an open-source Python package for uplift modeling |
| 2021 | Nie and Wager publish the R-learner in Biometrika with quasi-oracle convergence guarantees |
| 2023 | Kennedy publishes optimal DR-learner theory in the Electronic Journal of Statistics |