MethodAtlas
ML + Causal · Frontier

Double/Debiased Machine Learning (DML)

Uses machine learning for nuisance parameter estimation while preserving valid inference on the causal parameter of interest.

Quick Reference

When to Use
When you have high-dimensional confounders and want ML-flexible estimation of nuisance parameters while preserving valid, root-n inference on the causal parameter.
Key Assumption
Conditional exogeneity (selection on observables) plus regularity conditions on the ML estimators (approximate sparsity or sufficient smoothness). The Neyman orthogonality condition ensures the causal parameter estimate is insensitive to small errors in the nuisance estimates.
Common Mistake
Using ML predictions directly for causal inference without the debiasing/cross-fitting steps, which invalidates standard errors due to overfitting bias. Cross-fitting is essential, not optional.
Estimated Time
3 hours

One-Line Implementation

Stata: ssc install ddml pystacked; then the multi-step sequence: ddml init, ddml E[Y|X], ddml E[D|X], ddml crossfit, ddml estimate
R: DoubleMLPLR$new(dml_data, ml_l = lrn('regr.ranger'), ml_m = lrn('classif.ranger'), n_folds = 5)$fit()
Python: LinearDML(model_y=RandomForestRegressor(), model_t=RandomForestClassifier(), cv=5).fit(Y, T, X=X)


Motivating Example

You want to estimate the causal effect of price on demand for a product. You have observational data with hundreds of potential confounders: competitor prices, seasonality, weather, local demographics, marketing spend, and more.

Traditional regression forces you to specify a functional form: maybe you add log-transformed variables, interactions, and polynomials. But with hundreds of confounders, you cannot possibly get the functional form right. Machine learning excels at flexibly fitting complex relationships — random forests, gradient boosting, and neural networks can capture nonlinearities and interactions that you would never think to specify.

But there is a catch. If you simply run a random forest to predict the outcome, extract the predicted values, and use them in a second-stage regression, your standard errors are wrong and your point estimate may be biased. ML models overfit, regularize in ways that bias coefficients, and do not distinguish between causal and predictive relationships.


Chernozhukov et al. (2018) solved this problem with Double/Debiased Machine Learning (DML). The key ideas are:

  1. Neyman orthogonality: Construct the causal estimating equation so that small errors in the ML-estimated nuisance functions do not bias the causal parameter.
  2. Cross-fitting: Split the data to avoid overfitting bias — train ML models on one subset, predict on another.

The result: you can use any well-behaved ML method to control for confounders while still getting valid confidence intervals for the causal effect.

A. Overview

The Problem with Naive ML

Suppose you want to estimate $\theta_0$ in:

$$Y = D\theta_0 + g_0(X) + U, \quad E[U \mid X, D] = 0$$

where $g_0(X)$ is an unknown, potentially complex function of high-dimensional confounders $X$.

Naive approach: Estimate $\hat{g}(X)$ using ML, then regress $Y - \hat{g}(X)$ on $D$.

Problem: Even if $\hat{g}$ converges to $g_0$, ML methods typically converge at a rate slower than $\sqrt{n}$. This "regularization bias" contaminates the estimate of $\theta_0$, making standard errors invalid and the point estimate biased.
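The failure of the naive recipe is easy to reproduce. In the toy simulation below (the DGP, learner, and all settings are illustrative, not from any real application), $\hat{g}$ is fit in-sample with a random forest and only the outcome is residualized; the resulting slope is badly attenuated relative to the true effect of 2.0, both because the in-sample fit absorbs part of $D$'s contribution and because the treatment is never residualized:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
D = np.sin(X[:, 0]) + 0.5 * rng.normal(size=n)            # treatment, confounded by X
Y = 2.0 * D + np.cos(X[:, 0]) + 0.5 * rng.normal(size=n)  # true causal effect = 2.0

# Naive recipe: fit g-hat(X) to Y on the FULL sample, then regress the
# residual Y - g-hat(X) on D. No cross-fitting, no double residualization.
g_hat = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)
resid = Y - g_hat.predict(X)
theta_naive = np.polyfit(D, resid, 1)[0]  # slope of residual on D
```

Running this gives a slope well below 2.0: the overfit $\hat{g}$ has already soaked up much of the variation in $Y$, including part of the variation caused by $D$.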

The DML Solution

DML addresses this through two innovations:

1. Neyman orthogonality. Instead of directly partialing out confounders from $Y$, also partial out confounders from $D$ — that is, residualize both $Y$ and $D$ on $X$. This "double residualization" makes the estimating equation for $\theta_0$ insensitive to first-order errors in $\hat{g}$. The idea goes back to Robinson (1988) and Frisch-Waugh-Lovell, but DML generalizes it to the ML setting.

2. Cross-fitting. Split the sample into $K$ folds. For each fold, train the ML models on the other $K-1$ folds and predict on the held-out fold. This sample-splitting avoids the overfitting problem that arises when the same data are used for both ML estimation and causal inference.

Neyman Orthogonality (Plain Language)

Imagine you are estimating a treatment effect, and your estimate depends on how well you model the outcome as a function of confounders. If a small error in your outcome model directly translates into a proportional error in your treatment effect estimate, you have a problem — because ML models inevitably have some error.

Neyman orthogonality means the treatment effect estimate is locally insensitive to errors in the nuisance models. Geometrically, the "gradient" of the causal parameter with respect to the nuisance function is zero at the true values. First-order errors in the nuisance function produce only second-order errors in the causal parameter.

This insensitivity property is why you partial out confounders from both the outcome and the treatment. The resulting estimating equation has the Neyman-orthogonal property.

Common Confusions

"Can I use any ML method?" Almost. The ML learners must satisfy certain convergence-rate conditions (roughly, each nuisance estimate must converge faster than $n^{-1/4}$). Most standard methods (random forests, gradient boosting, lasso, neural networks) can satisfy this condition under suitable regularity assumptions. Avoid methods that do not converge at all or that have extremely high variance.

"Does DML give me causal effects even with observational data?" DML gives you valid inference conditional on the unconfoundedness assumption, similar to matching methods. It does not solve the omitted variable problem. If there are unmeasured confounders, DML is biased just like OLS would be. DML's advantage is in handling observed confounders more flexibly.

"How is DML different from doubly robust estimation?" The doubly robust property is a building block of DML. DML adds cross-fitting (to handle ML overfitting) and the formal Neyman orthogonality framework (to handle regularization bias). You can think of DML as "doubly robust estimation done right when using ML."

"How many folds for cross-fitting?" 5 folds is a common default. Too few folds means the training set is small (reducing ML performance). Too many folds means each held-out set is small (increasing variance). The theoretical results are robust to the number of folds as long as $K \geq 2$.

B. Identification

The Partially Linear Model

The simplest DML setup is the partially linear regression model:

$$Y = D\theta_0 + g_0(X) + U, \quad E[U \mid X, D] = 0$$
$$D = m_0(X) + V, \quad E[V \mid X] = 0$$

where:

  • $\theta_0$ is the causal parameter of interest
  • $g_0(X) = E[Y - D\theta_0 \mid X]$ captures the confounding relationship between $X$ and $Y$ (nuisance)
  • $m_0(X) = E[D \mid X]$ is the treatment confounding function (nuisance / propensity score)
  • $U, V$ are residuals

The DML Estimator

Step 1: Double residualization.

  • Estimate $\hat{\ell}(X) \approx E[Y \mid X]$ and $\hat{m}(X) \approx E[D \mid X]$ using ML (note: in practice, one estimates $\ell_0(X) = E[Y \mid X]$ directly rather than $g_0(X)$, since $g_0$ depends on the unknown $\theta_0$)
  • Compute residuals: $\tilde{Y}_i = Y_i - \hat{\ell}(X_i)$ and $\tilde{D}_i = D_i - \hat{m}(X_i)$

Step 2: Estimate $\theta_0$.

$$\hat{\theta}_0 = \frac{\sum_i \tilde{D}_i \tilde{Y}_i}{\sum_i \tilde{D}_i^2}$$

This estimator is just OLS of $\tilde{Y}$ on $\tilde{D}$ — the Frisch-Waugh-Lovell theorem applied to ML residuals.
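With linear first stages, the ratio above reproduces the coefficient on $D$ from the one-shot full regression exactly, which is the classic Frisch-Waugh-Lovell result. A quick numerical check on made-up linear data (the DGP is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
D = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
Y = 2.0 * D + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])  # intercept plus confounders

# Residualize Y and D on X via least-squares projection
Y_tilde = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
D_tilde = D - Z @ np.linalg.lstsq(Z, D, rcond=None)[0]
theta_fwl = (D_tilde @ Y_tilde) / (D_tilde @ D_tilde)

# Coefficient on D from the full regression of Y on [1, X, D]
theta_full = np.linalg.lstsq(np.column_stack([Z, D]), Y, rcond=None)[0][-1]
# theta_fwl and theta_full agree up to floating-point error
```

DML replaces the linear projections with cross-fitted ML predictions, but the final step is this same residual-on-residual regression.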

Step 3: Cross-fitting. Do the above with sample splitting:

  1. Split data into $K$ folds
  2. For each fold $k$: train $\hat{\ell}^{(-k)}$ and $\hat{m}^{(-k)}$ on all data except fold $k$; predict $\tilde{Y}_i, \tilde{D}_i$ for observations in fold $k$
  3. Pool all residuals and run the final regression
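Steps 1-3 can be sketched from scratch with scikit-learn (the DGP, the choice of random forests, and the fold count are illustrative; packages such as DoubleML and econml wrap this same logic):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
m0 = np.sin(X[:, 0]) + 0.5 * X[:, 1]        # treatment nuisance m_0(X)
g0 = np.cos(X[:, 0]) + 0.5 * X[:, 1] ** 2   # outcome nuisance g_0(X)
D = m0 + 0.7 * rng.normal(size=n)
Y = 2.0 * D + g0 + rng.normal(size=n)       # true theta_0 = 2.0

Y_tilde = np.empty(n)
D_tilde = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Steps 1-2: nuisance models trained on the other K-1 folds only,
    # residuals predicted on the held-out fold
    ml_l = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
    ml_m = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
    Y_tilde[test] = Y[test] - ml_l.predict(X[test])
    D_tilde[test] = D[test] - ml_m.predict(X[test])

# Step 3: pooled final-stage regression of residualized Y on residualized D
theta_hat = (D_tilde @ Y_tilde) / (D_tilde @ D_tilde)

# Sandwich standard error based on the orthogonal score
psi = (Y_tilde - theta_hat * D_tilde) * D_tilde
se = np.sqrt(np.mean(psi ** 2)) / (np.mean(D_tilde ** 2) * np.sqrt(n))
```

Despite the nonlinear confounding, the cross-fitted estimate lands close to the true value of 2.0 with a conventional-looking standard error.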


C. Visual Intuition

The DML procedure can be visualized in three steps:

  1. Partial out X from Y: Remove the part of the outcome explained by confounders (the ML-predicted component). What remains ($\tilde{Y}$) is the variation in $Y$ not explained by $X$.

  2. Partial out X from D: Remove the part of the treatment explained by confounders. What remains ($\tilde{D}$) is the variation in treatment not predicted by observables — the "residual" or "exogenous" variation.

  3. Regress $\tilde{Y}$ on $\tilde{D}$: The slope of this regression is the causal effect $\hat{\theta}_0$.

Interactive Simulation

DML: Double Residualization

See how partialing out confounders from both Y and D isolates the causal relationship. Adjust the complexity of the true confounding function and compare naive regression to DML.

Interactive Simulation

Why Double Machine Learning?

DGP: Y = 2.0·D + g(X) + ε, D = m(X) + ν, where g(X) and m(X) are nonlinear (strength = 1.5). N = 300. DML uses 2-fold cross-fitting with degree-5 polynomial nuisance models.

[Scatter plot: residualized Y (cross-fit) against residualized D (cross-fit), with the DML estimate and the true slope overlaid.]

Estimation Results

Estimator         | β̂     | SE    | 95% CI       | Bias
Naive OLS         | 3.190 | 0.052 | [3.09, 3.29] | +1.190
OLS + cubic       | 2.355 | 0.021 | [2.31, 2.40] | +0.355
LASSO (reg. bias) | 1.906 | 0.035 | [1.84, 1.98] | -0.094
DML (cross-fit)   | 2.227 | 0.014 | [2.20, 2.25] | +0.227
True β            | 2.000 |       |              |
Simulation parameters: sample size = 300 observations; true causal effect of D on Y = 2.0; confounding strength = 1.5 (0 = linear, higher = more complex).

Why the difference?

Naive OLS is biased (+1.19) because the confounding relationship between X and both D and Y is nonlinear (strength = 1.5), and OLS cannot remove what it cannot model. In this draw, LASSO regularization bias is small, likely because the confounding structure is simple enough that shrinkage does not distort much.

D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: The DML estimating equation is constructed so that the score function has a zero derivative with respect to the nuisance parameters at their true values. This orthogonality means small ML estimation errors produce only second-order bias.

Define the moment condition for the partially linear model:

$$\psi(W; \theta, \eta) = (Y - D\theta - g(X))(D - m(X))$$

where $\eta = (g, m)$ are the nuisance functions and $W = (Y, D, X)$.

Standard moment condition: $E[\psi(W; \theta_0, \eta_0)] = 0$

Neyman orthogonality condition: The Gateaux derivative of $E[\psi(W; \theta_0, \eta)]$ with respect to $\eta$ at $\eta = \eta_0$ is zero:

$$\left.\frac{\partial}{\partial r} E[\psi(W; \theta_0, \eta_0 + r(\eta - \eta_0))]\right|_{r=0} = 0$$

For our moment condition:

$$\left.\frac{\partial}{\partial r}\right|_{r=0} E[(Y - D\theta_0 - g_0 - r\Delta g)(D - m_0 - r\Delta m)] = -E[\Delta g \cdot (D - m_0)] - E[U \cdot \Delta m] = -E[\Delta g \cdot V] - E[U \cdot \Delta m] = 0$$

The first term is zero because $E[\Delta g(X) \cdot V \mid X] = \Delta g(X) \cdot E[V \mid X] = 0$. The second term is zero because $E[U \cdot \Delta m(X)] = E[E[U \mid X] \cdot \Delta m(X)] = 0$.

This orthogonality means that $\hat{\theta}_0$ is robust to first-order perturbations of $\hat{g}$ and $\hat{m}$ around their true values. Combined with cross-fitting (which prevents overfitting bias), it yields:

$$\hat{\theta}_0 = \theta_0 + \frac{1}{n} \sum_i J^{-1} \psi(W_i; \theta_0, \eta_0) + o_p(n^{-1/2}), \quad J = E[(D - m_0(X))^2]$$

giving $\sqrt{n}$-consistency and asymptotic normality.
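The zero-derivative property can also be checked by simulation: with the DGP known, perturb both nuisances by $r \cdot \Delta$ and the sample moment moves quadratically in $r$, with zero slope at $r = 0$ (a sketch; the perturbation direction $\Delta g = \Delta m = X^2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X = rng.normal(size=n)
U = rng.normal(size=n)
V = rng.normal(size=n)
g0, m0 = np.cos(X), np.sin(X)
D = m0 + V
Y = 2.0 * D + g0 + U  # theta_0 = 2.0

def moment(r):
    # Sample analogue of E[psi] with both nuisances perturbed by r * X^2
    g = g0 + r * X ** 2
    m = m0 + r * X ** 2
    return np.mean((Y - 2.0 * D - g) * (D - m))

# Slope at r = 0 is ~0 (Neyman orthogonality); the moment moves only at
# second order, approximately r^2 * E[X^4] = 3 r^2 for this direction
values = [moment(r) for r in (0.0, 0.1, 0.2)]
```

Doubling the perturbation from $r = 0.1$ to $r = 0.2$ roughly quadruples the moment, the signature of a vanishing first derivative.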

E. Implementation

library(DoubleML)
library(mlr3)
library(mlr3learners)

# Prepare data
dml_data <- DoubleMLData$new(
  df,
  y_col = "outcome",
  d_cols = "treatment",
  x_cols = paste0("x", 1:50)
)

# Choose ML learners
ml_l <- lrn("regr.ranger", num.trees = 500)   # outcome model
ml_m <- lrn("classif.ranger", num.trees = 500) # treatment model

# Fit DML (partially linear model)
dml_plr <- DoubleMLPLR$new(
  dml_data,
  ml_l = ml_l,
  ml_m = ml_m,
  n_folds = 5
)
dml_plr$fit()
dml_plr$summary()

# Confidence interval
dml_plr$confint(level = 0.95)

F. Diagnostics

  1. Check the quality of the first-stage ML models. Report cross-validated $R^2$ or MSE for both the outcome model ($\hat{\ell}$) and the treatment model ($\hat{m}$). If the ML models do not fit well, the residualization may not adequately remove confounding.

  2. Compare ML learners. Run DML with different ML methods (random forest, gradient boosting, lasso) and check whether the causal estimate changes. Robustness to the ML learner choice increases credibility.

  3. Check residual balance. After double residualization, the residualized treatment $\tilde{D}$ should be uncorrelated with $X$. Check this by regressing $\tilde{D}$ on $X$ — the $R^2$ should be near zero.

  4. Sensitivity to the number of folds. Re-run with $K = 2, 5, 10$ folds. Results should be stable.

  5. Compare with simple OLS. If DML and OLS give similar results, the confounding is well-captured by linear terms and DML's flexibility was not needed (but also did not hurt).
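Diagnostics 1 and 3 take only a few lines to script (a sketch on simulated data; the learner, fold count, and thresholds are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 5))
D = np.sin(X[:, 0]) + 0.7 * rng.normal(size=n)
Y = 2.0 * D + np.cos(X[:, 0]) + rng.normal(size=n)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
Y_hat = cross_val_predict(rf, X, Y, cv=5)  # out-of-fold predictions
D_hat = cross_val_predict(rf, X, D, cv=5)

# Diagnostic 1: cross-validated fit of both nuisance models
r2_outcome = r2_score(Y, Y_hat)
r2_treatment = r2_score(D, D_hat)

# Diagnostic 3: residualized treatment should be unpredictable from X
D_tilde = D - D_hat
r2_balance = LinearRegression().fit(X, D_tilde).score(X, D_tilde)
```

If `r2_balance` is not close to zero, the treatment model has left systematic confounding in $\tilde{D}$ and the learner or its tuning should be revisited.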

Interpreting Your Results

DML and OLS agree: The confounding relationship is approximately linear. Both are valid. DML's standard errors may be somewhat wider, reflecting additional estimation variance from the cross-fitting procedure.

DML and OLS disagree: Nonlinear confounding matters. Report DML as your main result and discuss why the linear approximation fails.

DML results vary across ML learners: The nuisance functions may not be well-estimated. Consider using ensemble methods or SuperLearner to aggregate multiple ML methods.

G. What Can Go Wrong

Assumption Failure Demo

Omitting Cross-Fitting: Overfitting Bias Corrupts Inference

Use 5-fold cross-fitting: train the ML models on 4 folds, predict residuals on the held-out fold. Repeat for all folds, then regress residualized outcome on residualized treatment.

DML with cross-fitting: theta = 0.15 (SE = 0.04, 95% CI [0.07, 0.23]). Coverage in simulations: 94.6%. Valid inference.

Assumption Failure Demo

Single Residualization: Missing the 'Double' in DML

Partial out confounders from BOTH the outcome Y and the treatment D using ML. Regress the residualized outcome on the residualized treatment (double residualization / Frisch-Waugh-Lovell with ML).

DML estimate: 0.15 (SE = 0.04). The double residualization removes confounding from both sides, yielding a Neyman-orthogonal estimating equation that is insensitive to first-order ML errors.

Assumption Failure Demo

Weak First-Stage ML Models: Residualization Fails to Remove Confounding

Use well-tuned ML models (random forest with 500 trees, gradient boosting with cross-validated hyperparameters) for both nuisance functions. Verify cross-validated R-squared is reasonable.

Cross-validated R-squared: 0.65 for outcome model, 0.40 for treatment model. DML estimate: 0.15 (SE = 0.04). Residualized treatment is uncorrelated with covariates (R-squared < 0.01).

H. Practice

Concept Check

A researcher runs DML to estimate the effect of advertising on sales, using 200 covariates. The DML estimate is 0.15 (SE = 0.04). The OLS estimate with all 200 covariates is 0.22 (SE = 0.03). What is the most likely explanation for the difference?

Guided Exercise

Double Machine Learning: Estimating the Effect of R&D Spending on Firm Productivity

An economist wants to estimate the causal effect of R&D investment (D) on total factor productivity (Y) across 5,000 firms. The challenge is that 180 potential confounders (industry conditions, firm age, market concentration, prior performance) affect both R&D decisions and productivity. She uses DML with a random forest for both the outcome model and the treatment model, with 5-fold cross-fitting.

What are the two 'nuisance parameters' that DML estimates in this study?

What is cross-fitting, and why does DML use it?

After cross-fitting, what does the DML estimator regress on what?

What does Neyman orthogonality mean for why DML works even when the ML models are not perfect?

Error Detective

Read the analysis below carefully and identify the errors.

A marketing researcher estimates the causal effect of digital advertising spending on sales using DML with 150 covariates (competitor prices, seasonality dummies, weather, demographics). She uses a random forest for the outcome model and logistic regression for the treatment model (advertising is binarized as high/low). She reports: "DML estimate: 12% sales increase per ad campaign (SE = 2.1%, p < 0.001). Cross-validated R-squared for the outcome model is 0.78." She writes: "Our DML approach controls for high-dimensional confounders using machine learning, providing a credible causal estimate." She does not report the treatment model's performance.

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

An applied microeconomist uses DML to estimate the return to college education on earnings, using 80 covariates from census data. He uses gradient boosting for both nuisance models with 5-fold cross-fitting. He reports: "DML estimate: $8,200 annual earnings premium (SE = $450). OLS estimate: $12,500 (SE = $380). The DML estimate is 34% smaller, demonstrating that nonlinear confounding substantially inflates the OLS estimate." He writes: "Since DML handles high-dimensional nonlinear confounding, our estimate represents the true causal return to education." He does not discuss potential unobserved confounders.

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors estimate the effect of corporate tax rate changes on firm investment using DML. They use firm-level panel data for 15,000 firms across 30 countries, with 300 covariates including financial ratios, industry indicators, and macroeconomic variables. They report a DML estimate of -0.45 (a 1 percentage point tax increase reduces investment by 0.45%). They compare with OLS (-0.32) and argue the difference shows OLS understates the negative effect due to nonlinear confounding.

Key Table

Method      | Estimate | SE   | 95% CI
OLS         | -0.32    | 0.05 | [-0.42, -0.22]
DML (RF)    | -0.45    | 0.08 | [-0.61, -0.29]
DML (GBM)   | -0.51    | 0.09 | [-0.69, -0.33]
DML (Lasso) | -0.38    | 0.07 | [-0.52, -0.24]

Outcome model CV R²: 0.72 (RF), 0.75 (GBM), 0.58 (Lasso)
Treatment model CV R²: 0.35 (RF), 0.38 (GBM), 0.22 (Lasso)

Authors' Identification Claim

DML controls for 300 confounders flexibly using ML, providing a credible estimate of the causal effect of taxes on investment. The cross-fitting procedure ensures valid inference.

I. Swap-In: When to Use Something Else

  • OLS with controls: When the number of controls is small, functional form is known, and there is no need for machine-learning flexibility.
  • IV / 2SLS: When endogeneity cannot be addressed by conditioning on observables and a valid instrument is available — DML assumes conditional exogeneity.
  • Matching: When a transparent matched-pair design is preferred over a regression-based approach, and the covariate space is moderate.
  • Doubly robust estimation: When double robustness is desired but the parametric setting suffices — DR estimators share the same doubly-robust logic as DML but are typically applied without cross-fitting.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (4)

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters.

Econometrics Journal. DOI: 10.1111/ectj.12097

The foundational paper introducing double/debiased machine learning (DML). Chernozhukov and colleagues showed how to combine Neyman orthogonality with cross-fitting to obtain root-n consistent and asymptotically normal estimates of low-dimensional causal parameters while using high-dimensional machine learning for nuisance functions.

Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression.

Econometrica. DOI: 10.2307/1912705

Robinson developed the partially linear regression estimator that achieves root-n consistency for the parametric component by partialling out nonparametric nuisance functions. This paper provided the semiparametric foundation that DML generalizes to the machine learning setting.

Belloni, A., Chernozhukov, V., & Hansen, C. (2014). Inference on Treatment Effects after Selection among High-Dimensional Controls.

Review of Economic Studies. DOI: 10.1093/restud/rdt044

Belloni, Chernozhukov, and Hansen introduced the post-double-selection LASSO method for inference on treatment effects with many potential controls. This paper was a key precursor to DML, demonstrating how regularized selection in both the treatment and outcome equations can yield valid inference.

Semenova, V., & Chernozhukov, V. (2021). Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions.

Econometrics Journal. DOI: 10.1093/ectj/utaa027

Semenova and Chernozhukov extended DML to estimate conditional average treatment effects (CATEs) and other causal functions, allowing researchers to characterize treatment effect heterogeneity. They provided inference methods for projections of the CATE onto interpretable subgroups.

Application (4)

Bach, P., Chernozhukov, V., Kurz, M. S., & Spindler, M. (2024). DoubleML: An Object-Oriented Implementation of Double Machine Learning in Python.

Journal of Machine Learning Research.

Bach and colleagues developed the DoubleML Python and R package, providing a user-friendly object-oriented implementation of the DML framework. The package supports partially linear, interactive, and instrumental variable models with a variety of machine learning methods for nuisance estimation.

Fan, Q., Hsu, Y.-C., Lieli, R. P., & Zhang, Y. (2022). Estimation of Conditional Average Treatment Effects with High-Dimensional Data.

Journal of Business & Economic Statistics. DOI: 10.1080/07350015.2020.1811102

Fan and colleagues developed methods for estimating CATEs using DML-type approaches in high-dimensional settings with applications to economics and business research. They showed how doubly robust estimation combined with machine learning can uncover meaningful treatment effect heterogeneity.

Chernozhukov, V., Hausman, J. A., & Newey, W. K. (2022). Locally Robust Semiparametric Estimation.

Econometrica. DOI: 10.3982/ECTA16294

Chernozhukov, Hausman, and Newey developed locally robust semiparametric estimators that extend the DML framework, demonstrating how automatic debiasing with machine learning first-stage estimates can be applied broadly. Their approach yields root-n consistent estimates of causal and structural parameters even when nuisance functions are estimated with regularized machine learning methods.

Knaus, M. C., Lechner, M., & Strittmatter, A. (2021). Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence.

Econometrics Journal. DOI: 10.1093/ectj/utaa014

Knaus, Lechner, and Strittmatter applied DML-based methods to estimate heterogeneous causal effects of a Swiss active labor market program, comparing causal forests, DML, and other machine learning approaches. The paper provides an empirical Monte Carlo framework that uses real data to benchmark different estimators, offering practical guidance for applied researchers choosing among machine learning causal inference tools.

Survey (2)

Athey, S., & Imbens, G. W. (2019). Machine Learning Methods That Economists Should Know About.

Annual Review of Economics. DOI: 10.1146/annurev-economics-080217-053433

Athey and Imbens provided a comprehensive overview of machine learning methods relevant to economists, with DML as a centerpiece. They explained when and why machine learning methods can improve causal inference and prediction in economics, making these tools accessible to applied researchers.

Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach.

Journal of Economic Perspectives. DOI: 10.1257/jep.31.2.87

Mullainathan and Spiess provided an accessible introduction to machine learning for economists, clarifying the distinction between prediction and causal inference tasks. They discussed how methods like DML use machine learning for prediction of nuisance functions while maintaining valid causal inference, a framing widely adopted in management and strategy research.

Tags

ml-causal, high-dimensional, frontier