Lab: Matching Methods from Scratch
Implement propensity score matching, nearest-neighbor matching, and coarsened exact matching step by step. Learn to assess covariate balance, choose between matching methods, and understand when matching fails.
Overview
Matching methods estimate causal effects by pairing treated units with similar control units based on observed characteristics. The idea is simple: if you can find a control unit that looks exactly like a treated unit on every relevant dimension, their difference in outcomes estimates the treatment effect.
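This pairing logic can be sketched in a few lines of R. The example below is a toy with made-up numbers (not the lab's simulation), matching on a single covariate:

```r
# Toy illustration: pair each treated unit with the control closest in
# age, then average the treated-minus-control outcome differences
treated  <- data.frame(age = c(25, 40), y = c(12, 18))
controls <- data.frame(age = c(24, 26, 39, 55), y = c(10, 11, 16, 20))

att <- mean(sapply(seq_len(nrow(treated)), function(i) {
  j <- which.min(abs(controls$age - treated$age[i]))  # nearest on age
  treated$y[i] - controls$y[j]
}))
att  # 2: both matched pairs differ by 2
```

Real applications must match on many covariates at once, which is where the propensity score comes in.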
What you will learn:
- How to estimate propensity scores and assess their overlap
- How to implement nearest-neighbor matching (with and without replacement)
- How to implement propensity score matching (PSM) and inverse probability weighting (IPW)
- How to assess covariate balance before and after matching
- When matching works and when it fails (unobserved confounders)
Prerequisites: OLS regression (see the OLS lab). Understanding of selection bias.
Step 1: The Setting
We simulate a job training program inspired by classic evaluations of such programs. Workers decide whether to enroll in a job training program, and we observe their earnings afterward. The challenge: workers who choose to enroll differ systematically from workers who do not (selection bias).
library(MatchIt)
library(cobalt)
library(estimatr)
set.seed(2024)
n <- 3000
# Pre-treatment covariates
age <- round(pmin(pmax(rnorm(n, 33, 10), 18), 60))
education <- round(pmin(pmax(rnorm(n, 11, 3), 5), 18))
married <- rbinom(n, 1, 0.4)
black <- rbinom(n, 1, 0.3)
hispanic <- rbinom(n, 1, 0.1)
prev_earnings <- rlnorm(n, 9.5, 1.0)
# Selection into treatment (depends on covariates — NOT random!)
propensity_true <- plogis(-2 - 0.03 * age + 0.15 * education -
0.5 * married + 0.8 * black +
0.3 * hispanic - 0.0001 * prev_earnings)
treatment <- rbinom(n, 1, propensity_true)
# Outcome: post-training earnings (true treatment effect = $1,500)
true_ate <- 1500
earnings <- 10000 + 200 * age + 500 * education + 2000 * married -
1000 * black + 0.3 * prev_earnings +
true_ate * treatment + rnorm(n, 0, 3000)
df <- data.frame(earnings, treatment, age, education, married,
black, hispanic, prev_earnings)
# Naive difference is biased due to selection on observables
cat("Naive difference:", mean(earnings[treatment == 1]) -
mean(earnings[treatment == 0]), "\n")
cat("True effect:", true_ate, "\n")
Expected output:
Sample: 3000 workers, 852 treated (28.4%)
Naive difference in means:
Treated mean earnings: $26,215
Control mean earnings: $25,348
Naive difference: $867
True treatment effect: $1,500
| Statistic | Value |
|---|---|
| N (total) | 3,000 |
| N (treated) | ~852 |
| Treatment rate | ~28% |
| Treated mean earnings | ~$26,215 |
| Control mean earnings | ~$25,348 |
| Naive difference | ~$867 |
| True ATE | $1,500 |
The naive difference (~$867) understates the true effect ($1,500) because treated individuals differ systematically from controls on characteristics that also affect earnings.
Step 2: Assess Covariate Imbalance (Pre-Matching)
Before matching, document how different the treated and control groups are.
# Balance table using cobalt
bal.tab(treatment ~ age + education + married + black + hispanic +
prev_earnings, data = df, un = TRUE,
stats = c("m", "v"), thresholds = c(m = 0.1))
# Standardized mean differences > 0.1 indicate imbalance
Expected output:
Pre-Matching Covariate Balance
----------------------------------------------------------------------
Variable Treated Control Diff Std Diff
----------------------------------------------------------------------
age 31.25 33.82 -2.57 -0.254
education 11.78 10.72 1.06 0.345
married 0.31 0.44 -0.13 -0.262
black 0.42 0.25 0.17 0.377
hispanic 0.12 0.09 0.03 0.102
prev_earnings 12845.20 15621.40 -2776.20 -0.183
----------------------------------------------------------------------
Standardized differences > 0.1 (10%) indicate meaningful imbalance.
| Variable | Treated | Control | Std Diff | Balanced? |
|---|---|---|---|---|
| age | 31.25 | 33.82 | -0.254 | No |
| education | 11.78 | 10.72 | 0.345 | No |
| married | 0.31 | 0.44 | -0.262 | No |
| black | 0.42 | 0.25 | 0.377 | No |
| hispanic | 0.12 | 0.09 | 0.102 | Borderline |
| prev_earnings | 12,845 | 15,621 | -0.183 | No |
Nearly all covariates show standardized differences well above the 0.10 threshold, confirming substantial pre-matching imbalance.
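The standardized mean difference in the table can be computed by hand. The sketch below uses the pooled-standard-deviation convention on its own simulated data; note that cobalt's defaults differ slightly (for example, it reports raw differences for binary covariates):

```r
# Standardized mean difference by hand (pooled-SD convention; cobalt's
# defaults differ slightly, e.g. raw differences for binary covariates)
smd <- function(x, t) {
  m1 <- mean(x[t == 1]); m0 <- mean(x[t == 0])
  s  <- sqrt((var(x[t == 1]) + var(x[t == 0])) / 2)  # pooled SD
  (m1 - m0) / s
}

set.seed(1)
t <- rbinom(500, 1, 0.3)
x <- rnorm(500, mean = 10 + 2 * t)  # imbalanced by construction
smd(x, t)                           # roughly 2 -- far above the 0.1 rule
```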
Step 3: Estimate Propensity Scores
The propensity score is the probability of being treated given observed covariates: e(X) = Pr(T = 1 | X). Matching on the propensity score is equivalent to matching on all covariates simultaneously (Rosenbaum & Rubin, 1983).
# Estimate propensity scores
ps_model <- glm(treatment ~ age + education + married + black +
hispanic + prev_earnings,
data = df, family = binomial)
df$pscore <- predict(ps_model, type = "response")
# Overlap check
library(ggplot2)
ggplot(df, aes(x = pscore, fill = factor(treatment))) +
geom_histogram(alpha = 0.5, position = "identity", bins = 50) +
scale_fill_manual(values = c("#0072B2", "#D55E00"),
labels = c("Control", "Treated")) +
labs(title = "Propensity Score Overlap",
x = "Propensity Score", fill = "Group") +
theme_minimal()
Expected output:
Propensity Score Summary:
| Statistic | Treated | Control |
|---|---|---|
| count | 852 | 2148 |
| mean | 0.345 | 0.254 |
| std | 0.138 | 0.121 |
| min | 0.042 | 0.015 |
| 25% | 0.238 | 0.165 |
| 50% | 0.332 | 0.240 |
| 75% | 0.438 | 0.328 |
| max | 0.785 | 0.742 |
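Beyond eyeballing the histogram, you can check common support directly: drop units whose propensity score falls outside the range observed in both groups. A minimal sketch on simulated scores (the bounds here are illustrative; analyses often also trim extreme scores by convention):

```r
# Common-support check by hand: keep only units whose propensity score
# lies inside the range observed in BOTH groups (simulated scores here)
set.seed(11)
ps <- c(runif(100, 0.05, 0.80), runif(100, 0.01, 0.75))
t  <- rep(c(1, 0), each = 100)
lo <- max(min(ps[t == 1]), min(ps[t == 0]))  # highest of the two minima
hi <- min(max(ps[t == 1]), max(ps[t == 0]))  # lowest of the two maxima
in_support <- ps >= lo & ps <= hi
table(group = t, in_support)
```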
Step 4: Propensity Score Matching
Match each treated unit to the control unit with the closest propensity score.
# Propensity score matching using MatchIt
m_ps <- matchit(treatment ~ age + education + married + black +
hispanic + prev_earnings,
data = df, method = "nearest",
distance = "glm", replace = TRUE)
summary(m_ps)
# Estimate ATT on matched sample
matched_data <- match.data(m_ps)
att_model <- lm_robust(earnings ~ treatment, data = matched_data,
se_type = "HC2")
cat("\nATT estimate:", coef(att_model)["treatment"], "\n")
cat("True effect:", true_ate, "\n")
Expected output:
Propensity Score Matching (1:1, nearest neighbor)
ATT estimate: $1,582
True effect: $1,500
N treated: 852
N matched controls: 748 unique
| Estimator | ATT Estimate | True ATE |
|---|---|---|
| PSM (1:1 NN) | ~$1,582 | $1,500 |
The PSM estimate is close to the true effect. Some control units are matched to more than one treated unit (748 unique controls for 852 treated units), as expected when matching with replacement.
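To see what matchit() is doing under the hood, here is a hand-rolled 1:1 nearest-neighbor match with replacement on the propensity score, run on its own small simulation. This is a sketch, not a substitute for MatchIt, which also handles ties, match order, and standard errors:

```r
# Hand-rolled 1:1 nearest-neighbor matching with replacement on the
# estimated propensity score (toy simulation; true ATT = 1.5)
set.seed(42)
n <- 500
x <- rnorm(n)
t <- rbinom(n, 1, plogis(0.5 * x))          # selection depends on x
y <- 2 + x + 1.5 * t + rnorm(n)             # x confounds the comparison

ps <- fitted(glm(t ~ x, family = binomial)) # estimated propensity score

ti <- which(t == 1)                         # treated indices
ci <- which(t == 0)                         # control indices
# for each treated unit, the control with the closest pscore (reusable)
match_idx <- sapply(ti, function(i) ci[which.min(abs(ps[ci] - ps[i]))])
att_hat <- mean(y[ti] - y[match_idx])
att_hat                                     # should land near 1.5
```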
Step 5: Assess Post-Matching Balance
The primary objective of matching is to make treated and control groups comparable on observed characteristics. After matching, it is essential to verify whether the procedure achieved adequate covariate balance.
# Post-matching balance using cobalt
bal.tab(m_ps, un = TRUE, stats = c("m", "v"), thresholds = c(m = 0.1))
# Love plot (visual balance assessment)
love.plot(m_ps, thresholds = c(m = 0.1),
var.order = "unadjusted",
title = "Covariate Balance Before and After Matching")
Expected output:
Post-Matching Covariate Balance
-------------------------------------------------------
Variable Treated Control Std Diff
-------------------------------------------------------
age 31.25 31.48 -0.023
education 11.78 11.72 0.020
married 0.31 0.30 0.020
black 0.42 0.41 0.022
prev_earnings 12845.20 12918.50 -0.005
-------------------------------------------------------
All standardized differences should be below 0.1 after matching.
| Variable | Pre-Match Std Diff | Post-Match Std Diff | Improved? |
|---|---|---|---|
| age | -0.254 | -0.023 | Yes |
| education | 0.345 | 0.020 | Yes |
| married | -0.262 | 0.020 | Yes |
| black | 0.377 | 0.022 | Yes |
| prev_earnings | -0.183 | -0.005 | Yes |
All standardized differences are now well below the 0.10 threshold, indicating excellent post-matching balance.
After matching, you find that standardized differences are below 0.05 for all covariates. Does this mean your treatment effect estimate is unbiased?
Step 6: Alternative Matching Methods
Coarsened Exact Matching (CEM)
Instead of matching on the propensity score, CEM bins covariates into categories and matches exactly within bins.
# CEM using MatchIt
m_cem <- matchit(treatment ~ age + education + married + black +
hispanic + prev_earnings,
data = df, method = "cem",
cutpoints = list(age = c(25, 35, 45),
education = c(9, 12, 15)))
summary(m_cem)
# ATT estimate
cem_data <- match.data(m_cem)
att_cem <- lm_robust(earnings ~ treatment, data = cem_data,
weights = cem_data$weights, se_type = "HC2")
cat("CEM ATT:", coef(att_cem)["treatment"], "\n")
cat("True effect:", true_ate, "\n")
Expected output:
CEM: 2485 matched observations out of 3000 total
Matched cells: 142
CEM ATT estimate: $1,425
True effect: $1,500
| Estimator | Matched Obs | ATT Estimate | True ATE |
|---|---|---|---|
| CEM | ~2,485 | ~$1,425 | $1,500 |
CEM discards ~515 observations that have no exact match within their coarsened stratum. The estimate is close to the true effect.
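The CEM logic itself is simple enough to hand-roll: coarsen the covariates into bins, keep strata that contain both groups, and take a treated-weighted average of within-stratum differences. A sketch on its own simulated data (the bin cutpoints are arbitrary choices here):

```r
# Hand-rolled CEM sketch: coarsen age, keep bins containing both groups,
# average within-bin differences weighted by treated counts (true ATT = 10)
set.seed(7)
n <- 400
age <- runif(n, 18, 60)
t <- rbinom(n, 1, plogis((age - 40) / 10))   # older workers enroll more
y <- 100 + age + 10 * t + rnorm(n, 0, 5)

bin <- cut(age, breaks = c(18, 25, 35, 45, 60), include.lowest = TRUE)
both <- tapply(t, bin, function(z) any(z == 1) && any(z == 0))
keep <- bin %in% names(both)[both]           # strata with both groups

bins <- unique(as.character(bin[keep]))
att <- sum(sapply(bins, function(b) {
  sel <- keep & bin == b
  n_t <- sum(t[sel] == 1)                    # treated count = stratum weight
  n_t * (mean(y[sel & t == 1]) - mean(y[sel & t == 0]))
})) / sum(t[keep] == 1)
att
```

Coarser bins keep more observations but allow more within-bin imbalance; finer bins do the reverse. That is the central CEM tradeoff.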
Inverse Probability Weighting (IPW)
Instead of dropping unmatched units, IPW reweights the sample so that the weighted distribution of covariates is balanced.
# IPW for ATT
df$ipw_weight <- ifelse(df$treatment == 1, 1,
df$pscore / (1 - df$pscore))
# Trim extreme weights
df$ipw_weight_trimmed <- pmin(df$ipw_weight,
quantile(df$ipw_weight, 0.99))
# Weighted regression
m_ipw <- lm_robust(earnings ~ treatment, data = df,
weights = ipw_weight_trimmed, se_type = "HC2")
cat("IPW ATT:", coef(m_ipw)["treatment"], "\n")
cat("True effect:", true_ate, "\n")
Expected output:
IPW ATT estimate: $1,548
True effect: $1,500
Max weight (untrimmed): 12.5
Max weight (trimmed): 4.2
| Estimator | ATT Estimate | True ATE | Max Weight |
|---|---|---|---|
| IPW (trimmed) | ~$1,548 | $1,500 | ~4.2 |
Trimming extreme weights at the 99th percentile stabilizes the estimate while keeping it close to the true effect.
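The IPW ATT estimator reduces to two weighted means: the treated mean minus the control mean reweighted by pscore / (1 - pscore). A self-contained sketch on simulated data (without trimming, for clarity):

```r
# IPW ATT by hand: compare the treated mean with a reweighted control
# mean, where control weights are pscore / (1 - pscore) (true ATT = 2)
set.seed(3)
n <- 2000
x <- rnorm(n)
t <- rbinom(n, 1, plogis(x))                 # selection depends on x
y <- x + 2 * t + rnorm(n)

ps <- fitted(glm(t ~ x, family = binomial))
w  <- ps / (1 - ps)                          # odds of treatment

att <- mean(y[t == 1]) - weighted.mean(y[t == 0], w[t == 0])
att
```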
Step 7: Compare All Methods
# Naive difference (biased due to selection)
naive <- mean(df$earnings[df$treatment == 1]) -
mean(df$earnings[df$treatment == 0])
# OLS with controls (linear adjustment for confounders)
ols_ctrl <- lm_robust(earnings ~ treatment + age + education + married +
black + hispanic + prev_earnings,
data = df, se_type = "HC2")
# Compare all methods against the true effect
cat("=== Method Comparison ===\n")
cat(sprintf("True Effect: %7.0f\n", true_ate))
cat(sprintf("Naive: %7.0f\n", naive))
cat(sprintf("OLS + Controls: %7.0f\n", coef(ols_ctrl)["treatment"]))
cat(sprintf("PSM: %7.0f\n", coef(att_model)["treatment"]))
cat(sprintf("CEM: %7.0f\n", coef(att_cem)["treatment"]))
cat(sprintf("IPW: %7.0f\n", coef(m_ipw)["treatment"]))
Expected output:
=== Method Comparison ===
Method Estimate Bias
--------------------------------------------------
True Effect $1,500 $0
Naive Difference $867 -$633
OLS with Controls $1,478 -$22
PSM (1:1 NN) $1,582 +$82
CEM $1,425 -$75
IPW $1,548 +$48
| Method | Estimate | Bias | Bias (%) |
|---|---|---|---|
| True Effect | $1,500 | $0 | 0% |
| Naive Difference | ~$867 | -$633 | -42% |
| OLS with Controls | ~$1,478 | -$22 | -1% |
| PSM (1:1 NN) | ~$1,582 | +$82 | +5% |
| CEM | ~$1,425 | -$75 | -5% |
| IPW | ~$1,548 | +$48 | +3% |
All matching and weighting methods substantially reduce bias relative to the naive difference, which is off by $633. OLS, PSM, CEM, and IPW all land within about $100 of the true effect.
Step 8: Exercises
- Matching without replacement. Require each control unit to be matched at most once (replace = FALSE). How does this affect the estimate and balance?
- Caliper matching. Set a maximum distance for matches (caliper = 0.1 standard deviations of the propensity score). How many units are dropped? How does the estimate change?
- Sensitivity analysis. Use the Rosenbaum bounds approach to assess how sensitive your estimate is to an unobserved confounder. At what level of unobserved confounding does the estimate become insignificant?
- Bad control. Add a post-treatment variable (e.g., hours worked after training) to the matching covariates. Show that this can introduce bias by conditioning on a collider.
Summary
In this lab you learned:
- Matching reduces selection bias by comparing treated units to similar control units
- Propensity score matching, CEM, and IPW are three common approaches with different tradeoffs
- It is important to check and report covariate balance before and after matching
- Matching requires the "selection on observables" assumption — if unobserved confounders drive selection, matching will be biased
- Trim extreme propensity score weights to avoid instability
- Generally avoid matching on post-treatment variables (bad controls)
- Multiple matching methods should give similar results if the design is sound; large discrepancies signal problems