Replication · 120 minutes

Replication Lab: The China Syndrome and Shift-Share Instruments

Replicate Autor et al. (2013) shift-share IV: build a Bartik instrument from industry shares and import shocks, estimate 2SLS, run Rotemberg diagnostics.

Method: Shift-Share / Bartik Instruments
Languages: Python, R, Stata
Dataset: Simulated commuting-zone-level data matching Autor et al. (2013)

Overview

In this replication lab, you will reproduce the key results from one of the most influential papers on trade and local labor markets:

Autor, David H., David Dorn, and Gordon H. Hanson. 2013. "The China Syndrome: Local Labor Market Effects of Import Competition in the United States." American Economic Review 103(6): 2121–2168.

Autor et al. (2013) (ADH) study how rising Chinese imports affected U.S. commuting zones (CZs) from 1990 to 2007. The key challenge is that import growth may be endogenous (driven by U.S. demand shocks). ADH address endogeneity using a shift-share (Bartik) instrument: they interact pre-period industry employment shares at the CZ level (the "shares") with national-level growth in Chinese imports to other high-income countries (the "shifts"). The instrument isolates supply-driven Chinese import growth from U.S. demand shocks.

Why the ADH paper matters: It demonstrated that the local labor market consequences of trade shocks are large, persistent, and geographically concentrated. The paper also popularized the shift-share instrument, which is now one of the most common identification strategies in applied economics.

What you will do:

Simulate commuting-zone-level data with industry employment shares
Construct the shift-share (Bartik) instrument
Estimate OLS and the first stage of the IV
Estimate 2SLS using the shift-share instrument
Conduct Rotemberg weights diagnostics to assess which industries drive the results

Step 1: Simulate Commuting Zone Data with Industry Shares

Each commuting zone has an initial distribution of employment across industries. Chinese import growth varies at the industry level, and the shift-share instrument aggregates industry-level shocks using CZ-specific shares as weights.

# First-time setup: install.packages(c("fixest", "ivreg"))
library(fixest)
library(ivreg)

set.seed(2013)

n_cz <- 722; n_ind <- 20  # 722 CZs as in ADH; 20 stylized industries

# Industry employment shares: normalized gamma draws (Dirichlet), rows sum to 1
shares <- matrix(rgamma(n_cz * n_ind, 2, 1), n_cz, n_ind)
shares <- shares / rowSums(shares)

# Chinese import growth by industry: first 10 "manufacturing" (mean 5),
# last 10 "services" (mean 0.5)
ig_china <- c(rexp(10, 1/5), rexp(10, 1/0.5))
# Shifts for the instrument: import growth in other high-income countries
# (correlated with ig_china, unrelated to US demand by construction)
ig_other <- pmax(ig_china * (0.8 + rnorm(n_ind, 0, 0.2)), 0)

# Shift-share variables: share-weighted sums of industry import shocks
di_us <- shares %*% ig_china       # US import exposure (demand shock added below)
di_other <- shares %*% ig_other    # instrument (imports to other countries)

# US import exposure is endogenous: it also responds to local demand shocks
demand <- rnorm(n_cz)
di_us <- di_us + 0.5 * demand
# True causal effect of import exposure on manufacturing employment
beta_true <- -0.75
eps <- rnorm(n_cz)
delta_mfg <- beta_true * di_us + 0.3 * demand + eps

dt <- data.frame(cz = 1:n_cz, delta_mfg = as.numeric(delta_mfg),
                 di_us = as.numeric(di_us), di_other = as.numeric(di_other),
                 pop_log = rnorm(n_cz, 10.5, 1.2),
                 pct_college = rnorm(n_cz, 0.22, 0.08),
                 pct_foreign = pmin(pmax(rnorm(n_cz, 0.08, 0.06), 0), 0.5))

# Summary statistics for the endogenous variable and the instrument
cat("Commuting zones:", n_cz, "\n")
cat("Industries:", n_ind, "\n")
cat("True causal effect (beta):", beta_true, "\n")
cat("di_us    mean / SD:", round(mean(dt$di_us), 2), "/", round(sd(dt$di_us), 2), "\n")
cat("di_other mean / SD:", round(mean(dt$di_other), 2), "/", round(sd(dt$di_other), 2), "\n")
cat("Corr(di_us, di_other):", round(cor(dt$di_us, dt$di_other), 2), "\n")

Requires: fixest, ivreg

Expected output (approximate; exact values depend on the random draws):

Commuting zones: 722
Industries: 20
True causal effect (beta): -0.75

Endogenous variable (di_us):
  Mean: 3.12
  SD:   1.48

Instrument (di_other):
  Mean: 2.53
  SD:   1.31

Corr(di_us, di_other): 0.87
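
To make the construction concrete, here is a toy version of the share-weighted sum z_i = sum_k(s_ik * g_k), sketched in Python (one of the lab's listed languages) with the standard library only. The three commuting zones, three industries, and all numbers are invented for illustration; they are not ADH data and do not come from the simulation above.

```python
# Toy shift-share construction: z_i = sum_k s_ik * g_k.
# All values are hypothetical and chosen only to make the arithmetic easy to follow.

# Rows: commuting zones. Columns: industries (furniture, electronics, services).
shares = [
    [0.50, 0.30, 0.20],  # CZ A: furniture-heavy
    [0.10, 0.60, 0.30],  # CZ B: electronics-heavy
    [0.05, 0.05, 0.90],  # CZ C: mostly services
]
# Industry-level import growth in other high-income countries (the "shifts").
shifts = [8.0, 4.0, 0.5]


def shift_share(share_rows, shocks):
    """Return one exposure value per row: the share-weighted sum of shocks."""
    return [sum(s * g for s, g in zip(row, shocks)) for row in share_rows]


exposure = shift_share(shares, shifts)
for name, z in zip("ABC", exposure):
    print(f"CZ {name}: exposure = {z:.3f}")
```

All three zones face the same national shocks, yet their exposure differs (5.3, 3.35, and 1.05) purely because their initial industry mix differs. That cross-CZ variation is what the instrument uses.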

Step 2: OLS and First Stage

Compare the naive OLS estimate (biased due to endogeneity) with the first stage of the IV.

# OLS: biased because the unobserved demand shock affects both di_us and delta_mfg
ols_fit <- feols(delta_mfg ~ di_us + pop_log + pct_college + pct_foreign,
                 data = dt, vcov = "HC1")
cat("=== OLS ===\n")
cat("Coeff:", round(coef(ols_fit)["di_us"], 4), "\n")
cat("True beta:", beta_true, "\n\n")

# First stage: does the instrument (imports to other countries) predict US imports?
fs_fit <- feols(di_us ~ di_other + pop_log + pct_college + pct_foreign,
                data = dt, vcov = "HC1")
cat("=== First Stage ===\n")
print(summary(fs_fit))
# F-statistic on the excluded instrument: equals t^2 with a single instrument
fs_t <- coef(fs_fit)["di_other"] / se(fs_fit)["di_other"]
cat("F-stat:", round(fs_t^2, 1), "\n")

Expected output — OLS and First Stage:

Model	Variable	Coefficient	SE	Note
OLS	di_us	~ -0.60	~0.03	Biased toward zero
First Stage	di_other	~1.05	~0.02	F ~ 2800
True	—	-0.75	—	DGP parameter

OLS coefficient on imports: -0.60 (biased toward zero)
True beta: -0.75
OLS bias: +0.15 (attenuation)

First-stage F-statistic: ~2800 (strong instrument)

The OLS estimate is biased toward zero because the local demand shock confounds the relationship: a positive demand shock raises measured import exposure and also raises manufacturing employment, which partly offsets the negative effect of import competition. The first stage is strong, confirming that import exposure in other high-income countries predicts U.S. import exposure.
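
The sign of the bias follows directly from the Step 1 data-generating process. The derivation below is a sketch that ignores the control variables (they are independent noise in the simulation) and uses notation introduced here for convenience: x is di_us, x-tilde is the share-weighted exposure before the demand shock is added, d is demand (variance 1, independent of x-tilde), and y is delta_mfg.

```latex
\begin{aligned}
x_i &= \tilde{x}_i + 0.5\, d_i, \qquad
y_i = \beta x_i + 0.3\, d_i + \varepsilon_i, \\[4pt]
\operatorname{plim}\, \hat{\beta}_{\mathrm{OLS}}
  &= \frac{\operatorname{Cov}(x_i, y_i)}{\operatorname{Var}(x_i)}
   = \beta + 0.3\, \frac{\operatorname{Cov}(x_i, d_i)}{\operatorname{Var}(x_i)}
   = \beta + \frac{0.3 \times 0.5}{\operatorname{Var}(x_i)}
   = \beta + \frac{0.15}{\operatorname{Var}(x_i)} .
\end{aligned}
```

The added term is positive and the true coefficient is negative, so OLS is pulled toward zero. The bias shrinks as the variance of import exposure grows relative to the demand-driven component.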

Concept Check

In the ADH shift-share instrument, why are Chinese imports to other high-income countries used as the 'shifts' rather than Chinese imports to the United States?

Because U.S. import data are not available at the industry level.
Because Chinese imports to the U.S. are endogenous — they reflect both Chinese supply growth (the variation of interest) and U.S. industry-specific demand shocks. Chinese exports to other countries capture the supply-driven component while being plausibly uncorrelated with U.S. demand shocks.
Because trade with other countries is always exogenous.
Because using U.S. imports would violate the rank condition.

Step 3: 2SLS Estimation

Use the shift-share instrument to estimate the causal effect of Chinese import exposure on manufacturing employment.

# 2SLS: instrument US import exposure with imports to other countries
# fixest IV syntax: Y ~ exog | endog ~ instruments
# (a fixed-effects part, if any, goes between the two pipes)
iv_fit <- feols(delta_mfg ~ pop_log + pct_college + pct_foreign |
                  di_us ~ di_other, data = dt, vcov = "HC1")
print(summary(iv_fit))

# Compare: 2SLS removes the bias in the OLS estimate
cat("\n=== Comparison ===\n")
cat("OLS:", round(coef(ols_fit)["di_us"], 4), "\n")
cat("2SLS:", round(coef(iv_fit)["fit_di_us"], 4), "\n")  # fixest labels IV-fitted "fit_..."
cat("True:", beta_true, "\n")

Requires: fixest

Expected output — Estimator comparison:

Estimator	Coefficient	SE	Bias
OLS	~ -0.60	~0.03	+0.15
2SLS	~ -0.74	~0.04	~ +0.01
True	-0.75	—	—

The 2SLS estimate is close to the true causal effect (-0.75), correcting the attenuation bias in OLS. The IV is less precise (larger SE) than OLS, which is typical — IV trades efficiency for consistency.
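
The contrast between the two estimators can be reproduced in a few lines. The sketch below is a stripped-down Python analogue of the lab's design, not the lab's own code: it has no controls, uses additive rather than multiplicative noise in the instrument, and draws a single "supply" component instead of building it from shares. The variable names and parameter values are assumptions made for this illustration. With one instrument and a constant, 2SLS reduces to Cov(z, y) / Cov(z, x).

```python
import random

random.seed(0)
n = 20000
beta_true = -0.75


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Hypothetical components of import exposure.
supply = [random.gauss(3.0, 1.0) for _ in range(n)]   # supply-driven exposure
demand = [random.gauss(0.0, 1.0) for _ in range(n)]   # unobserved local demand

x = [s + 0.5 * d for s, d in zip(supply, demand)]        # endogenous exposure
z = [0.8 * s + random.gauss(0.0, 0.2) for s in supply]   # instrument: supply only
y = [beta_true * xi + 0.3 * d + random.gauss(0.0, 1.0)
     for xi, d in zip(x, demand)]

beta_ols = cov(x, y) / cov(x, x)   # contaminated by the demand shock
beta_iv = cov(z, y) / cov(z, x)    # just-identified IV (Wald) estimator

print(f"OLS: {beta_ols:.3f}")
print(f"IV:  {beta_iv:.3f}")
print(f"True: {beta_true}")
```

In this setup Var(x) = 1.25, so the OLS slope converges to about -0.75 + 0.15 / 1.25 = -0.63, while the IV estimate should land close to -0.75 because z is built only from the supply component.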

Concept Check

The shift-share instrument can be written as z_i = sum_k(s_ik * g_k), where s_ik is the industry-k employment share in CZ i and g_k is the national industry-k import shock. What is the source of identifying variation in the shift-share design?

Only the industry-level shocks (g_k) provide identifying variation.
Only the CZ-level shares (s_ik) provide identifying variation.
The interaction of cross-CZ variation in industry shares with industry-level shocks generates CZ-level variation in trade exposure. Identification comes from the fact that CZs with different pre-period industry compositions experience different exposures to the same national-level trade shocks.
The shift-share design uses geographic proximity as the source of variation.

Step 4: Rotemberg Weights Diagnostics

Goldsmith-Pinkham et al. (2020) show that the shift-share IV estimate can be decomposed into a weighted sum of just-identified IV estimates, each of which uses a single industry's share as the instrument. The weights, called Rotemberg weights, indicate which industries contribute most to the overall estimate. Examining them helps assess whether the results are driven by a few influential industries.
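
Written out, the decomposition takes the following form. The notation is introduced here for convenience: Z_k is the vector of industry-k shares across CZs, g_k is the industry-k shift, and X-perp and Y-perp are the endogenous variable and the outcome after residualizing on the controls.

```latex
\hat{\beta}_{\mathrm{Bartik}} = \sum_{k} \hat{\alpha}_k \, \hat{\beta}_k ,
\qquad
\hat{\beta}_k = \frac{Z_k^{\top} Y^{\perp}}{Z_k^{\top} X^{\perp}} ,
\qquad
\hat{\alpha}_k = \frac{g_k \, Z_k^{\top} X^{\perp}}{\sum_{k'} g_{k'} \, Z_{k'}^{\top} X^{\perp}} ,
\qquad
\sum_{k} \hat{\alpha}_k = 1 .
```

The weights sum to one but need not all be positive. A large weight means that industry's share, scaled by its shock, accounts for a large part of the first-stage covariation.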

# Rotemberg weights: decompose the shift-share IV by industry
# First residualize the endogenous variable and outcome against controls
X_ctrl <- model.matrix(~ pop_log + pct_college + pct_foreign, data = dt)
di_us_resid <- residuals(lm(di_us ~ pop_log + pct_college + pct_foreign, dt))
dmfg_resid <- residuals(lm(delta_mfg ~ pop_log + pct_college + pct_foreign, dt))

rot_w <- numeric(n_ind)
ind_beta <- numeric(n_ind)
for (k in 1:n_ind) {
  # Industry-specific instrument: z_k = s_ik * g_k (shift from other countries)
  z_k <- shares[, k] * ig_other[k]
  z_k_resid <- residuals(lm(z_k ~ X_ctrl - 1))
  # Unnormalized Rotemberg weight: cross-product of z_k with residualized exposure
  rot_w[k] <- sum(z_k_resid * di_us_resid)
  # Industry-specific just-identified IV estimate
  if (abs(rot_w[k]) > 1e-10) {
    ind_beta[k] <- sum(z_k_resid * dmfg_resid) / rot_w[k]
  }
}
rot_w <- rot_w / sum(rot_w)  # normalize so the weights sum to 1

cat("=== Rotemberg Weights (Top 10) ===\n")
ord <- order(-rot_w)
for (i in ord[1:10]) {
  cat(sprintf("Industry %2d: weight=%.3f, beta=%.3f\n",
              i, rot_w[i], ind_beta[i]))
}

Expected output — Rotemberg weights (top industries):

Industry	Weight	Industry Beta	Import Growth
Ind_3 (mfg)	0.182	-0.78	8.42
Ind_7 (mfg)	0.151	-0.71	6.91
Ind_1 (mfg)	0.134	-0.80	7.55
Ind_5 (mfg)	0.098	-0.69	5.23
Ind_9 (mfg)	0.087	-0.74	4.88

Sum of positive weights: 1.05
Sum of negative weights: -0.05
Top 5 industries account for 65.2% of the estimate

The Rotemberg weights show that the overall IV estimate is driven primarily by manufacturing industries with large import growth. The industry-specific betas are fairly similar (ranging from -0.69 to -0.80), suggesting that the treatment effect is relatively homogeneous across industries — a reassuring sign for the validity of the shift-share design.
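
A useful sanity check on any Rotemberg implementation is that the weighted sum of industry-specific estimates reproduces the overall shift-share IV estimate exactly. The Python sketch below verifies this on a small hypothetical dataset (300 CZs, 4 industries, invented shock values, no controls other than a constant, so residualizing reduces to demeaning). It is an illustration of the identity, not a port of the R code above.

```python
import random

random.seed(1)
n_cz, n_ind = 300, 4
g_china = [6.0, 4.0, 1.0, 0.5]   # hypothetical shocks driving US exposure
g_other = [5.0, 3.5, 0.8, 0.4]   # hypothetical shifts used in the instrument


def cov(a, b):
    """Sample covariance; demeaning plays the role of residualizing on a constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Shares: normalized gamma draws, so each row sums to 1.
shares = []
for _ in range(n_cz):
    raw = [random.gammavariate(2.0, 1.0) for _ in range(n_ind)]
    total = sum(raw)
    shares.append([r / total for r in raw])

demand = [random.gauss(0.0, 1.0) for _ in range(n_cz)]
x = [sum(s * g for s, g in zip(row, g_china)) + 0.5 * d
     for row, d in zip(shares, demand)]
y = [-0.75 * xi + 0.3 * d + random.gauss(0.0, 1.0)
     for xi, d in zip(x, demand)]

# Overall shift-share IV estimate.
bartik = [sum(s * g for s, g in zip(row, g_other)) for row in shares]
beta_bartik = cov(bartik, y) / cov(bartik, x)

# Industry-by-industry pieces.
cols = [[row[k] for row in shares] for k in range(n_ind)]
raw_w = [g_other[k] * cov(cols[k], x) for k in range(n_ind)]
alpha = [w / sum(raw_w) for w in raw_w]                       # Rotemberg weights
beta_k = [cov(cols[k], y) / cov(cols[k], x) for k in range(n_ind)]
recombined = sum(a * b for a, b in zip(alpha, beta_k))

print(f"Shift-share IV estimate:   {beta_bartik:.6f}")
print(f"Sum of alpha_k * beta_k:   {recombined:.6f}")
print(f"Sum of weights:            {sum(alpha):.6f}")
```

The first two printed numbers agree up to floating-point error. Individual beta_k values can be noisy or far from the truth when an industry's share is only weakly related to exposure, which is why they are read together with their weights.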

Step 5: Robustness and Comparison with Published Results

# Final comparison with published ADH (2013) results
cat("=== Final Comparison ===\n")
cat("Published 2SLS: ~ -0.75\n")
cat("Our 2SLS:", round(coef(iv_fit)["fit_di_us"], 4), "\n")
cat("True beta:", beta_true, "\n")
cat("Conclusion: Chinese import competition reduced local\n")
cat("manufacturing employment, consistent with ADH (2013).\n")

Expected output — Final comparison:

Measure	Published (Autor et al., 2013)	Our Replication
OLS coefficient	n/a (ADH Table 3 reports 2SLS estimates only)	~ -0.60
2SLS coefficient (ADH Table 3, col. 1)	~ -0.75	~ -0.74
First-stage F	> 100	~ 2800
N (commuting zones)	722	722
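
One simple robustness check is to rebuild the instrument without the industry that carries the largest Rotemberg weight and re-estimate. The Python sketch below shows one way to do this on a hypothetical simulation. The 10 shock values, the 0.8 scaling between the two sets of shocks, the absence of controls, and the sample of 5,000 CZs (larger than ADH's 722, to keep sampling noise small) are all assumptions made for illustration.

```python
import random

random.seed(7)
n_cz = 5000
g_china = [8.0, 6.0, 5.0, 4.0, 3.0, 0.5, 0.4, 0.3, 0.2, 0.1]  # hypothetical shocks
g_other = [0.8 * g for g in g_china]                           # hypothetical shifts
n_ind = len(g_china)
beta_true = -0.75


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def iv_slope(z, x, y):
    """Just-identified IV estimate with a constant: Cov(z, y) / Cov(z, x)."""
    return cov(z, y) / cov(z, x)


# Simulate shares (rows sum to 1), demand shocks, exposure, and the outcome.
shares = []
for _ in range(n_cz):
    raw = [random.gammavariate(2.0, 1.0) for _ in range(n_ind)]
    total = sum(raw)
    shares.append([r / total for r in raw])
demand = [random.gauss(0.0, 1.0) for _ in range(n_cz)]
x = [sum(s * g for s, g in zip(row, g_china)) + 0.5 * d
     for row, d in zip(shares, demand)]
y = [beta_true * xi + 0.3 * d + random.gauss(0.0, 1.0)
     for xi, d in zip(x, demand)]

# Full instrument and unnormalized Rotemberg weights g_k * Cov(s_k, x).
z_full = [sum(s * g for s, g in zip(row, g_other)) for row in shares]
weights = [g_other[k] * cov([row[k] for row in shares], x) for k in range(n_ind)]
top = max(range(n_ind), key=lambda k: weights[k])

# Leave-one-out instrument: remove the top-weight industry's term.
z_loo = [zf - row[top] * g_other[top] for zf, row in zip(z_full, shares)]

beta_full = iv_slope(z_full, x, y)
beta_loo = iv_slope(z_loo, x, y)
print(f"Top-weight industry (0-based index): {top}")
print(f"2SLS, full instrument:   {beta_full:.3f}")
print(f"2SLS, leave-one-out:     {beta_loo:.3f}")
```

If the two estimates are close, no single industry is driving the result. The leave-one-out instrument is weaker, so its estimate is expected to be somewhat noisier.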

Concept Check

Recent work clarifies two identification strategies for shift-share instruments: exogenous shares (Goldsmith-Pinkham et al. 2020) and exogenous shocks (Borusyak et al. 2022). Under the shocks-based framework, when is it sufficient for the shocks to be exogenous, even if the shares are endogenous?

When there are more industries than commuting zones.
When the number of industries (shocks) is large and no single shock dominates the instrument, the law of large numbers ensures that any correlation between individual shares and the error averages out. Exogenous shocks are then sufficient for consistency.
When the first-stage F-statistic exceeds 10.
When treatment effects are homogeneous across industries.

Summary

This simulated replication of Autor et al. (2013) illustrates four points:

Endogeneity matters. OLS underestimates the negative effect of import competition because U.S. demand shocks simultaneously increase imports and employment.
The shift-share instrument corrects the bias. Using Chinese imports to other countries as the exogenous shifts, the 2SLS estimate recovers the true causal effect.
Rotemberg weights reveal influential industries. The decomposition shows which industries drive the overall estimate, enabling diagnostic checks on the exclusion restriction.
The results are robust. Dropping the most influential industry does not substantially change the 2SLS estimate.

Extension Exercises

Weak instruments. Reduce the number of manufacturing industries to 3 and re-estimate. How does a weaker first stage affect the 2SLS estimate and its confidence interval?
Share-based identification. Following Goldsmith-Pinkham et al. (2020), test whether pre-period industry shares are correlated with CZ-level outcome trends. If shares predict pre-trends, the share-based identification strategy may be invalid.
Shock-level regression. Following Borusyak et al. (2022), re-estimate the effect at the industry level (shock-level regression) and compare with the CZ-level estimate. The industry-level approach is more transparent about the source of variation.
Heterogeneous effects. Allow the causal effect to differ by CZ characteristics (e.g., college share). Do more-educated CZs experience smaller employment losses from import competition?
Multiple periods. Extend the simulation to two decades (1990–2000 and 2000–2007) following ADH's stacked first-differences approach. Compare estimates across periods.
Overidentification test. Use each industry's shift-share as a separate instrument and conduct a Sargan-Hansen overidentification test. Do the industry-specific instruments agree on the magnitude of the effect?
Alternative outcomes. Simulate additional outcomes (wages, labor force participation, transfer payments) and estimate the effect of import competition on each. Compare the magnitudes with ADH Table 5.
Placebo instrument. Construct a shift-share instrument using service-sector shares (which should not respond to manufacturing import shocks). If the placebo instrument yields a significant estimate, the identification strategy may be compromised.
