The package ShiftShareSE implements confidence intervals proposed by Adão, Kolesár, and Morales (2019) for inference in shift-share least squares and instrumental variables regressions, in which the regressor of interest (or the instrument) has a shift-share structure, as in Bartik (1991). A shift-share variable has the structure $X_{i}=\sum_{s=1}^{S}w_{is}\Xs_{s}$, where $i$ indexes regions, $s$ indexes sectors, $\Xs_{s}$ are sectoral shifters (or shocks), and $w_{is}$ are shares, such as the initial share of region $i$’s employment in sector $s$.
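As a minimal illustration of this structure, the sketch below builds a shift-share variable from a made-up share matrix and made-up shifters. The numbers and the names `W`, `Xs`, and `X` are hypothetical and are not taken from the package data.

```r
# Hypothetical toy example: 3 regions, 2 sectors
W <- matrix(c(0.5, 0.2,
              0.1, 0.6,
              0.3, 0.3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("region", 1:3), paste0("sector", 1:2)))
Xs <- c(sector1 = 2, sector2 = -1)  # sectoral shifters

# X_i = sum_s w_is * Xs_s, computed for all regions at once
X <- drop(W %*% Xs)
X
# region1 = 0.8, region2 = -0.4, region3 = 0.3
```

Each region's value is a share-weighted sum of the same sectoral shifters, so regions with similar shares get similar values of the regressor.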
This vignette illustrates the use of the package using a dataset from Autor, Dorn, and Hanson (2013) (ADH hereafter). The dataset is included in the package as the list ADH. The first element of the list, ADH$reg, is a data frame with regional variables; the second element, ADH$sic, is a vector of SIC codes for the sectors; and ADH$W is a matrix of shares. See ?ADH for a description of the dataset.
We now replicate column (1) of Table V in Adão, Kolesár, and Morales (2019). First we load the package, define the vector of controls, and define a vector of 3-digit SIC codes:
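library("ShiftShareSE")
# Controls, written as the right-hand side of a formula
ctrls <- paste("t2 + l_shind_manuf_cbp + l_sh_popedu_c +",
    "l_sh_popfborn + l_sh_empl_f + l_sh_routine33", " + l_task_outsource + division")
# Drop the last digit of each SIC code to obtain 3-digit codes
sic <- floor(ADH$sic/10)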
We cluster the standard errors at the 3-digit SIC code (using the option sector_cvar), and, following ADH, weight the data using the weights ADH$reg$weights. See ?reg_ss and ?ivreg_ss for a full description of the options.
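To see what clustering at the 3-digit level groups together, the short sketch below applies the same truncation to a few made-up 4-digit codes. The codes are hypothetical and chosen only for illustration.

```r
# Hypothetical 4-digit SIC codes (not taken from ADH$sic)
sic4 <- c(2011, 2015, 2021, 3312, 3317)

# Same truncation as above: integer-divide by 10 to drop the last digit
sic3 <- floor(sic4 / 10)
sic3

# Sectors sharing a 3-digit code fall in the same cluster
table(sic3)
```

Here the first two codes share a cluster (201), as do the last two (331), while 2021 forms a cluster of its own (202).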
The first-stage regression:
reg_ss(as.formula(paste("shock ~ ", ctrls)), W = ADH$W,
    X = IV, data = ADH$reg, weights = weights, region_cvar = statefip,
    sector_cvar = sic, method = "all")
#> Estimate: 0.6310409
#>
#> Inference:
#>               Std. Error      p-value  Lower CI  Upper CI
#> Homoscedastic 0.02732516 0.000000e+00 0.5774846 0.6845973
#> EHW           0.08700719 4.083400e-13 0.4605100 0.8015719
#> Reg. cluster  0.09142372 5.113909e-12 0.4518537 0.8102281
#> AKM           0.05296055 0.000000e+00 0.5272402 0.7348417
#> AKM0          0.07671358 1.282891e-03 0.5375710 0.8382827
Note that for "AKM0", "Std. Error" corresponds to the normalized standard error, i.e. the length of the confidence interval divided by $2z_{1-\alpha/2}$.
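The normalization can be checked by hand from the printed interval. The sketch below uses the AKM0 bounds from the first-stage output and assumes a 95% interval ($\alpha=0.05$), which is consistent with the other rows of that table; the variable names are illustrative only.

```r
alpha <- 0.05                 # assumed significance level
z <- qnorm(1 - alpha / 2)     # standard normal critical value

# AKM0 confidence interval from the first-stage output
ci <- c(lower = 0.5375710, upper = 0.8382827)

# Normalized standard error: interval length divided by 2 * z
se_norm <- unname(diff(ci)) / (2 * z)
se_norm
# close to the reported 0.07671358
```

Because the AKM0 interval need not be symmetric around the point estimate (here 0.6310409), this normalized value summarizes its length and is not a conventional standard error.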
The reduced-form and IV regressions:
reg_ss(as.formula(paste("d_sh_empl ~", ctrls)), W = ADH$W,
    X = IV, data = ADH$reg, region_cvar = statefip, weights = weights,
    sector_cvar = sic, method = "all")
#> Estimate: -0.4885687
#>
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic 0.06332778 1.221245e-14 -0.6126889 -0.3644485
#> EHW           0.11244360 1.392685e-05 -0.7089541 -0.2681833
#> Reg. cluster  0.07578147 1.140306e-10 -0.6370977 -0.3400398
#> AKM           0.16419445 2.924641e-03 -0.8103839 -0.1667535
#> AKM0          0.25437489 4.218033e-04 -1.2368853 -0.2397541
ivreg_ss(as.formula(paste("d_sh_empl ~", ctrls, "| shock")),
    W = ADH$W, X = IV, data = ADH$reg, region_cvar = statefip,
    weights = weights, sector_cvar = sic, method = "all")
#> Estimate: -0.7742267
#>
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic  0.1069532 4.523049e-13 -0.9838511 -0.5646022
#> EHW            0.1647892 2.623532e-06 -1.0972075 -0.4512459
#> Reg. cluster   0.1758096 1.063809e-05 -1.1188071 -0.4296462
#> AKM            0.2403730 1.277718e-03 -1.2453492 -0.3031041
#> AKM0           0.3318966 4.218033e-04 -1.6903240 -0.3893132
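With a single instrument and the same controls and weights in all three regressions, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient. The sketch below checks this arithmetic using the three point estimates printed above; the variable names are illustrative.

```r
first_stage  <- 0.6310409    # coefficient on the instrument, outcome: shock
reduced_form <- -0.4885687   # coefficient on the instrument, outcome: d_sh_empl

# Just-identified IV: ratio of reduced form to first stage
iv <- reduced_form / first_stage
iv
# close to the reported -0.7742267
```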
We now discuss how the methods in Adão, Kolesár, and Morales (2019) extend to the case where there are multiple shifters, or, in the case of an IV regression, multiple endogenous variables. Currently, these extensions are not implemented in the package.
Suppose that we’re interested in the effect of a $k$-vector of shift-share regressors, $X_i=\sum_{s}w_{is}\Xs_{s}$, where $\Xs_{s}$ is a vector of length $k$. For inference on the coefficient on the $j$th element of $X_{i}$, we proceed as if this was the only shift-share regressor, treating the remaining shifters as part of the controls.
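The point-estimate side of this recipe can be sketched in base R on simulated data. Everything below is hypothetical (sizes, coefficients, variable names), and the sketch covers only the coefficient, not the AKM standard errors. It shows that moving the second shift-share regressor into the controls and partialling it out leaves the coefficient on the first unchanged.

```r
set.seed(42)
N <- 100; S <- 20                       # hypothetical numbers of regions and sectors
W <- matrix(runif(N * S), N, S)
W <- W / rowSums(W)                     # shares sum to one within each region
Xs <- matrix(rnorm(S * 2), S, 2)        # two shifters per sector (k = 2)
X <- W %*% Xs                           # N x 2 matrix of shift-share regressors

d <- data.frame(y = drop(X %*% c(1, 2)) + rnorm(N),
                X1 = X[, 1], X2 = X[, 2], z = rnorm(N))

# Full regression with both shift-share regressors
fit_all <- lm(y ~ X1 + X2 + z, data = d)

# Treat X2 as a control: residualize X1 on the intercept, X2, and z
d$X1dd <- resid(lm(X1 ~ X2 + z, data = d))
fit_one <- lm(y ~ X1dd - 1, data = d)

c(full = unname(coef(fit_all)["X1"]), partialled = unname(coef(fit_one)["X1dd"]))
```

In terms of the package interface, this presumably corresponds to passing the $j$th shift-share regressor as X and adding the remaining ones to the right-hand side of the formula, although the vignette does not show such a call.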
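Consider a general setup with eqs. (30) and (31) in the paper replaced by
$$Y_{1i}=Y_{1i}(0)+Y_{2i}'\alpha,\qquad Y_{2i}=Y_{2i}(0)+\sum_{s}w_{is}B_{is}'\Xs_{s},$$
with $\Xs$ and $Y_{2}$ now both vectors, and $B_{is}$ of dimensions $\dim(\Xs)\times \dim(Y_{2})$. If $\Xs=Y_{2}$, the setup reduces to that in section . If $Y_{2}$ is scalar, the setup reduces to that in section .

The two-stage least squares estimator of $\alpha$ is given by
$$\hat{\alpha}=(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{1},$$
where $\ddot{X}$ denotes $X$ with the controls $Z$ partialled out. With scalar $X_{i}$ and $Y_{2i}$, this expression reduces to eq. (33) in the paper. Now,
$$\hat{\alpha}-\alpha=(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'(Y_{1}-Y_{2}\alpha).$$
Suppose that
$$E[\Xs_{s}\mid\mathcal{F}_{0}]=\Gamma'\Zs_{s}$$
for some coefficient matrix $\Gamma$, where $\mathcal{F}_{0}=(Y_{1}(0),Y_{2}(0),W,\Zs,U,B)$.

Let $\delta$ be the coefficient on $Z$ in the regression of $Y_{1i}-Y_{2i}'\alpha$ onto $Z_{i}$, and let $\epsilon_{i}=Y_{1i}-Y_{2i}'\alpha-Z_{i}'\delta=Y_{1i}(0)-Z_{i}'\delta$. Then, as in the proof of Proposition 4 in the paper,
$$\begin{aligned}\ddot{X}'(Y_{1}-Y_{2}\alpha)&=\ddot{X}'\epsilon\\&\approx\sum_{s}(\Xs_{s}-\Gamma'\Zs_{s})\sum_{i}w_{is}\epsilon_{i},\end{aligned}$$
where the second line follows by arguments in that proof. Now, since $\Xs_{s}$ is independent across $s$ conditional on $\mathcal{F}_{0}$, it follows that, conditional on $\mathcal{F}_{0}$,
$$\operatorname{var}\Bigl(\sum_{s}(\Xs_{s}-\Gamma'\Zs_{s})R_{s}\Bigm|\mathcal{F}_{0}\Bigr)=\sum_{s}\operatorname{var}(\Xs_{s}\mid\mathcal{F}_{0})R_{s}^{2},$$
where $R_{s}=\sum_{i}w_{is}\epsilon_{i}$. This leads to the variance formula
$$\widehat{\operatorname{var}}(\hat{\alpha})=(\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}\hat{B}'\Bigl(\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s}^{2}\Bigr)\hat{B}(\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1},$$
where $\hat{R}_{s}=\sum_{i}w_{is}\hat{\epsilon}_{i}$, $\widehat{\Xs}=(W' W)^{-1}W'\ddot{X}$ as in eq. (36) in the paper, with rows $\widehat{\Xs}_{s}'$, and $\hat{B}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}$ is a matrix of the first-stage coefficients. Here $\hat{\epsilon}_{i}$ is an estimate of the structural residual, such as
$$\hat{\epsilon}=Y_{1}-Y_{2}\hat{\alpha}-Z(Z'Z)^{-1}Z'(Y_{1}-Y_{2}\hat{\alpha}).$$
For standard errors, take the square root of the appropriate diagonal element.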
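Since this extension is not implemented in the package, the following base-R sketch shows one way the pieces named above ($\ddot{X}$, $\hat{B}$, $\widehat{\Xs}$, $\hat{R}_{s}$, $\hat{\epsilon}_{i}$) could be combined on simulated data. It assumes an unweighted regression and a sandwich form for the variance, $(\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}\hat{B}'(\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s}^{2})\hat{B}(\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}$. The data-generating process, the helper `resid_on`, and all variable names are hypothetical.

```r
set.seed(1)
N <- 200; S <- 30; k <- 2                  # regions, sectors, shifters (hypothetical)
W <- matrix(runif(N * S), N, S)
W <- W / rowSums(W)                        # shares sum to one in each region
Xs <- matrix(rnorm(S * k), S, k)           # sectoral shifters, one row per sector
X <- W %*% Xs                              # shift-share instruments
Z <- cbind(1, rnorm(N))                    # controls, including an intercept
alpha <- c(1, -0.5)                        # true structural coefficients
v <- matrix(rnorm(N * k), N, k)            # first-stage errors
Y2 <- X %*% matrix(c(1, 0.3, 0.2, 1), k, k) + 0.3 * v
Y1 <- drop(Y2 %*% alpha + Z %*% c(0.5, 1) + 0.3 * (v[, 1] + rnorm(N)))

# Residual from projecting A on the columns of Z
resid_on <- function(A, Z) A - Z %*% solve(crossprod(Z), crossprod(Z, A))

Xdd <- resid_on(X, Z)                                 # X with controls partialled out
Bhat <- solve(crossprod(Xdd), crossprod(Xdd, Y2))     # first-stage coefficients
H <- crossprod(Xdd %*% Bhat)                          # Bhat' Xdd' Xdd Bhat
alpha_hat <- solve(H, crossprod(Xdd %*% Bhat, Y1))    # two-stage least squares

eps_hat <- drop(resid_on(Y1 - Y2 %*% alpha_hat, Z))   # structural residual
Xs_hat <- solve(crossprod(W), crossprod(W, Xdd))      # (W'W)^{-1} W' Xdd
R_hat <- drop(crossprod(W, eps_hat))                  # R_s = sum_i w_is * eps_i

meat <- crossprod(Xs_hat * R_hat)                     # sum_s Xs_s Xs_s' R_s^2
Hinv <- solve(H)
V <- Hinv %*% t(Bhat) %*% meat %*% Bhat %*% Hinv      # assumed sandwich form
se <- sqrt(diag(V))                                   # standard errors
cbind(estimate = drop(alpha_hat), se = se)
```

The standard errors come from the diagonal of `V`, one per element of $\alpha$. Weights and the AKM0 variant are deliberately left out of this sketch.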
The AKM0 version is a little tricky here if $\dim(\alpha)>1$ and we’re only interested in inference on one element of $\alpha$, say the first: this is analogous to issues with using the Anderson-Rubin test in a setting with multiple endogenous variables.
If we do not require validity under weak instruments, then the analog of the "alternative AKM0" procedure from the preceding subsection uses the estimate $(\alpha_{10},\hat{\alpha}_{-1}(\alpha_{10}))$ in place of $\hat{\alpha}$ in (), where $\alpha_{10}$ is the null hypothesized value, and $\hat{\alpha}_{-1}(\alpha_{10})$ is the estimate of the remaining elements of $\alpha$ with the null $H_{0}\colon\alpha_{1}=\alpha_{10}$ imposed.
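One natural reading of the restricted estimate, offered here as an assumption and not as a formula taken from the paper or the package, is two-stage least squares applied to the remaining endogenous variables after moving the hypothesized term to the left-hand side. Writing $Y_{2}=(Y_{2,1},Y_{2,-1})$, a partition introduced only for this sketch, that would give:

```latex
\hat{\alpha}_{-1}(\alpha_{10})
  = \bigl(Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2,-1}\bigr)^{-1}
    Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'\bigl(Y_{1}-Y_{2,1}\alpha_{10}\bigr)
```

Under this reading, the structural residual is then computed from $(\alpha_{10},\hat{\alpha}_{-1}(\alpha_{10}))$, so that the null is imposed when estimating the variance.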