The package ShiftShareSE implements confidence intervals
proposed by Adão et al. (2019) for
inference in shift-share least squares and instrumental variables
regressions, in which the regressor of interest (or the instrument) has
a shift-share structure, as in Bartik
(1991). A shift-share variable has the structure \(X_{i}=\sum_{s=1}^{S}w_{is}\Xs_{s}\), where
\(i\) indexes regions, \(s\) indexes sectors, \(\Xs_{s}\) are sectoral shifters (or
shocks), and \(w_{is}\) are shares,
such as the initial share of region \(i\)’s
employment in sector \(s\).
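To make the definition concrete, the base-R sketch below builds a shift-share variable from a small share matrix and a vector of shifters. The numbers are made up for illustration and are not taken from the ADH data. If the shares are stacked into an \(N\times S\) matrix \(W\) with rows \((w_{i1},\dotsc,w_{iS})\), the sum over sectors is the matrix product of \(W\) with the vector of shifters.

```r
# Hypothetical example: 3 regions (rows) and 2 sectors (columns).
# Entry w[i, s] is region i's share of employment in sector s (made-up values).
w <- matrix(c(0.2, 0.5, 0.1,
              0.3, 0.1, 0.6),
            nrow = 3, ncol = 2)
# One sectoral shifter (shock) per sector; made-up values.
shifters <- c(1.0, -2.0)
# X_i = sum_s w_is * shifter_s, computed for all regions at once.
x <- drop(w %*% shifters)
# The same quantity for region 1, written as an explicit sum over sectors.
x1 <- sum(w[1, ] * shifters)
x
```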
This vignette illustrates the package using a dataset from
Autor et al. (2013) (ADH hereafter), which is included in the package
as the list ADH. The first element of the list, ADH$reg, is a data frame
with regional variables; the second element, ADH$sic, is a vector
of SIC codes for the sectors; and ADH$W is a matrix of
shares. See ?ADH for a description of the dataset.
We now replicate column (1) of Table V in Adão et al. (2019). First, we load the package, define the vector of controls, and define a vector of 3-digit SIC codes:

```r
library("ShiftShareSE")
# Regional controls, entered on the right-hand side of each formula below
ctrls <- paste("t2 + l_shind_manuf_cbp + l_sh_popedu_c +",
               "l_sh_popfborn + l_sh_empl_f + l_sh_routine33", " + l_task_outsource + division")
# 3-digit SIC codes, used to cluster the standard errors at the sector level
sic <- floor(ADH$sic/10)
```

We cluster the standard errors at the 3-digit SIC code (using the
option sector_cvar) and, following ADH, weight the data
using the weights ADH$reg$weights. See ?reg_ss
and ?ivreg_ss for a full description of the options.
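The line `sic <- floor(ADH$sic/10)` drops the last digit of each sector's SIC code, so sectors sharing their first three digits fall into the same cluster. A toy illustration with made-up 4-digit codes (not the codes stored in `ADH$sic`):

```r
# Hypothetical 4-digit SIC codes for five sectors.
sic4 <- c(2011, 2015, 2021, 3571, 3572)
# Drop the last digit to obtain the 3-digit industry group used for clustering.
sic3 <- floor(sic4 / 10)
sic3
#> [1] 201 201 202 357 357
# Number of distinct sector clusters implied by the 3-digit codes.
length(unique(sic3))
#> [1] 3
```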
The first-stage regression:
```r
# First stage: regress `shock` on the shift-share variable `IV` and the controls
reg_ss(as.formula(paste("shock ~ ", ctrls)), W = ADH$W,
       X = IV, data = ADH$reg, weights = weights, region_cvar = statefip,
       sector_cvar = sic, method = "all")
#> Estimate: 0.6310409
#>
#> Inference:
#>               Std. Error      p-value  Lower CI  Upper CI
#> Homoscedastic 0.02732516 0.000000e+00 0.5774846 0.6845973
#> EHW           0.08700719 4.083400e-13 0.4605100 0.8015719
#> Reg. cluster  0.09142372 5.113909e-12 0.4518537 0.8102281
#> AKM           0.05296055 0.000000e+00 0.5272402 0.7348417
#> AKM0          0.07671358 1.282891e-03 0.5375710 0.8382827
```

Note that for "AKM0", "Std. Error"
corresponds to the normalized standard error, i.e., the length of the
confidence interval divided by \(2z_{1-\alpha/2}\), where \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution.
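As a quick check of this definition, the AKM0 interval from the first-stage output, \([0.5375710, 0.8382827]\), can be converted back into its normalized standard error with base R's `qnorm()`. The sketch assumes the intervals were computed at the default 95% level (\(\alpha=0.05\)).

```r
# AKM0 confidence interval copied from the first-stage output (95% level assumed).
lower <- 0.5375710
upper <- 0.8382827
alpha <- 0.05
# Normalized standard error: interval length divided by 2 * z_{1 - alpha/2}.
se_akm0 <- (upper - lower) / (2 * qnorm(1 - alpha / 2))
round(se_akm0, 6)
#> [1] 0.076714
```

Up to rounding, this matches the value 0.07671358 reported in the "Std. Error" column.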
The reduced-form and IV regressions:
```r
# Reduced form: regress the outcome `d_sh_empl` on the shift-share variable `IV`
reg_ss(as.formula(paste("d_sh_empl ~", ctrls)), W = ADH$W,
       X = IV, data = ADH$reg, region_cvar = statefip, weights = weights,
       sector_cvar = sic, method = "all")
#> Estimate: -0.4885687
#>
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic 0.06332778 1.221245e-14 -0.6126889 -0.3644485
#> EHW           0.11244360 1.392685e-05 -0.7089541 -0.2681833
#> Reg. cluster  0.07578147 1.140306e-10 -0.6370977 -0.3400398
#> AKM           0.16419445 2.924641e-03 -0.8103839 -0.1667535
#> AKM0          0.25437489 4.218033e-04 -1.2368853 -0.2397541
```
```r
# IV regression: `shock` (after `|`) is the endogenous regressor, instrumented by `IV`
ivreg_ss(as.formula(paste("d_sh_empl ~", ctrls, "| shock")),
         W = ADH$W, X = IV, data = ADH$reg, region_cvar = statefip,
         weights = weights, sector_cvar = sic, method = "all")
#> Estimate: -0.7742267
#>
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic  0.1069532 4.523049e-13 -0.9838511 -0.5646022
#> EHW            0.1647892 2.623532e-06 -1.0972075 -0.4512459
#> Reg. cluster   0.1758096 1.063809e-05 -1.1188071 -0.4296462
#> AKM            0.2403730 1.277718e-03 -1.2453492 -0.3031041
#> AKM0           0.3318966 4.218033e-04 -1.6903240 -0.3893132
```

We now discuss how the methods in Adão et al. (2019) extend to the case where there are multiple shifters, or, in the case of an IV regression, multiple endogenous variables. Currently, these extensions are not implemented in the package.
Suppose that we’re interested in the effect of a \(k\)-vector of shift-share regressors, \(X_i=\sum_{s}w_{is}\Xs_{s}\), where \(\Xs_{s}\) is a vector of length \(k\). For inference on the coefficient on the \(j\)th element of \(X_{i}\), we proceed as if it were the only shift-share regressor, treating the remaining \(k-1\) elements of \(X_{i}\) as part of the controls.
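One way to see why the other shift-share regressors can be moved into the controls is the Frisch-Waugh-Lovell theorem: the coefficient on the \(j\)th regressor in the joint regression equals the coefficient from regressing the outcome on that regressor after partialling out the controls and the remaining regressors. The base-R simulation below (made-up shares, shifters, and outcome; it does not call ShiftShareSE) checks this for the point estimate only, not for the AKM standard errors.

```r
set.seed(1)
n_regions <- 200
n_sectors <- 20
# Made-up shares: row i holds region i's sectoral shares.
W <- matrix(runif(n_regions * n_sectors), n_regions, n_sectors)
W <- W / rowSums(W)
# k = 2 shifters per sector, stacked as an S x 2 matrix.
shifters <- matrix(rnorm(n_sectors * 2), n_sectors, 2)
X <- W %*% shifters                     # N x 2 matrix of shift-share regressors
z <- rnorm(n_regions)                   # one regional control
y <- 1 + 0.5 * X[, 1] - 0.3 * X[, 2] + 0.2 * z + rnorm(n_regions)

# Joint regression on both shift-share regressors and the control.
b_joint <- coef(lm(y ~ X[, 1] + X[, 2] + z))[2]

# Treat X[, 2] as a control: partial it (with z and an intercept) out of y and X[, 1].
ctrl <- cbind(1, X[, 2], z)
resid_on_ctrl <- function(v) drop(v - ctrl %*% qr.solve(ctrl, v))
x1_dd <- resid_on_ctrl(X[, 1])
y_dd <- resid_on_ctrl(y)
b_fwl <- sum(x1_dd * y_dd) / sum(x1_dd^2)

all.equal(unname(b_joint), b_fwl)
```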
Consider a general setup with eqs. (30) and (31) in the paper replaced by \[\begin{equation*} Y_{1i}(y_{2})=Y_{1i}(0)+y_{2}'\alpha\qquad Y_{2i}(\xs_{1},\dotsc,\xs_{S})= Y_{2i}(0)+\sum_{s}w_{is}B_{is}'\xs_{s}, \end{equation*}\] where \(\Xs\) and \(Y_{2}\) are now both vectors, and \(B_{is}\) has dimensions \(\dim(\Xs)\times \dim(Y_{2})\). If \(\Xs=Y_{2}\), the setup reduces to that in section \(\ref{least_squares}\). If \(Y_{2}\) is scalar, the setup reduces to that in section \(\ref{instruments}\).

The two-stage least squares estimator of \(\alpha\) is given by \[\begin{equation*} \hat{\alpha}=(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{1}. \end{equation*}\] With scalar \(X_{i}\) and \(Y_{2i}\), this expression reduces to eq. (33) in the paper. Now, \[\begin{equation*} \hat{\alpha}-\alpha= (Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1} Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\cdot \ddot{X}'(Y_{1}-Y_{2}\alpha). \end{equation*}\]

Suppose that \[\begin{equation*} E[\Xs_{s}\mid \mathcal{F}_{0}]=\Gamma'\Zs_{s}, \end{equation*}\] where \(\mathcal{F}_{0}=(Y_{1}(0),Y_{2}(0),W,\Zs,U,B)\). Let \(\delta\) be the coefficient on \(Z\) in the regression of \(Y_{1i}-Y_{2i}'\alpha\) onto \(Z_{i}\), and let \(\epsilon_{i}=Y_{1i}-Y_{2i}'\alpha-Z_{i}'\delta=Y_{1i}(0)-Z_{i}'\delta\). Then, as in the proof of Proposition 4 in the paper, \[\begin{equation*} \begin{split} r_{N}^{1/2}\ddot{X}'(Y_{1}-Y_{2}\alpha) & =r_{N}^{1/2}\ddot{X}'(Z\delta+\epsilon) =r_{N}^{1/2}\tilde{\Xs}'W'\epsilon+r_{N}^{1/2}\Gamma' U'\epsilon -r_{N}^{1/2}(\hat{\Gamma}-\Gamma)'Z'\epsilon\\ &=r_{N}^{1/2}\tilde{\Xs}'W'\epsilon+o_{p}(1), \end{split} \end{equation*}\] where the second line follows by the arguments in that proof.
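The 2SLS formula for \(\hat{\alpha}\) can be written directly in base R. The sketch below is a hypothetical helper, not part of ShiftShareSE; it uses simulated data in which \(\ddot{X}\) is formed by residualizing a made-up instrument on controls \(Z\), and checks that with scalar \(X_{i}\) and \(Y_{2i}\) the matrix expression collapses to the ratio \(\ddot{X}'Y_{1}/\ddot{X}'Y_{2}\).

```r
# Hypothetical helper: alpha_hat = (Y2' P Y2)^{-1} Y2' P Y1, P = Xdd (Xdd'Xdd)^{-1} Xdd'.
tsls_alpha <- function(Y1, Y2, Xdd) {
  Y2 <- as.matrix(Y2)
  Xdd <- as.matrix(Xdd)
  B <- solve(crossprod(Xdd), crossprod(Xdd, Y2))  # (Xdd'Xdd)^{-1} Xdd'Y2
  A <- crossprod(Y2, Xdd) %*% B                   # Y2' P Y2
  b <- crossprod(B, crossprod(Xdd, Y1))           # Y2' Xdd (Xdd'Xdd)^{-1} Xdd'Y1
  drop(solve(A, b))
}

# Simulated scalar example (all values made up).
set.seed(2)
n <- 300
Z <- cbind(1, rnorm(n))                                # controls: intercept + covariate
X <- rnorm(n) + 0.5 * Z[, 2]                           # scalar instrument
Xdd <- X - Z %*% solve(crossprod(Z), crossprod(Z, X))  # residualize X on Z
Y2 <- 0.8 * X + rnorm(n)                               # endogenous regressor
Y1 <- -0.7 * Y2 + Z[, 2] + rnorm(n)                    # outcome

alpha_hat <- tsls_alpha(Y1, Y2, Xdd)
ratio <- sum(Xdd * Y1) / sum(Xdd * Y2)
all.equal(alpha_hat, ratio)
```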
Now, since \(\Xs_{s}\) is independent across \(s\) conditional on \(\mathcal{F}_{0}\), it follows that, conditional on \(\mathcal{F}_{0}\), \[\begin{equation*} r_{N}^{1/2}\tilde{\Xs}'W'\epsilon =r_{N}^{1/2}\sum_{s}\tilde{\Xs}_{s}R_{s} =\mathcal{N}\Bigl(0,r_{N}\sum_{s}R^{2}_{s} E[\tilde{\Xs}_{s}\tilde{\Xs}_{s}'\mid \mathcal{F}_{0}]\Bigr)+o_{p}(1), \end{equation*}\] where \(R_{s}=\sum_{i}w_{is}\epsilon_{i}\). This leads to the variance estimator \[\begin{equation*} \begin{split} \widehat{\var}(\hat{\alpha})&= (Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\cdot \sum_{s}\hat{R}^{2}_{s} \widehat{\Xs}_{s}\widehat{\Xs}_{s}' \cdot (\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}\\ &= (\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}\cdot \sum_{s}\hat{R}^{2}_{s} \hat{B}'\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{B} \cdot (\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}, \end{split} \end{equation*}\] where \(\hat{R}_{s}=\sum_{i}w_{is}\hat{\epsilon}_{i}\), \(\widehat{\Xs}=(W' W)^{-1}W'\ddot{X}\) as in eq. (36) in the paper, with rows \(\widehat{\Xs}_{s}'\), and \(\hat{B}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}\) is the matrix of first-stage coefficients. Here \(\hat{\epsilon}_{i}\) is an estimate of the structural residual, such as \[\begin{equation}\label{eq:hat-epsilon} \hat{\epsilon}=(I-Z(Z' Z)^{-1}Z')(Y_{1}-Y_{2}\hat{\alpha}). \end{equation}\] To obtain standard errors, take the square root of the appropriate diagonal element.
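The variance formula can be sketched in base R as follows. This is an illustrative, hypothetical helper written for this note, not code from ShiftShareSE (which, as noted above, does not implement these extensions). It takes \(\ddot{X}\) (\(N\times k\)), \(Y_{1}\), \(Y_{2}\) (\(N\times p\)), the share matrix \(W\), and the controls \(Z\); uses the residual \(\hat{\epsilon}=(I-Z(Z'Z)^{-1}Z')(Y_{1}-Y_{2}\hat{\alpha})\); and returns both lines of the formula so one can check that they agree. The data are simulated and made up.

```r
# Hypothetical helper: AKM-type variance of 2SLS with k shift-share instruments
# and p endogenous regressors.
akm_var_2sls <- function(Y1, Y2, Xdd, W, Z) {
  Y2 <- as.matrix(Y2)
  Xdd <- as.matrix(Xdd)
  XtX <- crossprod(Xdd)                              # Xdd'Xdd, k x k
  B_hat <- solve(XtX, crossprod(Xdd, Y2))            # first-stage coefficients, k x p
  H <- crossprod(B_hat, XtX %*% B_hat)               # B'Xdd'Xdd B, p x p
  alpha_hat <- solve(H, crossprod(B_hat, crossprod(Xdd, Y1)))

  # Structural residual (I - Z(Z'Z)^{-1}Z')(Y1 - Y2 alpha_hat).
  u <- Y1 - Y2 %*% alpha_hat
  eps_hat <- u - Z %*% solve(crossprod(Z), crossprod(Z, u))

  R_hat <- drop(crossprod(W, eps_hat))               # R_s = sum_i w_is eps_i
  Xs_hat <- solve(crossprod(W), crossprod(W, Xdd))   # (W'W)^{-1}W'Xdd, S x k
  meat <- crossprod(Xs_hat * R_hat)                  # sum_s R_s^2 Xs_s Xs_s'

  # First line of the formula, written in terms of Y2 and Xdd.
  G <- crossprod(Y2, Xdd) %*% solve(XtX)             # Y2'Xdd(Xdd'Xdd)^{-1}
  A <- solve(G %*% crossprod(Xdd, Y2), G)
  V1 <- A %*% meat %*% t(A)

  # Second line, written in terms of B_hat.
  H_inv <- solve(H)
  V2 <- H_inv %*% crossprod(B_hat, meat %*% B_hat) %*% H_inv

  list(alpha = drop(alpha_hat), V1 = V1, V2 = V2, se = sqrt(diag(V2)))
}

# Simulated example: N = 200 regions, S = 15 sectors, k = p = 2.
set.seed(3)
n <- 200; S <- 15
W <- matrix(runif(n * S), n, S)
W <- W / rowSums(W)                                  # made-up shares
Z <- cbind(1, rnorm(n))                              # controls
Xs <- matrix(rnorm(S * 2), S, 2)                     # two shifters per sector
X <- W %*% Xs
Xdd <- X - Z %*% solve(crossprod(Z), crossprod(Z, X))
Y2 <- X %*% matrix(c(1, 0.3, -0.2, 0.8), 2, 2) + matrix(rnorm(n * 2), n, 2)
Y1 <- drop(Y2 %*% c(0.5, -0.2)) + Z[, 2] + rnorm(n)

fit <- akm_var_2sls(Y1, Y2, Xdd, W, Z)
all.equal(fit$V1, fit$V2)
fit$se
```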
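The AKM0 version is a little tricky here if \(\dim(\alpha)>1\) and we’re only interested in inference on one element of \(\alpha\), say the first: the null \(H_{0}\colon \alpha_{1}=\alpha_{10}\) leaves the remaining elements of \(\alpha\) unspecified, so imposing it does not by itself pin down the structural residual. This is analogous to the issues that arise with the Anderson-Rubin test in settings with multiple endogenous variables.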
If we do not require validity under weak instruments, then the analog of the ‘alternative AKM0’ procedure from the preceding subsection uses the estimate \((\alpha_{10}, \hat{\alpha}_{-1}(\alpha_{10}))\) in place of \(\hat{\alpha}\) in (\(\ref{eq:hat-epsilon}\)), where \(\alpha_{10}\) is the value of \(\alpha_{1}\) under the null, \(Y_{2,1}\) denotes the first column of \(Y_{2}\), \(Y_{2,-1}\) denotes the remaining columns, and \[\begin{equation*} \hat{\alpha}_{-1}(\alpha_{10})=(Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2,-1})^{-1} Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'(Y_{1}-Y_{2,1}\alpha_{10}) \end{equation*}\] is the estimate of the remaining elements of \(\alpha\) with the null \(H_{0}\colon \alpha_{1}=\alpha_{10}\) imposed.
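A base-R sketch of \(\hat{\alpha}_{-1}(\alpha_{10})\), again with hypothetical helper names and simulated data (random draws stand in for \(\ddot{X}\)). As a sanity check: if \(\alpha_{10}\) is set to the first element of the unrestricted \(\hat{\alpha}\), the restricted estimate of the remaining elements should coincide with the unrestricted one, because the restricted 2SLS normal equations are a subset of the unrestricted ones.

```r
# Hypothetical helper: unrestricted 2SLS estimate of alpha.
tsls_full <- function(Y1, Y2, Xdd) {
  Y2 <- as.matrix(Y2); Xdd <- as.matrix(Xdd)
  G <- crossprod(Y2, Xdd) %*% solve(crossprod(Xdd))          # Y2'Xdd(Xdd'Xdd)^{-1}
  drop(solve(G %*% crossprod(Xdd, Y2), G %*% crossprod(Xdd, Y1)))
}

# Hypothetical helper: 2SLS for alpha_{-1} with alpha_1 fixed at alpha_10.
tsls_restricted <- function(Y1, Y2, Xdd, alpha_10) {
  Y2 <- as.matrix(Y2); Xdd <- as.matrix(Xdd)
  Y2_rest <- Y2[, -1, drop = FALSE]                          # Y_{2,-1}
  y1_adj <- Y1 - Y2[, 1] * alpha_10                          # Y1 - Y_{2,1} alpha_10
  G <- crossprod(Y2_rest, Xdd) %*% solve(crossprod(Xdd))     # Y_{2,-1}'Xdd(Xdd'Xdd)^{-1}
  drop(solve(G %*% crossprod(Xdd, Y2_rest), G %*% crossprod(Xdd, y1_adj)))
}

# Simulated example: k = 3 instruments, p = 2 endogenous regressors (made-up data).
set.seed(4)
n <- 250
Xdd <- matrix(rnorm(n * 3), n, 3)
Y2 <- Xdd %*% matrix(c(1, 0.5, 0, 0, 0.7, 1), 3, 2) + matrix(rnorm(n * 2), n, 2)
Y1 <- drop(Y2 %*% c(0.4, -0.6)) + rnorm(n)

alpha_hat <- tsls_full(Y1, Y2, Xdd)
# Imposing alpha_1 = alpha_hat[1] reproduces the unrestricted alpha_hat[-1].
all.equal(tsls_restricted(Y1, Y2, Xdd, alpha_hat[1]), alpha_hat[-1])
# Under a generic null value, e.g. alpha_10 = 0, the estimate of alpha_{-1} changes.
tsls_restricted(Y1, Y2, Xdd, alpha_10 = 0)
```

The restricted residual for AKM0 would then be formed from \(Y_{1}-Y_{2}(\alpha_{10},\hat{\alpha}_{-1}(\alpha_{10})')'\), residualized on \(Z\).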