Standard Errors in Shift-Share Regressions

Summary

The package ShiftShareSE implements confidence intervals proposed by Adão, Kolesár, and Morales (2019) for inference in shift-share least squares and instrumental variables regressions, in which the regressor of interest (or the instrument) has a shift-share structure, as in Bartik (1991). A shift-share variable has the structure $X_{i}=\sum_{s=1}^{S}w_{is}\Xs_{s}$, where $i$ indexes regions, $s$ indexes sectors, $\Xs_{s}$ are sectoral shifters (or shocks), and $w_{is}$ are shares, such as the initial share of region $i$'s employment in sector $s$.
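
For intuition, a shift-share regressor is simply a matrix product of shares and shifters. A minimal sketch with made-up shares and shocks (the objects w and shifters below are hypothetical, not part of the package):

set.seed(1)
S <- 4                       # number of sectors
N <- 6                       # number of regions
w <- matrix(runif(N * S), N, S)
w <- w / rowSums(w)          # shares sum to one within each region
shifters <- rnorm(S)         # sectoral shocks
X <- drop(w %*% shifters)    # regional shift-share variable X_i = sum_s w_is * shifter_s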

This vignette illustrates the use of the package using a dataset from Autor, Dorn, and Hanson (2013) (ADH hereafter). The dataset is included in the package as the list ADH. The first element of the list, ADH$reg, is a data frame with regional variables; the second element, ADH$sic, is a vector of SIC codes for the sectors; and ADH$W is a matrix of shares. See ?ADH for a description of the dataset.
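
To get a sense of the data before running any regressions, one can inspect the dimensions of the shares matrix and the regional data frame (a quick check, assuming the package is installed):

library("ShiftShareSE")
dim(ADH$W)        # regions by sectors
length(ADH$sic)   # one SIC code per sector
names(ADH$reg)    # regional variables used in the regressions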

Examples

We now replicate column (1) of Table V in Adão, Kolesár, and Morales (2019). First we load the package, define the vector of controls, and define a vector of 3-digit SIC codes:

library("ShiftShareSE")
ctrls <- paste("t2 + l_shind_manuf_cbp + l_sh_popedu_c +",
    "l_sh_popfborn + l_sh_empl_f + l_sh_routine33", " + l_task_outsource + division")
sic <- floor(ADH$sic/10)

We cluster the standard errors at the 3-digit SIC code (using the option sector_cvar), and, following ADH, weight the data using the weights ADH$reg$weights. See ?reg_ss and ?ivreg_ss for a full description of the options.

The first-stage regression:

reg_ss(as.formula(paste("shock ~ ", ctrls)), W = ADH$W,
    X = IV, data = ADH$reg, weights = weights, region_cvar = statefip,
    sector_cvar = sic, method = "all")
#> Estimate: 0.6310409
#> 
#> Inference:
#>               Std. Error      p-value  Lower CI  Upper CI
#> Homoscedastic 0.02732516 0.000000e+00 0.5774846 0.6845973
#> EHW           0.08700719 4.083400e-13 0.4605100 0.8015719
#> Reg. cluster  0.09142372 5.113909e-12 0.4518537 0.8102281
#> AKM           0.05296055 0.000000e+00 0.5272402 0.7348417
#> AKM0          0.07671358 1.282891e-03 0.5375710 0.8382827

Note that for "AKM0", "Std. Error" corresponds to the normalized standard error, i.e. the length of the confidence interval divided by 2z1 − α/2.

The reduced-form and IV regressions:

reg_ss(as.formula(paste("d_sh_empl ~", ctrls)), W = ADH$W,
    X = IV, data = ADH$reg, region_cvar = statefip, weights = weights,
    sector_cvar = sic, method = "all")
#> Estimate: -0.4885687
#> 
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic 0.06332778 1.221245e-14 -0.6126889 -0.3644485
#> EHW           0.11244360 1.392685e-05 -0.7089541 -0.2681833
#> Reg. cluster  0.07578147 1.140306e-10 -0.6370977 -0.3400398
#> AKM           0.16419445 2.924641e-03 -0.8103839 -0.1667535
#> AKM0          0.25437489 4.218033e-04 -1.2368853 -0.2397541
ivreg_ss(as.formula(paste("d_sh_empl ~", ctrls, "| shock")),
    W = ADH$W, X = IV, data = ADH$reg, region_cvar = statefip,
    weights = weights, sector_cvar = sic, method = "all")
#> Estimate: -0.7742267
#> 
#> Inference:
#>               Std. Error      p-value   Lower CI   Upper CI
#> Homoscedastic  0.1069532 4.523049e-13 -0.9838511 -0.5646022
#> EHW            0.1647892 2.623532e-06 -1.0972075 -0.4512459
#> Reg. cluster   0.1758096 1.063809e-05 -1.1188071 -0.4296462
#> AKM            0.2403730 1.277718e-03 -1.2453492 -0.3031041
#> AKM0           0.3318966 4.218033e-04 -1.6903240 -0.3893132

Collinear share matrix

Let $W$ denote the share matrix with the $(i, s)$ element given by $w_{is}$ and $s$th column $w_{s}$. Suppose that the columns of $W$ are collinear, so that it has rank $S_{0}<S$. Without loss of generality, suppose that the first $S_{0}$ columns of the matrix are linearly independent, so that the collinearity is caused by the last $S-S_{0}$ sectors. In this case, it is not possible to recover $\tilde{\Xs}_{s}$, the sectoral shifters with the controls partialled out, without further assumptions. The reg_ss and ivreg_ss functions will return a warning message "Share matrix is collinear". To compute the standard errors, the commands implement a default solution to this issue based on aggregating the shocks to the collinear sectors, which we describe in the next subsection. However, there are other ways of dealing with collinearity in the share matrix, as we describe in the subsection on other solutions below. Depending on the setting, researchers may wish to instead use one of these alternatives.

Default way of dealing with collinear sectors

We use a QR factorization of $W$ with column pivoting (see Chapter 5.4.2 in Golub and Van Loan (2013)) to drop the collinear columns of $W$. That is, we decompose $W=QRP$, where $Q$ is an $N\times S$ matrix with orthonormal columns, the matrix $R$ takes the form $R=\bigl(\begin{smallmatrix}R_1 &R_2\\0&0\end{smallmatrix}\bigr)$, where $R_{1}$ is an $S_{0}\times S_{0}$ upper triangular matrix, $R_{2}$ has dimensions $S_{0}\times (S-S_{0})$, and $P$ is a permutation matrix such that the diagonal elements of $R$ are decreasing in magnitude. We then drop the $S-S_{0}$ columns of $W$ that correspond to the last $S-S_{0}$ columns of $QR$, as indicated by the permutation matrix, obtaining a new share matrix $W_{new}$. Most software implementations of ordinary least squares, including the routines used by R, use this algorithm to drop collinear columns of the regressor matrix.
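
The same column selection can be inspected directly with base R's qr(), which performs a pivoted QR decomposition. A minimal sketch on a deliberately collinear toy matrix (the objects W, keep, and W_new below are hypothetical and not objects created by the package):

W <- cbind(diag(3), c(1, 1, 0))     # last column is the sum of the first two
qrW <- qr(W)                        # pivoted QR decomposition
S0 <- qrW$rank                      # number of linearly independent columns
keep <- sort(qrW$pivot[seq_len(S0)])  # columns retained after pivoting
W_new <- W[, keep, drop = FALSE]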

This solution keeps the regional shocks $X_{i}$ the same, so that the point estimates do not change, while implicitly redefining the sectoral shocks $\Xs_{s}$. In particular, by definition of collinearity, each column $w_{s}$ of $W$ that we drop can be written as a linear combination of the columns of the new share matrix $W_{new}$. We can determine the coefficients $\gamma_{s}$ in this linear combination by regressing $w_{s}$ onto $W_{new}$. Observe that dropping the collinear columns of $W$ doesn't change the regional shocks $X_{i}$ if we implicitly define a new sectoral shock vector $\Xs_{new}$ as
$$\Xs_{new}=\Xs_{0}+\sum_{s=S_{0}+1}^{S}\gamma_{s}\Xs_{s}.$$
Here $\Xs_{0}$ corresponds to the first $S_{0}$ entries of the $S$-vector of shocks $\Xs$.
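
Continuing the toy example above, the coefficients $\gamma_{s}$ for each dropped column and the implied aggregated shock vector can be computed as follows (a sketch; dropped, shocks, and shocks_new are hypothetical names):

dropped <- setdiff(seq_len(ncol(W)), keep)          # indices of the dropped columns
shocks <- rnorm(ncol(W))                            # hypothetical sectoral shocks
shocks_new <- shocks[keep]
for (s in dropped) {
    gamma_s <- qr.coef(qr(W_new), W[, s])           # regress dropped column on W_new
    shocks_new <- shocks_new + gamma_s * shocks[s]  # fold the dropped shock into the kept sectors
}
# W_new %*% shocks_new reproduces the original regional shocks W %*% shocks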

Note that re-ordering the columns of W will generally result in different columns being dropped, so that the standard errors will generally depend on the order of the sectors.

Other solutions

There are alternative ways of dealing with collinearity, including:

  1. Drop the collinear sectors, defining $X_i=\sum_{s=1}^{S_0}w_{is}\Xs_{s}$, and defining the share matrix $W$ to only have $S_{0}$ columns, as in the default solution. This effectively puts shocks to the collinear sectors into the residual (which is analogous to letting, say, the shock to non-manufacturing sectors be part of the residual), and changes the point estimate as well as the estimand.
  2. Aggregate the sectors. For instance, if the sectors originally correspond to 4-digit SIC industries, we may wish to work with 3-digit industries (see the sketch after this list). This solution will change the point estimate, as well as the estimand. Alternatively, we may aggregate only the collinear sectors.
  3. If the only controls are those with shift-share structure, and we have data on $\Zs_{s}$, we can estimate $\tilde{\Xs}_{s}$ by running a sector-level regression of $\Xs_s$ onto $\Zs_s$, and taking the residual. This solution doesn’t affect the point estimate or the definition of the estimand.
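
As an illustration of the aggregation approach in point 2, the ADH share matrix can be collapsed from 4-digit to 3-digit SIC sectors by summing shares within each 3-digit code (a sketch using base R; W3 is a hypothetical name, and reg_ss would then be called with W = W3 and sector_cvar adjusted accordingly):

sic3 <- floor(ADH$sic / 10)               # 3-digit SIC code for each 4-digit sector
W3 <- t(rowsum(t(ADH$W), group = sic3))   # sum shares across 4-digit sectors within each 3-digit code
dim(W3)                                   # regions by number of 3-digit sectors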

Extensions to multiple shifters and multiple endogenous variables

We now discuss how the methods in Adão, Kolesár, and Morales (2019) extend to the case where there are multiple shifters, or, in the case of an IV regression, multiple endogenous variables. Currently, these extensions are not implemented in the package.

OLS

Suppose that we're interested in the effect of a $k$-vector of shift-share regressors, $X_i=\sum_{s}w_{is}\Xs_{s}$, where $\Xs_{s}$ is a vector of length $k$. For inference on the coefficient on the $j$th element of $X_{i}$, we proceed as if it were the only shift-share regressor, treating the remaining shifters as part of the controls.
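
To illustrate, with two shift-share regressors one could, for inference on the first, include the second as an ordinary control in the reg_ss call. A hedged sketch, assuming a second sector-level shifter shifter2 (hypothetical, not part of the ADH data) and using only the interface shown above:

X2 <- drop(ADH$W %*% shifter2)     # second shift-share regressor, treated as a control
df <- cbind(ADH$reg, X2 = X2)
reg_ss(as.formula(paste("d_sh_empl ~ X2 +", ctrls)), W = ADH$W,
    X = IV, data = df, weights = weights, region_cvar = statefip,
    sector_cvar = sic, method = "all")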

IV with a single endogenous regressor and multiple shift-share instruments

Now suppose that the $k$-vector $X_{i}$ defined in the previous subsection is a $k$-vector of instruments. Let $X$ denote the $N\times k$ matrix with rows given by $X_{i}$. Consider the setup in Section IV.C of Adão, Kolesár, and Morales (2019), with the first-stage coefficients $\beta_{is}$ in eq. (31) now a $k$-vector, and $\alpha$ being the scalar treatment effect of $Y_{2}$ on $Y_{1}$ as in eq. (30). Letting $\ddot{X}=X-Z(Z' Z)^{-1}Z' X$ denote the $N\times k$ matrix of instruments with the covariates partialled out, the two-stage least squares estimator is given by
$$\hat{\alpha}=\frac{\hat{\beta}'\ddot{X}'Y_{1}}{\hat{\beta}'\ddot{X}'Y_{2}},$$
where $\hat{\beta}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}$ is a $k$-vector of first-stage coefficients.

Thus,
$$\hat{\alpha}-\alpha=\frac{\hat{\beta}'\ddot{X}'(Y_{1}-Y_{2}\alpha)}{\hat{\beta}'\ddot{X}'Y_{2}}.$$

Now, letting $Y_{1}(0)=Z\delta+\epsilon$, we have, as in the proof of Proposition 4 in the paper,
$$\hat{\beta}'\ddot{X}'(Y_{1}-Y_{2}\alpha)=\hat{\beta}'\ddot{X}'\epsilon\approx\hat{\beta}'\sum_{s}\tilde{\Xs}_{s}R_{s},\qquad R_{s}=\sum_{i}w_{is}\epsilon_{i}.$$
Thus, using arguments in the proof of Proposition 4 in the paper, we obtain the infeasible standard error formula
$$\sqrt{\frac{\hat{\beta}'\bigl(\sum_{s}\tilde{\Xs}_{s}\tilde{\Xs}_{s}'R_{s}^{2}\bigr)\hat{\beta}}{(\hat{\beta}'\ddot{X}'Y_{2})^{2}}},$$
where $\tilde{\Xs}_{s}$ is a (vector) residual from the population regression of the vector $\Xs_{s}$ onto the controls.

This suggests the feasible standard error formula
$$\sqrt{\frac{\hat{\beta}'\bigl(\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s}^{2}\bigr)\hat{\beta}}{(\hat{\beta}'\ddot{X}'Y_{2})^{2}}},\qquad \hat{R}_{s}=\sum_{i}w_{is}\hat{\epsilon}_{i},$$
where $\widehat{\Xs}=(W' W)^{-1}W'\ddot{X}$ are the regression coefficients from the regression of $\ddot{X}$ onto $W$ (as in Remark 6, except now $\widehat{\Xs}$ is an $S\times k$ matrix with rows $\widehat{\Xs}_{s}'$), and $\hat{\epsilon}_{i}$ are estimates of the structural residuals. For AKM, $\hat{\epsilon}=Y_{1}-Y_{2}\hat{\alpha}-Z(Z'Z)^{-1}Z'(Y_{1}-Y_{2}\hat{\alpha})$.
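
The feasible formula above is straightforward to compute directly. A minimal sketch in base R, assuming generic objects Y1 and Y2 (N-vectors), Z (an N x p matrix of controls including an intercept), W (the N x S share matrix), and X (the N x k instrument matrix) are already in memory; this illustrates the formula and is not a function provided by the package:

Xdd  <- X - Z %*% solve(crossprod(Z), crossprod(Z, X))     # instruments with controls partialled out
beta <- solve(crossprod(Xdd), crossprod(Xdd, Y2))          # first-stage coefficients (k-vector)
alpha <- drop(crossprod(beta, crossprod(Xdd, Y1)) /
              crossprod(beta, crossprod(Xdd, Y2)))         # 2SLS estimate
eps  <- (Y1 - Y2 * alpha) - Z %*% solve(crossprod(Z), crossprod(Z, Y1 - Y2 * alpha))
Xs_hat <- solve(crossprod(W), crossprod(W, Xdd))           # S x k matrix of shifter estimates
R_hat  <- drop(crossprod(W, eps))                          # R_s = sum_i w_is * eps_i
meat   <- crossprod(Xs_hat, Xs_hat * R_hat^2)              # sum_s Xs_hat_s Xs_hat_s' R_s^2
se_akm <- sqrt(drop(crossprod(beta, meat %*% beta))) /
          abs(drop(crossprod(beta, crossprod(Xdd, Y2))))   # feasible AKM standard error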

For AKM0, the construction is more complicated. Let $\hat{\gamma}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{1}$ denote the reduced-form coefficient. Let $\hat{R}_{s,\alpha_{0}}=\sum_{i}w_{is}\hat{\epsilon}_{i,\alpha_{0}}$, where $\hat{\epsilon}_{\alpha_{0}}=(I-Z(Z'Z)^{-1}Z')(Y_{1}-Y_{2}\alpha_{0})$. Then
$$Q(\alpha_{0})=(\hat{\gamma}-\hat{\beta}\alpha_{0})'\Bigl[(\ddot{X}'\ddot{X})^{-1}\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s,\alpha_{0}}^{2}(\ddot{X}'\ddot{X})^{-1}\Bigr]^{-1}(\hat{\gamma}-\hat{\beta}\alpha_{0})$$
will be distributed $\chi^{2}_{k}$ in large samples, because $(\ddot{X}'\ddot{X})^{-1}\sum_{s}\widehat{\Xs}_{s} \widehat{\Xs}_{s}'\hat{R}_{s,\alpha_{0}}^{2} (\ddot{X}'\ddot{X})^{-1}$ consistently estimates the asymptotic variance of $\hat{\gamma}-\hat{\beta}\alpha_{0}$ under the null. Therefore, we reject the null $H_{0}\colon\alpha=\alpha_{0}$ if $Q(\alpha_{0})>\chi^{2}_{k,1-\alpha}$, where $\chi^{2}_{k,1-\alpha}$ is the $1-\alpha$ quantile of a $\chi^{2}_{k}$ distribution. A confidence set is obtained by collecting all nulls that are not rejected,
$$\{\alpha\in\mathbb{R}\colon Q(\alpha)\leq\chi^{2}_{k,1-\alpha}\}.$$
Note that (i) unlike the case with a single instrument (Remark 6, step (iv)), there is no longer a closed-form solution for the confidence set: one needs to do a grid search over the real line, collecting all values of $\alpha$ for which the test doesn't reject, and (ii) the confidence set will be valid even if the instruments are weak; however, if the instruments are strong, the AKM0 test is less powerful than the AKM test, and consequently the AKM0 confidence set will tend to be bigger than the AKM confidence interval.

Note that properties (i) and (ii) are inherited from the properties of the heteroskedasticity-robust version of the Anderson-Rubin test when there is more than one instrument (see, for example, Section 5.1 in Andrews, Stock, and Sun (2019) for a discussion). The AKM0 method adapts this test to the current setting with shift-share instruments, inheriting these properties.
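
A grid-search version of this AKM0 confidence set can be sketched along the following lines, reusing the objects from the previous sketch (illustrative only; the grid limits and step size are arbitrary, and in general the set need not be an interval, so reporting its range is itself an approximation):

gamma <- solve(crossprod(Xdd), crossprod(Xdd, Y1))      # reduced-form coefficients (k-vector)
Q <- function(a0) {
    eps0 <- (Y1 - Y2 * a0) - Z %*% solve(crossprod(Z), crossprod(Z, Y1 - Y2 * a0))
    R0   <- drop(crossprod(W, eps0))                    # R_{s, a0}
    V0   <- solve(crossprod(Xdd)) %*% crossprod(Xs_hat, Xs_hat * R0^2) %*% solve(crossprod(Xdd))
    d    <- gamma - beta * a0
    drop(crossprod(d, solve(V0, d)))                    # test statistic Q(a0)
}
grid <- seq(-2, 2, by = 0.001)
ci_akm0 <- range(grid[vapply(grid, Q, numeric(1)) <= qchisq(0.95, df = ncol(Xdd))])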

If we do not require validity under weak instruments, we can also use a different version of AKM0, namely computing the confidence set as
$$\Bigl\{\alpha\in\mathbb{R}\colon \bigl(\hat{\beta}'\ddot{X}'(Y_{1}-Y_{2}\alpha)\bigr)^{2}\leq z_{1-\alpha/2}^{2}\,\hat{\beta}'\Bigl(\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s,\alpha}^{2}\Bigr)\hat{\beta}\Bigr\}.$$
This form of the confidence set can be thought of as the analog of the Lagrange multiplier confidence set in likelihood models, rather than the analog of the Anderson-Rubin test. In the case with a single instrument, these concepts coincide, but they differ in general. In this case, the inequality defining the set is just a quadratic inequality in $\alpha$, and we can solve it explicitly as in Remark 6 in the paper to obtain a closed-form solution. If the instruments are strong, it will take the form of an interval.

IV with multiple endogenous variables

Consider a general setup in which eqs. (30) and (31) in the paper are generalized so that $\Xs_{s}$ and $Y_{2i}$ are both vectors, with the first-stage coefficient matrix $B_{is}$ of dimensions $\dim(\Xs_{s})\times \dim(Y_{2i})$. If $X=Y_{2}$, the setup reduces to that in the OLS subsection above. If $Y_{2}$ is scalar, the setup reduces to that in the preceding subsection. The two-stage least squares estimator of $\alpha$ is given by
$$\hat{\alpha}=(\hat{B}'\ddot{X}'Y_{2})^{-1}\hat{B}'\ddot{X}'Y_{1},$$
where $\hat{B}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}$ is a matrix of the first-stage coefficients. With scalar $X_{i}$ and $Y_{2i}$, this expression reduces to eq. (33) in the paper. Now,
$$\hat{\alpha}-\alpha=(\hat{B}'\ddot{X}'Y_{2})^{-1}\hat{B}'\ddot{X}'(Y_{1}-Y_{2}\alpha).$$
Suppose that, conditional on $\mathcal{F}_{0}=(Y_{1}(0),Y_{2}(0),W,\Zs,U,B)$, the shifters $\Xs_{s}$ are mean zero and independent across sectors. Let $\delta$ be the coefficient on $Z$ in the regression of $Y_{1i}-Y_{2i}'\alpha$ onto $Z_{i}$, and let $\epsilon_{i}=Y_{1i}-Y_{2i}'\alpha-Z_{i}'\delta=Y_{1i}(0)-Z_{i}'\delta$. Then, as in the proof of Proposition 4 in the paper,
$$\ddot{X}'(Y_{1}-Y_{2}\alpha)=\ddot{X}'\epsilon\approx\sum_{s}\tilde{\Xs}_{s}R_{s},$$
where the approximation follows by arguments in that proof. Now, since $\Xs_{s}$ is independent across $s$ conditional on $\mathcal{F}_{0}$, it follows that, conditional on $\mathcal{F}_{0}$, the variance of $\sum_{s}\tilde{\Xs}_{s}R_{s}$ is $\sum_{s}E[\tilde{\Xs}_{s}\tilde{\Xs}_{s}'\mid\mathcal{F}_{0}]R_{s}^{2}$, where $R_{s}=\sum_{i}w_{is}\epsilon_{i}$. This leads to the variance formula
$$\hat{V}=(\hat{B}'\ddot{X}'Y_{2})^{-1}\hat{B}'\Bigl(\sum_{s}\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{R}_{s}^{2}\Bigr)\hat{B}(Y_{2}'\ddot{X}\hat{B})^{-1},$$
where $\hat{R}_{s}=\sum_{i}w_{is}\hat{\epsilon}_{i}$, $\widehat{\Xs}=(W' W)^{-1}W'\ddot{X}$ as in eq. (36) in the paper, with rows $\widehat{\Xs}_{s}'$, and $\hat{B}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}$ is a matrix of the first-stage coefficients. Here $\hat{\epsilon}_{i}$ is an estimate of the structural residual, such as $\hat{\epsilon}=Y_{1}-Y_{2}\hat{\alpha}-Z(Z'Z)^{-1}Z'(Y_{1}-Y_{2}\hat{\alpha})$. For standard errors, take the square root of the appropriate diagonal element of $\hat{V}$.
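
The variance formula can be computed along the same lines as the single-endogenous-variable sketch above. A sketch assuming Y2 is now an N x m matrix of endogenous variables and X an N x k matrix of instruments (illustrative only, not part of the package):

Xdd <- X - Z %*% solve(crossprod(Z), crossprod(Z, X))    # instruments with controls partialled out
B   <- solve(crossprod(Xdd), crossprod(Xdd, Y2))         # k x m first-stage coefficients
A   <- crossprod(B, crossprod(Xdd, Y2))                  # B' Xdd' Y2, an m x m matrix
alpha <- solve(A, crossprod(B, crossprod(Xdd, Y1)))      # 2SLS estimate, an m-vector
eps <- (Y1 - Y2 %*% alpha) - Z %*% solve(crossprod(Z), crossprod(Z, Y1 - Y2 %*% alpha))
Xs_hat <- solve(crossprod(W), crossprod(W, Xdd))         # S x k shifter estimates
R_hat  <- drop(crossprod(W, eps))                        # R_s = sum_i w_is * eps_i
meat   <- crossprod(Xs_hat, Xs_hat * R_hat^2)            # sum_s Xs_hat_s Xs_hat_s' R_s^2
V  <- solve(A, crossprod(B, meat %*% B)) %*% t(solve(A)) # sandwich variance, m x m
se <- sqrt(diag(V))                                      # standard errors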

The AKM0 version is a little tricky here if $\dim(\alpha)>1$ and we're only interested in inference on one element of $\alpha$, say the first: this is analogous to issues with using the Anderson-Rubin test in a setting with multiple endogenous variables.

If we do not require validity under weak instruments, then the analog of the "alternative AKM0" procedure from the preceding subsection uses the estimate $(\alpha_{1,0},\hat{\alpha}_{-1}(\alpha_{1,0}))$ in place of $\hat{\alpha}$ in the formula above, where $\alpha_{1,0}$ is the null-hypothesized value, and $\hat{\alpha}_{-1}(\alpha_{1,0})$ is the estimate of the remaining elements of $\alpha$ with the null $H_{0}\colon\alpha_{1}=\alpha_{1,0}$ imposed.

References

Adão, Rodrigo, Michal Kolesár, and Eduardo Morales. 2019. “Shift-Share Designs: Theory and Inference.” Quarterly Journal of Economics 134 (4): 1949–2010. https://doi.org/10.1093/qje/qjz025.
Andrews, Isaiah, James H. Stock, and Liyang Sun. 2019. “Weak Instruments in Instrumental Variables Regression: Theory and Practice.” Annual Review of Economics 11 (1): 727–53. https://doi.org/10.1146/annurev-economics-080218-025643.
Autor, David H., David Dorn, and Gordon H. Hanson. 2013. “The China Syndrome: Local Labor Market Effects of Import Competition in the United States.” American Economic Review 103 (6): 2121–68. https://doi.org/10.1257/aer.103.6.2121.
Bartik, Timothy J. 1991. Who Benefits from State and Local Economic Development Policies? Kalamazoo, MI: W.E. Upjohn Institute for Employment Research.
Golub, Gene H., and Charles F. Van Loan. 2013. Matrix Computations. Fourth. Johns Hopkins Studies in the Mathematical Sciences. Baltimore, MD: The Johns Hopkins University Press.