---
output:
  pdf_document:
    citation_package: natbib
    latex_engine: pdflatex
    toc: true
    number_sections: true
    keep_tex: true
    includes:
      in_header: "preamble.tex"
title: "Standard Errors in Shift-Share Regressions"
author: "Michal Kolesár"
date: "`r format(Sys.time(), '%B %d, %Y')`"
geometry: margin=1in
fontfamily: mathpazo
bibliography: library.bib
linkcolor: red
urlcolor: red
fontsize: 11pt
vignette: >
  %\VignetteIndexEntry{ShiftShareSE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE, cache=FALSE}
library("knitr")
knitr::opts_knit$set(self.contained = FALSE)
knitr::opts_chunk$set(tidy = TRUE, collapse = TRUE, comment = "#>",
                      tidy.opts = list(blank = FALSE, width.cutoff = 55))
```

# Summary

The package `ShiftShareSE` implements the confidence intervals proposed by @akm18 for inference in shift-share least squares and instrumental variables regressions, in which the regressor of interest (or the instrument) has a shift-share structure, as in @bartik91. A shift-share variable has the structure $X_{i}=\sum_{s=1}^{S}w_{is}\Xs_{s}$, where $i$ indexes regions, $s$ indexes sectors, $\Xs_{s}$ are sectoral shifters (or shocks), and $w_{is}$ are shares, such as the initial share of region $i$'s employment in sector $s$.

This vignette illustrates the use of the package with a dataset from @adh13 (ADH hereafter), included in the package as the list `ADH`. The first element of the list, `ADH$reg`, is a data frame with regional variables; the second element, `ADH$sic`, is a vector of SIC codes for the sectors; and `ADH$W` is a matrix of shares. See `?ADH` for a description of the dataset.

# Examples

We now replicate column (1) of Table V in @akm18.
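Before turning to the replication, the shift-share construction $X_{i}=\sum_{s}w_{is}\Xs_{s}$ from the Summary can be illustrated with a tiny simulated example (hypothetical toy data, not the ADH dataset and not part of the package):

```r
# Hypothetical toy example: 5 regions, 3 sectors.
set.seed(1)
W  <- matrix(runif(5 * 3), nrow = 5, ncol = 3)  # raw shares w_is
W  <- W / rowSums(W)                            # normalize shares to sum to one
Xs <- c(0.2, -0.1, 0.4)                         # sectoral shifters
X  <- drop(W %*% Xs)                            # X_i = sum_s w_is * Xs_s
```

Each region's value of `X` is just a share-weighted average of the sectoral shifters, which is the structure the package's inference procedures exploit.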
First we load the package, define the vector of controls, and define a vector of 3-digit SIC codes:

```{r}
library("ShiftShareSE")
ctrls <- paste("t2 + l_shind_manuf_cbp + l_sh_popedu_c +",
               "l_sh_popfborn + l_sh_empl_f + l_sh_routine33",
               " + l_task_outsource + division")
sic <- floor(ADH$sic/10)
```

We cluster the standard errors at the 3-digit SIC code level (using the option `sector_cvar`) and, following ADH, weight the data using the weights `ADH$reg$weights`. See `?reg_ss` and `?ivreg_ss` for a full description of the options.

The first-stage regression:

```{r}
reg_ss(as.formula(paste("shock ~ ", ctrls)), W = ADH$W, X = IV, data = ADH$reg,
       weights = weights, region_cvar = statefip, sector_cvar = sic,
       method = "all")
```

Note that for `"AKM0"`, `"Std. Error"` corresponds to the normalized standard error, i.e. the length of the confidence interval divided by $2z_{1-\alpha/2}$.

The reduced-form and IV regressions:

```{r}
reg_ss(as.formula(paste("d_sh_empl ~", ctrls)), W = ADH$W, X = IV,
       data = ADH$reg, region_cvar = statefip, weights = weights,
       sector_cvar = sic, method = "all")
ivreg_ss(as.formula(paste("d_sh_empl ~", ctrls, "| shock")), W = ADH$W, X = IV,
         data = ADH$reg, region_cvar = statefip, weights = weights,
         sector_cvar = sic, method = "all")
```

# Collinear share matrix

Let $W$ denote the share matrix with the $(i,s)$ element given by $w_{is}$ and $s$th column $w_{s}$. Suppose that the columns of $W$ are collinear, so that it has rank $S_{0}<S$.

The AKM0 test of the null $H_{0}\colon\alpha=\alpha_{0}$ rejects whenever $Q(\alpha_{0})>\chi^{2}_{k, 1-\alpha}$, where $\chi^{2}_{k, 1-\alpha}$ is the $1-\alpha$ quantile of a $\chi^{2}_{k}$ distribution.
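For concreteness, the $\chi^{2}_{k,1-\alpha}$ critical value is straightforward to compute; the values below (2 instruments, 5% level) are purely illustrative:

```r
# Critical value for a chi-squared test with k degrees of freedom:
# reject the null when the test statistic exceeds the 1 - level quantile.
k     <- 2                        # illustrative number of instruments
level <- 0.05                     # significance level
crit  <- qchisq(1 - level, df = k)
crit                              # 5.991465 (equals -2 * log(0.05) when k = 2)
```

With $k=2$ the $\chi^{2}$ quantile has the closed form $-2\log(\text{level})$, which is a convenient sanity check.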
A confidence set is then constructed by collecting all nulls that are not rejected,
\begin{equation*}
\text{AKM0 confidence set}= \{\alpha\in\mathbb{R}\colon Q(\alpha)\leq \chi^{2}_{k,1-\alpha} \}.
\end{equation*}
Note that (i) unlike the case with a single instrument (Remark 6, step (iv)), there is no longer a closed-form solution for the confidence set: one needs to do a grid search over the real line, collecting all values of $\alpha$ for which the test does not reject; and (ii) the confidence set will be valid even if the instruments are weak. However, if the instruments are strong, the AKM0 test is less powerful than the AKM test, and consequently the AKM0 confidence set will tend to be bigger than the AKM confidence interval. Properties (i) and (ii) are inherited from the properties of the heteroskedasticity-robust version of the Anderson-Rubin test when there is more than one instrument (see, for example, Section 5.1 in @ass19 for a discussion); the AKM0 method adapts this test to the current setting with shift-share instruments.

If we do not require validity under weak instruments, we can also use a different version of AKM0, computing the confidence set as
\begin{equation*}
\text{Alternative AKM0 confidence set}= \left\{\alpha\in\mathbb{R}\colon \frac{(\hat{\alpha}-\alpha)^{2}}{ \frac{ \sum_{s}(\hat{\beta}'\widehat{\Xs}_{s})^{2}\hat{R}_{s,\alpha}^{2} }{(\hat{\beta}'\ddot{X}'\ddot{X}\hat{\beta})^{2}}} \leq z_{1-\alpha/2}^{2} \right\}.
\end{equation*}
This form of the confidence set can be thought of as the analog of the Lagrange multiplier confidence set in likelihood models, rather than the analog of the Anderson-Rubin test. With a single instrument the two constructions coincide, but they differ in general. In this case, the inequality defining the set is just a quadratic inequality in $\alpha$, and we can solve it explicitly as in Remark 6 in the paper to obtain a closed-form solution.
If the instruments are strong, it will take the form of an interval.

## IV with multiple endogenous variables

Consider a general setup with eqs. (30) and (31) in the paper replaced by
\begin{equation*}
Y_{1i}(y_{2})=Y_{1i}(0)+y_{2}'\alpha,\qquad Y_{2i}(\xs_{1},\dotsc,\xs_{S})= Y_{2i}(0)+\sum_{s}w_{is}B_{is}'\xs_{s},
\end{equation*}
with $\Xs$ and $Y_{2}$ now both vectors, and $B_{is}$ a matrix of dimension $\dim(\Xs)\times \dim(Y_{2})$. If $\Xs=Y_{2}$, the setup reduces to that in section \ref{least_squares}; if $Y_{2}$ is scalar, it reduces to that in section \ref{instruments}.

The two-stage least squares estimator of $\alpha$ is given by
\begin{equation*}
\hat{\alpha}=(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{1}.
\end{equation*}
With scalar $X_{i}$ and $Y_{2i}$, this expression reduces to eq. (33) in the paper. Now,
\begin{equation*}
\hat{\alpha}-\alpha= (Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1} Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\cdot \ddot{X}'(Y_{1}-Y_{2}\alpha).
\end{equation*}
Suppose that
\begin{equation*}
E[\Xs_{s}\mid \mathcal{F}_{0}]=\Gamma'\Zs_{s},
\end{equation*}
where $\mathcal{F}_{0}=(Y_{1}(0),Y_{2}(0),W,\Zs,U,B)$. Let $\delta$ be the coefficient on $Z_{i}$ in the regression of $Y_{1i}-Y_{2i}'\alpha$ onto $Z_{i}$, and let $\epsilon_{i}=Y_{1i}-Y_{2i}'\alpha-Z_{i}'\delta=Y_{1i}(0)-Z_{i}'\delta$. Then, as in the proof of Proposition 4 in the paper,
\begin{equation*}
\begin{split}
r_{N}^{1/2}\ddot{X}'(Y_{1}-Y_{2}\alpha) & =r_{N}^{1/2}\ddot{X}'(Z\delta+\epsilon) =r_{N}^{1/2}\tilde{\Xs}'W'\epsilon+r_{N}^{1/2}\Gamma' U'\epsilon -r_{N}^{1/2}\epsilon' Z(\hat{\Gamma}-\Gamma),\\
&=r_{N}^{1/2}\tilde{\Xs}'W'\epsilon+o_{p}(1),
\end{split}
\end{equation*}
where the second line follows by arguments in that proof.
Now, since $\Xs_{s}$ is independent across $s$ conditional on $\mathcal{F}_{0}$, it follows that, conditional on $\mathcal{F}_{0}$,
\begin{equation*}
r_{N}^{1/2}\tilde{\Xs}'W'\epsilon =r_{N}^{1/2}\sum_{s}\tilde{\Xs}_{s}R_{s} =\mathcal{N}(0,\sum_{s}R^{2}_{s} E[\tilde{\Xs}_{s}\tilde{\Xs}_{s}'\mid \mathcal{F}_{0}])+o_{p}(1),
\end{equation*}
where $R_{s}=\sum_{i}w_{is}\epsilon_{i}$. This leads to the variance formula
\begin{equation*}
\begin{split}
\widehat{\var}(\hat{\alpha})&= (Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\cdot \sum_{s}\hat{R}^{2}_{s} \widehat{\Xs}_{s}\widehat{\Xs}_{s}' \cdot (\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}(Y_{2}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2})^{-1}\\
&= (\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}\cdot \sum_{s}\hat{R}^{2}_{s} \hat{B}'\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{B} \cdot (\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1},
\end{split}
\end{equation*}
where $\hat{R}_{s}=\sum_{i}w_{is}\hat{\epsilon}_{i}$, $\widehat{\Xs}=(W' W)^{-1}W'\ddot{X}$ as in eq. (36) in the paper, with rows $\widehat{\Xs}_{s}'$, and $\hat{B}=(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2}$ is the matrix of first-stage coefficients. Here $\hat{\epsilon}_{i}$ is an estimate of the structural residual, such as
\begin{equation}\label{eq:hat-epsilon}
\hat{\epsilon}=(I-Z(Z' Z)^{-1}Z')(Y_{1}-Y_{2}\hat{\alpha}).
\end{equation}
For standard errors, take the square root of the appropriate diagonal element.

The AKM0 version is a little tricky here if $\dim(\alpha)>1$ and we are only interested in inference on one element of $\alpha$, say the first: this is analogous to the issues with using the Anderson-Rubin test in a setting with multiple endogenous variables.
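As a numerical sketch of these formulas (simulated data; this is not the package's internal implementation, and all names, dimensions, and parameter values are illustrative), the 2SLS estimator $\hat{\alpha}$, the first-stage matrix $\hat{B}$, and $\widehat{\var}(\hat{\alpha})$ can be computed directly:

```r
# Simulated example: N regions, S sectors, k = 2 shocks, p = 2 endogenous vars.
set.seed(42)
N <- 200; S <- 20; k <- 2; p <- 2
W  <- matrix(rexp(N * S), N, S); W <- W / rowSums(W)  # shares w_is
Xs <- matrix(rnorm(S * k, sd = 3), S, k)              # sectoral shocks (S x k)
X  <- W %*% Xs                                        # shift-share instruments
Z  <- cbind(1, rnorm(N))                              # controls, incl. intercept
alpha <- c(1, -0.5)                                   # true structural coefs
Y2 <- X %*% matrix(c(1, 0.2, 0.1, 1), k, p) + matrix(rnorm(N * p), N, p)
Y1 <- Y2 %*% alpha + rnorm(N)

M    <- diag(N) - Z %*% solve(crossprod(Z), t(Z))     # annihilator of Z
Xdd  <- M %*% X                                       # X with controls partialled out
PY2  <- Xdd %*% solve(crossprod(Xdd), crossprod(Xdd, Y2))
ahat <- solve(crossprod(Y2, PY2), crossprod(PY2, Y1)) # 2SLS: (Y2'P Y2)^-1 Y2'P Y1

B    <- solve(crossprod(Xdd), crossprod(Xdd, Y2))     # first-stage coefficients
Xs_h <- solve(crossprod(W), crossprod(W, Xdd))        # Xs-hat = (W'W)^-1 W' Xdd
eps  <- M %*% (Y1 - Y2 %*% ahat)                      # structural residuals
Rs   <- drop(crossprod(W, eps))                       # R_s = sum_i w_is * eps_i
meat  <- t(B) %*% t(Xs_h) %*% diag(Rs^2) %*% Xs_h %*% B  # sum_s Rs^2 B'Xs_s Xs_s'B
bread <- solve(t(B) %*% crossprod(Xdd) %*% B)
Vhat  <- bread %*% meat %*% bread                     # sandwich variance estimate
se    <- sqrt(diag(Vhat))                             # shift-share standard errors
```

The second line of the variance formula is used here: the bread is $(\hat{B}'\ddot{X}'\ddot{X}\hat{B})^{-1}$ and the meat accumulates $\hat{R}_{s}^{2}\hat{B}'\widehat{\Xs}_{s}\widehat{\Xs}_{s}'\hat{B}$ across sectors.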
If we do not require validity under weak instruments, then the analog of the "alternative AKM0" procedure from the preceding subsection uses the estimate $(\alpha_{10}, \hat{\alpha}_{-1}(\alpha_{10}))$ in place of $\hat{\alpha}$ in (\ref{eq:hat-epsilon}), where $\alpha_{10}$ is the null-hypothesized value, and
\begin{equation*}
\hat{\alpha}_{-1}(\alpha_{10})=(Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'Y_{2,-1})^{-1} Y_{2,-1}'\ddot{X}(\ddot{X}'\ddot{X})^{-1}\ddot{X}'(Y_{1}-Y_{2,1}\alpha_{10})
\end{equation*}
is the estimate of the remaining elements of $\alpha$ with the null $H_{0}\colon \alpha_{1}=\alpha_{10}$ imposed.

# References