This package implements small-sample degrees-of-freedom adjustments to robust and cluster-robust standard errors in linear regression, as discussed in Imbens and Kolesár (2016). The implementation can handle models with fixed effects, as well as cases with a large number of observations or clusters.
To give some examples, let us construct an artificial dataset with 11 clusters:
set.seed(7)
d1 <- data.frame(y = rnorm(1000), x1 = c(rep(1, 3), rep(0,
997)), x2 = c(rep(1, 150), rep(0, 850)), x3 = rnorm(1000),
cl = as.factor(c(rep(1:10, each = 50), rep(11, 500))))
Let us first run a regression of y on x1. This is a case in which, in spite of the moderate sample size, the effective number of observations is small, since there are only three treated units:
r1 <- lm(y ~ x1, data = d1)
## No clustering
dfadjustSE(r1)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> (Intercept) 0.00266 0.0311 0.031 0.0311 996.00 0.932
#> x1 0.12940 0.8892 1.088 2.3743 2.01 0.916
We can see that the usual robust standard errors (HC1 se) are much smaller than the effective standard errors (Adj. se), which are computed by taking the HC2 standard errors and applying a degrees of freedom adjustment: the HC2 standard error is scaled by the ratio of the 0.975 quantile of a t distribution with the reported degrees of freedom (df) to the 0.975 quantile of a standard normal, so that a 95% confidence interval can be constructed as the estimate plus or minus 1.96 times the adjusted standard error.
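As a quick check, here is a minimal sketch that recomputes the HC2 standard errors by hand from the leverages and OLS residuals, using only base R and the standard HC2 sandwich formula (the object names below are illustrative, not package internals):
X <- model.matrix(r1)                     # design matrix
h <- hatvalues(r1)                        # leverages
u <- residuals(r1)                        # OLS residuals
XtXinv <- solve(crossprod(X))
meat <- crossprod(X * (u / sqrt(1 - h)))  # sum_i u_i^2 / (1 - h_i) * X_i X_i'
sqrt(diag(XtXinv %*% meat %*% XtXinv))    # should match the HC2 se column above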
Now consider a cluster-robust regression of y on x2. There are only 3 treated clusters, so the effective number of observations is again small:
r1 <- lm(y ~ x2, data = d1)
# Default Imbens-Kolesár method
dfadjustSE(r1, clustervar = d1$cl)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> (Intercept) -0.0236 0.0135 0.0169 0.0222 4.94 0.2215
#> x2 0.1778 0.0530 0.0621 0.1157 2.43 0.0826
# Bell-McCaffrey method
dfadjustSE(r1, clustervar = d1$cl, IK = FALSE)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> (Intercept) -0.0236 0.0135 0.0169 0.0316 2.42 0.2766
#> x2 0.1778 0.0530 0.0621 0.1076 2.70 0.0731
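The reported degrees of freedom can be used directly to build confidence intervals. As a small sketch, here is the 95% confidence interval for the coefficient on x2 under the Imbens-Kolesár method, plugging in the rounded values printed above rather than extracting them programmatically:
# 95% CI based on the HC2 se and the adjusted degrees of freedom
0.1778 + c(-1, 1) * qt(0.975, df = 2.43) * 0.0621
# Equivalently (up to rounding), the adjusted se with the usual 1.96 critical value
0.1778 + c(-1, 1) * qnorm(0.975) * 0.1157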
Now, let us run a regression of y on x3, with cluster fixed effects. Since we're only interested in x3, we specify that we only want inference on the second element of the coefficient vector:
r1 <- lm(y ~ x3 + cl, data = d1)
dfadjustSE(r1, clustervar = d1$cl, ell = c(0, 1, rep(0,
r1$rank - 2)))
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> Estimate 0.0261 0.0463 0.0595 0.0928 3.23 0.688
dfadjustSE(r1, clustervar = d1$cl, ell = c(0, 1, rep(0,
r1$rank - 2)), IK = FALSE)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> Estimate 0.0261 0.0463 0.0595 0.0928 3.23 0.688
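The ell vector above simply picks out the coefficient on x3. An equivalent, and perhaps less error-prone, way to build it is to match on the coefficient names; this small base-R sketch reproduces the same call as above:
# Build the same selection vector by matching the coefficient name
ell <- as.numeric(names(coef(r1)) == "x3")
dfadjustSE(r1, clustervar = d1$cl, ell = ell)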
Finally, consider an example in which the clusters are large. We now have 500,000 observations:
d2 <- do.call("rbind", replicate(500, d1, simplify = FALSE))
d2$y <- rnorm(length(d2$y))
r2 <- lm(y ~ x2, data = d2)
summary(r2)
#>
#> Call:
#> lm(formula = y ~ x2, data = d2)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.073 -0.675 0.000 0.675 4.789
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.000991 0.001535 -0.65 0.52
#> x2 -0.003590 0.003963 -0.91 0.37
#>
#> Residual standard error: 1 on 499998 degrees of freedom
#> Multiple R-squared: 1.64e-06, Adjusted R-squared: -3.59e-07
#> F-statistic: 0.821 on 1 and 5e+05 DF, p-value: 0.365
# Default Imbens-Kolesár method
dfadjustSE(r2, clustervar = d2$cl)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> (Intercept) -0.000991 0.00133 0.00168 0.00294 2.66 0.603
#> x2 -0.003590 0.00483 0.00568 0.00997 2.65 0.578
# Bell-McCaffrey method
dfadjustSE(r2, clustervar = d2$cl, IK = FALSE)
#>
#> Coefficients:
#> Estimate HC1 se HC2 se Adj. se df p-value
#> (Intercept) -0.000991 0.00133 0.00168 0.00315 2.42 0.607
#> x2 -0.003590 0.00483 0.00568 0.00984 2.70 0.577
This section describes the implementation of the Imbens and Kolesár (2016) and Bell and McCaffrey (2002) degrees of freedom adjustments.
There are $S$ clusters, and we observe $n_{s}$ observations in cluster $s$, for a total of $n=\sum_{s=1}^{S}n_{s}$ observations. We handle the case with independent observations by letting each observation be in its own cluster, with $S=n$. Consider the linear regression of a scalar outcome $Y_{i}$ onto a $p$-vector of regressors $X_{i}$,
$$Y_{i}=X_{i}'\beta+u_{i}.$$
We're interested in inference on $\ell'\beta$ for some fixed vector $\ell\in\mathbb{R}^{p}$. Let $X$, $u$, and $Y$ denote the design matrix, and the error and outcome vectors, respectively. For any $n\times k$ matrix $M$, let $M_{s}$ denote the $n_{s}\times k$ block corresponding to cluster $s$, so that, for instance, $Y_{s}$ corresponds to the outcome vector in cluster $s$. For a positive semi-definite matrix $M$, let $M^{1/2}$ be a matrix satisfying $M^{1/2\prime}M^{1/2}=M$, such as its symmetric square root or its Cholesky decomposition.
Assume that, conditional on $X$, the errors $u$ have mean zero and are uncorrelated across clusters. Denote the conditional variance matrix of $u$ by $\Omega$, so that $\Omega_{s}$ is the block of $\Omega$ corresponding to cluster $s$. We estimate $\ell'\beta$ using OLS. In R, the OLS estimator is computed via a QR decomposition, $X=QR$, where $Q'Q=I$ and $R$ is upper-triangular, so we can write the estimator as
$$\ell'\hat{\beta}=\ell'(X'X)^{-1}X'Y=\tilde{\ell}'Q'Y,\qquad \tilde{\ell}=R^{-1\prime}\ell.$$
It has variance
$$V=\tilde{\ell}'Q'\Omega Q\tilde{\ell}=\sum_{s=1}^{S}\tilde{\ell}'Q_{s}'\Omega_{s}Q_{s}\tilde{\ell}.$$
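As a minimal sketch of this computation in R for the regression of y on x2 (the object names fit, qrX, and ell_tilde are ours, chosen for illustration):
fit <- lm(y ~ x2, data = d1)
qrX <- qr(model.matrix(fit))
Q <- qr.Q(qrX)                                    # n x p, with Q'Q = I
R <- qr.R(qrX)                                    # p x p, upper-triangular
ell <- c(0, 1)                                    # inference on the coefficient on x2
ell_tilde <- backsolve(R, ell, transpose = TRUE)  # solves R' ell_tilde = ell
drop(crossprod(ell_tilde, crossprod(Q, d1$y)))    # equals coef(fit)["x2"] (no pivoting here)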
We estimate $V$ using a variance estimator that generalizes the HC2 variance estimator to clustering. Relative to the LZ2 estimator described in Imbens and Kolesár (2016), we use a slight modification that allows for fixed effects:
$$\hat{V}=\sum_{s=1}^{S}(a_{s}'\hat{u}_{s})^{2},\qquad a_{s}=A_{s}Q_{s}\tilde{\ell},$$
where $\hat{u}_{s}$ is the vector of OLS residuals in cluster $s$, and $A_{s}$ is a generalized inverse of the symmetric square root of $I-Q_{s}Q_{s}'$, with $Q_{s}Q_{s}'$ the block of the hat matrix $QQ'$ corresponding to cluster $s$. In the presence of cluster-specific fixed effects, $I-Q_{s}Q_{s}'$ is not generally invertible, which necessitates taking a generalized inverse. So long as the vector $\ell$ doesn't load on these fixed effects, $\hat{V}$ will be unbiased under homoskedasticity. More precisely, write the regressors as $X=(W,L)$, where $L$ collects the cluster-specific fixed effects and $\ell$ loads only on $W$, and let $\ddot{W}$ denote the residual from the least-squares projection of $W$ onto $L$. The next result, which slightly generalizes Theorem 1 in Pustejovsky and Tipton (2018), shows that $\hat{V}$ is unbiased under homoskedasticity provided that (i) $L_{s}'\ddot{W}_{s}=0$ for each cluster $s$, and (ii) $\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s}$ is invertible for each $s$.
The proof is given in the last section. By definition of the projection, $L$ and $\ddot{W}$ are orthogonal. Condition (i) strengthens this requirement to orthogonality within each cluster. It holds if $L$ corresponds to a vector of cluster fixed effects, or more generally if $L$ contains cluster-specific variables. Condition (ii) ensures that, after partialling out $L$, it is feasible to run leave-one-cluster-out regressions. Without clustering, the condition is equivalent to the requirement that the partial leverages associated with $\ddot{W}$ are smaller than one.
If the observations are independent, the vector of leverages $(Q_{1}'Q_{1},\dotsc,Q_{n}'Q_{n})$ can be computed directly using the stats::hatvalues function. In this case, we use this function to compute $A_{i}=1/\sqrt{1-Q_{i}'Q_{i}}$ directly, and we then compute $a_{i}=A_{i}Q_{i}'\tilde{\ell}$ using vector operations.
For the case with clustering, computing an inverse of $I-Q_{s}Q_{s}'$ can be expensive or even infeasible if the cluster size $n_{s}$ is large. We therefore use the following result, which allows us to compute $a_{s}$ by computing a spectral decomposition of a $p\times p$ matrix: if $Q_{s}'Q_{s}=\sum_{i}\lambda_{is}r_{is}r_{is}'$ denotes the spectral decomposition of $Q_{s}'Q_{s}$, then
$$a_{s}=Q_{s}\sum_{i\colon\lambda_{is}\neq 1}\frac{1}{\sqrt{1-\lambda_{is}}}r_{is}r_{is}'\tilde{\ell}.$$
This result follows from the fact that $I-Q_{s}Q_{s}'$ has eigenvalues $1-\lambda_{is}$ and eigenvectors $Q_{s}r_{is}$. More precisely, let $Q_{s}=\sum_{i}\lambda_{is}^{1/2}u_{is}r_{is}'$ denote the singular value decomposition of $Q_{s}$, so that $I-Q_{s}Q_{s}'=\sum_{i}(1-\lambda_{is})u_{is}u_{is}'$, and we can take $A_{s}=\sum_{i\colon\lambda_{is}\neq 1}(1-\lambda_{is})^{-1/2}u_{is}u_{is}'$. Then
$$A_{s}Q_{s}\tilde{\ell}=\sum_{i\colon\lambda_{is}\neq 1}\frac{\lambda_{is}^{1/2}}{\sqrt{1-\lambda_{is}}}u_{is}r_{is}'\tilde{\ell}=Q_{s}\sum_{i\colon\lambda_{is}\neq 1}\frac{1}{\sqrt{1-\lambda_{is}}}r_{is}r_{is}'\tilde{\ell},$$
where the second equality uses $Q_{s}r_{is}=\lambda_{is}^{1/2}u_{is}$.
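A minimal sketch of this computation in R, for a single cluster of the regression of y on x2 (illustrative only; the package's internal implementation may differ):
# a_s for cluster s = 1 via the p x p spectral decomposition of Q_s'Q_s
fit <- lm(y ~ x2, data = d1)
qrX <- qr(model.matrix(fit))
Q <- qr.Q(qrX)
ell_tilde <- backsolve(qr.R(qrX), c(0, 1), transpose = TRUE)
Qs <- Q[d1$cl == "1", , drop = FALSE]             # block of Q for cluster 1
eg <- eigen(crossprod(Qs), symmetric = TRUE)      # lambda_is and r_is
keep <- eg$values < 1 - 1e-7                      # drop eigenvalues equal to one
D <- eg$vectors[, keep, drop = FALSE] %*%
  (t(eg$vectors[, keep, drop = FALSE]) / sqrt(1 - eg$values[keep]))
a_s <- Qs %*% D %*% ell_tilde                     # equals A_s Q_s ell_tilde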
Let $G$ denote the $n\times S$ matrix whose $s$-th column is $(I-QQ')_{s}'a_{s}$. Then the Bell and McCaffrey (2002) adjustment sets the degrees of freedom to
$$\frac{\operatorname{tr}(G'G)^{2}}{\operatorname{tr}\bigl((G'G)^{2}\bigr)}.$$
Since $(G'G)_{st}=a_{s}'(I-QQ')_{s}(I-QQ')_{t}'a_{t}=a_{s}'(\mathbb{1}\{s=t\}I-Q_{s}Q_{t}')a_{t}$, the matrix $G'G$ can be efficiently computed as
$$G'G=\operatorname{diag}(a_{s}'a_{s})-BB',\qquad B=\begin{pmatrix}a_{1}'Q_{1}\\ \vdots\\ a_{S}'Q_{S}\end{pmatrix}.$$
Note that $B$ is an $S\times p$ matrix, so that computing the degrees of freedom adjustment only involves $p\times p$ matrices:
$$\operatorname{tr}(G'G)=\sum_{s=1}^{S}a_{s}'a_{s}-\operatorname{tr}(B'B),\qquad
\operatorname{tr}\bigl((G'G)^{2}\bigr)=\sum_{s=1}^{S}(a_{s}'a_{s})^{2}-2\sum_{s=1}^{S}a_{s}'a_{s}\,(BB')_{ss}+\operatorname{tr}\bigl((B'B)^{2}\bigr).$$
If the observations are independent, we compute $B$ directly as B <- a*Q, and since $a_{i}$ is a scalar, these expressions simplify to
$$\operatorname{tr}(G'G)=\sum_{i=1}^{n}a_{i}^{2}-\operatorname{tr}(B'B),\qquad
\operatorname{tr}\bigl((G'G)^{2}\bigr)=\sum_{i=1}^{n}a_{i}^{4}-2\sum_{i=1}^{n}a_{i}^{2}\,(BB')_{ii}+\operatorname{tr}\bigl((B'B)^{2}\bigr).$$
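As a minimal sketch for the independent case, the following recomputes the Bell-McCaffrey degrees of freedom for the coefficient on x1 from the first example directly from these formulas (a hand-rolled illustration, not the package's internal code):
fit <- lm(y ~ x1, data = d1)
qrX <- qr(model.matrix(fit))
Q <- qr.Q(qrX)
ell_tilde <- backsolve(qr.R(qrX), c(0, 1), transpose = TRUE)
a <- drop(Q %*% ell_tilde) / sqrt(1 - hatvalues(fit))       # a_i = A_i Q_i' ell_tilde
B <- a * Q                                                  # rows a_i Q_i'
BtB <- crossprod(B)
tr1 <- sum(a^2) - sum(diag(BtB))                            # tr(G'G)
tr2 <- sum(a^4) - 2 * sum(a^2 * rowSums(B^2)) + sum(BtB^2)  # tr((G'G)^2)
tr1^2 / tr2                                                 # should be close to the df of 2.01 reported above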
The Imbens and Kolesár (2016) degrees of freedom adjustment instead sets the degrees of freedom to
$$\frac{\operatorname{tr}(G'\hat{\Omega}G)^{2}}{\operatorname{tr}\bigl((G'\hat{\Omega}G)^{2}\bigr)},$$
where $\hat{\Omega}$ is an estimate of $\Omega$ based on the Moulton (1986) model for the covariance matrix, under which $\Omega_{s}=\sigma_{\epsilon}^{2}I_{n_{s}}+\rho\iota_{n_{s}}\iota_{n_{s}}'$. Using simple algebra, one can show that in this case
$$G'\hat{\Omega}G=\hat{\sigma}_{\epsilon}^{2}\,G'G+\hat{\rho}\,F'F,\qquad
F_{st}=\mathbb{1}\{s=t\}\,\iota_{n_{s}}'a_{s}-(\iota_{n_{s}}'Q_{s})(Q_{t}'a_{t}),$$
where $F$ is the $S\times S$ matrix collecting the within-cluster column sums of $G$, so that the adjustment can again be computed even if the clusters are large. The estimate $\hat{\Omega}$ replaces $\sigma_{\epsilon}^{2}$ and $\rho$ with analog estimates.
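Both adjustments thus apply the same Satterthwaite-type formula to an $S\times S$ matrix: $G'G$ for Bell and McCaffrey (2002) and $G'\hat{\Omega}G$ for Imbens and Kolesár (2016). As a one-line sketch of that final step (satt_df is an illustrative helper, not a package function):
# Satterthwaite-type degrees of freedom from a symmetric S x S matrix M
# (M = G'G for Bell-McCaffrey, M = G' Omega_hat G for Imbens-Kolesar)
satt_df <- function(M) sum(diag(M))^2 / sum(M^2)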
The estimator of the block of $V$ associated with $W$ implied by $\hat{V}$ is given by
$$(\ddot{W}'\ddot{W})^{-1}\Bigl(\sum_{s=1}^{S}\ddot{W}_{s}'A_{s}\hat{u}_{s}\hat{u}_{s}'A_{s}'\ddot{W}_{s}\Bigr)(\ddot{W}'\ddot{W})^{-1},$$
which is unbiased under homoskedasticity if, for each $s$,
$$\ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s}=\ddot{W}_{s}'\ddot{W}_{s}.$$
We will show that this condition holds. To this end, we first claim that under conditions (i) and (ii), $\ddot{W}_{s}$ is in the column space of $I-Q_{s}Q_{s}'$ (a claim that's trivial if this matrix is full rank). Decompose $I-QQ'=I-H_{\ddot{W}}-H_{L}$, where $H_{\ddot{W}}$ and $H_{L}$ are the hat matrices associated with $\ddot{W}$ and $L$. The block associated with cluster $s$ can thus be written as $I-Q_{s}Q_{s}'=I-L_{s}(L'L)^{-1}L_{s}'-\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\ddot{W}_{s}'$. Let $B_{s}=\ddot{W}_{s}(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s})^{-1}\ddot{W}'\ddot{W}$, which is well-defined under condition (ii). Then, using condition (i), we get
$$(I-Q_{s}Q_{s}')B_{s}=\ddot{W}_{s}(\ddot{W}'\ddot{W})^{-1}\bigl(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s}\bigr)\bigl(\ddot{W}'\ddot{W}-\ddot{W}_{s}'\ddot{W}_{s}\bigr)^{-1}\ddot{W}'\ddot{W}=\ddot{W}_{s},$$
proving the claim. Letting $C$ denote the symmetric square root of $I-Q_{s}Q_{s}'$, we can therefore write $\ddot{W}_{s}=CD_{s}$ with $D_{s}=CB_{s}$, and the left-hand side of the condition above can be written as
$$\ddot{W}_{s}'A_{s}(I-Q_{s}Q_{s}')A_{s}'\ddot{W}_{s}=D_{s}'(CA_{s}C)(CA_{s}'C)D_{s}=D_{s}'C\,CD_{s}=\ddot{W}_{s}'\ddot{W}_{s},$$
where the second equality follows by the definition of a generalized inverse.