The package ebci
implements robust empirical Bayes confidence intervals (EBCIs) proposed
by Armstrong et al. (2022) for inference
in a normal means model \(Y_i\sim
N(\theta_i, \sigma^2_i)\), \(i=1,
\dotsc, n\).
Suppose we use an empirical Bayes estimator of \(\theta_i\) that shrinks toward the predictor based on the regression of \(\theta_i\) onto \(X_i\) (equivalently, regression of \(Y_i\) onto \(X_i\)), \[\begin{equation}\label{Bayes-estimator} \hat{\theta}_{i} = X_i'\delta+w_{EB, i}(Y_i-X_i'\delta), \end{equation}\]
where \(\delta=E[X_i X_i']^{-1}E[X_i\theta_i]\), \(w_{EB, i}=\frac{\mu_{2}}{\mu_{2}+\sigma_{i}^{2}}\) is the degree of shrinkage, and \[\begin{equation}\label{mu2-independence} \mu_{2}=E[(\theta_i-X_i'\delta)^{2} \mid X_{i}, \sigma_i]. \end{equation}\] is the second moment of the regression residual. We assume that \(\mu_2\) doesn’t depend on \(\sigma_i\). Under this setup, Morris (1983) proposes to use the parametric EBCI \[\begin{equation*} \hat{\theta}_{i} \pm \frac{z_{1-\alpha/2}}{\sqrt{w_{EB, i}}}w_{EB, i}\sigma_i. \end{equation*}\]
The critical value \(z_{1-\alpha/2}/\sqrt{w_{EB, i}}\) is larger
than the usual critical value \(z_{1-\alpha/2}=\)qnorm(1-alpha/2)
if the estimator was unbiased conditional on \(\theta_i\). This CI is justified if we
strengthen the assumption (\(\ref{mu2-independence}\)) by making the
normality assumption \(\theta_i\mid X_{i},
\sigma_{i} \sim N(X_i'\delta, \mu_2)\). However, if the
residual \(\theta_i-X_i'\delta\) is
not normally distributed, this CI will undercover. Armstrong et al. (2022) derive a robust
EBCI that is only uses (\(\ref{mu2-independence}\)) and not the
normality assumption. The EBCI takes the form \[\begin{equation}\label{robust-ebci}
\hat{\theta}_{i} \pm cva_{\alpha}(m_{2i}, \infty) w_{EB, i}\sigma_i,
\,\, m_{2i}=(1-1/w_{EB, i})^2\mu_2/\sigma^2_i=\sigma^{2}_{i}/\mu_{2},
\end{equation}\] where the critical value \(cva_{\alpha}\) is derived in Armstrong et al. (2022). Here \(m_{2i}\) is the second moment of the
conditional bias of \(\hat{\theta}_i\),
scaled by the standard error (normalized bias, henceforth). This
critical value imposes a constraint (\(\ref{mu2-independence}\)) on the second
moment of \(\theta_i\), but no
constraints on higher moments. We can make the critical value smaller by
also imposing a constraint on the kurtosis of \(\theta_i\) (or equivalently, the kurtosis
of the normalized bias) \[\begin{equation}\label{kappa-independence}
\kappa=E[(\theta_i-X_i'\delta)^{4}
\mid X_{i}, \sigma_i]/\mu_{2}^2.
\end{equation}\] In analogy to (\(\ref{mu2-independence}\)), we assume here
that the conditional fourth moment of \(\theta_i-X_i'\delta\) doesn’t depend on
\((X_i,\sigma_i)\). In this case, the
robust EBCI takes the form \[\begin{equation*}
\hat{\theta}_{i} \pm cva_{\alpha}(m_{2i},\kappa) w_{EB, i}\sigma_i.
\end{equation*}\]
These critical values are implemented in the package by the
cva function:
library(ebci)
## If m_2=0, then we get the usual critical value
cva(m2=0, kappa=Inf, alpha=0.05)$cv
#> [1] 1.959964
## Otherwise the critical value is larger:
cva(m2=4, kappa=Inf, alpha=0.05)$cv
#> [1] 7.216351
## Imposing a constraint on kurtosis tightens it
cva(m2=4, kappa=3, alpha=0.05)$cv
#> [1] 4.619513In practice, the parameters \(\delta, \mu_2\), and \(\kappa\) are unknown. To implement the EBCI, the package replaces them with consistent estimates, following the baseline implementation in Armstrong et al. (2022). We illustrate this in the next section.
Here we illustrate the use of the package using a dataset from Chetty and Hendren (2018) (CH hereafter). The
dataset is included in the package as the list cz. Run
?cz for a full description of the dataset. As in Chetty and Hendren (2018), we use precision
weights proportional to the inverse of the squared standard error to
compute \((\delta,\mu_2,\kappa)\).
## As Y_i, use fixed effect estimate theta25 of causal effect of neighborhood
## for children with parents at the 25th percentile of income distribution. The
## standard error for this estimate is se25. As predictors use average outcome
## for permanent residents (stayers), stayer25. Let us use 90% CIs, as in
## Armstrong et al
r <- ebci(formula=theta25~stayer25, data=cz, se=se25, weights=1/se25^2,
alpha=0.1)For shrinkage toward the grand mean, or toward zero, use the
specification theta25 ~ 1, and theta25 ~ 0,
respectively, in the formula argument of
ebci.
The return value contains (see ?ebci for full
description)
The least squares estimate of \(\delta\):
Estimates of \(\mu_2\) and \(\kappa\). The estimate used for EBCI
calculations (estimate) is obtained by applying a
finite-sample correction to an initial method of moments estimate
(uncorrected_estimate). This correction ensures that we
don’t shrink all the way to zero (or past zero) if the method-of-moments
estimate of \(\mu_2\) is negative (see
Armstrong et al. (2022) for details):
The parameter \(\alpha\)
determining the confidence level, r$alpha, and the matrix
of regressors, r$X.
A data frame with columns:
The columns of the data frame refer to:
w_eb Empirical Bayes shrinkage factor \(w_{EB,
i}=\mu_2/(\mu_2+\sigma_i^2)\).
th_eb Empirical Bayes estimator \(\hat{\theta_i}\) given in (\(\ref{Bayes-estimator}\))
len_eb Half-length \(cva_{\alpha}(m_2, \kappa)w_i\sigma_i\) of
the robust EBCI, so that the lower endpoint of the EBCIs are given by
th_eb-len_eb, and the upper endpoint by
th_eb+len_eb. Let us verify this for the first
observation:
len_pa Half-length \(z_{1-\alpha/2}\sqrt{w_i}\sigma_i\) of the
parametric EBCI.
w_opt Shrinkage factor that optimizes the length of
the resulting confidence interval. In other words, instead of using
\(w_{EB, i}\) in (\(\ref{robust-ebci}\)), we use shrinkage
\(w_i\) that minimizes \(cva_{\alpha}((1-1/w_{EB,
i})^2\mu_2/\sigma^2_i, \kappa) w_{i}\sigma_i\). See Armstrong et al. (2022) for details. The vector
is missing here since the default option, wopt=FALSE, is to
skip computation of the length-optimal CIs to speed up the
calculations.
th_op Estimator based on the length-optimal
shrinkage factor w_opt (missing here since the default is
wopt=FALSE)
len_op Half-length \(cva_{\alpha}((1-1/w_{EB, i})^2\mu_2/\sigma^2_i,
\kappa)
w_{i}\sigma_i\) of the length-optimal EBCI (missing here since we
specified wopt=FALSE).
th_us The unshrunk estimate \(Y_i\), as specified in the
formula argument of the function
ebci.
len_us Half-length \(z_{1-\alpha/2}\sigma_i\) of the CI based on
the unshrunk estimate
se The standard error \(\sigma_i\), as specified by the argument
se of the ebci function.
ncov_pa maximal non-coverage of the parametric
EBCI.
Using the data frame, we can give a table summarizing the results. Let us show the results for the CZ in New York:
df <- (cbind(cz[!is.na(cz$se25), ], r$df))
df <- df[df$state=="NY", ]
knitr::kable(data.frame(cz=df$czname, unshrunk_estimate=df$theta25,
estimate=df$th_eb,
lower_ci=df$th_eb-df$len_eb,
upper_ci=df$th_eb+df$len_eb),
digits=3)| cz | unshrunk_estimate | estimate | lower_ci | upper_ci |
|---|---|---|---|---|
| Syracuse | 0.246 | 0.032 | -0.124 | 0.189 |
| Oneonta | 0.835 | 0.112 | -0.091 | 0.315 |
| Union | -0.493 | -0.014 | -0.201 | 0.173 |
| Buffalo | 0.084 | -0.003 | -0.131 | 0.125 |
| Elmira | 0.056 | 0.056 | -0.143 | 0.255 |
| Olean | -0.024 | 0.080 | -0.114 | 0.273 |
| Watertown | 0.537 | 0.098 | -0.111 | 0.307 |
| Plattsburgh | 0.585 | 0.056 | -0.153 | 0.266 |
| Amsterdam | 0.578 | 0.074 | -0.134 | 0.282 |
| Albany | -0.199 | -0.015 | -0.179 | 0.148 |
| Poughkeepsie | -0.333 | -0.099 | -0.233 | 0.035 |
| New York | -0.148 | -0.116 | -0.180 | -0.053 |
Armstrong et al. (2022) present the same information as a figure.
Finally, let us compute some summary statistics as in Table 3 in Armstrong et al. (2022). Average half-length of the robust, parametric, and unshrunk CI:
mean(r$df$len_eb)
#> [1] 0.1952409
mean(r$df$len_pa)
#> [1] 0.1234121
mean(r$df$len_us)
#> [1] 0.7858001The efficiency of the parametric and unshrunk CI relative to the robust EBCI is given by
mean(r$df$len_eb)/mean(r$df$len_pa)
#> [1] 1.582023
mean(r$df$len_eb)/mean(r$df$len_us)
#> [1] 0.2484613While the parametric EBCI is shorter on average, it yields CIs that may violate the 90% coverage requirement. In particular, the average maximal non-coverage probability at the estimated value of \((\mu_{2},\kappa)\) is given by