The package RDHonest implements bias-aware inference
methods in regression discontinuity (RD) designs developed in Armstrong and Kolesár (2018), Armstrong and Kolesár (2020), and Kolesár and Rothe (2018). In this vignette, we
demonstrate the implementation of these methods using datasets from
Lee (2008), Oreopoulos (2006), and Battistin et al. (2009) and Ludwig and Miller (2007), which are included in
the package as a data frame lee08, cghs,
rcp and headst. The dataset from Lalive (2008) used in Kolesár and Rothe (2018) is also included in the
package as data frame rebp.
In a fuzzy RD design, the treatment status \(D_i\) of a unit does not necessarily equate the treatment assignment \(Z_i=I\{x_{i}\geq c_{0}\}\). Instead, the treatment assignment induces a jump in the treatment probability at the cutoff. Correspondingly, we augment the outcome model with a first stage that measures the effect of the running variable on the treatment: \[\begin{equation} Y_i = f_Y(x_i) + u_{Y, i}, \qquad D_i = f_D(x_i) + u_{Y, i}, \end{equation}\] where \(f_D, f_Y\) are the conditional mean functions.
To account for imperfect compliance the fuzzy RD parameter scales the jump in the outcome equation \(\tau_Y\) by the jump in the treatment probability at the cutoff, \(\tau_D=\lim_{x\downarrow c_{0}}f_D(x)-\lim_{x\uparrow c_{0}}f_D(x)\). This fuzzy RD parameter, \(\theta=\tau_{Y}/\tau_{D}\), measures the local average treatment effect for individuals at the threshold who comply with the treatment assignment, provided mild continuity conditions and a monotonicity condition hold (Hahn et al. (2001)). Under perfect compliance, the treatment probability jumps all the way from zero to one at the threshold so that \(\tau_{D}=1\), and the two parameters coincide.
For example, in the Battistin et al. (2009) dataset, the treatment variable is an indicator for retirement, and the running variable is the number of years since being eligible to retire. The cutoff is \(0\). Individuals exactly at the cutoff are dropped from the dataset. If there were individuals exactly at the cutoff, they are assumed to receive the treatment assignment (i.e. be eligible for retirement).
A natural estimator for the fuzzy RD parameter \(\theta\) is the sample analog based on local linear regression, \[\begin{equation*} \hat{\theta}_{h}=\frac{\hat{\tau}_{Y, h}}{\hat{\tau}_{D, h}}. \end{equation*}\]
Unlike in the sharp case, our bias-aware CIs do rely on the consistency of the estimator, which generally requires the bandwidth to shrink with the sample size. Since this estimator is a ratio of regression coefficients, it follows by the delta method that so long as \(\tau_{D}\neq 0\), the estimator will be asymptotically normal in large samples. In fact, the estimator is equivalent to a weighted IV regression of \(Y_i\) onto \(D_i\), using \(Z_i\) as an instrument, and \(x_i\) and its interaction with \(Z_i\) as controls, so the variance formula is analogous to the IV variance formula: \[\begin{equation*} \sd(\hat{\theta}_h)^2=\frac{\sd(\tau_{Y, h})^2+\theta^2\sd(\tau_{D, h})^2 -2\cov(\tau_{D, h}, \tau_{Y, h})\theta}{\tau_D^2}, \end{equation*}\] where \(\cov(\tau_{D, h}, \tau_{Y, h})=\sum_i k_{i, h}^2\cov(Y_i, D_i\mid x_i)\) is the covariance of the estimators.
If the second derivative of \(f_Y\)
is bounded by \(M_Y\) and the second
derivative of \(f_D\) is bounded by
\(M_D\), a linearization argument from
Section 3.2.3 in Armstrong and Kolesár
(2020) that the bias can be bounded in large samples by \(B_{M, h}\), with \(M=(M_{Y}+\abs{\theta}M_{D})/\abs{\tau_{D}}\),
which now depends on \(\theta\) itself.
Therefore, optimal bandwidth calculations will require a preliminary
estimate of \(\abs{\theta}\), which can
be passed to RDHonest via the option T0. Like
in the sharp case, the optimal bandwidth calculations assume
homoskedastic covariance of \((u_{Y, i}, u_{D,
i})\) on either side of the cutoff, which are estimated based on
a preliminary local linear regression for both the outcome and first
stage equation, with bandwidth given by the Imbens and Kalyanaraman (2012) bandwidth
selector applied to the outcome equation.
## Initial estimate of treatment effect for optimal
## bandwidth calculations
r <- RDHonest(log(cn) | retired ~ elig_year, data = rcp,
kern = "triangular", M = c(0.001, 0.002), opt.criterion = "MSE",
sclass = "H", T0 = 0)
## Use it to compute optimal bandwidth
RDHonest(log(cn) | retired ~ elig_year, data = rcp, kern = "triangular",
M = c(0.001, 0.002), opt.criterion = "MSE", sclass = "H",
T0 = r$coefficients$estimate)
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year, data = rcp,
#> M = c(0.001, 0.002), kern = "triangular", opt.criterion = "MSE",
#> sclass = "H", T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.09062179 0.07162253 0.04374348 (-0.253552, 0.07230844)
#>
#> Onesided CIs: (-Inf, 0.07093026), (-0.2521738, Inf)
#> Number of effective observations: 7465.396
#> Maximal leverage for fuzzy RD parameter: 0.000817649
#> First stage estimate: 0.3479478
#> First stage smoothness constant M: 0.002
#> Reduced form smoothness constant M: 0.001
#> P-value: 0.286715
#>
#> Based on local regression with bandwidth: 9.551734, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.031532 0.347948
#> I(elig_year>0):elig_year -0.005692 -0.009782
#> (Intercept) 9.760643 0.246088
#> elig_year -0.005370 0.032019Like in the sharp RD case, without further restrictions, the curvature parameters \(M_Y\) and \(M_D\) cannot be data-driven: to maintain honesty over the whole function class, a researcher must choose them a priori, rather than attempting to use a data-driven method. Therefore, one should, whenever possible, use problem-specific knowledge to decide what choices of \(M_Y\) and \(M_D\) are reasonable a priori.
For cases in which this is difficult, the function
RDHonest implements the rule of thumb Armstrong and Kolesár (2020) described earlier,
based on computing the global smoothness of both \(f_Y\) and \(f_D\) using a quartic polynomial. When the
user doesn’t supply the curvature bounds, the package uses the rule of
thumb \(\hat{M}_{ROT}\), and prints a
message to inform the user:
## Data-driven choice of M
RDHonest(log(cn) | retired ~ elig_year, data = rcp, kern = "triangular",
opt.criterion = "MSE", sclass = "H", T0 = r$coefficients$estimate)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year, data = rcp,
#> kern = "triangular", opt.criterion = "MSE", sclass = "H",
#> T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.1762098 0.1042236 0.08473369 (-0.4329103, 0.08049077)
#>
#> Onesided CIs: (-Inf, 0.07995648), (-0.4323761, Inf)
#> Number of effective observations: 4543.727
#> Maximal leverage for fuzzy RD parameter: 0.001114888
#> First stage estimate: 0.3185837
#> First stage smoothness constant M: 0.008178929
#> Reduced form smoothness constant M: 0.002849525
#> P-value: 0.1962011
#>
#> Based on local regression with bandwidth: 6.127523, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.056138 0.318584
#> I(elig_year>0):elig_year -0.009435 -0.019943
#> (Intercept) 9.777239 0.273343
#> elig_year 0.001841 0.042820See Armstrong and Kolesár (2020) for a discussion of the restrictions on the parameter space under which this method yields honest inference.
RD datasets often contain information on a vector of \(K\) pre-treatment covariates \(W_{i}\), such as pre-intervention outcomes, demographic, or socioeconomic characteristics of the units. Similar to randomized controlled trials, while the presence of covariates doesn’t help to weaken the fundamental identifying assumptions, augmenting the RD estimator with predetermined covariates can increase its precision.
Let us first describe covariate adjustment in the sharp RD case. We implement the covariate adjustment studied in Calonico et al. (2019), namely to include \(W_{i}\) as one of the regressors in the WLS regression, regressing \(Y_{i}\) onto \(m(x_{i})\) and \(W_{i}\). As in the case with no covariates, we weight each observation with the kernel weight \(K(x_{i}/h)\). This leads to the estimator \[\begin{equation*} \tilde{\tau}_{Y, h}=\tilde{\beta}_{Y, h,1}, \qquad \tilde{\beta}_{Y, h}=\left(\sum_{i=1}^{n}K(x_{i}/h) \tilde{m}(x_{i}, W_{i})\tilde{m}(x_{i}, W_{i})'\right)^{-1} \sum_{i=1}^{n}K(x_{i}/h)\tilde{m}(x_{i}, W_{i})Y_{i}, \end{equation*}\] where \(\tilde{m}(x_{i}, W_{i})=(m(x_{i})', W_{i}')'\). Denote the coefficient on \(W_{i}\) in this regression by \(\tilde{\gamma}_{Y, h}\); this corresponds to the last \(K\) elements of \(\tilde{\beta}_{Y, h}\). As in the case without covariates, we first take the bandwidth \(h\) as given, and defer bandwidth selection choice to the end of this subsection.
To motivate the estimator under our framework, and to derive bias-aware CIs that incorporate covariates, we need to formalize the assumption that the covariates are predetermined (without any assumptions on the covariates, it is optimal to ignore the covariates and use the unadjusted estimator \(\hat{\tau}_{Y, h}\)). Let \(f_{W}(x)=E[W_{i}\mid X_{i}=x]\) denote the regression function from regressing the covariates on the running variable, and let \[\begin{equation*} \Sigma_{WW}(x)=\operatorname{var}(W_{i}\mid X_{i}=x), \qquad \Sigma_{WY}(x)=\cov(W_{i}, Y_{i}\mid X_{i}=x). \end{equation*}\] We assume that the variance and covariance functions are continuous, except possibly at zero. Let \(\gamma_{Y}=(\Sigma_{WW}(0_{+})+\Sigma_{WW}(0_{-}))^{-1} (\Sigma_{WY}(0_{+})+\Sigma_{WY}(0_{-}))\) denote the coefficient on \(W_{i}\) when we regress \(Y_{i}\) onto \(W_{i}\) for observations at the cutoff. Let \(\tilde{Y}_{i}:=Y_{i}-W_{i}'\gamma_{Y}\) denote the covariate-adjusted outcome. To formalize the assumption that the covariates are pre-determined, we assume that \(\tau_{W}=\lim_{x\downarrow 0}f_{W}(0)-\lim_{x\uparrow 0}f_{W}(0)=0\), which implies that \(\tau_{Y}\) can be identified as the jump in the covariate-adjusted outcome \(\tilde{Y}_{i}\) at \(0\). Following Appendix B.1 in Armstrong and Kolesár (2018), we also assume that the covariate-adjusted outcome varies smoothly with the running variable (except for a possible jump at the cutoff), in that the second derivative of \[\begin{equation*} \tilde{f}(x):=f_{Y}(x)-f_{W}(x)'\gamma_{Y} \end{equation*}\] is bounded by a known constant \(\tilde{M}\). In addition, we assume \(f_{W}\) has bounded second derivatives.
Under these assumptions, if \(\gamma_{Y}\) was known and hence \(\tilde{Y}_{i}\) was directly observable, we could estimate \(\tau\) as in the case without covariates, replacing \(M\) with \(\tilde{M}\) and \(Y_{i}\) with \(\tilde{Y}_{i}\). Furthermore, as discussed in Armstrong and Kolesár (2018), such approach would be optimal under homoskedasticity assumptions. Although \(\gamma_{Y}\) is unknown, it turns out that the estimator \(\tilde{\tau}_{Y, h}\) has the same large sample behavior as the infeasible estimator \(\hat{\tau}_{\tilde{Y}, h}\). To show this, note that by standard regression algebra, \(\tilde{\tau}_{Y, h}\) can equivalently be written as \[\begin{equation*} \tilde{\tau}_{Y, h}=\hat{\tau}_{Y-W'\tilde{\gamma}_{Y, h}, h}= \hat{\tau}_{\tilde{Y}, h}- \sum_{k=1}^{K}\hat{\tau}_{W_{k}, h}(\tilde{\gamma}_{Y, h, k}-\gamma_{Y, k}). \end{equation*}\] The first equality says that covariate-adjusted estimate is the same as an unadjusted estimate that replaces the original outcome \(Y_{i}\) with the covariate-adjusted outcome \(Y_{i}-W_{i}'\tilde{\gamma}_{Y, h}\). The second equality uses the decomposition \(Y_{i}-W_{i}'\tilde{\gamma}_{Y, h}=\tilde{Y}_{i}-W_{i}' (\tilde{\gamma}_{Y, h}-\gamma_{Y})\) to write the estimator as a sum of the infeasible estimator and a linear combination of “placebo RD estimators” \(\hat{\tau}_{W_{k}, h}\), that replace \(Y_{i}\) in the outcome equation with the \(k\)th element of \(W_{i}\), Since \(f_{W}\) has bounded second derivatives, these placebo estimators converge to zero, with rate that is at least as fast as the rate of convergence of the infeasible estimator \(\hat{\tau}_{\tilde{Y}, h}\): \(\hat{\tau}_{W_{k}, h}=O_{p}(B_{\tilde{M}, h}+\sd(\hat{\tau}_{\tilde{Y}, h}))\). Furthermore, under regularity conditions, \(\tilde{\gamma}_{Y, h}\) converges to \(\gamma_{Y}\), so that the second term in the previous display is asymptotically negligible relative to the first. Consequently, we can form bias-aware CIs based on \(\tilde{\tau}_{Y, h}\) as in the case without covariates, treating the covariate-adjusted outcome \(Y_{i}-W_{i}'\tilde{\gamma}_{Y}\) as the outcome, \[\begin{equation*} \tilde{\tau}_{Y, h}\pm \cv_{1-\alpha}(B_{\tilde{M}, h} / \sd(\hat{\tau}_{\tilde{Y}, h}))\sd(\hat{\tau}_{\tilde{Y}, h}), \qquad \sd(\hat{\tau}_{\tilde{Y}, h})^2=\sum_{i=1}^{n}k_{i, h}^{2}\sigma^{2}_{\tilde{Y}}(x_{i}), \end{equation*}\] where \(\sigma^{2}_{\tilde{Y}}(x_{i})=\sigma^{2}_{Y}(x_{i})+{\gamma}_{Y}'\Sigma_{WW}(x_{i}){\gamma}_{Y}-2{\gamma}_{Y}'\Sigma_{WY}(x_{i})\). If the covariates are effective at explaining variation in the outcomes, then the quantity \(\sum_{i}k_{i, h}^{2}\left({\gamma}_{Y}'\Sigma_{WW}(x_{i}){\gamma}_{Y}-2{\gamma}_{Y}'\Sigma_{WY}(x_{i}) \right)\) will be negative, and \(\sd(\hat{\tau}_{\tilde{Y}, h})\leq \sd(\hat{\tau}_{Y, h})\). If the smoothness of the covariate-adjusted conditional mean function \(f_{Y}-f_{W}'\gamma_{Y}\) is greater than the smoothness of the unadjusted conditional mean function \(f_{Y}\), so that \(\tilde{M}\leq M\), then using the covariates will help tighten the confidence intervals.
Implementation of covariate-adjustment requires a choice of \(\tilde{M}\), and computing the optimal bandwidth requires a preliminary estimate of the variance of the covariate-adjusted outcome. In our implementation, we first estimate the model without covariates (using a rule of thumb to calibrate \(M\), the bound on the second derivative of \(f_{Y}\)), and compute the bandwidth \(\check{h}\) that’s MSE optimal without covariates. Based on this bandwidth, we compute a preliminary estimate \(\tilde{\gamma}_{Y, \check{h}}\) of \(\gamma_{Y}\), and use this preliminary estimate to compute a preliminary covariate-adjusted outcome \(Y_{i}-W_{i}'\tilde{\gamma}_{Y, \check{h}}\). If \(\tilde{M}\) is not supplied, we calibrate \(\tilde{M}\) using the rule of thumb, using this preliminary covariate-adjusted outcome as the outcome. Similarly, we use this preliminary covariate-adjusted outcome as the outcome to compute a preliminary estimator of the conditional variance \(\sigma^{2}_{\tilde{Y}}(x_{i})\), for optimal bandwidth calculations, as in the case without covariates. With this choice of bandwidth \(h\), in the second step, we estimate \(\tau_Y\) using the estimator \(\tilde{\tau}_{Y, h}\) defined above.
A demonstration using the headst data:
## No covariates
rn <- RDHonest(mortHS ~ povrate, data = headst)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Use Percent attending school aged 14-17, urban,
## black, and their interaction as covariates.
rc <- RDHonest(mortHS ~ povrate | urban * black + sch1417,
data = headst)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
rn
#>
#> Call:
#> RDHonest(formula = mortHS ~ povrate, data = headst)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(povrate>0) -3.15129 1.272338 0.7014739 (-5.980412, -0.3221684)
#>
#> Onesided CIs: (-Inf, -0.357007), (-5.945573, Inf)
#> Number of effective observations: 256.5127
#> Maximal leverage for sharp RD parameter: 0.0279751
#> Smoothness constant M: 0.2993999
#> P-value: 0.02831734
#>
#> Based on local regression with bandwidth: 4.880646, kernel: triangular
#> Regression coefficients:
#> I(povrate>0) I(povrate>0):povrate (Intercept)
#> -3.1513 0.4805 3.7589
#> povrate
#> 0.2800
#> 24 observations with missing values dropped
rc
#>
#> Call:
#> RDHonest(formula = mortHS ~ povrate | urban * black + sch1417,
#> data = headst)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(povrate>0) -2.678646 1.163838 0.6105652 (-5.240691, -0.1166008)
#>
#> Onesided CIs: (-Inf, -0.1537367), (-5.203555, Inf)
#> Number of effective observations: 303.9467
#> Maximal leverage for sharp RD parameter: 0.02287772
#> Smoothness constant M: 0.185326
#> P-value: 0.04014376
#>
#> Based on local regression with bandwidth: 5.891514, kernel: triangular
#> Regression coefficients:
#> I(povrate>0) I(povrate>0):povrate (Intercept)
#> -2.6786458 0.3592431 15.2995726
#> povrate urban black
#> 0.1117840 0.0111355 0.0401396
#> sch1417 urban:black
#> -0.1553701 -0.0005111
#> 29 observations with missing values droppedWe see that the inclusion of covariates leads to a reduction in the rule-of-thumb curvature and also smaller standard errors (this would be true even if the bandwidth was kept fixed). Correspondingly, the CIs are tighter by about 9 percentage points:
ci_len <- c(rc$coefficients$conf.high - rc$coefficients$conf.low,
rn$coefficients$conf.high - rn$coefficients$conf.low)
100 * (1 - ci_len[1]/ci_len[2])
#> [1] 9.440274In the fuzzy RD case, we need to covariate-adjust the treatment \(D_i\) as well as the outcome. The implementation mirrors the sharp case. Define \(\gamma_{D}\) analogously to \(\gamma_{Y}\), and assume that the second derivative of \(f_{Y}(x)-f_{W}(x)'\gamma_{Y}\) is bounded by a known constant \(\tilde{M}_{Y}\), and that \(f_{D}(x)-f_{W}(x)'\gamma_{D}\) is bounded by a known constant \(\tilde{M}_{D}\). The covariate-adjusted estimator is given by \(\tilde{\theta}_{h}=\tilde{\tau}_{Y, h}/\tilde{\tau}_{D, h}\), with variances and worst-case bias computed as in the case without covariates, replacing the treatment and outcome with their covariate-adjusted versions.
A demonstration using the rcp data, where we add
education controls:
RDHonest(log(cn) | retired ~ elig_year | education, data = rcp,
T0 = r$coefficients$estimate)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year | education,
#> data = rcp, T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.26109 0.113415 0.1029379 (-0.5508736, 0.02869355)
#>
#> Onesided CIs: (-Inf, 0.02839887), (-0.5505789, Inf)
#> Number of effective observations: 3259.425
#> Maximal leverage for fuzzy RD parameter: 0.001393073
#> First stage estimate: 0.316514
#> First stage smoothness constant M: 0.007191008
#> Reduced form smoothness constant M: 0.005370042
#> P-value: 0.08225439
#>
#> Based on local regression with bandwidth: 5.069605, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.082639 0.316514
#> I(elig_year>0):elig_year -0.031593 -0.008402
#> (Intercept) 9.378002 0.515120
#> elig_year 0.025771 0.035794
#> educationelementary school 0.275151 -0.201655
#> educationlower secondary 0.431725 -0.287265
#> educationvocational studies 0.580132 -0.360524
#> educationupper secondary 0.741646 -0.294614
#> educationcollege or higher 0.923659 -0.424387Relative to the previous estimate without covariates, the point estimate is now much larger. This is in part due to slightly smaller bandwidth used, and the regression function for the reduced form appears noisy below the cutoff, potentially due to measurement error: see Figure 3. As a result, the estimates are quite sensitive to the bandwidth used. The noise is also responsible for the rather large data-driven estimates of the curvature parameters.
## see Figure 3
f3 <- RDScatter(log(cn) ~ elig_year, data = rcp, cutoff = 0,
avg = Inf, xlab = "Years to eligibility", ylab = "Log consumption of non-durables",
propdotsize = TRUE, subset = abs(elig_year) < 15)
## Adjust size of dots if they are too big
f3 + ggplot2::scale_size_area(max_size = 5)Battistin et al (2009) data
In some cases, data is only observed as cell averages. For instance,
suppose that instead of observing the original cghs data,
we only observe averages for cells as follows:
dd <- data.frame()
## Collapse data by running variable
for (j in unique(cghs$yearat14)) {
ix <- cghs$yearat14 == j
df <- data.frame(y = mean(log(cghs$earnings[ix])), x = j,
weights = sum(ix), sigma2 = var(log(cghs$earnings[ix]))/sum(ix))
dd <- rbind(dd, df)
}The column weights gives the number of observations that
each cell averages over. In this case, if we weight the observations
using weights, we can recover the original estimates (and
the same worst-case bias). If we use the estimates of the conditional
variance of the outcome, dd$sigma2, then we can also
replicate the standard error calculations:
s0 <- RDHonest(log(earnings) ~ yearat14, cutoff = 1947,
data = cghs)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## keep same bandwidth
s1 <- RDHonest(y ~ x, cutoff = 1947, data = dd, weights = weights,
sigmaY2 = sigma2, se.method = "supplied.var", h = s0$coefficients$bandwidth)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Results are identical:
s0
#>
#> Call:
#> RDHonest(formula = log(earnings) ~ yearat14, data = cghs, cutoff = 1947)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(yearat14>0) 0.07047433 0.05307276 0.03796328 (-0.05531319, 0.1962619)
#>
#> Onesided CIs: (-Inf, 0.1957345), (-0.05478587, Inf)
#> Number of effective observations: 9074.452
#> Maximal leverage for sharp RD parameter: 0.0005055239
#> Smoothness constant M: 0.02296488
#> P-value: 0.2905956
#>
#> Based on local regression with bandwidth: 3.442187, kernel: triangular
#> Regression coefficients:
#> I(yearat14>0) I(yearat14>0):yearat14 (Intercept)
#> 0.070474 0.009301 8.732659
#> yearat14
#> 0.010848
s1
#>
#> Call:
#> RDHonest(formula = y ~ x, data = dd, weights = weights, cutoff = 1947,
#> h = s0$coefficients$bandwidth, se.method = "supplied.var",
#> sigmaY2 = sigma2)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(x>0) 0.07047433 0.05307276 0.03796328 (-0.05531319, 0.1962619)
#>
#> Onesided CIs: (-Inf, 0.1957345), (-0.05478587, Inf)
#> Number of effective observations: 9074.452
#> Maximal leverage for sharp RD parameter: 0.0005055239
#> Smoothness constant M: 0.02296488
#> P-value: 0.2905956
#>
#> Based on local regression with bandwidth: 3.442187, kernel: triangular
#> Regression coefficients:
#> I(x>0) I(x>0):x (Intercept) x
#> 0.070474 0.009301 8.732659 0.010848Without supplying the variance estimates and specifying
se.method="supplied.var", the variance estimates will not
match, since the collapsed data is not generally not sufficient to learn
about the true variability of the collapsed outcomes.
The same method works in fuzzy designs, but one has to also save the conditional variance of the treatment and its covariance with the outcome:
r0 <- RDHonest(log(cn) | retired ~ elig_year, data = rcp,
h = 7)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
dd <- data.frame(x = sort(unique(rcp$elig_year)), y = NA,
d = NA, weights = NA, sig11 = NA, sig12 = NA, sig21 = NA,
sig22 = NA)
for (j in seq_len(NROW(dd))) {
ix <- rcp$elig_year == dd$x[j]
Y <- cbind(log(rcp$cn[ix]), rcp$retired[ix])
dd[j, -1] <- c(colMeans(Y), sum(ix), as.vector(var(Y))/sum(ix))
}
r1 <- RDHonest(y | d ~ x, data = dd, weights = weights,
sigmaY2 = sig11, T0 = 0, sigmaYD = sig21, sigmaD2 = sig22,
h = 7, se.method = "supplied.var")
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Outputs match up to numerical precision
max(abs(r0$coefficients[2:11] - r1$coefficients[2:11]))
#> [1] 3.183231e-11In some applications, the data are collected by clustered sampling.
In such cases, the user can specify a vector clusterid
signifying cluster membership. In this case, preliminary bandwidth
calculations assume that the regression errors have a Moulton-type
structure, with homoskedasticity on either side of the cutoff: \[\begin{equation*} \cov(Y_{i},
Y_{j})=\begin{cases} \sigma^{2}_{+} &
\text{if $i=j$ and $x_{i}\geq 0$,} \\ \sigma^{2}_{-} & \text{if
$i=j$ and
$x_{i}< 0$,} \\ \rho & \text{if $i\neq j$ and $g(i)=g(j)$,} \\ 0
&
\text{otherwise,} \end{cases} \end{equation*}\] where \(g(i)\in\{1,\dotsc, G\}\) denotes cluster
membership. Since it appears difficult to generalize the nearest
neighbor variance estimator to clustering, we use regression-based
cluster-robust variance formulas to compute estimator variances, so that
option se.method="EHW" is required.
## make fake clusters
set.seed(42)
clusterid <- sample(1:50, NROW(lee08), replace = TRUE)
sc <- RDHonest(voteshare ~ margin, data = lee08, se.method = "EHW",
clusterid = clusterid, M = 0.14, h = 7)
## Since clusters are unrelated to outcomes, not
## clustering should yield similar standard errors
sn <- RDHonest(voteshare ~ margin, data = lee08, se.method = "EHW",
M = 0.14, h = 7)
sc
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, M = 0.14,
#> h = 7, se.method = "EHW", clusterid = clusterid)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(margin>0) 5.821591 1.528091 0.7131699 (2.527825, 9.115358)
#>
#> Onesided CIs: (-Inf, 9.048247), (2.594936, Inf)
#> Number of effective observations: 695.1204
#> Maximal leverage for sharp RD parameter: 0.01063752
#> Smoothness constant M: 0.14
#> P-value: 0.0004238717
#>
#> Based on local regression with bandwidth: 7, kernel: triangular
#> Regression coefficients:
#> I(margin>0) I(margin>0):margin (Intercept) margin
#> 5.82159 0.07179 46.38139 0.65536
sn
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, M = 0.14,
#> h = 7, se.method = "EHW")
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(margin>0) 5.821591 1.417699 0.7131699 (2.725463, 8.91772)
#>
#> Onesided CIs: (-Inf, 8.866669), (2.776514, Inf)
#> Number of effective observations: 695.1204
#> Maximal leverage for sharp RD parameter: 0.01063752
#> Smoothness constant M: 0.14
#> P-value: 0.000159109
#>
#> Based on local regression with bandwidth: 7, kernel: triangular
#> Regression coefficients:
#> I(margin>0) I(margin>0):margin (Intercept) margin
#> 5.82159 0.07179 46.38139 0.65536The package also implements lower-bound estimates for the smoothness constant \(M\) for the Taylor and Hölder smoothness class, as described in the supplements to Kolesár and Rothe (2018) and Armstrong and Kolesár (2018)
r1 <- RDHonest(voteshare ~ margin, data = lee08, M = 0.1,
se.method = "nn")
### Only use three point-average for averages of a 100
### points closest to cutoff, and report results
### separately for points above and below cutoff
RDSmoothnessBound(r1, s = 100, separate = TRUE, multiple = FALSE,
sclass = "T")
#> estimate conf.low
#> Below cutoff 0.3337537 0.1224842
#> Above cutoff 0.1738905 0.0000000
## Pool estimates based on observations below and
## above cutoff, and use three-point averages over the
## entire support of the running variable
RDSmoothnessBound(r1, s = 100, separate = FALSE, multiple = TRUE,
sclass = "H")
#> estimate conf.low
#> Pooled 0.1959213 0.01728526For the second-order Taylor smoothness class, the function
RDHonest, with kernel="optimal", computes
finite-sample optimal estimators and confidence intervals, as described
in Section 2.2 in Armstrong and Kolesár
(2018). This typically yields tighter CIs:
r1 <- RDHonest(voteshare ~ margin, data = lee08, kern = "optimal",
M = 0.1, opt.criterion = "FLCI", se.method = "nn")$coefficients
r2 <- RDHonest(voteshare ~ margin, data = lee08, kern = "triangular",
M = 0.1, opt.criterion = "FLCI", se.method = "nn", sclass = "T")$coefficients
r1$conf.high - r1$conf.low
#> [1] 6.33639
r2$conf.high - r2$conf.low
#> [1] 6.685286The package can also perform inference at a point, and optimal bandwidth selection for inference at a point. Suppose, for example, one was interested in the vote share for candidates with margin of victory equal to 20 points:
## Specify we're interested in inference at x0=20, and
## drop observations below cutoff
RDHonest(voteshare ~ margin, data = lee08, subset = margin >
0, cutoff = 20, kern = "uniform", opt.criterion = "MSE",
sclass = "H", point.inference = TRUE)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, subset = margin >
#> 0, cutoff = 20, kern = "uniform", opt.criterion = "MSE",
#> sclass = "H", point.inference = TRUE)
#>
#> Inference for Value of conditional mean (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> (Intercept) 61.66394 0.468336 0.2482525 (60.63086, 62.69703)
#>
#> Onesided CIs: (-Inf, 62.68254), (60.64535, Inf)
#> Number of effective observations: 738
#> Maximal leverage for value of conditional mean: 0.001435988
#> Smoothness constant M: 0.0275703
#> P-value: 0
#>
#> Based on local regression with bandwidth: 7.311286, kernel: uniform
#> Regression coefficients:
#> (Intercept) margin
#> 61.6639 0.4076To compute the optimal bandwidth, the package assumes homoskedastic variance on either side of the cutoff, which it estimates based on a preliminary local linear regression using the Fan and Gijbels (1996) rule of thumb bandwidth selector. This homoskedasticity assumption is dropped when the final standard errors are computed.
The estimators in this package are just weighted regression
estimators, or ratios of regression estimators in the fuzzy RD case.
Regression estimators are linear in outcomes, taking the form \(\sum_i k_{i, h} Y_i\), where \(k_{i, h}\) are estimation weights, returned
by data$est_w part of the RDHonest output (see
expression for \(\hat{\tau}_{Y, h}\)
above).
For the sampling distribution of the estimator to be
well-approximated by a normal distribution, it is important that these
regression weights not be too large: asymptotic normality requires \(L_{\max}=\max_j k_{j, h}^2/\sum_i k_{i,
h}^2\to 0\). If uniform kernel is used, the weights \(k_{i, h}\) are just the diagonal elements
of the partial projection matrix. We therefore refer to \(L_{\max}\) as maximal (partial) leverage,
and it is reported in the RDHonest output. The package
issues a warning if the maximal leverage exceeds \(0.1\)—in such cases using a bigger
bandwidth is advised.
In the fuzzy RD case, by Theorem B.2 in the appendix to Armstrong and Kolesár (2020), the estimator is asymptotically equivalent to \(\sum_i k_{i, h} (Y_i-D_i\theta)/\tau_D\), where \(k_{i, h}\) are the weights for \(\hat{\tau}_{Y, h}\). The maximal leverage calculations are thus analogous to the sharp case.
With local regression methods, it is clear that observations outside the estimation window don’t contribute to estimation, reducing the effective sample size. If the uniform kernel is used, the package therefore reports the number of observations inside the estimation window as the “number of effective observations”.
To make this number comparable across different kernels, observe that, under homoskedasticity, the variance of a linear estimator \(\sum_{i} k_{i} Y_i\) is \(\sigma^2\sum_{i} k_i^2\). We expect this to scale in inverse proportion to the sample size: with twice as many observations and the same bandwidth, we expect the variance to halve. Therefore, if the variance ratio relative to a uniform kernel estimator with weights \(\sum_i k_{uniform, i}Y_i\) is \(\sigma^2\sum_{i} k_{uniform, i}^2/\sigma^2\sum_{i} k_i^2=\sum_{i} k_{uniform, i}^2/\sum_{i} k_i^2\), the precision of this estimator is the same as if we used a uniform kernel, but with \(\sum_{i} k_{uniform, i}^2/\sum_{i} k_i^2\) as many observations. Correspondingly, we define the number of effective observations for other kernels as the number of observations inside the estimation window times \(\sum_{i} k_{uniform, i}^2/\sum_{i} k_i^2\). With this definition, using a triangular kernel typically yields effective samples sizes equal to about 80% of the number of observations inside the estimation window.
Finally, to assess which observations are important for pinning down the estimate, it can be useful to explicitly plot the estimation weights.