The package RDHonest implements the bias-aware inference methods for regression discontinuity (RD) designs developed in Armstrong and Kolesár (2018), Armstrong and Kolesár (2020), and Kolesár and Rothe (2018). In this vignette, we demonstrate the implementation of these methods using datasets from Lee (2008), Oreopoulos (2006), Battistin et al. (2009), and Ludwig and Miller (2007), which are included in the package as the data frames lee08, cghs, rcp, and headst. The dataset from Lalive (2008) used in Kolesár and Rothe (2018) is also included in the package as the data frame rebp.
In a fuzzy RD design, the treatment status Di of a unit does not necessarily equal the treatment assignment Zi = I{xi ≥ c0}. Instead, the treatment assignment induces a jump in the treatment probability at the cutoff. Correspondingly, we augment the outcome model with a first stage that measures the effect of the running variable on the treatment:
$$Y_{i}=f_{Y}(x_{i})+u_{Y, i}, \qquad D_{i}=f_{D}(x_{i})+u_{D, i},$$
where fD, fY are the conditional mean functions.
To account for imperfect compliance, the fuzzy RD parameter scales the jump in the outcome equation τY by the jump in the treatment probability at the cutoff, τD = limx↓c0 fD(x) − limx↑c0 fD(x). This fuzzy RD parameter, θ = τY/τD, measures the local average treatment effect for individuals at the threshold who comply with the treatment assignment, provided mild continuity conditions and a monotonicity condition hold (Hahn, Todd, and van der Klaauw 2001). Under perfect compliance, the treatment probability jumps all the way from zero to one at the threshold, so that τD = 1 and the two parameters coincide.
For example, in the Battistin et al. (2009) dataset, the treatment variable is an indicator for retirement, and the running variable is the number of years since becoming eligible to retire, with a cutoff of 0. Individuals exactly at the cutoff are dropped from the dataset; if there were any, they would be assumed to receive the treatment assignment (i.e., to be eligible for retirement).
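To get a feel for the strength of the first stage in these data, one can compare the share of retired individuals just before and just after the eligibility threshold. A minimal sketch, assuming elig_year is recorded in whole years (as the collapsed-data example below suggests); the one-year bins are an arbitrary illustrative choice:
## Share retired in the year just before vs. just after eligibility
## (one-year bins; the choice of bins is illustrative)
with(subset(rcp, elig_year %in% c(-1, 1)),
     tapply(retired, elig_year > 0, mean))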
A natural estimator for the fuzzy RD parameter θ is the sample analog based on local linear regression,
$$\hat{\theta}_{h}=\frac{\hat{\tau}_{Y, h}}{\hat{\tau}_{D, h}}.$$
Unlike in the sharp case, our bias-aware CIs do rely on the consistency of the estimator, which generally requires the bandwidth to shrink with the sample size. Since this estimator is a ratio of regression coefficients, it follows from the delta method that, so long as τD ≠ 0, the estimator will be asymptotically normal in large samples. In fact, the estimator is equivalent to a weighted IV regression of Yi onto Di, using Zi as an instrument, and xi and its interaction with Zi as controls, so the variance formula is analogous to the IV variance formula:
$$\sd(\hat{\theta}_{h})^{2}=\frac{\sd(\hat{\tau}_{Y, h})^{2}+\theta^{2}\sd(\hat{\tau}_{D, h})^{2}-2\theta\cov(\hat{\tau}_{D, h}, \hat{\tau}_{Y, h})}{\tau_{D}^{2}},$$
where $\cov(\hat{\tau}_{D, h}, \hat{\tau}_{Y, h})=\sum_i k_{i, h}^2\cov(Y_i, D_i\mid x_i)$ is the covariance of the estimators.
If the second derivative of fY is bounded by MY and the second derivative of fD is bounded by MD, a linearization argument in Section 3.2.3 of Armstrong and Kolesár (2020) shows that the bias can be bounded in large samples by $B_{M, h}$, with $M=(M_{Y}+\abs{\theta}M_{D})/\abs{\tau_{D}}$, which now depends on θ itself. Therefore, optimal bandwidth calculations require a preliminary estimate of $\abs{\theta}$, which can be passed to RDHonest via the option T0. As in the sharp case, the optimal bandwidth calculations assume homoskedastic covariance of (uY, i, uD, i) on either side of the cutoff, which is estimated based on a preliminary local linear regression for both the outcome and first-stage equations, with bandwidth given by the Imbens and Kalyanaraman (2012) bandwidth selector applied to the outcome equation.
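To illustrate how the composite smoothness constant combines the reduced-form and first-stage bounds, here is a minimal sketch of the formula $M=(M_{Y}+\abs{\theta}M_{D})/\abs{\tau_{D}}$ with made-up values; all numbers below are illustrative, not output of the package:
## Composite smoothness bound M = (M_Y + |theta| M_D) / |tau_D|
## (all values below are illustrative)
MY <- 0.001
MD <- 0.002
theta0 <- -0.09  # preliminary estimate of theta
tauD <- 0.35     # preliminary estimate of the first-stage jump
(MY + abs(theta0) * MD) / abs(tauD)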
## Initial estimate of treatment effect for optimal
## bandwidth calculations
r <- RDHonest(log(cn) | retired ~ elig_year, data = rcp,
kern = "triangular", M = c(0.001, 0.002), opt.criterion = "MSE",
sclass = "H", T0 = 0)
## Use it to compute optimal bandwidth
RDHonest(log(cn) | retired ~ elig_year, data = rcp, kern = "triangular",
M = c(0.001, 0.002), opt.criterion = "MSE", sclass = "H",
T0 = r$coefficients$estimate)
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year, data = rcp,
#> M = c(0.001, 0.002), kern = "triangular", opt.criterion = "MSE",
#> sclass = "H", T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.09062179 0.07162253 0.04374348 (-0.253552, 0.07230844)
#>
#> Onesided CIs: (-Inf, 0.07093026), (-0.2521738, Inf)
#> Number of effective observations: 7465.396
#> Maximal leverage for fuzzy RD parameter: 0.000817649
#> First stage estimate: 0.3479478
#> First stage smoothness constant M: 0.002
#> Reduced form smoothness constant M: 0.001
#> P-value: 0.286715
#>
#> Based on local regression with bandwidth: 9.551734, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.031532 0.347948
#> I(elig_year>0):elig_year -0.005692 -0.009782
#> (Intercept) 9.760643 0.246088
#> elig_year -0.005370 0.032019
As in the sharp RD case, without further restrictions, the curvature parameters MY and MD cannot be data-driven: to maintain honesty over the whole function class, a researcher must choose them a priori, rather than attempting to use a data-driven method. Therefore, whenever possible, one should use problem-specific knowledge to decide what choices of MY and MD are reasonable a priori.
For cases in which this is difficult, the function RDHonest implements the Armstrong and Kolesár (2020) rule of thumb described earlier, based on computing the global smoothness of both fY and fD using a quartic polynomial. When the user doesn't supply the curvature bounds, the package uses the rule-of-thumb estimate M̂ROT and prints a message to inform the user:
## Data-driven choice of M
RDHonest(log(cn) | retired ~ elig_year, data = rcp, kern = "triangular",
opt.criterion = "MSE", sclass = "H", T0 = r$coefficients$estimate)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year, data = rcp,
#> kern = "triangular", opt.criterion = "MSE", sclass = "H",
#> T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.1762098 0.1042236 0.08473369 (-0.4329103, 0.08049077)
#>
#> Onesided CIs: (-Inf, 0.07995648), (-0.4323761, Inf)
#> Number of effective observations: 4543.727
#> Maximal leverage for fuzzy RD parameter: 0.001114888
#> First stage estimate: 0.3185837
#> First stage smoothness constant M: 0.008178929
#> Reduced form smoothness constant M: 0.002849525
#> P-value: 0.1962011
#>
#> Based on local regression with bandwidth: 6.127523, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.056138 0.318584
#> I(elig_year>0):elig_year -0.009435 -0.019943
#> (Intercept) 9.777239 0.273343
#> elig_year 0.001841 0.042820
See Armstrong and Kolesár (2020) for a discussion of the restrictions on the parameter space under which this method yields honest inference.
RD datasets often contain information on a vector of K pre-treatment covariates Wi, such as pre-intervention outcomes or demographic and socioeconomic characteristics of the units. As in randomized controlled trials, the presence of covariates does not help weaken the fundamental identifying assumptions, but augmenting the RD estimator with predetermined covariates can increase its precision.
Let us first describe covariate adjustment in the sharp RD case. We implement the covariate adjustment studied in Calonico et al. (2019), namely to include Wi as one of the regressors in the WLS regression, regressing Yi onto m(xi) and Wi. As in the case with no covariates, we weight each observation with the kernel weight K(xi/h). This leads to the estimator τ̃Y, h, given by the first element of
$$\tilde{\beta}_{Y, h}=\operatorname*{argmin}_{\beta}\sum_{i} K(x_{i}/h)\bigl(Y_{i}-\tilde{m}(x_{i}, W_{i})'\beta\bigr)^{2},$$
where m̃(xi, Wi) = (m(xi)′, Wi′)′. Denote the coefficient on Wi in this regression by γ̃Y, h; this corresponds to the last K elements of β̃Y, h. As in the case without covariates, we first take the bandwidth h as given, and defer the discussion of bandwidth selection to the end of this subsection.
To motivate the estimator under our framework, and to derive bias-aware CIs that incorporate covariates, we need to formalize the assumption that the covariates are predetermined (without any assumptions on the covariates, it is optimal to ignore the covariates and use the unadjusted estimator τ̂Y, h). Let fW(x) = E[Wi ∣ Xi = x] denote the regression function from regressing the covariates on the running variable, and let
$$\Sigma_{WW}(x)=\operatorname{var}(W_{i}\mid X_{i}=x), \qquad \Sigma_{WY}(x)=\cov(W_{i}, Y_{i}\mid X_{i}=x).$$
We assume that the variance and covariance functions are continuous, except possibly at zero. Let γY = (ΣWW(0+) + ΣWW(0−))−1(ΣWY(0+) + ΣWY(0−)) denote the coefficient on Wi when we regress Yi onto Wi for observations at the cutoff. Let Ỹi := Yi − Wi′γY denote the covariate-adjusted outcome. To formalize the assumption that the covariates are pre-determined, we assume that τW = limx↓0 fW(x) − limx↑0 fW(x) = 0, which implies that τY can be identified as the jump in the covariate-adjusted outcome Ỹi at 0. Following Appendix B.1 in Armstrong and Kolesár (2018), we also assume that the covariate-adjusted outcome varies smoothly with the running variable (except for a possible jump at the cutoff), in that the second derivative of fY(x) − fW(x)′γY is bounded by a known constant M̃. In addition, we assume fW has bounded second derivatives.
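An informal way to probe the assumption that τW = 0 is to run a "placebo" sharp RD regression with a covariate in place of the outcome and check that the estimated jump is small relative to its confidence interval. A minimal sketch using the headst data; the choice of covariate (sch1417) and the use of default options are purely illustrative:
## Placebo check of tau_W = 0: treat a predetermined covariate as the
## outcome; the estimated discontinuity should be close to zero
## (covariate choice and default options are illustrative)
RDHonest(sch1417 ~ povrate, data = headst)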
Under these assumptions, if γY were known and hence Ỹi were directly observable, we could estimate τY as in the case without covariates, replacing M with M̃ and Yi with Ỹi. Furthermore, as discussed in Armstrong and Kolesár (2018), such an approach would be optimal under homoskedasticity assumptions. Although γY is unknown, it turns out that the estimator τ̃Y, h has the same large-sample behavior as the infeasible estimator τ̂Ỹ, h. To show this, note that by standard regression algebra, τ̃Y, h can equivalently be written as
$$\tilde{\tau}_{Y, h}=\hat{\tau}_{Y-W'\tilde{\gamma}_{Y, h},\,h}=\hat{\tau}_{\tilde{Y}, h}-\sum_{k=1}^{K}(\tilde{\gamma}_{Y, h, k}-\gamma_{Y, k})\hat{\tau}_{W_{k}, h}.$$
The first equality says that the covariate-adjusted estimate is the same as an unadjusted estimate that replaces the original outcome Yi with the covariate-adjusted outcome Yi − Wi′γ̃Y, h. The second equality uses the decomposition Yi − Wi′γ̃Y, h = Ỹi − Wi′(γ̃Y, h − γY) to write the estimator as a sum of the infeasible estimator and a linear combination of "placebo RD estimators" τ̂Wk, h that replace Yi in the outcome equation with the kth element of Wi. Since fW has bounded second derivatives, these placebo estimators converge to zero, at a rate that is at least as fast as the rate of convergence of the infeasible estimator τ̂Ỹ, h: $\hat{\tau}_{W_{k}, h}=O_{p}(B_{\tilde{M}, h}+\sd(\hat{\tau}_{\tilde{Y}, h}))$. Furthermore, under regularity conditions, γ̃Y, h converges to γY, so that the second term in the previous display is asymptotically negligible relative to the first. Consequently, we can form bias-aware CIs based on τ̃Y, h as in the case without covariates, treating the covariate-adjusted outcome Yi − Wi′γ̃Y, h as the outcome, with variance
$$\sd(\hat{\tau}_{\tilde{Y}, h})^{2}=\sum_{i}k_{i, h}^{2}\sigma^{2}_{\tilde{Y}}(x_{i}),$$
where σỸ2(xi) = σY2(xi) + γY′ΣWW(xi)γY − 2γY′ΣWY(xi). If the covariates are effective at explaining variation in the outcomes, then the quantity ∑iki, h2(γY′ΣWW(xi)γY − 2γY′ΣWY(xi)) will be negative, and $\sd(\hat{\tau}_{\tilde{Y}, h})\leq \sd(\hat{\tau}_{Y, h})$. If the smoothness of the covariate-adjusted conditional mean function fY − fW′γY is greater than the smoothness of the unadjusted conditional mean function fY, so that M̃ ≤ M, then using the covariates will help tighten the confidence intervals.
Implementation of covariate-adjustment requires a choice of M̃, and computing the optimal bandwidth requires a preliminary estimate of the variance of the covariate-adjusted outcome. In our implementation, we first estimate the model without covariates (using a rule of thumb to calibrate M, the bound on the second derivative of fY), and compute the bandwidth ȟ that’s MSE optimal without covariates. Based on this bandwidth, we compute a preliminary estimate γ̃Y, ȟ of γY, and use this preliminary estimate to compute a preliminary covariate-adjusted outcome Yi − Wi′γ̃Y, ȟ. If M̃ is not supplied, we calibrate M̃ using the rule of thumb, using this preliminary covariate-adjusted outcome as the outcome. Similarly, we use this preliminary covariate-adjusted outcome as the outcome to compute a preliminary estimator of the conditional variance σỸ2(xi), for optimal bandwidth calculations, as in the case without covariates. With this choice of bandwidth h, in the second step, we estimate τY using the estimator τ̃Y, h defined above.
A demonstration using the headst
data:
## No covariates
rn <- RDHonest(mortHS ~ povrate, data = headst)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Use Percent attending school aged 14-17, urban,
## black, and their interaction as covariates.
rc <- RDHonest(mortHS ~ povrate | urban * black + sch1417,
data = headst)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
rn
#>
#> Call:
#> RDHonest(formula = mortHS ~ povrate, data = headst)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(povrate>0) -3.15129 1.272338 0.7014739 (-5.980412, -0.3221684)
#>
#> Onesided CIs: (-Inf, -0.357007), (-5.945573, Inf)
#> Number of effective observations: 256.5127
#> Maximal leverage for sharp RD parameter: 0.0279751
#> Smoothness constant M: 0.2993999
#> P-value: 0.02831734
#>
#> Based on local regression with bandwidth: 4.880646, kernel: triangular
#> Regression coefficients:
#> I(povrate>0) I(povrate>0):povrate (Intercept)
#> -3.1513 0.4805 3.7589
#> povrate
#> 0.2800
#> 24 observations with missing values dropped
rc
#>
#> Call:
#> RDHonest(formula = mortHS ~ povrate | urban * black + sch1417,
#> data = headst)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(povrate>0) -2.678646 1.163838 0.6105652 (-5.240691, -0.1166008)
#>
#> Onesided CIs: (-Inf, -0.1537367), (-5.203555, Inf)
#> Number of effective observations: 303.9467
#> Maximal leverage for sharp RD parameter: 0.02287772
#> Smoothness constant M: 0.185326
#> P-value: 0.04014376
#>
#> Based on local regression with bandwidth: 5.891514, kernel: triangular
#> Regression coefficients:
#> I(povrate>0) I(povrate>0):povrate (Intercept)
#> -2.6786458 0.3592431 15.2995726
#> povrate urban black
#> 0.1117840 0.0111355 0.0401396
#> sch1417 urban:black
#> -0.1553701 -0.0005111
#> 29 observations with missing values dropped
We see that the inclusion of covariates leads to a reduction in the rule-of-thumb curvature and also to smaller standard errors (this would be true even if the bandwidth were kept fixed). Correspondingly, the CIs are tighter by about 9%:
ci_len <- c(rc$coefficients$conf.high - rc$coefficients$conf.low,
rn$coefficients$conf.high - rn$coefficients$conf.low)
100 * (1 - ci_len[1]/ci_len[2])
#> [1] 9.440274
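If, instead of relying on the rule of thumb, the researcher prefers to supply the bound M̃ on the curvature of the covariate-adjusted conditional mean directly, it can be passed via the M argument, just as in the no-covariate case; a minimal sketch, where the value of M̃ is purely illustrative:
## Supply the curvature bound for the covariate-adjusted outcome
## directly instead of relying on the rule of thumb (value illustrative)
RDHonest(mortHS ~ povrate | urban * black + sch1417, data = headst,
         M = 0.1)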
In the fuzzy RD case, we need to covariate-adjust the treatment Di as well as the outcome. The implementation mirrors the sharp case. Define γD analogously to γY, and assume that the second derivative of fY(x) − fW(x)′γY is bounded by a known constant M̃Y, and that the second derivative of fD(x) − fW(x)′γD is bounded by a known constant M̃D. The covariate-adjusted estimator is given by θ̃h = τ̃Y, h/τ̃D, h, with variance and worst-case bias computed as in the case without covariates, replacing the treatment and outcome with their covariate-adjusted versions.
A demonstration using the rcp
data, where we add
education controls:
RDHonest(log(cn) | retired ~ elig_year | education, data = rcp,
T0 = r$coefficients$estimate)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = log(cn) | retired ~ elig_year | education,
#> data = rcp, T0 = r$coefficients$estimate)
#>
#> Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> retired -0.26109 0.113415 0.1029379 (-0.5508736, 0.02869355)
#>
#> Onesided CIs: (-Inf, 0.02839887), (-0.5505789, Inf)
#> Number of effective observations: 3259.425
#> Maximal leverage for fuzzy RD parameter: 0.001393073
#> First stage estimate: 0.316514
#> First stage smoothness constant M: 0.007191008
#> Reduced form smoothness constant M: 0.005370042
#> P-value: 0.08225439
#>
#> Based on local regression with bandwidth: 5.069605, kernel: triangular
#> Regression coefficients:
#> log(cn) retired
#> I(elig_year>0) -0.082639 0.316514
#> I(elig_year>0):elig_year -0.031593 -0.008402
#> (Intercept) 9.378002 0.515120
#> elig_year 0.025771 0.035794
#> educationelementary school 0.275151 -0.201655
#> educationlower secondary 0.431725 -0.287265
#> educationvocational studies 0.580132 -0.360524
#> educationupper secondary 0.741646 -0.294614
#> educationcollege or higher 0.923659 -0.424387
Relative to the previous estimate without covariates, the point estimate is now much larger. This is in part due to the slightly smaller bandwidth used. In addition, the regression function for the reduced form appears noisy below the cutoff, potentially due to measurement error: see Figure 3. As a result, the estimates are quite sensitive to the bandwidth used. The noise is also responsible for the rather large data-driven estimates of the curvature parameters.
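One way to gauge this sensitivity is to re-estimate the model over a small grid of fixed bandwidths. A minimal sketch; the bandwidth values are illustrative:
## Re-estimate the covariate-adjusted fuzzy RD over a grid of fixed
## bandwidths to gauge sensitivity (bandwidth values are illustrative)
sapply(c(5, 7, 10), function(bw)
    RDHonest(log(cn) | retired ~ elig_year | education, data = rcp,
             h = bw, T0 = r$coefficients$estimate)$coefficients$estimate)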
## see Figure 3
f3 <- RDScatter(log(cn) ~ elig_year, data = rcp, cutoff = 0,
avg = Inf, xlab = "Years to eligibility", ylab = "Log consumption of non-durables",
propdotsize = TRUE, subset = abs(elig_year) < 15)
## Adjust size of dots if they are too big
f3 + ggplot2::scale_size_area(max_size = 5)
Figure 3: Battistin et al. (2009) data
In some cases, the data are only observed as cell averages. For instance, suppose that instead of observing the original cghs data, we only observe averages by cells of the running variable, constructed as follows:
dd <- data.frame()
## Collapse data by running variable
for (j in unique(cghs$yearat14)) {
    ix <- cghs$yearat14 == j
    df <- data.frame(y = mean(log(cghs$earnings[ix])), x = j,
                     weights = sum(ix),
                     sigma2 = var(log(cghs$earnings[ix]))/sum(ix))
    dd <- rbind(dd, df)
}
The column weights
gives the number of observations that
each cell averages over. In this case, if we weight the observations
using weights
, we can recover the original estimates (and
the same worst-case bias). If we use the estimates of the conditional
variance of the outcome, dd$sigma2
, then we can also
replicate the standard error calculations:
s0 <- RDHonest(log(earnings) ~ yearat14, cutoff = 1947,
data = cghs)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## keep same bandwidth
s1 <- RDHonest(y ~ x, cutoff = 1947, data = dd, weights = weights,
sigmaY2 = sigma2, se.method = "supplied.var", h = s0$coefficients$bandwidth)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Results are identical:
s0
#>
#> Call:
#> RDHonest(formula = log(earnings) ~ yearat14, data = cghs, cutoff = 1947)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(yearat14>0) 0.07047433 0.05307276 0.03796328 (-0.05531319, 0.1962619)
#>
#> Onesided CIs: (-Inf, 0.1957345), (-0.05478587, Inf)
#> Number of effective observations: 9074.452
#> Maximal leverage for sharp RD parameter: 0.0005055239
#> Smoothness constant M: 0.02296488
#> P-value: 0.2905956
#>
#> Based on local regression with bandwidth: 3.442187, kernel: triangular
#> Regression coefficients:
#> I(yearat14>0) I(yearat14>0):yearat14 (Intercept)
#> 0.070474 0.009301 8.732659
#> yearat14
#> 0.010848
s1
#>
#> Call:
#> RDHonest(formula = y ~ x, data = dd, weights = weights, cutoff = 1947,
#> h = s0$coefficients$bandwidth, se.method = "supplied.var",
#> sigmaY2 = sigma2)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(x>0) 0.07047433 0.05307276 0.03796328 (-0.05531319, 0.1962619)
#>
#> Onesided CIs: (-Inf, 0.1957345), (-0.05478587, Inf)
#> Number of effective observations: 9074.452
#> Maximal leverage for sharp RD parameter: 0.0005055239
#> Smoothness constant M: 0.02296488
#> P-value: 0.2905956
#>
#> Based on local regression with bandwidth: 3.442187, kernel: triangular
#> Regression coefficients:
#> I(x>0) I(x>0):x (Intercept) x
#> 0.070474 0.009301 8.732659 0.010848
Without supplying the variance estimates and specifying se.method="supplied.var", the variance estimates will not match, since the collapsed data are generally not sufficient to learn about the true variability of the collapsed outcomes.
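To see this, one can refit the collapsed data without the sigmaY2 and se.method arguments and compare standard errors. A minimal sketch, assuming the coefficients data frame stores the standard error as std.error:
## Refit the collapsed data without supplying the cell-level variances;
## the standard error no longer matches the full-data fit s0
## (assumes the standard error is stored as coefficients$std.error)
s2 <- RDHonest(y ~ x, cutoff = 1947, data = dd, weights = weights,
               h = s0$coefficients$bandwidth)
c(full = s0$coefficients$std.error, collapsed = s2$coefficients$std.error)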
The same method works in fuzzy designs, but one has to also save the conditional variance of the treatment and its covariance with the outcome:
r0 <- RDHonest(log(cn) | retired ~ elig_year, data = rcp,
h = 7)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
dd <- data.frame(x = sort(unique(rcp$elig_year)), y = NA, d = NA,
                 weights = NA, sig11 = NA, sig12 = NA, sig21 = NA,
                 sig22 = NA)
for (j in seq_len(NROW(dd))) {
    ix <- rcp$elig_year == dd$x[j]
    Y <- cbind(log(rcp$cn[ix]), rcp$retired[ix])
    dd[j, -1] <- c(colMeans(Y), sum(ix), as.vector(var(Y))/sum(ix))
}
r1 <- RDHonest(y | d ~ x, data = dd, weights = weights,
sigmaY2 = sig11, T0 = 0, sigmaYD = sig21, sigmaD2 = sig22,
h = 7, se.method = "supplied.var")
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
## Outputs match up to numerical precision
max(abs(r0$coefficients[2:11] - r1$coefficients[2:11]))
#> [1] 3.183231e-11
In some applications, the data are collected by clustered sampling. In such cases, the user can specify a vector clusterid signifying cluster membership. In this case, preliminary bandwidth calculations assume that the regression errors have a Moulton-type structure, with homoskedasticity on either side of the cutoff: the errors of observations i and j are equicorrelated if they belong to the same cluster, g(i) = g(j), and independent otherwise, with the error variance constant on each side of the cutoff, where g(i) ∈ {1, …, G} denotes cluster membership. Since it appears difficult to generalize the nearest-neighbor variance estimator to clustering, we use regression-based cluster-robust variance formulas to compute the estimator variances, so that the option se.method="EHW" is required.
## make fake clusters
set.seed(42)
clusterid <- sample(1:50, NROW(lee08), replace = TRUE)
sc <- RDHonest(voteshare ~ margin, data = lee08, se.method = "EHW",
clusterid = clusterid, M = 0.14, h = 7)
## Since clusters are unrelated to outcomes, not
## clustering should yield similar standard errors
sn <- RDHonest(voteshare ~ margin, data = lee08, se.method = "EHW",
M = 0.14, h = 7)
sc
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, M = 0.14,
#> h = 7, se.method = "EHW", clusterid = clusterid)
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(margin>0) 5.821591 1.528091 0.7131699 (2.527825, 9.115358)
#>
#> Onesided CIs: (-Inf, 9.048247), (2.594936, Inf)
#> Number of effective observations: 695.1204
#> Maximal leverage for sharp RD parameter: 0.01063752
#> Smoothness constant M: 0.14
#> P-value: 0.0004238717
#>
#> Based on local regression with bandwidth: 7, kernel: triangular
#> Regression coefficients:
#> I(margin>0) I(margin>0):margin (Intercept) margin
#> 5.82159 0.07179 46.38139 0.65536
sn
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, M = 0.14,
#> h = 7, se.method = "EHW")
#>
#> Inference for Sharp RD parameter (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> I(margin>0) 5.821591 1.417699 0.7131699 (2.725463, 8.91772)
#>
#> Onesided CIs: (-Inf, 8.866669), (2.776514, Inf)
#> Number of effective observations: 695.1204
#> Maximal leverage for sharp RD parameter: 0.01063752
#> Smoothness constant M: 0.14
#> P-value: 0.000159109
#>
#> Based on local regression with bandwidth: 7, kernel: triangular
#> Regression coefficients:
#> I(margin>0) I(margin>0):margin (Intercept) margin
#> 5.82159 0.07179 46.38139 0.65536
The package also implements lower-bound estimates for the smoothness constant M for the Taylor and Hölder smoothness classes, as described in the supplements to Kolesár and Rothe (2018) and Armstrong and Kolesár (2018):
r1 <- RDHonest(voteshare ~ margin, data = lee08, M = 0.1,
se.method = "nn")
### Only use three-point averages of the 100 points
### closest to the cutoff, and report results
### separately for points above and below the cutoff
RDSmoothnessBound(r1, s = 100, separate = TRUE, multiple = FALSE,
sclass = "T")
#> estimate conf.low
#> Below cutoff 0.3337537 0.1224842
#> Above cutoff 0.1738905 0.0000000
## Pool estimates based on observations below and
## above cutoff, and use three-point averages over the
## entire support of the running variable
RDSmoothnessBound(r1, s = 100, separate = FALSE, multiple = TRUE,
sclass = "H")
#> estimate conf.low
#> Pooled 0.1959213 0.01728526
For the second-order Taylor smoothness class, the function RDHonest, with kern="optimal", computes finite-sample optimal estimators and confidence intervals, as described in Section 2.2 of Armstrong and Kolesár (2018). This typically yields tighter CIs:
r1 <- RDHonest(voteshare ~ margin, data = lee08, kern = "optimal",
M = 0.1, opt.criterion = "FLCI", se.method = "nn")$coefficients
r2 <- RDHonest(voteshare ~ margin, data = lee08, kern = "triangular",
M = 0.1, opt.criterion = "FLCI", se.method = "nn", sclass = "T")$coefficients
r1$conf.high - r1$conf.low
#> [1] 6.33639
r2$conf.high - r2$conf.low
#> [1] 6.685286
The package can also perform inference at a point, and optimal bandwidth selection for inference at a point. Suppose, for example, one was interested in the vote share for candidates with margin of victory equal to 20 points:
## Specify we're interested in inference at x0=20, and
## drop observations below cutoff
RDHonest(voteshare ~ margin, data = lee08, subset = margin >
0, cutoff = 20, kern = "uniform", opt.criterion = "MSE",
sclass = "H", point.inference = TRUE)
#> Using Armstrong & Kolesar (2020) ROT for smoothness constant M
#>
#> Call:
#> RDHonest(formula = voteshare ~ margin, data = lee08, subset = margin >
#> 0, cutoff = 20, kern = "uniform", opt.criterion = "MSE",
#> sclass = "H", point.inference = TRUE)
#>
#> Inference for Value of conditional mean (using Holder class), confidence level 95%:
#> Estimate Std. Error Maximum Bias Confidence Interval
#> (Intercept) 61.66394 0.468336 0.2482525 (60.63086, 62.69703)
#>
#> Onesided CIs: (-Inf, 62.68254), (60.64535, Inf)
#> Number of effective observations: 738
#> Maximal leverage for value of conditional mean: 0.001435988
#> Smoothness constant M: 0.0275703
#> P-value: 0
#>
#> Based on local regression with bandwidth: 7.311286, kernel: uniform
#> Regression coefficients:
#> (Intercept) margin
#> 61.6639 0.4076
To compute the optimal bandwidth, the package assumes homoskedastic variance on either side of the cutoff, which it estimates based on a preliminary local linear regression using the Fan and Gijbels (1996) rule of thumb bandwidth selector. This homoskedasticity assumption is dropped when the final standard errors are computed.
The estimators in this package are just weighted regression estimators, or ratios of regression estimators in the fuzzy RD case. Regression estimators are linear in the outcomes, taking the form ∑iki, hYi, where ki, h are estimation weights, returned in the data$est_w component of the RDHonest output (see the expression for τ̂Y, h above).
For the sampling distribution of the estimator to be well-approximated by a normal distribution, it is important that these regression weights not be too large: asymptotic normality requires Lmax = maxjkj, h2/∑iki, h2 → 0. If the uniform kernel is used, the weights ki, h are just the diagonal elements of the partial projection matrix. We therefore refer to Lmax as the maximal (partial) leverage, and it is reported in the RDHonest output. The package issues a warning if the maximal leverage exceeds 0.1; in such cases, using a bigger bandwidth is advised.
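As a quick check, the maximal leverage reported above can be recomputed by hand from the estimation weights; a minimal sketch using the sn fit from the clustering example, assuming the weights are stored in data$est_w as described above:
## Recompute the maximal leverage L_max = max_j k_j^2 / sum_i k_i^2
## from the stored estimation weights (assumes data$est_w holds them)
k <- sn$data$est_w
max(k^2) / sum(k^2)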
In the fuzzy RD case, by Theorem B.2 in the appendix to Armstrong and Kolesár (2020), the estimator is asymptotically equivalent to ∑iki, h(Yi − Diθ)/τD, where ki, h are the weights for τ̂Y, h. The maximal leverage calculations are thus analogous to the sharp case.
With local regression methods, it is clear that observations outside the estimation window don’t contribute to estimation, reducing the effective sample size. If the uniform kernel is used, the package therefore reports the number of observations inside the estimation window as the “number of effective observations”.
To make this number comparable across different kernels, observe that, under homoskedasticity, the variance of a linear estimator ∑ikiYi is σ2∑iki2. We expect this to scale in inverse proportion to the sample size: with twice as many observations and the same bandwidth, we expect the variance to halve. Therefore, if the variance ratio relative to a uniform-kernel estimator ∑ikuniform, iYi is σ2∑ikuniform, i2/σ2∑iki2 = ∑ikuniform, i2/∑iki2, the precision of this estimator is the same as if we used a uniform kernel, but with ∑ikuniform, i2/∑iki2 times as many observations. Correspondingly, we define the number of effective observations for other kernels as the number of observations inside the estimation window times ∑ikuniform, i2/∑iki2. With this definition, using a triangular kernel typically yields effective sample sizes equal to about 80% of the number of observations inside the estimation window.
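This calculation can be approximated by hand by refitting sn with a uniform kernel at the same bandwidth; a minimal sketch, again assuming the estimation weights are available as data$est_w:
## Effective observations for the triangular-kernel fit sn: number of
## observations in the window times sum(k_uniform^2)/sum(k_triangular^2)
## (assumes estimation weights are stored in data$est_w)
su <- RDHonest(voteshare ~ margin, data = lee08, se.method = "EHW",
               M = 0.14, h = 7, kern = "uniform")
ku <- su$data$est_w
kt <- sn$data$est_w
sum(ku != 0) * sum(ku^2) / sum(kt^2)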
Finally, to assess which observations are important for pinning down the estimate, it can be useful to explicitly plot the estimation weights.
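For example, one could extract the weights from a fitted object and plot them against the running variable. A minimal sketch using the sn fit; it assumes the data component of the output keeps the running variable as x alongside est_w:
## Plot the estimation weights k_{i,h} against the running variable
## (assumes the output's data component has columns x and est_w)
library(ggplot2)
wt <- data.frame(x = sn$data$x, est_w = sn$data$est_w)
ggplot(subset(wt, est_w != 0), aes(x = x, y = est_w)) +
    geom_point() +
    labs(x = "Margin of victory", y = "Estimation weight")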