Getting Started with the testinterference Package
Tadao Hoshino (thoshino@waseda.jp), Takahide Yanagi (yanagi@econ.kyoto-u.ac.jp)
Introduction
The testinterference package provides tools to test SUTVA and several hypotheses about spillover effects. Specifically, it performs randomization tests to assess the plausibility of the following null hypotheses:
- Fisher: One’s outcome does not depend on one’s own treatment (i.e., Fisher’s sharp null hypothesis).
- SUTVA: One’s outcome is determined solely by one’s own treatment (i.e., the stable unit treatment value assumption).
- Exposure 1: One’s outcome is determined by whether there is at least one treated unit in one’s neighborhood including oneself.
- Exposure 2: One’s outcome is determined by one’s own treatment and whether there is at least one treated peer.
The testing procedures were developed by Hoshino and Yanagi (2023), “Randomization test for the specification of interference structure”.
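To make the exposure definitions above concrete, the following base R sketch computes Exposure 1 and Exposure 2 on a toy network. The network, the assignment vector, and the variable names are illustrative assumptions, and the convention that row i of A lists the peers of unit i is also assumed here; this is not the package's internal code.

```r
# Toy network with 4 units: edges 1-2 and 2-3; unit 4 is isolated.
A <- matrix(0, nrow = 4, ncol = 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- A[3, 2] <- 1

# Treatment assignment: only unit 1 is treated.
Z <- c(1, 0, 0, 0)

# Number of treated peers for each unit (assumes row i lists unit i's peers).
treated_peers <- as.vector(A %*% Z)

# Exposure 1: at least one treated unit in the neighborhood, including oneself.
exposure1 <- as.integer(Z + treated_peers > 0)

# Exposure 2: own treatment paired with an indicator for any treated peer.
exposure2 <- cbind(own = Z, any_treated_peer = as.integer(treated_peers > 0))

exposure1
exposure2
```

In this toy example, units 1 and 2 share the same Exposure 1 value even though only unit 1 is treated, whereas Exposure 2 distinguishes them. Under SUTVA, only Z itself would matter.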
Installation
Get the package from GitHub:
# install.packages("devtools") # if needed
devtools::install_github("tkhdyanagi/testinterference", build_vignettes = TRUE)
Package Function
The testinterference package provides the following function:
- testinterference(): Randomization tests for the following null hypotheses: (1) Fisher, (2) SUTVA, (3) Exposure 1, (4) Exposure 2.
Arguments
The testinterference() function has the following arguments:
- Y: The \(n\)-dimensional outcome vector.
- Z: The \(n\)-dimensional binary treatment assignment vector.
- A: The \(n \times n\), binary, possibly asymmetric adjacency matrix. The diagonal elements must be zero. This argument can be NULL if hypothesis = "Fisher". Default is NULL.
- hypothesis: A character string specifying the null hypothesis of interest. Options are “Fisher”, “SUTVA”, “exposure1”, and “exposure2”. “Fisher” stands for the sharp null hypothesis in the canonical Fisher randomization test. “SUTVA”, “exposure1”, and “exposure2” correspond to the null hypotheses a, b, and c in Table 2 of Hoshino and Yanagi (2023). Default is “SUTVA”. If hypothesis = "exposure2", the user can specify the argument kappa.
- method: A character string specifying how to find focal units. Options are “3-net”, “2-net”, “random”, and “manual”, which stand for the 3-net (MIS-G) method, the 2-net (MIS-A) method, random selection, and manual selection, respectively. Default is “3-net”. If method = "random", the user can specify the argument num_focal_unit. If method = "manual", the user must specify the argument focal_unit.
- design: A character string specifying the randomization design of the original experiment. Options are “Bernoulli”, “complete”, “stratified”, and “other”, which stand for Bernoulli randomization, complete randomization, stratified randomization, and other experimental designs, respectively. If design = "Bernoulli", the user must specify the argument prob. If design = "stratified", the user must specify the argument strata. If design = "other", the user must specify the argument zmatrix.
- prob: NULL, a scalar indicating the homogeneous probability of being treatment eligible, or an \(n\)-dimensional vector indicating the heterogeneous probabilities of treatment eligibility (i.e., the \(i\)-th element of prob indicates \(\Pr(Z_i = 1)\)). This argument is used only when design = "Bernoulli". Default is NULL.
- focal_unit: NULL or an \(n\)-dimensional logical vector specifying whether each unit is focal. This argument is used only when method = "manual". Default is NULL.
- num_focal_unit: NULL or a positive scalar specifying the number of focal units. This argument is used only when method = "random". Default is NULL.
- num_randomization: A large positive integer specifying the number of randomizations. Default is 9999. This argument is not used when design = "other".
- strata: NULL or an \(n\)-dimensional numerical vector indicating the stratum to which each unit belongs. This argument is used only when design = "stratified". Default is NULL.
- zmatrix: NULL or a large matrix of realizations of treatment assignments. This argument must be specified by the user only when design = "other". The number of rows must equal the sample size \(n\). The number of columns is the number of realizations given by the user. Default is NULL.
- kappa: NULL or a positive integer no less than 2. This argument is used only when hypothesis = "exposure2". Default is NULL. If kappa = NULL, kappa is automatically chosen to maximize the number of focal units.
- cores: A positive integer specifying the number of cores to use for parallel computing. Default is 1.
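As an illustration of the shape requirements on A and zmatrix described above, the base R sketch below builds both objects for a made-up design in which whole clusters are treated. The cluster structure, the ring network, and all variable names are hypothetical assumptions chosen for illustration; only the dimension and value constraints come from the argument descriptions.

```r
set.seed(1)
n <- 12                          # sample size (illustrative)
cluster <- rep(1:4, each = 3)    # hypothetical clusters of 3 units each
num_draws <- 500                 # number of assignment realizations

# Each column is one realization: two of the four clusters are treated.
zmatrix <- replicate(num_draws, {
  treated_clusters <- sample(unique(cluster), size = 2)
  as.integer(cluster %in% treated_clusters)
})

# zmatrix must have n rows and contain only binary assignments.
stopifnot(nrow(zmatrix) == n, all(zmatrix %in% c(0, 1)))

# A symmetric ring network as one example of a valid adjacency matrix.
A <- matrix(0, n, n)
for (i in seq_len(n)) {
  j <- if (i == n) 1 else i + 1
  A[i, j] <- A[j, i] <- 1
}

# A must be binary with a zero diagonal.
stopifnot(all(A %in% c(0, 1)), all(diag(A) == 0))
```

Objects built this way could then be passed as A = A, design = "other", and zmatrix = zmatrix, together with an observed Y and Z drawn from the same design.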
Returns
The testinterference() function returns a list containing the following elements:
- pval: The vector of p-values obtained from Kruskal-Wallis (KW), average cross difference (ACD), and ordinary least squares (OLS) test statistics.
- Simes: The vector of testing results by Simes’ correction under significance levels 10%, 5%, and 1%. TRUE indicates rejection of the null hypothesis, and FALSE indicates non-rejection.
- stat: A matrix of test statistics computed with focal assignments.
- focal_unit: The logical vector indicating whether each unit is focal.
- focal_asgmt: The matrix of focal assignments.
- num_focal_unit: The number of focal units.
- num_focal_asgmt: The number of focal assignments.
Example 1: A Single Large Network
We begin by generating artificial data using the datageneration() function. In the datageneration() function, there are two options to generate the adjacency matrix: (1) Erdos-Renyi model and (2) pairs. In addition, there are three options for the experimental design: (a) Bernoulli randomization, (b) complete randomization, and (c) stratified randomization. Here we consider the case (1)-(a) with:
# Load the package
library(testinterference)
# Sample size
n <- 200
# Generate artificial data (Erdos-Renyi model)
set.seed(1)
data1 <- datageneration(n = n,
                        design = "Bernoulli",
                        A = "Erdos-Renyi",
                        beta_s = 1)
Here, beta_s specifies the value of the coefficient for spillover effects in the outcome equation. beta_s = 0 means that there are no spillovers. Since we set beta_s = 1, this example considers the case where SUTVA does not hold.
Run the testinterference() function to test SUTVA:
set.seed(1)
RT1 <- testinterference(Y = data1$Y,
                        Z = data1$Z,
                        A = data1$A,
                        hypothesis = "SUTVA",
                        method = "3-net",
                        design = "Bernoulli",
                        prob = 0.5,
                        focal_unit = NULL,
                        num_focal_unit = NULL,
                        num_randomization = 999,
                        strata = NULL,
                        zmatrix = NULL,
                        kappa = NULL,
                        cores = 1)
Here, we set num_randomization = 999 to reduce the computation time, but in realistic situations the number of randomizations should be larger (e.g., num_randomization = 99999).
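One reason to prefer more randomizations is the resolution of the resulting p-value. The base R sketch below runs a generic Fisher randomization test on simulated data to show this. It is an illustration only: the toy data, the difference-in-means statistic, the complete randomization scheme, and the p-value formula that counts the observed assignment as one draw are all assumptions, not a description of what testinterference() does internally.

```r
set.seed(1)
n <- 50
Z <- sample(rep(c(0, 1), each = n / 2))  # complete randomization (assumed)
Y <- rnorm(n) + 1 * Z                    # toy outcome with a treatment effect

# Test statistic: absolute difference in means between treated and control.
stat <- function(y, z) abs(mean(y[z == 1]) - mean(y[z == 0]))

num_randomization <- 999
t_obs <- stat(Y, Z)

# Re-draw assignments from the same design and recompute the statistic.
t_sim <- replicate(num_randomization, stat(Y, sample(Z)))

# Monte Carlo p-value, counting the observed assignment as one of the draws.
pval <- (1 + sum(t_sim >= t_obs)) / (num_randomization + 1)

# Smallest p-value attainable with this many draws.
min_pval <- 1 / (num_randomization + 1)
```

Under this formula, 999 draws cannot produce a p-value below 0.001, so tests at very small significance levels need a larger num_randomization.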
The p-values obtained from Kruskal-Wallis (KW), average cross difference (ACD), and ordinary least squares (OLS) test statistics are:
RT1$pval
#> KW ACD OLS
#> 0.012 0.019 0.012
The null hypothesis is rejected at the 5% significance level by all three test statistics. In other words, we find statistical evidence that SUTVA is implausible.
The results of Simes’ correction are:
RT1$Simes
#> 10% 5% 1%
#> TRUE TRUE FALSE
Here, TRUE indicates rejection of the null hypothesis. Simes’ correction rejects at the 10% and 5% levels but not at the 1% level, so the results again suggest that SUTVA does not hold.
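For readers unfamiliar with Simes’ correction, the base R sketch below applies the standard Simes test to the three p-values printed in RT1$pval. The helper simes_reject() is a hypothetical name introduced for illustration, and it is an assumption that the package combines the three p-values in exactly this way.

```r
# p-values as printed in RT1$pval for the SUTVA test.
pval <- c(KW = 0.012, ACD = 0.019, OLS = 0.012)

# Simes' test: reject the global null at level alpha if, for some k,
# the k-th smallest p-value is at most k * alpha / m.
simes_reject <- function(p, alpha) {
  m <- length(p)
  any(sort(p) <= seq_len(m) * alpha / m)
}

levels <- c("10%" = 0.10, "5%" = 0.05, "1%" = 0.01)
result <- sapply(levels, function(a) simes_reject(pval, a))
result
```

With these inputs, the sketch rejects at the 10% and 5% levels and does not reject at the 1% level, which agrees with the RT1$Simes output.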
We can confirm the numbers of focal units and assignments with:
RT1$num_focal_unit
#> [1] 36
RT1$num_focal_asgmt
#> [1] 1000
Next, we test whether the Exposure 2 specification is correct.
set.seed(1)
RT2 <- testinterference(Y = data1$Y,
                        Z = data1$Z,
                        A = data1$A,
                        hypothesis = "exposure2",
                        method = "3-net",
                        design = "Bernoulli",
                        prob = 0.5,
                        focal_unit = NULL,
                        num_focal_unit = NULL,
                        num_randomization = 999,
                        strata = NULL,
                        zmatrix = NULL,
                        kappa = NULL,
                        cores = 1)
The testing results are:
RT2$pval
#> KW ACD OLS
#> 0.001 0.001 0.005
RT2$Simes
#> 10% 5% 1%
#> TRUE TRUE TRUE
RT2$num_focal_unit
#> [1] 34
RT2$num_focal_asgmt
#> [1] 1000
We find statistical evidence that exposure 2 is incorrect.
Example 2: Pairs
We now turn to the case with possible pairwise interactions and complete randomization. To consider the case without interference, we specify beta_s = 0.
# Generate artificial data (pairs)
set.seed(1)
data2 <- datageneration(n = n,
                        design = "complete",
                        A = "pairs",
                        beta_s = 0)
Perform the randomization test:
set.seed(1)
RT3 <- testinterference(Y = data2$Y,
                        Z = data2$Z,
                        A = data2$A,
                        hypothesis = "SUTVA",
                        method = "3-net",
                        design = "complete",
                        prob = NULL,
                        focal_unit = NULL,
                        num_focal_unit = NULL,
                        num_randomization = 999,
                        strata = NULL,
                        zmatrix = NULL,
                        kappa = NULL,
                        cores = 1)
The testing results are:
RT3$pval
#> KW ACD OLS
#> 0.206 0.238 0.500
RT3$Simes
#> 10% 5% 1%
#> FALSE FALSE FALSE
RT3$num_focal_unit
#> [1] 100
RT3$num_focal_asgmt
#> [1] 1000
The null hypothesis is not rejected at the 10%, 5%, or 1% significance level. For the results of Simes’ correction, FALSE indicates that the null hypothesis is not rejected. Thus, the testing results provide no evidence against SUTVA.
In the case with pairwise interactions, it is impossible to perform randomization tests for hypothesis = "exposure2". This is because each unit has only one peer, so that \(\mathcal{N}(\kappa)\) is empty for any \(\kappa \ge 2\). See the vignette vignette("exposure2", package = "testinterference") for more details.
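The point about pairs can be checked by hand on a small network. The base R sketch below builds a hypothetical pairs adjacency matrix and counts each unit's peers. It assumes, based only on the sentence above, that \(\mathcal{N}(\kappa)\) collects units with at least \(\kappa\) peers; the exact definition is given in the exposure2 vignette, and the helper units_with_peers() is an illustrative name.

```r
n <- 6                                  # illustrative even sample size
A <- matrix(0, n, n)
for (i in seq(1, n, by = 2)) {
  A[i, i + 1] <- A[i + 1, i] <- 1       # units (1,2), (3,4), (5,6) form pairs
}

# Every unit has exactly one peer in a pairs network.
num_peers <- rowSums(A)

# Units with at least kappa peers (assumed reading of N(kappa)).
units_with_peers <- function(kappa) which(num_peers >= kappa)

length(units_with_peers(1))
length(units_with_peers(2))
```

Under that reading, every unit qualifies for kappa = 1, but no unit qualifies for any kappa of 2 or more, which leaves nothing to test.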
References
- Hoshino, T. and Yanagi, T., 2023. Randomization test for the specification of interference structure. arXiv preprint arXiv:2301.05580.