In this article, we discuss the R implementation of the testing procedure for the test of conditional independence of $$X, Y$$ given $$Z$$. The codes provided in this page can only be used when $$X$$ and $$Y$$ are real valued and $$Z \in \mathbb{R}^d$$ for $$d\geq 1$$.

In the following, we assume $$d=2$$ and generate $$n=100$$ i.i.d. copies of $$(X,Y,Z)$$ from the simulation scenario used in Section 4 of Patra, Sen, and Székely (2015). We assume that $$Z\sim N(0, \sigma_z^2 \textbf{I}_{d\times d})$$, $$X=W+Z_1+\epsilon$$, and $$X=W+Z_1+\epsilon'$$, where $$Z_1$$ is the first coordinate of $$Z$$ and $$\epsilon, \epsilon^\prime,$$ and $$W$$ are three independent mean zero Gaussian random variables. Moreover, we assume that $$\epsilon, \epsilon^\prime,$$ and $$W$$ are independent of $$Z,$$ and $$var(\epsilon)=var(\epsilon^\prime)=\sigma^2_E,$$ and $$var(W)= \sigma^2_W$$. Note that $$X$$ and $$Y$$ are conditionally independent of $$Z$$ only when $$\sigma_W=0$$. In the following, we fixed $$\sigma_W$$ to be $$0.1$$.

n <-100
d <- 2
sigma.Z <- 0.3
sigma.W <- 0.1
sigma.E <- 0.2
Z <- matrix(rnorm(n*d,0,sigma.Z),nr=n, nc=d)
W <- rnorm(n,0,sigma.W)
eps <- rnorm(n,0,sigma.E)
eps.prime <- rnorm(n,0,sigma.E)
X <- W + Z[,1] +eps
Y <- W + Z[,1] +eps.prime
Data <- cbind(X,Y,Z)
colnames(Data) <- c('X', 'Y', paste('Z.', 1:d,sep=''))
head(Data)
##               X           Y        Z.1          Z.2
## [1,] -0.6209561 -0.35011790 -0.5453813 -0.008216948
## [2,] -0.5213963 -0.43784900 -0.3721414  0.085239240
## [3,]  0.1374818  0.08060723  0.2268859 -0.131199999
## [4,] -0.5639182 -0.27728940 -0.3826270 -0.123249106
## [5,] -0.8627586 -0.51166254 -0.3830763  0.281173962
## [6,] -0.1749498 -0.37170336 -0.2509368 -0.111071919

The following block of code computes the test statistic $$\hat{\mathcal{E}}_n$$; see (3.7) of Patra, Sen, and Székely (2015).

source('Npres_Fucntions.R')
test.stat <- npresid.statistics(Data,d)

Recall we reject the null hypothesis of conditional independence for large values of $$\hat{\mathcal{E}}_n$$. However, limiting behavior of $$\hat{\mathcal{E}}_n$$ is unknown. In Section 3.2.1 of Patra, Sen, and Székely (2015) we proposed a model based bootstrap procedure to approximate the asymptotic distribution and evaluate the $$p$$-value for the test. In the following boot.replic" denotes the number of bootstrap replications. We recommend using a bootstrap replication of size $$1000.$$

out <- npresid.boot(Data,d,boot.replic=50)
## [1] "Starting bootstrap"
## [1] "50 bootstrap samples obtained"
## [1] "At bootstrap iteration 25 of 50"
## [1] "At bootstrap iteration 50 of 50"
str(out, max.level = 1)
## List of 7
##  $statistic : num 2.13 ##$ p.value              : num 0.88
##  $method : chr "Cond Indep test: p-values by inverting F_hat to get bootstrap samples" ##$ bandwidth.method     : chr "least-squares cross-validation, see \"np\" package"
##  $data.desrip : chr "dimension of Z is 2, sample size 100, dimension of (X,Y,Z) is 4, bootstrap replication number 50" ##$ bootstrap.stat.values: num [1:50] 2.03 7.15 4.13 6.62 2.18 ...
##  $cond.dist.obj :List of 6 Here cond.dist.obj’’ is the the list containing estimators of $$F_{X|Z}$$, $$F_{Y|Z}$$, and $$F_Z$$ evaluated at the data points (denoted by F.x_z, F.y_z, and F.z_hat) and the bandwidth used (denoted by Fbw.x_z, Fbw.y_z, and Fbw.z_z) to evaluate the conditional distribution functions. Note that we use the functions available in the “np’’ package ( see Hayfield and Racine (2008)) to compute the optimal bandwidths as well as the estimates of the conditional distribution functions. str(out$cond.dist.obj, max.level = 1)
## List of 6
##  $F.x_z : num [1:100] 0.2588 0.3201 0.3829 0.271 0.0433 ... ##$ F.y_z  : num [1:100] 0.739 0.362 0.444 0.683 0.242 ...
##  $Fbw.x_z:List of 64 ## ..- attr(*, "class")= chr "condbandwidth" ##$ Fbw.y_z:List of 64
##   ..- attr(*, "class")= chr "condbandwidth"
##  $Fbw.z_z:List of 2 ##$ F.z_hat: num [1:100, 1:2] 0.0589 0.1595 0.7888 0.1512 0.1508 ...

The estimated $$p$$-value of the test procedure is given through

p.value <- out\$p.value 

The required R file Npres_Fucntions.R’’ can be downloaded here. Note the functions require the following R-packages: boot, data.table, energy, np, and stats.

# References

Hayfield, Tristen, and Jeffrey S. Racine. 2008. “Nonparametric Econometrics: The Np Package.” Journal of Statistical Software 27 (5). http://www.jstatsoft.org/v27/i05/.

Patra, Rohit K., Bodhisattva Sen, and Gábor Székely. 2015. “A Consistent Bootstrap Procedure for the Maximum Score Estimator.” Statist. Probab. Lett. (to Appear). http://arxiv.org/abs/1409.3886.