We now consider how one can incorporate latent variables into our simulation, explicitly giving their simulated values.
In this example we use the survey package to get robust
standard errors when reweighting. We start by loading the libraries.
We begin by setting up the formulas, families and parameter values:
formulas = list(list(U ~ 1, L ~ A0), list(A0 ~ 1, A1 ~ A0*L), Y ~ A0*A1, ~A0*A1)
fam = list(c(4,3), c(5,5), c(3), c(1,1,1))
pars <- list(A0 = list(beta = 0),
U = list(beta = 0, phi=1),
L = list(beta = c(0.3,-0.2), phi=1),
A1 = list(beta = c(-0.3,0.4,0.3,0)),
Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
cop = list(beta = matrix(c(1,0,0,0,
1,0,0,0,
0.5,0,0,0), nrow=4)))Now simulate the data.
## Inversion method selected: using pair-copula parameterization
We can then check that the parameter values match their intended values:
summary(svyglm(L ~ A0, family=Gamma(link="log"),
design=svydesign(id=~1, weights=~1, data=dat)))$coef## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.289 0.0144 20.1 3.35e-88
## A0 -0.216 0.0202 -10.7 1.34e-26
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3539 0.0414 -8.538 1.36e-17
## A0 0.3811 0.0592 6.433 1.25e-10
## L 0.3233 0.0243 13.298 2.39e-40
## A0:L 0.0113 0.0398 0.285 7.76e-01
These look close to the desired values.
We can start by fitting some naïve regressions.
## wrong models
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design=svydesign(id=~1, weights=rep(1,nrow(dat)), data=dat))##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
| (Intercept) | -0.5 | -0.544 | 0.021 | 0.02 |
| A0 | 0.2 | 0.192 | 0.031 | 0.39 |
| A1 | 0.3 | 0.392 | 0.029 | 0.00 |
| A0:A1 | 0.0 | -0.004 | 0.041 | 0.46 |
We see that the p-values for both the intercept at the \(A_1\) coefficient are highly significant.
We can also try estimating the inverse probability of treatment weights and using them to obtain a consistent estimate of the treatment effects.
w <- predict(glmA1, type="response")
ps <- dat$A1/w + (1-dat$A1)/(1-w)
## correct model
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design=svydesign(id=~1, weights=ps, data=dat))| (Intercept) | -0.5 | -0.503 | 0.022 | 0.45 |
| A0 | 0.2 | 0.202 | 0.035 | 0.48 |
| A1 | 0.3 | 0.312 | 0.029 | 0.34 |
| A0:A1 | 0.0 | -0.007 | 0.044 | 0.44 |
Indeed, they are all within two standard errors of their nominal values.
We can also fit the data using maximum likelihood directly. Set
eval=TRUE to run this chunk.
Again, all estimates are within two standard errors of the true values.
We can transform \(U\) to a standard
normal (\(Z\)) and include it in the
model as well. (Again, set eval=TRUE to run this
chunk.)
dat <- dplyr::mutate(dat, Z = qnorm(U)) # transform to normal for fitting
out <- fitCausal(dat, formulas = list(Z ~ 1, L ~ A0, Y ~ A0*A1, ~A0), family = c(1,3,3,1))
outNote that again the results look correct including the ‘hidden’ variable \(U\)/\(Z\).