We now consider how one can incorporate latent variables into our simulation, explicitly giving their simulated values.
In this example we use the survey
package to get robust
standard errors when reweighting. We start by loading the libraries.
We begin by setting up the formulas, families and parameter values:
formulas = list(list(U ~ 1, L ~ A0), list(A0 ~ 1, A1 ~ A0*L), Y ~ A0*A1, ~A0*A1)
fam = list(c(4,3), c(5,5), c(3), c(1,1,1))
pars <- list(A0 = list(beta = 0),
U = list(beta = 0, phi=1),
L = list(beta = c(0.3,-0.2), phi=1),
A1 = list(beta = c(-0.3,0.4,0.3,0)),
Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
cop = list(beta = matrix(c(1,0,0,0,
1,0,0,0,
0.5,0,0,0), nrow=4)))
Now simulate the data.
## Inversion method selected: using pair-copula parameterization
We can then check that the parameter values match their intended values:
summary(svyglm(L ~ A0, family=Gamma(link="log"),
design=svydesign(id=~1, weights=~1, data=dat)))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.289 0.0144 20.1 3.35e-88
## A0 -0.216 0.0202 -10.7 1.34e-26
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3539 0.0414 -8.538 1.36e-17
## A0 0.3811 0.0592 6.433 1.25e-10
## L 0.3233 0.0243 13.298 2.39e-40
## A0:L 0.0113 0.0398 0.285 7.76e-01
These look close to the desired values.
We can start by fitting some naïve regressions.
## wrong models
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design=svydesign(id=~1, weights=rep(1,nrow(dat)), data=dat))
(Intercept) | -0.5 | -0.550 | 0.020 | 0.01 |
A0 | 0.2 | 0.216 | 0.030 | 0.30 |
A1 | 0.3 | 0.353 | 0.027 | 0.03 |
A0:A1 | 0.0 | 0.012 | 0.040 | 0.38 |
We see that the p-values for both the intercept at the A1 coefficient are highly significant.
We can also try estimating the inverse probability of treatment weights and using them to obtain a consistent estimate of the treatment effects.
w <- predict(glmA1, type="response")
ps <- dat$A1/w + (1-dat$A1)/(1-w)
## correct model
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design=svydesign(id=~1, weights=ps, data=dat))
(Intercept) | -0.5 | -0.501 | 0.021 | 0.47 |
A0 | 0.2 | 0.205 | 0.032 | 0.44 |
A1 | 0.3 | 0.272 | 0.028 | 0.16 |
A0:A1 | 0.0 | 0.029 | 0.041 | 0.24 |
Indeed, they are all within two standard errors of their nominal values.
We can also fit the data using maximum likelihood directly. Set
eval=TRUE
to run this chunk.
Again, all estimates are within two standard errors of the true values.
We can transform U to a
standard normal (Z) and
include it in the model as well. (Again, set eval=TRUE
to
run this chunk.)
dat <- dplyr::mutate(dat, Z = qnorm(U)) # transform to normal for fitting
out <- fitCausal(dat, formulas = list(Z ~ 1, L ~ A0, Y ~ A0*A1, ~A0), family = c(1,3,3,1))
out
Note that again the results look correct including the ‘hidden’ variable U/Z.