In this tutorial, we show how to
simulate using the inversion method, which is significantly faster than
rejection sampling. As always, start by loading the package as well as
the survey
package.
We begin by setting up the formulas, families and parameter values. Here we are again a modifed version of the Example R7 from Evans and Didelez (2023). In this case we explicitly parameterize the U-L relationship, and have a copula covering only the response variable Y.
formulas <- list(list(U ~ 1, L ~ U*A0), # covariates
list(A0 ~ 1, A1 ~ A0*L), # treatments
Y ~ A0*A1, # outcome
list(Y=list(U ~ A0*A1, L ~ A0*A1))) # copula formulas defined differently
fam <- list(c(4,3), c(5,5), c(3), c(1,1))
pars <- list(A0 = list(beta = 0),
U = list(beta = 0, phi=0.5),
L = list(beta = c(0.3,0.5,-0.2,-0.1), phi=1),
A1 = list(beta = c(-0.3,0.4,0.3,0)),
Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
cop = list(Y=list(U=list(beta=c(1,0,0,0)), L=list(beta=c(0.5,0,0,0))))
) # parameters also different
cm <- causl_model(formulas=formulas, family=fam, pars=pars, method="inversion")
## Inversion method selected: using pair-copula parameterization
## U L A0 A1 Y
## 1 0.7124 5.908 1 1 0.8742
## 2 0.5910 4.163 0 1 0.9211
## 3 0.0595 1.455 0 0 0.3753
## 4 0.4719 2.518 0 1 0.8181
## 5 0.4486 0.661 1 1 0.8550
## 6 0.0432 0.372 0 0 0.0261
We can then check that parameter estimates match the intended values:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.309 0.0283 10.92 1.30e-27
## U 0.503 0.0492 10.22 2.13e-24
## A0 -0.239 0.0403 -5.94 3.03e-09
## U:A0 -0.110 0.0697 -1.58 1.13e-01
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3412 0.0425 -8.028 9.89e-16
## A0 0.3894 0.0600 6.488 8.68e-11
## L 0.3091 0.0201 15.405 1.50e-53
## A0:L 0.0089 0.0332 0.268 7.89e-01
These are indeed close to their true values.
Now we can use inverse probability weighting to estimate the causal effect of A0, A1 on Y.
w <- predict(glmA1, type="response")
wt <- dat$A1/w + (1-dat$A1)/(1-w)
## wrong model
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design = svydesign(~1, weights=rep(1,nrow(dat)), data=dat))
summary(mod_w)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.55950 0.0215 -25.980 4.65e-144
## A0 0.19935 0.0320 6.228 4.93e-10
## A1 0.41282 0.0286 14.425 1.05e-46
## A0:A1 -0.00868 0.0411 -0.211 8.33e-01
## correct model
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"),
design = svydesign(~1, weights=wt, data=dat))
summary(mod_c)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5131 0.0218 -23.515 4.66e-119
## A0 0.2021 0.0337 5.998 2.07e-09
## A1 0.3136 0.0289 10.861 2.51e-27
## A0:A1 0.0114 0.0425 0.267 7.89e-01
The A0
and A1
coefficients of the naïve and
correct models show that IPW works very well.