Inversion Tutorial

In this tutorial, we show how to simulate data using the inversion method, which is significantly faster than rejection sampling. As always, start by loading the causl package, as well as the survey package, which we will use later for inverse probability weighting.

library(causl)
library(survey)

Set Up the Model

We begin by setting up the formulas, families and parameter values. Here we again use a modified version of Example R7 from Evans and Didelez (2023). In this case we explicitly parameterize the U-L relationship, and have a copula covering only the response variable Y.

formulas <- list(list(U ~ 1, L ~ U*A0),   # covariates
                 list(A0 ~ 1, A1 ~ A0*L), # treatments
                 Y ~ A0*A1,             # outcome
                 list(Y=list(U ~ A0*A1, L ~ A0*A1)))   # pair-copula formulas, one for each of U and L with Y
fam <- list(c(4,3), c(5,5), c(3), c(1,1))   # families: 4=beta, 3=Gamma; 5=binomial; 3=Gamma; copula 1=Gaussian

pars <- list(A0 = list(beta = 0),
             U = list(beta = 0, phi=0.5),
             L = list(beta = c(0.3,0.5,-0.2,-0.1), phi=1),
             A1 = list(beta = c(-0.3,0.4,0.3,0)),
             Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
             cop = list(Y=list(U=list(beta=c(1,0,0,0)), L=list(beta=c(0.5,0,0,0))))
             )  # copula parameters nested to match the pair-copula formulas
cm <- causl_model(formulas=formulas, family=fam, pars=pars, method="inversion")
set.seed(123)
n <- 1e4
dat <- rfrugal(n=n, causl_model=cm)
## Inversion method selected: using pair-copula parameterization
head(dat)
##        U     L A0 A1      Y
## 1 0.7124 5.908  1  1 0.8742
## 2 0.5910 4.163  0  1 0.9211
## 3 0.0595 1.455  0  0 0.3753
## 4 0.4719 2.518  0  1 0.8181
## 5 0.4486 0.661  1  1 0.8550
## 6 0.0432 0.372  0  0 0.0261
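
As a quick sanity check on the simulated data: both U and A0 have intercept-only formulas with coefficient zero, so (assuming the default logit links for the beta and binomial families) their marginal means should both be close to 0.5.

mean(dat$U)
mean(dat$A0)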

We can then check that parameter estimates match the intended values:

summary(glm(L ~ U*A0, family=Gamma(link="log"), data=dat))$coef
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.309     0.0283   10.92 1.30e-27
## U              0.503     0.0492   10.22 2.13e-24
## A0            -0.239     0.0403   -5.94 3.03e-09
## U:A0          -0.110     0.0697   -1.58 1.13e-01
glmA1 <- glm(A1 ~ A0*L, family=binomial, data=dat)
summary(glmA1)$coef
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.3412     0.0425  -8.028 9.89e-16
## A0            0.3894     0.0600   6.488 8.68e-11
## L             0.3091     0.0201  15.405 1.50e-53
## A0:L          0.0089     0.0332   0.268 7.89e-01

These are indeed close to their true values.
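
For a more direct comparison we can place the fitted coefficients alongside the generating parameters; here we do so for A1 (the same idea works for L and Y), assuming pars$A1$beta is ordered as model.matrix orders the terms, i.e. intercept, A0, L, A0:L.

cbind(estimate = coef(glmA1), truth = pars$A1$beta)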

Now we can use inverse probability weighting (IPW) to estimate the causal effect of A0 and A1 on Y. Since A0 has an intercept-only model it is marginally randomized, so we only need to weight by the propensity score for A1.

w <- predict(glmA1, type="response")
wt <- dat$A1/w + (1-dat$A1)/(1-w)
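
Each weight is the reciprocal of the estimated probability of the treatment level actually received, i.e. 1/P(A1 = a1 | A0, L). As a rough check, these weights should average close to 2 (each unit contributes the inverse probability of its own arm) and should contain no extreme values.

summary(wt)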

## naive model (unweighted)
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=rep(1,nrow(dat)), data=dat))
summary(mod_w)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) -0.55950     0.0215 -25.980 4.65e-144
## A0           0.19935     0.0320   6.228  4.93e-10
## A1           0.41282     0.0286  14.425  1.05e-46
## A0:A1       -0.00868     0.0411  -0.211  8.33e-01
## correct model (IPW weights)
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=wt, data=dat))
summary(mod_c)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  -0.5131     0.0218 -23.515 4.66e-119
## A0            0.2021     0.0337   5.998  2.07e-09
## A1            0.3136     0.0289  10.861  2.51e-27
## A0:A1         0.0114     0.0425   0.267  7.89e-01

Comparing the A0 and A1 coefficients of the naïve and weighted models with their true values (0.2 and 0.3 respectively) shows that IPW works very well: the unweighted estimate of the A1 effect is clearly biased, while the weighted estimates are all close to the truth.
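
To make the comparison explicit, we can tabulate both sets of estimates against the generating values, again assuming pars$Y$beta is ordered as intercept, A0, A1, A0:A1.

cbind(naive = coef(mod_w), ipw = coef(mod_c), truth = pars$Y$beta)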