Inversion Tutorial

In this tutorial, we show how to simulate data using the inversion method, which is significantly faster than rejection sampling. As always, start by loading causl, as well as the survey package, which we use later for inverse probability weighting.

library(causl)
library(survey)

Set Up the Model

We begin by setting up the formulas, families and parameter values. Here we again use a modified version of Example R7 from Evans and Didelez (2023). In this case we explicitly parameterize the U-L relationship, and the copula covers only the response variable Y.

formulas <- list(list(U ~ 1, L ~ U*A0),   # covariates
                 list(A0 ~ 1, A1 ~ A0*L), # treatments
                 Y ~ A0*A1,             # outcome
                 list(Y=list(U ~ A0*A1, L ~ A0*A1)))   # pair-copula formulas for (Y,U) and (Y,L)
fam <- list(c(4,3), c(5,5), c(3), c(1,1))  # U beta, L gamma; A0, A1 bernoulli; Y gamma; Gaussian copulas

pars <- list(A0 = list(beta = 0),
             U = list(beta = 0, phi=0.5),
             L = list(beta = c(0.3,0.5,-0.2,-0.1), phi=1),
             A1 = list(beta = c(-0.3,0.4,0.3,0)),
             Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
             cop = list(Y=list(U=list(beta=c(1,0,0,0)), L=list(beta=c(0.5,0,0,0)))) 
             )  # parameters also different
cm <- causl_model(formulas=formulas, family=fam, pars=pars, method="inversion")
set.seed(123)
n <- 1e4
dat <- rfrugal(n=n, causl_model=cm)
## Inversion method selected: using pair-copula parameterization
head(dat)
##        U     L A0 A1      Y
## 1 0.7124 5.908  1  0 0.6476
## 2 0.5910 4.163  0  1 0.9211
## 3 0.0595 1.455  0  1 0.5066
## 4 0.4719 2.518  0  1 0.8181
## 5 0.4486 0.661  1  0 0.6334
## 6 0.0432 0.372  0  0 0.0261
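
As a quick sanity check, the margins should match the specification: U has a beta margin here and so lies in (0, 1), and since the formula for A0 is intercept-only with coefficient 0, roughly half the sample should be treated at the first time point.

range(dat$U)   # all values should lie in (0, 1)
mean(dat$A0)   # should be close to 0.5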

We can then check that parameter estimates match the intended values:

summary(glm(L ~ U*A0, family=Gamma(link="log"), data=dat))$coef
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.309     0.0283   10.92 1.30e-27
## U              0.503     0.0492   10.22 2.13e-24
## A0            -0.239     0.0403   -5.94 3.03e-09
## U:A0          -0.110     0.0697   -1.58 1.13e-01
glmA1 <- glm(A1 ~ A0*L, family=binomial, data=dat)
summary(glmA1)$coef
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.2726     0.0421  -6.475 9.48e-11
## A0            0.4002     0.0599   6.684 2.32e-11
## L             0.2779     0.0195  14.231 5.84e-46
## A0:L          0.0297     0.0331   0.896 3.70e-01

These are indeed close to their true values.
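
For a more direct comparison, we could also line the fitted coefficients up against the values supplied in pars, for example:

cbind(estimate = coef(glm(L ~ U*A0, family=Gamma(link="log"), data=dat)),
      truth = pars$L$beta)
cbind(estimate = coef(glmA1), truth = pars$A1$beta)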

Now we can use inverse probability weighting to estimate the causal effect of A0 and A1 on Y. Since A0 is randomized here (its formula is intercept-only with coefficient 0), its propensity score is constant and can be ignored, so only the propensity score for A1 needs to enter the weights.

w <- predict(glmA1, type="response")  # fitted propensity score P(A1 = 1 | A0, L)
wt <- dat$A1/w + (1-dat$A1)/(1-w)     # inverse probability of treatment weights

## naïve model (uniform weights)
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=rep(1,nrow(dat)), data=dat))
summary(mod_w)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) -0.53967     0.0211 -25.538 2.07e-139
## A0           0.19076     0.0315   6.050  1.50e-09
## A1           0.37719     0.0285  13.230  1.27e-39
## A0:A1        0.00709     0.0410   0.173  8.63e-01
## correct model (inverse probability weights)
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=wt, data=dat))
summary(mod_c)$coef
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.4907     0.0231 -21.255 4.34e-98
## A0            0.1950     0.0348   5.598 2.23e-08
## A1            0.2813     0.0301   9.353 1.03e-20
## A0:A1         0.0194     0.0436   0.445 6.56e-01

Comparing with the true values (A0: 0.2, A1: 0.3, A0:A1: 0), the naïve fit noticeably overestimates the A1 coefficient, whereas the weighted estimates are close to the truth: IPW works very well here.
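
To see this side by side, we can tabulate the naïve and weighted estimates against the parameters used for simulation, for example:

cbind(naive = coef(mod_w), ipw = coef(mod_c), truth = pars$Y$beta)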