Inversion Tutorial

In this tutorial, we show how to simulate using the inversion method, which is significantly faster than rejection sampling. As always, start by loading the package as well as the survey package.

library(causl)
library(survey)

Set Up the Model

We begin by setting up the formulas, families and parameter values. Here we are again a modifed version of the Example R7 from Evans and Didelez (2023). In this case we explicitly parameterize the \(U\)-\(L\) relationship, and have a copula covering only the response variable \(Y\).

formulas <- list(list(U ~ 1, L ~ U*A0),   # covariates
                 list(A0 ~ 1, A1 ~ A0*L), # treatments
                 Y ~ A0*A1,             # outcome
                 list(Y=list(U ~ A0*A1, L ~ A0*A1)))   # copula formulas defined differently
fam <- list(c(4,3), c(5,5), c(3), c(1,1))

pars <- list(A0 = list(beta = 0),
             U = list(beta = 0, phi=0.5),
             L = list(beta = c(0.3,0.5,-0.2,-0.1), phi=1),
             A1 = list(beta = c(-0.3,0.4,0.3,0)),
             Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
             cop = list(Y=list(U=list(beta=c(1,0,0,0)), L=list(beta=c(0.5,0,0,0)))) 
             )  # parameters also different
cm <- causl_model(formulas=formulas, family=fam, pars=pars, method="inversion")
set.seed(123)
n <- 1e4
dat <- rfrugal(n=n, causl_model=cm)
## Inversion method selected: using pair-copula parameterization
head(dat)
##       U     L A0 A1     Y
## 1 0.308 0.307  0  0 0.291
## 2 0.418 5.004  0  1 0.422
## 3 0.922 0.207  1  1 3.460
## 4 0.525 1.910  1  1 1.211
## 5 0.546 1.302  1  1 0.711
## 6 0.941 0.394  0  0 0.839

We can then check that parameter estimates match the intended values:

summary(glm(L ~ U*A0, family=Gamma(link="log"), data=dat))$coef
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.2718     0.0308    8.82 1.34e-18
## U             0.5133     0.0541    9.49 2.94e-21
## A0           -0.1809     0.0429   -4.22 2.44e-05
## U:A0         -0.0842     0.0757   -1.11 2.66e-01
glmA1 <- glm(A1 ~ A0*L, family=binomial, data=dat)
summary(glmA1)$coef
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.4273     0.0428   -9.98 1.86e-23
## A0            0.5150     0.0603    8.53 1.40e-17
## L             0.3450     0.0209   16.54 1.78e-61
## A0:L         -0.0423     0.0333   -1.27 2.04e-01

These are indeed close to their true values.

Now we can use inverse probability weighting to estimate the causal effect of \(A_0,A_1\) on \(Y\).

w <- predict(glmA1, type="response")
wt <- dat$A1/w + (1-dat$A1)/(1-w)

## wrong model
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=rep(1,nrow(dat)), data=dat))
summary(mod_w)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  -0.5888     0.0207  -28.39 1.52e-170
## A0            0.2485     0.0312    7.96  1.96e-15
## A1            0.4535     0.0282   16.07  2.09e-57
## A0:A1        -0.0522     0.0410   -1.27  2.03e-01
## correct model
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=wt, data=dat))
summary(mod_c)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  -0.5322     0.0238  -22.32 9.17e-108
## A0            0.2491     0.0351    7.10  1.30e-12
## A1            0.3393     0.0306   11.07  2.53e-28
## A0:A1        -0.0269     0.0440   -0.61  5.42e-01

The A0 and A1 coefficients of the naïve and correct models show that IPW works very well.