---
title: "Inversion Tutorial"
author: "Robin J. Evans"
date: "10/06/2023"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Inversion Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In this tutorial, we show how to simulate using the inversion method, which is
significantly faster than rejection sampling.  As always, start by loading the 
package as well as the `survey` package.
```{r setup, echo=2:3, message=FALSE}
knitr::opts_chunk$set(echo = TRUE); options(digits=3)
library(causl)
library(survey)
```


## Set Up the Model

We begin by setting up the formulas, families and parameter values.  Here we are 
again a modifed version of the Example R7 from Evans and Didelez (2023).  In this
case we explicitly parameterize the $U$-$L$ relationship, and have a copula 
covering only the response variable $Y$.  

```{r setup_model}
formulas <- list(list(U ~ 1, L ~ U*A0),   # covariates
                 list(A0 ~ 1, A1 ~ A0*L), # treatments
                 Y ~ A0*A1,             # outcome
                 list(Y=list(U ~ A0*A1, L ~ A0*A1)))   # copula formulas defined differently
fam <- list(c(4,3), c(5,5), c(3), c(1,1))

pars <- list(A0 = list(beta = 0),
             U = list(beta = 0, phi=0.5),
             L = list(beta = c(0.3,0.5,-0.2,-0.1), phi=1),
             A1 = list(beta = c(-0.3,0.4,0.3,0)),
             Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
             cop = list(Y=list(U=list(beta=c(1,0,0,0)), L=list(beta=c(0.5,0,0,0)))) 
             )  # parameters also different
cm <- causl_model(formulas=formulas, family=fam, pars=pars, method="inversion")
```

```{r simulate}
set.seed(123)
n <- 1e4
dat <- rfrugal(n=n, causl_model=cm)
head(dat)
```
We can then check that parameter estimates match the intended values:
```{r check_data}
summary(glm(L ~ U*A0, family=Gamma(link="log"), data=dat))$coef
glmA1 <- glm(A1 ~ A0*L, family=binomial, data=dat)
summary(glmA1)$coef
```
These are indeed close to their true values.  

Now we can use inverse probability weighting to estimate the causal effect of
$A_0,A_1$ on $Y$.
```{r check_model}
w <- predict(glmA1, type="response")
wt <- dat$A1/w + (1-dat$A1)/(1-w)

## wrong model
mod_w <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=rep(1,nrow(dat)), data=dat))
summary(mod_w)$coef
## correct model
mod_c <- svyglm(Y ~ A0*A1, family=Gamma(link="log"), 
                design = svydesign(~1, weights=wt, data=dat))
summary(mod_c)$coef
```
The `A0` and `A1` coefficients of the naïve and correct models show that
IPW works very well.