Custom Families

The causl package now allows users to specify custom families with which to perform their simulations. To do this, one must create a function that takes as an argument the link function, and returns a list containing:

  • name: the name of the relevant family;
  • ddist: a function returning the density of the distributions;
  • qdist: a function returning the quantiles from probabilities;
  • rdist: a function to sample values from the distribution;
  • pdist: a cumulative distribution function;
  • pars: a list of the names of the parameters used;
  • default: a function that returns a list of the default values for an observation and each of the parameters;
  • link: the specified link function.

The function should also give the output the class "causl_family", so that it is interpreted appropriately.

Example (Poisson distribution)

Suppose we wish to simulate from a Poisson distribution as part of a frugal model. Then we can define a function such as the below:

library(causl)

poisson_causl_fam <- function (link) {
  if (missing(link)) link = "log"

  ## write functions
  dens <- function (x, mu, log=FALSE) dpois(x, lambda=mu, log=log)
  quan <- function (p, mu) qpois(p, lambda=mu)
  sim <- function (n, mu) {
    qx <- runif(n)
    x <- qpois(qx, lambda=mu)
    attr(x, "quantile") <- qx
    return(x)
  }
  probs <- function (x, mu) ppois(x, lambda=mu)

  default <- function(theta) list(x=1, mu=1)

  ## define family
  out <- list(name="poisson", ddist=dens, qdist=quan, rdist=sim, pdist=probs,
              pars=c("mu"), default=default, link=link)
  class(out) <- "causl_family"

  return(out)
}

Some remarks are in order. Note that the ddist function must have a log argument to allow the log-density to be evaluated. In addition, since this is a discrete distribution, if we use it in a copula we must know the exact quantile used in the simulation (not just the one returned by applying ppois). Hence we simulate a uniform and then obtain the Poisson random variables by inversion from qpois. For continuous distributions this is unnecessary, and glm_sim will just apply the pdist function.

Now, let us try to simulate some data using this distribution. We will need to use poisson_causl_fam in our family argument:

forms <- list(Z ~ 1, X ~ Z, Y ~ X, ~ 1)
fams <- list(1, poisson_causl_fam(), 1, 1)
pars <- list(Z = list(beta = 0, phi = 1),
             X = list(beta = c(0.3,1)),
             Y = list(beta = c(-1,0.5), phi = 1),
             cop = list(beta = 1))
cm <- causl_model(formulas = forms, family = fams, pars = pars)

set.seed(123)
dat <- rfrugal(n=1e3, causl_model=cm)
## Inversion method selected: using pair-copula parameterization

Treatment model

Then we can check that the new variables have been simulated correctly. We ought to have: \[\begin{align*} P(X = x \mid z) &= \frac{1}{x!} \exp\{\alpha_0 + \alpha_1 z x - e^{\alpha_0 + \alpha_1 z}\}, \end{align*}\] with \(\alpha = 0.3\) and \(\alpha_1 = 1\). Let’s perform a regression to check the parameters:

modX <- glm(X ~ Z, family=quasipoisson, data=dat)
summary(modX)$coefficients
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)    0.286     0.0312    9.19  2.30e-19
## Z              1.019     0.0219   46.63 7.38e-253

So we can see that the estimates are well within two standard errors of the true value. In addition,

summary(modX)$dispersion
## [1] 1.06

the dispersion parameter is close to its nominal value of 1.

Outcome model

We can also fit the outcome model using maximum likelihood estimation with sandwich errors.

out <- fit_causl(dat = dat, formulas = list(Y ~ X, Z ~ 1, cop ~ 1),
                 family = c(1, 1, 1))

This gives us estimates of \(\beta_0 = -0.886\) (0.042) and \(\beta_1 = 0.457\) (0.011), which is consistent with the values \(-1\) and \(0.5\) that we used to simulate.