The causl package now
allows users to specify custom families with which to perform their
simulations. To do this, one must create a function that takes as an
argument the link function, and returns a list containing:
name: the name of the relevant family;ddist: a function returning the density of the
distributions;qdist: a function returning the quantiles from
probabilities;rdist: a function to sample values from the
distribution;pdist: a cumulative distribution function;pars: a list of the names of the parameters used;default: a function that returns a list of the default
values for an observation and each of the parameters;link: the specified link function.The function should also give the output the class
"causl_family", so that it is interpreted
appropriately.
Suppose we wish to simulate from a Poisson distribution as part of a frugal model. Then we can define a function such as the below:
library(causl)
poisson_causl_fam <- function (link) {
if (missing(link)) link = "log"
## write functions
dens <- function (x, mu, log=FALSE) dpois(x, lambda=mu, log=log)
quan <- function (p, mu) qpois(p, lambda=mu)
sim <- function (n, mu) {
qx <- runif(n)
x <- qpois(qx, lambda=mu)
attr(x, "quantile") <- qx
return(x)
}
probs <- function (x, mu) ppois(x, lambda=mu)
default <- function(theta) list(x=1, mu=1)
## define family
out <- list(name="poisson", ddist=dens, qdist=quan, rdist=sim, pdist=probs,
pars=c("mu"), default=default, link=link)
class(out) <- "causl_family"
return(out)
}Some remarks are in order. Note that the ddist function
must have a log argument to allow the log-density to be
evaluated. In addition, since this is a discrete distribution, if we use
it in a copula we must know the exact quantile used in the simulation
(not just the one returned by applying ppois). Hence we
simulate a uniform and then obtain the Poisson random variables by
inversion from qpois. For continuous distributions this is
unnecessary, and glm_sim will just apply the
pdist function.
Now, let us try to simulate some data using this distribution. We
will need to use poisson_causl_fam in our
family argument:
forms <- list(Z ~ 1, X ~ Z, Y ~ X, ~ 1)
fams <- list(1, poisson_causl_fam(), 1, 1)
pars <- list(Z = list(beta = 0, phi = 1),
X = list(beta = c(0.3,1)),
Y = list(beta = c(-1,0.5), phi = 1),
cop = list(beta = 1))
cm <- causl_model(formulas = forms, family = fams, pars = pars)
set.seed(123)
dat <- rfrugal(n=1e3, causl_model=cm)## Inversion method selected: using pair-copula parameterization
Then we can check that the new variables have been simulated correctly. We ought to have: \[\begin{align*} P(X = x \mid z) &= \frac{1}{x!} \exp\{\alpha_0 + \alpha_1 z x - e^{\alpha_0 + \alpha_1 z}\}, \end{align*}\] with \(\alpha = 0.3\) and \(\alpha_1 = 1\). Let’s perform a regression to check the parameters:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.286 0.0312 9.19 2.30e-19
## Z 1.019 0.0219 46.63 7.38e-253
So we can see that the estimates are well within two standard errors of the true value. In addition,
## [1] 1.06
the dispersion parameter is close to its nominal value of 1.
We can also fit the outcome model using maximum likelihood estimation with sandwich errors.
This gives us estimates of \(\beta_0 = -0.886\) (0.042) and \(\beta_1 = 0.457\) (0.011), which is consistent with the values \(-1\) and \(0.5\) that we used to simulate.
The example above works because log is an existing link
function specified in the package. Suppose, however, that we want to use
the sqrt function as the link; this is not used with any
built-in families.
For this we need to specify a custom_links argument to
poisson_causl_fam; each entry in this named list should
itself be a list, containing:
linkfun: the link function \(g\);linkinv: the inverse of \(g\).The names of the entries should correspond to the string used as the name of the link function.
poisson_causl_fam <- function (link) {
if (missing(link)) link = "log"
## write functions
dens <- function (x, mu, log=FALSE) dpois(x, lambda=mu, log=log)
quan <- function (p, mu) qpois(p, lambda=mu)
sim <- function (n, mu) {
qx <- runif(n)
x <- qpois(qx, lambda=mu)
attr(x, "quantile") <- qx
return(x)
}
probs <- function (x, mu) ppois(x, lambda=mu)
default <- function(theta) list(x=1, mu=1)
custom_links <- list(sqrt = list(linkfun = function (x) sqrt(x),
linkinv = function (x) x^2))
## define family
out <- list(name="poisson", ddist=dens, qdist=quan, rdist=sim, pdist=probs,
pars=c("mu"), default=default, link=link, custom_links=custom_links)
class(out) <- "causl_family"
return(out)
}Now we can suitably modify our earlier example,
fams <- list(1, poisson_causl_fam(link="sqrt"), 1, 1)
pars$X$beta <- c(0.9, 0.05)
cm_sq <- causl::modify(cm, family=fams, pars=pars)and re-simulate:
We can again check that the distribution was correctly sampled.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5527 0.0161 96.22 0.00e+00
## Z 0.0775 0.0165 4.71 2.86e-06
So we again see that the estimates are (just about) within two standard errors of the true values (0.9 and 0.05), and the dispersion parameter is estimated to be 1.037, again close to 1.