Package 'causl' reference manual

Title:	Methods for Specifying, Simulating from and Fitting Causal Models
Description:	Model multivariate distributions using causal parameters.
Authors:	Robin Evans [aut, cre], Xi Lin [aut]
Maintainer:	Robin Evans <[email protected]>
License:	GPL-2
Version:	0.9.9
Built:	2025-03-07 11:24:44 UTC
Source:	https://github.com/rje42/causl

Adjust values of copula parameters individually

Description

Adjust values of copula parameters individually

Usage

adj_vars(
  cop_pars,
  strong = character(0),
  weak = character(0),
  factor = c(5, 0.2)
)

Arguments

`cop_pars`	list of copula parameters, as output by `gen_cop_pars()`
`strong`, `weak`	character vectors of variables to make strong or weak
`factor`	vector of two real values, to multiply coefficients by

Sample from a causal model

Description

Obtain samples from a causal model using the rejection sampling approach of Evans and Didelez (2024).

Usage

causalSamp(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  pars,
  family,
  link = NULL,
  dat = NULL,
  method = "rejection",
  control = list(),
  seed
)

Arguments

`n`	number of samples required
`formulas`	list of lists of formulas
`pars`	list of lists of parameters
`family`	families for Z, X, Y and the copula
`link`	list of link functions
`dat`	data frame of covariates
`method`	only `"rejection"` is valid
`control`	list of options for the algorithm
`seed`	random seed used for replication

Details

Samples from a given causal model using rejection sampling (or, if everything is discrete, direct sampling).

The arguments formulas and family should each be a list with four entries, corresponding to $Z$, $X$, $Y$ and the copula. formulas determines the model, so it is crucial that every variable to be simulated is represented there exactly once. Each entry of that list can be either a single formula or a list of formulae. Each corresponding entry in family should be the same length as the list in formulas, or of length 1 (in which case it is repeated for all the variables therein).

We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.

The family variables for the copula are also numeric and taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.

pars should be a named list containing either entries z, x, y and cop, or variable names that correspond to the LHS of formulae in formulas. Each of these should itself be a list containing beta (a vector of regression parameters) and (possibly) phi, a dispersion parameter. For any discrete variable that is a treatment, you can also specify p, an initial proportion to simulate from (otherwise this defaults to 0.5).

Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity link, and Gamma to the log link. For the Bernoulli distribution, the logit and probit links are available.

The control parameters are oversamp (default 10), trace (default 0; setting it to 1 makes the output more verbose), max_oversamp (default 1000), warn (currently unused), and max_wt, which starts at 1 and increases each time the function is recalled. control can also include cop, a keyword for the copula, which defaults to "cop".
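The accept/reject step that oversamp and max_wt relate to can be illustrated with a generic base-R sketch: draw oversamp * n proposals, weight each by the ratio of target to proposal density, and keep each draw with probability weight / bound. The N(1, 1) target, the N(0, 2^2) proposal and the analytic bound are illustrative assumptions; this is not the model or code used inside causalSamp().

```r
# Generic rejection sampling sketch (illustrative; not causl's internal code).
# Target density: N(m, 1). Proposal density: N(0, 2^2).
set.seed(42)
n <- 1000          # number of samples wanted
oversamp <- 10     # draw oversamp * n proposals, echoing the control entry
m <- 1

x <- rnorm(oversamp * n, mean = 0, sd = 2)
wt <- dnorm(x, mean = m, sd = 1) / dnorm(x, mean = 0, sd = 2)

# For these two normals the density ratio never exceeds 2 * exp(m^2 / 6),
# so wt / max_wt is a valid acceptance probability.
max_wt <- 2 * exp(m^2 / 6)
keep <- runif(length(x)) < wt / max_wt

samp <- head(x[keep], n)   # accepted draws follow the N(m, 1) target
```

The overall acceptance rate is 1 / max_wt (about 42% here), which is why proposals are oversampled.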

This function is kept largely for the replication of simulations from Evans and Didelez (2024).

Value

A data frame containing the simulated data.

References

Evans, R.J. and Didelez, V. Parameterizing and simulating from causal models (with discussion). Journal of the Royal Statistical Society, Series B, 2024.

Examples

pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
causalSamp(100, pars = pars)



Copula family functions

Description

Copula family functions

Usage

get_copula(family_index, link = NULL)

gaussian_causl_cop(link)

t_causl_cop(link)

emp_causl_cop(link)

sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)

Arguments

`family_index`	integer representing copula family
`link`	link function
`causl_copula`	family from which to simulate
`beta_matrix`	matrix of regression coefficients
`other_pars`	other parameters for some families
`model_matrix`	matrix of regressors

Details

get_copula returns the causl_copula that corresponds to the particular integer given. So far, 1 for Gaussian and 2 for t copulas are implemented.

The copula_fam functions return a list that contains, for each valid family:

name: the name of the family;
ddist: function to evaluate the density;
rdist: function to obtain samples from copula;
pdist: copula function;
pars: character vector of the parameter names;
default: list of default values;
link: the chosen link function.

Functions

get_copula(): get copula family
gaussian_causl_cop(): Gaussian copula family
t_causl_cop(): t copula family
emp_causl_cop(): empirical copula family
sim_cop(): simulate from copula family

Define a `causl_model` object

Description

This defines a causl_model object, which can be used for either simulation or inference.

Usage

causl_model(
  formulas,
  family,
  pars,
  link,
  dat = NULL,
  method = "inversion",
  kwd = "cop"
)

Arguments

`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`link`	list of link functions
`dat`	optional data frame of covariates
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
`kwd`	word used for copula formula and parameters

Details

The components formulas and family must both be specified, and must have matching lengths. If pars is specified, the model can be used for both simulation and inference; if not, it can be used only for inference. link is optional; if it is not specified, default links are used.

Check parameters for univariate families

Description

Checks that the beta vectors exist and then that they have the appropriate lengths.

Usage

check_rej(formulas, family, pars, dims, kwd)

check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)

Arguments

`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`dims`	number of variables in each class
`kwd`	keyword for copula
`dummy_dat`	a dummy dataset, as generated by `gen_dummy_dat()`
`LHSs`	left-hand sides from `formulas`

Functions

check_rej(): Checks for rejection sampling

Density of a multivariate copula

Description

Density of a multivariate copula

Usage

dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N)

dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE)

dfgmCopula(x, alpha)

Arguments

`x`	samples on (0,1)
`Sigma`	collection of matrices
`log`	logical: return log-density?
`use_cpp`	logical: use the C routine?
`N`	optional integer for number of covariance matrices
`df`	degrees of freedom
`alpha`	parameter for copula

Details

Computes the density for data from a Gaussian or t-copula. Currently use_cpp only works for dGaussCop.

Value

numeric vector of densities

Functions

dGaussCop(): Gaussian copula
dtCop(): t-copula density
dfgmCopula(): bivariate FGM copula
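In the bivariate case the Gaussian and FGM copula densities have simple closed forms. The base-R sketch below evaluates them for a single correlation rho, rather than the collection of matrices Sigma that dGaussCop() accepts; the names dgauss_cop2() and dfgm_cop2() are hypothetical and this is not the package's implementation.

```r
# Bivariate copula densities in base R (hypothetical helper names).
# u, v: vectors on (0, 1); rho in (-1, 1); alpha in [-1, 1].
dgauss_cop2 <- function(u, v, rho, log = FALSE) {
  a <- qnorm(u)
  b <- qnorm(v)
  # log of (bivariate normal density) / (product of standard normal densities)
  ld <- -0.5 * log(1 - rho^2) -
    (rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2))
  if (log) ld else exp(ld)
}

dfgm_cop2 <- function(u, v, alpha) {
  1 + alpha * (1 - 2 * u) * (1 - 2 * v)
}

dgauss_cop2(0.5, 0.5, rho = 0.5)  # 1 / sqrt(0.75), since qnorm(0.5) = 0
dfgm_cop2(0.5, 0.5, alpha = 0.3)  # exactly 1 at the centre
```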

Vectorized conditional copula function

Description

Vectorized conditional copula function

Usage

cVCopula(U, copula, param, par2, inverse = FALSE)

Arguments

`U`	matrix of quantiles
`copula`	family of copula to use
`param`	vector of parameters
`par2`	Degrees of freedom for t-copula
`inverse`	should inverse CDF be returned?

Details

Requires nrow(U) = length(param).

Density of a Mixed Copula

Description

Density of a Mixed Copula

Usage

dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)

Arguments

`x`	matrix of samples on (0,1)
`m`	number of discrete variables
`Sigma`	collection of matrices
`eta`	eta matrix
`log`	logical: return log-density?
`use_cpp`	logical: use the C routine?

Value

numeric vector of densities

Empirical CDF

Description

Empirical CDF

Usage

emp_cdf(xo, inv = FALSE, zero = "min")

Arguments

`xo`	vector of observed values
`inv`	logical: should the inverse CDF be returned?
`zero`	value to return as the inverse at zero

Details

zero can be "min" (for the minimum value of xo), "m1" (for the minimum minus 1) or "mInf" (for -Inf).
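A minimal base-R sketch of this behaviour, assuming the inverse is the usual generalized inverse (the smallest observed value whose empirical CDF reaches p). The name emp_cdf_sketch() is hypothetical and this is not the package's own code.

```r
# Hypothetical sketch of an empirical CDF and its generalized inverse.
emp_cdf_sketch <- function(xo, inv = FALSE, zero = c("min", "m1", "mInf")) {
  zero <- match.arg(zero)
  if (!inv) return(ecdf(xo))          # F(x) = #{xo <= x} / n
  xs <- sort(xo)
  n <- length(xs)
  z0 <- switch(zero, min = xs[1], m1 = xs[1] - 1, mInf = -Inf)
  function(p) {
    # smallest observed value whose empirical CDF is at least p
    out <- xs[pmax(ceiling(p * n), 1)]
    out[p <= 0] <- z0                 # value returned as the inverse at zero
    out
  }
}

Finv <- emp_cdf_sketch(c(3, 1, 4, 2), inv = TRUE, zero = "m1")
Finv(c(0, 0.3, 0.5, 1))   # 0 2 2 4
```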

Empirical copula

Description

Empirical copula

Usage

emp_cop(u, pts = u, smoothing = c("none", "checkerboard"))

Arguments

`u`	matrix of probability integral transformed values
`pts`	index of points at which to define copula
`smoothing`	method of smoothing to use
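Without smoothing, the empirical copula at a point p is the proportion of rows of u that lie below p in every coordinate. The base-R sketch below assumes pts is a matrix of evaluation points (the package describes it as an index, so this is a simplification) and does not cover the checkerboard option; emp_cop_sketch() is a hypothetical name.

```r
# Hypothetical sketch: empirical copula without smoothing,
# C_n(p) = (1/n) * #{i : u_i <= p in every coordinate},
# evaluated at each row of pts (taken here to be a matrix of points).
emp_cop_sketch <- function(u, pts = u) {
  u <- as.matrix(u)
  pts <- as.matrix(pts)
  apply(pts, 1, function(p) mean(colSums(t(u) <= p) == ncol(u)))
}

u <- cbind(c(0.1, 0.4, 0.7), c(0.2, 0.9, 0.5))
emp_cop_sketch(u)   # 1/3, 2/3, 2/3
```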

Extract parameter estimates and standard errors

Description

Extract parameter estimates and standard errors

Usage

ests_ses(fit, beta, merged_formula, kwd)

Arguments

`fit`	output of `optim`
`beta`	output of `initializeParams2`
`merged_formula`	formula with all variables on RHS
`kwd`	keyword for copula variable

Obtain list of family functions

Description

Obtain list of family functions from numeric or character representation

Usage

family_list(family, func_return = get_family)

Arguments

`family`	numeric or character vector of families
`func_return`	function to apply to list of families

Examples

family_list(c(1,3,5))
family_list(c("t","binomial"))



Numbers for parametric families

Description

Data frames containing

val: an integer
family: a vector giving the associated parametric family for that integer.

The integer val may be used in place of the name of the parametric family when specifying the family object.

Usage

family_vals

familyVals

copula_vals

Format

family_vals is a data.frame with 9 rows and 2 columns

familyVals is the same object as family_vals

copula_vals is a data.frame with 7 rows and 2 columns

Details

familyVals will be removed in version 1.0.0.

Functions

familyVals: Old name
copula_vals: Values for copula families

Fit multivariate copula regression model

Description

Fit multivariate copula regression model

Usage

fit_causl(
  dat,
  formulas = list(y ~ x, z ~ 1, ~x),
  family = rep(1, length(formulas)),
  link,
  cop_pars,
  use_cpp = TRUE,
  control = list(),
  other_pars = list()
)

fitCausal(
  dat,
  formulas = list(y ~ x, z ~ 1, ~x),
  family = rep(1, length(formulas)),
  link,
  par2,
  sandwich = TRUE,
  use_cpp = TRUE,
  control = list()
)

Arguments

`dat`	data frame of observations
`formulas`	list of model formulae, for Y, for the Z variables, and finally for the copula
`family`	families for the Y and Z distributions, and the copula. Should be the same length as `formulas`
`link`	link functions for each variable
`cop_pars`	additional parameters for copula if required
`use_cpp`	logical: should C++ routines be used?
`control`	list of parameters to be passed to `optim`
`other_pars`	list of other parameters to use (e.g. degrees of freedom for a t-distribution)
`par2`	former name for `cop_pars` argument
`sandwich`	logical: should sandwich standard errors be returned?

Details

formulas is a list of three or more formulae giving the predictors for the y-margin, the z-margin(s) and the interaction (copula) parameters. Fitting is by maximum likelihood.

control accepts the same entries as the control argument of optim, as well as sandwich, a logical indicating whether sandwich estimates of standard errors should be computed; newton, a logical controlling whether Newton iterates should be performed at the end; and cop, which changes the reserved variable name used on the left-hand side of formulae. Entries that are often useful to alter are trace (1 shows the steps of the optimization) and maxit (the maximum number of iterations).

The list other_pars should be named with the relevant variables, and each entry should be a named list containing the relevant parameters.

Warning: By default, none of the variables should be called cop, as this is reserved for the copula. The reserved word can be changed using the argument cop within control.

Value

Returns a list of class cop_fit.

Functions

fitCausal(): old name

Tools for manipulating formulas

Description

Tools for manipulating formulas

Usage

lhs(formulas, surv = FALSE)

lhs(formulas) <- value

rhs_vars(formulas)

tidy_formulas(formulas, kwd, prefix = "V")

Arguments

`formulas`	list of formulae
`surv`	logical indicating whether to treat as survival data
`value`	character vector to assign
`kwd`	string used to denote copula
`prefix`	string to begin each new variable name

Details

lhs returns a character vector containing the left-hand sides of a list of formulae. If surv=TRUE then two responses are returned when the left-hand side is a valid Surv object. lhs<- allows one to assign the left-hand sides of a list of formulae in the obvious way.

tidy_formulas ensures that all formulae in a list have a left-hand side, by giving them names of the form Vn where n is some positive integer. The prefix V can be changed using the argument prefix.

rhs_vars extracts all the variables used on the right-hand sides of a list of formulas.

Functions

lhs(): Obtain left-hand sides from list of formulas
lhs(formulas) <- value: Assign left-hand sides to list of formulas
rhs_vars(): Extract variables from right-hand sides
tidy_formulas(): Tidy up formulae
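The three operations can be mimicked in base R by inspecting formula objects directly: a two-sided formula has length 3, with the left-hand side in position 2 and the right-hand side last. The helpers below are hypothetical sketches, not causl's functions; in particular they ignore the surv and kwd arguments.

```r
# Hypothetical base-R helpers mimicking lhs(), rhs_vars() and tidy_formulas().
lhs_sketch <- function(formulas) {
  # two-sided formulas have length 3; one-sided ones have length 2
  vapply(formulas, function(f) {
    if (length(f) == 3) deparse(f[[2]]) else NA_character_
  }, character(1))
}

rhs_vars_sketch <- function(formulas) {
  unique(unlist(lapply(formulas, function(f) all.vars(f[[length(f)]]))))
}

tidy_formulas_sketch <- function(formulas, prefix = "V") {
  for (i in which(is.na(lhs_sketch(formulas)))) {
    f <- formulas[[i]]
    # rebuild `~ rhs` as `V<i> ~ rhs`, keeping the original environment
    formulas[[i]] <- eval(call("~", as.name(paste0(prefix, i)), f[[2]]),
                          envir = environment(f))
  }
  formulas
}

fs <- list(y ~ x + z, z ~ 1, ~ x)
lhs_sketch(fs)       # "y" "z" NA
rhs_vars_sketch(fs)  # "x" "z"
```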

Function to generate random copula parameters for simulation

Description

Attempts to ensure that the values obtained after passing through the standard link function used for Gaussian copulas lie in the specified range. For other copulas this will not target the correct range, but it can still be used by considering how the relevant link functions for the Gaussian and the other copula compare.

Usage

gen_cop_pars(formulas, data, range = c(-1, 1), ...)

Arguments

`formulas`	formulas as specified in `rfrugalParam`
`data`	dataset to obtain parameterization for
`range`	range of parameters to target
`...`	other parameters to be included in each copula

Value

A list suitable for the cop entry of the pars argument of rfrugalParam

Generate a dummy dataset

Description

Create a dummy dataset for the purpose of checking coefficient numbers

Usage

gen_dummy_dat(family, pars, dat, LHSs, dims)

Arguments

`family`	families for variables and copula
`pars`	list of lists of parameters
`dat`	optional data frame of covariates
`LHSs`	left-hand sides from `formulas`
`dims`	number of variables in each class

Return causl_fam function from integer index

Description

Return causl_fam function from integer index

Usage

get_family(val)

gaussian_causl_fam(link)

t_causl_fam(link)

Gamma_causl_fam(link)

binomial_causl_fam(link)

beta_causl_fam(link)

categorical_causl_fam(link)

ordinal_causl_fam(link)

Arguments

`val`	integer corresponding to distributional family
`link`	link function

Details

The functions gaussian_causl_fam() etc. represent the functions that are returned by get_family().

A function of this form can also be defined by the user; it should return a list containing the following:

name: the name of the relevant family;
ddist: a function returning the density of the distributions;
qdist: a function returning the quantiles from probabilities;
rdist: a function to sample values from the distribution;
pdist: a cumulative distribution function;
pars: a list of the names of the parameters used;
default: a function that returns a list of the default values for an observation and each of the parameters;
link: the specified link function.

The function should also give the output the class "causl_family", so that it is interpreted appropriately. Note that ddist should have a log argument, to allow the log-likelihood to be evaluated.

The only parameterization of the categorical family currently implemented is the multivariate logistic parameterization. For a random variable $X$ with $K$ states, dependence on a vector $\boldsymbol{Z}$ uses:

$\log \dfrac{P(X=k)}{P(X=1)} = \beta_{k}^T \boldsymbol{Z},$

and the $\beta_k$ vectors are stored as $(\beta_2,\ldots,\beta_K)$.
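To make the parameterization concrete, the base-R sketch below turns covariates and the stored coefficients $(\beta_2,\ldots,\beta_K)$ into category probabilities by fixing the linear predictor of the reference state at zero and normalizing. The function name cat_probs() and the example values are hypothetical.

```r
# Sketch: category probabilities under the multivariate logistic model
# log P(X = k) / P(X = 1) = beta_k' Z, with beta_1 fixed at zero.
# B stores (beta_2, ..., beta_K) as columns; Z is an n x p covariate matrix.
cat_probs <- function(Z, B) {
  eta <- cbind(0, Z %*% B)         # reference category 1 has predictor 0
  eta <- eta - apply(eta, 1, max)  # subtract each row's maximum for stability
  e <- exp(eta)
  e / rowSums(e)
}

Z <- cbind(1, c(-1, 0, 2))           # intercept plus one covariate
B <- cbind(c(0.5, 1), c(-0.5, 0.2))  # beta_2 and beta_3, so K = 3
P <- cat_probs(Z, B)                 # each row sums to 1
```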

The ordinal family is parameterized using a variation of the ordinal logistic regression model. This takes the logits of entries in the cumulative distribution function and treats the covariates linearly on that scale. Suppose $\boldsymbol{Z}$ is the vector of covariates and there are $K$ levels; then

$\log \dfrac{P(X \leq k)}{P(X > k)} = \beta_k^T \boldsymbol{Z}.$

As in the categorical case, the vectors $\beta_k$ are stored as $(\beta_1,\ldots,\beta_{K-1})$.
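The probabilities of the individual levels follow by applying the inverse logit to each $\beta_k^T \boldsymbol{Z}$ and differencing the resulting cumulative probabilities. The base-R sketch below does this; ord_probs() is a hypothetical name, and it assumes the linear predictors are non-decreasing in $k$ so that the result is a valid distribution.

```r
# Sketch: state probabilities under log P(X <= k) / P(X > k) = beta_k' Z.
# B stores (beta_1, ..., beta_{K-1}) as columns; valid probabilities require
# beta_k' Z to be non-decreasing in k for every row of Z.
ord_probs <- function(Z, B) {
  cum <- cbind(plogis(Z %*% B), 1)  # P(X <= k) for k = 1, ..., K
  cbind(cum[, 1], cum[, -1, drop = FALSE] - cum[, -ncol(cum), drop = FALSE])
}

Z <- cbind(1, c(-1, 0, 2))
B <- cbind(c(-1, 0.5), c(1, 0.5))   # common slope, increasing cut-points
P <- ord_probs(Z, B)                # n x 3 matrix of P(X = k)
```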

Functions

gaussian_causl_fam(): Gaussian distribution family
t_causl_fam(): Student's t distribution family
Gamma_causl_fam(): Gamma distribution family
binomial_causl_fam(): binomial distribution family
beta_causl_fam(): beta distribution family
categorical_causl_fam(): multinomial/categorical distribution family
ordinal_causl_fam(): ordinal categorical distribution family

Get maximum weight for each segment of a distribution

Description

Get maximum weight for each segment of a distribution

Usage

get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)

Arguments

`pars`	list with all regression parameters
`forms_X`	formulae for treatments
`fam_X`, `fam_Z`	vector of families for treatments and covariates
`qden`	density of proposals
`LHS_Z`	variables in covariates
`ranges`	range of segments
`link`	link functions for treatments
`...`	not currently used

Get density of treatments

Description

Get density of treatments

Usage

get_X_density(dat, eta, phi, qden, family, link, par2, log = FALSE)

Arguments

`dat`	data frame of variables to change conditional distribution of
`eta`	list (or matrix) of linear forms
`phi`	vector of dispersion coefficients
`qden`	functions for densities used to simulate variables
`family`	vector of distribution families
`link`	link functions for GLMs
`par2`	vector of degrees of freedom
`log`	logical: should log-density be returned?

Value

a numeric vector of weights

Get univariate densities and uniform order statistics

Description

Get univariate densities and uniform order statistics

Usage

glm_dens(x, eta, phi, other_pars, family = 1, link)

univarDens(x, eta, phi, other_pars, family = 1, link)

Arguments

`x`	vector of observations
`eta`, `phi`	linear component and dispersion parameters
`other_pars`	other parameters for certain families
`family`	numeric indicator of family
`link`	link function

Details

family follows the usual numeric coding: 1 = normal, 2 = t-distribution and 3 = Gamma with a log link.

Value

A list with entries being the numeric vectors u (the quantiles of the input values) and ld (the log density of each observation).
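For the normal and Gamma margins, u and ld can be computed directly with R's distribution functions. The sketch below assumes phi is the GLM dispersion (the variance for the normal family, and 1 / shape for the Gamma family, whose mean is exp(eta) under the log link); the helper names are hypothetical and this is not glm_dens() itself.

```r
# Sketch: quantiles u and log densities ld for two margins (not causl's code).
# Assumes phi is the GLM dispersion: the variance for the normal family, and
# 1 / shape for the Gamma family, whose mean is exp(eta) under the log link.
gauss_dens_sketch <- function(x, eta, phi) {
  s <- sqrt(phi)
  list(u = pnorm(x, mean = eta, sd = s),
       ld = dnorm(x, mean = eta, sd = s, log = TRUE))
}

gamma_dens_sketch <- function(x, eta, phi) {
  mu <- exp(eta)
  list(u = pgamma(x, shape = 1 / phi, scale = mu * phi),
       ld = dgamma(x, shape = 1 / phi, scale = mu * phi, log = TRUE))
}

gauss_dens_sketch(0, eta = 0, phi = 1)$u   # 0.5
```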

Functions

univarDens(): old name

Simulate from a GLM

Description

Simulate values from some generalized linear models

Usage

glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)

Arguments

`family`	vector of distribution families
`eta`	list (or matrix) of linear forms
`phi`	vector of dispersion coefficients
`other_pars`	list of other parameters for specified family
`link`	link functions for GLMs
`quantiles`	logical indicating whether to return quantiles

Value

a numeric vector of weights

Check if family is categorical

Description

Check if family is categorical

Usage

is_categorical(x)

Arguments

`x`	a family, either numerical, a name, or a `causl_family` object

Details

Returns a logical indicating whether the input object represents a categorical or ordinal variable. If it cannot represent a family, NA is returned.

Default method

Description

Default method

Usage

link(x, ...)

Arguments

`...`	other arguments (not currently used)

Set up link functions

Description

Set up link functions

Usage

link_setup(
  link,
  family,
  vars,
  sources = links_list,
  fam_list = list(family_vals)
)

Arguments

`link`	input given to `msm_samp()` or `causalSamp()`
`family`	the list of families for `Z`,`X` and `Y` variables
`vars`	a list of vectors of variable names with the same structure as `family`
`sources`	list of links for parametric families
`fam_list`	list of data frames in the same format as `family_vals`

Obtain link from a `causl_family` or `causl_copula` object

Description

Obtain link from a causl_family or causl_copula object

Usage

## S3 method for class 'causl_family'
link(x, ...)

## S3 method for class 'causl_copula'
link(x, ...)

Arguments

`x`	an object of class `causl_family` or `causl_copula`

Functions

link(causl_copula): method for causl_copula object

List of link functions for copulas

Description

List of link functions for copulas

Usage

links_cop(family)

Arguments

`family`	either the name of the family or its integer representative

Details

This function returns the default link function for each possible copula. These are given in the table:

value	family	link	link name
1	gaussian	logit((1+rho)/2)	logit2
2	t	logit((1+rho)/2)	logit2
3	Clayton	log(1+theta)	log1p
4	Gumbel	log(theta - 1)	log1m
5	Frank	logit(1 + theta)	log1p
6	Joe	identity	identity
11	FGM	logit((1+rho)/2)	logit2
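The logit2 link maps a correlation in (-1, 1) to the real line, and its inverse is $\rho = 2\,\mathrm{logit}^{-1}(\eta) - 1 = \tanh(\eta/2)$. The base-R sketch below writes out three of the links in the table and their use; the function names are illustrative and are not exported by causl (base R's log1p() computes the same map as the log1p link).

```r
# Sketch of three of the copula links in the table, plus the inverse of logit2.
logit2     <- function(rho) qlogis((1 + rho) / 2)   # Gaussian, t, FGM
inv_logit2 <- function(eta) 2 * plogis(eta) - 1     # back to (-1, 1)
log1p_link <- function(theta) log(1 + theta)        # Clayton
log1m_link <- function(theta) log(theta - 1)        # Gumbel

inv_logit2(1)   # correlation implied by a linear predictor of 1, about 0.462
```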

List of links available for each parametric family

Description

This is a named list whose entries are character vectors of allowed link functions.

Usage

links_list

linksList

Format

An object of class list of length 8.

linksList is the old name for links_list

Functions

linksList: Old name

Log-likelihood for frugal parameterization

Description

Log-likelihood for frugal parameterization

Usage

ll_frugal(pars, dat, formulas, family, link, kwd = "cop")

Arguments

`pars`	parameter values
`dat`	`data.frame` containing data
`formulas`	list of lists of formulas
`family`	families for variables and copula
`link`	list of link functions
`kwd`	string to use for copula

Combine multiple formulas

Description

Takes a collection of formulae and creates one formula whose right-hand side contains every variable appearing on the right-hand side of any of the originals.

Usage

merge_formulas(formulas)

Arguments

`formulas`	list of formulas to merge
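The same merge can be expressed in base R by collecting the right-hand-side variables with all.vars() and rebuilding a one-sided formula with reformulate(). merge_formulas_sketch() is a hypothetical name; the package's own function may order or handle terms differently.

```r
# Hypothetical sketch: collect every right-hand-side variable into one formula.
merge_formulas_sketch <- function(formulas) {
  vars <- unique(unlist(lapply(formulas, function(f) all.vars(f[[length(f)]]))))
  if (length(vars) == 0) return(~ 1)
  reformulate(vars)
}

merge_formulas_sketch(list(y ~ x + z, z ~ w, ~ x))  # one-sided formula in x, z and w
```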

Modify `causl_model` object

Description

Change one or more components of a causl_model object.

Usage

modify.causl_model(
  x,
  over = FALSE,
  formulas,
  family,
  pars,
  link,
  dat,
  method,
  kwd
)

Arguments

`x`	an object of class `causl_model`
`over`	logical: should components be added/modified or entirely over-written? This function can be used to modify
`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`link`	list of link functions
`dat`	optional data frame of covariates
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
`kwd`	word used for copula formula and parameters

Negative log-likelihood

Description

Negative log-likelihood

Usage

nll2(
  theta,
  dat,
  mm,
  beta,
  phi,
  inCop,
  fam_cop = 1,
  family,
  link,
  cop_pars = NULL,
  use_cpp = TRUE,
  other_pars = list()
)

Arguments

`theta`	concatenated vector of parameters (`beta` followed by `phi`)
`dat`	matrix of data
`mm`	model matrix for use with `beta`
`beta`	(sparse) matrix of regression parameters for each variable and copula
`phi`	vector of dispersion parameters
`inCop`	vector of integers giving variables in `dat` to be included in copula
`fam_cop`, `family`	integer and integer vector for copula and distribution families respectively
`link`	vector of link functions
`cop_pars`	other parameters for copula
`use_cpp`	logical: should Rcpp functions be used?
`other_pars`	other parameters to pass to `glm_dens`

Details

The number of columns of beta should be the number of columns in dat plus the number required to parameterize the copula. The first few columns and the entries in phi are assumed to be in the same order as the variables in dat. If the $i$th family does not require a dispersion parameter, then the value of phi[i] is ignored.

Sets up copula quantities only

Description

Sets up copula quantities only

Usage

pair_copula_setup(formulas, family, pars, LHSs, quans, ord)

Arguments

`formulas`	list of formulas for copula only
`family`	list of families for copula only
`pars`	list of copula parameters
`LHSs`	left-hand sides for all variables
`quans`	character vector of already existing variables to include
`ord`	topological ordering

Get parameter masks for regression parameters

Description

Get parameter masks for regression parameters

Usage

par_masks(formulas, family = rep(1, nv), full_form)

Arguments

`formulas`	formulas to create mask for
`family`	vector or list of families
`full_form`	(optionally) merged list of `formulas`

Display output from `causl_model`

Description

Display output from causl_model

Usage

## S3 method for class 'causl_model'
print(x, ...)

Arguments

`x`	an object of class `causl_model`
`...`	additional arguments (not used)

Obtain univariate densities

Description

Ultimately should also work for ordinal and categorical cases

Usage

process_discrete_dens(dat, family, LHSs)

Arguments

`dat`	data frame of observations
`family`	families for the Y and Z distributions, and the copula. Should be the same length as `formulas`
`LHSs`	left-hand sides from `formulas`

Process formulas, families and parameters

Description

Process formulas, families and parameters

Usage

process_inputs(formulas, family, pars, link, dat, kwd, method = "inversion")

process_formulas(formulas, len = 4)

process_family(family, dims, func_return = get_family)

Arguments

`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`link`	list of link functions
`dat`	optional data frame of covariates
`kwd`	keyword for copula
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
`len`	number of formulas
`dims`	number of variables in each class
`func_return`	function to use to process character arguments

Details

Function that processes and checks the validity of the main arguments used for simulating data.

For causl we use the get_family() function to process character based arguments, but we allow for other functions to be used in packages that build on this one.

Functions

process_formulas(): Process input for formulas
process_family(): Process input for family variables

Obtain quantiles for prespecified variables

Description

Obtain quantiles for prespecified variables

Usage

process_prespecified(dat, prespec)

Arguments

`dat`	data frame containing variables
`prespec`	character vector of prespecified variables in `dat`

Details

Currently takes the rank of each entry, subtracts 1/2 and divides by the number of entries $n$. If $k$ entries are tied, they are randomly ordered using uniform random variables on the symmetric interval of width $k/n$ around their (scaled) rank.
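A base-R sketch of this transform, assuming tied entries share their average rank and each receives an independent uniform draw on the interval of width $k/n$ centred there (for a tie group starting at sorted position $i$ this is exactly $[(i-1)/n, (i-1+k)/n]$). rank_quantiles() is a hypothetical name, not the package's code.

```r
# Sketch of the rank transform described above (not the package's own code).
rank_quantiles <- function(x) {
  n <- length(x)
  r <- rank(x, ties.method = "average")
  k <- ave(x, x, FUN = length)   # size of the tie group containing each entry
  # tied entries get a uniform draw on an interval of width k / n
  jitter <- ifelse(k > 1, (runif(n) - 0.5) * k / n, 0)
  (r - 0.5) / n + jitter
}

rank_quantiles(c(3, 1, 2))   # 0.8333333 0.1666667 0.5000000
```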

Get weights for rejection sampling

Description

Get weights for rejection sampling

Usage

rejectionWeights(dat, mms, family, pars, qden, link)

Arguments

`dat`	data frame of variables to change conditional distribution of
`mms`	list of model matrices
`family`	vector of distribution families
`pars`	parameters for new distributions
`qden`	functions for densities used to simulate variables
`link`	link functions for GLMs

Value

a numeric vector of weights

Rescale quantiles to conditional copula

Description

Rescale quantiles to conditional copula

Usage

rescale_cop(U, X, beta, family = 1, par2)

rescaleCop(U, X, beta, family = 1, par2)

Arguments

`U`	matrix of quantiles
`X`	model matrix of covariates
`beta`	list of parameters (see details)
`family`	variety of copula to use
`par2`	additional parameter for some copulas

Details

The variable to be transformed must be in the final column of U, with variables being conditioned upon in the earlier columns.

family can be 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas. pars should be a list with entries beta and phi, as well as possibly par2 if family=2. U should have the same length as X has rows, and X should have the same number of columns as the length of pars$beta.
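For the Gaussian case (family = 1) with a single conditioning variable, one way to realize this rescaling is to map X %*% beta to a correlation through the logit2 link and apply the inverse conditional (h-)function of the Gaussian copula to the final column. The sketch below is a simplification under those assumptions; rescale_gauss_cop() is a hypothetical name, not the package's function.

```r
# Sketch: Gaussian pair-copula rescaling with one conditioning variable.
# U[, 1] is the conditioning quantile, U[, 2] the quantile to transform;
# each row's correlation is obtained from X %*% beta via the logit2 link.
rescale_gauss_cop <- function(U, X, beta) {
  rho <- 2 * plogis(drop(X %*% beta)) - 1   # invert logit2
  a <- qnorm(U[, 1])
  w <- qnorm(U[, 2])
  pnorm(rho * a + sqrt(1 - rho^2) * w)      # inverse conditional copula
}
```

With beta = 0 the correlation is zero and the final column is returned unchanged.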

Value

vector of rescaled quantiles

Functions

rescaleCop(): Old name, now deprecated

Rescale quantiles to an arbitrary random variable.

Description

Rescale quantiles to an arbitrary random variable.

Usage

rescale_var(U, X, pars, family = 1, link)

rescaleVar(U, X, pars, family = 1, link)

Arguments

`U`	vector of quantiles
`X`	model matrix of covariates
`pars`	list of parameters (see details)
`family`	family of distributions to use
`link`	link function

Details

family can be 1, 2, 3, 4 or 5 for Gaussian, t-distributed, Gamma distributed, beta distributed or discrete respectively, and 11 for ordinal variables. pars should be a list with entries beta and phi, as well as possibly par2, trunc and nlevel if the family is set to 2 or 5. U should have the same length as X has rows, and X should have the same number of columns as the length of pars$beta.
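For continuous families, rescaling amounts to passing U through the quantile function of the response distribution at the linear predictor X %*% pars$beta. The sketch below covers only the Gaussian (identity link) and Gamma (log link) cases and assumes pars$phi is the GLM dispersion; rescale_var_sketch() is hypothetical and not the package's function.

```r
# Sketch: push uniform quantiles U through the inverse CDF of a regression
# model (not causl's code). Assumes pars$phi is the GLM dispersion.
rescale_var_sketch <- function(U, X, pars, family = 1) {
  eta <- drop(X %*% pars$beta)
  switch(as.character(family),
    "1" = qnorm(U, mean = eta, sd = sqrt(pars$phi)),   # identity link
    "3" = qgamma(U, shape = 1 / pars$phi,
                 scale = exp(eta) * pars$phi),          # log link
    stop("family not covered by this sketch"))
}

X <- cbind(1, c(0, 1, 2))
rescale_var_sketch(c(0.5, 0.5, 0.5), X, list(beta = c(1, 0.5), phi = 1))  # 1.0 1.5 2.0
```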

Value

vector of rescaled variables

Functions

rescaleVar(): Old name, now deprecated

Sample from a causal model

Description

Obtain samples from a causal model parameterized as in Evans and Didelez (2024).

Usage

rfrugal(n, causl_model, control = list())

rfrugalParam(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  family = c(1, 1, 1, 1),
  pars,
  link = NULL,
  dat = NULL,
  method = "inversion",
  control = list(),
  ...
)

Arguments

`n`	number of samples required
`causl_model`	object of class `causl_model`
`control`	list of options for the algorithm
`formulas`	list of lists of formulas
`family`	families for variables and copula
`pars`	list of lists of parameters
`link`	list of link functions
`dat`	optional data frame of covariates
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
`...`	other arguments, such as custom families

Details

Samples from a given causal model under the frugal parameterization.

We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.

pars should be a named list whose names correspond to the variables on the left-hand sides of the formulae in formulas. Each entry should itself be a list containing beta (a vector of regression parameters) and, where relevant, phi (a dispersion parameter). For any discrete variable that is a treatment, you can also specify p, an initial proportion to simulate from (otherwise this defaults to 0.5).

A variety of sampling methods are implemented. The inversion method with pair-copulas is the default (method="inversion"), but we can also use a multivariate copula (method="inversion_mv") or rejection sampling (method="rejection").

The only control parameters are: cop, which gives a keyword for the copula and defaults to "cop"; quiet, which defaults to FALSE and reduces output if set to TRUE; and, if rejection sampling is selected, careful, a logical that enables the full rejection sampling method. The full method gives exact samples, but it is generally very slow (especially when there is an outlying value), so careful defaults to FALSE.

Value

A data frame containing the simulated data.

Functions

rfrugalParam(): old function for simulation

Examples

pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
rfrugalParam(100, pars = pars)

Sample from multivariate copulas

Description

Sample from multivariate copulas

Usage

rGaussCop(n, Sigma)

rtCop(n, Sigma, df)

rfgmCopula(n, d = 2, alpha)

Arguments

`n`	sample size
`Sigma`	array in which each slice is a correlation matrix
`df`	degrees of freedom
`d`	dimension of copula
`alpha`	(vector of) parameter values

Details

These functions are quicker than rCopula() from the copula package.

Note that rfgmCopula only works for $d = 2$ .
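
The standard constructions behind these samplers can be sketched in base R. The function names below are hypothetical and the code is not the causl implementation; in particular, it accepts a single correlation matrix rather than an array of slices. The Gaussian copula applies pnorm() to correlated normals, the t-copula divides correlated normals by a shared sqrt(chi-squared / df) draw before applying pt(), and the bivariate FGM copula, C(u, v) = uv[1 + alpha(1 - u)(1 - v)], is sampled by conditional inversion.

```r
# Hypothetical sketches, not the causl functions rGaussCop(), rtCop() or
# rfgmCopula(). Each returns an n x d matrix on the uniform (copula) scale.

# Gaussian copula: correlated normals via the Cholesky factor, then pnorm().
sketch_gauss_cop <- function(n, Sigma) {
  d <- ncol(Sigma)
  Z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)
  pnorm(Z)
}

# t-copula: dividing correlated normals by sqrt(chisq(df) / df) gives
# multivariate t draws; pt() then maps them to the uniform scale.
sketch_t_cop <- function(n, Sigma, df) {
  d <- ncol(Sigma)
  Z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)
  W <- sqrt(rchisq(n, df = df) / df)   # one scale factor per row
  pt(Z / W, df = df)                   # W[i] divides row i (vector recycling)
}

# Bivariate FGM copula by conditional inversion: with a = alpha * (1 - 2u),
# the conditional CDF of V given U = u is v + a * v * (1 - v); setting it
# equal to an independent uniform w and solving the quadratic gives v.
sketch_fgm_cop <- function(n, alpha) {
  stopifnot(abs(alpha) <= 1)
  u <- runif(n)
  w <- runif(n)
  a <- alpha * (1 - 2 * u)
  # Numerically stable root of a*v^2 - (1 + a)*v + w = 0 (also valid for a = 0)
  v <- 2 * w / ((1 + a) + sqrt((1 + a)^2 - 4 * a * w))
  cbind(u, v)
}

set.seed(3)
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
G  <- sketch_gauss_cop(1e5, Sigma)
Tc <- sketch_t_cop(1e5, Sigma, df = 4)
Fg <- sketch_fgm_cop(1e5, alpha = 0.9)
cor(Fg[, 1], Fg[, 2], method = "spearman")  # close to alpha / 3 = 0.3
```

For the FGM copula, Spearman's rho equals alpha / 3, so its dependence is always weak (at most 1/3 in absolute value); this is a quick sanity check on the output.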

Value

A vector of the simulated random variables.

Functions

rGaussCop(): Gaussian copula
rtCop(): t-copula
rfgmCopula(): FGM-copula

Simulate copula values

Description

Simulate copula values

Usage

sim_copula(dat, family, par, par2, model_matrix)

sim_CopVal(dat, family, par, par2, model_matrix)

Arguments

`dat`	data frame with empty columns
`family`	numeric indicator of copula type
`par`	mandatory parameters
`par2`	optional parameters
`model_matrix`	design matrix for covariates

Details

Returns data frame containing columns y and z1, ..., zk.

The family codes are numeric and follow the VineCopula package: for example, use 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.

Value

A data frame of the same dimension as dat containing the simulated values.

Functions

sim_CopVal(): Old name, now deprecated

Simulate for single time-step

Description

Simulate for single time-step

Usage

sim_inversion(out, proc_inputs)

sim_multi(out, proc_inputs)

sim_rejection(out, proc_inputs, careful)

Arguments

`out`	data frame for output
`proc_inputs`	output of `process_inputs()`
`careful`	should full, slower method be used?

Details

sim_inversion and sim_rejection correspond to performing the sampling by inversion or using rejection sampling.

sim_multi first simulates from the copula and then transforms the values to the correct margins, following the causal ordering.

Functions

sim_multi(): simulation with multivariate copula
sim_rejection(): Rejection sampling code

Simulate a single variable using the inversion method

Description

Each of formulas, family, pars and link is a list with two entries, the first referring to the variable being simulated and the second to the pair-copulas being used.

Usage

sim_variable(n, formulas, family, pars, link, dat, quantiles)

Arguments

`n`	sample size
`formulas`	list consisting of a formula for the output variable and a list of formulae for the pair-copula
`family`	list containing the family for the variable and the families for the pair-copula
`pars`	list with two entries, first a list of parameters for response, and second a further list of parameters for pair-copula
`link`	list of same form as `family`
`dat`	data frame of current variables
`quantiles`	data frame of quantiles

Value

The data frame dat with an additional column given by the left-hand side of formulas[[1]].

Simulate from vine copula

Description

Simulate from vine copula

Usage

sim_vinecop(dat, family, par, par2 = NULL, model_matrix, link)

Arguments

`dat`	data frame to be filled in
`family`	family to simulate from
`par`	matrix of parameters
`par2`	extra parameters for t-copula
`model_matrix`	design matrix for covariates
`link`	link functions for parameters (currently unused)

Value

A data frame of the same dimensions as dat.

Simulate initial X values

Description

Simulate initial X values

Usage

sim_X(n, fam_x, theta, offset, sim = TRUE)

Arguments

`n`	number of observations
`fam_x`	number for distribution family
`theta`	parameters for model
`offset`	optional mean correction
`sim`	should variables be simulated?

Details

Returns a list that includes a data frame containing a column x, as well as the density that was used to generate it. Possible families are Gaussian (=1), t (=2), exponential (=3), beta (=4), Bernoulli/categorical (=5) and log-normal (=6).

For the exponential distribution, theta is the mean. For the beta distribution, theta can contain one or two parameters; if only one is given, it is used for both.

The offset parameter alters the median for the normal and t-distributions, or the median of the logarithm in the case of a log-normal.
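
The exponential and beta conventions above can be illustrated with a short base-R sketch. The name `sketch_sim_x()` is hypothetical, only families 3 and 4 are covered, and the offset and sim arguments are omitted; it returns x together with a density function qden, mirroring the documented return value rather than reproducing the causl code.

```r
# Hypothetical sketch of the documented conventions for two families only;
# not the causl implementation of sim_X().
sketch_sim_x <- function(n, fam_x, theta) {
  if (fam_x == 3) {
    # Exponential: theta is the mean, so the rate is 1 / theta
    x <- rexp(n, rate = 1 / theta)
    qden <- function(x) dexp(x, rate = 1 / theta)
  } else if (fam_x == 4) {
    # Beta: a single parameter is repeated for both shape parameters
    if (length(theta) == 1) theta <- rep(theta, 2)
    x <- rbeta(n, theta[1], theta[2])
    qden <- function(x) dbeta(x, theta[1], theta[2])
  } else {
    stop("only families 3 and 4 are sketched here")
  }
  list(x = x, qden = qden)
}

set.seed(4)
ex <- sketch_sim_x(1e5, fam_x = 3, theta = 2)
mean(ex$x)        # close to 2
bt <- sketch_sim_x(10, fam_x = 4, theta = 2)
bt$qden(0.5)      # 1.5, the Beta(2, 2) density at 0.5
```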

Value

A list with two entries: x a vector of the simulated values, and qden, which contains a function that evaluates to the density of the distribution used to generate those values.

Transform categorical or ordinal parameters into probabilities

Description

Transform categorical or ordinal parameters into probabilities

Usage

theta_to_p_ord(theta)

theta_to_p_cat(theta)

Arguments

`theta`	provided log-linear parameters

Details

Returns the probabilities implied by given log-linear parameters.
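
The exact log-linear parameterization is not stated here, so the sketch below assumes two common choices: baseline-category logits for categorical variables (the first category has logit 0) and cumulative logits with increasing cutpoints for ordinal variables. Both function names are hypothetical, and their results may not match theta_to_p_cat() and theta_to_p_ord().

```r
# Hypothetical sketches; the parameterizations are ASSUMPTIONS, not
# necessarily those used by theta_to_p_cat() and theta_to_p_ord().

# Categorical: baseline-category logits, the first category has logit 0.
sketch_theta_to_p_cat <- function(theta) {
  eta <- c(0, theta)
  e <- exp(eta - max(eta))   # subtract the maximum for numerical stability
  e / sum(e)
}

# Ordinal: cumulative logits, P(Y <= k) = plogis(theta[k]) with theta increasing.
sketch_theta_to_p_ord <- function(theta) {
  stopifnot(!is.unsorted(theta))
  diff(c(0, plogis(theta), 1))
}

sketch_theta_to_p_cat(c(0, 0))    # 1/3 for each of three categories
sketch_theta_to_p_ord(c(-1, 1))   # about 0.269, 0.462, 0.269
```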

Functions

theta_to_p_ord(): for ordinal variables

Obtain variable ordering from formulas

Description

Obtain variable ordering from formulas

Usage

var_order(formulas, dims, inc_cop = TRUE, method)

Arguments

`formulas`	list of lists of formulas
`dims`	number of variables in each class
`inc_cop`	logical indicating whether to include copula in the ordering
`method`	either `"inversion"` (the default), `"inversion_mv"`, or `"rejection"`
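
Obtaining an ordering from formulas is a topological sort of the graph in which each left-hand-side variable depends on the variables on its right-hand side. A minimal sketch using a layered form of Kahn's algorithm, assuming a flat list of two-sided formulas; the name `sketch_var_order()` is hypothetical, and unlike var_order() it ignores dims, copula formulas and method.

```r
# Hypothetical sketch, not the causl function var_order().
# formulas: flat list of two-sided formulas, one response per formula.
sketch_var_order <- function(formulas) {
  lhs <- vapply(formulas, function(f) all.vars(f[[2]]), character(1))
  # Parents are right-hand-side variables that are themselves simulated;
  # anything else (e.g. a covariate supplied in dat) is treated as given.
  parents <- lapply(formulas, function(f) intersect(all.vars(f[[3]]), lhs))
  names(parents) <- lhs
  ord <- character(0)
  remaining <- lhs
  while (length(remaining) > 0) {
    # A variable is ready once all of its parents have been ordered
    ready <- remaining[vapply(parents[remaining],
                              function(p) all(p %in% ord), logical(1))]
    if (length(ready) == 0) stop("formulas contain a cycle")
    ord <- c(ord, ready)
    remaining <- setdiff(remaining, ready)
  }
  ord
}

sketch_var_order(list(y ~ x + z, z ~ 1, x ~ z))  # "z" "x" "y"
```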
