| Title: | Methods for Specifying, Simulating from and Fitting Causal Models |
|---|---|
| Description: | Model and simulate from multivariate distributions using marginal causal parameters. |
| Authors: | Robin Evans [aut, cre], Xi Lin [aut], Chase Mathis [aut] |
| Maintainer: | Robin Evans <[email protected]> |
| License: | GPL-2 |
| Version: | 0.10.4.9000 |
| Built: | 2026-05-11 07:31:06 UTC |
| Source: | https://github.com/rje42/causl |
Adjust values of copula parameters individually
adj_vars( cop_pars, strong = character(0), weak = character(0), factor = c(5, 0.2) )adj_vars( cop_pars, strong = character(0), weak = character(0), factor = c(5, 0.2) )
cop_pars |
list of copula parameters, as output by |
strong, weak
|
character vectors of variables to make strong or weak |
factor |
vector of two real values, to multiply coefficients by |
Obtain samples from a causal model using the rejection sampling approach of Evans and Didelez (2024). Kept for compatibility of simulations from that paper.
causalSamp( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), pars, family, link = NULL, dat = NULL, method = "rejection", control = list(), seed )causalSamp( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), pars, family, link = NULL, dat = NULL, method = "rejection", control = list(), seed )
n |
number of samples required |
formulas |
list of lists of formulas |
pars |
list of lists of parameters |
family |
families for Z,X,Y and copula |
link |
list of link functions |
dat |
data frame of covariates |
method |
only |
control |
list of options for the algorithm |
seed |
random seed used for replication |
Samples from a given causal model using rejection sampling (or, if everything is discrete, direct sampling).
The entries for formula and family should each be a
list with four entries, corresponding to the , , and
the copula. formula determines the model, so it is crucial that
every variable to be simulated is represented there exactly once. Each
entry of that list can either be a single formula, or a list of formulae.
Each corresponding entry in family should be the same length as the
list in formula or of length 1 (in which case it will be repeated
for all the variables therein).
We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.
The family variables for the copula are also numeric and taken from
VineCopula.
Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel,
5 for Frank, 6 for Joe and 11 for FGM copulas.
pars should be a named list containing: either entries z,
x, y and cop, or variable names that correspond to the
LHS of formulae in formulas. Each of these should themselves be a list
containing beta (a vector of regression parameters) and (possibly)
phi, a dispersion parameter. For any discrete variable that is a
treatment, you can also specify p, an initial proportion to simulate
from (otherwise this defaults to 0.5).
Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit and probit links are available.
Control parameters are oversamp (default value 10), trace (default
value 0, increasing to 1 increases verbosity of output),
max_oversamp (default value 1000), warn (which currently does
nothing), max_wt which is set to 1, and increases each time the function
is recalled.
Control parameters also include cop, which gives a keyword for the
copula that defaults to "cop".
This function is kept largely for the replication of simulations from Evans and Didelez (2024).
A data frame containing the simulated data.
Evans, R.J. and Didelez, V. Parameterizing and simulating from causal models (with discussion). Journal of the Royal Statistical Society, Series B, 2024.
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) causalSamp(100, family = rep(1,4), pars = pars)pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) causalSamp(100, family = rep(1,4), pars = pars)
Copula family functions
get_copula(family_index, link) gaussian_causl_cop(link) t_causl_cop(link) emp_causl_cop(link) sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)get_copula(family_index, link) gaussian_causl_cop(link) t_causl_cop(link) emp_causl_cop(link) sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)
family_index |
integer representing copula family |
link |
link function |
causl_copula |
family from which to simulate |
beta_matrix |
matrix of regression coefficients |
other_pars |
other parameters for some families |
model_matrix |
matrix of regressors |
get_copula returns the causl_copula that corresponds to the particular
integer given. So far, 1 for Gaussian and 2 for t copulas are implemented.
The copula_fam functions return a list that contains, for each valid family:
name: the name of the family;
ddist: function to evaluate the density;
rdist: function to obtain samples from copula;
pdist: copula function
pars: character vector of the parameter names;
default: list of default values;
link: the chosen link function.
get_copula(): get copula family function
gaussian_causl_cop(): Gaussian copula family
t_causl_cop(): t copula family
emp_causl_cop(): empirical copula family
sim_cop(): simulate from copula family
causl_model objectThis defines a causl_model object, that can be used either for simulation
or inference.
causl_model( formulas, family, pars, link, dat = NULL, method = "inversion", kwd = "cop", control = list() )causl_model( formulas, family, pars, link, dat = NULL, method = "inversion", kwd = "cop", control = list() )
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
kwd |
word used for copula formula and parameters |
control |
list of options for the algorithm |
The components formulas and family must both be specified, and have
matching lengths. If pars is specified, then the model can be used for
simulation and inference, if not then only for inference. link is optional,
and if not specified default links will be used.
Checks existence of beta vectors and then assesses appropriate length
check_rej(formulas, family, pars, dims, kwd) check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)check_rej(formulas, family, pars, dims, kwd) check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
dims |
number of variables in each class |
dummy_dat |
a dummy dataset, as generated by |
LHSs |
left-hand sides from |
check_rej(): Checks for rejection sampling
Density of a multivariate copula
dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N) dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE) dfgmCopula(x, alpha, log = FALSE)dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N) dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE) dfgmCopula(x, alpha, log = FALSE)
x |
samples on (0,1) |
Sigma |
collection of matrices |
log |
logical: return log-density? |
use_cpp |
logical: use the C routine? |
N |
optional integer for number of covariance matrices |
df |
degrees of freedom |
alpha |
parameter for copula |
Computes the density for data from a
Gaussian or t-copula. Currently use_cpp only
works for dGaussCop.
numeric vector of densities
dGaussCop(): Gaussian copula
dtCop(): t-Copula density
dfgmCopula(): bivariate FGM copula
A faster Vectorized conditional copula function where we don't build redudendant copulas if unique(eta) is small
cVCopula(U, copula, param, par2 = NULL, inverse = FALSE, cdf = FALSE)cVCopula(U, copula, param, par2 = NULL, inverse = FALSE, cdf = FALSE)
U |
matrix of quantiles |
copula |
family of copula to use |
param |
vector of parameters |
par2 |
Degrees of freedom for t-copula |
inverse |
should inverse CDF be returned? |
cdf |
should we evaluate the CDF copula, not the conditional? |
Should have nrow(U) = length(param).
Density of a Mixed Copula
dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)
x |
matrix of samples on (0,1) |
m |
number of discrete variables |
Sigma |
collection of matrices |
eta |
eta matrix |
log |
logical: return log=density? |
use_cpp |
logical: use the C routine? |
numeric vector of densities
Empirical CDF
emp_cdf(xo, inv = FALSE, zero = "min")emp_cdf(xo, inv = FALSE, zero = "min")
xo |
vector of observed values |
inv |
logical: should inverse CDF be returned |
zero |
value to use as inverse for zero |
zero can be "min" (for the minimum value of xo), "m1" (for
the minimum minus 1) or "mInf" (for -Inf).
Empirical copula
emp_cop(u, pts = u, smoothing = c("none", "checkerboard"))emp_cop(u, pts = u, smoothing = c("none", "checkerboard"))
u |
matrix of integral probability transformed values |
pts |
index of points at which to define copula |
smoothing |
method of smoothing to use |
Extract parameter estimates and standard errors
ests_ses(fit, beta, merged_formula, kwd)ests_ses(fit, beta, merged_formula, kwd)
fit |
output of |
beta |
output of |
merged_formula |
formula with all variables on RHS |
kwd |
keyword for copula variable |
Check properties of family variables
is_categorical(x) is_discrete(x)is_categorical(x) is_discrete(x)
x |
a family, either numerical, a name, or a |
If the specific function does represent any family then NA is
returned.
is_categorical returns a logical indicating if the object is the
input object represents a categorical or ordinal variable.
is_discrete returns a logical indicating if the object is the
input object represents a discrete random variable.
is_categorical(): Check if family is categorical
is_discrete(): Check if family is discrete
Obtain list of family functions from numeric or character representation
family_list(family, func_return = get_family)family_list(family, func_return = get_family)
family |
numeric or character vector of families |
func_return |
function to apply to list of families |
family_list(c(1,3,5)) family_list(c("t","binomial"))family_list(c(1,3,5)) family_list(c("t","binomial"))
Data frames containing
val: an integer
family: a vector giving the associated parametric family for that integer.
The integer val may be used in place of the name of the parametric family
when specifying the family object.
family_vals familyVals copula_valsfamily_vals familyVals copula_vals
family_vals is a data.frame with 9 rows and 2 columns
familyVals is the same object as family_vals
copula_vals is a data.frame with 7 rows and 2 columns
familyVals will be removed in version 1.0.0.
familyVals: Old name
copula_vals: Values for copula families
Fit multivariate copula regression model
fit_causl( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, cop_pars, use_cpp = TRUE, control = list(), other_pars = list(), method = "optim" ) fitCausal( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, par2, sandwich = TRUE, use_cpp = TRUE, control = list() )fit_causl( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, cop_pars, use_cpp = TRUE, control = list(), other_pars = list(), method = "optim" ) fitCausal( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, par2, sandwich = TRUE, use_cpp = TRUE, control = list() )
dat |
data frame of observations |
formulas |
list of model formulae, for outcome, covariates, and finally for the copula |
family |
families for variables and copula as above. Should be the
same length as |
link |
link functions for each variable |
cop_pars |
additional parameters for copula if required |
use_cpp |
logical: should C++ routines be used? |
control |
list of parameters to be passed to |
other_pars |
list of other parameters to use (e.g. degrees of freedom for a t-distribution) |
method |
either |
par2 |
former name for |
sandwich |
logical: should sandwich standard errors be returned? |
formulas is list of three or more formulae giving predictors of
y-margin, z-margin(s) and interaction parameters. Fit is by maximum
likelihood.
family takes numeric values of the following forms:
| val | family |
| 0 | binomial |
| 1 | gaussian |
| 2 | t |
| 3 | Gamma |
| 4 | beta |
| 5 | binomial |
| 6 | lognormal |
| 10 | categorical |
| 11 | ordinal |
control has the same arguments as the argument in optim, as well
as sandwich, a logical indicating if sandwich estimates of standard errors
should be computed, newton, a logical which controls whether Newton iterates should be
performed at the end, and cop which can edit the restricted variable name
for the left-hand side of formulae.
Useful for altering are trace (1 shows steps of optimization) and
maxit for the number of steps.
cop_pars is currently just a numeric scalar, generally representing the
degrees of freedom to use in a t-copula.
The list other_pars should be named with the relevant variables, and
each entry should be a named list containing the relevant parameters. As
an example, for a t-copula, the required parameter is the degrees of
freedom, df.
Warning By default, none of the variables should be called cop, as
this is reserved for the copula. The reserved word can be changed using
the argument cop within control.
Returns a list of class cop_fit.
fitCausal(): old name
Fit using maximum likelihood estimation
fit_optim( dat, full_form, family, fam_cop, link, mm, cop_pars, LHSs, other_pars, control, use_cpp )fit_optim( dat, full_form, family, fam_cop, link, mm, cop_pars, LHSs, other_pars, control, use_cpp )
dat |
data frame of observations |
full_form |
full formula with all variables on right-hand side |
family |
families for variables and copula as above. Should be the
same length as |
fam_cop |
the family of the copula |
link |
link functions for each variable |
mm |
model matrix |
cop_pars |
additional parameters for copula if required |
LHSs |
character string of left hand sides of formula |
other_pars |
list of other parameters to use (e.g. degrees of freedom for a t-distribution) |
control |
list of parameters to be passed to |
use_cpp |
logical: should C++ routines be used? |
Tools for manipulating formulas
lhs(formulas, surv = FALSE) lhs(formulas) <- value rhs_vars(formulas) tidy_formulas(formulas, kwd, prefix = "V")lhs(formulas, surv = FALSE) lhs(formulas) <- value rhs_vars(formulas) tidy_formulas(formulas, kwd, prefix = "V")
formulas |
list of formulae |
surv |
logical indicating whether to treat as survey data |
value |
character vector to assign |
kwd |
string used to denote copula |
prefix |
string to begin each new variable name |
lhs returns a character vector containing left-hand sides of a
list of formulae. If surv=TRUE then two responses are returned in the
event of the left-hand side being a valid Surv object. lhs<- allows
one to assign the left-hand sides of variables in the obvious way.
tidy_formulas ensures that all formulae in a list have a left hand side,
by giving them names of the form Vn where n is some positive integer. The
prefix V can be changed using the argument prefix.
rhs_vars extracts all the variables used on the right-hand sides of a list
of formulas.
lhs(): Obtain left-hand sides from list of formulas
lhs(formulas) <- value: Assign left-hand sides to list of formulas
rhs_vars(): Extract variables from right-hand sides
tidy_formulas(): Tidy up formulae
Attempts to ensure that values after passing through the standard link function used for Gaussian copulas will have the specified value. For other copulas this will not target the correct range, but it can still be used by considering how the relevant link functions work for the Gaussian and other copula.
gen_cop_pars(formulas, data, range = c(-1, 1), ...)gen_cop_pars(formulas, data, range = c(-1, 1), ...)
formulas |
formulas as specified in |
data |
dataset to obtain parameterization for |
range |
range of parameters to target |
... |
other parameters to be included in each copula |
A list suitable for the cop entry of the pars argument of
rfrugalParam
Create a dummy dataset for the purpose of checking coefficient numbers
gen_dummy_dat(family, pars, dat, LHSs, dims)gen_dummy_dat(family, pars, dat, LHSs, dims)
family |
families for variables and copula |
pars |
list of lists of parameters |
dat |
optional data frame of covariates |
LHSs |
left-hand sides from |
dims |
number of variables in each class |
Return causl_fam function from integer index
get_family(val) gaussian_causl_fam(link) t_causl_fam(link) Gamma_causl_fam(link) binomial_causl_fam(link) beta_causl_fam(link) categorical_causl_fam(link) ordinal_causl_fam(link)get_family(val) gaussian_causl_fam(link) t_causl_fam(link) Gamma_causl_fam(link) binomial_causl_fam(link) beta_causl_fam(link) categorical_causl_fam(link) ordinal_causl_fam(link)
val |
integer corresponding to distributional family |
link |
link function |
The functions gaussian_causl_fam() etc. represent the functions that are
returned by get_family().
A few function of this form can be defined by the user, and it should return the following:
name: the name of the relevant family;
ddist: a function returning the density of the distributions;
qdist: a function returning the quantiles from probabilities;
rdist: a function to sample values from the distribution;
pdist: a cumulative distribution function;
pars: a list of the names of the parameters used;
default: a function that returns a list of the default values for an
observation and each of the parameters;
link: the specified link function.
The function should also give the output the class "causl_family", so that
it is interpreted appropriately. Note that ddist should have a log
argument, to allow the log-likelihood to be evaluated.
The only parameterization of the categorical family currently implemented
is the multivariate logistic parameterization. For a random variable
with states, dependence on a vector uses:
and the vectors are stored as .
The ordinal family is parameterized using a variation of the ordinal
logistic regression model. This takes the logits of entries in the cumulative
distribution function and treats the covariates variables linearly on that
scale. Suppose is the vector of covariates and there are
levels, then
As in the categorical case, the vectors are stored as
.
gaussian_causl_fam(): Gaussian distribution family
t_causl_fam(): Student's t distribution family
Gamma_causl_fam(): Gamma distribution family
binomial_causl_fam(): binomial distribution family
beta_causl_fam(): beta distribution family
categorical_causl_fam(): multinomial/categorical distribution family
ordinal_causl_fam(): ordinal categorical distribution family
Get maximum weight for each segment of a distribution
get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)
pars |
list with all regression parameters |
forms_X |
formulae for treatments |
fam_X, fam_Z
|
vector of families for treatments and covariates |
qden |
density of proposals |
LHS_Z |
variables in covariates |
ranges |
matrix of ranges of segments |
link |
link functions for treatments |
... |
not currently used |
Get density of treatments
get_X_density(dat, eta, phi, qden, family, link, other_par, log = FALSE)get_X_density(dat, eta, phi, qden, family, link, other_par, log = FALSE)
dat |
data frame of variables to change conditional distribution of |
eta |
list (or matrix) of linear forms |
phi |
vector of dispersion coefficients |
qden |
functions for densities used to simulate variables |
family |
vector of distribution families |
link |
link functions for GLMs |
other_par |
other parameters for family |
log |
logical: should log-density be returned? |
a numeric vector of weights
Get univariate densities and uniform order statistics
glm_dens(x, eta, phi, other_pars, family = 1, link) univarDens(x, eta, phi, other_pars, family = 1, link)glm_dens(x, eta, phi, other_pars, family = 1, link) univarDens(x, eta, phi, other_pars, family = 1, link)
x |
vector of observations |
eta, phi
|
linear component and dispersion parameters |
other_pars |
other parameters for certain families |
family |
numeric indicator of family |
link |
link function |
fam follows the usual numeric pattern: 1=normal,
2=t-distribution and 3=Gamma with a log-link.
A list with entries being the numeric vectors u (the
quantiles of the input values) and ld (the log density of each
observation).
univarDens(): old name
Simulate values from some generalized linear models
glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)
family |
vector of distribution families |
eta |
list (or matrix) of linear forms |
phi |
vector of dispersion coefficients |
other_pars |
list of other parameters for specified family |
link |
link functions for GLMs |
quantiles |
logical indicating whether to return quantiles |
a numeric vector of weights
causl_family objectsInsert/remove extra list level for causl_family objects
insert_lev(x, target_class = "causl_family") rmv_lev(x, target_class = "causl_family")insert_lev(x, target_class = "causl_family") rmv_lev(x, target_class = "causl_family")
x |
list to insert level into |
target_class |
class to insert/remove level above |
rmv_lev(): remove a level of listing
causl_ objectsObtain link from causl_ objects
link(x, ...) ## S3 method for class 'causl_family' link(x, ...) ## S3 method for class 'causl_copula' link(x, ...)link(x, ...) ## S3 method for class 'causl_family' link(x, ...) ## S3 method for class 'causl_copula' link(x, ...)
x |
an object of class |
... |
other arguments (not currently used) |
link(causl_family): method for causl_family obect
link(causl_copula): method for causl_copula object
Helper function to apply link function
link_apply(eta, link, family_nm, inverse = TRUE)link_apply(eta, link, family_nm, inverse = TRUE)
eta |
inner product of |
link |
link function |
family_nm |
family of distributions to use |
inverse |
should inverse of specified function be applied? Defaults to true |
inner product eta transformed by inverse of link function
family and vars should have the same structure if vars is
specified.
link_setup( link, family, vars, sources = links_list, fam_list = list(family_vals) )link_setup( link, family, vars, sources = links_list, fam_list = list(family_vals) )
link |
input from |
family |
the list of families for random variables |
vars |
a list of vectors of variable names with the same structure as |
sources |
list of links for parametric families |
fam_list |
list of data frames in the same format as |
List of link functions for copuulas
links_cop(family)links_cop(family)
family |
either the name of the family or its integer representative |
This function returns the default link function for each possible copula. These are given in the table:
| value | family | link | link name |
| 1 | gaussian | logit((1+rho)/2) | logit2 |
| 2 | t | logit((1+rho)/2) | logit2 |
| 3 | Clayton | log(1+theta) | log1p |
| 4 | Gumbel | log(theta - 1) | log1m |
| 5 | Frank | logit(1 + theta) | log1p |
| 6 | Joe | identity | identity |
| 11 | FGM | logit((1+rho)/2) | logit2 |
This is a named list whose entries are character vectors of allowed link functions.
links_list linksListlinks_list linksList
An object of class list of length 8.
linksList is the old name for links_list
linksList: Old name
Log-likelihood for frugal parameterization
ll_frugal(pars, dat, formulas, family, link, kwd = "cop")ll_frugal(pars, dat, formulas, family, link, kwd = "cop")
pars |
parameter values |
dat |
|
formulas |
list of lists of formulas |
family |
families for variables and copula |
link |
list of link functions |
kwd |
string to use for copula |
Take collection of formulae and create one formula with all variables on the right-hand side of any of the originals.
merge_formulas(formulas)merge_formulas(formulas)
formulas |
list of formulas to merge |
Generic for modify
modify(x, ...) ## Default S3 method: modify(x, ...)modify(x, ...) ## Default S3 method: modify(x, ...)
x |
object to be modified |
... |
other arguments |
modify(default): default method
causl_model objectChange one or more components of a causl_model object.
## S3 method for class 'causl_model' modify(x, over = FALSE, formulas, family, pars, link, dat, method, kwd, ...)## S3 method for class 'causl_model' modify(x, over = FALSE, formulas, family, pars, link, dat, method, kwd, ...)
x |
an object of class |
over |
logical: should components be added/modified or entirely over-written? |
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
kwd |
word used for copula formula and parameters |
... |
other arguments (not used) This function can be used to modify a |
cm <- causl_model(formulas = list(Z ~ 1, X~ Z, Y ~ 1, ~ 1), family = list(3, 1, 1, 1), pars = list(Z = list(beta = 1, phi = 1), X = list(beta = c(0.5, 1)), Y = list(beta = 0, phi = 1), cop = list(beta = 1))) modify(cm, pars = list(cop = list(beta = 1.5))) cm$parscm <- causl_model(formulas = list(Z ~ 1, X~ Z, Y ~ 1, ~ 1), family = list(3, 1, 1, 1), pars = list(Z = list(beta = 1, phi = 1), X = list(beta = c(0.5, 1)), Y = list(beta = 0, phi = 1), cop = list(beta = 1))) modify(cm, pars = list(cop = list(beta = 1.5))) cm$pars
Negative log-likelihood
nll2( theta, dat, mm, beta, phi, inCop, fam_cop = 1, family, link, cop_pars = NULL, use_cpp = TRUE, other_pars = list() )nll2( theta, dat, mm, beta, phi, inCop, fam_cop = 1, family, link, cop_pars = NULL, use_cpp = TRUE, other_pars = list() )
theta |
concatenated vector of parameters ( |
dat |
matrix of data |
mm |
model matrix for use with |
beta |
(sparse) matrix of regression parameters for each variable and copula |
phi |
vector of dispersion parameters |
inCop |
vector of integers giving variables in |
fam_cop, family
|
integer and integer vector for copula and distribution families respectively |
link |
vector of link functions |
cop_pars |
other parameters for copula |
use_cpp |
logical: should Rcpp functions be used? |
other_pars |
other parameters to pass to |
The number of columns of beta should be the number of columns
in dat plus the number required to parameterize the copula. The first
few columns and the entries in phi are assumed to be in the order of
those in dat. If the th family for a variable does not require a
dispersion parameter then the value of phi[i] is ignored.
Sets up copula quantities only
pair_copula_setup(formulas, family, pars, LHSs, quans, ord)pair_copula_setup(formulas, family, pars, LHSs, quans, ord)
formulas |
list of formulas for copula only |
family |
list of families for copula only |
pars |
list of copula parameters |
LHSs |
left-hand sides for all variables |
quans |
character vector of already existing variables to include |
ord |
topological ordering |
Get parameter masks for regression parameters
par_masks(formulas, family = rep(1, nv), full_form)par_masks(formulas, family = rep(1, nv), full_form)
formulas |
formulas to create mask for |
family |
vector or list of families |
full_form |
(optionally) merged list of |
causl_model
Display output from causl_model
## S3 method for class 'causl_model' print(x, ...)## S3 method for class 'causl_model' print(x, ...)
x |
an object of class |
... |
additional arguments (not used) |
Ultimately should also work for ordinal and categorical cases
process_discrete_dens(dat, family, LHSs)process_discrete_dens(dat, family, LHSs)
dat |
data frame of observations |
family |
families for variables and copula as above. Should be the
same length as |
LHSs |
left-hand sides from |
New function to sort out links and families simultaneously
process_family_link(family, link, dims, func_return = get_family)process_family_link(family, link, dims, func_return = get_family)
family |
families for variables and copula |
link |
list of link functions |
dims |
number of variables in each class |
func_return |
function to use to process character arguments |
Process formulas, families and parameters
process_inputs( formulas, family, pars, link, dat, method = "inversion", control = list() ) process_formulas(formulas, method, len = 4) process_family(family, dims, link = link, func_return = get_family)process_inputs( formulas, family, pars, link, dat, method = "inversion", control = list() ) process_formulas(formulas, method, len = 4) process_family(family, dims, link = link, func_return = get_family)
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
control |
list of options for the algorithm |
len |
number of formulas |
dims |
number of variables in each class |
func_return |
function to use to process character arguments |
Function that processes and checks the validity of the main arguments
used for simulating data. The control argument only requires values useful
for obtaining the empirical (conditional) quantiles of pre-specified data.
For causl we use the get_family() function to process character based
arguments, but we allow for other functions to be used in packages that
build on this one.
process_formulas(): Process input for family variables
process_family(): Process input for family variables
Obtain quantiles for prespecified variables
process_prespecified( dat, prespec, cond = TRUE, nlevs = 5, cor_thresh = 0.25, tol = sqrt(.Machine$double.eps) )process_prespecified( dat, prespec, cond = TRUE, nlevs = 5, cor_thresh = 0.25, tol = sqrt(.Machine$double.eps) )
dat |
data frame containing variables |
prespec |
character vector of prespecified variables in |
cond |
logical: should conditional quantiles be estimated? |
nlevs |
maximum number of levels for 'discrete' variable |
cor_thresh |
threshold for when linear regression should be used to model dependence |
tol |
tolerance for similarity of ranks for applying uniform noise |
Currently takes the rank of each entry, and subtracts 1/2 and
normalizes by the number of entries. If there are ties they are
randomly sorted with a uniform random variable
in the symmetric interval around the rank of width .
nlevs is used to classify discrete vs continuous variables. If a variable
has more than this number of distinct values, it is considered to be
continuous. cor_thresh is used to determine when the correlation is small
enough that it need not be modelled. tol is for classifying which ranks
are sufficiently close together to add noise as a group.
Get weights for rejection sampling
rejectionWeights(dat, mms, family, pars, qden, link)rejectionWeights(dat, mms, family, pars, qden, link)
dat |
data frame of variables to change conditional distribution of |
mms |
list of model matrices |
family |
vector of distribution families |
pars |
parameters for new distributions |
qden |
functions for densities used to simulate variables |
link |
link functions for GLMs |
a numeric vector of weights
Rescale quantiles to conditional copula
rescale_cop(U, X, family = 1, pars, df, inverse = TRUE, cdf = FALSE) rescaleCop(U, X, beta, family = 1, df)rescale_cop(U, X, family = 1, pars, df, inverse = TRUE, cdf = FALSE) rescaleCop(U, X, beta, family = 1, df)
U |
matrix of quantiles |
X |
model matrix of covariates |
family |
variety of copula to use |
pars |
named list of parameters (see details) |
df |
degrees of freedom for t-Copula |
inverse |
should inverse CDF be returned? |
cdf |
should we evaluate the CDF copula, not the conditional? |
beta |
vector of regression parameters |
The variable to be transformed must be in the final column of
U, with variables being conditioned upon in the earlier columns.
family can be 1 for Gaussian, 2 for t, 3 for Clayton, 4 for
Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas. pars should
contain either
regression parameters (labelled as beta) or a constant rank correlation
tau; df must be specified if family = 2. U should have the
same length as X has rows, and X
should have the same number of columns as the length of pars$beta.
vector of rescaled quantiles
rescaleCop(): Old name, now deprecated
Rescale quantiles to arbitrary random variable.
rescale_var(U, X, pars, family = 1, link) rescaleVar(U, X, pars, family = 1, link)rescale_var(U, X, pars, family = 1, link) rescaleVar(U, X, pars, family = 1, link)
U |
vector of quantiles |
X |
model matrix of covariates |
pars |
list of parameters (see details) |
family |
family of distributions to use |
link |
link function |
family can be 1, 2, 3, 4 or 5 for Gaussian, t-distributed,
Gamma distributed, beta distributed or discrete respectively, and 11 for
ordinal variables.
pars should be a list with entries beta and
phi, as well as possibly df, trunc and nlevel if the family
is set to 2 or 5.
U should have the same length as X has rows, and
X should have the same number of columns as the length of
pars$beta.
vector of rescaled variables
rescaleVar(): Old name, now deprecated
Obtain samples from a causal model parameterized as in Evans and Didelez (2024).
rfrugal(n, causl_model, control = list()) rfrugalParam( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), family = c(1, 1, 1, 1), pars, link = NULL, dat = NULL, method = "inversion", control = list(), ... )rfrugal(n, causl_model, control = list()) rfrugalParam( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), family = c(1, 1, 1, 1), pars, link = NULL, dat = NULL, method = "inversion", control = list(), ... )
n |
number of samples required |
causl_model |
object of class |
control |
list of options for the algorithm |
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
... |
other arguments, such as custom families |
Samples from a given causal model under the frugal parameterization.
The entries for formula and family should each be a
list with four entries, corresponding to the , , and
the copula. formula determines the model, so it is crucial that
every variable to be simulated is represented there exactly once. Each
entry of that list can either be a single formula, or a list of formulae.
Each corresponding entry in family should be the same length as the
list in formula or of length 1 (in which case it will be repeated
for all the variables therein).
We use the following codes for different families of distributions:
| val | family |
| 0 | binomial |
| 1 | gaussian |
| 2 | t |
| 3 | Gamma |
| 4 | beta |
| 5 | binomial |
| 6 | lognormal |
| 11 | ordinal |
| 10 | categorical |
The family variables for the copula are also numeric and taken from
VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton,
4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.
pars should be a named list containing variable names that correspond
to the LHS of
formulae in formulas. Each of these should themselves be a list
containing beta (a vector of regression parameters) and (possibly)
phi, a dispersion parameter. For any discrete variable that is a
treatment, you can also specify p, an initial proportion to simulate
from (otherwise this defaults to 0.5).
Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit, probit, and log links are available.
A variety of sampling methods are implemented. The
inversion method with pair-copulas is the default (method="inversion"),
but we cam also use a multivariate copula (method="inversion_mv") or even
rejection sampling (method="rejection"). Note that the inveresion_mv
method simulates the entire copula, so it cannot depend upon intermediate
variables.
The only control parameters are cop: which gives a keyword for the
copula that defaults to "cop"; quiet which defaults to FALSE but will
reduce output if set to TRUE; and (if rejection sampling is selected)
careful: this logical enables one to implement the full rejection
sampling method, which means we do get exact samples (note this method
is generally very slow, especially if we have an outlying value, so the
default is FALSE).
A data frame containing the simulated data.
rfrugalParam(): old function for simulation
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) rfrugalParam(100, pars = pars)pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) rfrugalParam(100, pars = pars)
Sample from multivariate copulas
rGaussCop(n, Sigma) rtCop(n, Sigma, df) rfgmCopula(n, d = 2, alpha)rGaussCop(n, Sigma) rtCop(n, Sigma, df) rfgmCopula(n, d = 2, alpha)
n |
sample size |
Sigma |
in which each slice is a correlation matrix |
df |
degrees of freedom |
d |
dimension of copula |
alpha |
(vector of) parameter values |
Quicker than rCopula.
Note that rfgmCopula only works for .
A vector of the simulated random variables.
rGaussCop(): Gaussian copula
rtCop(): t-copula
rfgmCopula(): FGM-copula
Compute robust standard errors
sandwich_errors(func, pars, method = "simple", trace = 0L, other_args)sandwich_errors(func, pars, method = "simple", trace = 0L, other_args)
func |
function to be optimized |
pars |
point at which function should be optimized |
method |
taken from |
trace |
provides more details if > 0. |
other_args |
other arguments |
Simulate copula values
sim_copula(dat, family, par, df, model_matrix) sim_CopVal(dat, family, par, df, model_matrix)sim_copula(dat, family, par, df, model_matrix) sim_CopVal(dat, family, par, df, model_matrix)
dat |
data frame with empty columns |
family |
numeric indicator of copula type |
par |
mandatory parameters |
df |
optional parameters |
model_matrix |
design matrix for covariates |
Returns data frame containing columns y
and z1, ..., zk.
The family variables are numeric and taken from VineCopula.
Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel,
5 for Frank, 6 for Joe and 11 for FGM copulas.
A data frame of the same dimension as dat containing the
simulated values.
sim_CopVal(): Old name, now deprecated
Simulate for single time-step
sim_inversion(out, proc_inputs) sim_multi(out, proc_inputs) sim_rejection(out, proc_inputs, careful)sim_inversion(out, proc_inputs) sim_multi(out, proc_inputs) sim_rejection(out, proc_inputs, careful)
out |
data frame for output |
proc_inputs |
output of |
careful |
should full, slower method be used? |
sim_inversion and sim_rejection correspond to
performing the sampling by inversion or using rejection sampling.
sim_multi first simulates from the copula then transforms to the
correct margins in the correct causal ordering
sim_multi(): simulation with multivariate copula
sim_rejection(): Rejection sampling code
Each entry formulas, family, pars, link is a list
with two entries, the first referring to the variable being simulated and the
second to the pair-copulas being used.
If the quantile is smaller than tol or bigger than 1 - tol the
associated value is moved to tol and 1 - tol respectively.
sim_variable( n, formulas, family, pars, link, dat, quantiles, tol = 10 * .Machine$double.eps )sim_variable( n, formulas, family, pars, link, dat, quantiles, tol = 10 * .Machine$double.eps )
n |
sample size |
formulas |
list consisting of a formula for the output variables and a list of formulae for the pair-copula |
family |
list containing family variable |
pars |
list with two entries, first a list of parameters for response, and second a further list of parameters for pair-copula |
link |
list of same form as |
dat |
data frame of current variables |
quantiles |
data frame of quantiles |
tol |
tolerance for quantile closeness to 0 or 1 |
The data frame dat with an additional column given by the left-hand side of formula[[1]].
Simulate from vine copula
sim_vinecop(dat, family, par, df = NULL, model_matrix, link)sim_vinecop(dat, family, par, df = NULL, model_matrix, link)
dat |
data frame to be filled in |
family |
family to simulate from |
par |
matrix of parameters |
df |
extra parameters for t-copula |
model_matrix |
design matrix for covariates |
link |
link functions for parameters (currently unused) |
A data frame of the same dimensions as dat.
Simulate initial X values
sim_X(n, fam_x, theta, offset, sim = TRUE)sim_X(n, fam_x, theta, offset, sim = TRUE)
n |
number of observations |
fam_x |
number for distribution family |
theta |
parameters for model |
offset |
optional mean correction |
sim |
should variables be simulated? |
Returns a list that includes a data frame containing a column
x, as well as the density that was used to generate it. Possible
families are Gaussian (=1), t (=2), Exponential (=3), beta (=4)
Bernoulli/categorical (=5) and log-normal (=6).
For the exponential distribution, theta is the mean.
Beta can take one or two parameters, and if there is only
one it is just repeated.
The offset parameter alters the median for the normal and t-distributions,
or the median of the logarithm in the case of a log-normal.
A list with two entries: x a vector of the simulated
values, and qden, which contains a function that evaluates to the
density of the distribution used to generate those values.
Transform categorical or ordinal parameters into probabilities
theta_to_p_ord(theta) theta_to_p_cat(theta)theta_to_p_ord(theta) theta_to_p_cat(theta)
theta |
provided log-linear parameters |
Returns the probabilities implied by given log-linear parameters.
theta_to_p_ord(): for ordinal variables
Obtain variable ordering from formulas
var_order(formulas, dims, inc_cop = TRUE, method)var_order(formulas, dims, inc_cop = TRUE, method)
formulas |
list of lists of formulas |
dims |
number of variables in each class |
inc_cop |
logical indicating whether to include copula in the ordering |
method |
either |