Title: | Methods for Specifying, Simulating from and Fitting Causal Models |
---|---|
Description: | Model multivariate distributions using causal parameters. |
Authors: | Robin Evans [aut, cre], Xi Lin [aut] |
Maintainer: | Robin Evans <[email protected]> |
License: | GPL-2 |
Version: | 0.9.6 |
Built: | 2024-12-28 06:13:58 UTC |
Source: | https://github.com/rje42/causl |
Adjust values of copula parameters individually
adj_vars( cop_pars, strong = character(0), weak = character(0), factor = c(5, 0.2) )
adj_vars( cop_pars, strong = character(0), weak = character(0), factor = c(5, 0.2) )
cop_pars |
list of copula parameters, as output by |
strong , weak
|
character vectors of variables to make strong or weak |
factor |
vector of two real values, to multiply coefficients by |
Obtain samples from a causal model using the rejection sampling approach of Evans and Didelez (2024).
causalSamp( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), pars, family, link = NULL, dat = NULL, method = "rejection", control = list(), seed )
causalSamp( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), pars, family, link = NULL, dat = NULL, method = "rejection", control = list(), seed )
n |
number of samples required |
formulas |
list of lists of formulas |
pars |
list of lists of parameters |
family |
families for Z,X,Y and copula |
link |
list of link functions |
dat |
data frame of covariates |
method |
only |
control |
list of options for the algorithm |
seed |
random seed used for replication |
Samples from a given causal model using rejection sampling (or, if everything is discrete, direct sampling).
The entries for formula
and family
should each be a
list with four entries, corresponding to the ,
,
and
the copula.
formula
determines the model, so it is crucial that
every variable to be simulated is represented there exactly once. Each
entry of that list can either be a single formula, or a list of formulae.
Each corresponding entry in family
should be the same length as the
list in formula
or of length 1 (in which case it will be repeated
for all the variables therein).
We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.
The family variables for the copula are also numeric and taken from
VineCopula
.
Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel,
5 for Frank, 6 for Joe and 11 for FGM copulas.
pars
should be a named list containing: either entries z
,
x
, y
and cop
, or variable names that correspond to the
LHS of formulae in formulas
. Each of these should themselves be a list
containing beta
(a vector of regression parameters) and (possibly)
phi
, a dispersion parameter. For any discrete variable that is a
treatment, you can also specify p
, an initial proportion to simulate
from (otherwise this defaults to 0.5).
Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit and probit links are available.
Control parameters are oversamp
(default value 10), trace
(default
value 0, increasing to 1 increases verbosity of output),
max_oversamp
(default value 1000), warn
(which currently does
nothing), max_wt
which is set to 1, and increases each time the function
is recalled.
Control parameters also include cop
, which gives a keyword for the
copula that defaults to "cop"
.
This function is kept largely for the replication of simulations from Evans and Didelez (2024).
A data frame containing the simulated data.
Evans, R.J. and Didelez, V. Parameterizing and simulating from causal models (with discussion). Journal of the Royal Statistical Society, Series B, 2024.
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) causalSamp(100, pars = pars)
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) causalSamp(100, pars = pars)
Copula family functions
get_copula(family_index, link = NULL) gaussian_causl_cop(link) t_causl_cop(link) sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)
get_copula(family_index, link = NULL) gaussian_causl_cop(link) t_causl_cop(link) sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)
family_index |
integer representing copula family |
link |
link function |
causl_copula |
family from which to simulate |
beta_matrix |
matrix of regression coefficients |
other_pars |
other parameters for some families |
model_matrix |
matrix of regressors |
get_copula
returns the causl_copula
that corresponds to the particular
integer given. So far, 1
for Gaussian and 2
for t copulas are implemented.
The copula_fam
functions return a list that contains, for each valid family:
name
: the name of the family;
ddist
: function to evaluate the density;
rdist
: function to obtain samples;
pars
: character vector of the parameter names;
default
: list of default values;
link
: the chosen link function.
get_copula()
: getter copula family
gaussian_causl_cop()
: Gaussian copula family
t_causl_cop()
: t copula family
sim_cop()
: simulate from copula family
causl_model
objectThis defines a causl_model
object, that can be used either for simulation
or inference.
causl_model( formulas, family, pars, link, dat = NULL, method = "inversion", kwd = "cop" )
causl_model( formulas, family, pars, link, dat = NULL, method = "inversion", kwd = "cop" )
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
kwd |
word used for copula formula and parameters |
The components formulas
and family
must both be specified, and have
matching lengths. If pars
is specified, then the model can be used for
simulation and inference, if not then only for inference. link
is optional,
and if not specified default links will be used.
Checks existence of beta vectors and then assesses appropriate length
check_rej(formulas, family, pars, dims, kwd) check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)
check_rej(formulas, family, pars, dims, kwd) check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
dims |
number of variables in each class |
kwd |
keyword for copula |
dummy_dat |
a dummy dataset, as generated by |
LHSs |
left-hand sides from |
check_rej()
: Checks for rejection sampling
Density of a multivariate copula
dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N) dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE) dfgmCopula(x, alpha)
dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N) dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE) dfgmCopula(x, alpha)
x |
samples on (0,1) |
Sigma |
collection of matrices |
log |
logical: return log=density? |
use_cpp |
logical: use the C routine? |
N |
optional integer for number of covariance matrices |
df |
degrees of freedom |
alpha |
parameter for copula |
Computes the density for data from a
Gaussian or t-copula. Currently use_cpp
only
works for dGaussCop
.
numeric vector of densities
dGaussCop()
: Gaussian copula
dtCop()
: t-Copula density
dfgmCopula()
: bivariate FGM copula
Vectorized conditional copula function
cVCopula(U, copula, param, par2, inverse = FALSE)
cVCopula(U, copula, param, par2, inverse = FALSE)
U |
matrix of quantiles |
copula |
family of copula to use |
param |
vector of parameters |
par2 |
Degrees of freedom for t-copula |
inverse |
should inverse CDF be returned? |
Should have nrow(U) = length(param)
.
Density of a Mixed Copula
dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)
dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)
x |
matrix of samples on (0,1) |
m |
number of discrete variables |
Sigma |
collection of matrices |
eta |
eta matrix |
log |
logical: return log=density? |
use_cpp |
logical: use the C routine? |
numeric vector of densities
Extract parameter estimates and standard errors
ests_ses(fit, beta, merged_formula, kwd)
ests_ses(fit, beta, merged_formula, kwd)
fit |
output of |
beta |
output of |
merged_formula |
formula with all variables on RHS |
kwd |
keyword for copula variable |
Obtain list of family functions from numeric or character representation
family_list(family, func_return = get_family)
family_list(family, func_return = get_family)
family |
numeric or character vector of families |
func_return |
function to apply to list of families |
family_list(c(1,3,5)) family_list(c("t","binomial"))
family_list(c(1,3,5)) family_list(c("t","binomial"))
Data frames containing
val
: an integer
family
: a vector giving the associated parametric family for that integer.
The integer val
may be used in place of the name of the parametric family
when specifying the family
object.
family_vals familyVals copula_vals
family_vals familyVals copula_vals
family_vals
is a data.frame
with 9 rows and 2 columns
familyVals
is the same object as family_vals
copula_vals
is a data.frame
with 7 rows and 2 columns
familyVals
will be removed in version 1.0.0.
familyVals
: Old name
copula_vals
: Values for copula families
Fit multivariate copula regression model
fit_causl( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, cop_pars, use_cpp = TRUE, control = list(), other_pars = list() ) fitCausal( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, par2, sandwich = TRUE, use_cpp = TRUE, control = list() )
fit_causl( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, cop_pars, use_cpp = TRUE, control = list(), other_pars = list() ) fitCausal( dat, formulas = list(y ~ x, z ~ 1, ~x), family = rep(1, length(formulas)), link, par2, sandwich = TRUE, use_cpp = TRUE, control = list() )
dat |
data frame of observations |
formulas |
list of model formulae, for Y, for the Z variables, and finally for the copula |
family |
families for the Y and Z distributions, and the copula. Should
be the same length as |
link |
link functions for each variable |
cop_pars |
additional parameters for copula if required |
use_cpp |
logical: should C++ routines be used? |
control |
list of parameters to be passed to |
other_pars |
list of other parameters to use (e.g. degrees of freedom for a t-distribution) |
par2 |
former name for |
sandwich |
logical: should sandwich standard errors be returned? |
forms
is list of three or more formulae giving
predictors of y-margin, z-margin(s) and interaction
parameters. Fit is by maximum likelihood.
control
has the same arguments as the argument in optim
, as well
as sandwich
, a logical indicating if sandwich estimates of standard errors
should be computed, newton
, a logical which controls whether Newton iterates should be
performed at the end, and cop
which can edit the restricted variable name
for the left-hand side of formulae.
Useful for altering are trace
(1 shows steps of optimization) and
maxit
for the number of steps.
The list other_pars
should be named with the relevant variables, and
each entry should be a named list containing the relevant parameters.
Warning By default, none of the variables should be called cop
, as
this is reserved for the copula. The reserved word can be changed using
the argument cop
within control.
Returns a list of class cop_fit
.
fitCausal()
: old name
Tools for manipulating formulas
lhs(formulas, surv = FALSE) lhs(formulas) <- value rhs_vars(formulas) tidy_formulas(formulas, kwd, prefix = "V")
lhs(formulas, surv = FALSE) lhs(formulas) <- value rhs_vars(formulas) tidy_formulas(formulas, kwd, prefix = "V")
formulas |
list of formulae |
surv |
logical indicating whether to treat as survey data |
value |
character vector to assign |
kwd |
string used to denote copula |
prefix |
string to begin each new variable name |
lhs
returns a character vector containing left-hand sides of a
list of formulae. If surv=TRUE
then two responses are returned in the
event of the left-hand side being a valid Surv
object. lhs<-
allows
one to assign the left-hand sides of variables in the obvious way.
tidy_formulas
ensures that all formulae in a list have a left hand side,
by giving them names of the form Vn
where n
is some positive integer. The
prefix V
can be changed using the argument prefix
.
rhs_vars
extracts all the variables used on the right-hand sides of a list
of formulas.
lhs()
: Obtain left-hand sides from list of formulas
lhs(formulas) <- value
: Assign left-hand sides to list of formulas
rhs_vars()
: Extract variables from right-hand sides
tidy_formulas()
: Tidy up formulae
Attempts to ensure that values after passing through the standard link function used for Gaussian copulas will have the specified value. For other copulas this will not target the correct range, but it can still be used by considering how the relevant link functions work for the Gaussian and other copula.
gen_cop_pars(formulas, data, range = c(-1, 1), ...)
gen_cop_pars(formulas, data, range = c(-1, 1), ...)
formulas |
formulas as specified in |
data |
dataset to obtain parameterization for |
range |
range of parameters to target |
... |
other parameters to be included in each copula |
A list suitable for the cop
entry of the pars
argument of
rfrugalParam
Create a dummy dataset for the purpose of checking coefficient numbers
gen_dummy_dat(family, pars, dat, LHSs, dims)
gen_dummy_dat(family, pars, dat, LHSs, dims)
family |
families for variables and copula |
pars |
list of lists of parameters |
dat |
optional data frame of covariates |
LHSs |
left-hand sides from |
dims |
number of variables in each class |
Return causl_fam function from integer index
get_family(val) gaussian_causl_fam(link) t_causl_fam(link) Gamma_causl_fam(link) binomial_causl_fam(link) beta_causl_fam(link) categorical_causl_fam(link) ordinal_causl_fam(link)
get_family(val) gaussian_causl_fam(link) t_causl_fam(link) Gamma_causl_fam(link) binomial_causl_fam(link) beta_causl_fam(link) categorical_causl_fam(link) ordinal_causl_fam(link)
val |
integer corresponding to distributional family |
link |
link function |
The functions gaussian_causl_fam()
etc. represent the functions that are
returned by get_family()
.
A few function of this form can be defined by the user, and it should return the following:
name
: the name of the relevant family;
ddist
: a function returning the density of the distributions;
qdist
: a function returning the quantiles from probabilities;
rdist
: a function to sample values from the distribution;
pdist
: a cumulative distribution function;
pars
: a list of the names of the parameters used;
default
: a function that returns a list of the default values for an
observation and each of the parameters;
link
: the specified link function.
The function should also give the output the class "causl_family"
, so that
it is interpreted appropriately. Note that ddist
should have a log
argument, to allow the log-likelihood to be evaluated.
gaussian_causl_fam()
: Gaussian distribution family
t_causl_fam()
: Student's t distribution family
Gamma_causl_fam()
: Gamma distribution family
binomial_causl_fam()
: binomial distribution family
beta_causl_fam()
: beta distribution family
categorical_causl_fam()
: multinomial/categorical distribution family
ordinal_causl_fam()
: ordinal categorical distribution family
Get maximum weight for each segment of a distribution
get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)
get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)
pars |
list with all regression parameters |
forms_X |
formulae for treatments |
fam_X , fam_Z
|
vector of families for treatments and covariates |
qden |
density of proposals |
LHS_Z |
variables in covariates |
ranges |
range of segments |
link |
link functions for treatments |
... |
not currently used |
Get density of treatments
get_X_density(dat, eta, phi, qden, family, link, par2, log = FALSE)
get_X_density(dat, eta, phi, qden, family, link, par2, log = FALSE)
dat |
data frame of variables to change conditional distribution of |
eta |
list (or matrix) of linear forms |
phi |
vector of dispersion coefficients |
qden |
functions for densities used to simulate variables |
family |
vector of distribution families |
link |
link functions for GLMs |
par2 |
vector of degrees of freedom |
log |
logical: should log-density be returned? |
a numeric vector of weights
Get univariate densities and uniform order statistics
glm_dens(x, eta, phi, other_pars, family = 1, link) univarDens(x, eta, phi, other_pars, family = 1, link)
glm_dens(x, eta, phi, other_pars, family = 1, link) univarDens(x, eta, phi, other_pars, family = 1, link)
x |
vector of observations |
eta , phi
|
linear component and dispersion parameters |
other_pars |
other parameters for certain families |
family |
numeric indicator of family |
link |
link function |
fam
follows the usual numeric pattern: 1=normal,
2=t-distribution and 3=Gamma with a log-link.
A list with entries being the numeric vectors u
(the
quantiles of the input values) and ld
(the log density of each
observation).
univarDens()
: old name
Simulate values from some generalized linear models
glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)
glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)
family |
vector of distribution families |
eta |
list (or matrix) of linear forms |
phi |
vector of dispersion coefficients |
other_pars |
list of other parameters for specified family |
link |
link functions for GLMs |
quantiles |
logical indicating whether to return quantiles |
a numeric vector of weights
Check if family is categorical
is_categorical(x)
is_categorical(x)
x |
a family, either numerical, a name, or a |
Returns a logical indicating if the object is the input object
represents a categorical or ordinal variable. If it cannot represent a
family then NA
is returned.
Default method
link(x, ...)
link(x, ...)
`...` |
other arguments (not currently used) |
Set up link functions
link_setup( link, family, vars, sources = links_list, fam_list = list(family_vals) )
link_setup( link, family, vars, sources = links_list, fam_list = list(family_vals) )
link |
input given to |
family |
the list of families for |
vars |
a list of vectors of variable names with the same structure as |
sources |
list of links for parametric families |
fam_list |
list of data frames in the same format as |
causl_family
or causl_copula
obectObtain link from a causl_family
or causl_copula
obect
## S3 method for class 'causl_family' link(x, ...) ## S3 method for class 'causl_copula' link(x, ...)
## S3 method for class 'causl_family' link(x, ...) ## S3 method for class 'causl_copula' link(x, ...)
x |
an object of class |
link(causl_copula)
: method for causl_copula
object
List of link functions for copuulas
links_cop(family)
links_cop(family)
family |
either the name of the family or its integer representative |
This function returns the default link function for each possible copula. These are given in the table:
value | family | link | link name |
1 | gaussian | logit((1+rho)/2) | logit2 |
2 | t | logit((1+rho)/2) | logit2 |
3 | Clayton | log(1+theta) | log1p |
4 | Gumbel | log(theta - 1) | log1m |
5 | Frank | logit(1 + theta) | log1p |
6 | Joe | identity | identity |
11 | FGM | logit((1+rho)/2) | logit2 |
This is a named list whose entries are character vectors of allowed link functions.
links_list linksList
links_list linksList
An object of class list
of length 8.
linksList
is the old name for links_list
linksList
: Old name
Log-likelihood for frugal parameterization
ll_frugal(pars, dat, formulas, family, link, kwd = "cop")
ll_frugal(pars, dat, formulas, family, link, kwd = "cop")
pars |
parameter values |
dat |
|
formulas |
list of lists of formulas |
family |
families for variables and copula |
link |
list of link functions |
kwd |
string to use for copula |
Take collection of formulae and create one formula with all variables on the right-hand side of any of the originals.
merge_formulas(formulas)
merge_formulas(formulas)
formulas |
list of formulas to merge |
causl_model
objectChange one or more components of a causl_model
object.
modify.causl_model( x, over = FALSE, formulas, family, pars, link, dat, method, kwd )
modify.causl_model( x, over = FALSE, formulas, family, pars, link, dat, method, kwd )
x |
an object of class |
over |
logical: should components be added/modified or entirely over-written? This function can be used to modify |
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
kwd |
word used for copula formula and parameters |
Negative log-likelihood
nll2( theta, dat, mm, beta, phi, inCop, fam_cop = 1, family, link, cop_pars = NULL, use_cpp = TRUE, other_pars = list() )
nll2( theta, dat, mm, beta, phi, inCop, fam_cop = 1, family, link, cop_pars = NULL, use_cpp = TRUE, other_pars = list() )
theta |
concatenated vector of parameters ( |
dat |
matrix of data |
mm |
model matrix for use with |
beta |
(sparse) matrix of regression parameters for each variable and copula |
phi |
vector of dispersion parameters |
inCop |
vector of integers giving variables in |
fam_cop , family
|
integer and integer vector for copula and distribution families respectively |
link |
vector of link functions |
cop_pars |
other parameters for copula |
use_cpp |
logical: should Rcpp functions be used? |
other_pars |
other parameters to pass to |
The number of columns of beta
should be the number of columns
in dat
plus the number required to parameterize the copula. The first
few columns and the entries in phi
are assumed to be in the order of
those in dat
. If the th
family for a variable does not require a dispersion parameter then the value of
phi[i]
is ignored.
Sets up copula quantities only
pair_copula_setup(formulas, family, pars, LHSs, quans, ord)
pair_copula_setup(formulas, family, pars, LHSs, quans, ord)
formulas |
list of formulas for copula only |
family |
list of families for copula only |
pars |
list of copula parameters |
LHSs |
left-hand sides for all variables |
quans |
character vector of already existing variables to include |
ord |
topological ordering |
Get parameter masks for regression parameters
par_masks(formulas, family = rep(1, nv), full_form)
par_masks(formulas, family = rep(1, nv), full_form)
formulas |
formulas to create mask for |
family |
vector or list of families |
full_form |
(optionally) merged list of |
causl_model
Display output from causl_model
## S3 method for class 'causl_model' print(x, ...)
## S3 method for class 'causl_model' print(x, ...)
x |
an object of class |
... |
additional arguments (not used) |
Ultimately should also work for ordinal and categorical cases
process_discrete_dens(dat, family, LHSs)
process_discrete_dens(dat, family, LHSs)
dat |
data frame of observations |
family |
families for the Y and Z distributions, and the copula. Should
be the same length as |
LHSs |
left-hand sides from |
Process formulas, families and parameters
process_inputs(formulas, family, pars, link, dat, kwd, method = "inversion") process_formulas(formulas, len = 4) process_family(family, dims, func_return = get_family)
process_inputs(formulas, family, pars, link, dat, kwd, method = "inversion") process_formulas(formulas, len = 4) process_family(family, dims, func_return = get_family)
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
kwd |
keyword for copula |
method |
either |
len |
number of formulas |
dims |
number of variables in each class |
func_return |
function to use to process character arguments |
Function that processes and checks the validity of the main arguments used for simulating data.
For causl
we use the get_family()
function to process character based
arguments, but we allow for other functions to be used in packages that
build on this one.
process_formulas()
: Process input for family variables
process_family()
: Process input for family variables
Obtain quantiles for prespecified variables
process_prespecified(dat, prespec)
process_prespecified(dat, prespec)
dat |
data frame containing variables |
prespec |
character vector of prespecified variables in |
Currently takes the rank of each entry, and subtracts 1/2 and
normalizes by the number of entries. If there are ties they are
randomly sorted with a uniform random variable
in the symmetric interval around the rank of width
.
Get weights for rejection sampling
rejectionWeights(dat, mms, family, pars, qden, link)
rejectionWeights(dat, mms, family, pars, qden, link)
dat |
data frame of variables to change conditional distribution of |
mms |
list of model matrices |
family |
vector of distribution families |
pars |
parameters for new distributions |
qden |
functions for densities used to simulate variables |
link |
link functions for GLMs |
a numeric vector of weights
Rescale quantiles to conditional copula
rescale_cop(U, X, beta, family = 1, par2) rescaleCop(U, X, beta, family = 1, par2)
rescale_cop(U, X, beta, family = 1, par2) rescaleCop(U, X, beta, family = 1, par2)
U |
matrix of quantiles |
X |
model matrix of covariates |
beta |
list of parameters (see details) |
family |
variety of copula to use |
par2 |
additional parameter for some copulas |
The variable to be transformed must be in the final column of
U
, with variables being conditioned upon in the earlier columns.
family
can be 1 for Gaussian, 2 for t, 3 for Clayton, 4 for
Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas. Gamma distributed,
beta distributed or discrete respectively. pars
should be a list
with entries beta
and phi
, as well as possibly par2
if
family=2
.
U
should have the same length as X
has rows, and X
should have the same number of columns as the length of pars$beta
.
vector of rescaled quantiles
rescaleCop()
: Old name, now deprecated
Rescale quantiles to arbitrary random variable.
rescale_var(U, X, pars, family = 1, link) rescaleVar(U, X, pars, family = 1, link)
rescale_var(U, X, pars, family = 1, link) rescaleVar(U, X, pars, family = 1, link)
U |
vector of quantiles |
X |
model matrix of covariates |
pars |
list of parameters (see details) |
family |
family of distributions to use |
link |
link function |
family
can be 1, 2, 3, 4 or 5 for Gaussian, t-distributed,
Gamma distributed, beta distributed or discrete respectively, and 11 for
ordinal variables.
pars
should be a list with entries beta
and
phi
, as well as possibly par2
, trunc
and nlevel
if the family
is set to 2 or 5.
U
should have the same length as X
has rows, and
X
should have the same number of columns as the length of
pars$beta
.
vector of rescaled variables
rescaleVar()
: Old name, now deprecated
Obtain samples from a causal model parameterized as in Evans and Didelez (2024).
rfrugal(n, causl_model, control = list()) rfrugalParam( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), family = c(1, 1, 1, 1), pars, link = NULL, dat = NULL, method = "inversion", control = list(), ... )
rfrugal(n, causl_model, control = list()) rfrugalParam( n, formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)), family = c(1, 1, 1, 1), pars, link = NULL, dat = NULL, method = "inversion", control = list(), ... )
n |
number of samples required |
causl_model |
object of class |
control |
list of options for the algorithm |
formulas |
list of lists of formulas |
family |
families for variables and copula |
pars |
list of lists of parameters |
link |
list of link functions |
dat |
optional data frame of covariates |
method |
either |
... |
other arguments, such as custom families |
Samples from a given causal model under the frugal parameterization.
The entries for formula
and family
should each be a
list with four entries, corresponding to the ,
,
and
the copula.
formula
determines the model, so it is crucial that
every variable to be simulated is represented there exactly once. Each
entry of that list can either be a single formula, or a list of formulae.
Each corresponding entry in family
should be the same length as the
list in formula
or of length 1 (in which case it will be repeated
for all the variables therein).
We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.
The family variables for the copula are also numeric and taken from
VineCopula
. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton,
4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.
pars
should be a named list containing variable names that correspond
to the LHS of
formulae in formulas
. Each of these should themselves be a list
containing beta
(a vector of regression parameters) and (possibly)
phi
, a dispersion parameter. For any discrete variable that is a
treatment, you can also specify p
, an initial proportion to simulate
from (otherwise this defaults to 0.5).
Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit, probit, and log links are available.
A variety of sampling methods are implemented. The
inversion method with pair-copulas is the default (method="inversion"
),
but we cam also use a multivariate copula (method="inversion_mv"
) or even
rejection sampling (method="rejection"
).
The only control parameters are cop
: which gives a keyword for the
copula that defaults to "cop"
; quiet
which defaults to FALSE
but will
reduce output if set to TRUE
; and (if rejection sampling is selected)
careful
: this logical enables one to implement the full rejection
sampling method, which means we do get exact samples (note this method
is generally very slow, especially if we have an outlying value, so the
default is FALSE
).
A data frame containing the simulated data.
rfrugalParam()
: old function for simulation
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) rfrugalParam(100, pars = pars)
pars <- list(z=list(beta=0, phi=1), x=list(beta=c(0,0.5), phi=1), y=list(beta=c(0,0.5), phi=0.5), cop=list(beta=1)) rfrugalParam(100, pars = pars)
Sample from multivariate copulas
rGaussCop(n, Sigma) rtCop(n, Sigma, df) rfgmCopula(n, d = 2, alpha)
rGaussCop(n, Sigma) rtCop(n, Sigma, df) rfgmCopula(n, d = 2, alpha)
n |
sample size |
Sigma |
in which each slice is a correlation matrix |
df |
degrees of freedom |
d |
dimension of copula |
alpha |
(vector of) parameter values |
Quicker than rCopula.
Note that rfgmCopula
only works for .
A vector of the simulated random variables.
rGaussCop()
: Gaussian copula
rtCop()
: t-copula
rfgmCopula()
: FGM-copula
Simulate copula values
sim_copula(dat, family, par, par2, model_matrix) sim_CopVal(dat, family, par, par2, model_matrix)
sim_copula(dat, family, par, par2, model_matrix) sim_CopVal(dat, family, par, par2, model_matrix)
dat |
data frame with empty columns |
family |
numeric indicator of copula type |
par |
mandatory parameters |
par2 |
optional parameters |
model_matrix |
design matrix for covariates |
Returns data frame containing columns y
and z1, ..., zk
.
The family variables are numeric and taken from VineCopula
.
Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel,
5 for Frank, 6 for Joe and 11 for FGM copulas.
A data frame of the same dimension as dat
containing the
simulated values.
sim_CopVal()
: Old name, now deprecated
Simulate for single time-step
sim_inversion(out, proc_inputs) sim_multi(out, proc_inputs) sim_rejection(out, proc_inputs, careful)
sim_inversion(out, proc_inputs) sim_multi(out, proc_inputs) sim_rejection(out, proc_inputs, careful)
out |
data frame for output |
proc_inputs |
output of |
careful |
should full, slower method be used? |
sim_inversion
and sim_rejection
correspond to
performing the sampling by inversion or using rejection sampling.
sim_multi
first simulates from the copula then transforms to the
correct margins in the correct causal ordering
sim_multi()
: simulation with multivariate copula
sim_rejection()
: Rejection sampling code
Each entry formulas
, family
, pars
, link
is a list
with two entries, the first referring to the variable being simulated and the
second to the pair-copulas being used.
sim_variable(n, formulas, family, pars, link, dat, quantiles)
sim_variable(n, formulas, family, pars, link, dat, quantiles)
n |
sample size |
formulas |
list consisting of a formula for the output variables and a list of formulae for the pair-copula |
family |
list containing family variable |
pars |
list with two entries, first a list of parameters for response, and second a further list of parameters for pair-copula |
link |
list of same form as |
dat |
data frame of current variables |
quantiles |
data frame of quantiles |
The data frame dat
with an additional column given by the left-hand side of formula[[1]]
.
Simulate from vine copula
sim_vinecop(dat, family, par, par2 = NULL, model_matrix, link)
sim_vinecop(dat, family, par, par2 = NULL, model_matrix, link)
dat |
data frame to be filled in |
family |
family to simulate from |
par |
matrix of parameters |
par2 |
extra parameters for t-copula |
model_matrix |
design matrix for covariates |
link |
link functions for parameters (currently unused) |
A data frame of the same dimensions as dat
.
Simulate initial X values
sim_X(n, fam_x, theta, offset, sim = TRUE)
sim_X(n, fam_x, theta, offset, sim = TRUE)
n |
number of observations |
fam_x |
number for distribution family |
theta |
parameters for model |
offset |
optional mean correction |
sim |
should variables be simulated? |
Returns a list that includes a data frame containing a column
x
, as well as the density that was used to generate it. Possible
families are Gaussian (=1), t (=2), Exponential (=3), beta (=4)
Bernoulli/categorical (=5) and log-normal (=6).
For the exponential distribution, theta
is the mean.
Beta can take one or two parameters, and if there is only
one it is just repeated.
The offset
parameter alters the median for the normal and t-distributions,
or the median of the logarithm in the case of a log-normal.
A list with two entries: x
a vector of the simulated
values, and qden
, which contains a function that evaluates to the
density of the distribution used to generate those values.
Transform categorical or ordinal parameters into probabilities
theta_to_p_ord(theta) theta_to_p_cat(theta)
theta_to_p_ord(theta) theta_to_p_cat(theta)
theta |
provided log-linear parameters |
Returns the probabilities implied by given log-linear parameters.
theta_to_p_ord()
: for ordinal variables
Obtain variable ordering from formulas
var_order(formulas, dims, inc_cop = TRUE, method)
var_order(formulas, dims, inc_cop = TRUE, method)
formulas |
list of lists of formulas |
dims |
number of variables in each class |
inc_cop |
logical indicating whether to include copula in the ordering |
method |
either |