Package 'causl'

Title: Methods for Specifying, Simulating from and Fitting Causal Models
Description: Model multivariate distributions using causal parameters.
Authors: Robin Evans [aut, cre], Xi Lin [aut]
Maintainer: Robin Evans <[email protected]>
License: GPL-2
Version: 0.9.5
Built: 2024-09-25 09:20:10 UTC
Source: https://github.com/rje42/causl

Help Index


Adjust values of copula parameters individually

Description

Adjust values of copula parameters individually

Usage

adj_vars(
  cop_pars,
  strong = character(0),
  weak = character(0),
  factor = c(5, 0.2)
)

Arguments

cop_pars

list of copula parameters, as output by gen_cop_pars()

strong, weak

character vectors of variables to make strong or weak

factor

vector of two real values, to multiply coefficients by


Sample from a causal model

Description

Obtain samples from a causal model using the rejection sampling approach of Evans and Didelez (2024).

Usage

causalSamp(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  pars,
  family,
  link = NULL,
  dat = NULL,
  method = "rejection",
  control = list(),
  seed
)

Arguments

n

number of samples required

formulas

list of lists of formulas

pars

list of lists of parameters

family

families for Z,X,Y and copula

link

list of link functions

dat

data frame of covariates

method

only "rejection" is valid

control

list of options for the algorithm

seed

random seed used for replication

Details

Samples from a given causal model using rejection sampling (or, if everything is discrete, direct sampling).
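As a rough illustration of the accept/reject idea only, the following base R sketch draws from a proposal density, weights each draw by the ratio of target to proposal, and keeps it with probability proportional to that weight. This is a generic textbook version, not the algorithm used by causalSamp(); the target N(1, 1), the proposal N(0, 4) and all object names are arbitrary choices made for the illustration.

```r
set.seed(1)

# Proposal: N(0, sd = 2); target: N(1, sd = 1). Both are illustrative choices.
n_prop <- 5000
x <- rnorm(n_prop, mean = 0, sd = 2)

# Weight = target density / proposal density
w <- dnorm(x, mean = 1, sd = 1) / dnorm(x, mean = 0, sd = 2)

# Upper bound on the weights: the ratio is 2 * exp(-3x^2/8 + x - 1/2),
# which is maximized at x = 4/3, giving 2 * exp(1/6).
M <- 2 * exp(1 / 6)

# Accept each proposal with probability w / M
keep <- runif(n_prop) < w / M
samp <- x[keep]

mean(keep)   # acceptance rate, roughly 1 / M
mean(samp)   # should be close to the target mean of 1
```

Proposing more draws than are eventually needed is what makes an oversampling factor useful in this kind of scheme.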

The entries for formulas and family should each be a list with four entries, corresponding to the Z, X, Y and the copula. formulas determines the model, so it is crucial that every variable to be simulated is represented there exactly once. Each entry of that list can either be a single formula, or a list of formulae. Each corresponding entry in family should be the same length as the list in formulas or of length 1 (in which case it will be repeated for all the variables therein).

We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.

The family variables for the copula are also numeric and taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.
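The codes listed above can be kept in a pair of lookup vectors when writing simulation scripts. The sketch below is only a convenience for reading a family specification; describe_family() is a hypothetical helper, not part of the package, and it assumes exactly one variable in each of the Z, X and Y classes.

```r
# Codes for univariate families, as listed above
fam_codes <- c("0" = "binary", "1" = "normal", "2" = "t-distribution",
               "3" = "gamma", "4" = "beta", "5" = "binary",
               "6" = "log-normal")

# Codes for copula families (following VineCopula), as listed above
cop_codes <- c("1" = "Gaussian", "2" = "t", "3" = "Clayton", "4" = "Gumbel",
               "5" = "Frank", "6" = "Joe", "11" = "FGM")

# Hypothetical helper: describe a family vector of the form c(Z, X, Y, copula)
describe_family <- function(family) {
  stopifnot(length(family) == 4)
  c(z = fam_codes[[as.character(family[1])]],
    x = fam_codes[[as.character(family[2])]],
    y = fam_codes[[as.character(family[3])]],
    copula = cop_codes[[as.character(family[4])]])
}

describe_family(c(1, 5, 3, 2))
```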

pars should be a named list containing: either entries z, x, y and cop, or variable names that correspond to the LHS of formulae in formulas. Each of these should themselves be a list containing beta (a vector of regression parameters) and (possibly) phi, a dispersion parameter. For any discrete variable that is a treatment, you can also specify p, an initial proportion to simulate from (otherwise this defaults to 0.5).

Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit and probit links are available.
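To see what each link does to a linear predictor, the following sketch uses make.link() from the stats package. It only illustrates the mathematical maps; how the package stores and applies links internally may differ.

```r
eta <- c(-1, 0, 2)   # illustrative values of the linear predictor

lnk_identity <- make.link("identity")
lnk_log      <- make.link("log")
lnk_inverse  <- make.link("inverse")
lnk_logit    <- make.link("logit")
lnk_probit   <- make.link("probit")

lnk_identity$linkinv(eta)       # mean equals eta
lnk_log$linkinv(eta)            # exp(eta), always positive
lnk_inverse$linkinv(c(0.5, 2))  # 1 / eta
lnk_logit$linkinv(eta)          # probabilities in (0, 1)
lnk_probit$linkinv(eta)         # pnorm(eta)
```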

Control parameters are oversamp (default value 10), trace (default value 0; increasing to 1 increases the verbosity of output), max_oversamp (default value 1000), warn (which currently does nothing), and max_wt, which is set to 1 and increases each time the function is recalled. Control parameters also include cop, which gives a keyword for the copula that defaults to "cop".

This function is kept largely for the replication of simulations from Evans and Didelez (2024).

Value

A data frame containing the simulated data.

References

Evans, R.J. and Didelez, V. Parameterizing and simulating from causal models (with discussion). Journal of the Royal Statistical Society, Series B, 2024.

Examples

pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
causalSamp(100, pars = pars)

Copula family functions

Description

Copula family functions

Usage

get_copula(family_index, link = NULL)

gaussian_causl_cop(link)

t_causl_cop(link)

sim_cop(causl_copula, beta_matrix, other_pars, model_matrix)

Arguments

family_index

integer representing copula family

link

link function

causl_copula

family from which to simulate

beta_matrix

matrix of regression coefficients

other_pars

other parameters for some families

model_matrix

matrix of regressors

Details

get_copula returns the causl_copula that corresponds to the particular integer given. So far, 1 for Gaussian and 2 for t copulas are implemented.

The copula_fam functions return a list that contains, for each valid family:

  • name: the name of the family;

  • ddist: function to evaluate the density;

  • rdist: function to obtain samples;

  • pars: character vector of the parameter names;

  • default: list of default values;

  • link: the chosen link function.

Functions

  • get_copula(): getter copula family

  • gaussian_causl_cop(): Gaussian copula family

  • t_causl_cop(): t copula family

  • sim_cop(): simulate from copula family


Define a causl_model object

Description

This defines a causl_model object, that can be used either for simulation or inference.

Usage

causl_model(
  formulas,
  family,
  pars,
  link,
  dat = NULL,
  method = "inversion",
  kwd = "cop"
)

Arguments

formulas

list of lists of formulas

family

families for variables and copula

pars

list of lists of parameters

link

list of link functions

dat

optional data frame of covariates

method

either "inversion" (the default), "inversion_mv", or "rejection"

kwd

word used for copula formula and parameters

Details

The components formulas and family must both be specified, and have matching lengths. If pars is specified, then the model can be used for simulation and inference; if not, then only for inference. link is optional, and if not specified default links will be used.


Check parameters for univariate families

Description

Checks existence of beta vectors and then assesses appropriate length

Usage

check_rej(formulas, family, pars, dims, kwd)

check_pars(formulas, family, pars, dummy_dat, LHSs, kwd, dims)

Arguments

formulas

list of lists of formulas

family

families for variables and copula

pars

list of lists of parameters

dims

number of variables in each class

kwd

keyword for copula

dummy_dat

a dummy dataset, as generated by gen_dummy_dat()

LHSs

left-hand sides from formulas

Functions

  • check_rej(): Checks for rejection sampling


Density of a multivariate copula

Description

Density of a multivariate copula

Usage

dGaussCop(x, Sigma, log = FALSE, use_cpp = TRUE, N)

dtCop(x, Sigma, df, log = FALSE, use_cpp = TRUE)

dfgmCopula(x, alpha)

Arguments

x

samples on (0,1)

Sigma

collection of matrices

log

logical: return log-density?

use_cpp

logical: use the C routine?

N

optional integer for number of covariance matrices

df

degrees of freedom

alpha

parameter for copula

Details

Computes the density for data from a Gaussian or t-copula. Currently use_cpp only works for dGaussCop.
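For reference, the Gaussian copula density at a point u with correlation matrix Sigma is |Sigma|^(-1/2) exp(-z'(Sigma^(-1) - I)z / 2), where z = qnorm(u). The base R sketch below evaluates this for a single correlation matrix; gauss_cop_dens() is a hypothetical name, and unlike dGaussCop() it does not handle a collection of matrices or use compiled code.

```r
# Density of a Gaussian copula with one correlation matrix Sigma,
# evaluated at each row of u (entries strictly inside (0, 1)).
gauss_cop_dens <- function(u, Sigma, log = FALSE) {
  d <- ncol(Sigma)
  u <- matrix(u, ncol = d)
  z <- qnorm(u)                          # normal scores
  A <- solve(Sigma) - diag(d)            # Sigma^{-1} - I
  quad <- rowSums((z %*% A) * z)         # z' A z for each row
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  ld <- -0.5 * log_det - 0.5 * quad
  if (log) ld else exp(ld)
}

Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
gauss_cop_dens(c(0.5, 0.5), Sigma)       # equals 1 / sqrt(1 - 0.5^2)
```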

Value

numeric vector of densities

Functions

  • dGaussCop(): Gaussian copula

  • dtCop(): t-Copula density

  • dfgmCopula(): bivariate FGM copula


Vectorized conditional copula function

Description

Vectorized conditional copula function

Usage

cVCopula(U, copula, param, par2, inverse = FALSE)

Arguments

U

matrix of quantiles

copula

family of copula to use

param

vector of parameters

par2

Degrees of freedom for t-copula

inverse

should inverse CDF be returned?

Details

Should have nrow(U) = length(param).
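The requirement that nrow(U) equals length(param) reflects that each row is paired with its own parameter value. A minimal sketch of this for the bivariate Gaussian copula is given below; cond_gauss_cop() is a hypothetical function, and the choice to condition on the first column is an assumption made for the illustration rather than a statement about cVCopula().

```r
# Conditional CDF of U2 given U1 under a bivariate Gaussian copula,
# vectorized so that row i of U uses correlation rho[i].
cond_gauss_cop <- function(U, rho, inverse = FALSE) {
  stopifnot(nrow(U) == length(rho), all(abs(rho) < 1))
  z1 <- qnorm(U[, 1])
  z2 <- qnorm(U[, 2])
  s <- sqrt(1 - rho^2)
  if (inverse) {
    pnorm(rho * z1 + s * z2)     # conditional quantile at level U[, 2]
  } else {
    pnorm((z2 - rho * z1) / s)   # P(U2 <= u2 | U1 = u1)
  }
}

U <- cbind(c(0.2, 0.7), c(0.4, 0.9))
rho <- c(0.3, -0.5)
w <- cond_gauss_cop(U, rho)
# Applying the inverse recovers the second column of U
cond_gauss_cop(cbind(U[, 1], w), rho, inverse = TRUE)
```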


Density of a Mixed Copula

Description

Density of a Mixed Copula

Usage

dGaussDiscCop(x, m, Sigma, eta, log = FALSE, use_cpp = TRUE)

Arguments

x

matrix of samples on (0,1)

m

number of discrete variables

Sigma

collection of matrices

eta

eta matrix

log

logical: return log-density?

use_cpp

logical: use the C routine?

Value

numeric vector of densities


Extract parameter estimates and standard errors

Description

Extract parameter estimates and standard errors

Usage

ests_ses(fit, beta, merged_formula, kwd)

Arguments

fit

output of optim

beta

output of initializeParams2

merged_formula

formula with all variables on RHS

kwd

keyword for copula variable


Obtain list of family functions

Description

Obtain list of family functions from numeric or character representation

Usage

family_list(family, func_return = get_family)

Arguments

family

numeric or character vector of families

func_return

function to apply to list of families

Examples

family_list(c(1,3,5))
family_list(c("t","binomial"))

Numbers for parametric families

Description

Data frames containing

  • val: an integer

  • family: a vector giving the associated parametric family for that integer.

The integer val may be used in place of the name of the parametric family when specifying the family object.

Usage

family_vals

familyVals

copula_vals

Format

family_vals is a data.frame with 9 rows and 2 columns

familyVals is the same object as family_vals

copula_vals is a data.frame with 7 rows and 2 columns

Details

familyVals will be removed in version 1.0.0.

Functions

  • familyVals: Old name

  • copula_vals: Values for copula families


Fit multivariate copula regression model

Description

Fit multivariate copula regression model

Usage

fit_causl(
  dat,
  formulas = list(y ~ x, z ~ 1, ~x),
  family = rep(1, length(formulas)),
  link,
  cop_pars,
  use_cpp = TRUE,
  control = list(),
  other_pars = list()
)

fitCausal(
  dat,
  formulas = list(y ~ x, z ~ 1, ~x),
  family = rep(1, length(formulas)),
  link,
  par2,
  sandwich = TRUE,
  use_cpp = TRUE,
  control = list()
)

Arguments

dat

data frame of observations

formulas

list of model formulae, for Y, for the Z variables, and finally for the copula

family

families for the Y and Z distributions, and the copula. Should be the same length as formulas

link

link functions for each variable

cop_pars

additional parameters for copula if required

use_cpp

logical: should C++ routines be used?

control

list of parameters to be passed to optim

other_pars

list of other parameters to use (e.g. degrees of freedom for a t-distribution)

par2

former name for cop_pars argument

sandwich

logical: should sandwich standard errors be returned?

Details

formulas is a list of three or more formulae giving predictors of the y-margin, z-margin(s) and interaction parameters. Fit is by maximum likelihood.

control has the same arguments as the control argument in optim, as well as sandwich, a logical indicating if sandwich estimates of standard errors should be computed; newton, a logical which controls whether Newton iterates should be performed at the end; and cop, which can edit the restricted variable name for the left-hand side of formulae. Particularly useful to alter are trace (1 shows steps of optimization) and maxit for the number of steps.

The list other_pars should be named with the relevant variables, and each entry should be a named list containing the relevant parameters.

Warning: By default, none of the variables should be called cop, as this is reserved for the copula. The reserved word can be changed using the argument cop within control.

Value

Returns a list of class cop_fit.

Functions

  • fitCausal(): old name


Tools for manipulating formulas

Description

Tools for manipulating formulas

Usage

lhs(formulas, surv = FALSE)

lhs(formulas) <- value

rhs_vars(formulas)

tidy_formulas(formulas, kwd, prefix = "V")

Arguments

formulas

list of formulae

surv

logical indicating whether to treat as survival data

value

character vector to assign

kwd

string used to denote copula

prefix

string to begin each new variable name

Details

lhs returns a character vector containing left-hand sides of a list of formulae. If surv=TRUE then two responses are returned in the event of the left-hand side being a valid Surv object. lhs<- allows one to assign the left-hand sides of variables in the obvious way.

tidy_formulas ensures that all formulae in a list have a left hand side, by giving them names of the form Vn where n is some positive integer. The prefix V can be changed using the argument prefix.

rhs_vars extracts all the variables used on the right-hand sides of a list of formulas.
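The behaviour described above can be approximated in base R as follows. These are hypothetical stand-ins written for illustration (the names end in _sketch to make that clear); they ignore the surv option, and the numbering used for new left-hand sides is an assumption.

```r
# Left-hand side of each formula as a string ("" when one-sided)
lhs_sketch <- function(formulas) {
  vapply(formulas, function(f) {
    if (length(f) == 3) deparse(f[[2]]) else ""
  }, character(1))
}

# All variables appearing on any right-hand side
rhs_vars_sketch <- function(formulas) {
  unique(unlist(lapply(formulas, function(f) all.vars(f[[length(f)]]))))
}

# Give one-sided formulas a left-hand side V1, V2, ...
tidy_sketch <- function(formulas, prefix = "V") {
  idx <- which(lhs_sketch(formulas) == "")
  for (i in seq_along(idx)) {
    f <- formulas[[idx[i]]]
    txt <- paste0(prefix, i, " ", paste(deparse(f), collapse = ""))
    formulas[[idx[i]]] <- as.formula(txt, env = environment(f))
  }
  formulas
}

forms <- list(y ~ x + z, z ~ 1, ~ x)
lhs_sketch(forms)
rhs_vars_sketch(forms)
lhs_sketch(tidy_sketch(forms))
```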

Functions

  • lhs(): Obtain left-hand sides from list of formulas

  • lhs(formulas) <- value: Assign left-hand sides to list of formulas

  • rhs_vars(): Extract variables from right-hand sides

  • tidy_formulas(): Tidy up formulae


Function to generate random copula parameters for simulation

Description

Attempts to ensure that values after passing through the standard link function used for Gaussian copulas will lie in the specified range. For other copulas this will not target the correct range, but it can still be used by considering how the relevant link functions work for the Gaussian and other copula.

Usage

gen_cop_pars(formulas, data, range = c(-1, 1), ...)

Arguments

formulas

formulas as specified in rfrugalParam

data

dataset to obtain parameterization for

range

range of parameters to target

...

other parameters to be included in each copula

Value

A list suitable for the cop entry of the pars argument of rfrugalParam


Generate a dummy dataset

Description

Create a dummy dataset for the purpose of checking coefficient numbers

Usage

gen_dummy_dat(family, pars, dat, LHSs, dims)

Arguments

family

families for variables and copula

pars

list of lists of parameters

dat

optional data frame of covariates

LHSs

left-hand sides from formulas

dims

number of variables in each class


Return causl_fam function from integer index

Description

Return causl_fam function from integer index

Usage

get_family(val)

gaussian_causl_fam(link)

t_causl_fam(link)

Gamma_causl_fam(link)

binomial_causl_fam(link)

beta_causl_fam(link)

categorical_causl_fam(link)

ordinal_causl_fam(link)

Arguments

val

integer corresponding to distributional family

link

link function

Details

The functions gaussian_causl_fam() etc. represent the functions that are returned by get_family().

A new function of this form can be defined by the user, and it should return a list containing the following:

  • name: the name of the relevant family;

  • ddist: a function returning the density of the distributions;

  • qdist: a function returning the quantiles from probabilities;

  • rdist: a function to sample values from the distribution;

  • pdist: a cumulative distribution function;

  • pars: a list of the names of the parameters used;

  • default: a function that returns a list of the default values for an observation and each of the parameters;

  • link: the specified link function.

The function should also give the output the class "causl_family", so that it is interpreted appropriately. Note that ddist should have a log argument, to allow the log-likelihood to be evaluated.
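A user-defined family following the list of components above might look like the sketch below, which wraps the logistic distribution from the stats package. The function name logistic_causl_fam, the argument names (x, p, n, mu, phi) and the form of default are assumptions made for this illustration; check the built-in families such as gaussian_causl_fam() for the exact conventions expected by the package.

```r
# Hypothetical logistic family; argument names are assumptions
logistic_causl_fam <- function(link = "identity") {
  out <- list(
    name  = "logistic",
    # density, with a log argument so the log-likelihood can be evaluated
    ddist = function(x, mu, phi, log = FALSE)
      dlogis(x, location = mu, scale = phi, log = log),
    # quantile function
    qdist = function(p, mu, phi) qlogis(p, location = mu, scale = phi),
    # sampler
    rdist = function(n, mu, phi) rlogis(n, location = mu, scale = phi),
    # cumulative distribution function
    pdist = function(x, mu, phi) plogis(x, location = mu, scale = phi),
    # names of the parameters used
    pars  = c("mu", "phi"),
    # default values for an observation and each parameter
    default = function(...) list(x = 0, mu = 0, phi = 1),
    link  = link
  )
  class(out) <- "causl_family"
  out
}

fam <- logistic_causl_fam()
fam$ddist(0.3, mu = 0, phi = 1, log = TRUE)
```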

Functions

  • gaussian_causl_fam(): Gaussian distribution family

  • t_causl_fam(): Student's t distribution family

  • Gamma_causl_fam(): Gamma distribution family

  • binomial_causl_fam(): binomial distribution family

  • beta_causl_fam(): beta distribution family

  • categorical_causl_fam(): multinomial/categorical distribution family

  • ordinal_causl_fam(): ordinal categorical distribution family

See Also

family_vals


Get maximum weight for each segment of a distribution

Description

Get maximum weight for each segment of a distribution

Usage

get_max_weights(pars, forms_X, fam_X, qden, fam_Z, LHS_Z, ranges, link, ...)

Arguments

pars

list with all regression parameters

forms_X

formulae for treatments

fam_X, fam_Z

vector of families for treatments and covariates

qden

density of proposals

LHS_Z

variables in covariates

ranges

range of segments

link

link functions for treatments

...

not currently used


Get density of treatments

Description

Get density of treatments

Usage

get_X_density(dat, eta, phi, qden, family, link, par2, log = FALSE)

Arguments

dat

data frame of variables to change conditional distribution of

eta

list (or matrix) of linear forms

phi

vector of dispersion coefficients

qden

functions for densities used to simulate variables

family

vector of distribution families

link

link functions for GLMs

par2

vector of degrees of freedom

log

logical: should log-density be returned?

Value

a numeric vector of weights


Get univariate densities and uniform order statistics

Description

Get univariate densities and uniform order statistics

Usage

glm_dens(x, eta, phi, other_pars, family = 1, link)

univarDens(x, eta, phi, other_pars, family = 1, link)

Arguments

x

vector of observations

eta, phi

linear component and dispersion parameters

other_pars

other parameters for certain families

family

numeric indicator of family

link

link function

Details

family follows the usual numeric pattern: 1=normal, 2=t-distribution and 3=Gamma with a log-link.

Value

A list with entries being the numeric vectors u (the quantiles of the input values) and ld (the log density of each observation).

Functions

  • univarDens(): old name


Simulate from a GLM

Description

Simulate values from some generalized linear models

Usage

glm_sim(family, eta, phi, other_pars, link, quantiles = TRUE)

Arguments

family

vector of distribution families

eta

list (or matrix) of linear forms

phi

vector of dispersion coefficients

other_pars

list of other parameters for specified family

link

link functions for GLMs

quantiles

logical indicating whether to return quantiles

Value

a numeric vector of weights


Check if family is categorical

Description

Check if family is categorical

Usage

is_categorical(x)

Arguments

x

a family, either numerical, a name, or a causl_family object

Details

Returns a logical indicating if the input object represents a categorical or ordinal variable. If it cannot represent a family then NA is returned.


Obtain link from a causl_family or causl_copula object

Description

Obtain link from a causl_family or causl_copula object

Usage

## S3 method for class 'causl_family'
link(x, ...)

## S3 method for class 'causl_copula'
link(x, ...)

Arguments

x

an object of class causl_family or causl_copula

Functions

  • link(causl_copula): method for causl_copula object


Log-likelihood for frugal parameterization

Description

Log-likelihood for frugal parameterization

Usage

ll_frugal(pars, dat, formulas, family, link, kwd = "cop")

Arguments

pars

parameter values

dat

data.frame containing data

formulas

list of lists of formulas

family

families for variables and copula

link

list of link functions

kwd

string to use for copula


Combine multiple formulas

Description

Take a collection of formulae and create one formula with all variables on the right-hand side of any of the originals.

Usage

merge_formulas(formulas)

Arguments

formulas

list of formulas to merge
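The idea of merging can be sketched in base R by collecting the term labels from every right-hand side and rebuilding a single one-sided formula. merge_formulas_sketch() is a hypothetical stand-in for illustration; the value returned by merge_formulas() itself may carry more information.

```r
merge_formulas_sketch <- function(formulas) {
  # term labels from each right-hand side (keeps interactions such as x:z)
  labs <- unique(unlist(lapply(formulas, function(f) {
    attr(terms(f), "term.labels")
  })))
  if (length(labs) == 0) return(~ 1)   # intercept-only case
  reformulate(labs)
}

forms <- list(y ~ x + z, w ~ z + v, ~ 1)
merged <- merge_formulas_sketch(forms)
merged
```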


Modify causl_model object

Description

Change one or more components of a causl_model object.

Usage

modify.causl_model(
  x,
  over = FALSE,
  formulas,
  family,
  pars,
  link,
  dat,
  method,
  kwd
)

Arguments

x

an object of class causl_model

over

logical: should components be added/modified or entirely over-written?

This function can be used to modify

formulas

list of lists of formulas

family

families for variables and copula

pars

list of lists of parameters

link

list of link functions

dat

optional data frame of covariates

method

either "inversion" (the default), "inversion_mv", or "rejection"

kwd

word used for copula formula and parameters


Negative log-likelihood

Description

Negative log-likelihood

Usage

nll2(
  theta,
  dat,
  mm,
  beta,
  phi,
  inCop,
  fam_cop = 1,
  family,
  link,
  cop_pars = NULL,
  use_cpp = TRUE,
  other_pars = list()
)

Arguments

theta

concatenated vector of parameters (beta followed by phi)

dat

matrix of data

mm

model matrix for use with beta

beta

(sparse) matrix of regression parameters for each variable and copula

phi

vector of dispersion parameters

inCop

vector of integers giving variables in dat to be included in copula

fam_cop, family

integer and integer vector for copula and distribution families respectively

link

vector of link functions

cop_pars

other parameters for copula

use_cpp

logical: should Rcpp functions be used?

other_pars

other parameters to pass to glm_dens

Details

The number of columns of beta should be the number of columns in dat plus the number required to parameterize the copula. The first few columns and the entries in phi are assumed to be in the order of those in dat. If the ith family for a variable does not require a dispersion parameter then the value of phi[i] is ignored.


Sets up copula quantities only

Description

Sets up copula quantities only

Usage

pair_copula_setup(formulas, family, pars, LHSs, quans, ord)

Arguments

formulas

list of formulas for copula only

family

list of families for copula only

pars

list of copula parameters

LHSs

left-hand sides for all variables

quans

character vector of already existing variables to include

ord

topological ordering


Get parameter masks for regression parameters

Description

Get parameter masks for regression parameters

Usage

par_masks(formulas, family = rep(1, nv), full_form)

Arguments

formulas

formulas to create mask for

family

vector or list of families

full_form

(optionally) merged list of formulas


Display output from causl_model

Description

Display output from causl_model

Usage

## S3 method for class 'causl_model'
print(x, ...)

Arguments

x

an object of class causl_model

...

additional arguments (not used)


Obtain univariate densities

Description

Ultimately should also work for ordinal and categorical cases

Usage

process_discrete_dens(dat, family, LHSs)

Arguments

dat

data frame of observations

family

families for the Y and Z distributions, and the copula. Should be the same length as formulas

LHSs

left-hand sides from formulas


Process formulas, families and parameters

Description

Process formulas, families and parameters

Usage

process_inputs(formulas, family, pars, link, dat, kwd, method = "inversion")

process_formulas(formulas, len = 4)

process_family(family, dims, func_return = get_family)

Arguments

formulas

list of lists of formulas

family

families for variables and copula

pars

list of lists of parameters

link

list of link functions

dat

optional data frame of covariates

kwd

keyword for copula

method

either "inversion" (the default), "inversion_mv", or "rejection"

len

number of formulas

dims

number of variables in each class

func_return

function to use to process character arguments

Details

Function that processes and checks the validity of the main arguments used for simulating data.

For causl we use the get_family() function to process character based arguments, but we allow for other functions to be used in packages that build on this one.

Functions

  • process_formulas(): Process input for formulas

  • process_family(): Process input for family variables


Obtain quantiles for prespecified variables

Description

Obtain quantiles for prespecified variables

Usage

process_prespecified(dat, prespec)

Arguments

dat

data frame containing variables

prespec

character vector of prespecified variables in dat

Details

Currently takes the rank of each entry, and subtracts 1/2. If there are k ties they are randomly sorted with a uniform random variable in the symmetric interval around the rank of width k/n.
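One possible reading of this description is sketched below in base R. The division by n (to place the values in the unit interval), the use of the average rank for tied values and the function name prespec_quantiles are all assumptions made for the illustration, so the output need not match process_prespecified() exactly.

```r
prespec_quantiles <- function(x) {
  n <- length(x)
  r <- rank(x, ties.method = "average")   # tied values share their mean rank
  k <- ave(x, x, FUN = length)            # size of each value's tie group
  u <- (r - 0.5) / n
  tied <- k > 1
  # spread tied values uniformly over an interval of width k / n
  u[tied] <- u[tied] + runif(sum(tied), -k[tied] / (2 * n), k[tied] / (2 * n))
  u
}

prespec_quantiles(c(3, 1, 2))      # no ties: (rank - 1/2) / n
set.seed(2)
prespec_quantiles(c(1, 1, 2, 2))   # ties are jittered within their interval
```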


Get weights for rejection sampling

Description

Get weights for rejection sampling

Usage

rejectionWeights(dat, mms, family, pars, qden, link)

Arguments

dat

data frame of variables to change conditional distribution of

mms

list of model matrices

family

vector of distribution families

pars

parameters for new distributions

qden

functions for densities used to simulate variables

link

link functions for GLMs

Value

a numeric vector of weights


Rescale quantiles to conditional copula

Description

Rescale quantiles to conditional copula

Usage

rescale_cop(U, X, beta, family = 1, par2)

rescaleCop(U, X, beta, family = 1, par2)

Arguments

U

matrix of quantiles

X

model matrix of covariates

beta

list of parameters (see details)

family

variety of copula to use

par2

additional parameter for some copulas

Details

The variable to be transformed must be in the final column of U, with variables being conditioned upon in the earlier columns.

family can be 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas. pars should be a list with entries beta and phi, as well as possibly par2 if family=2. U should have the same length as X has rows, and X should have the same number of columns as the length of pars$beta.

Value

vector of rescaled quantiles

Functions

  • rescaleCop(): Old name, now deprecated


Rescale quantiles to arbitrary random variable.

Description

Rescale quantiles to arbitrary random variable.

Usage

rescale_var(U, X, pars, family = 1, link)

rescaleVar(U, X, pars, family = 1, link)

Arguments

U

vector of quantiles

X

model matrix of covariates

pars

list of parameters (see details)

family

family of distributions to use

link

link function

Details

family can be 1, 2, 3, 4 or 5 for Gaussian, t-distributed, Gamma distributed, beta distributed or discrete respectively, and 11 for ordinal variables. pars should be a list with entries beta and phi, as well as possibly par2, trunc and nlevel if the family is set to 2 or 5. U should have the same length as X has rows, and X should have the same number of columns as the length of pars$beta.
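For the Gaussian case with an identity link, the rescaling amounts to applying the normal quantile function with a mean given by the linear predictor. The sketch below shows this under the assumption that phi is the variance; rescale_gauss_sketch() is a hypothetical name and covers only this one family.

```r
rescale_gauss_sketch <- function(U, X, pars) {
  # dimension checks described above
  stopifnot(length(U) == nrow(X), ncol(X) == length(pars$beta))
  eta <- drop(X %*% pars$beta)                # linear predictor (identity link)
  qnorm(U, mean = eta, sd = sqrt(pars$phi))   # assumes phi is a variance
}

X <- cbind(1, c(0, 1, 2))                     # intercept and one covariate
pars <- list(beta = c(1, 0.5), phi = 4)
rescale_gauss_sketch(c(0.5, 0.5, 0.5), X, pars)   # medians equal the means
```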

Value

vector of rescaled variables

Functions

  • rescaleVar(): Old name, now deprecated


Sample from a causal model

Description

Obtain samples from a causal model parameterized as in Evans and Didelez (2024).

Usage

rfrugal(n, causl_model, control = list())

rfrugalParam(
  n,
  formulas = list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~1)),
  family = c(1, 1, 1, 1),
  pars,
  link = NULL,
  dat = NULL,
  method = "inversion",
  control = list(),
  ...
)

Arguments

n

number of samples required

causl_model

object of class causl_model

control

list of options for the algorithm

formulas

list of lists of formulas

family

families for variables and copula

pars

list of lists of parameters

link

list of link functions

dat

optional data frame of covariates

method

either "inversion" (the default), "inversion_mv", or "rejection"

...

other arguments, such as custom families

Details

Samples from a given causal model under the frugal parameterization.

The entries for formulas and family should each be a list with four entries, corresponding to the Z, X, Y and the copula. formulas determines the model, so it is crucial that every variable to be simulated is represented there exactly once. Each entry of that list can either be a single formula, or a list of formulae. Each corresponding entry in family should be the same length as the list in formulas or of length 1 (in which case it will be repeated for all the variables therein).

We use the following codes for different families of distributions: 0 or 5 = binary; 1 = normal; 2 = t-distribution; 3 = gamma; 4 = beta; 6 = log-normal.

The family variables for the copula are also numeric and taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.

pars should be a named list containing variable names that correspond to the LHS of formulae in formulas. Each of these should themselves be a list containing beta (a vector of regression parameters) and (possibly) phi, a dispersion parameter. For any discrete variable that is a treatment, you can also specify p, an initial proportion to simulate from (otherwise this defaults to 0.5).

Link functions for the Gaussian, t and Gamma distributions can be the identity, inverse or log functions. Gaussian and t-distributions default to the identity, and Gamma to the log link. For the Bernoulli the logit, probit, and log links are available.

A variety of sampling methods are implemented. The inversion method with pair-copulas is the default (method="inversion"), but we can also use a multivariate copula (method="inversion_mv") or even rejection sampling (method="rejection").

The only control parameters are: cop, which gives a keyword for the copula that defaults to "cop"; quiet, which defaults to FALSE but will reduce output if set to TRUE; and (if rejection sampling is selected) careful, a logical that enables the full rejection sampling method, which yields exact samples (note this method is generally very slow, especially if we have an outlying value, so the default is FALSE).
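A call through the causl_model interface might look like the sketch below, which reuses the default formulas and families shown in the usage of rfrugalParam(). It is an untested illustration assembled from the signatures on this page, assuming that causl_model() accepts the same inputs as rfrugalParam(); the calls are guarded so that nothing is run unless the package is installed.

```r
forms <- list(list(z ~ 1), list(x ~ z), list(y ~ x), list(~ 1))
fams  <- c(1, 1, 1, 1)   # Gaussian margins and a Gaussian copula
pars  <- list(z = list(beta = 0, phi = 1),
              x = list(beta = c(0, 0.5), phi = 1),
              y = list(beta = c(0, 0.5), phi = 0.5),
              cop = list(beta = 1))

if (requireNamespace("causl", quietly = TRUE)) {
  # build the model object once, then sample from it
  cm  <- causl::causl_model(formulas = forms, family = fams, pars = pars)
  dat <- causl::rfrugal(100, cm)
  head(dat)
}
```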

Value

A data frame containing the simulated data.

Functions

  • rfrugalParam(): old function for simulation

Examples

pars <- list(z=list(beta=0, phi=1),
             x=list(beta=c(0,0.5), phi=1),
             y=list(beta=c(0,0.5), phi=0.5),
             cop=list(beta=1))
rfrugalParam(100, pars = pars)

Sample from multivariate copulas

Description

Sample from multivariate copulas

Usage

rGaussCop(n, Sigma)

rtCop(n, Sigma, df)

rfgmCopula(n, d = 2, alpha)

Arguments

n

sample size

Sigma

array in which each slice is a correlation matrix

df

degrees of freedom

d

dimension of copula

alpha

(vector of) parameter values

Details

Quicker than rCopula.

Note that rfgmCopula only works for d = 2.
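Sampling from a Gaussian copula with a single correlation matrix can be sketched in base R by drawing correlated normal variables and applying the normal distribution function to each margin. r_gauss_cop_sketch() is a hypothetical name; unlike rGaussCop() it takes one matrix rather than a collection, and it returns an n by d matrix.

```r
r_gauss_cop_sketch <- function(n, Sigma) {
  d <- ncol(Sigma)
  # rows are N(0, Sigma): chol() returns R with t(R) %*% R == Sigma
  z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)
  pnorm(z)   # uniform margins, since Sigma has unit diagonal
}

set.seed(4)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
u <- r_gauss_cop_sketch(2000, Sigma)
cor(qnorm(u))[1, 2]   # should be close to 0.8
```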

Value

A vector of the simulated random variables.

Functions

  • rGaussCop(): Gaussian copula

  • rtCop(): t-copula

  • rfgmCopula(): FGM-copula


Simulate copula values

Description

Simulate copula values

Usage

sim_copula(dat, family, par, par2, model_matrix)

sim_CopVal(dat, family, par, par2, model_matrix)

Arguments

dat

data frame with empty columns

family

numeric indicator of copula type

par

mandatory parameters

par2

optional parameters

model_matrix

design matrix for covariates

Details

Returns data frame containing columns y and z1, ..., zk.

The family variables are numeric and taken from VineCopula. Use, for example, 1 for Gaussian, 2 for t, 3 for Clayton, 4 for Gumbel, 5 for Frank, 6 for Joe and 11 for FGM copulas.

Value

A data frame of the same dimension as dat containing the simulated values.

Functions

  • sim_CopVal(): Old name, now deprecated


Simulate for single time-step

Description

Simulate for single time-step

Usage

sim_inversion(out, proc_inputs)

sim_multi(out, proc_inputs)

sim_rejection(out, proc_inputs, careful)

Arguments

out

data frame for output

proc_inputs

output of process_inputs()

careful

should full, slower method be used?

Details

sim_inversion and sim_rejection correspond to performing the sampling by inversion or using rejection sampling.

sim_multi first simulates from the copula, then transforms to the correct margins in the correct causal ordering.

Functions

  • sim_multi(): simulation with multivariate copula

  • sim_rejection(): Rejection sampling code


Simulate a single variable using the inversion method

Description

Each entry formulas, family, pars, link is a list with two entries, the first referring to the variable being simulated and the second to the pair-copulas being used.

Usage

sim_variable(n, formulas, family, pars, link, dat, quantiles)

Arguments

n

sample size

formulas

list consisting of a formula for the output variables and a list of formulae for the pair-copula

family

list containing family variable

pars

list with two entries, first a list of parameters for response, and second a further list of parameters for pair-copula

link

list of same form as family

dat

data frame of current variables

quantiles

data frame of quantiles

Value

The data frame dat with an additional column given by the left-hand side of formulas[[1]].


Simulate from vine copula

Description

Simulate from vine copula

Usage

sim_vinecop(dat, family, par, par2 = NULL, model_matrix, link)

Arguments

dat

data frame to be filled in

family

family to simulate from

par

matrix of parameters

par2

extra parameters for t-copula

model_matrix

design matrix for covariates

link

link functions for parameters (currently unused)

Value

A data frame of the same dimensions as dat.


Simulate initial X values

Description

Simulate initial X values

Usage

sim_X(n, fam_x, theta, offset, sim = TRUE)

Arguments

n

number of observations

fam_x

number for distribution family

theta

parameters for model

offset

optional mean correction

sim

should variables be simulated?

Details

Returns a list that includes a data frame containing a column x, as well as the density that was used to generate it. Possible families are Gaussian (=1), t (=2), Exponential (=3), beta (=4), Bernoulli/categorical (=5) and log-normal (=6).

For the exponential distribution, theta is the mean. Beta can take one or two parameters, and if there is only one it is just repeated.

The offset parameter alters the median for the normal and t-distributions, or the median of the logarithm in the case of a log-normal.

Value

A list with two entries: x a vector of the simulated values, and qden, which contains a function that evaluates to the density of the distribution used to generate those values.


Transform categorical or ordinal parameters into probabilities

Description

Transform categorical or ordinal parameters into probabilities

Usage

theta_to_p_ord(theta)

theta_to_p_cat(theta)

Arguments

theta

provided log-linear parameters

Details

Returns the probabilities implied by given log-linear parameters.
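For a categorical variable, one common log-linear convention treats the first level as a reference category with log-weight zero, so that the probabilities are a softmax of c(0, theta). The sketch below uses that convention purely as an assumption; theta_to_p_sketch() is a hypothetical name, and the parameterization used by theta_to_p_cat() and theta_to_p_ord() is not stated on this page and may differ.

```r
theta_to_p_sketch <- function(theta) {
  e <- exp(c(0, theta))   # first category is the reference, with log-weight 0
  e / sum(e)
}

theta_to_p_sketch(c(0, 0))        # three equally likely categories
theta_to_p_sketch(c(log(2), 0))   # second category twice as likely as the others
```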

Functions

  • theta_to_p_ord(): for ordinal variables


Obtain variable ordering from formulas

Description

Obtain variable ordering from formulas

Usage

var_order(formulas, dims, inc_cop = TRUE, method)

Arguments

formulas

list of lists of formulas

dims

number of variables in each class

inc_cop

logical indicating whether to include copula in the ordering

method

either "inversion" (the default), "inversion_mv", or "rejection"
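The notion of an ordering implied by the formulas can be illustrated with a small topological sort in base R: a variable may be placed once every variable on its right-hand side that is itself modelled has been placed. topo_order_sketch() is a hypothetical function written for this illustration; it ignores the copula formula as well as the dims and method arguments.

```r
topo_order_sketch <- function(formulas) {
  forms <- do.call(c, formulas)                        # flatten list of lists
  forms <- Filter(function(f) length(f) == 3, forms)   # keep two-sided formulas
  vars <- vapply(forms, function(f) deparse(f[[2]]), character(1))
  # parents of each variable among the modelled variables
  parents <- lapply(forms, function(f) intersect(all.vars(f[[3]]), vars))
  ord <- character(0)
  while (length(ord) < length(vars)) {
    # variables whose parents have all been placed already
    ok <- vapply(parents, function(p) all(p %in% ord), logical(1))
    ready <- setdiff(vars[ok], ord)
    if (length(ready) == 0) stop("formulas contain a cycle")
    ord <- c(ord, ready)
  }
  ord
}

topo_order_sketch(list(list(y ~ x), list(x ~ z), list(z ~ 1), list(~ 1)))
```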