Copula Simulation

library(causl)

Simulation

The main function for simulating data is causalSamp(). Suppose we wish to have with μx = 0.5z, μy = 0.3x, and σz2 = σx2 = σy2 = 1. Then we set:

pars <- list(z = list(beta = 0, phi=1),
             x = list(beta = c(0,0.5), phi=1),
             y = list(beta = c(0,0.3), phi=1),
             cop = list(beta = matrix(c(0.5,0.25), ncol=1)))

Note that the copula parameters have been selected for a linear predictor of the form η = 0.5 + 0.25x, and the default link function for the Gaussian copula is g(η) = logit ((η + 1)/2).

Then we simulate as follows:

set.seed(124)  # for consistency
dat <- causalSamp(1e3, formulas=list(z~1, x~z, y~x, ~x), family=rep(1,4), pars=pars)
pairs(dat)

Note that the distribution is as we expect. Regressing on should give a coefficient close to 0.5.

lmXZ <- lm(x ~ z, data=dat)
summary(lmXZ)$coef
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.00939     0.0317  -0.296 7.67e-01
#> z            0.43290     0.0325  13.312 2.44e-37
# lmYX <- lm(y ~ x, data=dat, weights = 1/dnorm(predict(lmXZ)))
# summary(lmYX)

Note that it is not necessary to use the standard variable names z, x and y, but if you choose not to then you must specify the names you want to use with the formulas argument.

pars <- list(a = list(beta = 0, phi=1),
             b = list(beta = c(0,0.5), phi=1),
             c = list(beta = c(0,0.3), phi=1),
             cop = list(beta = matrix(c(0.5,0.25), ncol=1)))
set.seed(124)
dat2 <- causalSamp(1e3, formulas=list(a ~ 1, b ~ a, c ~ b,  ~ b), pars=pars)
#> No families provided, so assuming all variables are Gaussian
pairs(dat2)

causalSamp() has various default settings. The copula CYZ, and the distribution families of X, Y and Z all default to Gaussian. It over-samples by a factor of 10 so that some may be rejected later, though this is increased later on if necessary up to the control parameter max_oversamp (with default value 1000).

Fitting Models

We can fit a model using maximum likelihood.

out <- fit_causl(dat, formulas = list(z~1, y~x, ~x))
out
#> log-likelihood:  -2831 
#> z ~ 1
#>                 est.   s.e. sandwich
#> (intercept) -0.00312 0.0309   0.0308
#>   residual s.e.:  0.957 0.0429 0.0445 
#> 
#> y ~ x
#>                 est.   s.e. sandwich
#> (intercept) -0.00959 0.0351   0.0360
#> x            0.29923 0.0312   0.0284
#>   residual s.e.:  1.11 0.0496 0.049 
#> 
#> copula parameters:
#> cop ~ x
#> cop_z_y:
#>              est.  s.e. sandwich
#> (intercept) 0.426 0.068   0.0651
#> x           0.330 0.056   0.0511

We see that the x coefficient in the regression parameter is correct (even surprisingly so!).

Note that if we allow z to depend upon x we obtain a biased estimate.

out2 <- fit_causl(dat, form = list(y~x, z~x, ~x))
out2
#> log-likelihood:  -2750 
#> y ~ x
#>               est.   s.e. sandwich
#> (intercept) 0.0613 0.0334   0.0334
#> x           0.3788 0.0293   0.0270
#>   residual s.e.:  1.11 0.0495 0.0489 
#> 
#> z ~ x
#>                est.   s.e. sandwich
#> (intercept) 0.00152 0.0285   0.0285
#> x           0.33378 0.0251   0.0250
#>   residual s.e.:  0.812 0.0363 0.0357 
#> 
#> copula parameters:
#> cop ~ x
#> cop_y_z:
#>              est.   s.e. sandwich
#> (intercept) 0.392 0.0623   0.0597
#> x           0.364 0.0604   0.0571

Other families

We can also perform maximum likelihood estimation with other parametric families. Suppose that Z is t-distributed and Y Gamma distributed. We can simulate with

forms <- list(Z ~ 1, X ~ Z, Y ~ X, ~ 1)
fams <- c(2, 5, 3, 1)
pars <- list(Z = list(beta = 0, phi=1, par2=4),
             X = list(beta = c(0,0.5)),
             Y = list(beta = c(1,-0.3), phi=2),
             cop = list(beta = 1))
set.seed(124)
dat <- rfrugalParam(1e3, formulas=forms, family=fams, pars=pars)
#> Inversion method selected: using pair-copula parameterization

Then we can perform maximum likelihood estimation on this data:

(out <- fit_causl(dat, formulas = forms[-2], family = fams[-2], other_pars=list(Z=list(par2=4L))))
#> log-likelihood:  -3219 
#> Z ~ 1
#>                est.  s.e. sandwich
#> (intercept) -0.0023 0.039   0.0384
#>   residual s.e.:  1.11 0.0662 0.0685 
#> 
#> Y ~ X
#>               est.   s.e. sandwich
#> (intercept)  1.053 0.0648   0.0617
#> X           -0.378 0.0823   0.0798
#>   residual s.e.:  2.05 0.0747 0.0722 
#> 
#> copula parameters:
#> cop ~ 1
#>             est.   s.e. sandwich
#> (intercept) 1.09 0.0634   0.0603