We can also fit models in which some of the covariates and/or outcomes are binary. For example, suppose we assume that and X ∣ Z1 = z1, Z2 = z2 ∼ Bernoulli(q), and we assume a Gaussian copula over Z1, Z2, Y with correlation R, where ρZ1Z2 = 0.5, ρZ1Y = 0.3 and ρZ2Y = 0.4.
Then we can set up this model in the usual way:
forms <- list(list(Z1 ~ 1, Z2 ~ 1), X ~ Z1*Z2, Y ~ X, ~ 1)
fams <- list(c(1,5), 5, 1, 1)
pars <- list(Z1 = list(beta=0, phi=1),
Z2 = list(beta=0),
X = list(beta=c(-0.3,0.1,0.2,0)),
Y = list(beta=c(-0.25, 0.5), phi=0.5),
cop = list(beta=matrix(c(0.5,0.3,0.4), nrow=1)))
cm <- causl_model(formulas = forms, family = fams, pars = pars)
Now call rfrugal
as usual.
## Inversion method selected: using pair-copula parameterization
Then we can check that the distributions follow the specification that we asked for.
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: dat$Z1
## D = 0.01, p-value = 0.2
## alternative hypothesis: two-sided
##
## Exact binomial test
##
## data: sum(dat$Z2) and n
## number of successes = 5099, number of trials = 10000, p-value = 0.05
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.50 0.52
## sample estimates:
## probability of success
## 0.51
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3056 0.0289 -10.564 4.36e-26
## Z1 0.0757 0.0293 2.586 9.70e-03
## Z2 0.1623 0.0403 4.024 5.73e-05
## Z1:Z2 0.0323 0.0409 0.788 4.31e-01
These are all consistent with the values we provided. We can finally check the outcome model.
ps <- predict(glmX, type="response")
wt <- dat$X/ps + (1-dat$X)/(1-ps)
glmY <- svyglm(Y ~ X, design = svydesign(~1, weights=wt, data=dat))
summary(glmY)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.244 0.00966 -25.3 1.31e-136
## X 0.493 0.01430 34.5 2.85e-246
We can also extend this to variables that are categorical or ordered
categorical. These options correspond to the family indicators 11 and 10
respectively, or they can be accessed by passing the strings
"categorical"
and "ordinal"
to the
family
argument. Both types of variable are stored as a
factor, with labels for the objects 1
, 2
, up
to ℓ, where ℓ is the number of levels.
These methods respectively use a contrast with the baseline level, or
a contrast with the category immediately below the current one in the
ordering. For example, if we have a three variable categorical variable
using the formula Z ~ A
then the parameters are of the
form: For a categorical variable the contrasts become log {P(Z = i)/P(Z = i − 1)}
for the same range of i.
The regression parameters can be passed either as a vector or a
matrix, but in either case they must be ordered so that all the
coefficients for one contrast precede all those for another. For
example, if we want we could put
Z = list(beta = c(0.5,0.1,-0.5,0.4), nlevel = 3)
in the
pars
argument to require that for an unordered categorical
variable, and the same for an ordinal variable but replacing the second
quantity with log {P(Z = 3)/P(Z = 2)}.
Equivalently, we could put
Z = list(beta = matrix(c(0.5,0.1,-0.5,0.4), nrow=2), nlevel = 3)
,
and the same model would be fitted.
forms <- list(list(Z0 ~ A0, Z1 ~ A0),
list(A0 ~ 1, A1 ~ A0*Z0),
Y ~ A0*A1,
~ A0)
fams <- list(c("categorical","categorical"),c(5,5),1,1)
pars <- list(A0 = list(beta = 0),
Z0 = list(beta = c(0.3,-0.2,0.4,0.1), nlevel=3),
Z1 = list(beta = c(0.3,-0.2,0.4,0.1), nlevel=3),
A1 = list(beta = 2*c(-0.3,0.4,0.3,0.5,0.3,0.5)),
Y = list(beta = c(-0.5,0.2,0.3,0), phi=1),
cop = list(beta=c(2,0.5)))
cm <- causl_model(formulas=forms, pars=pars, family=fams)
In this case we set both covariate variables to be categorical.
## Inversion method selected: using pair-copula parameterization
Now we can check that the parameters being simulated match those input.
## (Intercept) A0
## 0.307 -0.185
## (Intercept) A0
## 0.4175 0.0963
## (Intercept) A0
## 0.306 -0.197
## (Intercept) A0
## 0.378 0.132
Finally, we can check that the outcome model works as it should.
ps <- predict(glm(A1 ~ A0*Z0, family=binomial, data=dat), type="response")
wt <- dat$A1/ps + (1-dat$A1)/(1-ps)
summary(svyglm(Y ~ A0*A1, design = svydesign(~1,data=dat,weights=wt)))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5058 0.00906 -55.819 0.00e+00
## A0 0.2010 0.01786 11.253 2.43e-29
## A1 0.3053 0.01284 23.770 3.38e-124
## A0:A1 0.0105 0.02138 0.492 6.23e-01
Comparing with a naïve (unweighted) estimate.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5863 0.00897 -65.33 0.00e+00
## A0 0.0484 0.01600 3.02 2.51e-03
## A1 0.4671 0.01254 37.25 1.62e-299
## A0:A1 0.1506 0.01959 7.68 1.57e-14