| Title: | Monte Carlo Quantitative Bias Analysis for Unmeasured Confounding |
|---|---|
| Description: | A flexible Monte Carlo quantitative bias analysis (QBA) for unmeasured confounding in observational studies, as described in Hughes et al. The substantive analysis may be a generalised linear model or a Cox proportional hazards model with a binary, continuous, or categorical exposure and measured confounders. The method allows for one or more binary or continuous unmeasured confounders that may be correlated with the measured confounders. Informative priors for a small number of bias parameters encode external information about the unmeasured confounders. |
| Authors: | Tom Palmer [aut, cre] (ORCID: <https://orcid.org/0000-0003-4655-4511>, ROR: <https://ror.org/0524sp257>), Emily Kawabata [aut] (ORCID: <https://orcid.org/0000-0003-4178-5513>), Rachael Hughes [aut] (ORCID: <https://orcid.org/0000-0003-0766-1410>) |
| Maintainer: | Tom Palmer <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-05-26 14:59:28 UTC |
| Source: | https://github.com/remlapmot/qbaconfound |
Conducts the flexible Monte Carlo quantitative bias analysis (QBA) for unmeasured confounding of Hughes et al. Given a naive analysis model (which omits one or more unmeasured confounders) and informative priors for a small number of bias parameters, the function returns a bias-adjusted estimate of the exposure effect together with an interval that accounts for both the unmeasured confounding and sampling variability.
qbaconfound(
  formula,
  data,
  exposure = NULL,
  confounders,
  family = stats::gaussian(),
  reps = 1000L,
  sampling_error = TRUE,
  seed = NULL
)
formula |
The naive analysis model, e.g. |
data |
A data frame containing the outcome, exposure, and measured confounders. |
exposure |
Character vector naming the exposure term(s) in |
confounders |
A single |
family |
The outcome model family: a |
reps |
Number of Monte Carlo replications. |
sampling_error |
Logical; if |
seed |
Optional integer seed for reproducibility. |
The substantive analysis may be a generalised linear model (any stats::glm()
family) or a Cox proportional hazards model. Survival outcomes are detected
automatically when the left-hand side of formula is a survival::Surv()
call, or family can be set to "cox".
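One way such automatic detection could work is to inspect the left-hand side of the formula without evaluating it. The sketch below uses base R only; `is_surv_formula()` is a hypothetical helper written for illustration and is not claimed to be the package's internal code.

```r
# Hypothetical helper (not part of the package): TRUE when the left-hand side
# of a two-sided formula is a call to Surv() or survival::Surv().
is_surv_formula <- function(formula) {
  if (length(formula) != 3L) return(FALSE)  # one-sided formula: no response
  lhs <- formula[[2L]]
  if (!is.call(lhs)) return(FALSE)          # e.g. a bare symbol such as y
  fn <- lhs[[1L]]
  identical(fn, quote(Surv)) || identical(fn, quote(survival::Surv))
}

# The formulas are only inspected, never evaluated, so the survival package
# does not need to be loaded for these checks.
is_surv_formula(Surv(time, status) ~ x + c1)
is_surv_formula(survival::Surv(time, status) ~ x + c1)
is_surv_formula(y ~ x + c1)
```

Inspecting the unevaluated call keeps the check cheap and avoids touching the data before the model type is known.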
Each Monte Carlo replication (see Hughes et al., section 2.4):
1. draws a value for every bias parameter from its prior (the priors are specified through u_continuous() / u_binary());
2. simulates a proxy for each unmeasured confounder as a function of the exposure only (a continuous proxy from a normal model, a binary proxy from a Bernoulli model whose intercept reproduces the drawn prevalence);
3. refits the outcome model including the simulated proxies, with their coefficients fixed to the drawn values (implemented as a model offset), and reads off the exposure coefficient and its standard error;
4. adds Monte Carlo sampling error by drawing the bias-adjusted estimate from a normal distribution centred on that coefficient.
The point estimate is the median and the interval the 2.5th and 97.5th percentiles of the resulting distribution of bias-adjusted estimates.
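The replication steps above can be sketched in base R for the simplest case: a gaussian outcome and one continuous unmeasured confounder. This is an illustration under stated assumptions, not the package's implementation. The toy data, the prior values, and the reading of each prior as normal (mean, sd) or uniform (min, max) are all assumptions made for the example; the priors are centred near the values implied by the toy data-generating process.

```r
set.seed(1)

# Toy data (hypothetical): u confounds x and y; the true effect of x on y is 0.
n  <- 500
c1 <- rnorm(n)
u  <- rnorm(n)
x  <- 0.5 * c1 + 0.6 * u + rnorm(n)
y  <- 0.3 * c1 + 0.8 * u + rnorm(n)
dat <- data.frame(y, x, c1)  # u is left out: it plays the unmeasured confounder

naive <- coef(lm(y ~ x + c1, data = dat))["x"]

reps  <- 200
draws <- numeric(reps)
for (r in seq_len(reps)) {
  # Step 1: draw each bias parameter from its (assumed) prior.
  b_out <- rnorm(1, mean = 0.8,  sd = 0.1)   # coefficient of U in the outcome model
  b_exp <- rnorm(1, mean = 0.44, sd = 0.05)  # coefficient of x in the model for U
  s_res <- runif(1, min = 0.8, max = 0.9)    # residual SD of U given x

  # Step 2: simulate a continuous proxy for U as a function of the exposure only.
  u_sim <- rnorm(n, mean = b_exp * dat$x, sd = s_res)

  # Step 3: refit with the proxy's coefficient fixed at b_out via an offset.
  fit <- lm(y ~ x + c1 + offset(b_out * u_sim), data = dat)
  est <- coef(summary(fit))["x", c("Estimate", "Std. Error")]

  # Step 4: add sampling error around the bias-adjusted coefficient.
  draws[r] <- rnorm(1, mean = est[["Estimate"]], sd = est[["Std. Error"]])
}

# Median as the point estimate, 2.5th and 97.5th percentiles as the interval.
adjusted <- quantile(draws, probs = c(0.5, 0.025, 0.975))
naive
adjusted
```

In this toy setting the naive coefficient is pushed away from zero by the omitted confounder, while the distribution of bias-adjusted draws is centred much closer to the true value of zero and is wider than the naive confidence interval because it also carries the prior uncertainty.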
An object of class qbaconfound, a list with elements including
estimates (a data frame of naive and bias-adjusted estimates per exposure
term), draws (the matrix of bias-adjusted estimates across replications),
and n_failed (the number of replications whose model fit failed).
Hughes RA, Kawabata E, Palmer TM, et al. A flexible Monte Carlo quantitative bias analysis for unmeasured confounding. Statistical Methods in Medical Research (under review).
u_continuous(), u_binary(), sim_confounding()
df <- sim_confounding(n = 500, beta_x = 0, seed = 1)

# Naive model y ~ x + c1 omits the confounder u; adjust for one continuous U.
fit <- qbaconfound(
  y ~ x + c1,
  data = df,
  exposure = "x",
  confounders = u_continuous(
    coef_out = c(0.8, 0.1),
    coef_exp = c(0.6, 0.05),
    resid_sd = c(0.9, 1.1)
  ),
  reps = 200,
  seed = 1
)
fit
A small helper that simulates a dataset with one measured confounder c1
and one unmeasured confounder u that jointly confound the exposure-outcome
relationship. The naive model y ~ x + c1 (which omits u) is therefore
biased for the exposure effect, making the data useful for examples and
tests of qbaconfound().
sim_confounding(
  n = 1000L,
  beta_x = 0,
  family = c("gaussian", "binomial"),
  seed = NULL
)
n |
Number of observations. |
beta_x |
True exposure effect (on the linear-predictor scale). |
family |
Outcome type: |
seed |
Optional integer seed for reproducibility. |
A data frame with columns y, x, c1, and u (the unmeasured
confounder, included so it can be removed to mimic the unmeasured case).
The true exposure effect is stored in the "beta_x" attribute of the returned data frame.
df <- sim_confounding(n = 1000, beta_x = 0.5, seed = 42)

# Naive (biased) versus full (unbiased) model:
coef(lm(y ~ x + c1, df))["x"]
coef(lm(y ~ x + c1 + u, df))["x"]
Summarise a Monte Carlo QBA
## S3 method for class 'qbaconfound'
summary(object, ...)
object |
A |
... |
Unused. |
A data frame of the naive and bias-adjusted estimates for each
exposure term, with columns term, naive, naive_se, estimate,
conf.low, and conf.high.
Describes the prior distributions of the bias parameters for a single binary
unmeasured confounder, for use in qbaconfound().
u_binary(coef_out, coef_exp, prevalence, prev_dist = c("uniform", "beta"))
coef_out |
Length-2 numeric |
coef_exp |
Normal prior for the coefficient(s) of the exposure in the
model for the unmeasured confounder ( |
prevalence |
Length-2 numeric giving the prior for the marginal
prevalence |
prev_dist |
Prior family for the prevalence: |
As for u_continuous(), the coefficient of U in the outcome model
(coef_out) and the coefficient(s) of the exposure in the model for U
(coef_exp) are bias parameters. For a binary confounder the third bias
parameter is the marginal prevalence of U (prevalence) rather than a
residual standard deviation. A prevalence is usually easier to elicit and
more often reported in the literature than the intercept of a logistic
model; the intercept needed to reproduce the drawn prevalence is derived
internally.
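A minimal sketch of how an intercept can be derived from a drawn prevalence, assuming a logistic model for U given the exposure. `solve_intercept()` is a hypothetical helper and the numbers are illustrative; the package may use a different method internally.

```r
# Hypothetical helper: find the logistic intercept a such that the average of
# plogis(a + b_exp * x) over the observed exposure equals the target prevalence.
solve_intercept <- function(prev, b_exp, x) {
  f <- function(a) mean(plogis(a + b_exp * x)) - prev
  uniroot(f, interval = c(-30, 30), tol = 1e-10)$root
}

set.seed(1)
x <- rnorm(1000)  # illustrative exposure values
a <- solve_intercept(prev = 0.2, b_exp = 0.4, x = x)

# Simulate the binary proxy from the Bernoulli model with that intercept.
u_sim <- rbinom(length(x), size = 1, prob = plogis(a + 0.4 * x))

# The model-implied marginal prevalence matches the target up to solver tolerance.
mean(plogis(a + 0.4 * x))
```

Because the mean of the fitted probabilities is strictly increasing in the intercept, a one-dimensional root finder is enough, and the analyst only ever has to specify the prevalence.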
An object of class qba_u describing a binary unmeasured
confounder.
u_binary(coef_out = c(0.7, 0.1), coef_exp = c(0.4, 0.05), prevalence = c(0.15, 0.25))
Describes the prior distributions of the bias parameters for a single
continuous unmeasured confounder, for use in qbaconfound().
u_continuous(coef_out, coef_exp, resid_sd, resid_dist = c("uniform", "gamma"))
coef_out |
Length-2 numeric |
coef_exp |
Normal prior for the coefficient(s) of the exposure in the
model for the unmeasured confounder ( |
resid_sd |
Length-2 numeric giving the prior for the residual standard
deviation |
resid_dist |
Prior family for the residual standard deviation: either
|
The bias model relates the unmeasured confounder U to the study data
through three bias parameters: the coefficient of U in the outcome model
(coef_out), the coefficient(s) of the exposure in the model for U
(coef_exp), and the residual standard deviation of U given the exposure
(resid_sd). These parameters cannot be estimated from the data, so their
values are drawn from the prior distributions specified here.
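Before running a bias analysis it can help to look at what the chosen priors imply. The sketch below draws bias parameters in base R, assuming each length-2 coefficient argument is a normal (mean, sd) pair and resid_sd gives uniform (min, max) bounds; `draw_bias_params()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: draw n_draws sets of the three bias parameters.
# Assumed parameterisation: normal (mean, sd) for the two coefficients and
# uniform (min, max) for the residual standard deviation.
draw_bias_params <- function(n_draws, coef_out, coef_exp, resid_sd) {
  data.frame(
    coef_out = rnorm(n_draws, mean = coef_out[1], sd = coef_out[2]),
    coef_exp = rnorm(n_draws, mean = coef_exp[1], sd = coef_exp[2]),
    resid_sd = runif(n_draws, min = resid_sd[1], max = resid_sd[2])
  )
}

set.seed(1)
prior <- draw_bias_params(
  1000,
  coef_out = c(0.8, 0.1),
  coef_exp = c(0.3, 0.05),
  resid_sd = c(0.9, 1.1)
)

# In a linear outcome model the product of the two coefficients is roughly the
# amount by which the naive exposure coefficient would be shifted.
summary(prior$coef_out * prior$coef_exp)
```

A prior check of this kind shows whether the elicited priors imply a plausible amount of confounding before any outcome model is refitted.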
An object of class qba_u describing a continuous unmeasured
confounder.
u_continuous(coef_out = c(0.8, 0.1), coef_exp = c(0.3, 0.05), resid_sd = c(0.9, 1.1))