Package 'midoc' reference manual

Title:	A Decision-Making System for Multiple Imputation
Description:	A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will then advise which analysis approaches can be used, and how best to perform them. 'midoc' follows the framework for the treatment and reporting of missing data in observational studies (TARMOS) described by Lee et al. (2021) <doi:10.1016/j.jclinepi.2021.01.008>.
Authors:	Elinor Curnow [aut, cre, cph], Jon Heron [aut], Rosie Cornish [aut], Kate Tilling [aut], James Carpenter [aut]
Maintainer:	Elinor Curnow <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2025-03-03 05:20:25 UTC
Source:	https://github.com/elliecurnow/midoc

Child body mass index data

Description

A simulated dataset

Usage

bmi

Format

`bmi`

A data frame with 1000 rows and 6 columns:

bmi7: Child's body mass index at age 7 years
matage: Mother's age at pregnancy, standardised relative to a mean age of 30
mated: Mother's educational level: post-16 years qualification or not
pregsize: Mother's pregnancy size: singleton or twins
bwt: Child's birth weight in kilograms
r: Missingness indicator: whether bmi7 is reported or not

...

Inspect complete records analysis model

Description

Check whether complete records analysis is valid under the proposed analysis model and directed acyclic graph (DAG). Validity means that the proposed approach will allow unbiased estimation of the estimand(s) of interest, including regression parameters, associations, and causal effects.

Usage

checkCRA(y, covs, r_cra, mdag)

Arguments

`y`	The analysis model outcome, specified as a string
`covs`	The analysis model covariate(s), specified as a string (space delimited)
`r_cra`	The complete record indicator, specified as a string
`mdag`	The DAG, specified as a string using dagitty syntax

Details

The DAG should include all observed and unobserved variables related to the analysis model variables and their missingness, as well as all required missingness indicators.

In general, complete records analysis is valid if the analysis model outcome and complete record indicator are unrelated, conditional on the specified covariates. This is determined using the proposed DAG by checking whether the analysis model outcome and complete record indicator are 'd-separated', given the covariates.
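The d-separation check described above can be reproduced directly with the 'dagitty' package. The following is a minimal sketch, not necessarily how checkCRA is implemented internally; it assumes 'dagitty' is installed and wraps the DAG used in the examples in dagitty's "dag { ... }" syntax.

```r
library(dagitty)

# DAG from the checkCRA examples, wrapped in "dag { ... }" (assumption:
# this wrapping is added here for illustration)
g <- dagitty("dag {
  matage -> bmi7 mated -> matage mated -> bmi7
  sep_unmeas -> mated sep_unmeas -> r
}")

# Is the outcome d-separated from the complete record indicator,
# given the analysis model covariates?
sep_matage <- dseparated(g, "bmi7", "r", c("matage"))
sep_both   <- dseparated(g, "bmi7", "r", c("matage", "mated"))

print(c(matage = sep_matage, matage_mated = sep_both))
```

Under this DAG, conditioning on matage alone leaves the path bmi7 <- mated <- sep_unmeas -> r open, whereas also conditioning on mated blocks it.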

Value

A message indicating whether complete records analysis is valid under the proposed DAG and analysis model outcome and covariate(s)

References

Hughes R, Heron J, Sterne J, Tilling K. 2019. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int J Epidemiol. doi:10.1093/ije/dyz032

Bartlett JW, Harel O, Carpenter JR. 2015. Asymptotically Unbiased Estimation of Exposure Odds Ratios in Complete Records Logistic Regression. Am J Epidemiol. doi:10.1093/aje/kwv114

Examples

# Example DAG for which complete records analysis is not valid, but could be
## valid for a different set of covariates
checkCRA(y="bmi7", covs="matage", r_cra="r",
         mdag="matage -> bmi7 mated -> matage mated -> bmi7
               sep_unmeas -> mated sep_unmeas -> r")
# For the DAG in the example above, complete records analysis is valid
## if a different set of covariates is used
checkCRA(y="bmi7", covs="matage mated", r_cra="r",
         mdag="matage -> bmi7 mated -> matage mated -> bmi7
               sep_unmeas -> mated sep_unmeas -> r")

# Example DAG for which complete records analysis is not valid, but could be
## valid for a different estimand
checkCRA(y="bmi7", covs="matage mated", r_cra="r",
         mdag="matage -> bmi7 mated -> matage mated -> bmi7
               sep_unmeas -> mated sep_unmeas -> r matage -> bmi3
               mated -> bmi3 bmi3 -> bmi7 bmi3 -> r")

# Example DAG for which complete records analysis is never valid
checkCRA(y="bmi7", covs="matage mated", r_cra="r",
         mdag="matage -> bmi7 mated -> matage mated -> bmi7
               sep_unmeas -> mated sep_unmeas -> r bmi7 -> r")

Inspect multiple imputation model

Description

Check whether multiple imputation is valid under the proposed imputation model and directed acyclic graph (DAG). Validity means that the proposed approach will allow unbiased estimation of the estimand(s) of interest, including regression parameters, associations, and causal effects. The imputation model should include all other analysis model variables as predictors, as well as any auxiliary variables. The DAG should include all observed and unobserved variables related to the analysis model variables and their missingness, as well as all required missingness indicators.

Usage

checkMI(dep, preds, r_dep, mdag)

Arguments

`dep`	The partially observed variable to be imputed, specified as a string
`preds`	The imputation model predictor(s), specified as a string (space delimited)
`r_dep`	The partially observed variable's missingness indicator, specified as a string
`mdag`	The DAG, specified as a string using dagitty syntax

Details

In principle, multiple imputation is valid if each partially observed variable is unrelated to its own missingness, given its imputation model predictors.
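To illustrate how an auxiliary variable that is a collider can break this condition, the sketch below checks d-separation of the partially observed variable and its missingness indicator under two predictor sets. It uses 'dagitty' directly and is only an illustration under the assumption that 'dagitty' is installed; it is not presented as the internal logic of checkMI.

```r
library(dagitty)

# DAG from the checkMI examples, wrapped in "dag { ... }" (assumption:
# this wrapping is added here for illustration)
g <- dagitty("dag {
  matage -> bmi7 mated -> matage mated -> bmi7
  sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7
  pregsize -> bwt sep_unmeas -> bwt
}")

# Predictors matage, mated, pregsize: bwt (a collider) is left out
sep_pregsize <- dseparated(g, "bmi7", "r", c("matage", "mated", "pregsize"))

# Predictors matage, mated, bwt: conditioning on the collider bwt
# opens the path bmi7 <- pregsize -> bwt <- sep_unmeas -> r
sep_bwt <- dseparated(g, "bmi7", "r", c("matage", "mated", "bwt"))

print(c(with_pregsize = sep_pregsize, with_bwt = sep_bwt))
```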

Value

A message indicating whether multiple imputation is valid under the proposed DAG and imputation model

References

Curnow E, Tilling K, Heron JE, Cornish RP, Carpenter JR. 2023. Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias. Frontiers in Epidemiology. doi:10.3389/fepid.2023.1237447

Examples

# Example DAG for which multiple imputation is valid
checkMI(dep="bmi7", preds="matage mated pregsize", r_dep="r",
        mdag="matage -> bmi7 mated -> matage mated -> bmi7
              sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7
              pregsize -> bwt sep_unmeas -> bwt")

# Example DAG for which multiple imputation is not valid, due to a collider
checkMI(dep="bmi7", preds="matage mated bwt", r_dep="r",
        mdag="matage -> bmi7 mated -> matage mated -> bmi7
              sep_unmeas -> mated sep_unmeas -> r pregsize -> bmi7
              pregsize -> bwt sep_unmeas -> bwt")

Inspect parametric model specification

Description

Explore whether the observed relationships in the specified dataset are consistent with the proposed parametric model (which may represent the analysis or imputation model).

Usage

checkModSpec(formula, family, data, plot = TRUE, message = TRUE)

Arguments

`formula`	A symbolic description of the model to be fitted, with the dependent variable on the left of a ~ operator, and the covariates, separated by + operators, on the right, specified as a string
`family`	A description of the error distribution and link function to be used in the model, specified as a string; family functions that are supported are "gaussian(identity)" and "binomial(logit)"
`data`	A data frame containing all the variables stated in the formula
`plot`	If TRUE (the default) and there is evidence of model mis-specification, displays a plot which can be used to explore the functional form of each covariate in the specified model; use plot = FALSE to disable the plot
`message`	If TRUE (the default), displays a message indicating whether the relationships between the dependent variable and covariates are likely to be correctly specified or not; use message = FALSE to suppress the message

Value

An object of type 'mimod' (a list containing the specified formula, family, and dataset name). Optionally, a message indicating whether the relationships between the dependent variable and covariates are likely to be correctly specified or not. If there is evidence of model mis-specification, optionally returns a plot of the model residuals versus the fitted values which can be used to explore the appropriate functional form for the specified model.
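As a rough illustration of what mis-specification of functional form looks like, the sketch below simulates an outcome with a quadratic dependence on a covariate and compares a linear fit with a quadratic fit using base R. The data, variable names, and the F-test comparison are assumptions made for illustration; this is a simple stand-in and not the procedure that checkModSpec itself applies.

```r
set.seed(1)

# Simulated data (hypothetical): y depends on x through a quadratic term
n <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + 0.8 * x^2 + rnorm(n)
sim <- data.frame(x = x, y = y)

# Mis-specified (linear) and correctly specified (quadratic) models
fit_lin  <- lm(y ~ x, data = sim)
fit_quad <- lm(y ~ x + I(x^2), data = sim)

# F-test for the added quadratic term; a small p-value suggests the
# linear specification is inadequate
cmp <- anova(fit_lin, fit_quad)
p_quad <- cmp[["Pr(>F)"]][2]
print(cmp)

# Residuals of the linear fit still carry the omitted curvature
cor_resid <- cor(resid(fit_lin), sim$x^2)
print(cor_resid)
```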

References

Curnow E, Carpenter JR, Heron JE, et al. 2023. Multiple imputation of missing data under missing at random: compatible imputation models are not sufficient to avoid bias if they are mis-specified. J Clin Epidemiol. doi:10.1016/j.jclinepi.2023.06.011

Examples

# Example (incorrectly) assuming a linear relationship
checkModSpec(formula="bmi7~matage+mated+pregsize",
             family="gaussian(identity)", data=bmi)
## For the example above, (correctly) assuming a quadratic relationship
checkModSpec(formula="bmi7~matage+I(matage^2)+mated+pregsize",
             family="gaussian(identity)", data=bmi)

Lists missing data patterns in the specified dataset

Description

This function summarises the missing data patterns in the specified dataset. Each row in the output corresponds to a missing data pattern (1=observed, 0=missing). The number and percentage of observations are also displayed for each missing data pattern. The first column indicates the number of missing data patterns. The second column refers to the analysis model outcome ('y'), with all other variables ('covs') displayed in subsequent columns. Alternatively, 'y' can be used to display the primary variable of interest, e.g. 'y' could refer to the exposure, with all other variables listed in 'covs'.
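The idea of a missing data pattern table can be sketched in base R as follows. The toy data frame and column names are hypothetical, and the layout is only an approximation of the kind of summary described above, not the exact output of descMissData.

```r
# Toy data (hypothetical) with missing values in y and x1
dat <- data.frame(
  y  = c(1, NA, 3, NA, 5),
  x1 = c(1, 2, NA, 4, 5),
  x2 = 1:5
)

# Observation indicators: 1 = observed, 0 = missing
obs <- 1 * !is.na(dat)

# Count the rows that share each pattern of indicators
patterns <- aggregate(n ~ ., data = data.frame(obs, n = 1), FUN = sum)

# Percentage of observations with each pattern
patterns$pct <- 100 * patterns$n / nrow(dat)
print(patterns)
```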

Usage

descMissData(y, covs, data, plot = FALSE)

Arguments

`y`	The analysis model outcome, specified as a string
`covs`	The analysis model covariate(s), specified as a string (space delimited)
`data`	A data frame containing the specified analysis model outcome and covariate(s)
`plot`	If TRUE, displays a plot using md.pattern to visualise the missing data patterns; use plot = FALSE (the default) to disable the plot

Value

A summary of the missing data patterns

Examples

descMissData(y="bmi7", covs="matage mated", data=bmi)
descMissData(y="bmi7", covs="matage mated pregsize bwt", data=bmi, plot=TRUE)

Performs multiple imputation

Description

Creates multiple imputations using mice, based on the options and dataset specified by a call to proposeMI. If a substantive model is specified, also calculates the pooled estimates using pool.

Usage

doMImice(mipropobj, seed, substmod = " ", message = TRUE)

Arguments

`mipropobj`	An object of type 'miprop', created by a call to 'proposeMI'
`seed`	An integer that is used to set the seed of the 'mice' call
`substmod`	Optionally, a symbolic description of the substantive model to be fitted, specified as a string; if supplied, the model will be fitted to each imputed dataset and the results pooled
`message`	If TRUE (the default), displays a message summarising the analysis that has been performed; use message = FALSE to suppress the message

Value

A 'mice' object of class 'mids' (the multiply imputed datasets). Optionally, a message summarising the analysis that has been performed.
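For orientation, the impute, fit, and pool steps can also be run with 'mice' directly. The sketch below uses simulated data and settings chosen purely for illustration (five imputations, the "norm" method); it assumes 'mice' is installed and does not reproduce the options that proposeMI would suggest for a given dataset.

```r
library(mice)

set.seed(2024)

# Simulated data (hypothetical): outcome y with 50 values set to missing
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[sample(n, 50)] <- NA
sim <- data.frame(y = y, x = x)

# Create 5 imputed datasets; the result is an object of class 'mids'
imp <- mice(sim, m = 5, method = "norm", seed = 123, printFlag = FALSE)

# Fit the substantive model to each imputed dataset, then pool the results
fit <- with(imp, lm(y ~ x))
pooled <- pool(fit)
print(summary(pooled))
```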

Examples

# First specify the imputation model as a 'mimod' object
## (suppressing the message)
mimod_bmi7 <- checkModSpec(formula="bmi7~matage+I(matage^2)+mated+pregsize",
                           family="gaussian(identity)",
                           data=bmi,
                           message=FALSE)
# Save the proposed 'mice' options as a 'miprop' object
## (suppressing the message)
miprop <- proposeMI(mimodobj=mimod_bmi7,
                    data=bmi,
                    message=FALSE,
                    plot = FALSE)
# Create the set of imputed datasets using the proposed 'mice' options
imp <- doMImice(miprop,123)

# Additionally, fit the substantive model to each imputed dataset and display
## the pooled results
doMImice(miprop, 123, substmod="lm(bmi7 ~ matage + I(matage^2) + mated)")

Compares data with proposed DAG

Description

Explore whether relationships between fully observed variables in the specified dataset are consistent with the proposed directed acyclic graph (DAG) using localTests functionality.
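The localTests functionality referred to above belongs to the 'dagitty' package and can be called directly. The sketch below uses a small simulated dataset and a three-variable DAG, both hypothetical, and assumes 'dagitty' is installed. It shows the implied conditional independencies and their tests, and is not a description of how exploreDAG processes its inputs.

```r
library(dagitty)

set.seed(42)

# Simulated data (hypothetical) generated to follow x -> m -> y
n <- 500
x <- rnorm(n)
m <- 0.7 * x + rnorm(n)
y <- 0.5 * m + rnorm(n)
sim <- data.frame(x = x, m = m, y = y)

g <- dagitty("dag { x -> m m -> y }")

# Conditional independencies implied by the DAG (here: x and y given m)
implied <- impliedConditionalIndependencies(g)
print(implied)

# Test each implied independence against the data; estimates close to
# zero are consistent with the proposed DAG
res <- localTests(g, data = sim, type = "cis")
print(res)
```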

Usage

exploreDAG(mdag, data)

Arguments

`mdag`	The DAG, specified as a string using dagitty syntax
`data`	A data frame containing all the variables stated in the DAG. All ordinal variables must be integer-coded and all categorical variables must be dummy-coded.

Value

A message indicating whether the relationships between fully observed variables in the specified dataset are consistent with the proposed DAG

Examples

exploreDAG(mdag="matage -> bmi7 mated -> matage mated -> bmi7
                 sep_unmeas -> mated sep_unmeas -> r",
           data=bmi)

Run an interactive vignette for the midoc package

Description

Runs an interactive version of the midoc vignette: Multiple Imputation DOCtor (midoc). In the interactive version, you can apply midoc functions in shiny-package apps using your own DAG and data.

Usage

midocVignette()

Value

A browser-based, interactive version of the midoc vignette

Examples


# Run the interactive vignette
midocVignette()

Suggests multiple imputation options

Description

Suggests the mice options to perform multiple imputation, based on the proposed set of imputation models (one for each partially observed variable) and specified dataset.

Usage

proposeMI(mimodobj, data, plot = TRUE, plotprompt = TRUE, message = TRUE)

Arguments

`mimodobj`	An object, or list of objects, of type 'mimod', which stands for 'multiple imputation model', created by a call to checkModSpec
`data`	A data frame containing all the variables required for imputation and the substantive analysis
`plot`	If TRUE (the default), displays diagnostic plots for the proposed 'mice' call; use plot=FALSE to disable the plots
`plotprompt`	If TRUE (the default), the user is prompted before the second plot is displayed; use plotprompt=FALSE to remove the prompt
`message`	If TRUE (the default), displays a message describing the proposed 'mice' options; use message=FALSE to suppress the message

Value

An object of type 'miprop', which can be used to run 'mice' using the proposed options, plus, optionally, a message and diagnostic plots describing the proposed 'mice' options

Examples

# First specify each imputation model as a 'mimod' object
## (suppressing the message)
mimod_bmi7 <- checkModSpec(formula="bmi7~matage+I(matage^2)+mated+pregsize",
                           family="gaussian(identity)",
                           data=bmi,
                           message=FALSE)
mimod_pregsize <- checkModSpec(
                           formula="pregsize~bmi7+matage+I(matage^2)+mated",
                           family="binomial(logit)",
                           data=bmi,
                           message=FALSE)

# Display the proposed 'mice' options (suppressing the plot prompt)
## When specifying a single imputation model
proposeMI(mimodobj=mimod_bmi7,
          data=bmi,
          plotprompt = FALSE)
## When specifying more than one imputation model (suppressing the plots)
proposeMI(mimodobj=list(mimod_bmi7,mimod_pregsize),
          data=bmi,
          plot = FALSE)
