Title: | Doubly-Ranked Stratification in Mendelian Randomization |
---|---|
Description: | Doubly-ranked and residual stratification for instrumental variable and Mendelian randomization studies and further stratification-based analysis. |
Authors: | Haodong Tian |
Maintainer: | Haodong Tian <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-12-27 02:58:20 UTC |
Source: | https://github.com/HDTian/DRMR |
Create a simulated data set for stratification
getDat(N=10000, IVtype='cont', ZXmodel='A', XYmodel='1', printRR=FALSE )
N | a numeric value giving the sample size |
IVtype | a character string indicating which type of IV is used. Three IV types are available: the binary/dichotomous instrument ( |
ZXmodel | a character string indicating which instrument-exposure model is used. The available choices include: |
XYmodel | a character string indicating which exposure-outcome model is used. The available choices include: |
printRR | a logical value indicating whether the coefficient of determination values (the instrument strength) are returned. |
getDat returns a data frame containing the instrument, the exposure, and the outcome.
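To make the structure of such a data set concrete, the following is a minimal base-R sketch that simulates an instrument, an exposure and an outcome, and computes the coefficient of determination of the instrument by hand. The linear models, the coefficients, the column names Z, X, Y and the helper name make_toy_iv_data are assumptions for illustration only; they are not the models behind ZXmodel or XYmodel in getDat.

```r
# Hypothetical helper, not part of DRMR: simulate a continuous instrument Z,
# a confounder U, an exposure X and an outcome Y under simple linear models.
make_toy_iv_data <- function(N = 10000, seed = 1) {
  set.seed(seed)
  Z <- rnorm(N)                  # continuous instrument
  U <- rnorm(N)                  # unmeasured confounder
  X <- 0.5 * Z + U + rnorm(N)    # assumed instrument-exposure model
  Y <- 0.3 * X + U + rnorm(N)    # assumed exposure-outcome model
  data.frame(Z = Z, X = X, Y = Y)
}

toy <- make_toy_iv_data()

# Coefficient of determination of the instrument for the exposure
fit_zx <- lm(X ~ Z, data = toy)
r2_zx  <- summary(fit_zx)$r.squared
r2_zx
```

Under these assumed models the instrument explains roughly a tenth of the exposure variance; a larger value indicates a stronger instrument.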
Tian, H., Mason, A. M., Liu, C., & Burgess, S. (2022). "Relaxing parametric assumptions for non-linear Mendelian randomization using a doubly-ranked stratification method". bioRxiv, 2022-06. (Doubly-ranked stratification)
dat <- getDat(IVtype = 'cont', ZXmodel = 'A', XYmodel = '1')  # get a toy data set
Create a simulated data set for stratification. getDatU covers different scenarios for the confounder effect on the outcome.
getDatU(N = 10000, IVtype = "cont", UYmodel = "U1", XYmodel = "1", printRR = FALSE)
N | a numeric value giving the sample size |
IVtype | a character string indicating which type of IV is used. Two IV types are available: the binary/dichotomous instrument ( |
UYmodel | a character string indicating which confounder-outcome model is used. The available choices include: |
XYmodel | a character string indicating which exposure-outcome model is used. The available choices include: |
printRR | a logical value indicating whether the coefficient of determination values (the instrument strength) are returned. |
getDatU returns a data frame containing the instrument, the exposure, and the outcome.
Tian, H., Mason, A. M., Liu, C., & Burgess, S. (2022). "Relaxing parametric assumptions for non-linear Mendelian randomization using a doubly-ranked stratification method". bioRxiv, 2022-06. (Doubly-ranked stratification)
dat <- getDatU(IVtype = 'cont', XYmodel = '1', UYmodel = 'U1', printRR = TRUE)
Get the Gelman-Rubin uniformity statistic for each stratum. This is used to check the degree of coarseness when the exposure is coarsened.
getGRstats(rdat, Nc = 2, roundnum = 3)
rdat | a data set containing the stratification information. |
Nc | an integer giving the number of chains used to calculate the GR statistic. There is no universally optimal choice. Default value is |
roundnum | an integer giving the number of decimal places retained in the result. |
See supplementary Text S1 of the original paper for more details.
getGRstats gives the GR statistic value for each stratum. Small values (<1.02, the heuristic threshold) indicate an acceptable degree of coarseness.
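As a rough illustration of what a Gelman-Rubin type statistic measures, the sketch below computes the standard potential scale reduction factor for a numeric vector split into Nc chains. How getGRstats forms the chains within a stratum is described in Supplementary Text S1 and is not reproduced here; the helper name gr_stat and the toy vectors are assumptions for illustration only.

```r
# Hypothetical helper, not the DRMR implementation: Gelman-Rubin statistic
# for a numeric vector x split into Nc equal-length consecutive "chains".
gr_stat <- function(x, Nc = 2) {
  n <- floor(length(x) / Nc)              # chain length (extra values dropped)
  chains <- matrix(x[seq_len(n * Nc)], nrow = n, ncol = Nc)
  W <- mean(apply(chains, 2, var))        # mean within-chain variance
  B <- n * var(colMeans(chains))          # between-chain variance
  var_plus <- (n - 1) / n * W + B / n     # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
x_mixed <- rnorm(1000)                            # both chains share one distribution
x_shift <- c(rnorm(500), rnorm(500, mean = 3))    # the two chains differ in mean

gr_mixed <- gr_stat(x_mixed, Nc = 2)   # close to 1
gr_shift <- gr_stat(x_shift, Nc = 2)   # clearly above 1
c(mixed = gr_mixed, shifted = gr_shift)
```

A value near 1 means the chains look alike; values well above 1 mean that the chains differ systematically.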
Tian, H., Mason, A. M., Liu, C., & Burgess, S. (2022). "Relaxing parametric assumptions for non-linear Mendelian randomization using a doubly-ranked stratification method". bioRxiv, 2022-06.
dat <- getDat(IVtype = 'cont', ZXmodel = 'C', XYmodel = '1')  # get a toy data set
rdat <- Stratify(dat)  # stratify the data
getGRstats(rdat, Nc = 2, roundnum = 3)
getGRstats(rdat, Nc = 5, roundnum = 3)
getGRstats(rdat, Nc = 10, roundnum = 3)
getGRstats(rdat, Nc = 100, roundnum = 3)
Return the summary information of each stratum
getSummaryInf(rdat, family_used='gaussian', covariate=FALSE, target = FALSE, XYmodel = "1", bxthre = 1e-05, getHeterQ = TRUE, onlyDR = FALSE)
rdat | a data set containing the stratification information. |
family_used | a character string indicating the type of the outcome. Currently supports the exponential families that can be recognized by |
covariate | a logical value indicating whether to adjust for covariates. |
target | a logical value indicating whether to calculate the target effect values for each stratum. Target effects can only be known when the true causal effect is known, so this argument is usually not needed with real data. The default value is |
XYmodel | a character string indicating the exposure-outcome model on which the target effect calculation is based. Only applicable when |
bxthre | a threshold value for the instrument-exposure associations. When the absolute instrument-exposure association in a stratum is less than the threshold, the MR estimate is not calculated for that stratum, to avoid extreme MR results. |
getHeterQ | a logical value indicating whether to generate the heterogeneity testing results. The default value is |
onlyDR | a logical value indicating whether to return the summary information for the doubly-ranked stratification only. Default value is |
The heterogeneity statistic can be found in Supplementary Text S2. The details of the target effects can be found in Supplementary Text S3.
If covariates are to be adjusted for, they should be named C1, C2, etc. in rdat.
getSummaryInf returns a list consisting of the following elements: Rres, RHeterQ, DRres, DRHeterQ. Rres and DRres contain the summary information for each stratum from the residual and doubly-ranked stratification methods, respectively. RHeterQ and DRHeterQ contain the corresponding heterogeneity information.
The summary information for each stratum includes (all variables are stratum-specific):
size | sample size |
min | minimal exposure (or covariate) value |
1q | the 1st quartile exposure (or covariate) value |
3q | the 3rd quartile exposure (or covariate) value |
max | maximal exposure (or covariate) value |
Fvalue | F statistic value of the instrument (i.e. the instrument strength) |
bx | instrument-exposure association |
bxse | standard error of |
by | instrument-outcome association |
byse | standard error of |
est | MR estimate |
se | standard error of |
target | target effect value |
Note that if the stratification is on a covariate, the values of min, 1q, 3q and max refer to that covariate as well. If in doubt about which variable is meant, these values are for rdat$M (generally rdat$M==rdat$X).
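The relationship between the columns bx, bxse, by, byse, est, se and Fvalue can be illustrated with a small base-R sketch. It assumes the ratio (Wald) estimate with a first-order standard error and the usual single-instrument F statistic, and it mimics the role of bxthre; whether getSummaryInf uses exactly these formulas is an assumption, and all numbers are made up.

```r
# Assumed stratum-level summary statistics (made-up numbers, three strata)
bx   <- c(0.50, 0.45, 0.000001)   # instrument-exposure associations
bxse <- c(0.05, 0.05, 0.05)       # their standard errors
by   <- c(0.10, 0.18, 0.02)       # instrument-outcome associations
byse <- c(0.04, 0.04, 0.04)       # their standard errors
bxthre <- 1e-05                   # threshold on |bx|

# F statistic of a single instrument: squared t statistic of bx
Fvalue <- (bx / bxse)^2

# Ratio (Wald) estimate and first-order standard error; strata whose
# instrument-exposure association is below the threshold are set to NA
weak <- abs(bx) < bxthre
est  <- ifelse(weak, NA_real_, by / bx)
se   <- ifelse(weak, NA_real_, byse / abs(bx))

data.frame(Fvalue, bx, by, est, se)
```

The third stratum has a near-zero instrument-exposure association, so its estimate is left missing rather than reported as an extreme ratio.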
The heterogeneity test information includes
Q statistic | heterogeneity statistic value |
df | the degrees of freedom |
p-value | the p-value |
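For orientation, a standard Cochran's Q test on stratum-specific estimates can be computed as below. This is a generic sketch under the assumption of inverse-variance weights and a chi-squared reference distribution; the exact statistic used by the package is the one defined in Supplementary Text S2, and the estimates here are made up.

```r
# Assumed stratum-specific MR estimates and standard errors (made-up numbers)
est <- c(0.20, 0.25, 0.40, 0.55)
se  <- c(0.08, 0.07, 0.09, 0.10)

w      <- 1 / se^2                        # inverse-variance weights
pooled <- sum(w * est) / sum(w)           # fixed-effect pooled estimate
Q      <- sum(w * (est - pooled)^2)       # Cochran's Q statistic
df     <- length(est) - 1                 # degrees of freedom
pval   <- pchisq(Q, df = df, lower.tail = FALSE)

c(Q = Q, df = df, p.value = pval)
```

A small p-value suggests that the stratum-specific estimates differ by more than sampling error alone would explain.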
Tian, H., Mason, A. M., Liu, C., & Burgess, S. (2022). "Relaxing parametric assumptions for non-linear Mendelian randomization using a doubly-ranked stratification method". bioRxiv, 2022-06.
dat <- getDat(IVtype = 'cont', ZXmodel = 'C', XYmodel = '1')  # get a toy data set
rdat <- Stratify(dat)  # stratify the data
RES <- getSummaryInf(rdat, target = TRUE, bxthre = 1e-5, XYmodel = '1', getHeterQ = TRUE)
Smooth smooths the stratification results using the stratum-specific results.
Smooth(RES, StraMet='DR', Rall = NA, baseline = NA, splinestyle = 'Bspline', Norder = 1, XYmodel = '0', Knots = NA, Lambda = 0, random_effect = TRUE, getHeterQ = TRUE, Plot = FALSE, ylim_used = NA)
RES | the summary information data derived by |
StraMet | a character string indicating which stratification method's results are used for smoothing. Default is |
Rall | a vector giving the exposure range for smoothing and visualization. If |
baseline | a value giving the baseline exposure value for visualizing the causal effect shape. If |
splinestyle | a character string indicating the spline style used for smoothing. Default is |
Norder | a positive integer giving the order used for smoothing. The default value is |
XYmodel | a character string giving the index of the true underlying causal model. |
Knots | either a vector or a character string indicating the internal knots used for smoothing. If |
Lambda | a non-negative value giving the tuning parameter for the smoothing roughness. Default value is |
random_effect | a logical value indicating whether to use random effects in the smoothing model. Default value is |
getHeterQ | a logical value indicating whether to generate the heterogeneity testing results for the visualization results. The default value is |
Plot | a logical value indicating whether to print the visualization results immediately. Default is |
ylim_used | a vector giving the y-axis limits for the visualization of h'(x). |
The smoothing is based on the B-spline system. Forthcoming papers will provide further details. The existing smoothing methods for stratification results are the fractional polynomial method and the piecewise linear method (see the reference below). Both can be carried out with Smooth (see the examples below).
Smooth() gives a list containing the following elements:
thetahat | the estimated parameters for the B-spline basis functions |
var.matrix | variance-covariance matrix of |
summary | summary information of the smoothing results, consisting of the estimates, standard errors and p-values for the B-spline basis functions. |
p | visualization plot for the derivative of the causal effect shape |
hp | visualization plot for the causal effect shape |
Staley J R, Burgess S. Semiparametric methods for estimation of a nonlinear exposure-outcome relationship using instrumental variables with application to Mendelian randomization[J]. Genetic epidemiology, 2017, 41(4): 341-352.
dat <- getDat(IVtype = 'cont', ZXmodel = 'C', XYmodel = '2')
rdat <- Stratify(dat)
RES <- getSummaryInf(rdat, target = FALSE)
library(metafor)
# the fractional polynomial method (e.g. with degree 2)
smooth_res <- Smooth(RES, Norder = 3, baseline = 0)
# the piecewise linear method
cutting_values <- (RES$DRres$mean[-1] + head(RES$DRres$mean, -1)) / 2
smooth_res <- Smooth(RES, Norder = 1, baseline = 0, Knots = cutting_values)
Perform doubly-ranked and residual stratification on a data set
Stratify(dat, onExposure = TRUE, Ns = 10, SoP = NA, seed = NA)
dat | a data set to be stratified. |
onExposure | a logical value determining whether stratification is done on the exposure. If |
Ns | a value giving the number of strata to be built. |
SoP | a value giving the size of each pre-stratum. By default, the pre-stratum size equals the number of strata. |
seed | a seed value for reproducibility. Default value is |
Stratify does not require the pre-strata or strata to have equal sizes, so there is no need to drop individuals. Stratification still works even if nrow(dat)/Ns or nrow(dat)/SoP is not an integer. One can set SoP>Ns, which helps to increase the stability of the doubly-ranked stratification results.
Note that Stratify does not introduce internal randomness by default: the same input data always returns the same output, so reproducibility is not a concern even if you forget to fix seed. The argument seed helps to introduce internal randomness when the data contain fixed instrument or exposure values.
Stratify returns the same data as dat, augmented with stratification information. The new columns Rstratum, pre_stratum and DRstratum give the residual stratification results, the pre-strata, and the doubly-ranked stratification results, respectively.
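The two stratification schemes can be sketched in base R as follows. Residual stratification ranks the residuals of the exposure regressed on the instrument; doubly-ranked stratification first ranks on the instrument to form pre-strata, then ranks on the exposure within each pre-stratum. This is a minimal illustration of those ideas, not the Stratify source code: the simulated data and the names Z, X, U, resid_x, rank_in_pre and size_of_pre are assumptions, and tie handling via seed is ignored.

```r
set.seed(1)
N   <- 1000   # sample size
Ns  <- 10     # number of strata
SoP <- Ns     # size of each pre-stratum (here equal to the number of strata)

# Assumed toy data: instrument Z, confounder U, exposure X
Z <- rnorm(N)
U <- rnorm(N)
X <- 0.5 * Z + U + rnorm(N)

# Residual stratification: rank the residuals of X regressed on Z,
# then cut the ranks into Ns groups
resid_x  <- residuals(lm(X ~ Z))
Rstratum <- ceiling(rank(resid_x, ties.method = "first") * Ns / N)

# Doubly-ranked stratification, step 1: rank on the instrument and
# group consecutive ranks into pre-strata of size SoP
pre_stratum <- ceiling(rank(Z, ties.method = "first") / SoP)

# Step 2: rank on the exposure within each pre-stratum and map the
# within-pre-stratum rank onto one of Ns strata
rank_in_pre <- ave(X, pre_stratum,
                   FUN = function(x) rank(x, ties.method = "first"))
size_of_pre <- ave(X, pre_stratum, FUN = length)
DRstratum   <- ceiling(rank_in_pre * Ns / size_of_pre)

table(Rstratum)
table(DRstratum)
```

Because the stratum index is computed from the within-pre-stratum rank relative to the pre-stratum size, the same mapping still assigns every individual when the last pre-stratum is smaller than SoP or when SoP is larger than Ns.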
Burgess, S., Davies, N. M., & Thompson, S. G. (2014). "Instrumental variable analysis with a nonlinear exposure-outcome relationship". Epidemiology (Cambridge, Mass.), 25(6), 877. (Residual stratification)
Tian, H., Mason, A. M., Liu, C., & Burgess, S. (2022). "Relaxing parametric assumptions for non-linear Mendelian randomization using a doubly-ranked stratification method". bioRxiv, 2022-06. (Doubly-ranked stratification)
dat <- getDat(IVtype = 'cont', ZXmodel = 'C', XYmodel = '2')
rdat <- Stratify(dat)