Title: | MR-SimSS: Mendelian Randomisation (MR) method that combats Winner's Curse using a simulated sample splitting approach |
---|---|
Description: | Designed to provide users with a method, namely MR-SimSS, which uses simulated sample splitting in order to alleviate Winner's Curse bias in MR causal effect estimates. This approach also takes into account sample overlap between the exposure and outcome genome-wide association studies. It uses summary statistics from genome-wide association studies and works in combination with existing MR methods, such as IVW and MR-RAPS. |
Authors: | Amanda Forde [aut, cre] |
Maintainer: | Amanda Forde <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-12-24 05:31:14 UTC |
Source: | https://github.com/amandaforde/mr.simss |
A package designed to provide users with a method, namely MR-SimSS, which uses simulated sample splitting in order to alleviate Winner's Curse bias in MR causal effect estimates. This approach also takes into account sample overlap between the exposure and outcome genome-wide association studies. It uses summary statistics from genome-wide association studies and works in combination with existing MR methods, such as IVW and MR-RAPS.
Full documentation available here: https://amandaforde.github.io/mr.simss/
est_lambda is a function which allows users to obtain an unbiased estimate for lambda, a term used to describe the correlation between the SNP-outcome and SNP-exposure effect sizes, using a conditional log-likelihood approach. This correlation is affected by the number of overlapping samples between the two GWASs and the correlation between the exposure and the outcome. Thus, when using the function mr_simss, if the fraction of overlap and the correlation between exposure and outcome are unknown, it is recommended to employ est_lambda and use the value returned from est_lambda in mr_simss.
Note: For greater accuracy in the estimation of
lambda, it is advisable to use summary statistics of the entire set of
unpruned SNPs from the exposure and outcome GWASs.
est_lambda(data, z.threshold = 0.5)
data | A data frame to be inputted by the user containing summary statistics from the exposure and outcome GWASs. It must have at least five columns with column names |
z.threshold | A value which is used to obtain a subset of SNPs which have absolute z-statistics for both exposure and outcome GWASs less than this value. The method then assumes that both of the true SNP-outcome and SNP-exposure effect sizes of each SNP in this subset are approximately 0. The default setting is |
A value which is an estimate of lambda, the correlation between the SNP-outcome and SNP-exposure effect sizes, using a conditional log-likelihood approach. Note that this estimate is unbiased but potentially has a high degree of variance.
See https://amandaforde.github.io/mr.simss/articles/perform-MR-SimSS.html for an illustration of the use of est_lambda with a toy data set, and https://amandaforde.github.io/mr.simss/articles/derive-MR-SimSS.html for the theoretical derivation of this method based on a conditional log-likelihood approach for estimating lambda.
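The idea behind a conditional log-likelihood estimate can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes that, for SNPs with true effects near zero, the exposure and outcome z-statistics are standard bivariate normal with correlation lambda, and it maximises the likelihood conditional on both absolute z-statistics being below the threshold. The helper names (log_phi2, box_prob, cond_loglik) and the simulated z-statistics are hypothetical.

```r
# Sketch only: truncated bivariate normal likelihood, not mr.simss code.
set.seed(1)

# Log-density of a standard bivariate normal with correlation rho
log_phi2 <- function(z1, z2, rho) {
  -log(2 * pi) - 0.5 * log(1 - rho^2) -
    (z1^2 - 2 * rho * z1 * z2 + z2^2) / (2 * (1 - rho^2))
}

# P(|Z1| < thr, |Z2| < thr) for that distribution, by one-dimensional integration
box_prob <- function(rho, thr) {
  s <- sqrt(1 - rho^2)
  integrate(function(x) {
    dnorm(x) * (pnorm((thr - rho * x) / s) - pnorm((-thr - rho * x) / s))
  }, lower = -thr, upper = thr)$value
}

# Log-likelihood of the retained pairs, conditional on falling inside the box
cond_loglik <- function(rho, z1, z2, thr) {
  sum(log_phi2(z1, z2, rho)) - length(z1) * log(box_prob(rho, thr))
}

# Simulated null z-statistics with a known correlation (assumed value)
n <- 1e5
rho_true <- 0.3
z1 <- rnorm(n)
z2 <- rho_true * z1 + sqrt(1 - rho_true^2) * rnorm(n)

# Keep only pairs with both absolute z-statistics below the threshold
thr <- 0.5
keep <- abs(z1) < thr & abs(z2) < thr

fit <- optimize(function(r) cond_loglik(r, z1[keep], z2[keep], thr),
                interval = c(-0.99, 0.99), maximum = TRUE)
rho_hat <- fit$maximum
```

Because only pairs close to the origin are retained, each pair carries little information about the correlation, which is consistent with the note that the estimate can have high variance and benefits from using the full set of unpruned SNPs.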
mr_simss is the main function for MR-SimSS, a method which uses simulated sample splitting in order to alleviate Winner's Curse bias in MR causal effect estimates. It also takes into account sample overlap between the exposure and outcome GWASs. It uses GWAS summary statistics and works in combination with existing MR methods, such as IVW and MR-RAPS.
mr_simss( data, subset = FALSE, sub.cut = 0.05, est.lambda = TRUE, n.exposure = 1, n.outcome = 1, n.overlap = 1, cor.xy = 0, n.iter = 1000, splits = 2, pi = 0.5, pi2 = 0.5, threshold = 5e-08, mr_method = "mr_ivw", parallel = TRUE, n.cores = NULL, lambda.thresh = 0.5 )
data | A data frame to be inputted by the user containing summary statistics from the exposure and outcome GWASs. It must have at least five columns with column names |
subset | A logical which permits the user to perform this method with either the original complete set of SNPs or a subset of SNPs in order to reduce computational time. The default setting is |
sub.cut | A numerical value required if |
est.lambda | A logical which allows the user to specify if they want to use the function, |
n.exposure | A numerical value to be specified by the user which is equal to the number of individuals that were in the exposure GWAS. It should be specified by the user if |
n.outcome | A numerical value to be specified by the user which is equal to the number of individuals that were in the outcome GWAS. It should be specified by the user if |
n.overlap | A numerical value to be specified by the user which is equal to the number of individuals that were in both the exposure and outcome GWAS. It should be specified by the user if |
cor.xy | A numerical value to be specified by the user which is equal to the observed correlation between the exposure and the outcome. This value must be between -1 and 1. It should be specified by the user if |
n.iter | A numerical value which specifies the number of iterations of the method, i.e. the number of times sample splits are randomly simulated. The default setting is |
splits | A numerical value that must be equal to 2 or 3, indicating whether splits of 2 or 3 should be simulated. It is recommended that splits of 2 be used when there is no overlap between the two GWASs, and that splits of 3 be used in the presence of overlap, especially full overlap. The default setting is |
pi | A numerical value which determines the fraction of the first split in both the 2 and 3 split approaches. This is the fraction that will be used for SNP selection. The default setting is |
pi2 | A numerical value which determines the fraction of the second split in the 3 split approach. The default setting is |
threshold | A numerical value which specifies the threshold used to select instrument SNPs for MR at each iteration. The default setting is |
mr_method | A string which specifies the MR method that MR-SimSS works in combination with. It is possible to use any method outputted in the list |
parallel | A logical value which allows the user to specify if they wish to use this function in parallel or in series. The default setting is |
n.cores | A numerical value which determines how many cores will be used if |
lambda.thresh | A value which is used when estimating lambda to obtain a subset of SNPs which have absolute z-statistics for both exposure and outcome GWASs less than this value. The method then assumes that both of the true SNP-outcome and SNP-exposure effect sizes of each SNP in this subset are approximately 0. The default setting is |
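The arguments n.exposure, n.outcome, n.overlap and cor.xy describe the quantities that drive lambda. As a rough illustration, the sketch below uses a commonly quoted approximation for the correlation between summary statistics of two GWASs that share samples. Whether mr_simss uses exactly this expression internally is an assumption, and lambda_from_overlap is a hypothetical helper, not a package function.

```r
# Sketch only: a standard approximation, assumed rather than taken from mr.simss.
lambda_from_overlap <- function(n.exposure, n.outcome, n.overlap, cor.xy) {
  stopifnot(n.overlap <= min(n.exposure, n.outcome), abs(cor.xy) <= 1)
  # Shared samples scaled by the geometric mean of the two sample sizes
  n.overlap * cor.xy / sqrt(n.exposure * n.outcome)
}

lambda_none    <- lambda_from_overlap(50000, 80000, 0,     0.4)  # no shared samples
lambda_partial <- lambda_from_overlap(50000, 80000, 20000, 0.4)  # partial overlap
lambda_full    <- lambda_from_overlap(50000, 50000, 50000, 0.4)  # full overlap
```

Under this approximation, lambda is 0 when there is no overlap and equals cor.xy under full overlap of equally sized GWASs. The defaults shown in the usage above (n.exposure = 1, n.outcome = 1, n.overlap = 1, cor.xy = 0) would also give 0.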
A list containing two elements, summary and results. summary is a data frame with one row which outputs b, the estimated causal effect of exposure on outcome obtained using the MR-SimSS method, as well as se, the associated standard error of this estimate, and pval, the corresponding p-value. It also contains the MR method used, the average number of instrument SNPs used in each iteration and the number of iterations used. results is a data frame which contains the output from each iteration. It is in a similar style to the output from using the function mr from the TwoSampleMR R package.
See https://amandaforde.github.io/mr.simss/articles/perform-MR-SimSS.html for an illustration of the use of mr_simss with a toy data set and further information regarding this MR method.
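To make the iteration structure concrete, the following base R sketch repeats a simulated 2 split many times on toy data and averages the per-iteration estimates. It is an illustration under stated assumptions, not the package's code: the data are simulated, lambda is taken to be 0, a plain fixed-effect IVW estimate stands in for the chosen MR method, and one_iteration is a hypothetical helper. How the package combines standard errors across iterations is not shown here.

```r
# Sketch only: repeated simulated 2 split with IVW on simulated data.
set.seed(42)

n_snps <- 5000
true_effect <- 0.3                                       # assumed causal effect
gamma  <- c(rnorm(200, 0, 0.05), rep(0, n_snps - 200))   # true SNP-exposure effects
se_x   <- rep(0.005, n_snps)
se_y   <- rep(0.005, n_snps)
beta_x <- gamma + rnorm(n_snps, 0, se_x)
beta_y <- true_effect * gamma + rnorm(n_snps, 0, se_y)

one_iteration <- function(pi = 0.5, threshold = 5e-08) {
  # Simulate what a fraction pi of the sample would have estimated
  k   <- sqrt((1 - pi) / pi)
  bx1 <- beta_x + k * se_x * rnorm(n_snps)
  by1 <- beta_y + k * se_y * rnorm(n_snps)   # independent noise, i.e. lambda = 0
  # Estimates implied for the remaining fraction 1 - pi
  bx2   <- (beta_x - pi * bx1) / (1 - pi)
  by2   <- (beta_y - pi * by1) / (1 - pi)
  se_y2 <- se_y / sqrt(1 - pi)
  # Select instruments in the first fraction only
  p_sel <- 2 * pnorm(-abs(bx1 / (se_x / sqrt(pi))))
  iv    <- which(p_sel < threshold)
  # Fixed-effect IVW estimate in the second fraction
  w <- 1 / se_y2[iv]^2
  c(b = sum(w * bx2[iv] * by2[iv]) / sum(w * bx2[iv]^2), nsnp = length(iv))
}

n.iter  <- 200
results <- as.data.frame(t(replicate(n.iter, one_iteration())))
summary_row <- data.frame(b = mean(results$b),
                          nsnp = mean(results$nsnp),
                          n.iter = n.iter)
```

Selecting instruments in one simulated fraction and estimating in the other keeps the selection step independent of the estimates used for MR, which is the mechanism by which sample splitting addresses Winner's Curse.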
split2 is a function which is used by the main function, mr_simss, in order to perform the 2 split approach of the method, MR-SimSS.
split2(data, lambda.val = 0, pi = 0.5, mr_method = "mr_ivw", threshold = 5e-08)
data | A data frame to be inputted by the user containing summary statistics from the exposure and outcome GWASs. It must have at least five columns with column names |
lambda.val | A numerical value which is computed within the main function, |
pi | A numerical value which determines the fraction of the first split. This is the fraction that will be used for SNP selection. The default setting is |
mr_method | A string which specifies the MR method that MR-SimSS works in combination with. It is possible to use any method outputted in the list |
threshold | A numerical value which specifies the threshold used to select instrument SNPs for MR at each iteration. The default setting is |
A data frame which contains the output from one iteration. It is in a similar style to the output from using the function mr from the TwoSampleMR R package. The MR method used, the number of instrument SNPs, the causal effect estimate, its associated standard error and p-value are all outputted.
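A single simulated 2 split can be sketched in base R as follows. This is not the package's split2: simulate_two_split is a hypothetical helper, the input vectors are simulated, and the way lambda.val enters (as the correlation between the simulated exposure and outcome noise) is an assumption about how such a split could respect sample overlap.

```r
# Sketch only: one simulated 2 split with correlated noise, not mr.simss code.
set.seed(7)

simulate_two_split <- function(beta_x, se_x, beta_y, se_y,
                               lambda.val = 0, pi = 0.5) {
  n   <- length(beta_x)
  # Standard normal noise for exposure and outcome with correlation lambda.val
  e_x <- rnorm(n)
  e_y <- lambda.val * e_x + sqrt(1 - lambda.val^2) * rnorm(n)
  # Given the full-sample estimate, the fraction-pi estimate has
  # conditional variance se^2 * (1 - pi) / pi
  k   <- sqrt((1 - pi) / pi)
  bx1 <- beta_x + k * se_x * e_x
  by1 <- beta_y + k * se_y * e_y
  data.frame(
    bx1 = bx1, by1 = by1,
    # The remaining fraction follows from full = pi * first + (1 - pi) * second
    bx2 = (beta_x - pi * bx1) / (1 - pi),
    by2 = (beta_y - pi * by1) / (1 - pi),
    se_x1 = se_x / sqrt(pi),
    se_x2 = se_x / sqrt(1 - pi),
    se_y2 = se_y / sqrt(1 - pi)
  )
}

n <- 10000
beta_x <- rnorm(n, 0, 0.02); se_x <- rep(0.01, n)
beta_y <- rnorm(n, 0, 0.02); se_y <- rep(0.01, n)

sp <- simulate_two_split(beta_x, se_x, beta_y, se_y, lambda.val = 0.4, pi = 0.5)

# Instruments are chosen from the first fraction only
instruments <- which(2 * pnorm(-abs(sp$bx1 / sp$se_x1)) < 5e-08)

# The simulated noise should carry roughly the requested correlation
noise_cor <- cor(sp$bx1 - beta_x, sp$by1 - beta_y)
```

The two simulated fractions recombine exactly to the original estimates, so no information is created or lost by the split; only the allocation between selection and estimation changes from one iteration to the next.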
split3 is a function which is used by the main function, mr_simss, in order to perform the 3 split approach of the method, MR-SimSS.
split3( data, lambda.val = 0, pi = 0.5, pi2 = 0.5, mr_method = "mr_ivw", threshold = 5e-08 )
data | A data frame to be inputted by the user containing summary statistics from the exposure and outcome GWASs. It must have at least five columns with column names |
lambda.val | A numerical value which is computed within the main function, |
pi | A numerical value which determines the fraction of the first split. This is the fraction that will be used for SNP selection. The default setting is |
pi2 | A numerical value which determines the fraction of the second split. The default setting is |
mr_method | A string which specifies the MR method that MR-SimSS works in combination with. It is possible to use any method outputted in the list |
threshold | A numerical value which specifies the threshold used to select instrument SNPs for MR at each iteration. The default setting is |
A data frame which contains the output from one iteration. It is in a similar style to the output from using the function mr from the TwoSampleMR R package. The MR method used, the number of instrument SNPs, the causal effect estimate, its associated standard error and p-value are all outputted.
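A three-way split can be simulated by applying a two-way split twice, as in the base R sketch below. It makes an assumption that should be checked against the package: pi2 is treated here as the fraction of the remainder left after the first split, not of the whole sample. split_once and the toy inputs are hypothetical, and the handling of lambda.val is omitted for brevity.

```r
# Sketch only: sequential three-way split of one vector of estimates.
set.seed(11)

# Split an estimate b (standard error s) into a fraction `frac` and the rest
split_once <- function(b, s, frac) {
  first <- b + sqrt((1 - frac) / frac) * s * rnorm(length(b))
  list(first = first,
       se_first = s / sqrt(frac),
       rest = (b - frac * first) / (1 - frac),
       se_rest = s / sqrt(1 - frac))
}

n    <- 1000
beta <- rnorm(n, 0, 0.02)   # toy full-sample estimates
se   <- rep(0.01, n)
pi   <- 0.5
pi2  <- 0.5

s1 <- split_once(beta, se, pi)               # fraction used for SNP selection
s2 <- split_once(s1$rest, s1$se_rest, pi2)   # split what is left a second time

b1 <- s1$first; b2 <- s2$first; b3 <- s2$rest

# Fractions of the whole sample under the stated assumption about pi2
f1 <- pi
f2 <- (1 - pi) * pi2
f3 <- (1 - pi) * (1 - pi2)

recombined <- f1 * b1 + f2 * b2 + f3 * b3
```

With three fractions available, SNP selection, exposure estimation and outcome estimation can each draw on a different simulated subsample, which is one reason a 3 split is recommended in the presence of overlap.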