Title: | R Package for MR-PATH |
---|---|
Description: | This package implements methods for fitting the MR-PATH model. |
Authors: | Daniel Iong |
Maintainer: | Daniel <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-12-30 03:35:18 UTC |
Source: | https://github.com/remlapmot/MRPATH |
Mendelian Randomization (MR) is a popular method in epidemiology and genetics that uses genetic variation as instrumental variables for causal inference. This package implements methods for fitting the MR-PATH model from (Iong et al. 2020).
Daniel Iong, [email protected]
Qingyuan Zhao, [email protected]
Yang Chen, [email protected]
Maintainer: Daniel Iong <[email protected]>
Iong D, Zhao Q, Chen Y (2020). “A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization.” 2007.06476.
This dataset is created from three genome-wide association studies using the three-sample summary-data MR design (Zhao et al. 2019):
Selection: Akiyama et al. (2017)
Exposure: Locke et al. (2015)
Outcome: Mahajan et al. (2018)
The 60 SNPs selected are independent (distance mega base pairs,
in a reference panel) and are associated with T2D (p-value less than
).
data(bmi_t2d)
data(bmi_t2d)
A data.frame
with 60 rows and 6 variables.
Akiyama M, Okada Y, Kanai M, Takahashi A, Momozawa Y, Ikeda M, Iwata N, Ikegawa S, Hirata M, Matsuda K, others (2017).
“Genome-wide association study identifies 112 new loci for body mass index in the Japanese population.”
Nature Genetics, 49(10), 1458.
Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, others (2015).
“Genetic studies of body mass index yield new insights for obesity biology.”
Nature, 518(7538), 197–206.
Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, Payne AJ, Steinthorsdottir V, Scott RA, Grarup N, others (2018).
“Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps.”
Nature genetics, 50(11), 1505–1513.
Zhao Q, Chen Y, Wang J, Small DS (2019).
“Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization.”
International Journal of Epidemiology, 48(5), 1478–1492.
ISSN 14643685, doi:10.1093/ije/dyz142, https://academic.oup.com/ije/advance-article-abstract/doi/10.1093/ije/dyz142/5531250.
Computes SNP-specific cluster membership probabilities from the MR-PATH model.
computeClusterMembProb(data, Nsamples = 50000, impt_samples = NULL, MCEM_fit = NULL)
computeClusterMembProb(data, Nsamples = 50000, impt_samples = NULL, MCEM_fit = NULL)
data |
A data frame. (see Details in |
Nsamples |
Number of desired samples. |
impt_samples |
Importance samples from |
MCEM_fit |
MC-EM fit from |
If data
contains a column named SNP
, the rows of the output matrix will be labeled by SNP name.
Returns a p
by K
matrix containing cluster membership probabilities.
https://arxiv.org/abs/2007.06476
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Compute SNP-specific cluster membership probabilities clustermemb_prob = computeClusterMembProb(hdl_chd, MCEM_fit = MCEM_fit)
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Compute SNP-specific cluster membership probabilities clustermemb_prob = computeClusterMembProb(hdl_chd, MCEM_fit = MCEM_fit)
Obtain importance samples of SNP-specific latent variables in the MR-PATH model
getImportanceSamples(data, MCEM_fit, Nsamples = 50000)
getImportanceSamples(data, MCEM_fit, Nsamples = 50000)
Nsamples |
Number of desired samples |
data |
A data frame. (see Details in |
MCEM_fit |
Output from |
thetaX |
|
beta |
|
alpha |
|
W |
|
sampleBetas
, computeClusterMembProb
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Obtain importance samples impt_samples = getImportanceSamples(hdl_chd, MCEM_fit)
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Obtain importance samples impt_samples = getImportanceSamples(hdl_chd, MCEM_fit)
This dataset is created from three genome-wide association studies using the three-sample summary-data MR design (Zhao et al. 2019):
Selection: GWAS of HDL-C by Teslovich et al. (2010)
Exposure: GWAS of lipoprotein subfractions by Kettunen et al. (2016)
Outcome: The CARDIoGRAMplusC4D with 1000 Genome Project imputation GWAS of CAD (Nikpay et al. 2015).
The 151 SNPs selected are independent (distance mega base pairs,
in a reference panel) and are associated with at least one plasma lipid trait (the minimum p-value with HDl-C, LDL-C, and triglycerides is less than
).
data(hdl_chd)
data(hdl_chd)
A data.frame
with 151 rows and 6 variables.
Kettunen J, Demirkan A, Würtz P, Draisma HH, Haller T, Rawal R, Vaarhorst A, Kangas AJ, Lyytikäinen L, Pirinen M, others (2016).
“Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA.”
Nature communications, 7(1), 1–9.
Nikpay M, Goel A, Won H, Hall LM, Willenborg C, Kanoni S, Saleheen D, Kyriakou T, Nelson CP, Hopewell JC, others (2015).
“A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease.”
Nature Genetics, 47(10), 1121.
Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, others (2010).
“Biological, clinical and population relevance of 95 loci for blood lipids.”
Nature, 466(7307), 707–713.
Zhao Q, Chen Y, Wang J, Small DS (2019).
“Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization.”
International Journal of Epidemiology, 48(5), 1478–1492.
ISSN 14643685, doi:10.1093/ije/dyz142, https://academic.oup.com/ije/advance-article-abstract/doi/10.1093/ije/dyz142/5531250.
Fits MR-PATH model with MC-EM algorithm.
MR_PATH(K, data, initVals, overDispersedY = FALSE, equalSds = FALSE, computeSE = TRUE, Nstart_MC = 500L, M = 4L, max_Nsamples = 500000L, min_iters = 2L, max_iters = 100L, alpha = 0.05, gamma = 0.05, eps = 0.005, verbose = FALSE, saveTraj = FALSE)
MR_PATH(K, data, initVals, overDispersedY = FALSE, equalSds = FALSE, computeSE = TRUE, Nstart_MC = 500L, M = 4L, max_Nsamples = 500000L, min_iters = 2L, max_iters = 100L, alpha = 0.05, gamma = 0.05, eps = 0.005, verbose = FALSE, saveTraj = FALSE)
K |
Number of desired clusters (see Details). |
data |
A data frame (see Details) |
initVals |
List of initial values (see Details). |
overDispersedY |
If |
equalSds |
If |
computeSE |
If |
Nstart_MC |
Initial Monte-Carlo sample size for E-step MC approximation. Default is |
M |
Geometric rate at which MC sample size is increased. Default is |
max_Nsamples |
Max MC sample size for E-step MC approximation. Default is |
min_iters |
Min. number of iterations. Default is |
max_iters |
Max number of iterations. Default is |
alpha |
Threshold for Type 1 error rate in E-step approximation. Default is |
gamma |
Threshold for Type 1 error rate in convergence criterion. Default is |
eps |
Threshold for convergence criterion. Default is |
verbose |
If |
saveTraj |
If |
K
can be chosen by the user or by the modified BIC criterion with MRPATH_selectModel
.
The input data frame must contain the following variables:
beta.exposure
beta.outcome
se.exposure
se.outcome
initVals
must be a list with the following elements:
m_X
lambdaX
pi
mus
sds
Since the MC-EM algorithm reaches local optima, it is recommended to run MR_PATH
with a couple different initial values and pick the fit with the highest completeDataLogLik
. One way to pick initial values is to first visualize the data with MRPATH_scatterplot
. For automatic initial value selection, see MRPATH_optimizeInitVals
.
MR_PATH
returns the following information about convergence of the MC-EM algorithm:
Niters
: Number of iterations it takes to converge.
N_MC_end
: Number of MC samples at convergence.
completeDataLogLik
: complete-data log-likelihood value at convergence.
Returns a list containing
paramEst |
List of parameter estimates. |
standardErrors |
Vector of standard errors. |
convergenceInfo |
List of information about convergence of MC-EM (see Details). |
paramTraj |
If |
Iong D, Zhao Q, Chen Y (2020). “A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization.” 2007.06476.
MRPATH_optimizeInitVals
, MRPATH_selectModel
,
MRPATH_scatterplot
require(MRPATH) ### Load HDL-CHD data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals)
require(MRPATH) ### Load HDL-CHD data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals)
Plots a barplot for credible intervals and cluster membership probabilities with SNPs ordered by median of SNP-specific causal effects.
MRPATH_barplot(data, MCEM_fit, ret.snps = FALSE)
MRPATH_barplot(data, MCEM_fit, ret.snps = FALSE)
data |
A data frame. (see Details in |
MCEM_fit |
Output from |
ret.snps |
If |
Returns a 95% credible interval plot (top) and cluster membership probabilities barplot (bottom).
https://arxiv.org/abs/2007.06476
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Plot barplots MRPATH_barplot(hdl_chd, MCEM_fit)
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Plot barplots MRPATH_barplot(hdl_chd, MCEM_fit)
Runs MR_PATH
multiple times and picks fit with highest complete-data log-likelihood.
MRPATH_optimizeInitVals(K, data, Nreps = 10, verbose = FALSE, altModel = FALSE, init_seed = 8686, ...)
MRPATH_optimizeInitVals(K, data, Nreps = 10, verbose = FALSE, altModel = FALSE, init_seed = 8686, ...)
K |
Number of desired clusters (see Details). |
data |
A data frame. (see Details in |
Nreps |
Number of repetitions to run |
verbose |
If |
altModel |
If |
init_seed |
Initial seed (for reproducibility). |
... |
Additional parameters to be passed to |
K
can be chosen by the user or by the modified BIC criterion with MRPATH_selectModel
.
Returns a list with
fit |
MC-EM fit with optimal initial values. |
initVals |
Optimal initial values. |
Iong D, Zhao Q, Chen Y (2020). “A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization.” 2007.06476.
require(MRPATH) ### Load HDL-CHD data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 Nreps = 10 # Testing # Nreps = 30 # Takes longer, but more stable results verbose = TRUE # Print output initValsObj = MRPATH_optimizeInitVals(K = K, data = hdl_chd, Nreps = Nreps, verbose=verbose) # See optimized initial values initValsObj$initVals # See MC-EM fit with optimized initial values initValsObj$fit
require(MRPATH) ### Load HDL-CHD data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 Nreps = 10 # Testing # Nreps = 30 # Takes longer, but more stable results verbose = TRUE # Print output initValsObj = MRPATH_optimizeInitVals(K = K, data = hdl_chd, Nreps = Nreps, verbose=verbose) # See optimized initial values initValsObj$initVals # See MC-EM fit with optimized initial values initValsObj$fit
Scatterplot with error bars and results from MR_PATH
.
MRPATH_scatterplot(data, MCEM_fit = NULL, exposure_name = "exposure", outcome_name = "outcome", overDispersedY = FALSE, interactive = TRUE)
MRPATH_scatterplot(data, MCEM_fit = NULL, exposure_name = "exposure", outcome_name = "outcome", overDispersedY = FALSE, interactive = TRUE)
data |
A data frame. (see Details in |
MCEM_fit |
Output from |
exposure_name |
Name of risk exposure variable. |
outcome_name |
Name of disease outcome variable. |
overDispersedY |
If |
interactive |
Logical, if plot is interactive. Default |
Each point is colored according to the cluster with the highest cluster membership probability. Solid lines represent mus
and shaded regions represent 68% interval for each cluster.
Returns a ggplot
scatterplot with error bars and results from MR_PATH
.
https://arxiv.org/abs/2007.06476
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Plot data without MC-EM fit results MRPATH_scatterplot(hdl_chd) ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Plot scatterplot with MC-EM fit results MRPATH_scatterplot(hdl_chd, MCEM_fit)
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Plot data without MC-EM fit results MRPATH_scatterplot(hdl_chd) ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) ### Plot scatterplot with MC-EM fit results MRPATH_scatterplot(hdl_chd, MCEM_fit)
Runs MR_PATH
multiple times and picks fit with highest complete-data log-likelihood.
MRPATH_selectModel(data, K_range = 1:3, Nreps = 20, altModel = FALSE, verbose=FALSE, ...)
MRPATH_selectModel(data, K_range = 1:3, Nreps = 20, altModel = FALSE, verbose=FALSE, ...)
data |
A data frame. (see Details in |
K_range |
Range of |
Nreps |
Number of repetitions for |
altModel |
If |
verbose |
If |
... |
Additional parameters to be passed to |
Returns a list with
bestK |
|
bestFit |
MC-EM fit with highest |
Q |
Vector of complete-data log-likelihoods for each |
BIC |
Vector of BIC values for each |
https://arxiv.org/abs/2007.06476
MR_PATH
, MRPATH_optimizeInitVals
require(MRPATH) ### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] Nreps = 10 # Testing # Nrep = 30 # Takes longer, but more stable results verbose = TRUE # Print output K_range = 1:3 # Set range of K modSelectionObj = MRPATH_selectModel(hdl_chd, K_range = K_range, Nreps = Nreps, verbose = verbose) # See optimal K modSelectionObj$bestK # See optimal fit modSelectionObj$bestFit # See vector of complete-data log-likelihood values modSelectionObj$Q # See vector of BIC values modSelectionObj$BIC
require(MRPATH) ### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] Nreps = 10 # Testing # Nrep = 30 # Takes longer, but more stable results verbose = TRUE # Print output K_range = 1:3 # Set range of K modSelectionObj = MRPATH_selectModel(hdl_chd, K_range = K_range, Nreps = Nreps, verbose = verbose) # See optimal K modSelectionObj$bestK # See optimal fit modSelectionObj$bestFit # See vector of complete-data log-likelihood values modSelectionObj$Q # See vector of BIC values modSelectionObj$BIC
Re-samples importance samples to obtain samples of SNP-specific causal effects in MR-PATH
sampleBetas(data, Nsamples = 50000, impt_samples = NULL, MCEM_fit = NULL)
sampleBetas(data, Nsamples = 50000, impt_samples = NULL, MCEM_fit = NULL)
data |
A data frame. (see Details in |
Nsamples |
Number of desired samples. |
impt_samples |
Importance samples from |
MCEM_fit |
MC-EM fit from |
If data
contains a column named SNP
, the rows of the output matrix will be labeled by SNP name.
Returns a p
by Nsamples
matrix containing samples of SNP-specific causal effects.
https://arxiv.org/abs/2007.06476
MR_PATH
, computeClusterMembProb
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) # Sample SNP-specific causal effects beta_samples = sampleBetas(hdl_chd, MCEM_fit = MCEM_fit)
### Load data data(hdl_chd) ### Filter weak instruments hdl_chd = hdl_chd[hdl_chd$pval.selection < 5e-8,] ### Set your own K. # For data-driven model selection, use MRPATH_selectModel K = 2 ### Set your own initial values. # For automatic initial value selection, use MRPATH_optimizeInitVals initVals = list("m_X" = mean(hdl_chd$beta.exposure), "lambdaX" = sd(hdl_chd$beta.exposure), "pis" = rep(1/K, K), "mus" = c(-.9, .4), "sds" = c(.9, .4)) ### Run MC-EM algorithm MCEM_fit = MR_PATH(K, hdl_chd, initVals) # Sample SNP-specific causal effects beta_samples = sampleBetas(hdl_chd, MCEM_fit = MCEM_fit)