Title: | gsmr2 - a tool for causal inference between complex traits |
---|---|
Description: | GSMR2 (Generalised Summary-data-based Mendelian Randomisation v2) is an improved version of GSMR, which uses GWAS summary statistics to test for a putative causal association between two phenotypes (e.g., a modifiable risk factor and a disease) based on a multi-SNP model. This version implements a global heterogeneity test to remove invalid instrumental variables and provides a causal estimation that is more robust to directional pleiotropy. |
Authors: | Zhihong Zhu, Angli Xue, Zhili Zheng, Futao Zhang, Jian Yang |
Maintainer: | Zhihong Zhu <[email protected]>, Angli Xue <[email protected]>, Jian Yang <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 1.1.1 |
Built: | 2025-01-07 04:29:53 UTC |
Source: | https://github.com/jianyanglab/gsmr2 |
Perform Generalized Summary-data-based Mendelian Randomization analysis (GSMR) and HEterogeneity In Dependent Instruments analysis to remove pleiotropic outliers (HEIDI-outlier).
Zhihong Zhu <[email protected]>
Zhili Zheng <[email protected]>
Futao Zhang <[email protected]>
Jian Yang <[email protected]>
Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nature Communications, in press. An early version of the manuscript is available at bioRxiv, 168674.
Bi-directional GSMR analysis is composed of a forward-GSMR analysis and a reverse-GSMR analysis. The reverse analysis uses SNPs associated with the disease (e.g. at p < 5e-8) as the instruments to test for a putative causal effect of the disease on the risk factor.
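The difference between the two directions comes down to which p-values are used to pick the instruments. The following is a minimal sketch in base R, using made-up summary statistics rather than the package data, and it does not reproduce the internal logic of bi_gsmr:

```r
# Toy summary statistics for five SNPs (made-up values for illustration only).
snpid    <- c("rs1", "rs2", "rs3", "rs4", "rs5")
bzx_pval <- c(1e-12, 3e-9, 0.02, 0.40, 2e-8)   # SNP p-values for the risk factor
bzy_pval <- c(0.03, 0.50, 4e-10, 1e-9, 0.70)   # SNP p-values for the disease
gwas_thresh <- 5e-8

# Forward analysis: instruments are SNPs associated with the risk factor.
forward_snps <- snpid[bzx_pval < gwas_thresh]
# Reverse analysis: instruments are SNPs associated with the disease.
reverse_snps <- snpid[bzy_pval < gwas_thresh]

print(forward_snps)
print(reverse_snps)
```

In this toy example the two instrument sets do not overlap, which is the usual motivation for running the two directions separately.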
bi_gsmr(bzx, bzx_se, bzx_pval, bzy, bzy_se, bzy_pval, ldrho, snpid, n_ref, heidi_outlier_flag=T, gwas_thresh=5e-8, single_snp_heidi_thresh=0.01, multi_snp_heidi_thresh=0.01, nsnps_thresh=10, ld_r2_thresh=0.05, ld_fdr_thresh=0.05, gsmr2_beta=0)
bzx | vector, SNP effects on risk factor |
bzx_se | vector, standard errors of bzx |
bzx_pval | vector, p-values for bzx |
bzy | vector, SNP effects on disease |
bzy_se | vector, standard errors of bzy |
bzy_pval | vector, p-values for bzy |
ldrho | LD correlation matrix of the SNPs |
snpid | genetic instruments |
n_ref | sample size of the reference sample |
heidi_outlier_flag | flag for HEIDI-outlier analysis |
gwas_thresh | p-value threshold to select instruments from the GWAS for the risk factor |
single_snp_heidi_thresh | p-value threshold for single-SNP-based HEIDI-outlier analysis |
multi_snp_heidi_thresh | p-value threshold for multi-SNP-based HEIDI-outlier analysis |
nsnps_thresh | the minimum number of instruments required for the GSMR analysis (we do not recommend setting this number below 10) |
ld_r2_thresh | LD r2 threshold to remove SNPs in high LD |
ld_fdr_thresh | FDR threshold to remove the chance correlations between SNP instruments |
gsmr2_beta | selects the HEIDI-outlier method used in the GSMR analysis: 0 - the original HEIDI-outlier method; 1 - the new HEIDI-outlier method (GSMR2 beta version, currently under development and subject to future changes) |
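To illustrate what an LD r2 threshold such as ld_r2_thresh does, here is a minimal sketch of greedy LD pruning on a toy correlation matrix. The helper prune_by_ld is hypothetical and written for this example only; the package may select which SNP of a correlated pair to keep in a different way.

```r
# Toy LD correlation matrix for four SNPs (made-up values).
snpid <- c("rs1", "rs2", "rs3", "rs4")
ldrho <- matrix(c(1.0, 0.6, 0.1, 0.0,
                  0.6, 1.0, 0.2, 0.1,
                  0.1, 0.2, 1.0, 0.5,
                  0.0, 0.1, 0.5, 1.0),
                nrow = 4, byrow = TRUE, dimnames = list(snpid, snpid))
ld_r2_thresh <- 0.05

# Hypothetical helper: keep a SNP only if its r^2 with every SNP already kept
# is at or below the threshold. SNPs are visited in the order given.
prune_by_ld <- function(ldrho, r2_thresh) {
  keep <- character(0)
  for (snp in rownames(ldrho)) {
    if (all(ldrho[snp, keep]^2 <= r2_thresh)) keep <- c(keep, snp)
  }
  keep
}

kept <- prune_by_ld(ldrho, ld_r2_thresh)
print(kept)
```

Here rs2 is dropped because of its correlation with rs1 (r2 = 0.36), and rs4 because of its correlation with rs3 (r2 = 0.25).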
Estimate of the causal effect of the risk factor on the disease (forward_bxy), the corresponding standard error (forward_bxy_se), p-value (forward_bxy_pval) and SNP index (forward_index); estimate of the causal effect of the disease on the risk factor (reverse_bxy), the corresponding standard error (reverse_bxy_se), p-value (reverse_bxy_pval) and SNP index (reverse_index); and the SNPs with missing values, with non-significant p-values and those in LD.
data("gsmr")
gsmr_result = bi_gsmr(gsmr_data$bzx, gsmr_data$bzx_se, gsmr_data$bzx_pval, gsmr_data$bzy, gsmr_data$bzy_se, gsmr_data$bzy_pval, ldrho, gsmr_data$SNP, n_ref, TRUE, 5e-8, 0.01, 0.01, 10, 0.05, 0.05, 0)
GSMR (Generalised Summary-data-based Mendelian Randomisation) is a flexible and powerful approach that utilises multiple genetic instruments to test for causal association between a risk factor and disease using summary-level data from independent genome-wide association studies.
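As a rough illustration of the multi-SNP idea, the sketch below computes per-SNP ratio estimates and combines them by inverse-variance weighting. This is a simplified stand-in that assumes independent instruments and uses made-up numbers; GSMR itself additionally accounts for LD between instruments, so the code below is not the package's estimator.

```r
# Toy, approximately independent instruments (made-up values).
bzx    <- c(0.10, 0.08, 0.12)      # SNP effects on the risk factor
bzx_se <- c(0.010, 0.010, 0.012)
bzy    <- c(0.050, 0.036, 0.066)   # SNP effects on the disease
bzy_se <- c(0.020, 0.018, 0.022)

# Per-SNP ratio estimate of the effect of the risk factor on the disease.
bxy <- bzy / bzx
# First-order (delta-method) approximation of its sampling variance,
# ignoring any covariance between bzx and bzy.
bxy_var <- (bzy^2 / bzx^2) * (bzx_se^2 / bzx^2 + bzy_se^2 / bzy^2)

# Inverse-variance weighted combination (valid only for independent SNPs).
w          <- 1 / bxy_var
bxy_ivw    <- sum(w * bxy) / sum(w)
bxy_ivw_se <- sqrt(1 / sum(w))
bxy_ivw_p  <- pchisq((bxy_ivw / bxy_ivw_se)^2, df = 1, lower.tail = FALSE)

print(c(bxy = bxy_ivw, se = bxy_ivw_se, p = bxy_ivw_p))
```

The combined estimate always lies between the smallest and largest per-SNP ratio, and its standard error shrinks as more instruments are added.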
gsmr(bzx, bzx_se, bzx_pval, bzy, bzy_se, ldrho, snpid, n_ref, heidi_outlier_flag=T, gwas_thresh=5e-8, single_heidi_thresh=0.01, multi_heidi_thresh=0.01, nsnps_thresh=10, ld_r2_thresh=0.05, ld_fdr_thresh=0.05, gsmr2_beta=0)
bzx | vector, SNP effects on risk factor |
bzx_se | vector, standard errors of bzx |
bzx_pval | vector, p-values for bzx |
bzy | vector, SNP effects on disease |
bzy_se | vector, standard errors of bzy |
ldrho | LD correlation matrix of the SNPs |
snpid | genetic instruments |
n_ref | sample size of the reference sample |
heidi_outlier_flag | flag for HEIDI-outlier analysis |
gwas_thresh | p-value threshold to select instruments from the GWAS for the risk factor |
single_heidi_thresh | p-value threshold for single-SNP-based HEIDI-outlier analysis |
multi_heidi_thresh | p-value threshold for multi-SNP-based HEIDI-outlier analysis |
nsnps_thresh | the minimum number of instruments required for the GSMR analysis (we do not recommend setting this number below 10) |
ld_r2_thresh | LD r2 threshold to remove SNPs in high LD |
ld_fdr_thresh | FDR threshold to remove the chance correlations between SNP instruments |
gsmr2_beta | selects the HEIDI-outlier method used in the GSMR analysis: 0 - the original HEIDI-outlier method; 1 - the new HEIDI-outlier method (GSMR2 beta version, currently under development and subject to future changes) |
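The n_ref and ld_fdr_thresh arguments relate to sampling noise in the LD correlations estimated from a finite reference sample. One plausible way to apply such a threshold, shown here purely as an assumption and not taken from the package source, is to test each correlation against zero with an approximate chi-squared statistic (n_ref * r^2, 1 df), control the FDR, and set non-significant correlations to zero. All values below are made up.

```r
snpid <- c("rs1", "rs2", "rs3")
ldrho <- matrix(c(1.00, 0.20, 0.01,
                  0.20, 1.00, 0.02,
                  0.01, 0.02, 1.00),
                nrow = 3, byrow = TRUE, dimnames = list(snpid, snpid))
n_ref <- 5000          # hypothetical reference sample size
ld_fdr_thresh <- 0.05

# Test each off-diagonal correlation against zero (assumed test statistic).
upper <- upper.tri(ldrho)
pval  <- pchisq(n_ref * ldrho[upper]^2, df = 1, lower.tail = FALSE)
qval  <- p.adjust(pval, method = "fdr")

# Zero out correlations that are not significant after FDR control.
vals <- ldrho[upper]
vals[qval > ld_fdr_thresh] <- 0
ldrho_clean <- ldrho
ldrho_clean[upper] <- vals
# Mirror the upper triangle so the matrix stays symmetric.
ldrho_clean[lower.tri(ldrho_clean)] <- t(ldrho_clean)[lower.tri(ldrho_clean)]

print(ldrho_clean)
```

With these numbers only the rs1-rs2 correlation of 0.20 survives; the two small correlations are treated as chance findings and set to zero.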
Estimate of the causal effect of the risk factor on the disease (bxy), the corresponding standard error (bxy_se), p-value (bxy_pval) and SNP index (used_index), and the SNPs with missing values, with non-significant p-values and those in LD.
data("gsmr")
gsmr_result = gsmr(gsmr_data$bzx, gsmr_data$bzx_se, gsmr_data$bzx_pval, gsmr_data$bzy, gsmr_data$bzy_se, ldrho, gsmr_data$SNP, n_ref, TRUE, 5e-8, 0.01, 0.01, 10, 0.1, 0.05, 0)
Standardization of SNP effect and its standard error using z-statistic, allele frequency and sample size
std_effect(snp_freq, b, se, n)
snp_freq | vector, allele frequencies |
b | vector, SNP effects on risk factor |
se | vector, standard errors of b |
n | vector, per-SNP sample sizes for GWAS of the risk factor |
Standardised effect (b) and standard error (se)
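A commonly used way to standardise a SNP effect from its z-statistic, allele frequency p and sample size n is se = 1 / sqrt(2p(1-p)(n + z^2)) and b = z * se. The sketch below implements that formula under the assumption that it matches what std_effect does; the function name std_effect_sketch and the input values are made up for illustration.

```r
# Hypothetical re-implementation for illustration; not the package's code.
std_effect_sketch <- function(snp_freq, b, se, n) {
  z <- b / se
  # Standard error on the standardised scale: 1 / sqrt(2p(1-p)(n + z^2))
  std_se <- 1 / sqrt(2 * snp_freq * (1 - snp_freq) * (n + z^2))
  # The z-statistic is preserved, so the standardised effect is z * se.
  list(b = z * std_se, se = std_se)
}

res <- std_effect_sketch(snp_freq = c(0.30, 0.50),
                         b  = c(0.20, -0.10),
                         se = c(0.04, 0.05),
                         n  = c(10000, 10000))
print(res)
```

Because the z-statistic is unchanged, the p-value of each SNP is the same before and after standardisation; only the scale of the effect changes.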
data("gsmr")
std_effects = std_effect(gsmr_data$a1_freq, gsmr_data$bzx, gsmr_data$bzx_se, gsmr_data$bzx_n)