| Title: | Causal Network Inference via Mendelian Randomization with cML and Network Deconvolution |
|---|---|
| Description: | Combines Mendelian randomization (constrained maximum likelihood, cML) and network deconvolution for inference of causal networks from GWAS summary data, as described in Lin, Xue, and Pan (2023) <doi:10.1371/journal.pgen.1010762>. |
| Authors: | Zhaotong Lin [aut, cre], Haoran Xue [aut], Wei Pan [aut] |
| Maintainer: | Zhaotong Lin <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-06-06 05:48:40 UTC |
| Source: | https://github.com/ZhaotongL/GraphMRcML |
Coordinate-descent estimation of the causal effect under the cML model with a fixed number of invalid IVs.
cML_estimate_O( b_exp, b_out, se_exp, se_out, K, initial_theta = 0, initial_mu = rep(0, length(b_exp)), maxit = 100, rho = 0, t = 0 )cML_estimate_O( b_exp, b_out, se_exp, se_out, K, initial_theta = 0, initial_mu = rep(0, length(b_exp)), maxit = 100, rho = 0, t = 0 )
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
K |
Integer; number of invalid IVs to allow. |
initial_theta |
Numeric; starting value for the causal effect. |
initial_mu |
Numeric vector; starting values for the SNP-exposure effects (length matching 'b_exp'). |
maxit |
Integer; maximum number of coordinate-descent iterations. |
rho |
Numeric; correlation between 'b_exp' and 'b_out' due to sample overlap. |
t |
Numeric; instrument-selection threshold on the z-scale. |
A list with elements 'theta' (estimated causal effect), 'b_vec' (estimated true SNP-exposure effects) and 'r_vec' (estimated invalid-IV effects).
Runs 'cML_estimate_O' from one or more random starting points and returns the solution achieving the smallest negative log-likelihood.
cML_estimate_random_O( b_exp, b_out, se_exp, se_out, K, random_start = 0, maxit = 100, rho = 0, t = 0, var_est = 1 )cML_estimate_random_O( b_exp, b_out, se_exp, se_out, K, random_start = 0, maxit = 100, rho = 0, t = 0, var_est = 1 )
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
K |
Integer; number of invalid IVs to allow. |
random_start |
Integer; number of additional random starts (in addition to the default zero start). |
maxit |
Integer; maximum number of coordinate-descent iterations. |
rho |
Correlation between 'b_exp' and 'b_out' due to sample overlap. |
t |
Instrument-selection threshold on the z-scale. |
var_est |
Integer in '1:4' selecting the variance estimator returned by 'cML_SdTheta_O': '1' cMLE, '2' cMLE robust, '3' MPLE, '4' MPLE robust. |
A list with elements 'theta', 'se', 'l' (negative log-likelihood) and 'r_est' (invalid-IV indicator vector).
Computes model-based and robust (sandwich) standard errors for both the constrained MLE and the profile MLE of the causal effect.
cML_SdTheta_O(b_exp, b_out, se_exp, se_out, theta, b_vec, r_vec, rho, t)cML_SdTheta_O(b_exp, b_out, se_exp, se_out, theta, b_vec, r_vec, rho, t)
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
theta |
Estimated causal effect from 'cML_estimate_O'. |
b_vec |
Estimated true SNP-exposure effects from 'cML_estimate_O'. |
r_vec |
Estimated invalid-IV effects from 'cML_estimate_O'. |
rho |
Correlation between 'b_exp' and 'b_out' due to sample overlap. |
t |
Instrument-selection threshold on the z-scale. |
A list with elements 'cMLE_se', 'cMLE_robust_se', 'MPLE_se' and 'MPLE_robust_se' (any of which may be 'NaN' if the corresponding variance is non-positive or non-invertible).
Adds correlated Gaussian noise to 'b_mat' using the per-block square-root matrices from 'Graph_Screen', preserving cross-trait correlation and within-block LD.
Generate_Perturb(b_mat, se_mat, n_vec, rho_mat, DP_mat_list)Generate_Perturb(b_mat, se_mat, n_vec, rho_mat, DP_mat_list)
b_mat |
Numeric matrix of GWAS effect estimates (rows = SNPs, columns = traits). |
se_mat |
Numeric matrix of standard errors matching 'b_mat'. |
n_vec |
Integer vector of GWAS sample sizes (one per trait). |
rho_mat |
'N x N' correlation matrix between GWAS estimates across traits. |
DP_mat_list |
List of LD-block perturbation matrices produced by 'Graph_Screen'. |
A numeric matrix of the same shape as 'b_mat' containing the perturbed effect estimates.
For every pair of traits '(i, j)', runs 'mr_cML_O' in both directions using the screened IV indices, fills in the corresponding off-diagonal entries of the observed-effect graph, and applies network deconvolution to obtain the direct-effect graph.
Graph_Estimate( b_mat, se_mat, n_vec, rho_mat, IJ_snp_list, t, random_start = 10 )Graph_Estimate( b_mat, se_mat, n_vec, rho_mat, IJ_snp_list, t, random_start = 10 )
b_mat |
Numeric matrix of GWAS effect estimates (rows = SNPs, columns = traits). |
se_mat |
Numeric matrix of standard errors matching 'b_mat'. |
n_vec |
Integer vector of GWAS sample sizes (one per trait). |
rho_mat |
'N x N' correlation matrix between GWAS estimates across traits. |
IJ_snp_list |
Per-pair list of screened IV indices, as returned by 'Graph_Screen'. |
t |
Instrument-selection threshold on the z-scale, passed through to 'mr_cML_O'. |
random_start |
Integer; number of random starts per cML fit. |
A list with 'obs_graph', 'obs_graph_se', 'obs_graph_pval' and the network-deconvolved 'dir_graph'.
Top-level routine: screens IVs once via 'Graph_Screen', then repeatedly generates LD-aware perturbed datasets and fits the pairwise causal-effect graph on each one. Perturbation replicates are run in parallel via [pbmcapply::pbmclapply()].
Graph_Perturb( b_mat, se_mat, n_vec, rho_mat, IV_list, R_list, c_vec = rep(1, length(n_vec)), sig.cutoff = 5e-08, num_pert = 100, random_start = 10, seed = 0, trait_vec = NULL, curse = F )Graph_Perturb( b_mat, se_mat, n_vec, rho_mat, IV_list, R_list, c_vec = rep(1, length(n_vec)), sig.cutoff = 5e-08, num_pert = 100, random_start = 10, seed = 0, trait_vec = NULL, curse = F )
b_mat |
Numeric matrix of GWAS effect estimates (rows = SNPs, columns = traits). Row names must be SNP rsids. |
se_mat |
Numeric matrix of standard errors matching 'b_mat'. |
n_vec |
Integer vector of GWAS sample sizes (one per trait). |
rho_mat |
'N x N' correlation matrix between GWAS estimates across traits. |
IV_list |
List of length 'N*(N-1)/2' giving candidate IV rsids for each trait pair. |
R_list |
List of LD blocks (see 'Graph_Screen'). |
c_vec |
Numeric vector of LDSC inflation factors (length 'N'). |
sig.cutoff |
Numeric; GWAS-significance threshold for instrument selection. |
num_pert |
Integer; number of data-perturbation replicates. |
random_start |
Integer; number of random starts per cML fit. |
seed |
Integer; RNG seed. |
trait_vec |
Optional character vector of trait names; defaults to 'colnames(b_mat)'. |
curse |
Logical; if 'TRUE', applies the "curse-of-winner" correction by setting the cML threshold to '-qnorm(sig.cutoff/2)'. |
A list of per-perturbation observed graphs, their standard errors and p-values, the deconvolved direct-effect graphs, and 'trait_vec'.
For each pair of traits, restricts the candidate IV set to SNPs that are GWAS-significant for the exposure (and not more strongly associated with the outcome), and pre-computes the LD-block square-root matrices used to generate joint perturbations of the GWAS estimates. Optionally inflates the standard errors using LDSC-style factors.
Graph_Screen( b_mat, se_mat, n_vec, IV_list, R_list, rho_mat, c_vec = rep(1, length(n_vec)), sig.cutoff = 5e-08 )Graph_Screen( b_mat, se_mat, n_vec, IV_list, R_list, rho_mat, c_vec = rep(1, length(n_vec)), sig.cutoff = 5e-08 )
b_mat |
Numeric matrix of GWAS effect estimates with one column per trait. Row names must be SNP rsids. |
se_mat |
Numeric matrix of standard errors matching 'b_mat'. |
n_vec |
Integer vector of GWAS sample sizes (one per trait). |
IV_list |
List of length 'N*(N-1)/2' giving candidate IV rsids for each trait pair, in order '(1,2), (1,3), ..., (N-1,N)'. |
R_list |
List of LD blocks. Each element has 'R' (an LD correlation matrix with rsid row/column names) and 'snp' (the rsids in the block). |
rho_mat |
'N x N' correlation matrix between GWAS estimates across traits (e.g. from bivariate LDSC). |
c_vec |
Numeric vector of inflation factors (length 'N') applied as ‘sqrt(max(1, c))' to each trait’s standard errors. |
sig.cutoff |
Numeric; GWAS-significance threshold for instrument selection. |
A list with the per-pair screened IV indices ('IJ_snp_list'), the per-block perturbation square-root matrices ('DP_mat_list') and the (possibly inflated) 'b_mat' and 'se_mat'.
Computes the (profile) log-likelihood used by the constrained maximum likelihood (cML) estimator for two-sample Mendelian randomization with overlapping samples.
loglik(b_exp, b_out, se_exp, se_out, b_t, theta_t, r_vec_t, rho, t = 0)loglik(b_exp, b_out, se_exp, se_out, b_t, theta_t, r_vec_t, rho, t = 0)
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
b_t |
Numeric vector of true SNP-exposure effects (length matching 'b_exp'). |
theta_t |
Numeric scalar; candidate causal effect. |
r_vec_t |
Numeric vector of invalid-IV pleiotropic effects. |
rho |
Numeric scalar; correlation between 'b_exp' and 'b_out' due to sample overlap. |
t |
Numeric scalar; significance threshold (on the z-scale) used for instrument selection. Defaults to '0'. |
A numeric scalar giving the log-likelihood value.
Runs 'mr_cML_O' and supplements the result with data-perturbation (DP) standard errors and p-values that account for selection of the invalid-IV set.
mr_cML_DP_O( b_exp, b_out, se_exp, se_out, K_vec = 0:(length(b_exp) - 2), random_start = 0, random_start_pert = 0, maxit = 100, num_pert = 100, random_seed = 0, n, rho = 0, c1 = 1, c2 = 1, t = 0 )mr_cML_DP_O( b_exp, b_out, se_exp, se_out, K_vec = 0:(length(b_exp) - 2), random_start = 0, random_start_pert = 0, maxit = 100, num_pert = 100, random_seed = 0, n, rho = 0, c1 = 1, c2 = 1, t = 0 )
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
K_vec |
Integer vector; candidate numbers of invalid IVs. |
random_start |
Integer; number of random starts on the original data. |
random_start_pert |
Integer; number of random starts per perturbed dataset. |
maxit |
Integer; maximum coordinate-descent iterations. |
num_pert |
Integer; number of data-perturbation replicates. |
random_seed |
Integer; if non-zero, used to seed the RNG. |
n |
Integer; GWAS sample size used in the BIC penalty. |
rho |
Correlation between 'b_exp' and 'b_out' due to sample overlap. |
c1, c2
|
Numeric inflation factors (>= 1) applied to 'se_exp' and 'se_out' respectively. |
t |
Instrument-selection threshold on the z-scale. |
A list combining the original 'mr_cML_O' output with the DP point estimates, standard errors and p-values for both cML-BIC and cML-MA-BIC.
Runs the cML estimator across a grid of candidate values for the number of invalid IVs and combines the results with BIC and BIC model averaging (cML-BIC and cML-MA-BIC).
mr_cML_O( b_exp, b_out, se_exp, se_out, K_vec = 0:(length(b_exp) - 2), random_start = 0, maxit = 100, random_seed = 0, n, rho = 0, t = 0, var_est = 1 )mr_cML_O( b_exp, b_out, se_exp, se_out, K_vec = 0:(length(b_exp) - 2), random_start = 0, maxit = 100, random_seed = 0, n, rho = 0, t = 0, var_est = 1 )
b_exp |
Numeric vector of SNP-exposure effect estimates. |
b_out |
Numeric vector of SNP-outcome effect estimates. |
se_exp |
Numeric vector of standard errors for 'b_exp'. |
se_out |
Numeric vector of standard errors for 'b_out'. |
K_vec |
Integer vector; candidate numbers of invalid IVs. |
random_start |
Integer; number of random starts per 'K'. |
maxit |
Integer; maximum coordinate-descent iterations. |
random_seed |
Integer; if non-zero, used to seed the RNG. |
n |
Integer; GWAS sample size used in the BIC penalty. |
rho |
Correlation between 'b_exp' and 'b_out' due to sample overlap. |
t |
Instrument-selection threshold on the z-scale. |
var_est |
Integer in '1:4' selecting the variance estimator (see 'cML_estimate_random_O'). |
A list with cML-MA-BIC and cML-BIC point estimates, standard errors and p-values, the indices of the selected invalid IVs, and the negative log-likelihood and BIC sequences over 'K_vec'.
Turns a GraphMRcML mean-effect matrix and matching p-value matrix into a directed 'igraph' graph, colouring edges by sign and significance and suggesting per-edge widths. By default, edges with p-value below '0.05 / Me' are drawn in solid colour; edges with p-value below 'thres1' are drawn in a lighter shade.
plot_graph(G_mean, G_pval, Me, thres1 = 0.05, Bonferroni = FALSE)plot_graph(G_mean, G_pval, Me, thres1 = 0.05, Bonferroni = FALSE)
G_mean |
Numeric matrix of mean effects, e.g. 'obs_graph_mean' or 'dir_graph_mean' from 'subset_Graph_d1'. |
G_pval |
Numeric matrix of p-values matching 'G_mean'. |
Me |
Integer; effective number of tests from 'subset_Graph_d1'. Used to set the primary significance threshold '0.05 / Me' when 'Bonferroni = FALSE'. |
thres1 |
Numeric; secondary p-value threshold for light-coloured edges. Set to '0' to suppress. |
Bonferroni |
Logical; if 'TRUE' (or 'Me' is 'NULL'), use the Bonferroni threshold '0.05 / (N^2 - N)' instead. |
A list with 'p' (an 'igraph' graph) and 'edge_width' (a numeric vector of suggested edge widths).
Subsets the per-perturbation graphs from 'Graph_Perturb' to a chosen set of traits, iteratively re-runs network deconvolution to enforce zero diagonals on the direct-effect graph, drops replicates whose deconvolved graph has a spectral radius above one, and returns means, standard deviations and p-values for both the observed-effect and direct-effect graphs together with an effective number of tests 'Me'.
subset_Graph_d1( dp_list, keep_trait, B = 2000, check = TRUE, show = TRUE, maxit = 10000 )subset_Graph_d1( dp_list, keep_trait, B = 2000, check = TRUE, show = TRUE, maxit = 10000 )
dp_list |
A list as returned by 'Graph_Perturb'. |
keep_trait |
Character vector of trait names to retain. |
B |
Integer; number of perturbation replicates to sample (without replacement) from 'dp_list'. |
check |
Logical; if 'TRUE', drop replicates whose direct-effect graph has a spectral radius greater than one. |
show |
Logical; if 'FALSE' and every replicate fails the spectral check, the direct-effect summaries are returned as 'NULL'. |
maxit |
Integer; maximum iterations of the diagonal-zeroing loop in network deconvolution. |
A list with 'dp_res' (summaries), 'dp_list' (filtered per-replicate graphs), 'warning', 'warn_len', 'conv_len' and 'total_B'.