Title: | Two Sample MR Functions and Interface to MR Base Database |
---|---|
Description: | A package for performing Mendelian randomization using GWAS summary data. It uses the IEU GWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis. You can use the MR-Base web app <https://www.mrbase.org/> to try out a limited range of the functionality in this package, but for any serious work we strongly recommend using this R package. |
Authors: | Gibran Hemani [aut, cre] , Philip Haycock [aut] , Jie Zheng [aut] , Tom Gaunt [aut] , Ben Elsworth [aut] , Tom Palmer [aut] |
Maintainer: | Gibran Hemani <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.8 |
Built: | 2024-10-26 06:14:22 UTC |
Source: | https://github.com/MRCIEU/TwoSampleMR |
Previously the meta data was returned alongside association information. This is mostly unnecessary as it is needlessly repeating information. This is a convenience function that reinstates that information. Can be applied to either exposure data, outcome data, or harmonised data
add_metadata(dat, cols = c("sample_size", "ncase", "ncontrol", "unit", "sd"))
add_metadata(dat, cols = c("sample_size", "ncase", "ncontrol", "unit", "sd"))
dat |
Either exposure data, outcome data or harmonised data |
cols |
Which metadata fields to add. Default = |
Data frame
Can be applied to exposure_dat, outcome_dat or harmonised_data.
Note that it will be beneficial in some circumstances to add the meta data to
the data object using add_metadata()
before running this function.
Also adds effective sample size for case control data.
add_rsq(dat)
add_rsq(dat)
dat |
exposure_dat, outcome_dat or harmonised_data |
data frame
Estimate allele frequency from SNP
allele_frequency(g)
allele_frequency(g)
g |
Vector of 0/1/2 |
Allele frequency
Get list of studies with available GWAS summary statistics through API
available_outcomes(opengwas_jwt = ieugwasr::get_opengwas_jwt())
available_outcomes(opengwas_jwt = ieugwasr::get_opengwas_jwt())
opengwas_jwt |
Used to authenticate protected endpoints. Login to https://api.opengwas.io to obtain a jwt. Provide the jwt string here, or store in .Renviron under the keyname OPENGWAS_JWT. |
Dataframe of details for all available studies
Uses PLINK clumping method, where SNPs in LD within a particular window will be pruned. The SNP with the lowest p-value is retained.
clump_data( dat, clump_kb = 10000, clump_r2 = 0.001, clump_p1 = 1, clump_p2 = 1, pop = "EUR", bfile = NULL, plink_bin = NULL )
clump_data( dat, clump_kb = 10000, clump_r2 = 0.001, clump_p1 = 1, clump_p2 = 1, pop = "EUR", bfile = NULL, plink_bin = NULL )
dat |
Output from |
clump_kb |
Clumping window, default is |
clump_r2 |
Clumping r2 cutoff. Note that this default value has recently changed from |
clump_p1 |
Clumping sig level for index SNPs, default is |
clump_p2 |
Clumping sig level for secondary SNPs, default is |
pop |
Super-population to use as reference panel. Default = |
bfile |
If this is provided then will use the API. Default = |
plink_bin |
If |
This function interacts with the OpenGWAS API, which houses LD reference panels for the 5 super-populations in the 1000 genomes reference panel. It includes only bi-allelic SNPs with MAF > 0.01, so it's quite possible that a variant you want to include in the clumping process will be absent. If it is absent, it will be automatically excluded from the results.
You can check if your variants are present in the LD reference panel using ieugwasr::ld_reflookup()
.
This function does put load on the OpenGWAS servers, which makes life more difficult for other users.
We have implemented a method and made available the LD reference panels to perform clumping locally, see ieugwasr::ld_clump()
and related vignettes for details.
Data frame
This function combines results of mr()
, mr_heterogeneity()
, mr_pleiotropy_test()
and mr_singlesnp()
into a single data frame.
It also merges the results with outcome study level characteristics in available_outcomes()
.
If desired it also exponentiates results (e.g. if the user wants log odds ratio converted into odds ratios with 95 percent confidence intervals).
The exposure and outcome columns from the output from mr()
contain both the trait names and trait ids.
The combine_all_mrresults()
function splits these into separate columns by default.
combine_all_mrresults( res, het, plt, sin, ao_slc = TRUE, Exp = FALSE, split.exposure = FALSE, split.outcome = FALSE )
combine_all_mrresults( res, het, plt, sin, ao_slc = TRUE, Exp = FALSE, split.exposure = FALSE, split.outcome = FALSE )
res |
Results from |
het |
Results from |
plt |
Results from |
sin |
Results from |
ao_slc |
Logical; if set to |
Exp |
Logical; if set to |
split.exposure |
Logical; if set to |
split.outcome |
Logical; if set to |
data frame
Taking exposure or outcome data (returned from format_data()
)
combine multiple datasets together so they can be analysed in one
batch. Removes duplicate SNPs, preferentially keeping those usable
in MR analysis.
combine_data(x)
combine_data(x)
x |
List of data frames returned from |
data frame
Columns are the case and control frequencies. Rows are the frequencies for allele 1 and allele 2.
contingency(af, prop, odds_ratio, eps = 1e-15)
contingency(af, prop, odds_ratio, eps = 1e-15)
af |
Allele frequency of effect allele. |
prop |
Proportion of cases. |
odds_ratio |
Odds ratio. |
eps |
tolerance, default is |
2x2 contingency table as matrix
Helper function to convert results from extract_outcome_data()
to exposure_dat
format.
convert_outcome_to_exposure(outcome_dat)
convert_outcome_to_exposure(outcome_dat)
outcome_dat |
Output from |
data frame
The MendelianRandomization package offers MR methods that can be used with the same data used in the TwoSampleMR package. This function converts from the TwoSampleMR format to the MRInput class.
dat_to_MRInput(dat, get_correlations = FALSE, pop = "EUR")
dat_to_MRInput(dat, get_correlations = FALSE, pop = "EUR")
dat |
Output from the |
get_correlations |
Default |
pop |
If |
List of MRInput objects for each exposure/outcome combination
Creates a list of RadialMR format datasets for each exposure-outcome pair.
dat_to_RadialMR(dat)
dat_to_RadialMR(dat)
dat |
Output from |
List of RadialMR format datasets
The default is list(test_dist = "z", nboot = 1000, Cov = 0, penk = 20, phi = 1, alpha = 0.05, Qthresh = 0.05, over.dispersion = TRUE, loss.function = "huber")
.
default_parameters()
default_parameters()
A statistical test for whether the assumption that exposure causes outcome is valid.
directionality_test(dat)
directionality_test(dat)
dat |
Harmonised exposure and outcome data. Output from |
List
Taken from https://www.nature.com/articles/nprot.2014.071
effective_n(ncase, ncontrol)
effective_n(ncase, ncontrol)
ncase |
Vector of number of cases |
ncontrol |
Vector of number of controls |
Vector of effective sample size
Perform enrichment analysis
enrichment(dat, method_list = enrichment_method_list()$obj)
enrichment(dat, method_list = enrichment_method_list()$obj)
dat |
Harmonised exposure and outcome data. Output from |
method_list |
List of methods to use in analysis. Default is |
data frame
Get list of available p-value enrichment methods
enrichment_method_list()
enrichment_method_list()
Data frame
Assumes that sample size and allele frequency is correct for each SNP, and that allele frequency gives a reasonable estimate of the variance of the SNP.
estimate_trait_sd(b, se, n, p)
estimate_trait_sd(b, se, n, p)
b |
vector of effect sizes. |
se |
vector of standard errors. |
n |
vector of sample sizes. |
p |
vector of allele frequencies. |
Vector of sd estimates for each association.
This function searches for GWAS significant SNPs (for a given p-value) for a specified set of outcomes. It then performs LD based clumping to return only independent significant associations.
extract_instruments( outcomes, p1 = 5e-08, clump = TRUE, p2 = 5e-08, r2 = 0.001, kb = 10000, opengwas_jwt = ieugwasr::get_opengwas_jwt(), force_server = FALSE )
extract_instruments( outcomes, p1 = 5e-08, clump = TRUE, p2 = 5e-08, r2 = 0.001, kb = 10000, opengwas_jwt = ieugwasr::get_opengwas_jwt(), force_server = FALSE )
outcomes |
Array of outcome IDs (see |
p1 |
Significance threshold. The default is |
clump |
Logical; whether to clump results. The default is |
p2 |
Secondary clumping threshold. The default is |
r2 |
Clumping r2 cut off. The default is |
kb |
Clumping distance cutoff. The default is |
opengwas_jwt |
Used to authenticate protected endpoints. Login to https://api.opengwas.io to obtain a jwt. Provide the jwt string here, or store in .Renviron under the keyname OPENGWAS_JWT. |
force_server |
Force the analysis to extract results from the server rather than the MRInstruments package. |
data frame
read_exposure_data()
and all the SNPs therein will be queried against the requested outcomes in remote database using API.Supply the output from read_exposure_data()
and all the SNPs therein will be queried against the requested outcomes in remote database using API.
extract_outcome_data( snps, outcomes, proxies = TRUE, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3, opengwas_jwt = ieugwasr::get_opengwas_jwt(), splitsize = 10000, proxy_splitsize = 500 )
extract_outcome_data( snps, outcomes, proxies = TRUE, rsq = 0.8, align_alleles = 1, palindromes = 1, maf_threshold = 0.3, opengwas_jwt = ieugwasr::get_opengwas_jwt(), splitsize = 10000, proxy_splitsize = 500 )
snps |
Array of SNP rs IDs. |
outcomes |
Array of IDs (see |
proxies |
Look for LD tags? Default is |
rsq |
Minimum LD rsq value (if proxies = 1). Default = |
align_alleles |
Try to align tag alleles to target alleles (if proxies = 1). |
palindromes |
Allow palindromic SNPs (if proxies = 1). |
maf_threshold |
MAF threshold to try to infer palindromic SNPs. The default is |
opengwas_jwt |
Used to authenticate protected endpoints. Login to https://api.opengwas.io to obtain a jwt. Provide the jwt string here, or store in .Renviron under the keyname OPENGWAS_JWT. |
splitsize |
The default is |
proxy_splitsize |
The default is |
Dataframe of summary statistics for all available outcomes
Fisher's combined test
fishers_combined_test(pval)
fishers_combined_test(pval)
pval |
Vector of outcome p-values |
List with the following elements:
MR estimate
Standard error
p-value
Perform MR of multiple exposures and multiple outcomes. This plots the results.
forest_plot( mr_res, exponentiate = FALSE, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted", group_single_categories = TRUE, by_category = TRUE, in_columns = FALSE, threshold = NULL, xlab = "", xlim = NULL, trans = "identity", ao_slc = TRUE, priority = "Cardiometabolic" )
forest_plot( mr_res, exponentiate = FALSE, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted", group_single_categories = TRUE, by_category = TRUE, in_columns = FALSE, threshold = NULL, xlab = "", xlim = NULL, trans = "identity", ao_slc = TRUE, priority = "Cardiometabolic" )
mr_res |
Results from |
exponentiate |
Convert effects to OR? Default is |
single_snp_method |
Which of the single SNP methosd to use when only 1 SNP was used to estimate the causal effect? The default is |
multi_snp_method |
Which of the multi-SNP methods to use when there was more than 1 SNPs used to estimate the causal effect? The default is |
group_single_categories |
If there are categories with only one outcome, group them together into an "Other" group. The default is |
by_category |
Separate the results into sections by category? The default is |
in_columns |
Separate the exposures into different columns. The default is |
threshold |
p-value threshold to use for colouring points by significance level. If |
xlab |
x-axis label. If |
xlim |
limit x-axis range. Provide vector of length 2, with lower and upper bounds. The default is |
trans |
Transformation to apply to x-axis. e.g. |
ao_slc |
retrieve sample size and subcategory from |
priority |
Name of category to prioritise at the top of the forest plot. The default is |
grid plot object
Plot results from an analysis of multiple exposures against a single outcome or a single exposure against multiple outcomes.
Plots effect estimates and 95 percent confidence intervals.
The ordering of results in the plot is determined by the order supplied by the user.
Users may find sort_1_to_many()
helpful for sorting their results prior to using the 1-to-many forest plot. The plot function works best for 50 results and is not designed to handle more than 100 results.
forest_plot_1_to_many( mr_res = "mr_res", b = "b", se = "se", TraitM = "outcome", col1_width = 1, col1_title = "", exponentiate = FALSE, trans = "identity", ao_slc = TRUE, lo = NULL, up = NULL, by = NULL, xlab = "Effect (95% confidence interval)", addcols = NULL, addcol_widths = NULL, addcol_titles = "", subheading_size = 6, shape_points = 15, colour_scheme = "black", col_text_size = 5, weight = NULL )
forest_plot_1_to_many( mr_res = "mr_res", b = "b", se = "se", TraitM = "outcome", col1_width = 1, col1_title = "", exponentiate = FALSE, trans = "identity", ao_slc = TRUE, lo = NULL, up = NULL, by = NULL, xlab = "Effect (95% confidence interval)", addcols = NULL, addcol_widths = NULL, addcol_titles = "", subheading_size = 6, shape_points = 15, colour_scheme = "black", col_text_size = 5, weight = NULL )
mr_res |
Data frame of results supplied by the user. The default is |
b |
Name of the column specifying the effect of the exposure on the outcome. The default is |
se |
Name of the column specifying the standard error for b. The default is |
TraitM |
The column specifying the names of the traits. Corresponds to 'many' in the 1-to-many forest plot. The default is |
col1_width |
Width of Y axis label for the column specified by the TraitM argument. The default is |
col1_title |
Title for the column specified by the TraitM argument. The default is |
exponentiate |
Convert log odds ratios to odds ratios? Default is |
trans |
Specify x-axis scale. e.g. "identity", "log2", etc. If set to "identity" an additive scale is used. If set to log2 the x-axis is plotted on a multiplicative / doubling scale (preferable when plotting odds ratios). Default is |
ao_slc |
Logical; retrieve trait subcategory information using available_outcomes(). Default is |
lo |
Lower limit of X axis to plot. |
up |
upper limit of X axis to plot. |
by |
Name of the grouping variable to stratify results on. Default is |
xlab |
X-axis label, default is |
addcols |
Name of additional columns to plot. Character vector. The default is |
addcol_widths |
Widths of Y axis labels for additional columns specified by the addcols argument. Numeric vector. The default is |
addcol_titles |
Titles of additional columns specified by the addcols argument. Character vector. The default is |
subheading_size |
text size for the subheadings specified in by argument. The default is |
shape_points |
the shape of the data points to pass to |
colour_scheme |
the general colour scheme for the plot. Default is to make all text and data points |
col_text_size |
The default is |
weight |
The default is |
grid plot object
This function is used to create a basic forest plot.
It requires the output from format_1_to_many()
.
forest_plot_basic2( dat, section = NULL, colour_group = NULL, colour_group_first = TRUE, xlab = NULL, bottom = TRUE, trans = "identity", xlim = NULL, lo = lo, up = up, subheading_size = subheading_size, colour_scheme = "black", shape_points = 15 )
forest_plot_basic2( dat, section = NULL, colour_group = NULL, colour_group_first = TRUE, xlab = NULL, bottom = TRUE, trans = "identity", xlim = NULL, lo = lo, up = up, subheading_size = subheading_size, colour_scheme = "black", shape_points = 15 )
dat |
Output from |
section |
Which category in dat to plot. If |
colour_group |
Which exposure to plot. If |
colour_group_first |
The default is |
xlab |
x-axis label. Default= |
bottom |
Show x-axis? Default= |
trans |
x-axis scale. |
xlim |
x-axis limits. |
lo |
Lower limit of x axis. |
up |
Upper limit of x axis. |
subheading_size |
text size for the subheadings. The subheadings correspond to the values of the section argument. |
colour_scheme |
the general colour scheme for the plot. Default is to make all text and data points |
shape_points |
the shape of the data points to pass to |
ggplot object
This function formats user-supplied results for the forest_plot_1_to_many()
function.
The user supplies their results in the form of a data frame.
The data frame is assumed to contain at least three columns of data:
effect estimates, from an analysis of the effect of an exposure on an outcome;
standard errors for the effect estimates; and
a column of trait names, corresponding to the 'many' in a 1-to-many forest plot.
format_1_to_many( mr_res, b = "b", se = "se", exponentiate = FALSE, ao_slc = FALSE, by = NULL, TraitM = "outcome", addcols = NULL, weight = NULL )
format_1_to_many( mr_res, b = "b", se = "se", exponentiate = FALSE, ao_slc = FALSE, by = NULL, TraitM = "outcome", addcols = NULL, weight = NULL )
mr_res |
Data frame of results supplied by the user. |
b |
Name of the column specifying the effect of the exposure on the outcome. Default = |
se |
Name of the column specifying the standard error for b. Default = |
exponentiate |
Convert log odds ratios to odds ratios? Default= |
ao_slc |
Logical; retrieve trait subcategory information using |
by |
Name of the column indicating a grouping variable to stratify results on. Default= |
TraitM |
The column specifying the names of the traits. Corresponds to 'many' in the 1-to-many forest plot. Default= |
addcols |
Name of any additional columns to add to the plot. Character vector. The default is |
weight |
The default is |
data frame.
See format_data()
.
format_aries_mqtl(aries_mqtl_subset, type = "exposure")
format_aries_mqtl(aries_mqtl_subset, type = "exposure")
aries_mqtl_subset |
Selected rows from |
type |
Are these data used as |
Data frame
Reads in exposure data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.
format_data( dat, type = "exposure", snps = NULL, header = TRUE, phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, z_col = "z", info_col = "info", chr_col = "chr", pos_col = "pos", log_pval = FALSE )
format_data( dat, type = "exposure", snps = NULL, header = TRUE, phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, z_col = "z", info_col = "info", chr_col = "chr", pos_col = "pos", log_pval = FALSE )
dat |
Data frame. Must have header with at least SNP column present. |
type |
Is this the exposure or the outcome data that is being read in? The default is |
snps |
SNPs to extract. If NULL then doesn't extract any and keeps all. The default is |
header |
The default is |
phenotype_col |
Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value |
snp_col |
Required name of column with SNP rs IDs. The default is |
beta_col |
Required for MR. Name of column with effect sizes. The default is |
se_col |
Required for MR. Name of column with standard errors. The default is |
eaf_col |
Required for MR. Name of column with effect allele frequency. The default is |
effect_allele_col |
Required for MR. Name of column with effect allele. Must contain only the characters "A", "C", "T" or "G". The default is |
other_allele_col |
Required for MR. Name of column with non effect allele. Must contain only the characters "A", "C", "T" or "G". The default is |
pval_col |
Required for enrichment tests. Name of column with p-value. The default is |
units_col |
Optional column name for units. The default is |
ncase_col |
Optional column name for number of cases. The default is |
ncontrol_col |
Optional column name for number of controls. The default is |
samplesize_col |
Optional column name for sample size. The default is |
gene_col |
Optional column name for gene name. The default is |
id_col |
The default is |
min_pval |
Minimum allowed p-value. The default is |
z_col |
The default is |
info_col |
The default is |
chr_col |
The default is |
pos_col |
The default is |
log_pval |
The pval is -log10(P). The default is |
data frame
See format_data()
.
format_gtex_eqtl(gtex_eqtl_subset, type = "exposure")
format_gtex_eqtl(gtex_eqtl_subset, type = "exposure")
gtex_eqtl_subset |
Selected rows from |
type |
Are these data used as |
Data frame
DEPRECATED. Please use format_data()
instead.
format_gwas_catalog(gwas_catalog_subset, type = "exposure")
format_gwas_catalog(gwas_catalog_subset, type = "exposure")
gwas_catalog_subset |
The GWAS catalog subset. |
type |
The default is |
Data frame
## Not run: require(MRInstruments) data(gwas_catalog) bmi <- subset(gwas_catalog, Phenotype=="Body mass index" & Year==2010 & grepl("kg", Units)) bmi <- format_data(bmi) ## End(Not run)
## Not run: require(MRInstruments) data(gwas_catalog) bmi <- subset(gwas_catalog, Phenotype=="Body mass index" & Year==2010 & grepl("kg", Units)) bmi <- format_data(bmi) ## End(Not run)
See format_data()
.
format_metab_qtls(metab_qtls_subset, type = "exposure")
format_metab_qtls(metab_qtls_subset, type = "exposure")
metab_qtls_subset |
Selected rows from |
type |
Are these data used as |
Data frame
This function takes the results from mr()
and is particularly useful
if the MR has been applied using multiple exposures and multiple outcomes.
It creates a new data frame with the following:
Variables: exposure, outcome, category, outcome sample size, effect, upper ci, lower ci, pval, nsnp
only one estimate for each exposure-outcome
exponentiated effects if required
format_mr_results( mr_res, exponentiate = FALSE, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted", ao_slc = TRUE, priority = "Cardiometabolic" )
format_mr_results( mr_res, exponentiate = FALSE, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted", ao_slc = TRUE, priority = "Cardiometabolic" )
mr_res |
Results from |
exponentiate |
Convert effects to OR? The default is |
single_snp_method |
Which of the single SNP methods to use when only 1 SNP was used to estimate the causal effect? The default is |
multi_snp_method |
Which of the multi-SNP methods to use when there was more than 1 SNPs used to estimate the causal effect? The default is |
ao_slc |
Logical; retrieve sample size and subcategory using |
priority |
Name of category to prioritise at the top of the forest plot. The default is |
By default it uses the available_outcomes()
function to retrieve the study level characteristics for the outcome trait,
including sample size and outcome category.
This assumes the MR analysis was performed using outcome GWAS(s) contained in MR-Base.
If ao_slc
is set to TRUE
then the user must supply their own study level characteristics.
This is useful when the user has supplied their own outcome GWAS results (i.e. they are not in MR-Base).
data frame.
See format_data()
.
format_proteomic_qtls(proteomic_qtls_subset, type = "exposure")
format_proteomic_qtls(proteomic_qtls_subset, type = "exposure")
proteomic_qtls_subset |
Selected rows from |
type |
Are these data used as |
Data frame
This function takes b and se from mr()
and generates odds ratios and 95 percent confidence intervals.
generate_odds_ratios(mr_res)
generate_odds_ratios(mr_res)
mr_res |
Results from |
data frame
Calculate p-value from R-squared and sample size
get_p_from_r2n(r2, n)
get_p_from_r2n(r2, n)
r2 |
Rsq |
n |
Sample size |
P-value
Estimate the allele frequency in population from case/control summary data
get_population_allele_frequency(af, prop, odds_ratio, prevalence)
get_population_allele_frequency(af, prop, odds_ratio, prevalence)
af |
Effect allele frequency (or MAF) |
prop |
Proportion of samples that are cases |
odds_ratio |
Odds ratio |
prevalence |
Population disease prevalence |
Population allele frequency
Estimate R-squared from beta, standard error and sample size
get_r_from_bsen(b, se, n)
get_r_from_bsen(b, se, n)
b |
Array of effect sizes |
se |
Array of standard errors |
n |
Array of (effective) sample sizes |
Vector of signed r values
This uses equation 10 in Lee et al. A Better Coefficient of Determination for Genetic Profile Analysis. Genetic Epidemiology 36: 214–224 (2012) doi:10.1002/gepi.21614.
get_r_from_lor( lor, af, ncase, ncontrol, prevalence, model = "logit", correction = FALSE )
get_r_from_lor( lor, af, ncase, ncontrol, prevalence, model = "logit", correction = FALSE )
lor |
Vector of Log odds ratio. |
af |
Vector of allele frequencies. |
ncase |
Vector of Number of cases. |
ncontrol |
Vector of Number of controls. |
prevalence |
Vector of Disease prevalence in the population. |
model |
Is the effect size estimated from the |
correction |
Scale the estimated r by correction value. The default is |
Vector of signed r values
This method is an approximation, and may be numerically unstable.
Ideally you should estimate r directly from independent replication samples.
Use get_r_from_lor()
for binary traits.
get_r_from_pn(p, n)
get_r_from_pn(p, n)
p |
Array of pvals |
n |
Array of sample sizes |
Vector of r values (all arbitrarily positive)
Get SE from effect size and p-value
get_se(eff, pval)
get_se(eff, pval)
eff |
effect size |
pval |
p-values |
array
In order to perform MR the effect of a SNP on an outcome and exposure must be harmonised to be relative to the same allele.
harmonise_data(exposure_dat, outcome_dat, action = 2)
harmonise_data(exposure_dat, outcome_dat, action = 2)
exposure_dat |
Output from |
outcome_dat |
Output from |
action |
Level of strictness in dealing with SNPs.
|
Expects data in the format generated by read_exposure_data()
and extract_outcome_data()
.
This means the inputs must be dataframes with the following columns:
outcome_dat
:
SNP
beta.outcome
se.outcome
effect_allele.outcome
other_allele.outcome
eaf.outcome
outcome
exposure_dat
:
SNP
beta.exposure
se.exposure
effect_allele.exposure
other_allele.exposure
eaf.exposure
The function tries to harmonise INDELs. If they are coded as sequence strings things work more smoothly. If they are coded as D/I in one dataset it will try to convert them to sequences if the other dataset has adequate information. If coded as D/I in one dataset and as a variant with equal length INDEL alleles in the other, the variant is dropped. If one or both the datasets only has one allele (i.e. the effect allele) then harmonisation is naturally going to be more ambiguous and more variants will be dropped.
Data frame with harmonised effects and alleles
LD matrix returns with rsid_ea_oa identifiers. Make sure that they are oriented to the same effect allele as the summary dataset. Summary dataset can be exposure dataset or harmonised dartaset.
harmonise_ld_dat(x, ld)
harmonise_ld_dat(x, ld)
x |
Exposure dataset or harmonised dataset |
ld |
Output from |
List of exposure dataset and harmonised LD matrix
This function calculates the statistic.
To use it for the
metric ensure that the effects are all the same sign (e.g.
abs(y)
).
Isq(y, s)
Isq(y, s)
y |
Vector of effects. |
s |
Vector of standard errors. |
Isq value
This function takes a list of SNPs and searches for them in a specified super-population in the 1000 Genomes phase 3 reference panel. It then creates an LD matrix of r values (signed, and not squared). All LD values are with respect to the major alleles in the 1000G dataset. You can specify whether the allele names are displayed.
ld_matrix(snps, with_alleles = TRUE, pop = "EUR")
ld_matrix(snps, with_alleles = TRUE, pop = "EUR")
snps |
List of SNPs. |
with_alleles |
Whether to append the allele names to the SNP names. The default is |
pop |
Super-population to use as reference panel. Default = |
The data used for generating the LD matrix includes only bi-allelic SNPs with MAF > 0.01, so it's quite possible that a variant you want to include will be absent. If it is absent, it will be automatically excluded from the results.
You can check if your variants are present in the LD reference panel using ieugwasr::ld_reflookup()
.
This function does put load on the OpenGWAS servers, which makes life more difficult for other users,
and has been limited to analyse only up to 500 variants at a time.
We have implemented a method and made available the LD reference panels to perform the operation locally,
see ieugwasr::ld_matrix()
and related vignettes for details.
Matrix of LD r values
Imported here to help estimate sample overlap between studies
ldsc_h2(id, ancestry = "infer", snpinfo = NULL, splitsize = 20000)
ldsc_h2(id, ancestry = "infer", snpinfo = NULL, splitsize = 20000)
id |
ID to analyse |
ancestry |
ancestry of traits 1 and 2 (AFR, AMR, EAS, EUR, SAS) or 'infer' (default) in which case it will try to guess based on allele frequencies |
snpinfo |
Output from |
splitsize |
How many SNPs to extract at one time. Default= |
model fit
Bulik-Sullivan,B.K. et al. (2015) An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241.
Guo,B. and Wu,B. (2018) Principal component based adaptive association test of multiple traits using GWAS summary statistics. bioRxiv 269597; doi: 10.1101/269597
Gua,B. and Wu,B. (2019) Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach. Bioinformatics. 2019 Jul 1;35(13):2251-2257. doi: 10.1093/bioinformatics/bty961.
https://github.com/baolinwu/MTAR
Imported here to help estimate sample overlap between studies
ldsc_rg(id1, id2, ancestry = "infer", snpinfo = NULL, splitsize = 20000)
ldsc_rg(id1, id2, ancestry = "infer", snpinfo = NULL, splitsize = 20000)
id1 |
ID 1 to analyse |
id2 |
ID 2 to analyse |
ancestry |
ancestry of traits 1 and 2 (AFR, AMR, EAS, EUR, SAS) or 'infer' (default) in which case it will try to guess based on allele frequencies |
snpinfo |
Output from |
splitsize |
How many SNPs to extract at one time. Default= |
model fit
Convenient function to create a harmonised dataset.
make_dat( exposures = c("ieu-a-2", "ieu-a-301"), outcomes = c("ieu-a-7", "ieu-a-1001"), proxies = TRUE )
make_dat( exposures = c("ieu-a-2", "ieu-a-301"), outcomes = c("ieu-a-7", "ieu-a-1001"), proxies = TRUE )
exposures |
The default is |
outcomes |
The default is |
proxies |
Look for proxies? Default = |
Harmonised data frame
Perform all Mendelian randomization tests
mr( dat, parameters = default_parameters(), method_list = subset(mr_method_list(), use_by_default)$obj )
mr( dat, parameters = default_parameters(), method_list = subset(mr_method_list(), use_by_default)$obj )
dat |
Harmonised exposure and outcome data. Output from |
parameters |
Parameters to be used for various MR methods. Default is output from |
method_list |
List of methods to use in analysis. See |
List with the following elements:
Table of MR results
Table of extra results
Density plot
mr_density_plot( singlesnp_results, mr_results, exponentiate = FALSE, bandwidth = "nrd0" )
mr_density_plot( singlesnp_results, mr_results, exponentiate = FALSE, bandwidth = "nrd0" )
singlesnp_results |
from |
mr_results |
Results from |
exponentiate |
Plot on exponentiated scale. The default is |
bandwidth |
Density bandwidth parameter. |
List of plots
Egger's regression for Mendelian randomization
mr_egger_regression(b_exp, b_out, se_exp, se_out, parameters)
mr_egger_regression(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List of with the following elements:
MR estimate
Standard error of MR estimate
p-value of MR estimate
Estimate of horizontal pleiotropy (intercept)
Standard error of intercept
p-value of intercept
Heterogeneity stats
Summary of regression
Original data used for MR Egger regression
Run bootstrap to generate standard errors for MR
mr_egger_regression_bootstrap(b_exp, b_out, se_exp, se_out, parameters)
mr_egger_regression_bootstrap(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. Specifically, the |
List of with the following elements:
MR estimate
Standard error of MR estimate
p-value of MR estimate
Estimate of horizontal pleiotropy (intercept)
Standard error of intercept
p-value of intercept
Summary of regression
Original data used for MR Egger regression
Forest plot
mr_forest_plot(singlesnp_results, exponentiate = FALSE)
mr_forest_plot(singlesnp_results, exponentiate = FALSE)
singlesnp_results |
from |
exponentiate |
Plot on exponential scale. The default is |
List of plots
Create funnel plot from single SNP analyses.
mr_funnel_plot(singlesnp_results)
mr_funnel_plot(singlesnp_results)
singlesnp_results |
from |
List of plots
Get heterogeneity statistics.
mr_heterogeneity( dat, parameters = default_parameters(), method_list = subset(mr_method_list(), heterogeneity_test & use_by_default)$obj )
mr_heterogeneity( dat, parameters = default_parameters(), method_list = subset(mr_method_list(), heterogeneity_test & use_by_default)$obj )
dat |
Harmonised exposure and outcome data. Output from |
parameters |
Parameters to be used for various MR methods. Default is output from |
method_list |
List of methods to use in analysis. See |
Data frame
The default multiplicative random effects IVW estimate.
The standard error is corrected for under dispersion
Use the mr_ivw_mre()
function for estimates that don't correct for under dispersion.
mr_ivw(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_ivw(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
MR estimate
Standard error
p-value
Heterogeneity stats
Inverse variance weighted regression (fixed effects)
mr_ivw_fe(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_ivw_fe(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
MR estimate
Standard error
p-value
Heterogeneity stats
Same as mr_ivw()
but no correction for under dispersion.
mr_ivw_mre(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_ivw_mre(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
MR estimate
Standard error
p-value
Heterogeneity stats
Radial IVW analysis
mr_ivw_radial(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_ivw_radial(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
Leave one out sensitivity analysis
mr_leaveoneout(dat, parameters = default_parameters(), method = mr_ivw)
mr_leaveoneout(dat, parameters = default_parameters(), method = mr_ivw)
dat |
Output from |
parameters |
List of parameters. |
method |
Choose which method to use. The default is |
List of data frames
Plot results from leaveoneout analysis.
mr_leaveoneout_plot(leaveoneout_results)
mr_leaveoneout_plot(leaveoneout_results)
leaveoneout_results |
Output from |
List of plots
MR median estimators
mr_median(dat, parameters = default_parameters())
mr_median(dat, parameters = default_parameters())
dat |
Output from |
parameters |
List of parameters. The default is |
data frame
Perform 2 sample IV using fixed effects meta analysis and delta method for standard errors
mr_meta_fixed(b_exp, b_out, se_exp, se_out, parameters)
mr_meta_fixed(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
Heterogeneity stats
Perform 2 sample IV using simple standard error
mr_meta_fixed_simple(b_exp, b_out, se_exp, se_out, parameters)
mr_meta_fixed_simple(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
Perform 2 sample IV using random effects meta analysis and delta method for standard errors
mr_meta_random(b_exp, b_out, se_exp, se_out, parameters)
mr_meta_random(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
Heterogeneity stats
Get list of available MR methods
mr_method_list()
mr_method_list()
character vector of method names
Perform simple, weighted, penalised modes, as well as versions that use the NOME assumption.
mr_mode(dat, parameters = default_parameters(), mode_method = "all")
mr_mode(dat, parameters = default_parameters(), mode_method = "all")
dat |
Output from |
parameters |
List of parameters. The default is |
mode_method |
The default is |
data frame
Based on the method described here https://www.biorxiv.org/content/10.1101/173682v2. Once all MR methods have been applied to a summary set, you can then use the mixture of experts to predict the method most likely to be the most accurate.
mr_moe(res, rf)
mr_moe(res, rf)
res |
Output from |
rf |
The trained random forest for the methods. This is available to download at https://www.dropbox.com/s/5la7y38od95swcf/rf.rdata?dl=0. |
The mr_moe()
function modifies the estimates
item in the list of results from the mr_wrapper()
function. It does three things:
Adds the MOE column, which is a predictor for each method for how well it performs in terms of high power and low type 1 error (scaled 0-1, where 1 is best performance).
It renames the methods to be the estimating method + the instrument selection method. There are 4 instrument selection methods: Tophits (i.e. no filtering), directional filtering (DF, an unthresholded version of Steiger filtering), heterogeneity filtering (HF, removing instruments that make substantial (p < 0.05) contributions to Cochran's Q statistic), and DF + HF which is where DF is applied and the HF applied on top of that.
It orders the table to be in order of best performing method.
Note that the mixture of experts has only been trained on datasets with at least 5 SNPs. If your dataset has fewer than 5 SNPs this function might return errors.
List
## Not run: # Example of body mass index on coronary heart disease # Extract and harmonise data a <- extract_instruments("ieu-a-2") b <- extract_outcome_data(a$SNP, 7) dat <- harmonise_data(a, b) # Apply all MR methods r <- mr_wrapper(dat) # Load the rf object containing the trained models load("rf.rdata") # Update the results with mixture of experts r <- mr_moe(r, rf) # Now you can view the estimates, and see that they have # been sorted in order from most likely to least likely to # be accurate, based on MOE prediction r[[1]]$estimates ## End(Not run)
## Not run: # Example of body mass index on coronary heart disease # Extract and harmonise data a <- extract_instruments("ieu-a-2") b <- extract_outcome_data(a$SNP, 7) dat <- harmonise_data(a, b) # Apply all MR methods r <- mr_wrapper(dat) # Load the rf object containing the trained models load("rf.rdata") # Update the results with mixture of experts r <- mr_moe(r, rf) # Now you can view the estimates, and see that they have # been sorted in order from most likely to least likely to # be accurate, based on MOE prediction r[[1]]$estimates ## End(Not run)
Modification to standard weighted median MR Updated based on Burgess 2016 "Robust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization"
mr_penalised_weighted_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_penalised_weighted_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Standard errors of genetic effects on exposure |
se_out |
Standard errors of genetic effects on outcome |
parameters |
List containing |
List with the following elements:
MR estimate
Standard error
p-value
Performs MR Egger and returns intercept values.
mr_pleiotropy_test(dat)
mr_pleiotropy_test(dat)
dat |
Harmonised exposure and outcome data. Output from |
data frame
Robust adjusted profile score
mr_raps(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_raps(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
A list of parameters. Specifically, |
This function calls the mr.raps
package. Please refer to the documentation of that package for more detail.
List with the following elements:
MR estimate
Standard error
p-value
Number of SNPs
Qingyuan Zhao, Jingshu Wang, Jack Bowden, Dylan S. Small. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Forthcoming.
Using the output from the mr()
function this report will generate a report containing tables and graphs summarising the results.
A separate report is produced for each exposure - outcome pair that was analysed.
mr_report( dat, output_path = ".", output_type = "html", author = "Analyst", study = "Two Sample MR", path = system.file("reports", package = "TwoSampleMR"), ... )
mr_report( dat, output_path = ".", output_type = "html", author = "Analyst", study = "Two Sample MR", path = system.file("reports", package = "TwoSampleMR"), ... )
dat |
Output from |
output_path |
Directory in which reports should be saved. |
output_type |
Choose |
author |
Author name. |
study |
Study title. |
path |
The filepath to the report template. |
... |
Extra options to be passed to |
MR Rucker framework.
mr_rucker(dat, parameters = default_parameters())
mr_rucker(dat, parameters = default_parameters())
dat |
Output from |
parameters |
List of Qthresh for determining transition between models, and alpha values for calculating confidence intervals. Defaults to 0.05 for both in |
list
Run Rucker with bootstrap estimates.
mr_rucker_bootstrap(dat, parameters = default_parameters())
mr_rucker_bootstrap(dat, parameters = default_parameters())
dat |
Output from |
parameters |
List of parameters. The default is |
List
Uses Cook's distance D > 4/nsnp to iteratively remove outliers.
mr_rucker_cooksdistance(dat, parameters = default_parameters())
mr_rucker_cooksdistance(dat, parameters = default_parameters())
dat |
Output from |
parameters |
List of parameters. The default is |
List
Run rucker with jackknife estimates.
mr_rucker_jackknife(dat, parameters = default_parameters())
mr_rucker_jackknife(dat, parameters = default_parameters())
dat |
Output from harmonise_data. |
parameters |
List of parameters. The default is |
List
Requires dev version of ggplot2
mr_scatter_plot(mr_results, dat)
mr_scatter_plot(mr_results, dat)
mr_results |
Output from |
dat |
Output from |
List of plots
Tests how often the SNP-exposure and SNP-outcome signs are concordant. This is to avoid the problem of averaging over all SNPs, which can suffer bias due to outliers with strong effects; and to avoid excluding SNPs which is implicit in median and mode based estimators. The effect estimate here is not to be interpreted as the effect size - it is the proportion of SNP-exposure and SNP-outcome effects that have concordant signs. e.g. +1 means all have the same sign, -1 means all have opposite signs, and 0 means that there is an equal number of concordant and discordant signs. Restricted to only work if there are 6 or more valid SNPs.
mr_sign(b_exp, b_out, se_exp = NULL, se_out = NULL, parameters = NULL)
mr_sign(b_exp, b_out, se_exp = NULL, se_out = NULL, parameters = NULL)
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Not required |
se_out |
Not required |
parameters |
Not required |
List with the following elements:
Concordance (see description)
NA
p-value
Number of SNPs (excludes NAs and effect estimates that are 0)
Perform MR using summary statistics. Bootstraps used to calculate standard error.
mr_simple_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_simple_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
The number of bootstrap replications used to calculate the SE can be set through |
List with the following elements:
MR estimate
Standard error
p-value
The number of SNPs
MR simple mode estimator
mr_simple_mode(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_simple_mode(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Standard errors of genetic effects on exposure |
se_out |
Standard errors of genetic effects on outcome |
parameters |
List containing |
List with the following elements:
MR estimate
Standard error
p-value
MR simple mode estimator (NOME).
mr_simple_mode_nome( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_simple_mode_nome( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Standard errors of genetic effects on exposure |
se_out |
Standard errors of genetic effects on outcome |
parameters |
List containing |
List with the following elements:
MR estimate
Standard error
p-value
Perform 2 sample MR on each SNP individually
mr_singlesnp( dat, parameters = default_parameters(), single_method = "mr_wald_ratio", all_method = c("mr_ivw", "mr_egger_regression") )
mr_singlesnp( dat, parameters = default_parameters(), single_method = "mr_wald_ratio", all_method = c("mr_ivw", "mr_egger_regression") )
dat |
Output from |
parameters |
List of parameters. The default is |
single_method |
Function to use for MR analysis. The default is |
all_method |
Functions to use for MR analysis. The default is |
List of data frames
A statistical test for whether the assumption that exposure causes outcome is valid
mr_steiger(p_exp, p_out, n_exp, n_out, r_exp, r_out, r_xxo = 1, r_yyo = 1, ...)
mr_steiger(p_exp, p_out, n_exp, n_out, r_exp, r_out, r_xxo = 1, r_yyo = 1, ...)
p_exp |
Vector of p-values of SNP-exposure |
p_out |
Vector of p-values of SNP-outcome |
n_exp |
Sample sizes for p_exp |
n_out |
Sample sizes for p_out |
r_exp |
Vector of absolute correlations for SNP-exposure |
r_out |
Vector of absolute correlations for SNP-outcome |
r_xxo |
Measurememt precision of exposure |
r_yyo |
Measurement precision of outcome |
... |
Further arguments to be passed to |
List with the following elements:
Estimated variance explained in x
Estimated variance explained in y
Predicted variance explained in x accounting for estimated measurement error
Predicted variance explained in y accounting for estimated measurement error
TRUE
/FALSE
p-value for inference of direction
TRUE
/FALSE
, direction of causality for given measurement error parameters
p-value for inference of direction of causality for given measurement error parameters
Total volume of the error parameter space
Volume of the parameter space that gives the incorrect answer
Volume of the paramtere space that gives the correct answer
Ratio of vz1/vz0. Higher means inferred direction is less susceptible to measurement error
Plot of parameter space of causal directions and measurement error
A statistical test for whether the assumption that exposure causes outcome is valid
mr_steiger2(r_exp, r_out, n_exp, n_out, r_xxo = 1, r_yyo = 1, ...)
mr_steiger2(r_exp, r_out, n_exp, n_out, r_xxo = 1, r_yyo = 1, ...)
r_exp |
Vector of correlations of SNP-exposure |
r_out |
Vector of correlations of SNP-outcome |
n_exp |
Sample sizes for p_exp |
n_out |
Sample sizes for p_out |
r_xxo |
Measurememt precision of exposure |
r_yyo |
Measurement precision of outcome |
... |
Further arguments to be passed to |
List with the following elements:
Estimated variance explained in x
Estimated variance explained in y
Predicted variance explained in x accounting for estimated measurement error
Predicted variance explained in y accounting for estimated measurement error
TRUE
/FALSE
p-value for inference of direction
TRUE
/FALSE
, direction of causality for given measurement error parameters
p-value for inference of direction of causality for given measurement error parameters
Total volume of the error parameter space
Volume of the parameter space that gives the incorrect answer
Volume of the paramtere space that gives the correct answer
Ratio of vz1/vz0. Higher means inferred direction is less susceptible to measurement error
Plot of parameter space of causal directions and measurement error
Maximum likelihood MR method
mr_two_sample_ml(b_exp, b_out, se_exp, se_out, parameters)
mr_two_sample_ml(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
Heterogeneity stats
The default multiplicative random effects IVW estimate.
The standard error is corrected for under dispersion
Use the mr_ivw_mre()
function for estimates that don't correct for under dispersion.
mr_uwr(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
mr_uwr(b_exp, b_out, se_exp, se_out, parameters = default_parameters())
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. The default is |
List with the following elements:
MR estimate
Standard error
p-value
Heterogeneity stats
Perform 2 sample IV using Wald ratio.
mr_wald_ratio(b_exp, b_out, se_exp, se_out, parameters)
mr_wald_ratio(b_exp, b_out, se_exp, se_out, parameters)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
List of parameters. |
List with the following elements:
causal effect estimate
standard error
p-value
1
Perform MR using summary statistics. Bootstraps used to calculate standard error.
mr_weighted_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_weighted_median( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
parameters |
The default is |
List with the following elements:
MR estimate
Standard error
p-value
MR weighted mode estimator
mr_weighted_mode( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_weighted_mode( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Standard errors of genetic effects on exposure |
se_out |
Standard errors of genetic effects on outcome |
parameters |
List containing |
List with the following elements:
MR estimate
Standard error
p-value
Weighted mode estimator
mr_weighted_mode_nome( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
mr_weighted_mode_nome( b_exp, b_out, se_exp, se_out, parameters = default_parameters() )
b_exp |
Vector of genetic effects on exposure |
b_out |
Vector of genetic effects on outcome |
se_exp |
Standard errors of genetic effects on exposure |
se_out |
Standard errors of genetic effects on outcome |
parameters |
List containing |
List with the following elements:
MR estimate
Standard error
p-value
Perform full set of MR analyses
mr_wrapper(dat, parameters = default_parameters())
mr_wrapper(dat, parameters = default_parameters())
dat |
Output from |
parameters |
Parameters to pass to MR functions. Output from |
list
Performs initial multivariable MR analysis from Burgess et al 2015. For each exposure the outcome is residualised for all the other exposures, then unweighted regression is applied.
mv_basic(mvdat, pval_threshold = 5e-08)
mv_basic(mvdat, pval_threshold = 5e-08)
mvdat |
Output from |
pval_threshold |
P-value threshold to include instruments. The default is |
List of results
Requires a list of IDs from available_outcomes. For each ID, it extracts instruments. Then, it gets the full list of all instruments and extracts those SNPs for every exposure. Finally, it keeps only the SNPs that are a) independent and b) present in all exposures, and harmonises them to be all on the same strand.
mv_extract_exposures( id_exposure, clump_r2 = 0.001, clump_kb = 10000, harmonise_strictness = 2, opengwas_jwt = ieugwasr::get_opengwas_jwt(), find_proxies = TRUE, force_server = FALSE, pval_threshold = 5e-08, pop = "EUR", plink_bin = NULL, bfile = NULL )
mv_extract_exposures( id_exposure, clump_r2 = 0.001, clump_kb = 10000, harmonise_strictness = 2, opengwas_jwt = ieugwasr::get_opengwas_jwt(), find_proxies = TRUE, force_server = FALSE, pval_threshold = 5e-08, pop = "EUR", plink_bin = NULL, bfile = NULL )
id_exposure |
Array of IDs (e.g. c(299, 300, 302) for HDL, LDL, trigs) |
clump_r2 |
The default is |
clump_kb |
The default is |
harmonise_strictness |
See the |
opengwas_jwt |
Used to authenticate protected endpoints. Login to https://api.opengwas.io to obtain a jwt. Provide the jwt string here, or store in .Renviron under the keyname OPENGWAS_JWT. |
find_proxies |
Look for proxies? This slows everything down but is more accurate. The default is |
force_server |
Whether to search through pre-clumped dataset or to re-extract and clump directly from the server. The default is |
pval_threshold |
Instrument detection p-value threshold. Default = |
pop |
Which 1000 genomes super population to use for clumping when using the server |
plink_bin |
If |
bfile |
If this is provided then will use the API. Default = |
data frame in exposure_dat
format
Allows you to read in summary data from text files to format the multivariable exposure dataset.
mv_extract_exposures_local( filenames_exposure, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, pval_threshold = 5e-08, plink_bin = NULL, bfile = NULL, clump_r2 = 0.001, clump_kb = 10000, pop = "EUR", harmonise_strictness = 2 )
mv_extract_exposures_local( filenames_exposure, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, pval_threshold = 5e-08, plink_bin = NULL, bfile = NULL, clump_r2 = 0.001, clump_kb = 10000, pop = "EUR", harmonise_strictness = 2 )
filenames_exposure |
Filenames for each exposure dataset. Must have header with at least SNP column present. Following arguments are used for determining how to read the filename and clumping etc. |
sep |
Specify delimeter in file. The default is space, i.e. |
phenotype_col |
Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value |
snp_col |
Required name of column with SNP rs IDs. The default is |
beta_col |
Required for MR. Name of column with effect sizes. THe default is |
se_col |
Required for MR. Name of column with standard errors. The default is |
eaf_col |
Required for MR. Name of column with effect allele frequency. The default is |
effect_allele_col |
Required for MR. Name of column with effect allele. Must be "A", "C", "T" or "G". The default is |
other_allele_col |
Required for MR. Name of column with non effect allele. Must be "A", "C", "T" or "G". The default is |
pval_col |
Required for enrichment tests. Name of column with p-value. The default is |
units_col |
Optional column name for units. The default is |
ncase_col |
Optional column name for number of cases. The default is |
ncontrol_col |
Optional column name for number of controls. The default is |
samplesize_col |
Optional column name for sample size. The default is |
gene_col |
Optional column name for gene name. The default is |
id_col |
Optional column name to give the dataset an ID. Will be generated automatically if not provided for every trait / unit combination. The default is |
min_pval |
Minimum allowed p-value. The default is |
log_pval |
The pval is -log10(P). The default is |
pval_threshold |
Default= |
plink_bin |
If |
bfile |
If this is provided then will use the API. Default = |
clump_r2 |
Default= |
clump_kb |
Default= |
pop |
Which 1000 genomes super population to use for clumping when using the server |
harmonise_strictness |
See action argument in |
Note that you can provide an array of column names for each column, which is of length filenames_exposure
List
Harmonise exposure and outcome for multivariable MR
mv_harmonise_data(exposure_dat, outcome_dat, harmonise_strictness = 2)
mv_harmonise_data(exposure_dat, outcome_dat, harmonise_strictness = 2)
exposure_dat |
Output from |
outcome_dat |
Output from |
harmonise_strictness |
See the |
List of vectors and matrices required for mv analysis.
a matrix of beta coefficients, in which rows correspond to SNPs and columns correspond to exposures.
is the same as exposure_beta
, but for standard errors.
the same as exposure_beta
, but for p-values.
A data frame with two variables, id.exposure
and exposure
which are character strings.
an array of effects for the outcome, corresponding to the SNPs in exposure_beta
.
an array of standard errors for the outcome.
an array of p-values for the outcome.
A data frame with two variables, id.outcome
and outcome
which are character strings.
Performs modified multivariable MR analysis. For each exposure the instruments are selected then all exposures for those SNPs are regressed against the outcome together, weighting for the inverse variance of the outcome.
mv_ivw(mvdat, pval_threshold = 5e-08)
mv_ivw(mvdat, pval_threshold = 5e-08)
mvdat |
Output from |
pval_threshold |
P-value threshold to include instruments. The default is |
List of results
Apply LASSO feature selection to mvdat object
mv_lasso_feature_selection(mvdat)
mv_lasso_feature_selection(mvdat)
mvdat |
Output from |
data frame of retained features
Performs modified multivariable MR analysis. For each exposure the instruments are selected then all exposures for those SNPs are regressed against the outcome together, weighting for the inverse variance of the outcome.
mv_multiple( mvdat, intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mv_multiple( mvdat, intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mvdat |
Output from |
intercept |
Should the intercept by estimated ( |
instrument_specific |
Should the estimate for each exposure be obtained by using all instruments from all exposures ( |
pval_threshold |
P-value threshold to include instruments. The default is |
plots |
Create plots? The default is |
List of results
Performs initial multivariable MR analysis from Burgess et al 2015. For each exposure the outcome is residualised for all the other exposures, then unweighted regression is applied.
mv_residual( mvdat, intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mv_residual( mvdat, intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mvdat |
Output from |
intercept |
Should the intercept by estimated ( |
instrument_specific |
Should the estimate for each exposure be obtained by using all instruments from all exposures ( |
pval_threshold |
P-value threshold to include instruments. The default is |
plots |
Create plots? The default is |
List of results
The function proceeds as follows:
Select features (by default this is done using LASSO feature selection).
Subset the mvdat to only retain relevant features and instruments.
Perform MVMR on remaining data.
mv_subset( mvdat, features = mv_lasso_feature_selection(mvdat), intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mv_subset( mvdat, features = mv_lasso_feature_selection(mvdat), intercept = FALSE, instrument_specific = FALSE, pval_threshold = 5e-08, plots = FALSE )
mvdat |
Output from |
features |
Dataframe of features to retain, must have column with name 'exposure' that has list of exposures to retain from mvdat. The default is |
intercept |
Should the intercept by estimated ( |
instrument_specific |
Should the estimate for each exposure be obtained by using all instruments from all exposures ( |
pval_threshold |
P-value threshold to include instruments. The default is |
plots |
Create plots? The default is |
List of results
When there are duplicate summary sets for a particular exposure-outcome combination, this function keeps the exposure-outcome summary set with the highest expected statistical power. This can be done by dropping the duplicate summary sets with the smaller sample sizes. Alternatively, the pruning procedure can take into account instrument strength and outcome sample size. The latter is useful, for example, when there is considerable variation in SNP coverage between duplicate summary sets (e.g. because some studies have used targeted or fine mapping arrays). If there are a large number of SNPs available to instrument an exposure, the outcome GWAS with the better SNP coverage may provide better power than the outcome GWAS with the larger sample size.
power_prune(dat, method = 1, dist.outcome = "binary")
power_prune(dat, method = 1, dist.outcome = "binary")
dat |
Results from |
method |
Should the duplicate summary sets be pruned on the basis of sample size alone ( |
dist.outcome |
The distribution of the outcome. Can either be |
data.frame with duplicate summary sets removed
Reads in exposure data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.
read_exposure_data( filename, clump = FALSE, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, chr_col = "chr", pos_col = "pos" )
read_exposure_data( filename, clump = FALSE, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, chr_col = "chr", pos_col = "pos" )
filename |
Filename. Must have header with at least SNP column present. |
clump |
Whether to perform LD clumping with |
sep |
Specify delimeter in file. The default is a space, i.e. |
phenotype_col |
Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value "Outcome". The default is |
snp_col |
Required name of column with SNP rs IDs. The default is |
beta_col |
Required for MR. Name of column with effect sizes. The default is |
se_col |
Required for MR. Name of column with standard errors. The default is |
eaf_col |
Required for MR. Name of column with effect allele frequency. The default is |
effect_allele_col |
Required for MR. Name of column with effect allele. Must be "A", "C", "T" or "G". The default is |
other_allele_col |
Required for MR. Name of column with non effect allele. Must be "A", "C", "T" or "G". The default is |
pval_col |
Required for enrichment tests. Name of column with p-value. The default is |
units_col |
Optional column name for units. The default is |
ncase_col |
Optional column name for number of cases. The default is |
ncontrol_col |
Optional column name for number of controls. The default is |
samplesize_col |
Optional column name for sample size. The default is |
gene_col |
Optional column name for gene name. The default is |
id_col |
Optional column name to give the dataset an ID. Will be generated automatically if not provided for every trait / unit combination. The default is |
min_pval |
Minimum allowed p-value. The default is |
log_pval |
The p-value is -log10(P). The default is |
chr_col |
Optional column name for chromosome. Default is |
pos_col |
Optional column name for genetic position Default is |
data frame
Reads in outcome data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.
read_outcome_data( filename, snps = NULL, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, chr_col = "chr", pos_col = "pos" )
read_outcome_data( filename, snps = NULL, sep = " ", phenotype_col = "Phenotype", snp_col = "SNP", beta_col = "beta", se_col = "se", eaf_col = "eaf", effect_allele_col = "effect_allele", other_allele_col = "other_allele", pval_col = "pval", units_col = "units", ncase_col = "ncase", ncontrol_col = "ncontrol", samplesize_col = "samplesize", gene_col = "gene", id_col = "id", min_pval = 1e-200, log_pval = FALSE, chr_col = "chr", pos_col = "pos" )
filename |
Filename. Must have header with at least SNP column present. |
snps |
SNPs to extract. If |
sep |
Specify delimeter in file. The default is space, i.e. |
phenotype_col |
Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value |
snp_col |
Required name of column with SNP rs IDs. The default is |
beta_col |
Required for MR. Name of column with effect sizes. THe default is |
se_col |
Required for MR. Name of column with standard errors. The default is |
eaf_col |
Required for MR. Name of column with effect allele frequency. The default is |
effect_allele_col |
Required for MR. Name of column with effect allele. Must be "A", "C", "T" or "G". The default is |
other_allele_col |
Required for MR. Name of column with non effect allele. Must be "A", "C", "T" or "G". The default is |
pval_col |
Required for enrichment tests. Name of column with p-value. The default is |
units_col |
Optional column name for units. The default is |
ncase_col |
Optional column name for number of cases. The default is |
ncontrol_col |
Optional column name for number of controls. The default is |
samplesize_col |
Optional column name for sample size. The default is |
gene_col |
Optional column name for gene name. The default is |
id_col |
Optional column name to give the dataset an ID. Will be generated automatically if not provided for every trait / unit combination. The default is |
min_pval |
Minimum allowed p-value. The default is |
log_pval |
The pval is -log10(P). The default is |
chr_col |
Optional column name for chromosome. Default is |
pos_col |
Optional column name for genetic position Default is |
data frame
See https://github.com/rondolab/MR-PRESSO for more details.
run_mr_presso(dat, NbDistribution = 1000, SignifThreshold = 0.05)
run_mr_presso(dat, NbDistribution = 1000, SignifThreshold = 0.05)
dat |
Output from |
NbDistribution |
Number of bootstrap replications. The default is |
SignifThreshold |
Outlier significance threshold. The default is |
List of results for every exposure/outcome combination
See https://github.com/gqi/MRMix for more details.
run_mrmix(dat)
run_mrmix(dat)
dat |
Output from |
List of results, with one list item for every exposure/outcome pair in dat object
Whens there are duplicate summary sets for a particular exposure-outcome combination, this function drops the duplicates with the smaller total sample size (for binary outcomes, the number of cases is used instead of total sample size).
size.prune(dat)
size.prune(dat)
dat |
Results from |
data frame
This function sorts user-supplied results for the forest_plot_1_to_many()
function. The user supplies their results in the form of a data frame.
sort_1_to_many( mr_res, b = "b", trait_m = "outcome", sort_action = 4, group = NULL, priority = NULL )
sort_1_to_many( mr_res, b = "b", trait_m = "outcome", sort_action = 4, group = NULL, priority = NULL )
mr_res |
Data frame of results supplied by the user. |
b |
Name of the column specifying the effect of the exposure on the outcome. The default is |
trait_m |
The column specifying the names of the traits. Corresponds to 'many' in the 1-to-many forest plot. The default is |
sort_action |
Choose how to sort results.
|
group |
Name of grouping variable in |
priority |
If |
data frame.
This function takes the exposure column from the results generated by mr()
and splits it into separate columns for 'exposure name' and 'id'.
split_exposure(mr_res)
split_exposure(mr_res)
mr_res |
Results from |
data frame
This function takes the outcome column from the results generated by mr()
and splits it into separate columns for 'outcome name' and 'id'.
split_outcome(mr_res)
split_outcome(mr_res)
mr_res |
Results from |
data frame
Uses estimate_trait_sd()
.
standardise_units(dat)
standardise_units(dat)
dat |
Output from |
Data frame
This function takes an object from harmonise_data()
and does the following:
If there is no rsq.exposure or rsq.outcome column it will try to estimate it.
This is done differently for traits that have "log odds" units.
To estimate rsq for quantitative traits there must be either p-values and sample sizes for each SNP,
or effect sizes and standard errors AND the units are in SD units (the column must contain "SD").
To estimate rsq for binary traits the units must be called "log odds" and there must be beta.exposure,
eaf.exposure, ncase.exposure, ncontrol.exposure, prevalence.exposure.
The same principles apply for calculating the rsq for the outcome trait, except column names are beta.outcome etc.
If prevalence isn't supplied then it uses 0.1 by default.
steiger_filtering(dat)
steiger_filtering(dat)
dat |
Output from |
Once rsq is calculated for the exposure and outcome, it will then perform the Steiger test for each SNP to see if the rsq of the exposure is larger than the rsq of the outcome.
Note that Steiger filtering, while useful, does have its own pitfalls. Try to use replication effect estimates for the exposure (which are not biased by winner's curse), and note that if there is strong antagonistic confounding or differential measurement error between the exposure and outcome then the causal directions could be inferred incorrectly.
harmonise_data()
style data frame with additional columns rsq.exposure, rsq.outcome, steiger_dir (which is TRUE
if the rsq.exposure is larger than rsq.outcome) and steiger_pval which is a test to see if rsq.exposure is significantly larger than rsq.outcome.
Evaluate the Steiger test's sensitivity to measurement error
steiger_sensitivity(rgx_o, rgy_o, ...)
steiger_sensitivity(rgx_o, rgy_o, ...)
rgx_o |
Observed variance of exposure explained by SNPs |
rgy_o |
Observed variance of outcome explained by SNPs |
... |
Further arguments to be passed to |
List with the following elements:
Total volume of the error parameter space
Volume of the parameter space that gives the incorrect answer
Volume of the paramtere space that gives the correct answer
Ratio of vz1/vz0. Higher means inferred direction is less susceptible to measurement error
plot of parameter space
This function takes MR results from mr()
and restricts to a single method per exposure x disease combination.
subset_on_method( mr_res, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted" )
subset_on_method( mr_res, single_snp_method = "Wald ratio", multi_snp_method = "Inverse variance weighted" )
mr_res |
Results from |
single_snp_method |
Which of the single SNP methods to use when only 1 SNP was used to estimate the causal effect? The default is |
multi_snp_method |
Which of the multi-SNP methods to use when there was more than 1 SNPs used to estimate the causal effect? The default is |
data frame.
Trim function to remove leading and trailing blank spaces
trim(x)
trim(x)
x |
Character or array of character |
Character or array of character
New method from Jack
weighted_median(b_iv, weights)
weighted_median(b_iv, weights)
b_iv |
Wald ratios |
weights |
Weights of each SNP |
MR estimate
Based on new script for weighted median confidence interval, update 31 July 2015.
weighted_median_bootstrap(b_exp, b_out, se_exp, se_out, weights, nboot)
weighted_median_bootstrap(b_exp, b_out, se_exp, se_out, weights, nboot)
b_exp |
Vector of genetic effects on exposure. |
b_out |
Vector of genetic effects on outcome. |
se_exp |
Standard errors of genetic effects on exposure. |
se_out |
Standard errors of genetic effects on outcome. |
weights |
Weights to apply to each SNP. |
nboot |
Number of bootstrap replications. The default is |
Empirical standard error