To use the IEU GWAS database for MR analysis, see the TwoSampleMR R package.
Here we’ll demonstrate how to achieve the same data extractions using the GWAS VCF files.
We’ll use the example of LDL cholesterol ieu-a-300 and coronary heart disease ieu-a-7.
Load libraries:
suppressPackageStartupMessages(suppressWarnings({
library(TwoSampleMR)
library(gwasglue)
library(gwasvcf)
library(ieugwasr)
library(dplyr)
}))This is a simple procedure for MR using the TwoSampleMR package:
# Extract the instruments for LDL
expd1 <- TwoSampleMR::extract_instruments("ieu-a-300")
# Extract those SNP effects for CHD
outd1 <- TwoSampleMR::extract_outcome_data(expd1$SNP, "ieu-a-7", proxies=FALSE)
# Harmonise the exposure and outcome data
dat1 <- TwoSampleMR::harmonise_data(expd1, outd1)
# Perform MR
TwoSampleMR::mr(dat1)Note that this extraction process can be simplified with:
Let’s do the same with the vcf files (and the indexes). Download from here:
wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz
wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz.tbi
wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz
wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz.tbiFirst get the tophits for LDL cholesterol
gwasvcf::set_bcftools()
expd2 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", chrompos=paste0(expd1$chr.exposure, ":", expd1$pos.exposure))Convert to TwoSampleMR format:
Extract those SNPs from the outcome vcf file and convert to TwoSampleMR format
outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", chrompos = paste0(expd1$chr.exposure, ":", expd1$pos.exposure))
outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome")Proceed with harmonising and performing MR
If we want to extract top hits based on a threshold and clump locally. First download the LD reference dataset:
Now extract the top hits based on a p-value threshold
Convert to TwoSampleMR format:
Get a list of SNPs to retain after clumping and subset the data
This only works if you extract on rsids at the moment, and is quite slow. But here it is:
A number of other MR methods are available. The current framework for using them is to:
Examples of formats that you can convert to within TwoSampleMR:
All data in OpenGWAS is stored on bc4 in the form of GWAS VCF files. You can create harmonised datasets easily on bc4 with these files.
Determine the locations of the GWAS VCF files and a number of other reference datasets and binaries:
Now simply run:
This can be run in parallel for large combinations of exposures and outcomes, e.g.:
dat <- make_TwoSampleMR_dat(
id1=c("ieu-a-300", "ieu-a-302", "ieu-a-299"),
id2=c("ieu-a-7", "ieu-a-2"),
nthreads=6
)This will lookup all instruments in the exposures (id1) in both exposures and outcomes, and harmonise all exposure-outcome pairs, parallelised across 6 threads.
Note: please make sure to run these analyses in batch mode by submitting to the slurm scheduler, not interactively on the head nodes!