--- title: "Mendelian randomization" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Mendelian randomization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval=FALSE ) ``` To use the [IEU GWAS database](https://gwas.mrcieu.ac.uk/) for MR analysis, see the [TwoSampleMR](https://mrcieu.github.io/TwoSampleMR/) R package. Here we'll demonstrate how to achieve the same data extractions using the [GWAS VCF](https://github.com/MRCIEU/gwas_vcf_spec) files. We'll use the example of LDL cholesterol [ieu-a-300](https://gwas.mrcieu.ac.uk/datasets/ieu-a-300/) and coronary heart disease [ieu-a-7](https://gwas.mrcieu.ac.uk/datasets/ieu-a-7/). Load libraries: ```{r} suppressPackageStartupMessages(suppressWarnings({ library(TwoSampleMR) library(gwasglue) library(gwasvcf) library(ieugwasr) library(dplyr) })) ``` ## Using TwoSampleMR This is a simple procedure for MR using the TwoSampleMR package: ```{r} # Extract the instruments for LDL expd1 <- TwoSampleMR::extract_instruments("ieu-a-300") # Extract those SNP effects for CHD outd1 <- TwoSampleMR::extract_outcome_data(expd1$SNP, "ieu-a-7", proxies=FALSE) # Harmonise the exposure and outcome data dat1 <- TwoSampleMR::harmonise_data(expd1, outd1) # Perform MR TwoSampleMR::mr(dat1) ``` Note that this extraction process can be simplified with: ```{r, eval=FALSE} dat1 <- make_dat("ieu-a-300", "ieu-a-7") ``` ## Using GWAS VCF files Let's do the same with the vcf files (and the indexes). Download from here: ```bash wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz.tbi wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz.tbi ``` First get the tophits for LDL cholesterol ```{r} gwasvcf::set_bcftools() expd2 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", chrompos=paste0(expd1$chr.exposure, ":", expd1$pos.exposure)) ``` Convert to TwoSampleMR format: ```{r} expd2 <- gwasglue::gwasvcf_to_TwoSampleMR(expd2, type="exposure") ``` Extract those SNPs from the outcome vcf file and convert to TwoSampleMR format ```{r} outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", chrompos = paste0(expd1$chr.exposure, ":", expd1$pos.exposure)) outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome") ``` Proceed with harmonising and performing MR ```{r} dat2 <- TwoSampleMR::harmonise_data(expd2, outd2) TwoSampleMR::mr(dat2) ``` ## Other options ### Clumping vcf files If we want to extract top hits based on a threshold and clump locally. First download the LD reference dataset: ```bash wget http://fileserve.mrcieu.ac.uk/ld/data_maf0.01_rs_ref.tgz tar xzvf data_maf0.01_rs_ref.tgz ``` Now extract the top hits based on a p-value threshold ```{r} gwasvcf::set_bcftools() expd3 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", pval=5e-8) expd3 ``` Convert to TwoSampleMR format: ```{r} expd3 <- gwasglue::gwasvcf_to_TwoSampleMR(expd3, type="exposure") ``` Get a list of SNPs to retain after clumping and subset the data ```{r} retain_snps <- expd3 %>% dplyr::select(rsid=SNP, pval=pval.exposure) %>% ieugwasr::ld_clump(., plink_bin=genetics.binaRies::get_plink_binary(), bfile="data_maf0.01_rs_ref") %>% {.$rsid} expd3 <- subset(expd3, SNP %in% retain_snps) ``` ### Extracting outcome data with LD proxies This only works if you extract on rsids at the moment, and is quite slow. But here it is: ```{r} gwasvcf::set_plink() outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", rsid = expd3$SNP, proxies="yes", bfile="data_maf0.01_rs_ref") outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome") ``` ## Further MR methods A number of other MR methods are available. The current framework for using them is to: 1. Convert your data to TwoSampleMR format 2. Use the TwoSampleMR package to convert to other formats Examples of formats that you can convert to within TwoSampleMR: - [MR-PRESSO](https://github.com/rondolab/MR-PRESSO) - [RadialMR](https://github.com/WSpiller/RadialMR) - [MRMix](https://github.com/gqi/MRMix/) - [MendelianRandomization](https://cran.r-project.org/web/packages/MendelianRandomization/index.html) - [MR-RAPS](https://cran.r-project.org/web/packages/mr.raps/) ## Bluecrystal4 users All data in OpenGWAS is stored on bc4 in the form of GWAS VCF files. You can create harmonised datasets easily on bc4 with these files. Determine the locations of the GWAS VCF files and a number of other reference datasets and binaries: ```{r, eval=FALSE} set_bcftools() set_plink() set_bc4_files() ``` Now simply run: ```{r, eval=FALSE} dat <- make_TwoSampleMR_dat("ieu-a-300", "ieu-a-7") ``` This can be run in parallel for large combinations of exposures and outcomes, e.g.: ```{r, eval=FALSE} dat <- make_TwoSampleMR_dat( id1=c("ieu-a-300", "ieu-a-302", "ieu-a-299"), id2=c("ieu-a-7", "ieu-a-2"), nthreads=6 ) ``` This will lookup all instruments in the exposures (id1) in both exposures and outcomes, and harmonise all exposure-outcome pairs, parallelised across 6 threads. Note: please make sure to run these analyses in batch mode by submitting to the slurm scheduler, not interactively on the head nodes!