--- title: "Conditional analysis of VCF files" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Conditional analysis of VCF files} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( eval=FALSE, collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(gwasglue) library(gwasvcf) ``` Conditional analysis of VCF files can be performed using GCTA's COJO routine. The procedure implemented here is as follows 1. Obtain clumped top-hits 2. Assign each top-hit to an LD region. The LD regions are demarkated using [this approach](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4731402/). 3. Perform finemapping within each LD region that has a top-hit, retaining a representative variant for every credible set 4. For each LD region that has multiple finemapped loci, perform conditional analysis. e.g. If there are three finemapped loci in a particular region, three conditional analyses will be performed. First, obtain the effects of variant 1 conditional on variants 2 and 3; then variant 2 conditional on variants 1 and 3; then variant 3 conditional on variants 1 and 2. Ultimately, a list of results will be returned where every fine-mapped variant has a regional set of summary data that is conditionally independent of all neighbouring fine-mapped variants. ## Finemapping pipeline 1. Clump dataset 2. Map clumps to LD regions 3. Perform fine mapping in each LD region Setup: ```{r} vcffile <- "ieu-a-300.vcf.gz" ldref <- "/Users/gh13047/repo/mr-base-api/app/ld_files/EUR" gwasvcf::set_bcftools() ``` Perform susieR pipeline: ```{r} out <- susieR_pipeline( vcffile=vcffile, bfile=ldref, plink_bin=genetics.binaRies::get_plink_binary(), pop="EUR", threads=1, L=10, estimate_residual_variance=TRUE, estimate_prior_variance=TRUE, check_R=FALSE, z_ld_weight=1/500 ) ``` Each detected region now has a finemapped object stored against it. You can see them for example like this: ```{r} summary(out$res[[1]]$susieR) susieR::susie_plot(out$res[[1]]$susieR, y="PIP") ``` For each region we can extract the variants with the highest posterior inclusion probability per credible set, e.g.: ```{r} out$res[[1]]$susieR$fmset ``` ## Conditional analysis pipeline Now we can perform conditional analysis at each region using knowledge of the finemapped variants. The `cojo_cond` function does the following 1. Creates temporary directory to store files 2. Writes vcf file to summary stats file in COJO format 3. Determines regions that have multiple fine-mapped variants 4. For each fine-mapped variant, obtains summary stats conditional on other fine-mapped variants in the region The result is a list of regions, with a set of conditional summary stats for every fine-mapped variant in that region. ```{r} out2 <- cojo_cond( vcffile=vcffile, bfile=ldref, pop="EUR", snplist=unlist(sapply(out$res, function(x) x$susieR$fmset)) ) ``` TODO - Make sure finemapped variants are in reference panel - Improve speed of cojo by implementing within R so don't have to use GCTA - Determine how to combine cojo with coloc