--- title: "Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` ```{r eval= FALSE} remotes::install_github('ritarasteiro/hyprcoloc', build_opts = c('--resave-data', '--no-manual'), build_vignettes = TRUE) remotes::install_github('ritarasteiro/susieR') ``` ```{r setup } #library(gwasglue2) devtools::load_all("../") # this was added just for development library(ieugwasr) library(susieR) # fork that takes Summaryset library(hyprcoloc) # fork that takes Dataset library(dplyr) ``` This tutorial aims to study the use of statins, a drug to reduce cholesterol, on coronary heart disease and type 2 diabetes. There is some evidence that using statins can increase blood sugar, which can put people who use statins at higher risk of developing type 2 diabetes. We will be generating a regional genotype-phenotype map for the following traits: chd - coronary heart disease; ldl - low-density lipoprotein; hdl - high-density lipoprotein; trig - triglycerides and t2d - type 2 diabetes around different genes. In this example, meta-analysis will be performed for the chd trait using two independent studies `ieu-a-7` and `ukb-d-I9_IHD`. First, we set the path to where the reference population plink files are located to build the LD matrix ```{r} bed_ref <- "../data/ld/EUR" ``` and choose the data for each trait and set the IEU IDs. Note that we are performing a meta-analysis for the chd trait. ```{r} chd_id <- c("ieu-a-7", "ukb-d-I9_IHD") ldl_id <- "ieu-b-110" hdl_id <- "ieu-b-109" trig_id <- "ieu-b-111" t2d_id <- "ebi-a-GCST006867" ``` # 1. Setting the metadata We create a `chd_metadata` object for the meta-analysis we are going to perform on the chd trait. ```{r} chd_metadata <- lapply(seq_along(chd_id), function(i){ m <- create_metadata(ieugwasr::gwasinfo(chd_id[i])) }) ``` and then the `metadata` objects for each of the other trait objects ```{r} ids <- c(ldl_id, hdl_id, trig_id, t2d_id) # get metadata and create metadata objects metadata <- lapply(seq_along(ids), function(i){ m <- create_metadata(ieugwasr::gwasinfo(ids[i])) }) ``` We are going to use these same `chd_metadata` and `metadata` objects for each analyse for each gene. Note, that in the code above both `chd_metadata` and `metadata` objects are lists that contain metadata information for each trait. Eg., `chd_metadata` will have metadata information for both `ieu-a-7` and `ukb-d-I9_IHD` studies, retrieved using `ieugwasr::gwasinfo()`. # 2. HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase) gene region Set the gene region: ```{r} hmgcr_chrpos <- "5:74132993-75132993" ``` ## 2.1. Meta-analysis for the chd trait The code below performs meta-analysis including two studies (`ieu-a-7` and `ukb-d-I9_IHD`, see above) for the chd trait and the end result is one `hmgcr_chd` summaryset. First, it creates the summary sets for each of the studies followed by the dataset which includes them. Finally, performs the meta-analysis to create a new summary set. ```{r} hmgcr_chd <- lapply(seq_along(chd_id), function(i){ # create summarysets s <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = chd_id[i]), metadata = chd_metadata[[i]]) }) %>% # create dataset and harmonise create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # meta-analysis to create a new summary set meta_analysis(.) ``` ## 2.2. Creating a dataset for all the traits Here, we create an harmonised dataset named `hmgcr`, which includes all the traits in `ids` and the `hmgcr_chd` summaryset created above. We start by creating the summarysets for each trait and then the dataset, which comprise all the summarysets (note that we are not saving the SummarySet objects separately, but filling them directly to the DataSet). Then we add the `hmgcr_chd` summaryset and harmonise trait against trait followed by each trait against a LD matrix. ```{r} # create an harmonised gwasglue2 DataSet object hmgcr <- lapply(seq_along(ids), function(i){ # create summarysets dt <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = ids[i]), metadata = metadata[[i]]) }) %>% # create dataset create_dataset(., harmonise = FALSE) %>% # add chd summaryset and harmonise add_summaryset(hmgcr_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # harmonise dataset against LD matrix harmonise_ld(., bfile = bed_ref , plink_bin = "plink") ``` Before starting with the fine mapping and colocalisation analyses, we are going to look at the p-values for each variant in each trait. ```{r fig.dim = c(8, 12), fig.cap="Manhattan plot for each trait in the HMGCR region", fig.align = "center"} # plot gwasglue2 DataSet object (hmgcr) plot_gwasglue(hmgcr, type="manhattan", title = "HMGCR") ``` ## 2.3. Finemapping with SusieR After summary sets are harmonised, marginalise each summary set independently and create a new dataset with all marginalised summary sets merged ```{r} # do finemapping with susie ntraits <- getLength(hmgcr) hmgcr_marginalised <- lapply(1:ntraits, function(trait) { # Takes in 1 SS # Outputs 1 DS (with at least 1 SS) ds <- susie_rss(R = getLDMatrix(hmgcr), summaryset = getSummarySet(hmgcr, trait)) }) ``` Now we are going to merge the datasets. Note that the `merge_datasets()` function is just going to merge the marginalised datasets. ```{r} hmgcr_marginalised <- merge_datasets(hmgcr_marginalised) ``` ## 2.4. Colocolisation with hypercoloc Try hyprcoloc with raw datasets. (Use fork of hypercoloc that allows a dataset object to be provided) ```{r} res_hmgcr <- hyprcoloc(dataset = hmgcr) #hyprcoloc is using gwasglue2 DataSet class object as input print(res_hmgcr) ``` Try hyprcoloc with marginalised datasets ```{r} res_hmgcr_marginalised <- hyprcoloc(dataset = hmgcr_marginalised) print(res_hmgcr_marginalised) ``` # 3. PCSK9 (proprotein convertase subtilisin/kexin type 9) gene region Set the gene region: ```{r} pcsk9_chrpos <- "1:55005221-56005221" ``` ## 3.1. Meta-analysis for the chd trait Like detailed above, in section 2.1, the code below performs meta-analysis for the chd trait and the end result is one `pcsk9_chd` summaryset. ```{r} pcsk9_chd <- lapply(seq_along(chd_id), function(i){ # create summarysets s <- create_summaryset(ieugwasr::associations(variants = pcsk9_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]]) }) %>% # create dataset and harmonise create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # meta-analysis to create a new summary set meta_analysis(.) ``` ## 3.2. Creating a dataset for all the traits Like detailed in section 2.2, here we create another harmonised dataset named `pcsk9`, this time for the PCSK9 gene region. ```{r} # create an harmonised gwasglue2 DataSet object pcsk9 <- lapply(seq_along(ids), function(i){ # create summarysets dt <- create_summaryset(data = ieugwasr::associations(variants = pcsk9_chrpos, id = ids[i]), metadata = metadata[[i]]) }) %>% # create dataset create_dataset(., harmonise = FALSE) %>% # add chd summaryset and harmonise add_summaryset(pcsk9_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # harmonise dataset against LD matrix harmonise_ld(., bfile = bed_ref , plink_bin = "plink") ``` Manhattan plots for each trait in the DataSet. ```{r fig.dim = c(8, 12)} # plot gwasglue2 DataSet object (pcsk9) plot_gwasglue(pcsk9, type="manhattan", title = "PCSK9") ``` ## 3.3. Finemapping with SusieR Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged. ```{r} # do finemapping with susie ntraits <- getLength(pcsk9) pcsk9_marginalised <- lapply(1:ntraits, function(trait) { # Takes in 1 SS # Outputs 1 DS (with at least 1 SS) ds <- susie_rss(R = getLDMatrix(pcsk9), summaryset = getSummarySet(pcsk9, trait)) }) ``` Merge the marginalised datasets: ```{r} pcsk9_marginalised <- merge_datasets(pcsk9_marginalised) ``` ## 3.4. Colocolisation with hypercoloc Try hyprcoloc with raw datasets. ```{r} res_pcsk9 <- hyprcoloc(dataset = pcsk9) print(res_pcsk9) ``` Try hyprcoloc with marginalised datasets ```{r} res_pcsk9_marginalised <- hyprcoloc(dataset =pcsk9_marginalised) print(res_pcsk9_marginalised) ``` # 4. NPC1L1 (NPC1 like intracellular cholesterol transporter 1) gene region Set the region: ```{r} npc1l1_chrpos <- "7:44052134-45052134" ``` ## 4.1. Meta-analysis for the chd trait Like section 2.1, we'll do a meta-analysis for the chd trait and the end result is one `npc1l1_chd` summaryset. ```{r} npc1l1_chd <- lapply(seq_along(chd_id), function(i){ # create summarysets s <- create_summaryset(ieugwasr::associations(variants = npc1l1_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]]) }) %>% # create dataset and harmonise create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # meta-analysis to create a new summary set meta_analysis(.) ``` ## 4.2. Creating a dataset for all the traits Like detailed in section 2.2, here we create another harmonised dataset named `npc1l1`, this time for the NPC1L1 gene region. ```{r} # create an harmonised gwasglue2 DataSet object npc1l1 <- lapply(seq_along(ids), function(i){ # create summarysets dt <- create_summaryset(data = ieugwasr::associations(variants = npc1l1_chrpos, id = ids[i]), metadata = metadata[[i]]) }) %>% # create dataset create_dataset(., harmonise = FALSE) %>% # add chd summaryset and harmonise add_summaryset(npc1l1_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # harmonise dataset against LD matrix harmonise_ld(., bfile = bed_ref , plink_bin = "plink") ``` Manhattan plots for each trait in the DataSet. ```{r fig.dim = c(8, 12)} # plot gwasglue2 DataSet object (npc1l1) plot_gwasglue(npc1l1, type="manhattan", title ="NPC1L1") ``` ## 4.3. Finemapping with SusieR Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged ```{r} ntraits <- getLength( npc1l1) npc1l1_marginalised <- lapply(1:ntraits, function(trait) { # Takes in 1 SS # Outputs 1 DS (with at least 1 SS) ds <- susie_rss(R = getLDMatrix(npc1l1), summaryset = getSummarySet( npc1l1, trait)) }) ``` Merge the marginalised datasets: ```{r} npc1l1_marginalised <- merge_datasets(npc1l1_marginalised) ``` ## 4.4. Colocolisation with hypercoloc Try hyprcoloc with raw datasets. ```{r} res_npc1l1 <- hyprcoloc(dataset = npc1l1) print(res_npc1l1) ``` Try hyprcoloc with marginalised datasets ```{r} res_npc1l1_marginalised <- hyprcoloc(dataset =npc1l1_marginalised) print(res_npc1l1_marginalised) ``` # 5. LPA (Lipoprotein(A)) gene region Set the region: ```{r} lpa_chrpos <- "6:160952515-161087407" ``` ## 5.1. Meta-analysis for the chd trait Like section 2.1, we'll do a meta-analysis for the chd trait and the end result is one `lpa_chd` summaryset. ```{r} lpa_chd <- lapply(seq_along(chd_id), function(i){ # create summarysets s <- create_summaryset(ieugwasr::associations(variants = lpa_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]]) }) %>% # create dataset and harmonise create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # meta-analysis to create a new summary set meta_analysis(.) ``` ## 5.2. Creating a dataset for all the traits Like detailed in section 2.2, here we create another harmonised dataset named `lpa`, this time for the NPC1L1 gene region. ```{r} # create an harmonised gwasglue2 DataSet object lpa <- lapply(seq_along(ids), function(i){ # create summarysets dt <- create_summaryset(data = ieugwasr::associations(variants = lpa_chrpos, id = ids[i]), metadata = metadata[[i]]) }) %>% # create dataset create_dataset(., harmonise = FALSE) %>% # add chd summaryset and harmonise add_summaryset(lpa_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>% # harmonise dataset against LD matrix harmonise_ld(., bfile = bed_ref , plink_bin = "plink") ``` Manhattan plots for each trait in the DataSet. ```{r fig.dim = c(8, 12)} # plot gwasglue2 DataSet object (lpa) plot_gwasglue(lpa, type="manhattan", title = "LPA") ``` ## 5.3. Finemapping with SusieR Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged ```{r} ntraits <- getLength(lpa) lpa_marginalised <- lapply(1:ntraits, function(trait) { # Takes in 1 SS # Outputs 1 DS (with at least 1 SS) ds <- susie_rss(R = getLDMatrix(lpa), summaryset = getSummarySet(lpa, trait)) }) ``` Merge the marginalised datasets: ```{r} lpa_marginalised <- merge_datasets(lpa_marginalised) ``` ## 5.4. Colocolisation with hypercoloc Try hyprcoloc with raw datasets. ```{r} res_lpa <- hyprcoloc(dataset = lpa) print(res_lpa) ``` Try hyprcoloc with marginalised datasets ```{r} res_lpa_marginalised <- hyprcoloc(dataset = lpa_marginalised) print(res_lpa_marginalised) ```