---
title: "Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```


```{r  eval= FALSE}
remotes::install_github('ritarasteiro/hyprcoloc', build_opts = c('--resave-data', '--no-manual'), build_vignettes = TRUE) 
remotes::install_github('ritarasteiro/susieR')
```

```{r setup }
#library(gwasglue2)
devtools::load_all("../") # this was added just for development
library(ieugwasr)
library(susieR)     # fork that takes Summaryset
library(hyprcoloc)  # fork that takes Dataset
library(dplyr)
```

This tutorial aims to study the use of statins, a drug to reduce cholesterol, on coronary heart disease and type 2 diabetes. There  is some evidence that using statins can increase blood sugar, which can put people who use statins at higher risk of developing type 2 diabetes.


We will be generating a regional genotype-phenotype map for the following traits: chd - coronary heart disease; ldl - low-density lipoprotein; hdl - high-density lipoprotein; trig - triglycerides and t2d - type 2 diabetes around different genes. In this example, meta-analysis will be performed for the chd trait using two independent  studies `ieu-a-7` and `ukb-d-I9_IHD`.


First,  we set the path to where the reference population plink files are located to build the LD matrix

```{r}
bed_ref <- "../data/ld/EUR"
```

and choose the data for each trait and set the IEU IDs. Note that we are performing a meta-analysis for the chd trait.

```{r}
chd_id <- c("ieu-a-7",  "ukb-d-I9_IHD")
ldl_id <- "ieu-b-110"
hdl_id <- "ieu-b-109"
trig_id <- "ieu-b-111"
t2d_id <- "ebi-a-GCST006867"
```

# 1. Setting the metadata

We create a `chd_metadata` object for the meta-analysis we are going to perform on the chd trait.

```{r}
chd_metadata <- lapply(seq_along(chd_id), function(i){
  m <- create_metadata(ieugwasr::gwasinfo(chd_id[i])) 
})
```

and then the `metadata` objects for each of the other trait objects

```{r}
ids <- c(ldl_id, hdl_id, trig_id, t2d_id)
# get metadata and create metadata objects
metadata <- lapply(seq_along(ids), function(i){
  m <- create_metadata(ieugwasr::gwasinfo(ids[i])) 
})
```
We are going to use these same `chd_metadata` and `metadata` objects for each analyse for each gene. 

Note, that in the code above both `chd_metadata` and `metadata` objects are lists that contain metadata information for each trait. Eg., `chd_metadata` will have metadata information for both `ieu-a-7` and `ukb-d-I9_IHD` studies, retrieved using `ieugwasr::gwasinfo()`. 

# 2. HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase) gene region

Set the gene region:
```{r}
 hmgcr_chrpos <- "5:74132993-75132993"
```


## 2.1. Meta-analysis for the chd trait

The code below performs meta-analysis including two studies (`ieu-a-7` and `ukb-d-I9_IHD`, see above) for the chd trait and the end result is one `hmgcr_chd` summaryset. First, it creates the summary sets for each of the studies followed by the dataset which includes them. Finally, performs the meta-analysis to create a new summary set.

```{r}  
  hmgcr_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = chd_id[i]), metadata = chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)   
```


## 2.2. Creating a dataset for all the traits

Here, we create an harmonised dataset named `hmgcr`, which includes all the traits in `ids`  and the `hmgcr_chd` summaryset created above. We start by creating the summarysets for each trait and then the dataset, which comprise all the summarysets (note that we are not saving the SummarySet objects separately, but filling them directly to the DataSet). Then we add the `hmgcr_chd` summaryset and harmonise trait against trait followed by each trait against a LD matrix. 


```{r}
# create an harmonised gwasglue2 DataSet object 
  hmgcr <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(hmgcr_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
```

Before starting with the fine mapping and colocalisation analyses, we are going to look at the p-values for each variant in each trait. 

```{r fig.dim = c(8, 12), fig.cap="Manhattan plot for each trait in the HMGCR region", fig.align = "center"}
# plot gwasglue2 DataSet object (hmgcr)
plot_gwasglue(hmgcr, type="manhattan", title = "HMGCR")

```
## 2.3. Finemapping with SusieR
After summary sets are harmonised, marginalise each summary set independently and create a new dataset with all marginalised summary sets merged

```{r}
# do finemapping with susie
ntraits <- getLength(hmgcr)
hmgcr_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(hmgcr), summaryset = getSummarySet(hmgcr, trait))
})
```

 Now we are going to merge the datasets. Note that the `merge_datasets()` function is just going to merge the marginalised datasets. 

```{r}
hmgcr_marginalised <- merge_datasets(hmgcr_marginalised)

``` 
## 2.4. Colocolisation with hypercoloc
 Try hyprcoloc with raw datasets. (Use fork of hypercoloc that allows a dataset object to be provided)

```{r} 
res_hmgcr <- hyprcoloc(dataset = hmgcr) 
#hyprcoloc is using gwasglue2 DataSet class object as input
print(res_hmgcr)
```

Try hyprcoloc with marginalised datasets

```{r}
res_hmgcr_marginalised <- hyprcoloc(dataset = hmgcr_marginalised)
print(res_hmgcr_marginalised)
```


# 3. PCSK9 (proprotein convertase subtilisin/kexin type 9) gene region

Set the gene region:

```{r}
pcsk9_chrpos <- "1:55005221-56005221"
```

## 3.1. Meta-analysis for the chd trait

Like detailed above, in section 2.1, the code below performs meta-analysis for the chd trait and the end result is one `pcsk9_chd` summaryset. 

```{r}  
  pcsk9_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = pcsk9_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)
```
 

## 3.2. Creating a dataset for all the traits

Like detailed in section 2.2, here we create another harmonised dataset named `pcsk9`, this time for the PCSK9 gene region.

```{r}
# create an harmonised gwasglue2 DataSet object 
  pcsk9 <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = pcsk9_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(pcsk9_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
```

Manhattan plots for each trait in the DataSet.

```{r fig.dim = c(8, 12)}
# plot gwasglue2 DataSet object (pcsk9)
plot_gwasglue(pcsk9, type="manhattan", title = "PCSK9")
```

## 3.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged.

```{r}
# do finemapping with susie 
ntraits <- getLength(pcsk9)
pcsk9_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(pcsk9), summaryset = getSummarySet(pcsk9, trait))
 }) 
```

Merge the marginalised datasets:

```{r}
pcsk9_marginalised <- merge_datasets(pcsk9_marginalised)
```

## 3.4. Colocolisation with hypercoloc
Try hyprcoloc with raw datasets. 

```{r}
res_pcsk9 <- hyprcoloc(dataset = pcsk9)
print(res_pcsk9)
```

Try hyprcoloc with marginalised datasets

```{r}
res_pcsk9_marginalised <- hyprcoloc(dataset =pcsk9_marginalised)
print(res_pcsk9_marginalised)
```

# 4. NPC1L1 (NPC1 like intracellular cholesterol transporter 1) gene region

Set the region:

```{r}
npc1l1_chrpos <- "7:44052134-45052134"
```

## 4.1. Meta-analysis for the chd trait
Like section 2.1, we'll do a meta-analysis for the chd trait and the end result is one `npc1l1_chd` summaryset. 

```{r}  
  npc1l1_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = npc1l1_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)
```

## 4.2. Creating a dataset for all the traits
Like detailed in section 2.2, here we create another harmonised dataset named `npc1l1`, this time for the NPC1L1 gene region.

```{r}
# create an harmonised gwasglue2 DataSet object 
  npc1l1 <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = npc1l1_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(npc1l1_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
```

Manhattan plots for each trait in the DataSet.

```{r fig.dim = c(8, 12)}
# plot gwasglue2 DataSet object (npc1l1)
plot_gwasglue(npc1l1, type="manhattan", title ="NPC1L1")
```

## 4.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged 

```{r}
ntraits <- getLength( npc1l1)
npc1l1_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(npc1l1), summaryset = getSummarySet( npc1l1, trait))
}) 
```

Merge the marginalised datasets:

```{r}
npc1l1_marginalised <- merge_datasets(npc1l1_marginalised)
```

## 4.4. Colocolisation with hypercoloc
Try hyprcoloc with raw datasets. 

```{r}
res_npc1l1 <- hyprcoloc(dataset = npc1l1)
print(res_npc1l1)
```

Try hyprcoloc with marginalised datasets

```{r}
res_npc1l1_marginalised <- hyprcoloc(dataset =npc1l1_marginalised)
print(res_npc1l1_marginalised)
```

# 5. LPA (Lipoprotein(A)) gene region
Set the region:

```{r}
lpa_chrpos <- "6:160952515-161087407"
```

## 5.1. Meta-analysis for the chd trait
Like section 2.1, we'll do a meta-analysis for the chd trait and the end result is one `lpa_chd` summaryset. 

```{r}  
  lpa_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = lpa_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)
```

## 5.2. Creating a dataset for all the traits
Like detailed in section 2.2, here we create another harmonised dataset named `lpa`, this time for the NPC1L1 gene region.

```{r}
# create an harmonised gwasglue2 DataSet object 
  lpa <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = lpa_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(lpa_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
```

Manhattan plots for each trait in the DataSet.

```{r fig.dim = c(8, 12)}
# plot gwasglue2 DataSet object (lpa)
 plot_gwasglue(lpa, type="manhattan", title = "LPA")
```
## 5.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged

```{r}
ntraits <- getLength(lpa)
lpa_marginalised <- lapply(1:ntraits, function(trait)
{
  # Takes in 1 SS
  # Outputs 1 DS (with at least 1 SS)
  ds <- susie_rss(R = getLDMatrix(lpa), summaryset = getSummarySet(lpa, trait))
}) 
```

Merge the marginalised datasets:

```{r}
lpa_marginalised <- merge_datasets(lpa_marginalised)
```

## 5.4. Colocolisation with hypercoloc
Try hyprcoloc with raw datasets. 

```{r}
res_lpa <- hyprcoloc(dataset = lpa)
print(res_lpa)
```

Try hyprcoloc with marginalised datasets

```{r}
res_lpa_marginalised <- hyprcoloc(dataset = lpa_marginalised)
print(res_lpa_marginalised)
```