Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included

remotes::install_github('ritarasteiro/hyprcoloc', build_opts = c('--resave-data', '--no-manual'), build_vignettes = TRUE) 
remotes::install_github('ritarasteiro/susieR')
#library(gwasglue2)
devtools::load_all("../") # this was added just for development
library(ieugwasr)
library(susieR)     # fork that takes Summaryset
library(hyprcoloc)  # fork that takes Dataset
library(dplyr)

This tutorial aims to study the use of statins, a drug to reduce cholesterol, on coronary heart disease and type 2 diabetes. There is some evidence that using statins can increase blood sugar, which can put people who use statins at higher risk of developing type 2 diabetes.

We will be generating a regional genotype-phenotype map for the following traits: chd - coronary heart disease; ldl - low-density lipoprotein; hdl - high-density lipoprotein; trig - triglycerides and t2d - type 2 diabetes around different genes. In this example, meta-analysis will be performed for the chd trait using two independent studies ieu-a-7 and ukb-d-I9_IHD.

First, we set the path to where the reference population plink files are located to build the LD matrix

bed_ref <- "../data/ld/EUR"

and choose the data for each trait and set the IEU IDs. Note that we are performing a meta-analysis for the chd trait.

chd_id <- c("ieu-a-7",  "ukb-d-I9_IHD")
ldl_id <- "ieu-b-110"
hdl_id <- "ieu-b-109"
trig_id <- "ieu-b-111"
t2d_id <- "ebi-a-GCST006867"

1. Setting the metadata

We create a chd_metadata object for the meta-analysis we are going to perform on the chd trait.

chd_metadata <- lapply(seq_along(chd_id), function(i){
  m <- create_metadata(ieugwasr::gwasinfo(chd_id[i])) 
})

and then the metadata objects for each of the other trait objects

ids <- c(ldl_id, hdl_id, trig_id, t2d_id)
# get metadata and create metadata objects
metadata <- lapply(seq_along(ids), function(i){
  m <- create_metadata(ieugwasr::gwasinfo(ids[i])) 
})

We are going to use these same chd_metadata and metadata objects for each analyse for each gene.

Note, that in the code above both chd_metadata and metadata objects are lists that contain metadata information for each trait. Eg., chd_metadata will have metadata information for both ieu-a-7 and ukb-d-I9_IHD studies, retrieved using ieugwasr::gwasinfo().

2. HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase) gene region

Set the gene region:

 hmgcr_chrpos <- "5:74132993-75132993"

2.1. Meta-analysis for the chd trait

The code below performs meta-analysis including two studies (ieu-a-7 and ukb-d-I9_IHD, see above) for the chd trait and the end result is one hmgcr_chd summaryset. First, it creates the summary sets for each of the studies followed by the dataset which includes them. Finally, performs the meta-analysis to create a new summary set.

  hmgcr_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = chd_id[i]), metadata = chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)   

2.2. Creating a dataset for all the traits

Here, we create an harmonised dataset named hmgcr, which includes all the traits in ids and the hmgcr_chd summaryset created above. We start by creating the summarysets for each trait and then the dataset, which comprise all the summarysets (note that we are not saving the SummarySet objects separately, but filling them directly to the DataSet). Then we add the hmgcr_chd summaryset and harmonise trait against trait followed by each trait against a LD matrix.

# create an harmonised gwasglue2 DataSet object 
  hmgcr <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(hmgcr_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")

Before starting with the fine mapping and colocalisation analyses, we are going to look at the p-values for each variant in each trait.

# plot gwasglue2 DataSet object (hmgcr)
plot_gwasglue(hmgcr, type="manhattan", title = "HMGCR")

2.3. Finemapping with SusieR

After summary sets are harmonised, marginalise each summary set independently and create a new dataset with all marginalised summary sets merged

# do finemapping with susie
ntraits <- getLength(hmgcr)
hmgcr_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(hmgcr), summaryset = getSummarySet(hmgcr, trait))
})

Now we are going to merge the datasets. Note that the merge_datasets() function is just going to merge the marginalised datasets.

hmgcr_marginalised <- merge_datasets(hmgcr_marginalised)

2.4. Colocolisation with hypercoloc

Try hyprcoloc with raw datasets. (Use fork of hypercoloc that allows a dataset object to be provided)

res_hmgcr <- hyprcoloc(dataset = hmgcr) 
#hyprcoloc is using gwasglue2 DataSet class object as input
print(res_hmgcr)

Try hyprcoloc with marginalised datasets

res_hmgcr_marginalised <- hyprcoloc(dataset = hmgcr_marginalised)
print(res_hmgcr_marginalised)

3. PCSK9 (proprotein convertase subtilisin/kexin type 9) gene region

Set the gene region:

pcsk9_chrpos <- "1:55005221-56005221"

3.1. Meta-analysis for the chd trait

Like detailed above, in section 2.1, the code below performs meta-analysis for the chd trait and the end result is one pcsk9_chd summaryset.

  pcsk9_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = pcsk9_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)

3.2. Creating a dataset for all the traits

Like detailed in section 2.2, here we create another harmonised dataset named pcsk9, this time for the PCSK9 gene region.

# create an harmonised gwasglue2 DataSet object 
  pcsk9 <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = pcsk9_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(pcsk9_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")

Manhattan plots for each trait in the DataSet.

# plot gwasglue2 DataSet object (pcsk9)
plot_gwasglue(pcsk9, type="manhattan", title = "PCSK9")

3.3. Finemapping with SusieR

Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged.

# do finemapping with susie 
ntraits <- getLength(pcsk9)
pcsk9_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(pcsk9), summaryset = getSummarySet(pcsk9, trait))
 }) 

Merge the marginalised datasets:

pcsk9_marginalised <- merge_datasets(pcsk9_marginalised)

3.4. Colocolisation with hypercoloc

Try hyprcoloc with raw datasets.

res_pcsk9 <- hyprcoloc(dataset = pcsk9)
print(res_pcsk9)

Try hyprcoloc with marginalised datasets

res_pcsk9_marginalised <- hyprcoloc(dataset =pcsk9_marginalised)
print(res_pcsk9_marginalised)

4. NPC1L1 (NPC1 like intracellular cholesterol transporter 1) gene region

Set the region:

npc1l1_chrpos <- "7:44052134-45052134"

4.1. Meta-analysis for the chd trait

Like section 2.1, we’ll do a meta-analysis for the chd trait and the end result is one npc1l1_chd summaryset.

  npc1l1_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = npc1l1_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)

4.2. Creating a dataset for all the traits

Like detailed in section 2.2, here we create another harmonised dataset named npc1l1, this time for the NPC1L1 gene region.

# create an harmonised gwasglue2 DataSet object 
  npc1l1 <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = npc1l1_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(npc1l1_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")

Manhattan plots for each trait in the DataSet.

# plot gwasglue2 DataSet object (npc1l1)
plot_gwasglue(npc1l1, type="manhattan", title ="NPC1L1")

4.3. Finemapping with SusieR

Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged

ntraits <- getLength( npc1l1)
npc1l1_marginalised <- lapply(1:ntraits, function(trait)
  {
    # Takes in 1 SS
    # Outputs 1 DS (with at least 1 SS)
    ds <- susie_rss(R = getLDMatrix(npc1l1), summaryset = getSummarySet( npc1l1, trait))
}) 

Merge the marginalised datasets:

npc1l1_marginalised <- merge_datasets(npc1l1_marginalised)

4.4. Colocolisation with hypercoloc

Try hyprcoloc with raw datasets.

res_npc1l1 <- hyprcoloc(dataset = npc1l1)
print(res_npc1l1)

Try hyprcoloc with marginalised datasets

res_npc1l1_marginalised <- hyprcoloc(dataset =npc1l1_marginalised)
print(res_npc1l1_marginalised)

5. LPA (Lipoprotein(A)) gene region

Set the region:

lpa_chrpos <- "6:160952515-161087407"

5.1. Meta-analysis for the chd trait

Like section 2.1, we’ll do a meta-analysis for the chd trait and the end result is one lpa_chd summaryset.

  lpa_chd <- lapply(seq_along(chd_id), function(i){
    # create summarysets
    s <- create_summaryset(ieugwasr::associations(variants = lpa_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
  }) %>%
    # create dataset and harmonise
    create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # meta-analysis to create a new summary set
    meta_analysis(.)

5.2. Creating a dataset for all the traits

Like detailed in section 2.2, here we create another harmonised dataset named lpa, this time for the NPC1L1 gene region.

# create an harmonised gwasglue2 DataSet object 
  lpa <- lapply(seq_along(ids), function(i){
    # create summarysets
    dt <- create_summaryset(data = ieugwasr::associations(variants = lpa_chrpos, id = ids[i]), metadata = metadata[[i]])
  }) %>%
    # create dataset
    create_dataset(., harmonise = FALSE) %>% 
    # add chd summaryset and harmonise
    add_summaryset(lpa_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
    # harmonise dataset against LD matrix
    harmonise_ld(., bfile = bed_ref , plink_bin = "plink")

Manhattan plots for each trait in the DataSet.

# plot gwasglue2 DataSet object (lpa)
 plot_gwasglue(lpa, type="manhattan", title = "LPA")

5.3. Finemapping with SusieR

Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged

ntraits <- getLength(lpa)
lpa_marginalised <- lapply(1:ntraits, function(trait)
{
  # Takes in 1 SS
  # Outputs 1 DS (with at least 1 SS)
  ds <- susie_rss(R = getLDMatrix(lpa), summaryset = getSummarySet(lpa, trait))
}) 

Merge the marginalised datasets:

lpa_marginalised <- merge_datasets(lpa_marginalised)

5.4. Colocolisation with hypercoloc

Try hyprcoloc with raw datasets.

res_lpa <- hyprcoloc(dataset = lpa)
print(res_lpa)

Try hyprcoloc with marginalised datasets

res_lpa_marginalised <- hyprcoloc(dataset = lpa_marginalised)
print(res_lpa_marginalised)