remotes::install_github('ritarasteiro/hyprcoloc', build_opts = c('--resave-data', '--no-manual'), build_vignettes = TRUE)
remotes::install_github('ritarasteiro/susieR')
#library(gwasglue2)
devtools::load_all("../") # this was added just for development
library(ieugwasr)
library(susieR) # fork that takes Summaryset
library(hyprcoloc) # fork that takes Dataset
library(dplyr)
This tutorial aims to study the use of statins, a drug to reduce cholesterol, on coronary heart disease and type 2 diabetes. There is some evidence that using statins can increase blood sugar, which can put people who use statins at higher risk of developing type 2 diabetes.
We will be generating a regional genotype-phenotype map for the
following traits: chd - coronary heart disease; ldl - low-density
lipoprotein; hdl - high-density lipoprotein; trig - triglycerides and
t2d - type 2 diabetes around different genes. In this example,
meta-analysis will be performed for the chd trait using two independent
studies ieu-a-7
and ukb-d-I9_IHD
.
First, we set the path to where the reference population plink files are located to build the LD matrix
and choose the data for each trait and set the IEU IDs. Note that we are performing a meta-analysis for the chd trait.
chd_id <- c("ieu-a-7", "ukb-d-I9_IHD")
ldl_id <- "ieu-b-110"
hdl_id <- "ieu-b-109"
trig_id <- "ieu-b-111"
t2d_id <- "ebi-a-GCST006867"
We create a chd_metadata
object for the meta-analysis we
are going to perform on the chd trait.
chd_metadata <- lapply(seq_along(chd_id), function(i){
m <- create_metadata(ieugwasr::gwasinfo(chd_id[i]))
})
and then the metadata
objects for each of the other
trait objects
ids <- c(ldl_id, hdl_id, trig_id, t2d_id)
# get metadata and create metadata objects
metadata <- lapply(seq_along(ids), function(i){
m <- create_metadata(ieugwasr::gwasinfo(ids[i]))
})
We are going to use these same chd_metadata
and
metadata
objects for each analyse for each gene.
Note, that in the code above both chd_metadata
and
metadata
objects are lists that contain metadata
information for each trait. Eg., chd_metadata
will have
metadata information for both ieu-a-7
and
ukb-d-I9_IHD
studies, retrieved using
ieugwasr::gwasinfo()
.
Set the gene region:
The code below performs meta-analysis including two studies
(ieu-a-7
and ukb-d-I9_IHD
, see above) for the
chd trait and the end result is one hmgcr_chd
summaryset.
First, it creates the summary sets for each of the studies followed by
the dataset which includes them. Finally, performs the meta-analysis to
create a new summary set.
hmgcr_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = chd_id[i]), metadata = chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
Here, we create an harmonised dataset named hmgcr
, which
includes all the traits in ids
and the
hmgcr_chd
summaryset created above. We start by creating
the summarysets for each trait and then the dataset, which comprise all
the summarysets (note that we are not saving the SummarySet objects
separately, but filling them directly to the DataSet). Then we add the
hmgcr_chd
summaryset and harmonise trait against trait
followed by each trait against a LD matrix.
# create an harmonised gwasglue2 DataSet object
hmgcr <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(hmgcr_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Before starting with the fine mapping and colocalisation analyses, we are going to look at the p-values for each variant in each trait.
After summary sets are harmonised, marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
# do finemapping with susie
ntraits <- getLength(hmgcr)
hmgcr_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(hmgcr), summaryset = getSummarySet(hmgcr, trait))
})
Now we are going to merge the datasets. Note that the
merge_datasets()
function is just going to merge the
marginalised datasets.
Set the gene region:
Like detailed above, in section 2.1, the code below performs
meta-analysis for the chd trait and the end result is one
pcsk9_chd
summaryset.
pcsk9_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = pcsk9_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
Like detailed in section 2.2, here we create another harmonised
dataset named pcsk9
, this time for the PCSK9 gene
region.
# create an harmonised gwasglue2 DataSet object
pcsk9 <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = pcsk9_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(pcsk9_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged.
# do finemapping with susie
ntraits <- getLength(pcsk9)
pcsk9_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(pcsk9), summaryset = getSummarySet(pcsk9, trait))
})
Merge the marginalised datasets:
Set the region:
Like section 2.1, we’ll do a meta-analysis for the chd trait and the
end result is one npc1l1_chd
summaryset.
npc1l1_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = npc1l1_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
Like detailed in section 2.2, here we create another harmonised
dataset named npc1l1
, this time for the NPC1L1 gene
region.
# create an harmonised gwasglue2 DataSet object
npc1l1 <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = npc1l1_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(npc1l1_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
ntraits <- getLength( npc1l1)
npc1l1_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(npc1l1), summaryset = getSummarySet( npc1l1, trait))
})
Merge the marginalised datasets:
Set the region:
Like section 2.1, we’ll do a meta-analysis for the chd trait and the
end result is one lpa_chd
summaryset.
lpa_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = lpa_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
Like detailed in section 2.2, here we create another harmonised
dataset named lpa
, this time for the NPC1L1 gene
region.
# create an harmonised gwasglue2 DataSet object
lpa <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = lpa_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(lpa_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
ntraits <- getLength(lpa)
lpa_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(lpa), summaryset = getSummarySet(lpa, trait))
})
Merge the marginalised datasets: