library(gwasglue2)
library(VariantAnnotation)
library(gwasvcf)
library(dplyr) # provides %>% and as_tibble(), used in the VCF example
library(ieugwasr)
devtools::load_all("../") # this was added just for development
In this tutorial, we walk through the different ways of creating the two types of gwasglue2 objects: the SummarySet and the DataSet. See the Strategy page for more details on these objects.
To create a SummarySet object, we first need GWAS summary data, such as those that can be obtained from IEU OpenGWAS through the ieugwasr package. Then we use the create_summaryset() function to create the SummarySet.
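As a minimal sketch of these two steps, the code below queries one region for one study and wraps the result in a SummarySet. The study id "ieu-a-2" is only an illustrative choice, and the region is borrowed from the VCF example later in this tutorial; neither is necessarily what the original code chunk used. Recent versions of ieugwasr may also require an OpenGWAS API token to be configured before the query will run.

```r
library(gwasglue2)
library(ieugwasr)

# Illustrative study id and region (assumptions, not taken from the tutorial's own chunk)
data1 <- ieugwasr::associations(
  variants = "5:74132993-75132993",
  id = "ieu-a-2"
)

# No metadata supplied: create_summaryset() fills in what it can from data1
sumset1 <- create_summaryset(data1)
sumset1
```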
In the code above, we did not supply any metadata. The function fills in the metadata using whatever information is available in the GWAS summary data (here, data1). Metadata can also be supplied explicitly through the metadata argument.
To create a metadata object, we use the create_metadata() function. We can supply metadata manually through one or more arguments of the function. For example:
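A small sketch of a manual call follows. The sample_size argument is the one named in this tutorial; the trait argument is an assumption about the function signature, and both values are placeholders rather than real study figures.

```r
library(gwasglue2)

# Placeholder values for illustration only
meta_manual <- create_metadata(
  trait = "body mass index", # assumed argument name
  sample_size = 10000
)
meta_manual
```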
It is also possible to add extra metadata fields, or to provide a data frame of metadata through the metadata argument.
We can now build a SummarySet object with the metadata provided. Note that data1 is a tibble, and by default create_summaryset() reads tibbles.
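The sketch below shows one way the data frame route could look, using ieugwasr::gwasinfo() as the source of the metadata data frame. That choice, the study id, and the region are all assumptions made for illustration; any data frame with the relevant columns should serve the same purpose.

```r
library(gwasglue2)
library(ieugwasr)

study_id <- "ieu-a-2" # illustrative id, not necessarily the tutorial's

# Summary data (a tibble) and study-level information (a data frame)
data1 <- ieugwasr::associations(variants = "5:74132993-75132993", id = study_id)
info1 <- ieugwasr::gwasinfo(study_id)

# Build the metadata from the data frame, then attach it to the SummarySet
meta1 <- create_metadata(metadata = info1)
sumset1 <- create_summaryset(data1, metadata = meta1)

getMetadata(sumset1)
```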
Let’s try another example, using a different IEU id. First the GWAS summary data, and then the metadata.
You will notice that there is no sample_size information in the metadata. This absence could cause gwasglue2 problems further down in the analyses. We could use the sample_size parameter in create_metadata() to add it now, or add it later. We are going to add this information after creating the SummarySet.
The warning tells us that there is a problem with the GWAS summary data. Specifically, there are variants with the same chromosomal position and alleles that have different betas, p-values and allele frequencies. If we set the quality control parameter to qc = TRUE, gwasglue2 will deal with these inconsistencies in the dataset.
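Put together, the second example with quality control switched on might look like the sketch below. The id "ieu-a-7" is a placeholder for whichever case-control study is being analysed, and obtaining the metadata through ieugwasr::gwasinfo() is an assumption; the metadata for this placeholder id will not necessarily lack sample_size or trigger the same warning.

```r
library(gwasglue2)
library(ieugwasr)

second_id <- "ieu-a-7" # placeholder id, substitute the study of interest

data2 <- ieugwasr::associations(variants = "5:74132993-75132993", id = second_id)
meta2 <- create_metadata(metadata = ieugwasr::gwasinfo(second_id))

# qc = TRUE lets gwasglue2 handle variants that share position and alleles
# but disagree on beta, p-value or allele frequency
sumset2 <- create_summaryset(data2, metadata = meta2, qc = TRUE)
```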
Now we will deal with the lack of sample size information in the metadata. If this information were present in data2, the create_summaryset() function would fill it in automatically. This is not the case, so we are going to use the addToMetadata() method together with the case and control counts already present in the metadata.
getMetadata(sumset2)
# Sample size = number of controls + number of cases, both read from the existing metadata
sumset2 <- addToMetadata(sumset2, sample_size = (getMetadata(sumset2)$ncontrol + getMetadata(sumset2)$ncases))
getMetadata(sumset2)
In the examples above, the SummarySets were created from a tibble, or more specifically from the output of the ieugwasr package. Now we are going to use GWAS VCF files to create a SummarySet. First we use the Bioconductor package VariantAnnotation to read the VCF file.
Then gwasvcf is used to query the VCF file and convert the result to a simple data frame.
data_vcf <- gwasvcf::query_gwas(vcf, chrompos = c("5:74132993-75132993")) %>%
  gwasvcf::vcf_to_granges() %>%
  dplyr::as_tibble()
Then we create the SummarySet using the create_summaryset() function with the argument type = "vcf".
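The whole VCF route, from file to SummarySet, can be sketched as follows. The example file bundled with gwasvcf and the queried region are assumptions chosen so that the sketch does not depend on a local file; the tutorial's own VCF file and region may differ.

```r
library(gwasglue2)
library(VariantAnnotation)
library(gwasvcf)
library(dplyr)

# Assumption: gwasvcf ships an example file at extdata/data.vcf.gz
vcf_path <- system.file("extdata", "data.vcf.gz", package = "gwasvcf")
vcf <- VariantAnnotation::readVcf(vcf_path)

# Query a region (illustrative) and flatten the result to a tibble
data_vcf <- gwasvcf::query_gwas(vcf, chrompos = c("1:1097291-1099437")) %>%
  gwasvcf::vcf_to_granges() %>%
  dplyr::as_tibble()

# type = "vcf" tells create_summaryset() that the columns follow the VCF layout
sumset_vcf <- create_summaryset(data_vcf, type = "vcf")
```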
Another, simpler way of creating a SummarySet from a VCF file is to use the gwasvcf::gwasvcf_to_summaryset() function:
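A minimal sketch of this shortcut is given here, assuming the function accepts the object returned by readVcf() and again using the example file bundled with gwasvcf as a stand-in for the tutorial's own file.

```r
library(gwasglue2)
library(VariantAnnotation)
library(gwasvcf)

# Assumption: example VCF bundled with gwasvcf
vcf_path <- system.file("extdata", "data.vcf.gz", package = "gwasvcf")
vcf <- VariantAnnotation::readVcf(vcf_path)

# One call replaces the query, conversion and create_summaryset() steps
sumset_vcf2 <- gwasvcf::gwasvcf_to_summaryset(vcf)
```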
A DataSet-class object is a list of two or more harmonised SummarySets.
First we need to create a list with all the SummarySets we want to analyse, and then use the create_dataset() function to build the DataSet object.
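As a sketch, two SummarySets covering the same region can be collected in a list and passed to create_dataset(). The two study ids and the region are illustrative assumptions, and the call relies on the function's default harmonisation settings rather than setting them explicitly.

```r
library(gwasglue2)
library(ieugwasr)

region <- "5:74132993-75132993" # same region for both studies so variants overlap

# Illustrative ids; substitute the studies being compared
sumset1 <- create_summaryset(
  ieugwasr::associations(variants = region, id = "ieu-a-2"), qc = TRUE
)
sumset2 <- create_summaryset(
  ieugwasr::associations(variants = region, id = "ieu-a-7"), qc = TRUE
)

# The list of SummarySets is the input from which the DataSet is built
summarysets <- list(sumset1, sumset2)
dataset <- create_dataset(summarysets)
dataset
```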