| Title: | Query the OpenGWAS genotype-phenotype map |
|---|---|
| Description: | This package is a simple wrapper around the OpenGWAS genotype-phenotype map API. |
| Authors: | Gibran Hemani [aut, cre] (ORCID: <https://orcid.org/0000-0003-0920-1055>) |
| Maintainer: | Gibran Hemani <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.1.0 |
| Built: | 2026-06-03 13:30:37 UTC |
| Source: | https://github.com/MRCIEU/gpmapr |
Get all genes from the API
all_genes()
A dataframe containing all genes with the following columns:
id: the id of the gene
gene: the name of the gene
description: the description of the gene
gene_biotype: the gene biotype
chr: the chromosome of the gene
start: the start position of the gene
stop: the end position of the gene
strand: the strand of the gene
source: the source of the gene
distinct_trait_categories: the number of trait categories that the gene is associated with via coloc groups
distinct_protein_coding_genes: the number of genes that the gene is associated with via coloc groups
num_study_extractions: the number of study extractions for this gene
num_coloc_groups: the number of coloc groups for this gene
num_coloc_studies: the number of studies that have coloc results for this gene
num_rare_groups: the number of rare groups for this gene
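As a rough illustration of working with these columns, the sketch below ranks protein-coding genes by their number of coloc groups. The `genes_df` object is a small hand-made stand-in for the result of `all_genes()` (all values are invented), and the `"protein_coding"` biotype label is an assumption rather than something the documentation confirms.

```r
# Hypothetical stand-in for the result of all_genes(); values are invented.
genes_df <- data.frame(
  id = c(1, 2, 3),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  gene_biotype = c("protein_coding", "lncRNA", "protein_coding"),
  num_coloc_groups = c(4, 10, 12),
  stringsAsFactors = FALSE
)

# Keep protein-coding genes (assumed biotype label), then sort so the
# genes with the most coloc groups come first.
coding <- genes_df[genes_df$gene_biotype == "protein_coding", ]
top_genes <- coding[order(coding$num_coloc_groups, decreasing = TRUE), ]
print(top_genes[, c("id", "gene", "num_coloc_groups")])
```

With real data, `genes_df` would simply be replaced by the dataframe returned from `all_genes()`.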
Get all traits from the API
all_traits()
A dataframe containing all traits with the following columns:
id: the id of the trait
data_type: the data type of the trait
trait: the internal string id of the trait
trait_name: the name of the trait
trait_category: the trait category of the trait
variant_type: the type of variant
sample_size: the sample size of the trait
category: the category of the trait (continuous, categorical)
ancestry: the ancestry of the trait
heritability: the LDSC heritability score of the trait
heritability_se: the standard error of the LDSC heritability score of the trait
num_study_extractions: the number of study extractions for this trait
num_coloc_groups: the number of coloc groups for this trait
num_coloc_studies: the number of studies that have coloc results for this trait
num_rare_results: the number of rare results for this trait
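One possible use of the `heritability` and `heritability_se` columns is to screen traits by how precisely their heritability is estimated. The sketch below uses an invented stand-in for the `all_traits()` result, and the z-score cutoff of 4 is an arbitrary illustrative choice, not a package recommendation.

```r
# Hypothetical stand-in for the result of all_traits(); values are invented.
traits_df <- data.frame(
  id = c(11, 12, 13),
  trait_name = c("Trait A", "Trait B", "Trait C"),
  heritability = c(0.30, 0.02, 0.15),
  heritability_se = c(0.05, 0.04, 0.03),
  stringsAsFactors = FALSE
)

# Heritability z-score: the estimate divided by its standard error.
traits_df$h2_z <- traits_df$heritability / traits_df$heritability_se

# Keep traits whose heritability estimate is well separated from zero
# (the cutoff of 4 is illustrative only).
heritable <- traits_df[traits_df$h2_z > 4, ]
print(heritable[, c("id", "trait_name", "h2_z")])
```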
Get associations from the API by SNP id and study id
associations(variant_ids, study_ids)

| Argument | Description |
|---|---|
| variant_ids | A vector of numeric values specifying the SNP IDs |
| study_ids | A vector of numeric values specifying the Study IDs |

A dataframe containing the associations
The associations dataframe contains information about which studies have association results. It has the following columns:
variant_id: the id of the SNP associated with this association
study_id: the id of the study associated with this association
beta: the beta value of the association
se: the standard error of the association
p: the p-value of the association
eaf: the effect allele frequency of the association
imputed: whether the association is imputed
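The `beta` and `se` columns are enough to derive a z-score and, under the assumption that `p` is a two-sided Wald test p-value (the documentation does not say how `p` is computed), an approximate p-value. The dataframe below is an invented stand-in for the result of `associations()`.

```r
# Hypothetical stand-in for the result of associations(); values are invented.
assoc_df <- data.frame(
  variant_id = c(101, 102),
  study_id = c(5, 5),
  beta = c(0.10, -0.02),
  se = c(0.02, 0.02)
)

# z-score: effect estimate divided by its standard error.
assoc_df$z <- assoc_df$beta / assoc_df$se

# Two-sided p-value from the normal distribution (assumed Wald test).
assoc_df$p_from_z <- 2 * pnorm(-abs(assoc_df$z))
print(assoc_df)
```

Comparing `p_from_z` with the reported `p` column is one way to sanity-check that assumption on real data.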
A collection of studies that are associated with a particular gene.
gene(gene_id, include_associations = FALSE, include_coloc_pairs = FALSE, include_trans = TRUE, h4_threshold = 0.8)

| Argument | Description |
|---|---|
| gene_id | A numeric value specifying the gene id |
| include_associations | A logical value specifying whether to include associations (BETA, SE, P), defaults to FALSE |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs, defaults to FALSE |
| include_trans | A logical value specifying whether to include trans genetic effects, defaults to TRUE |
| h4_threshold | A numeric value specifying the h4 threshold for coloc pairs, defaults to 0.8 |

A list containing the following elements:
gene: a list containing metadata about the gene, including its region and neighboring genes.
coloc_groups: a dataframe containing information about which studies have coloc results for this gene. See below for details.
study_extractions: a list of dataframes containing the study extractions for this gene. See below for details.
rare_results: (optional) a list of dataframes containing the rare results for this gene
coloc_pairs: (optional) a dataframe containing all pairwise coloc results for this gene.
variants: a dataframe containing the variants for each associated coloc group or rare group. See below for details.
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The study_extractions dataframe contains details of each study extraction. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains details of each rare result. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the study extraction associated with this coloc pair
study_extraction_b_id: the id of the study extraction associated with this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
The variants dataframe contains variant information that is pulled from the Variant Effect Predictor (VEP) database. It has the following columns, alongside many more columns from VEP:
id: the id of the SNP
gene_id: the id of the gene as predicted by VEP
gene: the gene name as predicted by VEP
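The returned list can be summarised with base R. The sketch below counts distinct traits per coloc group and re-filters coloc pairs at a stricter cutoff. `res` is an invented stand-in for the result of `gene(gene_id, include_coloc_pairs = TRUE)`, and it assumes `h4` is the coloc posterior probability of a shared causal variant, which the documentation implies but does not state.

```r
# Hypothetical stand-in for gene(gene_id, include_coloc_pairs = TRUE);
# only a few of the documented columns are included and values are invented.
res <- list(
  coloc_groups = data.frame(
    coloc_group_id = c(1, 1, 2, 2, 2),
    study_extraction_id = c(10, 11, 12, 13, 14),
    trait_name = c("Trait A", "Trait B", "Trait A", "Trait C", "Trait C"),
    stringsAsFactors = FALSE
  ),
  coloc_pairs = data.frame(
    study_extraction_a_id = c(10, 12, 12),
    study_extraction_b_id = c(11, 13, 14),
    h4 = c(0.95, 0.85, 0.60),
    spurious = c(FALSE, FALSE, FALSE)
  )
)

# Number of distinct traits in each coloc group.
traits_per_group <- tapply(
  res$coloc_groups$trait_name,
  res$coloc_groups$coloc_group_id,
  function(x) length(unique(x))
)

# Keep only non-spurious pairs above a stricter h4 cutoff (0.9 is illustrative).
keep <- res$coloc_pairs$h4 >= 0.9 & !res$coloc_pairs$spurious
strict_pairs <- res$coloc_pairs[keep, ]

print(traits_per_group)
print(strict_pairs)
```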
Get specific genes from the API. The API returns collapsed/combined data for all requested genes.
genes(gene_ids, include_associations = FALSE, include_coloc_pairs = FALSE, include_trans = TRUE, h4_threshold = 0.8)

| Argument | Description |
|---|---|
| gene_ids | A vector of gene ids (1 or more) |
| include_associations | A logical value specifying whether to include associations (BETA, SE, P), defaults to FALSE |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs, defaults to FALSE |
| include_trans | A logical value specifying whether to include trans genetic effects, defaults to TRUE |
| h4_threshold | A numeric value specifying the h4 threshold for coloc pairs, defaults to 0.8 |

A list containing the following elements:
genes: gene metadata for the requested genes
coloc_groups: a dataframe containing information about which studies have coloc results for all genes
study_extractions: a dataframe containing the study extractions for all genes
rare_results: a dataframe containing the rare results for all genes
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The study_extractions dataframe contains details of each study extraction. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains details of each rare result. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the study extraction associated with this coloc pair
study_extraction_b_id: the id of the study extraction associated with this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
Get all gene pleiotropies from the API
get_all_gene_pleiotropies()
A list containing the gene pleiotropy
gene_id: the id of the gene
gene: the name of the gene
distinct_trait_categories: the number of trait categories that the gene is associated with via coloc groups
distinct_protein_coding_genes: the number of genes that the gene is associated with via coloc groups
Get all SNP pleiotropies from the API
get_all_variant_pleiotropies()
A list containing the SNP pleiotropies
variant_id: the id of the SNP
display_snp: the name of the SNP
distinct_trait_categories: the number of trait categories that the SNP is associated with via coloc groups
distinct_protein_coding_genes: the number of genes that the SNP is associated with via coloc groups
Get a GWAS from the API
get_gwas(gwas_id, include_associations = FALSE, include_summary_stats = FALSE)

| Argument | Description |
|---|---|
| gwas_id | The ID of the GWAS |
| include_associations | Whether to include associations |
| include_summary_stats | Whether to include summary statistics |

A list containing the GWAS information
Get the health status of the API
health_api()
A list containing the health status
Get LD matrix from the API by Variant ID
ld_matrix(variant_ids = c())

| Argument | Description |
|---|---|
| variant_ids | A character string specifying the Variant ID. Variant IDs can be SNP IDs or variant IDs. |

A list containing the LD matrix
The ld dataframe contains information about the LD matrix. It has the following columns:
lead_variant_id: the id of the lead SNP
proxy_variant_id: the id of the proxy SNP
ld_block_id: the id of the LD block
r: the correlation (r) between the lead and proxy SNPs
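Because the ld dataframe is in long form, it may be convenient to reshape it into a square matrix. The sketch below assumes each variant pair appears once and that a variant's correlation with itself is 1. `ld_df` is an invented stand-in for the ld dataframe returned by `ld_matrix()`.

```r
# Hypothetical stand-in for the ld dataframe; values are invented.
ld_df <- data.frame(
  lead_variant_id = c(1, 1, 2),
  proxy_variant_id = c(2, 3, 3),
  r = c(0.9, -0.4, 0.2)
)

# All variant ids that appear on either side of a pair.
ids <- sort(unique(c(ld_df$lead_variant_id, ld_df$proxy_variant_id)))

# Start from an identity matrix (self-correlation assumed to be 1).
ld_mat <- diag(1, length(ids))
dimnames(ld_mat) <- list(as.character(ids), as.character(ids))

# Fill both triangles so the matrix is symmetric.
i <- match(ld_df$lead_variant_id, ids)
j <- match(ld_df$proxy_variant_id, ids)
ld_mat[cbind(i, j)] <- ld_df$r
ld_mat[cbind(j, i)] <- ld_df$r

print(ld_mat)
```

Pairs that are absent from the dataframe are left at 0 here. Whether that is appropriate depends on how the API treats missing pairs, which the documentation does not specify.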
Get LD proxies from the API by Variant ID
ld_proxies(variant_ids = c())

| Argument | Description |
|---|---|
| variant_ids | A character string specifying the Variant ID. Variant IDs can be SNP IDs or variant IDs. |

A list containing the LD proxies
The ld dataframe contains information about the LD proxies. It has the following columns:
lead_variant_id: the id of the lead SNP
proxy_variant_id: the id of the proxy SNP
ld_block_id: the id of the LD block
r: the correlation (r) between the lead and proxy SNPs
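Proxy thresholds are usually expressed as r squared rather than r, so a small conversion step can be useful. In the sketch below, `proxies_df` is an invented stand-in for the ld dataframe from `ld_proxies()`, and the 0.8 cutoff mirrors the default `rsquared_threshold` of `search_gpmap()`.

```r
# Hypothetical stand-in for the ld dataframe; values are invented.
proxies_df <- data.frame(
  lead_variant_id = c(1, 1, 1),
  proxy_variant_id = c(2, 3, 4),
  r = c(0.95, -0.92, 0.50)
)

# Squaring r removes the sign, so strongly negative correlations also qualify.
proxies_df$rsq <- proxies_df$r^2

# Keep proxies at or above the r-squared cutoff.
strong <- proxies_df[proxies_df$rsq >= 0.8, ]
print(strong)
```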
A collection of studies that are associated with a particular region.
region(region_id, include_associations = FALSE, include_coloc_pairs = FALSE, h4_threshold = 0.8)

| Argument | Description |
|---|---|
| region_id | A numeric value specifying the region id |
| include_associations | A logical value specifying whether to include associations (BETA, SE, P), defaults to FALSE |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs, defaults to FALSE |
| h4_threshold | A numeric value specifying the h4 threshold for coloc pairs, defaults to 0.8 |

A list containing the following elements:
gene: A list containing metadata about the gene, including region, and neighboring genes.
coloc_groups: a dataframe containing information about which studies have coloc results for this region. See below for details.
study_extractions: a list of dataframes containing the study extractions for this region. See below for details.
rare_results: (optional) a list of dataframes containing the rare results for this region
coloc_pairs: (optional) a dataframe containing all pairwise coloc results for this region.
variants: a dataframe containing the variants for each associated coloc group or rare group. See below for details.
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The genes_in_region dataframe contains information about which genes are in a region. It has the following columns:
id: the id of the gene
ensembl_id: the ensembl id of the gene
gene: the name of the gene
description: the description of the gene
gene_biotype: the gene biotype
chr: the chromosome of the gene
start: the start position of the gene
stop: the stop position of the gene
strand: the strand of the gene
source: the source of the gene
The study_extractions dataframe contains details of each study extraction. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains details of each rare result. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the study extraction associated with this coloc pair
study_extraction_b_id: the id of the study extraction associated with this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
The variants dataframe contains variant information that is pulled from the Variant Effect Predictor (VEP) database. It has the following columns, alongside many more columns from VEP:
id: the id of the SNP
gene_id: the id of the gene as predicted by VEP
gene: the gene name as predicted by VEP
Search the GP Map for Traits, Genes or Variants
search_gpmap(search_text, rsquared_threshold = 0.8)

| Argument | Description |
|---|---|
| search_text | A character string specifying the search text |
| rsquared_threshold | A numeric value specifying the rsquared threshold for proxy variants, defaults to 0.8 |

After calling search_gpmap, you can retrieve the underlying data for a result by using the call given in the call column of the search results.
A dataframe containing the search results with the following columns:
type: the type of the search result: "original_variant", "proxy_variant", "trait", "gene"
name: the name of the search result
type_id: the type_id of the search result. This is the internal id in which the data can be accessed.
call: the call to get the search result: "variant(type_id)", "trait(type_id)", "gene(type_id)"
info: a string containing information about the search result, which may include:
Extractions: the number of extractions
Colocalisation Groups: the number of colocalisation groups
Colocalisation Studies: the number of colocalisation studies
Rare Results: the number of rare results
Rsquared: the rsquared of the proxy variant compared to the original variant
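One way to act on search results is to dispatch on the `type` column. In the sketch below, `hits` is an invented stand-in for the result of `search_gpmap()`, and the functions in `fetchers` are stubs standing in for the package's `gene()`, `trait()` and `variant()` so that the example runs without network access. The mapping of both variant types to `variant()` is inferred from the documented `call` values.

```r
# Hypothetical stand-in for search_gpmap(); values are invented.
hits <- data.frame(
  type = c("gene", "trait", "proxy_variant"),
  name = c("GENE_A", "Trait A", "rs0000"),
  type_id = c(7, 42, 1001),
  stringsAsFactors = FALSE
)

# Stub fetchers; with the real package these would be gene, trait and variant.
fetchers <- list(
  gene = function(id) paste("gene", id),
  trait = function(id) paste("trait", id),
  variant = function(id) paste("variant", id)
)

# Choose the accessor for a result type and call it with the type_id.
fetch_hit <- function(type, type_id) {
  key <- switch(type,
    original_variant = ,
    proxy_variant = "variant",
    trait = "trait",
    gene = "gene",
    stop("Unknown result type: ", type)
  )
  fetchers[[key]](type_id)
}

results <- mapply(fetch_hit, hits$type, hits$type_id, USE.NAMES = FALSE)
print(results)
```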
A collection of studies that are associated with a particular phenotype. A trait will include a common study and occasionally a rare study. When trait_id is a GUID (from GWAS upload), fetches the upload result instead.
trait(trait_id, include_associations = FALSE, include_coloc_pairs = FALSE, h4_threshold = 0.8)

| Argument | Description |
|---|---|
| trait_id | A numeric value or GUID (from GWAS upload) specifying the trait id |
| include_associations | A logical value specifying whether to include associations (BETA, SE, P), defaults to FALSE |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs, defaults to FALSE |
| h4_threshold | A numeric value specifying the h4 threshold for coloc pairs, defaults to 0.8 |

A list containing the following elements:
trait: a list containing metadata about the trait, including common and rare studies associated with the trait
coloc_groups: a dataframe containing information about which studies have coloc results for this trait. See below for details.
study_extractions: a list of dataframes containing the study extractions for this trait. See below for details.
rare_results: (optional) a list of dataframes containing the rare results for this trait
coloc_pairs: (optional) a dataframe containing all pairwise coloc results for this trait. See below for details.
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The study_extractions dataframe contains details of each study extraction. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains details of each rare result. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the study extraction associated with this coloc pair
study_extraction_b_id: the id of the study extraction associated with this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
The variants dataframe contains variant information that is pulled from the Variant Effect Predictor (VEP) database. It has the following columns, alongside many more columns from VEP:
id: the id of the SNP
gene_id: the id of the gene as predicted by VEP
gene: the gene name as predicted by VEP
Get specific traits from the API. The API returns collapsed/combined data for all requested traits. When a trait ID is a GUID (from GWAS upload), fetches the upload result instead.
traits(trait_ids, include_associations = FALSE, include_coloc_pairs = FALSE, h4_threshold = 0.8)

| Argument | Description |
|---|---|
| trait_ids | A vector of trait ids (numeric) or GUIDs (from GWAS upload) |
| include_associations | A logical value specifying whether to include associations (BETA, SE, P), defaults to FALSE |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs, defaults to FALSE. Coloc pairs are fetched from a separate endpoint per trait. |
| h4_threshold | A numeric value specifying the h4 threshold for coloc pairs, defaults to 0.8 |

A list containing the following elements:
traits: trait metadata for the requested traits
coloc_groups: a dataframe containing information about which studies have coloc results for all traits. See below for details.
study_extractions: a dataframe containing the study extractions for all traits. See below for details.
rare_results: a dataframe containing the rare results for all traits
coloc_pairs: (optional) a dataframe containing all pairwise coloc results for all traits.
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The study_extractions dataframe contains details of each study extraction. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains details of each rare result. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the study extraction associated with this coloc pair
study_extraction_b_id: the id of the study extraction associated with this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
Upload a GWAS to the API
upload_gwas(file, name, p_value_threshold = 5e-08, column_names = list(), email = NA, category = "continuous", is_published = FALSE, doi = NA, should_be_added = FALSE, ancestry = "EUR", sample_size = NA, reference_build = "GRCh38", compare_with_upload_guids = NA)

| Argument | Description |
|---|---|
| file | The path to the GWAS file, maximum size is 1GB |
| name | The name of the GWAS |
| p_value_threshold | The p-value threshold for the GWAS |
| column_names | A list of column names in the format of: list(CHR = "chr", BP = "pos"...) |
| email | The email of the user |
| category | The category of the GWAS. Only "continuous" and "categorical" are accepted. |
| is_published | Whether the GWAS is published |
| doi | The DOI of the GWAS |
| should_be_added | Whether the GWAS should be added to the API |
| ancestry | The ancestry of the GWAS. Currently only "EUR" is accepted. |
| sample_size | The sample size of the GWAS |
| reference_build | The reference build of the GWAS. Only "GRCh37" and "GRCh38" are accepted. |
| compare_with_upload_guids | A vector of GUIDs of uploads to compare with |

A list containing the GWAS information
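Since several arguments only accept a fixed set of values, it can help to check them locally before uploading. The helper below is a hypothetical pre-flight check and is not part of the package. It takes the file size in bytes (for example from `file.size(path)`) so it can be run without a real file, and it interprets "1GB" as 1024^3 bytes, which is an assumption.

```r
# Hypothetical pre-flight check for upload_gwas() arguments; not part of gpmapr.
# Returns a character vector of problems (empty when everything looks valid).
check_upload_args <- function(file_size_bytes, category, ancestry, reference_build) {
  problems <- character(0)
  # Assumption: the documented 1GB limit means 1024^3 bytes.
  if (file_size_bytes > 1024^3) {
    problems <- c(problems, "file is larger than 1GB")
  }
  if (!category %in% c("continuous", "categorical")) {
    problems <- c(problems, "category must be 'continuous' or 'categorical'")
  }
  if (!identical(ancestry, "EUR")) {
    problems <- c(problems, "ancestry must be 'EUR'")
  }
  if (!reference_build %in% c("GRCh37", "GRCh38")) {
    problems <- c(problems, "reference_build must be 'GRCh37' or 'GRCh38'")
  }
  problems
}

ok <- check_upload_args(5e6, "continuous", "EUR", "GRCh38")
bad <- check_upload_args(2 * 1024^3, "binary", "AFR", "hg19")
print(ok)
print(bad)
```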
A collection of studies that are associated with a particular variant.
variant(variant_id, include_coloc_pairs = FALSE, h4_threshold = 0.8, include_summary_stats = FALSE)

| Argument | Description |
|---|---|
| variant_id | A character string specifying the SNP ID |
| include_coloc_pairs | A logical value specifying whether to include coloc pairs |
| h4_threshold | A numeric value specifying the cutoff for included coloc pairs, defaults to 0.8. Only used if include_coloc_pairs is TRUE. |
| include_summary_stats | A logical value specifying whether to include summary stats |

A list containing the following elements:
variant: named list containing the variant information
coloc_groups: a dataframe containing information about which studies have coloc results for this variant
rare_results: a list of dataframes containing the rare variants
study_extractions: a list of dataframes containing the study extractions
summary_stats (optional): a list of dataframes containing the summary stats for each study, where the name of each element is the study_id. Column names are uppercase (e.g. SNP, BP, BETA, SE, LBF_1).
coloc_pairs (optional): a dataframe containing information about which studies have coloc pairs for this variant where the study_extraction_a_id and study_extraction_b_id are the study_extraction_ids of the two studies. h4_threshold is the cutoff for included coloc pairs, defaults to 0.8
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
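To illustrate how the coloc_groups columns might be used once retrieved, here is a small sketch on mock data. The values are invented, and only a subset of the documented columns is included.

```r
# Mock data with a subset of the documented coloc_groups columns
# (values are invented for illustration only).
coloc_groups <- data.frame(
  coloc_group_id = c(1, 1, 1, 2, 2),
  study_extraction_id = c(11, 12, 13, 21, 22),
  trait_name = c("trait A", "trait B", "trait C", "trait A", "trait D"),
  min_p = c(1e-12, 3e-9, 2e-8, 5e-10, 4e-9),
  stringsAsFactors = FALSE
)

# Number of study extractions in each colocalised group
group_sizes <- table(coloc_groups$coloc_group_id)

# Smallest min_p within each group
best_p <- tapply(coloc_groups$min_p, coloc_groups$coloc_group_id, min)
```

Grouping on coloc_group_id is the natural key here, since the column is described as the unique id for each group of colocalised results.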
The rare_results dataframe contains information about which studies have rare variant results. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The study_extractions dataframe contains information about the study extractions associated with this variant. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
Each summary_stats dataframe contains the summary statistics for one study. From the API, column names are typically uppercase (SNP, CHR, BP, EA, OA, EAF, Z, BETA, SE, P, LBF_1, etc.). It has the following columns (names may be upper or lower case depending on source):
SNP / variant_id: the id of the SNP
CHR / chr: the chromosome of the SNP
BP / bp: the base pair position of the SNP
EA / ea: the effect allele
OA / oa: the other allele
EAF / eaf: the effect allele frequency
Z / z: the z-score
BETA / beta: the beta value
SE / se: the standard error
P / p: the p-value
imputed: whether the summary statistics are imputed
LBF_* / lbf_*: the fine-mapped log Bayes factors, one column per credible set. Credible sets are numbered from 1 to 10. If fine-mapping failed or returned only one credible set, the LBF_1 column is converted directly from the z-score.
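Because column names may arrive in either case, downstream code may want to settle on one convention first. The sketch below maps lower-case names to the upper-case form; `normalise_sumstats_names()` is a hypothetical helper, not part of gpmapr, and the data values are invented.

```r
# Hypothetical helper (not part of gpmapr): map lower-case names to the
# upper-case names the API typically returns.
normalise_sumstats_names <- function(df) {
  nm <- names(df)
  nm[nm == "variant_id"] <- "SNP"   # the one pair that differs by more than case
  keep_lower <- nm == "imputed"     # documented in lower case only
  nm[!keep_lower] <- toupper(nm[!keep_lower])
  names(df) <- nm
  df
}

lower_case <- data.frame(
  variant_id = c("v1", "v2"),
  chr = c(1, 1),
  bp = c(1000, 2000),
  beta = c(0.10, -0.20),
  se = c(0.02, 0.05),
  imputed = c(FALSE, TRUE),
  lbf_1 = c(3.2, 0.1),
  stringsAsFactors = FALSE
)

upper_case <- normalise_sumstats_names(lower_case)

# A z-score can be recomputed from BETA and SE if it is missing
upper_case$Z <- upper_case$BETA / upper_case$SE
```

After the call, the columns are named SNP, CHR, BP, BETA, SE, imputed and LBF_1, so the same code can handle either naming style.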
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the first study extraction in this coloc pair
study_extraction_b_id: the id of the second study extraction in this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious
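The h4 and spurious columns can also be used to apply a stricter filter on the client side. The following sketch uses invented values; whether the API itself compares h4 with `>=` or `>` against h4_threshold is not stated above, so the `>=` used here is an assumption.

```r
# Mock coloc_pairs with the documented columns (values invented).
coloc_pairs <- data.frame(
  study_extraction_a_id = c(11, 11, 12),
  study_extraction_b_id = c(12, 13, 13),
  ld_block_id = c(7, 7, 7),
  h3 = c(0.05, 0.60, 0.10),
  h4 = c(0.93, 0.35, 0.85),
  spurious = c(FALSE, FALSE, TRUE)
)

# Keep pairs with strong colocalisation evidence that are not flagged spurious
h4_threshold <- 0.9
strong_pairs <- coloc_pairs[
  coloc_pairs$h4 >= h4_threshold & !coloc_pairs$spurious,
]
```

Only the first mock pair survives: the second falls below the cutoff and the third is flagged as spurious.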
Get specific variants from the API. The API accepts variant identifiers (variant_ids, rsids, or strings) and returns collapsed/combined data. The API distinguishes between identifier types automatically. Max 10 variants when expand=TRUE.
variants( variants, expand = FALSE, include_associations = FALSE, include_coloc_pairs = FALSE, h4_threshold = 0.8 )
variants |
A vector of variant identifiers (variant_ids, rsids, or strings) |
expand |
Logical. FALSE (default) returns minimal data. TRUE returns full VariantResponse (max 10) |
include_associations |
Logical. Whether to include associations (BETA, SE, P). Only when expand=TRUE |
include_coloc_pairs |
Logical. Whether to include coloc pairs. Only when expand=TRUE |
h4_threshold |
Numeric. H4 threshold for coloc pairs, defaults to 0.8 |
A list containing the following elements. The dataframes among them are described in turn after the list:
variants: a dataframe containing the variants for all requested variants
coloc_groups: (if expanded) a dataframe containing the coloc groups for all variants
study_extractions: (if expanded) a dataframe containing the study extractions for all variants
rare_results: (if expanded) a dataframe containing the rare results for all variants
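Since at most 10 variants are accepted when expand=TRUE, a longer list of identifiers needs to be split into batches first. A minimal sketch, using placeholder identifiers that are not real lookups:

```r
# Placeholder identifiers; any accepted identifier type could be used instead
ids <- paste0("rs", 1:23)

max_per_call <- 10  # documented limit when expand = TRUE
batches <- split(ids, ceiling(seq_along(ids) / max_per_call))

# Size of each batch
batch_sizes <- lengths(batches)

# Each batch could then be passed to variants() (not run; needs network access):
# results <- lapply(batches, function(b) gpmapr::variants(b, expand = TRUE))
```

This gives three batches of 10, 10 and 3 identifiers. How the per-batch results are best combined afterwards depends on which elements are returned, so that step is left out.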
The coloc_groups dataframe contains information about which studies have coloc results. It has the following columns:
coloc_group_id: the unique id for this group of colocalised results
study_id: the id of the study
study_extraction_id: the id of the study extraction
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The study_extractions dataframe contains information about the study extractions associated with the requested variants. It has the following columns:
id: the unique id for this study extraction
study_id: the id of the study associated with this study extraction
variant_id: the id of the SNP
snp: the SNP name
ld_block_id: the id of the LD block
unique_study_id: the unique id for this study
study: the study name
file: the file name
svg_file: the SVG file name
file_with_lbfs: the file name with lbfs
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
cis_trans: the cis/trans status of the SNP
ld_block: the LD block of the SNP
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
The rare_results dataframe contains information about which studies have rare variant results. It has the following columns:
rare_result_group_id: the unique id for this rare result group
study_id: the id of the study associated with this rare result
study_extraction_id: the id of the study extraction associated with this rare result
variant_id: the id of the SNP
ld_block_id: the id of the LD block
chr: the chromosome of the SNP
bp: the base pair position of the SNP
min_p: the minimum p-value related to the study_extraction_id
display_snp: the display SNP name
gene: the gene associated with the SNP
gene_id: the id of the gene
trait_id: the id of the trait
trait_name: the name of the trait
trait_category: the category of the trait
data_type: the data type of the trait
tissue: the tissue of the trait
ld_block: the LD block of the SNP
The summary_statistics dataframe contains information about which studies have summary statistics. From the API, column names are typically uppercase (SNP, CHR, BP, EA, OA, EAF, Z, BETA, SE, P, LBF_1, etc.). It has the following columns (names may be upper or lower case depending on source):
SNP / variant_id: the id of the SNP
CHR / chr: the chromosome of the SNP
BP / bp: the base pair position of the SNP
EA / ea: the effect allele
OA / oa: the other allele
EAF / eaf: the effect allele frequency
Z / z: the z-score
BETA / beta: the beta value
SE / se: the standard error
P / p: the p-value
imputed: whether the summary statistics are imputed
LBF_* / lbf_*: the fine-mapped log Bayes factors, one column per credible set. Credible sets are numbered from 1 to 10. If fine-mapping failed or returned only one credible set, the LBF_1 column is converted directly from the z-score.
The coloc_pairs dataframe contains information about which studies have coloc pairs. It has the following columns:
study_extraction_a_id: the id of the first study extraction in this coloc pair
study_extraction_b_id: the id of the second study extraction in this coloc pair
ld_block_id: the id of the LD block
h3: the h3 value for this coloc pair
h4: the h4 value for this coloc pair
spurious: whether this coloc pair is spurious