The gpmapr package provides a set of functions to interact with the GPMap API.
The first port of entry is the search_gpmap function.
This function allows you to search for a trait, gene, or variant.
haemoglobin <- search_gpmap("haemoglobin")
knitr::kable(haemoglobin[, c("name", "sample_size", "num_coloc_groups", "call")])| name | sample_size | num_coloc_groups | call | |
|---|---|---|---|---|
| 43431 | Glycated haemoglobin levels | 146806 | 93 | trait(915) |
| 43436 | Haemoglobin concentration | 563946 | 828 | trait(920) |
| 43438 | Mean corpuscular haemoglobin concentration | 491553 | 297 | trait(922) |
| 43447 | Haemoglobin | 408112 | 757 | trait(931) |
| 44152 | Haemoglobin levels | 445373 | 775 | trait(1636) |
| 44156 | Haemoglobin A1c levels | 437749 | 838 | trait(1640) |
| 44499 | Mean corpuscular haemoglobin | 572863 | 936 | trait(1983) |
| 45100 | Haemoglobin concentration (UKB data field 30020) | 396624 | 658 | trait(2584) |
| 45127 | Glycated haemoglobin HbA1c levels (UKB data field 30750) | 389889 | 721 | trait(2611) |
| 47275 | Haemoglobin concentration | 350474 | 497 | trait(4759) |
| 47277 | Mean corpuscular haemoglobin | 350472 | 628 | trait(4761) |
| 47278 | Mean corpuscular haemoglobin concentration | 350468 | 137 | trait(4762) |
| 52158 | Mean corpuscular haemoglobin | 430998 | 628 | trait(4761) |
| 52788 | Mean corpuscular haemoglobin concentration | 430998 | 137 | trait(4762) |
| 53046 | Glycated haemoglobin HbA1c (30750) | 430998 | 0 | trait(42778) |
| 53057 | Haemoglobin concentration | 430998 | 497 | trait(4759) |
You can see information about the different traits that match your
search, how to subsequently get more information (under
call), and metadata about the trait (under
info).
infoYou can also search for a gene (by ENSG ID or gene name) or variant (by CHR:BP or rsID).
knitr::kable(search_gpmap("SLC44A")[, c("name", "sample_size", "num_study_extractions", "num_coloc_groups", "call")])| name | sample_size | num_study_extractions | num_coloc_groups | call | |
|---|---|---|---|---|---|
| 1032 | SLC44A1 (ENSG00000070214) | NA | 206 | 9 | gene(‘SLC44A1’) |
| 5825 | SLC44A2 (ENSG00000129353) | NA | 226 | 14 | gene(‘SLC44A2’) |
| 7196 | SLC44A5 (ENSG00000137968) | NA | 91 | 9 | gene(‘SLC44A5’) |
| 7921 | SLC44A3 (ENSG00000143036) | NA | 361 | 17 | gene(‘SLC44A3’) |
| 17220 | SLC44A4 (ENSG00000204385) | NA | 107 | 12 | gene(‘SLC44A4’) |
| 19708 | SLC44A3-AS1 (ENSG00000224081) | NA | 179 | 12 | gene(‘SLC44A3-AS1’) |
| type | name | type_id | num_coloc_groups | num_coloc_studies | num_rare_results | num_study_extractions | call |
|---|---|---|---|---|---|---|---|
| original_variant | 22:37042914 T/C | 5521294 | 0 | 0 | 0 | NA | variant(5521294) |
The variant search will return the original variant and any proxy
variants that are in LD with the original variant, which also have
colocalisation results. Each variant is marked as either
original_variant or proxy_variant. In this
example, the original variant has no colocalisation results, but the
proxy variant does. The proxy variant is also marked as having a r^2 of
0.96
| type | name | type_id | num_coloc_groups | num_coloc_studies | num_rare_results | num_study_extractions | call |
|---|---|---|---|---|---|---|---|
| original_variant | 3:45579683 A/C | 5758009 | 0 | 0 | 0 | NA | variant(5758009) |
You can get more information about a trait by subsequently calling
the trait(), gene(), or variant()
functions. This will provide you with the colocalisation results, rare
disease results, and study extractions.
Each trait, gene, or variant call have optional parameters:
include_associations: Whether to include associations
(BETA, SE, P) of the variants in which it’s colocalised in the
output.include_coloc_pairs: Whether to include coloc pairs in
the output (which include h3 and h4), as opposed to just the coloc
groups.h4_threshold: The h4 threshold for coloc pairs.hemoglobin_trait <- trait(931)
names(hemoglobin_trait)
#> [1] "trait" "coloc_groups" "study_extractions"
nrow(hemoglobin_trait$coloc_groups)
#> [1] 27999
nrow(hemoglobin_trait$study_extractions)
#> [1] 1342This returns a list of elements, which you can access by name. More detailed explanations of the below are described in the API documentation.
trait: A list containing metadata about the trait,
including common and rare studies associated with the traitcoloc_groups: a dataframe containing information about
which studies have coloc results for this trait.study_extractions: a list of dataframes containing the
study extractions for this trait.rare_results: (if they exist) a list of dataframes
containing the rare results for this traitassociations: (optional) a dataframe containing the
associations for the variants in which the trait is colocalisedcoloc_pairs: (optional) a dataframe containing all
pairwise coloc results for this trait.study_extractions, coloc_groups, and
rare_results relateAll three layers refer to study extractions (GWAS
hits in finemapped LD regions), keyed by id in
study_extractions and by study_extraction_id
in coloc_groups and rare_results.
Think of study_extractions as the
catalogue of extractions linked to the trait or gene in the map: each
row is a study signal at a locus. Not every listed extraction
necessarily has a full colocalisation story or a rare-disease result
row.
coloc_groups holds extractions that
sit inside a colocalisation group (shared coloc
evidence with other studies/traits at that locus). If a study extraction
appears here, it has coloc-structured results for that entity.rare_results holds extractions that
belong to a rare-result group (rare-disease-style
finemapping in the API). The same biological signal can, in principle,
be reflected in both rare and common coloc paths, so overlap between
this set of extraction ids and those in coloc_groups is
possible.study_extractions. They are still tied to
the trait or gene in the genotype–phenotype map (for example via
locus tagging, proximity, or other association logic in
the pipeline), but they do not currently carry a coloc-group row or a
rare-result group row. Treat those as “present on the map at this locus”
rather than “absent by mistake.”The same idea applies when you call
gene() (or other entity-specific
endpoints): study_extractions is the wider catalogue, while
coloc_groups and rare_results subdivide it
where those analyses exist. The table below illustrates counts for the
hemoglobin trait example (trait(931)).
extraction_ids <- function(x) {
if (is.null(x) || !length(x)) {
return(integer(0))
}
id_col <- if ("id" %in% names(x)) "id" else NULL
if (!is.null(id_col)) {
return(unique(stats::na.omit(x[[id_col]])))
}
unique(stats::na.omit(unlist(lapply(x, function(df) {
if (!is.data.frame(df)) {
return(NULL)
}
if ("id" %in% names(df)) {
return(df[["id"]])
}
NULL
}))))
}
cg_ids <- unique(stats::na.omit(hemoglobin_trait$coloc_groups$study_extraction_id))
rr <- hemoglobin_trait$rare_results
rr_ids <- if (is.null(rr) || !length(rr)) {
integer(0)
} else {
unique(stats::na.omit(unlist(lapply(rr, function(d) {
if (!is.data.frame(d) || !"study_extraction_id" %in% names(d)) {
return(NULL)
}
d[["study_extraction_id"]]
}))))
}
se_ids <- extraction_ids(hemoglobin_trait$study_extractions)
only_study_extractions <- setdiff(se_ids, union(cg_ids, rr_ids))
knitr::kable(data.frame(
source = c(
"Distinct study_extraction_id in coloc_groups",
"Distinct study_extraction_id in rare_results",
"In study_extractions only (not in coloc_groups nor rare_results)",
"Total distinct extraction id in study_extractions"
),
n_extractions = c(
length(cg_ids),
length(rr_ids),
length(only_study_extractions),
length(se_ids)
)
))| source | n_extractions |
|---|---|
| Distinct study_extraction_id in coloc_groups | 27999 |
| Distinct study_extraction_id in rare_results | 0 |
| In study_extractions only (not in coloc_groups nor rare_results) | 585 |
| Total distinct extraction id in study_extractions | 1342 |
slc44a1 <- gene("SLC44A1")
names(slc44a1)
#> [1] "gene" "coloc_groups" "rare_results"
#> [4] "variants" "study_extractions"
nrow(slc44a1$coloc_groups)
#> [1] 80
knitr::kable(head(slc44a1$coloc_groups[, c("coloc_group_id", "trait_name", "data_type", "tissue")]))| coloc_group_id | trait_name | data_type | tissue |
|---|---|---|---|
| 94136 | SLC44A1 Skin Sun Exposed Lower leg sQTL chr9:105385502:105389033 | Splice Variant | Skin Sun Exposed Lower leg |
| 94144 | Treatment/medication code: alfuzosin | Phenotype | NA |
| 94123 | Phospholipids to total lipids ratio in medium HDL | Phenotype | NA |
| 94141 | SLC44A1 Brain (Meta) eQTL | Gene Expression | Brain (Meta) |
| 94144 | SLC44A1 Artery Aorta eQTL | Gene Expression | Artery Aorta |
| 94137 | Diagnoses - secondary ICD10: K66.0 Peritoneal adhesions | Phenotype | NA |
Note that by default, the gene endpoint includes trans genetic
effects. You can turn this off with
gene("SLC44A1", include_trans = FALSE), or filter
coloc_groups after the fact (for example
cis_trans == "cis" or a min_p cutoff).
After you subset rows, some coloc_group_id values may be
left with only a single row. Those are no longer meaningful
colocalisation groups, so drop them by keeping only
groups with more than one row:
slc44a1_no_trans <- slc44a1
slc44a1_no_trans$coloc_groups <- dplyr::filter(
slc44a1_no_trans$coloc_groups,
cis_trans == "cis"
)
# or e.g. dplyr::filter(..., min_p <= 5e-8)
slc44a1_no_trans$coloc_groups <- slc44a1_no_trans$coloc_groups |>
dplyr::group_by(coloc_group_id) |>
dplyr::filter(dplyr::n() > 1L) |>
dplyr::ungroup()
nrow(slc44a1_no_trans$coloc_groups)
#> [1] 26
knitr::kable(head(slc44a1_no_trans$coloc_groups[, c("coloc_group_id", "trait_name", "data_type", "tissue")]))| coloc_group_id | trait_name | data_type | tissue |
|---|---|---|---|
| 94136 | SLC44A1 Skin Sun Exposed Lower leg sQTL chr9:105385502:105389033 | Splice Variant | Skin Sun Exposed Lower leg |
| 94144 | SLC44A1 Artery Aorta eQTL | Gene Expression | Artery Aorta |
| 94116 | ABCA1 Blood eQTL | Gene Expression | Blood |
| 94136 | SLC44A1 Skin Sun Exposed Lower leg sQTL chr9:105385502:105438281 | Splice Variant | Skin Sun Exposed Lower leg |
| 94116 | CT70 Testis eQTL | Gene Expression | Testis |
| 94123 | SLC44A1 Blood eQTL | Gene Expression | Blood |
Although coloc groups provide a helpful overview of the
colocalisation results for a gene, you may want to look at the pairwise
relationships between the studies. You can do this by setting
include_coloc_pairs = TRUE in the call.
As there may be many coloc pairs, the results are not as fleshed out as the coloc groups. You may have to filter the results to get the information you need, or join with the study extractions to get the full picture.
slc44a1_coloc_pairs <- gene("SLC44A1", include_coloc_pairs = TRUE)
nrow(slc44a1_coloc_pairs$coloc_pairs)
#> [1] 1096
knitr::kable(head(slc44a1_coloc_pairs$coloc_pairs[, c("study_extraction_a_id", "study_extraction_b_id", "h4")]))| study_extraction_a_id | study_extraction_b_id | h4 |
|---|---|---|
| 2469380 | 2469454 | 0.9971425 |
| 2469380 | 2469455 | 0.9969158 |
| 2469454 | 2469455 | 0.9999984 |
| 2469454 | 2469769 | 0.9071751 |
| 2469454 | 2469816 | 0.8388802 |
| 2469454 | 2469956 | 0.9998948 |
study_extractions_tbl <- dplyr::bind_rows(slc44a1_no_trans$study_extractions)
study_extractions_subset <- dplyr::select(study_extractions_tbl, id, trait_name, min_p)
hydrated_coloc_pairs <- slc44a1_coloc_pairs$coloc_pairs |>
dplyr::left_join(slc44a1_coloc_pairs$study_extractions, by = c("study_extraction_a_id" = "id")) |>
dplyr::rename(trait_a = trait_name) |>
dplyr::left_join(slc44a1_coloc_pairs$study_extractions, by = c("study_extraction_b_id" = "id")) |>
dplyr::rename(trait_b = trait_name)
knitr::kable(head(hydrated_coloc_pairs[, c("trait_a", "trait_b", "h4")]))| trait_a | trait_b | h4 |
|---|---|---|
| HDL cholesterol | Apolipoprotein A1 levels | 0.9971425 |
| HDL cholesterol | HDL cholesterol levels | 0.9969158 |
| Apolipoprotein A1 levels | HDL cholesterol levels | 0.9999984 |
| Apolipoprotein A1 levels | Phospholipids to total lipids ratio in large HDL | 0.9071751 |
| Apolipoprotein A1 levels | Phospholipids to total lipids ratio in medium HDL | 0.8388802 |
| Apolipoprotein A1 levels | SLC44A1 Blood eQTL | 0.9998948 |
This section demonstrates a systematic workflow for using the GPMap to prioritise gene targets for a disease of interest. We use Alzheimer’s disease (AD) as an example.
The logic proceeds through four steps: 1. Get a list of genes mapped to AD via colocalisation. 2. Rank genes by pleiotropy score — low-pleiotropy genes are more specific to the indication. 3. Assess variant-level specificity — if a variant associates with many cis-genes, it provides weaker evidence for any single target. 4. Check consistency across multiple variants and rare results — concordant signals from independent loci and rare variant evidence strengthen the target–indication link.
We start by searching for Alzheimer’s traits and then pulling their colocalisation groups to identify the set of genes that have colocalisation evidence with AD.
ad_search <- search_gpmap("alzheimer")
ad_trait_ids <- ad_search$type_id
ad_coloc <- dplyr::bind_rows(lapply(ad_trait_ids, function(tid) {
tryCatch(trait(tid)$coloc_groups, error = function(e) NULL)
})) |>
dplyr::filter(!grepl("methyl", .data$data_type, ignore.case = TRUE))
ad_genes <- ad_coloc |>
dplyr::filter(!is.na(.data$gene_id)) |>
dplyr::distinct(.data$gene_id, .data$gene)
nrow(ad_genes)
#> [1] 228The GPMap provides a pre-computed pleiotropy score for every gene:
distinct_trait_categories counts how many different
phenotypic categories a gene’s coloc groups span. A low
value means the gene’s associations are concentrated in a narrow
phenotypic space — desirable for a drug target because it predicts fewer
off-target effects.
We can retrieve pleiotropy scores for all genes at once using
get_all_gene_pleiotropies() and join them to our AD gene
list.
all_pleiotropy <- get_all_gene_pleiotropies()
ad_genes_ranked <- ad_genes |>
dplyr::inner_join(
all_pleiotropy[, c("gene_id", "distinct_trait_categories", "distinct_protein_coding_genes")],
by = "gene_id"
) |>
dplyr::arrange(.data$distinct_trait_categories)There are a total of 176 genes mapped to Alzheimer’s disease. We can see some of the genes that are mapped to AD, and the number of colocalisation groups that each gene has.
Genes at the top of this list — those with very few distinct trait categories — are the most “specific” to Alzheimer’s disease. A gene appearing in many trait categories (high pleiotropy) is more likely to produce on-target side effects if therapeutically modulated.
Each colocalisation group is anchored to a variant (an LD block). Within a single coloc group, the same variant may map to expression or protein QTLs of multiple cis-genes. When a variant associates with many cis-genes or trait categories, it provides weaker evidence that the gene would be a good drug target without potential off-target effects.
The strongest evidence for a target-indication pair comes when multiple independent variants (different LD blocks) and rare variant results all point to the same gene. This replication across loci greatly reduces the chance that any single colocalisation is spurious.
Let’s pick some of the top-ranked genes from Step 2 and check how many independent coloc groups and rare result groups support them.
top_genes <- head(ad_genes_ranked, 40)
gene_evidence <- lapply(top_genes$gene, function(g) {
gi <- tryCatch(gene(g, include_associations = TRUE), error = function(e) NULL)
if (is.null(gi)) return(NULL)
cg <- gi$coloc_groups
rr <- gi$rare_results
ad_cg <- if (!is.null(cg) && nrow(cg) > 0) {
cg[cg$trait_id %in% ad_trait_ids, ]
} else {
data.frame()
}
ad_rr <- if (!is.null(rr) && nrow(rr) > 0) {
rr[rr$trait_id %in% ad_trait_ids, ]
} else {
data.frame()
}
n_tissues <- if (!is.null(cg) && nrow(cg) > 0 && "tissue" %in% names(cg)) {
dplyr::n_distinct(cg$tissue[!is.na(cg$tissue)])
} else {
0L
}
data.frame(
gene = g,
n_coloc_groups = dplyr::n_distinct(ad_cg$coloc_group_id),
n_tissues = n_tissues,
n_rare_groups = if (nrow(ad_rr) > 0) dplyr::n_distinct(ad_rr$rare_result_group_id) else 0L
)
}) |> dplyr::bind_rows()
ad_summary <- ad_genes_ranked |>
dplyr::filter(.data$gene %in% gene_evidence$gene) |>
dplyr::left_join(gene_evidence, by = "gene") |>
dplyr::select("gene", "distinct_trait_categories", "distinct_protein_coding_genes",
"n_tissues", "n_coloc_groups", "n_rare_groups") |>
dplyr::arrange(.data$distinct_trait_categories, .data$gene)
ad_summary_long <- ad_summary |>
tidyr::pivot_longer(
cols = c("distinct_trait_categories", "distinct_protein_coding_genes",
"n_tissues", "n_coloc_groups", "n_rare_groups"),
names_to = "metric",
values_to = "value"
) |>
dplyr::mutate(
metric = factor(.data$metric,
levels = c("distinct_trait_categories", "distinct_protein_coding_genes",
"n_tissues", "n_coloc_groups", "n_rare_groups"),
labels = c("Trait categories", "Protein-coding genes",
"Distinct tissues", "Colocalization groups", "Rare result groups")
),
gene = factor(.data$gene, levels = rev(ad_summary$gene))
)ad_summary_plot <- ggplot2::ggplot(ad_summary_long, ggplot2::aes(x = value, y = gene, fill = metric)) +
ggplot2::geom_col(show.legend = FALSE, width = 0.5) +
ggplot2::facet_wrap(~ metric, scales = "free_x", nrow = 1) +
ggplot2::labs(x = NULL, y = NULL, title = "Alzheimer's disease gene targets: pleiotropy and evidence summary") +
ggplot2::theme_bw(base_size = 9) +
ggplot2::theme(
strip.text = ggplot2::element_text(face = "bold"),
panel.grid.minor = ggplot2::element_blank(),
panel.grid.major.y = ggplot2::element_blank(),
axis.text.y = ggplot2::element_text(size = 5)
)
ad_summary_plotGenes with multiple coloc groups and supporting rare result groups have the most robust genetic evidence linking them to Alzheimer’s disease. Combined with low pleiotropy (few trait categories), these represent the highest-priority targets for further investigation.
From the table above, TREM2 stands out as a particularly interesting target for Alzheimer’s disease. It has very low pleiotropy (few trait categories), meaning its genetic associations are highly specific to AD rather than spread across many unrelated phenotypes. At the same time, it is supported by multiple coloc groups and rare result groups — independent lines of genetic evidence converging on the same gene.
Let’s examine TREM2 in more detail by looking at the direction of effect across all its AD-related colocalisation and rare variant signals.
trem2 <- gene("TREM2", include_associations = TRUE)
trem2$gene$distinct_trait_categories
#> [1] 1
trem2$gene$distinct_protein_coding_genes
#> [1] 2trem2_ad_coloc_group_ids <- unique(
trem2$coloc_groups[trem2$coloc_groups$trait_id %in% ad_trait_ids, ]$coloc_group_id
)
trem2_ad_rare_group_ids <- if (!is.null(trem2$rare_results) && nrow(trem2$rare_results) > 0) {
unique(trem2$rare_results[trem2$rare_results$trait_id %in% ad_trait_ids, ]$rare_result_group_id)
} else {
character(0)
}
trem2_ad_groups <- trem2$coloc_groups[
trem2$coloc_groups$coloc_group_id %in% trem2_ad_coloc_group_ids,
]
trem2_ad_rare <- if (length(trem2_ad_rare_group_ids) == 0L) {
trem2$rare_results[0, ]
} else {
trem2$rare_results[
trem2$rare_results$rare_result_group_id %in% trem2_ad_rare_group_ids,
]
}The forest plot below shows the effect sizes (beta with 95% CI) for all colocalisation and rare result entries related to Alzheimer’s disease at TREM2. Panels are labelled with the variant and its MAF (minor allele frequency), and colour distinguishes molecular phenotypes (eQTL, pQTL, sQTL, etc.) from traits (disease GWAS).
trem2_all_rows <- dplyr::bind_rows(
trem2_ad_groups,
if (!is.null(trem2_ad_rare) && nrow(trem2_ad_rare) > 0L) trem2_ad_rare else NULL
)
variant_maf <- trem2_all_rows |>
dplyr::filter(!is.na(.data$eaf)) |>
dplyr::mutate(maf = pmin(.data$eaf, 1 - .data$eaf)) |>
dplyr::group_by(.data$variant_id) |>
dplyr::summarise(maf = min(.data$maf, na.rm = TRUE), .groups = "drop")
make_panel_label <- function(display_snp, variant_id, maf_lookup) {
snp <- dplyr::coalesce(as.character(display_snp), paste0("variant ", as.character(variant_id)))
maf <- maf_lookup$maf[match(variant_id, maf_lookup$variant_id)]
maf_str <- ifelse(is.na(maf), "", paste0(" (MAF=", formatC(maf, format = "f", digits = 3), ")"))
paste0(snp, maf_str)
}
trem2_forest_coloc <- trem2_ad_groups |>
dplyr::filter(!is.na(.data$beta), !is.na(.data$se), .data$se > 0) |>
dplyr::mutate(
lo = .data$beta - 1.96 * .data$se,
hi = .data$beta + 1.96 * .data$se,
row_label = forest_row_label(.data$trait_name, .data$data_type, .data$tissue),
panel_id = make_panel_label(.data$display_snp, .data$variant_id, variant_maf),
data_type_label = label_data_type(.data$data_type)
) |>
dplyr::group_by(.data$coloc_group_id) |>
dplyr::arrange(.data$beta, .by_group = TRUE) |>
dplyr::mutate(row_label = factor(.data$row_label, levels = unique(.data$row_label))) |>
dplyr::ungroup()
trem2_forest_rare <- if (is.null(trem2_ad_rare) || nrow(trem2_ad_rare) == 0L) {
NULL
} else {
rr <- trem2_ad_rare
if (!"tissue" %in% names(rr)) rr$tissue <- NA_character_
if (!"eaf" %in% names(rr)) rr$eaf <- NA_real_
rr |>
dplyr::filter(!is.na(.data$beta), !is.na(.data$se), .data$se > 0) |>
dplyr::mutate(
lo = .data$beta - 1.96 * .data$se,
hi = .data$beta + 1.96 * .data$se,
row_label = forest_row_label(.data$trait_name, .data$data_type, .data$tissue),
panel_id = make_panel_label(.data$display_snp, .data$variant_id, variant_maf),
data_type_label = label_data_type(.data$data_type)
) |>
dplyr::group_by(.data$panel_id) |>
dplyr::arrange(.data$beta, .by_group = TRUE) |>
dplyr::mutate(row_label = factor(.data$row_label, levels = unique(.data$row_label))) |>
dplyr::ungroup()
}
trem2_forest <- dplyr::bind_rows(trem2_forest_coloc, trem2_forest_rare) |>
dplyr::mutate(panel_id = factor(.data$panel_id, levels = unique(.data$panel_id)))
cbf_palette <- c(
"Gene Expression" = "#0072B2",
"Splice Variant" = "#E69F00",
"Protein" = "#009E73",
"Plasma Protein" = "#56B4E9",
"Transcript" = "#CC79A7",
"Methylation" = "#F0E442",
"Metabolome" = "#D55E00",
"Cell Trait" = "#999999",
"Phenotype" = "#000000"
)
ggplot2::ggplot(trem2_forest, ggplot2::aes(x = beta, y = row_label, colour = data_type_label)) +
ggplot2::geom_vline(xintercept = 0, linetype = 2, colour = "grey55") +
ggplot2::geom_errorbarh(
ggplot2::aes(xmin = lo, xmax = hi), height = 0, linewidth = 0.35
) +
ggplot2::geom_point(size = 2) +
ggplot2::facet_wrap(~ panel_id, scales = "free_y", ncol = 1) +
ggplot2::scale_colour_manual(values = cbf_palette) +
ggplot2::labs(
x = "Effect (beta) with 95% CI", y = NULL,
colour = "Data type", title = "TREM2 \u2014 Alzheimer's disease"
) +
ggplot2::theme_bw() +
ggplot2::theme(
strip.text = ggplot2::element_text(face = "bold"),
panel.grid.minor = ggplot2::element_blank(),
legend.position = "bottom",
legend.direction = "vertical"
) +
ggplot2::guides(colour = ggplot2::guide_legend(ncol = 1))
#> Warning: `geom_errorbarh()` was deprecated in ggplot2 4.0.0.
#> ℹ Please use the `orientation` argument of `geom_errorbar()` instead.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `height` was translated to `width`.TREM2 shows colocalisation evidence from multiple independent variants, supported by rare result groups. The direction of effect appears complex — in some instances increased TREM2 expression associates with increased AD risk, while in others the direction is reversed — reflecting the biological complexity of TREM2 signalling in neuroinflammation. Nonetheless, the convergence of evidence from multiple loci and data types makes it a compelling target, and it is indeed currently under active investigation as a therapeutic target for Alzheimer’s disease.
This four-step workflow uses the GPMap to move from a broad disease search to a prioritised shortlist of gene targets:
| Step | Question | GPMap function |
|---|---|---|
| 1 | Which genes colocalise with my disease? | search_gpmap() → trait() |
| 2 | Which are most specific (least pleiotropic)? | get_all_gene_pleiotropies() |
| 3 | Do variants map cleanly to one cis-gene? | coloc_groups$gene_id per
coloc_group_id |
| 4 | Is the evidence replicated across loci? | gene() → count coloc groups, variants, rare
results |
A gene that scores well on all four criteria — colocalises with AD, has low pleiotropy, is supported by high-specificity variants, and is replicated across multiple loci including rare variants — is a strong candidate for target validation and drug development.
You can only access the summary statistics via
variant(), by including
include_summary_stats = TRUE in the call. This is to limit
the amount of data that can be returned from the API for cost reasons.
If you attempt to download too many summary statistics, you may be
blocked from the API.
variant(1669064, include_summary_stats = TRUE)Instead, if you are interested in accessing all the summary statistics, or the whole database of results, you can download all databases and summary statistics the Google Cloud Storage Bucket, and can find an explanation of the files here. Note that the bucket is ‘requester pays’, so you will need to have a Google Cloud account and be logged in to download the files.