--- title: "Meta functionalities of the EpiGraphDB platform" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Meta functionalities of the EpiGraphDB platform} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This RMarkdown document demonstrates how key elements from the [github notebook for the meta functionalities]( https://github.com/MRCIEU/epigraphdb/blob/master/general-examples/platform-meta-functionalities.ipynb ) can be achieved using the R package. For detailed explanations of the case study please refer to the [notebook on github]( https://github.com/MRCIEU/epigraphdb/blob/master/general-examples/platform-meta-functionalities.ipynb ). Here we show the following aspects of the EpiGraphDB platform, and how to use the API to get the information: - Metadata: meta nodes and meta edges, and the overall schema. - Search for a specific node under the meta node. - Cypher: how to query the database directly using Neo4j Cypher For detailed documentation on the API endpoints please visit: - The Swagger interface: https://api.epigraphdb.org - The sections regarding API endpoints on the documentation site: https://docs.epigraphdb.org/api/api-endpoints/ ```r library("magrittr") library("dplyr") library("purrr") #> #> Attaching package: 'purrr' #> The following object is masked from 'package:magrittr': #> #> set_names library("igraph") #> #> Attaching package: 'igraph' #> The following objects are masked from 'package:purrr': #> #> compose, simplify #> The following objects are masked from 'package:dplyr': #> #> as_data_frame, groups, union #> The following objects are masked from 'package:stats': #> #> decompose, spectrum #> The following object is masked from 'package:base': #> #> union library("epigraphdb") ``` ## Metadata Here we query for the metadata information using the endpoint `GET /meta/schema`, which will be used for downstream processing. ```r endpoint <- "/meta/schema" params <- list( graphviz = FALSE, plot = FALSE ) metadata <- query_epigraphdb( route = endpoint, params = params, mode = "raw" ) metadata %>% str(2) #> List of 3 #> $ nodes :List of 12 #> ..$ Disease :List of 2 #> ..$ Pathway :List of 2 #> ..$ LiteratureTerm :List of 2 #> ..$ Gene :List of 2 #> ..$ LiteratureTriple:List of 2 #> ..$ Literature :List of 2 #> ..$ Protein :List of 2 #> ..$ Variant :List of 2 #> ..$ Efo :List of 2 #> ..$ Tissue :List of 2 #> ..$ Drug :List of 2 #> ..$ Gwas :List of 2 #> $ edges :List of 41 #> ..$ OPENTARGETS_DRUG_TO_DISEASE :List of 2 #> ..$ MEDRXIV_SUB :List of 2 #> ..$ GENE_TO_DISEASE :List of 2 #> ..$ GWAS_NLP_EFO :List of 2 #> ..$ SEMMEDDB_PREDICATE :List of 2 #> ..$ STRING_INTERACT_WITH :List of 2 #> ..$ GWAS_TO_LITERATURE_TRIPLE :List of 2 #> ..$ PRS :List of 2 #> ..$ BIORXIV_PREDICATE :List of 2 #> ..$ MONDO_MAP_UMLS :List of 2 #> ..$ SEMMEDDB_SUB :List of 2 #> ..$ MR_EVE_MR :List of 2 #> ..$ OPENGWAS_TOPHITS :List of 2 #> ..$ VARIANT_TO_GENE :List of 2 #> ..$ MEDRXIV_PREDICATE :List of 2 #> ..$ MEDRXIV_OBJ :List of 2 #> ..$ EXPRESSED_IN :List of 2 #> ..$ BIORXIV_OBJ :List of 2 #> ..$ EFO_CHILD_OF :List of 2 #> ..$ BIORXIV_TO_LIT :List of 2 #> ..$ GEN_COR :List of 2 #> ..$ METAMAP_LITE :List of 2 #> ..$ SEMMEDDB_TO_LIT :List of 2 #> ..$ PROTEIN_IN_PATHWAY :List of 2 #> ..$ MONDO_MAP_EFO :List of 2 #> ..$ PATHWAY_CHILD_OF :List of 2 #> ..$ TERM_TO_GENE :List of 2 #> ..$ GWAS_EFO_EBI :List of 2 #> ..$ CPIC :List of 2 #> ..$ XQTL_MULTI_SNP_MR :List of 2 #> ..$ XQTL_SINGLE_SNP_MR_SNP_GENE :List of 2 #> ..$ OBS_COR :List of 2 #> ..$ GWAS_NLP :List of 2 #> ..$ XQTL_SINGLE_SNP_MR_GENE_GWAS:List of 2 #> ..$ MEDRXIV_TO_LIT :List of 2 #> ..$ BIORXIV_SUB :List of 2 #> ..$ OPENTARGETS_DRUG_TO_TARGET :List of 2 #> ..$ GWAS_TO_VARIANT :List of 2 #> ..$ GWAS_TO_LITERATURE :List of 2 #> ..$ GENE_TO_PROTEIN :List of 2 #> ..$ SEMMEDDB_OBJ :List of 2 #> $ connections:List of 41 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 #> ..$ :List of 4 ``` ### Meta nodes We can extract the specific meta node information as a dataframe from the metadata. ```r meta_node_df <- metadata %>% pluck("nodes") %>% { names <- names(.) transpose(.) %>% as_tibble() %>% mutate(meta_node = names) %>% # Hide properties column which does not display well select(meta_node, count) %>% # We also need to flatten count mutate(count = flatten_int(count)) } meta_node_df %>% arrange(meta_node) %>% mutate(count = format(count, big.mark = ",")) #> # A tibble: 12 × 2 #> meta_node count #> #> 1 Disease " 38,960" #> 2 Drug " 2,697" #> 3 Efo " 25,390" #> 4 Gene " 57,737" #> 5 Gwas " 34,494" #> 6 Literature "3,995,672" #> 7 LiteratureTerm " 108,905" #> 8 LiteratureTriple "5,609,945" #> 9 Pathway " 2,441" #> 10 Protein " 20,280" #> 11 Tissue " 54" #> 12 Variant " 99,005" ``` ### Meta relationships and connections We can also extract the meta relationship (edge) information, and the connections. ```r meta_rel_df <- metadata %>% pluck("edges") %>% { names <- names(.) transpose(.) %>% as_tibble() %>% mutate(meta_rel = names) %>% mutate(count = flatten_int(count)) %>% select(meta_rel, count) } %>% inner_join( metadata %>% pluck("connections") %>% { transpose(.) %>% as_tibble() %>% mutate(meta_rel = flatten_chr(rel)) %>% mutate_at(vars(from_node, to_node), flatten_chr) %>% select(meta_rel, from_node, to_node) } ) #> Joining, by = "meta_rel" meta_rel_df %>% arrange(from_node, to_node) %>% mutate(count = format(count, big.mark = ",")) #> # A tibble: 41 × 4 #> meta_rel count from_node to_node #> #> 1 MONDO_MAP_EFO " 2,819" Disease Efo #> 2 MONDO_MAP_UMLS " 8,247" Disease LiteratureTerm #> 3 OPENTARGETS_DRUG_TO_DISEASE " 2,461" Drug Disease #> 4 CPIC " 375" Drug Gene #> 5 OPENTARGETS_DRUG_TO_TARGET " 6,534" Drug Gene #> 6 EFO_CHILD_OF " 43,132" Efo Efo #> 7 GENE_TO_DISEASE " 5,763" Gene Disease #> 8 XQTL_MULTI_SNP_MR " 3,015,233" Gene Gwas #> 9 XQTL_SINGLE_SNP_MR_GENE_GWAS " 8,449,779" Gene Gwas #> 10 GENE_TO_PROTEIN " 19,142" Gene Protein #> # … with 31 more rows ``` ## Search for specific node Users can use [the explorer on the Web UI](https://dev.epigraphdb.org/explore) to search for a specific node by: - fuzzy matching by "name" field. - exact matching by "ID" field if you know the its ID (e.g. the ID to a GWAS from IEU GWAS Database). Here we show how these are done at the API level using `Gwas` nodes as an example. First we need to know what the "ID" and "name" fields are for the meta nodes using `GET /meta/nodes/id-name-schema`: ```r endpoint <- "/meta/nodes/id-name-schema" meta_node_fields <- query_epigraphdb( route = endpoint, params = NULL, mode = "raw" ) meta_node_fields #> $Disease #> $Disease$id #> [1] "id" #> #> $Disease$name #> [1] "label" #> #> #> $Drug #> $Drug$id #> [1] "label" #> #> $Drug$name #> [1] "label" #> #> #> $Efo #> $Efo$id #> [1] "id" #> #> $Efo$name #> [1] "value" #> #> #> $Gene #> $Gene$id #> [1] "ensembl_id" #> #> $Gene$name #> [1] "name" #> #> #> $Gwas #> $Gwas$id #> [1] "id" #> #> $Gwas$name #> [1] "trait" #> #> #> $Literature #> $Literature$id #> [1] "id" #> #> $Literature$name #> [1] "id" #> #> #> $LiteratureTerm #> $LiteratureTerm$id #> [1] "id" #> #> $LiteratureTerm$name #> [1] "name" #> #> #> $LiteratureTriple #> $LiteratureTriple$id #> [1] "id" #> #> $LiteratureTriple$name #> [1] "name" #> #> #> $Pathway #> $Pathway$id #> [1] "id" #> #> $Pathway$name #> [1] "name" #> #> #> $Protein #> $Protein$id #> [1] "uniprot_id" #> #> $Protein$name #> [1] "uniprot_id" #> #> #> $Tissue #> $Tissue$id #> [1] "id" #> #> $Tissue$name #> [1] "name" #> #> #> $Variant #> $Variant$id #> [1] "name" #> #> $Variant$name #> [1] "name" ``` ## Fuzzy matching Here we search for nodes can contain "body mass index" in their traits. ```r name <- "body mass index" endpoint <- "/meta/nodes/Gwas/search" params <- list(name = name) results <- query_epigraphdb( route = endpoint, params = params, mode = "table" ) results #> # A tibble: 10 × 18 #> node.note node._name node.year node.mr node.author node.sex node.pmid #> #> 1 Dominance model… Body mass … 2016.0 0 Wood Males a… 26961502… #> 2 Body mass … 2015.0 1 Locke AE Females 25673413… #> 3 Body mass … 2013.0 1 Randall JC Females 23754948… #> 4 Body mass … 2017.0 1 Akiyama M 28892062… #> 5 Body mass … 2018.0 1 Hoffmann TJ 30108127… #> 6 Body mass … 2019.0 1 Ishigaki K Males 28892062… #> 7 Body mass … 2015.0 1 Locke AE Males a… 25673413… #> 8 Body mass … 2015.0 1 Locke AE Males a… 25673413… #> 9 Body mass … 2015.0 1 Locke AE Males 25673413… #> 10 Body mass … 2019.0 1 Ishigaki K Males a… 28892062… #> # … with 11 more variables: node.population , node.sample_size , #> # node.nsnp , node.build , node.trait , node._source , #> # node.id , node._id , node.subcategory , node.category , #> # node.sd ``` ## Exact matching Similarly, we can exact match a specific node by its ID. ```r id <- "ieu-a-2" endpoint <- "/meta/nodes/Gwas/search" params <- list(id = id) results <- query_epigraphdb( route = endpoint, params = params, mode = "table" ) results #> # A tibble: 1 × 17 #> node._name node.year node.mr node.author node.sex node.pmid node.population #> #> 1 Body mass … 2015.0 1 Locke AE Males and… 25673413… Mixed #> # … with 10 more variables: node.sd , node.sample_size , #> # node.nsnp , node.build , node.trait , node._source , #> # node.id , node._id , node.subcategory , node.category ``` ## Cypher (advanced) Advanced users that are familiar with Neo4j Cypher can query the database using Cypher directly. ```r query <- " MATCH (exposure:Gwas)-[mr:MR]->(outcome:Gwas) WHERE exposure.trait = 'Body mass index' RETURN exposure, outcome, mr LIMIT 2 " endpoint <- "/cypher" params <- list(query = query) # NOTE this is a POST request results <- query_epigraphdb( route = endpoint, params = params, method = "POST", mode = "table" ) results #> # A tibble: 0 × 0 ``` ## `sessionInfo` ```r sessionInfo() #> R version 4.1.2 (2021-11-01) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Ubuntu 20.04.3 LTS #> #> Matrix products: default #> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C #> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 #> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] igraph_1.2.11 purrr_0.3.4 magrittr_2.0.1 epigraphdb_0.2.3 #> [5] dplyr_1.0.7 #> #> loaded via a namespace (and not attached): #> [1] knitr_1.37 tidyselect_1.1.1 R6_2.5.1 rlang_0.4.12 #> [5] fansi_1.0.0 stringr_1.4.0 httr_1.4.2 tools_4.1.2 #> [9] xfun_0.29 utf8_1.2.2 cli_3.1.0 DBI_1.1.2 #> [13] ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.6 lifecycle_1.0.1 #> [17] crayon_1.4.2 vctrs_0.3.8 curl_4.3.2 glue_1.6.0 #> [21] evaluate_0.14 stringi_1.7.6 compiler_4.1.2 pillar_1.6.4 #> [25] generics_0.1.1 jsonlite_1.7.2 pkgconfig_2.0.3 ```