| Title: | Import EBI Data to OpenGWAS |
|---|---|
| Description: | Determine the new EBI data not present in OpenGWAS. Download dataset and import metadata. Upload processed data and metadata to OpenGWAS. |
| Authors: | Gibran Hemani [aut, cre] (ORCID: <https://orcid.org/0000-0003-0920-1055>), Philip Haycock [aut] (ORCID: <https://orcid.org/0000-0001-5001-3350>), Tom Palmer [aut] (ORCID: <https://orcid.org/0000-0003-4655-4511>) |
| Maintainer: | Gibran Hemani <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-05-29 19:37:10 UTC |
| Source: | https://github.com/MRCIEU/GwasDataImport |
List of EBI datasets that are currently being processed
being_processed(dat)

| Argument | Description |
|---|---|
| dat | Output from |

Updated dat
Create dataset of some hm3 SNPs and their build positions
create_build_reference()
Saves build_ref object
Slow to create these so just make once and save
create_marts()
Saves data to data/marts.rdata
Object that downloads, develops and uploads GWAS summary datasets for IEU OpenGWAS database
filename: Path to raw GWAS summary dataset
igd_id: ID to use for upload. If NULL then the next available ID in batch ieu-b will be used automatically
wd: Work directory in which to save processed files. Will be deleted upon completion
gwas_out: Path to processed summary file
nsnp_read: Number of SNPs read initially
nsnp: Number of SNPs retained after reading
metadata: List of meta-data entries
metadata_test: List of outputs from tests of the effect allele, effect allele frequency columns and summary data using CheckSumStats
metadata_file: Path to meta-data json file
datainfo: List of GWAS file parameters
datainfo_file: Path to datainfo json file
params: Initial column identifiers specified for raw dataset
metadata_uploaded: TRUE/FALSE of whether the metadata has been uploaded
gwasdata_uploaded: TRUE/FALSE of whether the gwas data has been uploaded
metadata_upload_status: Response from server about upload process
gwasdata_upload_status: Response from server about upload process
new()
Initialise
Dataset$new(filename = NULL, wd = tempdir(), igd_id = NULL)
filename: Path to raw GWAS summary data file
wd: Path to directory to use as the working directory. Will be deleted upon completion - best to keep as the default randomly generated temporary directory
igd_id: Option to provide a specified ID for upload. If none provided then will use the next ieu-a batch ID
A new Dataset object
is_new_id()
Check if the specified ID is unique within the database. It checks published GWASs and those currently being processed
Dataset$is_new_id(id = self$igd_id)
id: ID to check
delete_wd()
Delete working directory
Dataset$delete_wd()
set_wd()
Set (and create) the working directory
Dataset$set_wd(wd)
wd: Working directory
se_from_bp()
Estimate standard error from beta and p-value
Dataset$se_from_bp(beta, pval, minp = 1e-300)
beta: Effect size
pval: p-value
minp: Minimum p-value cutoff. Default = 1e-300
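As a rough illustration of how a standard error can be recovered from an effect size and a two-sided p-value, the sketch below converts the p-value to an absolute z-score and divides. The function name `se_from_bp_sketch` is hypothetical and this is not necessarily how `Dataset$se_from_bp()` is implemented; it only assumes a normal test statistic and that `minp` acts as a floor on the p-value.

```r
# Hypothetical stand-in for Dataset$se_from_bp(); not the package's actual code.
se_from_bp_sketch <- function(beta, pval, minp = 1e-300) {
  # Floor very small p-values so the normal quantile stays finite
  pval <- pmax(pval, minp)
  # Two-sided p-value -> absolute z-score
  z <- stats::qnorm(pval / 2, lower.tail = FALSE)
  # |beta| / |z| gives the implied standard error
  abs(beta) / z
}

# Round trip on toy values: derive p-values from known SEs, then recover the SEs
beta <- c(0.10, -0.25, 0.05)
se <- c(0.02, 0.05, 0.04)
pval <- 2 * stats::pnorm(abs(beta / se), lower.tail = FALSE)
se_hat <- se_from_bp_sketch(beta, pval)
```

A p-value of exactly 1 gives a z-score of 0 and therefore an infinite standard error, so such rows would need separate handling.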
determine_columns()
Specify which columns in the dataset correspond to which fields.
Dataset$determine_columns(params, nrows = 100, gwas_file = self$filename, ...)
params: List of column identifiers. Identifiers can be numeric position or column header name. Required columns are: c("chr_col", "pos_col", "ea_col", "oa_col", "beta_col", "se_col", "pval_col", "rsid_col"). Optional columns are: c("snp_col", "eaf_col", "oaf_col", "ncase_col", "imp_z_col", "imp_info_col", "ncontrol_col").
nrows: How many rows to read to check that parameters have been specified correctly
gwas_file: Filename to read
...: Further arguments to pass to data.table::fread in order to correctly read the dataset
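A `params` list might look like the sketch below. The header names ("CHR", "BP", and so on) are hypothetical and stand for whatever the raw file actually uses; the check against the required identifiers is an illustration in base R, not the package's own validation.

```r
# Required column identifiers, as listed in the params description
required <- c("chr_col", "pos_col", "ea_col", "oa_col",
              "beta_col", "se_col", "pval_col", "rsid_col")

# Hypothetical header names for an example file; rsid_col is given by position
params <- list(
  chr_col = "CHR", pos_col = "BP", ea_col = "A1", oa_col = "A2",
  beta_col = "BETA", se_col = "SE", pval_col = "P", rsid_col = 1,
  eaf_col = "FRQ"  # optional column
)

# Fail early if any required identifier has not been supplied
missing_cols <- setdiff(required, names(params))
if (length(missing_cols) > 0) {
  stop("Missing required column identifiers: ",
       paste(missing_cols, collapse = ", "))
}

# With the package loaded and `x` a Dataset object, the list would then be
# passed on as: x$determine_columns(params)
```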
format_dataset()
Process dataset ready for uploading. Determines build and lifts over to hg19/b37 if necessary.
Dataset$format_dataset( gwas_file = self$filename, gwas_out = file.path(self$wd, "format.txt.gz"), params = self$params, metadata_test = self$metadata_test, ... )
gwas_file: GWAS filename
gwas_out: Filename to save processed dataset to
params: Column specifications (see determine_columns for more info)
metadata_test: List of outputs from tests of the effect allele, effect allele frequency columns and summary data using CheckSumStats
...: Further arguments to pass to data.table::fread in order to correctly read the dataset
view_metadata_options()
View the specifications for available meta data fields, as taken from https://api.opengwas.io/api/docs
Dataset$view_metadata_options()
get_gwasdata_fields()
Get a list of GWAS data fields and whether or not they are required
Dataset$get_gwasdata_fields()
data.frame
get_metadata_fields()
Get a list of metadata fields and whether or not they are required
Dataset$get_metadata_fields()
data.frame
collect_metadata()
Input metadata
Dataset$collect_metadata(metadata, igd_id = self$igd_id)
metadata: List of meta-data fields and their values, see view_metadata_options for which fields need to be inputted.
igd_id: ID to be used for uploading to the database
check_metadata()
Check that the reported effect allele and effect allele frequency columns are correct.
Dataset$check_metadata( gwas_file = self$filename, params = self$params, metadata = self$metadata )
gwas_file: Filename to read
params: Column names from x$determine_columns(). Required columns are: c("snp_col", "ea_col", "oa_col", "eaf_col")
metadata: Metadata from x$collect_metadata()
write_metadata()
Write meta data to json file
Dataset$write_metadata( metadata = self$metadata, datainfo = self$datainfo, outdir = self$wd )
metadata: List of meta data fields and their values
datainfo: List of data column parameters
outdir: Output directory to write json files
api_metadata_upload()
Upload meta data to API
Dataset$api_metadata_upload( metadata = self$metadata, metadata_test = self$metadata_test, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
metadata: List of meta data fields and their values
metadata_test: List of outputs from tests of the effect allele, effect allele frequency columns and summary data using CheckSumStats
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_metadata_edit()
Upload meta data to API
Dataset$api_metadata_edit( metadata = self$metadata, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
metadata: List of meta data fields and their values
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_metadata_check()
View meta-data
Dataset$api_metadata_check( id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID to check
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_metadata_delete()
Delete a draft dataset. Available until the dataset is submitted for approval. This will force the QC pipeline (if any) to fail, delete uploaded files and QC product etc., and when required, delete the metadata
Dataset$api_metadata_delete( id = self$igd_id, delete_metadata = FALSE, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID to delete
delete_metadata: If TRUE, also delete the metadata; otherwise only fail the existing QC pipeline and delete uploaded files, keeping the metadata
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_gwasdata_upload()
Upload gwas dataset
Dataset$api_gwasdata_upload( datainfo = self$datainfo, gwasfile = self$gwas_out, metadata_test = self$metadata_test, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
datainfo: List of data column parameters
gwasfile: Path to processed gwasfile
metadata_test: List of outputs from tests of the effect allele, effect allele frequency columns and summary data using CheckSumStats
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_gwasdata_check()
Check status of API processing pipeline
Dataset$api_gwasdata_check( id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID to check
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_gwasdata_delete()
Delete a dataset. This deletes the metadata AND any uploaded GWAS data (and related processing files). This is a replicate of api_metadata_delete()
Dataset$api_gwasdata_delete( id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID to delete
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_qc_status()
Check the status of the GWAS QC processing pipeline
Dataset$api_qc_status( id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID to check
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_report()
View the html report for a processed dataset
Dataset$api_report( id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
id: ID of report to view
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
api_gwas_release()
Release a dataset
Dataset$api_gwas_release( comments = NULL, passed_qc = "True", id = self$igd_id, opengwas_jwt = ieugwasr::get_opengwas_jwt() )
comments: Optional comments to provide when uploading
passed_qc: True or False
id: ID to release
opengwas_jwt: OpenGWAS JWT. See https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
clone()
The objects of this class are cloneable with this method.
Dataset$clone(deep = FALSE)
deep: Whether to make a deep clone.
Determine build based on reference dataset
determine_build(rsid, chr, pos, build = c(37, 38, 36), fallback = "position")

| Argument | Description |
|---|---|
| rsid | rsid |
| chr | chr |
| pos | pos |
| build | Builds to try e.g. c(37,38,36) |
| fallback | Whether to try "position" (fast) or "biomart" (more accurate if you have rsids) based approaches instead |

build if detected, or dataframe of matches if not
Determines which build a set of SNPs are on
determine_build_biomart(rsid, chr, pos, build = c(37, 38, 36))

| Argument | Description |
|---|---|
| rsid | rsid |
| chr | chr |
| pos | pos |
| build | Builds to try e.g. c(37,38,36) |

build if detected, or dataframe of matches if not
A bit sketchy but computationally fast - just assumes that there will be at least 50x more position matches in the true build than either of the others.
determine_build_position(pos, build = c(37, 38, 36))

| Argument | Description |
|---|---|
| pos | Vector of positions |
| build | c(37,38,36) |

build or if not determined then dataframe
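The 50x heuristic described above can be sketched in base R as follows. This is an illustration only: `guess_build_sketch`, the `ratio` argument and the toy `ref` list are hypothetical, whereas the package relies on its own saved reference of SNP positions per build.

```r
# Hypothetical illustration of the "50x more matches" rule; not package code.
guess_build_sketch <- function(pos, ref, ratio = 50) {
  # ref: named list of reference position vectors, one element per build
  n_match <- vapply(ref, function(p) sum(pos %in% p), integer(1))
  best <- which.max(n_match)
  others <- n_match[-best]
  # Accept the best build only if it beats every other build by `ratio`
  if (n_match[best] > 0 && all(n_match[best] >= ratio * others)) {
    return(as.numeric(names(n_match)[best]))
  }
  # Otherwise return the match counts so the caller can inspect them
  data.frame(build = names(n_match), n_match = unname(n_match))
}

# Toy reference: the three builds share no positions
ref <- list(
  "37" = seq(1000, by = 7, length.out = 200),
  "38" = seq(1003, by = 7, length.out = 200),
  "36" = seq(1005, by = 7, length.out = 200)
)
pos <- ref[["37"]][1:100]
build <- guess_build_sketch(pos, ref)

# Positions split evenly between two builds are left undetermined
ambiguous <- guess_build_sketch(c(1000, 1003), ref)
```

Returning the match counts when no build wins clearly mirrors the documented behaviour of returning a data frame when the build cannot be determined.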
Figure out which datasets are not present in the database
determine_new_datasets(ebi_ftp_url = options()$ebi_ftp_url, blacklist = NULL, exclude_multi_datasets = TRUE)

| Argument | Description |
|---|---|
| ebi_ftp_url | FTP url. Default=options()$ebi_ftp_url |
| blacklist | List of EBI datasets to ignore. Default=NULL |
| exclude_multi_datasets | If an EBI ID has more than one dataset, whether it should be ignored |

data frame
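Conceptually, finding new datasets is a set difference between the IDs on the EBI server and those already in OpenGWAS, minus any blacklisted IDs. The sketch below shows that logic on made-up IDs; `new_datasets_sketch` is hypothetical, it returns a plain character vector rather than a data frame, and it assumes EBI imports are stored under the "ebi-a-" prefix used as the default igd_id in EbiDataset.

```r
# Hypothetical illustration of the set logic; not the package's implementation.
new_datasets_sketch <- function(ebi_ids, opengwas_ids, blacklist = NULL) {
  # Keep only OpenGWAS IDs that came from EBI and strip the "ebi-a-" prefix
  is_ebi <- grepl("^ebi-a-", opengwas_ids)
  already_present <- sub("^ebi-a-", "", opengwas_ids[is_ebi])
  # Drop datasets that are already imported, then drop blacklisted ones
  candidates <- setdiff(ebi_ids, already_present)
  setdiff(candidates, blacklist)
}

# Made-up IDs for demonstration
ebi_ids <- c("GCST000001", "GCST000002", "GCST000003", "GCST000004")
opengwas_ids <- c("ebi-a-GCST000001", "ieu-a-2")
new_ids <- new_datasets_sketch(ebi_ids, opengwas_ids, blacklist = "GCST000004")
```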
Convert output from listftp into something that is easier to read
ebi_datasets(ebi_ftp_url = options()$ebi_ftp_url)

| Argument | Description |
|---|---|
| ebi_ftp_url | EBI FTP. Default=options()$ebi_ftp_url |

data frame
Object that downloads, develops and uploads EBI dataset
GwasDataImport::Dataset -> EbiDataset
ebi_id: EBI ID to look for
traitname: Name of trait
ftp_path: Path to files in EBI FTP
or_flag: TRUE/FALSE if had to convert OR to beta
gwas_out1: Path to first look at EBI dataset
GwasDataImport::Dataset$api_gwas_release()
GwasDataImport::Dataset$api_gwasdata_check()
GwasDataImport::Dataset$api_gwasdata_delete()
GwasDataImport::Dataset$api_gwasdata_upload()
GwasDataImport::Dataset$api_metadata_check()
GwasDataImport::Dataset$api_metadata_delete()
GwasDataImport::Dataset$api_metadata_edit()
GwasDataImport::Dataset$api_metadata_upload()
GwasDataImport::Dataset$api_qc_status()
GwasDataImport::Dataset$api_report()
GwasDataImport::Dataset$check_metadata()
GwasDataImport::Dataset$collect_metadata()
GwasDataImport::Dataset$delete_wd()
GwasDataImport::Dataset$determine_columns()
GwasDataImport::Dataset$format_dataset()
GwasDataImport::Dataset$get_gwasdata_fields()
GwasDataImport::Dataset$get_metadata_fields()
GwasDataImport::Dataset$is_new_id()
GwasDataImport::Dataset$se_from_bp()
GwasDataImport::Dataset$set_wd()
GwasDataImport::Dataset$view_metadata_options()
GwasDataImport::Dataset$write_metadata()
new()
Initialise object
EbiDataset$new(
ebi_id,
wd = tempdir(),
ftp_path = NULL,
igd_id = paste0("ebi-a-", ebi_id),
traitname = NULL
)
ebi_id: e.g. GCST005522
wd: Directory in which to download and develop dataset. Default=tempdir(). Deleted automatically upon object removal
ftp_path: Pre-specified path to data. Default=NULL
igd_id: Defaults to "ebi-a-<ebi_id>"
traitname: Option to provide traitname of dataset
A new EbiDataset object
download_dataset()
Download
EbiDataset$download_dataset( ftp_path = self$ftp_path, outdir = self$wd, dl = TRUE )
ftp_path: Pre-specified path to data. Default=self$ftp_path
outdir: Default=self$wd
ftp_url: Default=options()$ebi_ftp_url
format_ebi_dataset()
Organise data before formatting. This is slow but doesn't really matter
EbiDataset$format_ebi_dataset( filename = self$filename, output = file.path(self$wd, "step1.txt.gz") )
filename: Filename of GWAS dataset
output: Where to save formatted dataset
organise_metadata()
Download and parse metadata
EbiDataset$organise_metadata( ebi_id = self$ebi_id, or_flag = self$or_flag, igd_id = self$igd_id, units = NULL, sex = "NA", category = "NA", subcategory = "NA", build = "HG19/GRCh37", group_name = "public", traitname = self$traitname )
ebi_id: Default=self$ebi_id
or_flag: Default=self$or_flag
igd_id: Default=NULL
units: Default=NULL
sex: Default="NA"
category: Default="NA"
subcategory: Default="NA"
build: Default="HG19/GRCh37"
group_name: Default="public"
traitname: Default=self$traitname
pipeline()
Once initialised this function will string together everything i.e. downloading, processing and uploading
EbiDataset$pipeline()
clone()
The objects of this class are cloneable with this method.
EbiDataset$clone(deep = FALSE)
deep: Whether to make a deep clone.
Get harmonised file for specific EBI ID
get_ftp_path(ebi_id, ebi_ftp_url = options()$ebi_ftp_url)

| Argument | Description |
|---|---|
| ebi_id | EBI ID e.g. GCST000879 |
| ebi_ftp_url | EBI FTP. Default=options()$ebi_ftp_url |

ftp path (excluding server)
Lookup positions for given rsids in particular build
get_positions(rsid, build = 37, method = c("opengwas", "biomart")[1], splitsize = 50000)

| Argument | Description |
|---|---|
| rsid | rsid |
| build | build (36, 37 default or 38) |
| method | "opengwas" (fastest) or "biomart" |
| splitsize | Default 50000 |

data frame
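The documentation does not say what `splitsize` controls. One plausible reading, shown here purely as an assumption, is that large rsid vectors are looked up in batches of at most `splitsize` identifiers. The helper `split_rsids_sketch` is hypothetical and only demonstrates that batching step in base R.

```r
# Hypothetical batching helper; assumes splitsize is a per-request batch size.
split_rsids_sketch <- function(rsid, splitsize = 50000) {
  # Assign each rsid to a consecutive batch of at most `splitsize` elements
  batch <- ceiling(seq_along(rsid) / splitsize)
  unname(split(rsid, batch))
}

# Toy example: 12 rsids in batches of 5 give batches of 5, 5 and 2
rsid <- paste0("rs", 1:12)
batches <- split_rsids_sketch(rsid, splitsize = 5)
```

Each batch could then be looked up separately and the resulting data frames combined with `rbind`.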
Determine GWAS build and liftover to required build
liftover_gwas(dat, build = c(37, 38, 36), to = 37, chr_col = "chr", pos_col = "pos", snp_col = "snp", ea_col = "ea", oa_col = "oa", build_fallback = "position")

| Argument | Description |
|---|---|
| dat | Data frame with chr, pos, snp name, effect allele, non-effect allele columns |
| build | The possible builds to check data against. Default = c(37,38,36) |
| to | Which build to lift over to. Default=37 |
| chr_col | Name of chromosome column. Required |
| pos_col | Name of position column. Required |
| snp_col | Name of SNP column. Optional. Uses less certain method of matching if not available |
| ea_col | Name of effect allele column. Optional. Might lead to duplicated rows if not provided |
| oa_col | Name of other allele column. Optional. Might lead to duplicated rows if not provided |
| build_fallback | Whether to try "position" (fast) or "biomart" (more accurate if you have rsids) based approaches instead |

Data frame
List all files on the EBI FTP server
listftp(url = options()$ebi_ftp_url, recursive = TRUE)

| Argument | Description |
|---|---|
| url | FTP url to look up |
| recursive | If false then just the top directory, otherwise list recursively |

Vector of paths