Title: | Tools for Dealing with GWAS Summary Data in VCF Format |
---|---|
Description: | Tools for dealing with GWAS summary data in VCF format. Includes reading, querying, writing, as well as helper functions such as LD proxy searches. |
Authors: | Gibran Hemani [aut, cre], Tom Palmer [ctb], Rita Rasteiro [ctb] |
Maintainer: | Gibran Hemani <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2024-11-24 02:56:14 UTC |
Source: | https://github.com/MRCIEU/gwasvcf |
See set_bcftools() for more information
check_bcftools()
TRUE or FALSE
See set_plink() for more information
check_plink()
TRUE or FALSE
This is used for looking up proxies
create_ldref_sqlite(bfile, dbname, tag_r2 = 0.6)
bfile |
path to plink file |
dbname |
dbname to produce (overwrites existing if exists) |
tag_r2 |
minimum tag r2 |
Create a separate file called <id>.pvali
which is used to speed up p-value queries.
create_pval_index_from_vcf(vcffile, maximum_pval, indexname)
vcffile |
VCF filename |
maximum_pval |
Maximum p-value to include. Default = 0.05 |
indexname |
index file name to create. Deletes existing file if exists. |
Create RSID index from VCF
create_rsidx_index_from_vcf(vcf, indexname)
vcf |
VCF filename |
indexname |
index file name to create. Deletes existing file if exists. |
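The two index builders above can be paired with the index-aware arguments of `query_gwas()` documented later in this manual. The following is a minimal sketch under stated assumptions: the file paths and the rsID are hypothetical placeholders, the `.rsidx` suffix is an illustrative naming choice, and the calls only run if the package is installed and the file exists. The index builders may also depend on external command-line tools being available.

```r
# Hypothetical paths: replace with a real bgzipped, indexed GWAS-VCF file.
vcffile <- "path/to/gwas.vcf.gz"
index_files <- c(
  rsidx = paste0(vcffile, ".rsidx"),  # rsID index (suffix is an assumption)
  pvali = paste0(vcffile, ".pvali")   # p-value index
)

if (requireNamespace("gwasvcf", quietly = TRUE) && file.exists(vcffile)) {
  # Build both indexes once; existing index files are deleted and recreated.
  gwasvcf::create_rsidx_index_from_vcf(vcffile, index_files[["rsidx"]])
  gwasvcf::create_pval_index_from_vcf(
    vcffile,
    maximum_pval = 0.05,
    indexname = index_files[["pvali"]]
  )

  # Reuse the indexes for later lookups. "rs0000001" is a placeholder rsID.
  by_rsid <- gwasvcf::query_gwas(
    vcffile, rsid = "rs0000001", rsidx = index_files[["rsidx"]]
  )
  # The threshold must not exceed the maximum_pval used to build the index.
  by_pval <- gwasvcf::query_gwas(
    vcffile, pval = 5e-8, pvali = index_files[["pvali"]]
  )
}
```

Building the indexes is a one-off cost per file, so this pattern is mainly worthwhile when the same file is queried repeatedly.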
Note this requires a modified version of plink that allows the ld-window-r2 flag for the --r option. Available here: https://github.com/explodecomputer/plink-ng
create_rsidx_sub_index(rsid, rsidx, newindex)
rsid |
Vector of rsids |
rsidx |
Existing index |
newindex |
New index (Note: will delete existing file if exists) |
NULL, creates new index file
Create GWAS vcf
create_vcf(chrom, pos, nea, ea, snp = NULL, ea_af = NULL, effect = NULL, se = NULL, pval = NULL, n = NULL, ncase = NULL, name = NULL)
chrom |
chrom vector |
pos |
pos vector |
nea |
nea vector |
ea |
ea vector |
snp |
Optional vector |
ea_af |
Optional vector |
effect |
Optional vector |
se |
Optional vector |
pval |
Optional vector |
n |
Optional vector |
ncase |
Optional vector |
name |
Optional vector |
vcf object
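As an illustration of how the arguments line up, here is a minimal sketch that passes columns of a small data frame to `create_vcf()`. All values are made up, and passing a single string as `name` (intended here as the dataset label) is an assumption rather than something the argument table confirms. The call is guarded so the snippet runs even when the package is not installed.

```r
# Toy summary statistics; every value here is invented for illustration.
gwas <- data.frame(
  chrom  = c("1", "1", "2"),
  pos    = c(1000L, 2000L, 1500L),
  nea    = c("A", "C", "G"),          # non-effect alleles
  ea     = c("G", "T", "A"),          # effect alleles
  snp    = c("rs0000001", "rs0000002", "rs0000003"),
  ea_af  = c(0.21, 0.48, 0.05),       # effect allele frequencies
  effect = c(0.03, -0.01, 0.12),
  se     = c(0.01, 0.01, 0.04),
  pval   = c(2.7e-3, 0.32, 2.7e-3),   # raw p-values, not -log10
  n      = c(10000, 10000, 9800),
  stringsAsFactors = FALSE
)

if (requireNamespace("gwasvcf", quietly = TRUE)) {
  vcf <- gwasvcf::create_vcf(
    chrom = gwas$chrom, pos = gwas$pos,
    nea = gwas$nea, ea = gwas$ea,
    snp = gwas$snp, ea_af = gwas$ea_af,
    effect = gwas$effect, se = gwas$se,
    pval = gwas$pval, n = gwas$n,
    name = "toy"  # assumption: a single dataset label
  )
}
```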
Find LD proxies for a set of SNPs
get_ld_proxies(rsid, bfile, searchspace = NULL, tag_kb = 5000, tag_nsnp = 5000, tag_r2 = 0.6, threads = 1, out = tempfile())
rsid |
list of rs IDs |
bfile |
ld reference panel |
searchspace |
Optional list of rs IDs to use as potential proxies |
tag_kb |
=5000 Proxy parameter |
tag_nsnp |
=5000 Proxy parameter |
tag_r2 |
=0.6 Proxy parameter |
threads |
Number of threads to use (=1) |
out |
temporary output file |
data frame
Returns a gwasglue2 SummarySet object
gwasvcf_to_summaryset(vcf)
vcf |
Path or URL to GWAS-VCF file or VCF object e.g. output from |
Returns merged intersection of two VCF objects
merge_vcf(a, b)
a |
VCF object |
b |
VCF object |
SimpleList of VCF objects
Takes data frame or vector of chromosome position ranges and parses to granges object
parse_chrompos(chrompos, radius = NULL)
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
radius |
Add radius to the specified positions. Default = NULL |
GRanges object
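To make the accepted string formats concrete, the following base R sketch splits `"1:1000"` and `"1:1000-2000"` style inputs into chromosome, start and end, and optionally widens them by a radius. It is not the package's implementation: `parse_region_strings()` is a hypothetical helper, it returns a plain data frame rather than a GRanges object, and clamping the start at 1 is an assumption.

```r
# Hypothetical helper, for illustration only (not part of gwasvcf).
parse_region_strings <- function(chrompos, radius = NULL) {
  # Split "chrom:range" into its two parts.
  parts <- strsplit(chrompos, ":", fixed = TRUE)
  chrom <- vapply(parts, `[`, character(1), 1)
  range <- vapply(parts, `[`, character(1), 2)

  # A range is either "pos" or "start-end".
  span <- strsplit(range, "-", fixed = TRUE)
  start <- as.numeric(vapply(span, `[`, character(1), 1))
  end <- as.numeric(vapply(span, function(x) x[length(x)], character(1)))

  # Optionally widen each region, keeping start at 1 or above (assumption).
  if (!is.null(radius)) {
    start <- pmax(start - radius, 1)
    end <- end + radius
  }
  data.frame(chrom = chrom, start = start, end = end,
             stringsAsFactors = FALSE)
}

regions <- parse_region_strings(c("1:1000", "1:1000-2000"), radius = 100)
```

A single position is treated as a region whose start and end coincide, so a radius turns it into a window centred on that position.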
Finds proxies if necessary
proxy_match(vcf, rsid, bfile = NULL, proxies = "yes", tag_kb = 5000, tag_nsnp = 5000, tag_r2 = 0.6, threads = 1, rsidx = NULL, dbfile = NULL)
vcf |
vcf file name |
rsid |
list of rs IDs |
bfile |
ld reference panel (plink) |
proxies |
="yes" If SNPs are absent then look for proxies (yes) or not (no). Can also mask all target SNPs and only return proxies (only), for testing purposes |
tag_kb |
=5000 Proxy parameter |
tag_nsnp |
=5000 Proxy parameter |
tag_r2 |
=0.6 Proxy parameter |
threads |
Number of threads to use (=1) |
rsidx |
Path to rsidx index |
dbfile |
ld tag database (sqlite) |
data frame
Query chromosome and position using bcftools
query_chrompos_bcftools(chrompos, vcffile, id = NULL)
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
vcf object
Query vcf file, extracting by chromosome and position
query_chrompos_file(chrompos, vcffile, id = NULL, build = "GRCh37")
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
build |
Default="GRCh37" Build of vcffile |
VCF object
Query chrompos from vcf object
query_chrompos_vcf(chrompos, vcf, id = NULL)
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
vcf |
VCF object (e.g. from readVcf) |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
VCF object
Read in GWAS summary data with filters on datasets (if multiple datasets per file) and/or chromosome/position, rsids or p-values. Selects the query method best suited to the detected operating system. chrompos searches are typically the fastest. On Windows, rsid or p-value filters applied to a file will be slow.
query_gwas(vcf, chrompos = NULL, rsid = NULL, pval = NULL, id = NULL, rsidx = NULL, pvali = NULL, build = "GRCh37", os = Sys.info()[["sysname"]], proxies = "no", bfile = NULL, dbfile = NULL, tag_kb = 5000, tag_nsnp = 5000, tag_r2 = 0.6, threads = 1)
vcf |
Path or URL to GWAS-VCF file or VCF object e.g. output from |
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
rsid |
Vector of rsids |
pval |
P-value threshold (NOT -log10) |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
rsidx |
Path to rsidx index file |
pvali |
Path to pval index file |
build |
="GRCh37" Build of vcffile |
os |
The operating system. Default is as detected. Determines the method used to perform query |
proxies |
="no" If SNPs are absent then look for proxies (yes) or not (no). Can also mask all target SNPs and only return proxies (only), for testing purposes. Currently only possible if querying rsid. |
bfile |
Path to plink bed/bim/fam ld reference panel |
dbfile |
Path to sqlite tag snp database |
tag_kb |
=5000 Proxy parameter |
tag_nsnp |
=5000 Proxy parameter |
tag_r2 |
=0.6 Proxy parameter |
threads |
=1 Number of threads |
vcf object
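The three main filter types can be sketched side by side. In this minimal example the file path, region and rsIDs are hypothetical placeholders, and the queries only run when the package is installed and the file exists. Each entry in `queries` holds the arguments for one call to `query_gwas()`.

```r
# Hypothetical input: replace with a real GWAS-VCF path or URL.
vcffile <- "path/to/gwas.vcf.gz"

# One argument set per filter type; all values are placeholders.
queries <- list(
  region = list(chrompos = "1:1000-2000"),
  rsid   = list(rsid = c("rs0000001", "rs0000002")),
  pval   = list(pval = 5e-8)  # raw p-value threshold, not -log10
)

results <- list()
if (requireNamespace("gwasvcf", quietly = TRUE) && file.exists(vcffile)) {
  # Run each query against the same file and keep the results by name.
  results <- lapply(queries, function(args) {
    do.call(gwasvcf::query_gwas, c(list(vcf = vcffile), args))
  })
}
```

The remaining arguments (`id`, `build`, `os`, and the proxy settings) are left at their defaults here.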
Query p-value using bcftools
query_pval_bcftools(pval, vcffile, id = NULL)
pval |
P-value threshold (NOT -log10) |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
vcf object
Query pval from vcf file
query_pval_file(pval, vcffile, id = NULL, build = "GRCh37")
pval |
P-value threshold (NOT -log10) |
vcffile |
Path to tabix indexed vcf file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
build |
Default="GRCh37" |
VCF object
See create_pval_index_from_vcf
query_pval_sqlite3(pval, vcffile, id = NULL, pvali)
pval |
pval threshold |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
pvali |
Path to pval index file |
vcf object
Query based on p-value threshold from vcf
query_pval_vcf(pval, vcf, id = NULL)
pval |
P-value threshold (NOT -log10) |
vcf |
VCF object (e.g. from readVcf) |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
VCF object
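The "(NOT -log10)" note on `pval` matters because the GWAS-VCF format stores p-values as `LP`, the -log10 p-value, while the threshold is given on the raw p-value scale. The base R sketch below shows the conversion on made-up `LP` values. It illustrates the arithmetic only, and whether the package uses a strict or non-strict comparison is not stated in this manual, so the strict `>` here is an assumption.

```r
# Made-up -log10(p) values for three variants.
lp <- c(rs0000001 = 9.2, rs0000002 = 3.1, rs0000003 = 7.5)

# Threshold supplied on the p-value scale, as the query functions expect.
pval_threshold <- 5e-8

# Convert once to the -log10 scale and compare there.
lp_threshold <- -log10(pval_threshold)
keep <- lp > lp_threshold  # assumption: strict inequality

selected <- names(lp)[keep]
```

Passing `7.3` instead of `5e-8` as `pval` would therefore not select genome-wide significant variants.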
Query pvali
query_pvali(pval, pvali)
pval |
pval threshold |
pvali |
Path to pval index file |
data frame
Query rsid using bcftools
query_rsid_bcftools(rsid, vcffile, id = NULL)
rsid |
Vector of rsids |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
VCF object
Query vcf file, extracting by rsid
query_rsid_file(rsid, vcffile, id = NULL, build = "GRCh37")
rsid |
Vector of rsids. Use DBSNP build (???) |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
build |
Default="GRCh37" Build of vcffile |
VCF object
See create_rsidx_index_from_vcf
query_rsid_rsidx(rsid, vcffile, id = NULL, rsidx)
rsid |
Vector of rsids |
vcffile |
Path to .vcf.gz GWAS summary data file |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
rsidx |
Path to rsidx index file |
vcf object
Query rsid from vcf object
query_rsid_vcf(rsid, vcf, id = NULL)
rsid |
Vector of rsids |
vcf |
VCF object (e.g. from readVcf) |
id |
If multiple GWAS datasets in the vcf file, the name (sample ID) from which to perform the filter |
VCF object
Query rsidx
query_rsidx(rsid, rsidx)
rsid |
Vector of rsids |
rsidx |
Path to rsidx index file |
data frame
Set bcftools binary location
set_bcftools(path = "")
path |
If "" (default), then will use the MRCIEU/genetics.binaRies to get binaries that are appropriate for the detected operating system. Otherwise, provide the path to the bcftools binary. If NULL then will set the option to NULL. |
NULL, sets option 'tools_bcftools'
Set plink binary location
set_plink(path = "")
path |
If "" (default), then will use the MRCIEU/genetics.binaRies to get binaries that are appropriate for the detected operating system. Otherwise, provide the path to the plink binary. If NULL then will set the option to NULL. |
NULL, sets option 'tools_plink'
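A session might configure the external binaries once before running any bcftools-backed query. The sketch below assumes a `bcftools` executable may be on the system `PATH` and falls back to doing nothing if it is not. It is guarded so it runs without the package installed. The same pattern would apply to `set_plink()` and `check_plink()`.

```r
# Look for a bcftools binary on the PATH; "" means none was found.
bcftools_path <- unname(Sys.which("bcftools"))

have_bcftools <- FALSE
if (requireNamespace("gwasvcf", quietly = TRUE)) {
  if (nzchar(bcftools_path)) {
    # Register the local binary (stored in the 'tools_bcftools' option).
    gwasvcf::set_bcftools(bcftools_path)
  }
  # TRUE if a bcftools location is currently set, FALSE otherwise.
  have_bcftools <- gwasvcf::check_bcftools()
}
```

Calling `set_bcftools()` with no arguments instead would rely on the MRCIEU/genetics.binaRies package being installed.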
Lookup LD proxies from sqlite database
sqlite_ld_proxies(rsids, dbfile, tag_r2)
rsids |
List of rsids |
dbfile |
path to dbfile |
tag_r2 |
minimum r2 value |
data frame
Convert vcf format to granges format
vcf_to_granges(vcf, id = NULL)
vcf |
Output from readVcf |
id |
Only accepts one ID, so specify here if there are multiple GWAS datasets in the vcf |
GRanges object
Convert vcf format to tibble (data frame)
vcf_to_tibble(vcf, id = NULL)
vcf |
Output from readVcf |
id |
Only accepts one ID, so specify here if there are multiple GWAS datasets in the vcf |
tibble (data frame)
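The two converters can be applied to the same VCF object. This is a minimal sketch, assuming a hypothetical file path and that the VariantAnnotation package is available to supply `readVcf()`. Nothing runs unless both packages and the file are present.

```r
# Hypothetical input: replace with a real GWAS-VCF file.
vcffile <- "path/to/gwas.vcf.gz"

tab <- NULL
gr <- NULL
if (requireNamespace("gwasvcf", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE) &&
    file.exists(vcffile)) {
  # Read the whole file into a VCF object.
  vcf <- VariantAnnotation::readVcf(vcffile)

  # Rectangular table, convenient for dplyr-style work.
  tab <- gwasvcf::vcf_to_tibble(vcf)

  # Genomic ranges, convenient for overlap operations.
  gr <- gwasvcf::vcf_to_granges(vcf)
}
```

If the file holds several GWAS datasets, `id` would need to be set in both calls, since each converter accepts only one ID.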
Reduce list of VCFs to intersecting regions
vcflist_overlaps(vcflist, chrompos)
vcflist |
List of VCF objects, or list of VCF filenames, or mix of VCF objects and filenames |
chrompos |
Either vector of chromosome and position ranges e.g. "1:1000" or "1:1000-2000", or data frame with columns |
List of VCFs