Title: | GWAS summary data sources connected to analytical tools |
---|---|
Description: | Many tools exist that use GWAS summary data for colocalisation, fine mapping, Mendelian randomization, visualisation, etc. This package is a conduit that connects R packages that can retrieve GWAS summary data to various tools for analysing those data. |
Authors: | Gibran Hemani [aut, cre] |
Maintainer: | Gibran Hemani <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2024-08-25 03:51:42 UTC |
Source: | https://github.com/MRCIEU/gwasglue |
<full description>
clump_gwasvcf( vcf, clump_kb = 1000, clump_r2 = 0.001, clump_p = 5e-08, pop = NULL, bfile = NULL, plink_bin = NULL, access_token = NULL )
vcf |
VCF file or VCF object |
clump_kb |
Clumping kb window. Default = 1000
clump_r2 |
Clumping r2 threshold. Default is very strict, 0.001 |
clump_p |
Clumping sig level for index variants. Default = 5e-08
pop |
Super-population to use as reference panel. Options are EUR, SAS, EAS, AFR, AMR. 'legacy' is also available, which is a previously used version of the EUR panel with a slightly different set of markers. Default = NULL
bfile |
Path to a local plink LD reference panel (bfile). If this is provided then clumping is run locally with plink rather than through the API. Default = NULL
plink_bin |
If null and bfile is not null then will detect packaged plink binary for specific OS. Otherwise specify path to plink binary. Default = NULL |
access_token |
Google OAuth2 access token. Used to authenticate level of access to data |
data frame of clumped results
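The roles of clump_kb, clump_r2 and clump_p can be illustrated with a minimal base R sketch of greedy clumping. This is not how clump_gwasvcf or plink is implemented; toy_clump, its inputs and the toy r2 matrix are hypothetical, and all variants are assumed to sit on one chromosome.

```r
# Hypothetical helper, for illustration only (not part of gwasglue).
# Greedy clumping: repeatedly keep the most significant remaining variant and
# discard variants that are both inside the kb window and above the r2 threshold.
toy_clump <- function(rsid, pval, pos, r2,
                      clump_kb = 1000, clump_r2 = 0.001, clump_p = 5e-08) {
  remaining <- order(pval)
  remaining <- remaining[pval[remaining] <= clump_p]  # only significant variants
  kept <- integer(0)
  while (length(remaining) > 0) {
    lead <- remaining[1]
    kept <- c(kept, lead)
    in_window <- abs(pos[remaining] - pos[lead]) <= clump_kb * 1000
    in_ld <- r2[lead, remaining] > clump_r2
    remaining <- remaining[!(in_window & in_ld)]       # also removes the lead itself
  }
  rsid[kept]
}

# Toy data: rs2 is in LD with rs1; rs3 is far away; rs4 is not significant.
rsid <- c("rs1", "rs2", "rs3", "rs4")
pval <- c(1e-20, 1e-10, 1e-09, 0.01)
pos  <- c(100000, 150000, 5000000, 5100000)
r2 <- diag(4)
r2[1, 2] <- r2[2, 1] <- 0.8

clumped <- toy_clump(rsid, pval, pos, r2)
print(clumped)
```

With these toy inputs rs1 and rs3 are retained: rs2 is removed because it is close to and correlated with rs1, and rs4 never passes clump_p.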
For a list of fine-mapped rsids, will assign to regions and generate conditionally independent summary stats for each rsid
cojo_cond( vcffile, bfile, snplist, pop, gcta = genetics.binaRies::get_gcta_binary(), workdir = tempdir(), threads = 1 )
vcffile |
Path to vcffile |
bfile |
LD reference panel |
snplist |
List of rsids |
pop |
EUR, ASN or AFR |
gcta |
Path to gcta binary. For convenience can use default=genetics.binaRies::get_gcta_binary() |
workdir |
Location to store temporary files. Default=tempdir() |
threads |
Number of parallel threads. Default=1 |
List of independent summary stats
Write vcf file to cojo sumstat file
cojo_sumstat_file(vcffile, outfile)
vcffile |
Path to vcf file |
outfile |
Path to output file |
vcf object
Convert coloc dataset to gassocplot dataset
coloc_to_gassocplot(coloclist, bfile = NULL, plink_bin = NULL)
coloclist |
Output from *_to_coloc |
bfile |
If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink dataset here. |
plink_bin |
If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink executable here |
List to feed into gassocplot
Generate coloc dataset from vcf files
gwasvcf_to_coloc(vcf1, vcf2, chrompos)
vcf1 |
VCF object or path to vcf file |
vcf2 |
VCF object or path to vcf file |
chrompos |
Character of chr:pos1-pos2 |
List of datasets to feed into coloc
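The chrompos argument is a single string of the form chr:pos1-pos2. As a rough sketch of what that format encodes, the base R snippet below splits such a string into its parts. parse_chrompos is a hypothetical helper written for this example and is not part of gwasglue.

```r
# Hypothetical helper, for illustration only: split "chr:pos1-pos2" into parts.
parse_chrompos <- function(chrompos) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  stopifnot(all(grepl(pattern, chrompos)))  # fail early on malformed input
  data.frame(
    chr   = sub(pattern, "\\1", chrompos),
    start = as.numeric(sub(pattern, "\\2", chrompos)),
    end   = as.numeric(sub(pattern, "\\3", chrompos)),
    stringsAsFactors = FALSE
  )
}

region <- parse_chrompos("1:109317192-110317192")
print(region)
print(region$end - region$start)  # width of the window in base pairs
```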
For a given region and VCF file, extracts the variants in the region along with LD matrix from a reference panel
gwasvcf_to_finemapr( region, vcf, bfile, plink_bin = genetics.binaRies::get_plink_binary(), threads = 1 )
region |
Region of the genome to extract, e.g. "1:109317192-110317192". Can be an array
vcf |
Path to VCF file or VCF object |
bfile |
LD reference panel |
plink_bin |
Path to plink. Default = genetics.binaRies::get_plink_binary() |
threads |
Number of threads to run in parallel. Default=1 |
List of datasets for finemapping
Create exposure or outcome data format for TwoSampleMR from vcf
gwasvcf_to_TwoSampleMR(vcf, type = "exposure")
vcf |
VCF object |
type |
"exposure" (default) or "outcome"
data frame
Assumes ref and alt alleles available for target and reference datasets, and uses chr:pos for alignment
harmonise( chr1, pos1, ref1, alt1, chr2, pos2, ref2, alt2, rsid2 = NULL, indel_recode = FALSE, strand_flip = FALSE )
chr1 |
Vector |
pos1 |
Vector |
ref1 |
Vector |
alt1 |
Vector |
chr2 |
Vector |
pos2 |
Vector |
ref2 |
Vector |
alt2 |
Vector |
rsid2 |
Optional vector |
indel_recode |
=FALSE. If TRUE then attempts to recode D/I |
strand_flip |
=FALSE. If TRUE then attempts to flip strand when alignment is not otherwise possible |
0: stick
1: swap
2: rename indel
3: rename indel and swap
4: flip
5: flip and swap
6: drop (no match)
7: drop (no reference)
Dataframe of outcomes
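To make the action codes listed above concrete, here is a minimal base R sketch that assigns codes 0, 1, 4, 5 and 6 to single-nucleotide variants that are already matched by chr:pos. It is an assumption-laden illustration, not the package's implementation: toy_harmonise_action and flip_strand are hypothetical helpers, indel recoding (codes 2 and 3) and missing reference variants (code 7) are ignored, and palindromic variants are not treated specially.

```r
# Hypothetical helpers, for illustration only (not part of gwasglue).
# Complement each base (A<->T, C<->G); SNPs only in this sketch.
flip_strand <- function(allele) chartr("ACGT", "TGCA", allele)

toy_harmonise_action <- function(ref1, alt1, ref2, alt2, strand_flip = FALSE) {
  action <- rep(6L, length(ref1))            # 6: drop (no match) unless resolved
  stick <- ref1 == ref2 & alt1 == alt2
  swap  <- ref1 == alt2 & alt1 == ref2
  action[stick] <- 0L                        # 0: stick
  action[!stick & swap] <- 1L                # 1: swap
  if (strand_flip) {
    f_ref <- flip_strand(ref1)
    f_alt <- flip_strand(alt1)
    unresolved <- action == 6L
    action[unresolved & f_ref == ref2 & f_alt == alt2] <- 4L  # 4: flip
    action[unresolved & f_ref == alt2 & f_alt == ref2] <- 5L  # 5: flip and swap
  }
  action
}

# Dataset 1 is A/G at every site; dataset 2 varies.
ref1 <- rep("A", 5)
alt1 <- rep("G", 5)
ref2 <- c("A", "G", "T", "C", "A")
alt2 <- c("G", "A", "C", "T", "C")

with_flip <- toy_harmonise_action(ref1, alt1, ref2, alt2, strand_flip = TRUE)
no_flip   <- toy_harmonise_action(ref1, alt1, ref2, alt2, strand_flip = FALSE)
print(with_flip)
print(no_flip)
```

With strand_flip = FALSE, the third and fourth variants cannot be aligned and fall through to code 6, which is why the option matters for datasets reported on the opposite strand.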
Harmonise gwas alleles to be same as reference
harmonise_against_ref(gwas, reference)
gwas |
<what param does> |
reference |
<what param does> |
data frame with attributes
Generate coloc dataset from the IEU GWAS database
ieugwasr_to_coloc(id1, id2, chrompos, type1 = NULL, type2 = NULL)
id1 |
ID for trait 1 |
id2 |
ID for trait 2 |
chrompos |
Character of chr:pos1-pos2 |
type1 |
Provide "cc" or "quant" to override automatic lookup of trait type for trait 1 |
type2 |
Provide "cc" or "quant" to override automatic lookup of trait type for trait 2 |
List of datasets to feed into coloc
Uses the finemapr package https://github.com/variani/finemapr
ieugwasr_to_finemapr(region, id, bfile = NULL, plink_bin = NULL)
region |
Region of the genome to extract, e.g. "1:109317192-110317192"
id |
Array of GWAS studies to query. See |
bfile |
Path to a local plink LD reference panel (bfile). If this is provided then LD is calculated locally with plink rather than through the API. Default = NULL
plink_bin |
If null and bfile is not null then will detect packaged plink binary for specific OS. Otherwise specify path to plink binary. Default = NULL |
Each id will be a list of z score data, ld matrix, and sample size
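As a rough picture of the three pieces named above (z scores, LD matrix, sample size), the sketch below builds a toy version in base R. The element names z, ld and n, the column names and all values are assumptions made for this example; the actual structure returned by the function may differ.

```r
# Toy summary statistics; all values are made up for illustration.
toy <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  beta = c(0.10, -0.05, 0.00),
  se   = c(0.02, 0.025, 0.01),
  stringsAsFactors = FALSE
)
toy$z <- toy$beta / toy$se  # z score = effect estimate / standard error

# Toy LD matrix (identity, i.e. no correlation), labelled by rsid.
ld <- diag(nrow(toy))
dimnames(ld) <- list(toy$rsid, toy$rsid)

# Hypothetical container mirroring "z score data, ld matrix, and sample size".
dataset <- list(z = toy[, c("rsid", "z")], ld = ld, n = 10000)
str(dataset)
```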
Uses James Staley's gassocplot package https://github.com/jrs95/gassocplot
ieugwasr_to_gassocplot(chrpos, id, bfile = NULL, plink_bin = NULL)
chrpos |
A window range to plot e.g. 16:3349655-3849655 |
id |
Vector of one or more IEU GWAS db study IDs |
bfile |
If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink dataset here. |
plink_bin |
If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink executable here |
assoc_plot or stack_assoc_plot if multiple markers given
Convert output from query to TwoSampleMR format
ieugwasr_to_TwoSampleMR(x, type = "exposure")
x |
Output from ieugwasr query e.g. associations, tophits, phewas |
type |
"exposure" (default) or "outcome" |
data frame
Assuming reference data is all on the forward strand, check if the GWAS is also. Use some threshold, e.g. if more than 90% of alleles do not need to be flipped then it's likely that the dataset is on the forward strand
is_forward_strand( gwas_snp, gwas_a1, gwas_a2, ref_snp, ref_a1, ref_a2, threshold = 0.9 )
gwas_snp |
Vector of SNP names for the dataset being checked |
gwas_a1 |
Vector of alleles |
gwas_a2 |
Vector of alleles |
ref_snp |
Vector of SNP names for the reference dataset |
ref_a1 |
Vector of alleles |
ref_a2 |
Vector of alleles |
threshold |
Default = 0.9. If the proportion of alleles whose strands match is above this threshold, then declare the dataset to be on the forward strand
This function can be used to evaluate how strict harmonisation should be. The trade-off is that if you assume the dataset is not on the forward strand, then palindromic SNPs within a particular frequency range are dropped. Alternatively, you could accept some small probability of error for whether palindromic SNPs are on the forward strand, and avoid dropping too many variants.
1 = Forward strand; 2 = Not on forward strand
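The threshold logic can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: toy_forward_check is a hypothetical helper, palindromic SNPs are not excluded, and a variant counts as matching if its two alleles equal the reference alleles in either order.

```r
# Hypothetical helper, for illustration only (not part of gwasglue).
toy_forward_check <- function(gwas_snp, gwas_a1, gwas_a2,
                              ref_snp, ref_a1, ref_a2, threshold = 0.9) {
  idx  <- match(gwas_snp, ref_snp)
  keep <- !is.na(idx)                       # only SNPs present in the reference
  g1 <- gwas_a1[keep]
  g2 <- gwas_a2[keep]
  r1 <- ref_a1[idx[keep]]
  r2 <- ref_a2[idx[keep]]
  # A match means no strand flip is needed (allele order may differ).
  matches <- (g1 == r1 & g2 == r2) | (g1 == r2 & g2 == r1)
  prop <- mean(matches)
  # 1 = forward strand, 2 = not on forward strand
  list(prop_match = prop, code = if (prop > threshold) 1L else 2L)
}

snp    <- paste0("rs", 1:10)
ref_a1 <- rep(c("A", "C"), 5)
ref_a2 <- rep(c("G", "T"), 5)

# Same strand, allele order swapped: every variant matches.
same_strand <- toy_forward_check(snp, ref_a2, ref_a1, snp, ref_a1, ref_a2)

# Opposite strand: every allele complemented, so nothing matches.
comp <- function(x) chartr("ACGT", "TGCA", x)
other_strand <- toy_forward_check(snp, comp(ref_a1), comp(ref_a2),
                                  snp, ref_a1, ref_a2)
print(same_strand)
print(other_strand)
```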
This mimics the TwoSampleMR::make_dat function, which automatically looks up exposure and outcome datasets and harmonises them, except this function uses GWAS-VCF datasets instead. The supporting reference datasets can be accessed by UoB users on BC4 using set_bc4_files()
make_TwoSampleMR_dat( id1, id2, proxies = TRUE, nthreads = 1, vcfdir = options()$gwasglue.vcfdir, proxydb = options()$gwasglue.proxydb, rsidx = options()$gwasglue.rsidx, bfile = options()$gwasglue.bfile, action = 1, plink_bin = genetics.binaRies::get_plink_binary() )
id1 |
Exposure datasets. Either an array of vcf files, or array of IDs if vcfdir is set |
id2 |
Outcome datasets. Either an array of vcf files, or array of IDs if vcfdir is set |
proxies |
Lookup proxies? default=TRUE but requires either bfile or proxydb to be set |
nthreads |
Number of threads to use for parallelisation. Default=1
vcfdir |
Location of vcf files if id1 and id2 are just IDs. Defaults to options()$gwasglue.vcfdir |
proxydb |
Location of LD proxy database Default=options()$gwasglue.proxydb |
rsidx |
Location of rsidx index database Default=options()$gwasglue.rsidx |
bfile |
Location of LD reference panel Default=options()$gwasglue.bfile |
harmonised dataset
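Several defaults in the usage above are read from R options (gwasglue.vcfdir, gwasglue.proxydb, gwasglue.rsidx, gwasglue.bfile). A minimal sketch of setting and reading such options in base R follows; the paths are placeholders, not real locations.

```r
# Placeholder paths, for illustration only.
old <- options(
  gwasglue.vcfdir = "/path/to/gwas-vcf",
  gwasglue.bfile  = "/path/to/ld_reference/EUR"
)

# These are the same expressions used as argument defaults in the usage above.
vcfdir <- options()$gwasglue.vcfdir
bfile  <- options()$gwasglue.bfile
print(vcfdir)
print(bfile)

# Restore the previous option values when done.
options(old)
```

Options that have not been set evaluate to NULL, so the corresponding argument then needs to be supplied explicitly.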
LD regions defined here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4731402/
map_variants_to_regions(chrpos, pop)
chrpos |
Array of chr:pos |
pop |
EUR, AFR or ASN |
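Conceptually, mapping a variant to a region means finding the interval on the same chromosome that contains its position. The base R sketch below shows this with a made-up region table; toy_map_to_regions and the boundaries are hypothetical and do not reproduce the LD regions referenced above.

```r
# Made-up region boundaries, for illustration only.
regions <- data.frame(
  chr   = c("1", "1", "2"),
  start = c(1L, 1000001L, 1L),
  end   = c(1000000L, 2000000L, 1500000L),
  stringsAsFactors = FALSE
)
regions$region <- paste0(regions$chr, ":", regions$start, "-", regions$end)

# Hypothetical helper (not part of gwasglue): returns the containing region
# for each chr:pos, or NA when no region contains the variant.
toy_map_to_regions <- function(chrpos, regions) {
  parts <- strsplit(chrpos, ":", fixed = TRUE)
  chr <- vapply(parts, `[`, character(1), 1)
  pos <- as.numeric(vapply(parts, `[`, character(1), 2))
  vapply(seq_along(chrpos), function(i) {
    hit <- which(regions$chr == chr[i] &
                 regions$start <= pos[i] &
                 regions$end >= pos[i])
    if (length(hit) == 0) NA_character_ else regions$region[hit[1]]
  }, character(1))
}

mapped <- toy_map_to_regions(c("1:500", "1:1500000", "2:42", "3:10"), regions)
print(mapped)
```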
Figure out specific files and IDs depending on what files exist and whether vcfdir is set
organise_ids(id, vcfdir)
id |
List of IDs within the vcfdir structure, or a list of GWAS VCF files, or a mixture |
vcfdir |
Location of GWAS VCF files, or NULL if id is a list of actual files |
File paths to all datasets
Read in GWAS dataset
read_gwas( filename, skip, delimiter, gzipped, snp, nea, ea, ea_af, effect, se, pval, n, info, z )
filename |
<what param does> |
skip |
<what param does> |
delimiter |
<what param does> |
gzipped |
<what param does> |
snp |
<what param does> |
nea |
<what param does> |
ea |
<what param does> |
ea_af |
<what param does> |
effect |
<what param does> |
se |
<what param does> |
pval |
<what param does> |
n |
<what param does> |
info |
<what param does> |
z |
<what param does> |
data frame with log attributes
Read in reference dataset
read_reference( reference_file, rsid = NULL, chrompos = NULL, remove_dup_rsids = TRUE )
reference_file |
Reference vcf |
rsid |
List of variants to read |
chrompos |
List of chrompos to read |
remove_dup_rsids |
=TRUE Remove duplicates from output |
data frame
This is a convenience function for members at the University of Bristol to automatically set file locations for various reference datasets. It relates only to paths on bc4
set_bc4_files()
Clumps data, then maps those to LD representative regions. Within each detected LD representative region, fine mapping is performed
susieR_pipeline( vcffile, bfile, plink_bin, pop, threads = 1, clump_kb = 1000, clump_r2 = 0.001, clump_p = 5e-08, ... )
vcffile |
Path to vcf file |
bfile |
Path to ld reference panel |
plink_bin |
Path to plink |
pop |
EUR, ASN or AFR |
clump_kb |
Clumping kb window. Default = 1000
clump_r2 |
Clumping r2 threshold. Default = 0.001
clump_p |
Clumping sig level for index variants. Default = 5e-08
... |
Optional arguments to be passed to susie_rss |
List
Takes raw files and aligns them to reference. Important if files don't have chr:pos already
write_out(harmonised, path)
harmonised |
Output from harmonise_against_ref
path |
Path to write out json file and txt file |