Package 'gwasglue' reference manual

Title:	GWAS summary data sources connected to analytical tools
Description:	Many tools exist that use GWAS summary data for colocalisation, fine mapping, Mendelian randomization, visualisation, etc. This package is a conduit that connects R packages that can retrieve GWAS summary data to various tools for analysing those data.
Authors:	Gibran Hemani [aut, cre]
Maintainer:	Gibran Hemani <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9000
Built:	2025-03-24 04:11:43 UTC
Source:	https://github.com/MRCIEU/gwasglue

Perform LD clumping

Description

Usage

clump_gwasvcf(
  vcf,
  clump_kb = 1000,
  clump_r2 = 0.001,
  clump_p = 5e-08,
  pop = NULL,
  bfile = NULL,
  plink_bin = NULL,
  access_token = NULL
)

Arguments

`vcf`	VCF file or VCF object
`clump_kb`	Clumping kb window. Default = 1000
`clump_r2`	Clumping r2 threshold. Default is very strict, 0.001
`clump_p`	Clumping significance level for index variants. Default = 5e-08
`pop`	Super-population to use as reference panel. Default = NULL. Options are EUR, SAS, EAS, AFR, AMR. 'legacy' is also available, which is a previously used version of the EUR panel with a slightly different set of markers
`bfile`	Path to a local LD reference panel in plink format. If NULL (the default), the API is used instead
`plink_bin`	If null and bfile is not null then will detect packaged plink binary for specific OS. Otherwise specify path to plink binary. Default = NULL
`access_token`	Google OAuth2 access token. Used to authenticate level of access to data

Value

data frame of clumped results
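
To show how the three thresholds interact, here is a toy greedy clumping routine in base R. It is a conceptual sketch only: the function `greedy_clump`, the example data frame and the LD matrix are all invented for illustration, and the package itself delegates clumping to plink or the API rather than running code like this.

```r
# Toy greedy LD clumping (illustrative; not how plink or the API is invoked).
# `dat` columns and the `r2` matrix are made-up example data.
greedy_clump <- function(dat, r2, clump_kb = 1000, clump_r2 = 0.001, clump_p = 5e-8) {
  ord <- order(dat$pval)
  kept <- integer(0)
  removed <- rep(FALSE, nrow(dat))
  for (i in ord) {
    # Variants are visited in order of p-value, so stop at the first non-significant one
    if (dat$pval[i] > clump_p) break
    if (removed[i]) next
    kept <- c(kept, i)
    # Remove variants on the same chromosome, inside the window, and in LD with the index variant
    in_window <- dat$chr == dat$chr[i] & abs(dat$pos - dat$pos[i]) <= clump_kb * 1000
    removed <- removed | (in_window & r2[i, ] > clump_r2)
  }
  dat[kept, , drop = FALSE]
}

dat <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  chr  = c(1, 1, 1, 2),
  pos  = c(100000, 150000, 5000000, 100000),
  pval = c(1e-10, 1e-9, 1e-12, 0.5),
  stringsAsFactors = FALSE
)
r2 <- diag(4)
r2[1, 2] <- r2[2, 1] <- 0.8  # rs1 and rs2 are in strong LD

clumped <- greedy_clump(dat, r2)
print(clumped$rsid)
```

In this toy data rs2 is absorbed by rs1 (same window, high r2), rs3 is retained because it lies outside the window, and rs4 fails the significance level.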

Perform conditional analysis using GCTA COJO

Description

For a list of fine-mapped rsids, will assign to regions and generate conditionally independent summary stats for each rsid

Usage

cojo_cond(
  vcffile,
  bfile,
  snplist,
  pop,
  gcta = genetics.binaRies::get_gcta_binary(),
  workdir = tempdir(),
  threads = 1
)

Arguments

`vcffile`	Path to vcffile
`bfile`	LD reference panel
`snplist`	List of rsids
`pop`	EUR, ASN or AFR
`gcta`	Path to gcta binary. For convenience can use default=genetics.binaRies::get_gcta_binary()
`workdir`	Location to store temporary files. Default=tempdir()
`threads`	Number of parallel threads. Default=1

Value

List of independent summary stats

Write vcf file to cojo sumstat file

Description

Write vcf file to cojo sumstat file

Usage

cojo_sumstat_file(vcffile, outfile)

Arguments

`vcffile`	Path to vcf file
`outfile`	Path to output file

Value

vcf object

Convert coloc dataset to gassocplot dataset

Description

Convert coloc dataset to gassocplot dataset

Usage

coloc_to_gassocplot(coloclist, bfile = NULL, plink_bin = NULL)

Arguments

`coloclist`	Output from *_to_coloc
`bfile`	If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink dataset here.
`plink_bin`	If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink executable here

Value

List to feed into gassocplot

Generate coloc dataset from vcf files

Description

Generate coloc dataset from vcf files

Usage

gwasvcf_to_coloc(vcf1, vcf2, chrompos)

Arguments

`vcf1`	VCF object or path to vcf file
`vcf2`	VCF object or path to vcf file
`chrompos`	Character of chr:pos1-pos2

Value

List of datasets to feed into coloc
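
The `chrompos` argument is a single string of the form chr:pos1-pos2. As a minimal sketch of that format, the base R helper below splits such a string into its parts. `parse_chrompos` is a hypothetical name introduced here for illustration and is not part of the package.

```r
# Hypothetical helper: split "chr:pos1-pos2" into chromosome, start and end.
parse_chrompos <- function(chrompos) {
  m <- regmatches(chrompos, regexec("^([^:]+):([0-9]+)-([0-9]+)$", chrompos))
  ok <- lengths(m) == 4
  if (!all(ok)) {
    stop("Expected 'chr:pos1-pos2', got: ", paste(chrompos[!ok], collapse = ", "))
  }
  parts <- do.call(rbind, m)
  data.frame(
    chr   = parts[, 2],
    start = as.numeric(parts[, 3]),
    end   = as.numeric(parts[, 4]),
    stringsAsFactors = FALSE
  )
}

region <- parse_chrompos("1:109317192-110317192")
print(region)
```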

Generate data for fine mapping analysis

Description

For a given region and VCF file, extracts the variants in the region along with LD matrix from a reference panel

Usage

gwasvcf_to_finemapr(
  region,
  vcf,
  bfile,
  plink_bin = genetics.binaRies::get_plink_binary(),
  threads = 1
)

Arguments

`region`	Region of the genome to extract, e.g. "1:109317192-110317192". Can be an array
`vcf`	Path to VCF file or VCF object
`bfile`	LD reference panel
`plink_bin`	Path to plink. Default = genetics.binaRies::get_plink_binary()
`threads`	Number of threads to run in parallel. Default=1

Value

List of datasets for finemapping

Create exposure or outcome data format for TwoSampleMR from vcf

Description

Create exposure or outcome data format for TwoSampleMR from vcf

Usage

gwasvcf_to_TwoSampleMR(vcf, type = "exposure")

Arguments

`vcf`	VCF object
`type`	"exposure" (default) or "outcome"

Value

data frame

Generic harmonisation function

Description

Assumes ref and alt alleles available for target and reference datasets, and uses chr:pos for alignment

Usage

harmonise(
  chr1,
  pos1,
  ref1,
  alt1,
  chr2,
  pos2,
  ref2,
  alt2,
  rsid2 = NULL,
  indel_recode = FALSE,
  strand_flip = FALSE
)

Arguments

`chr1`	Vector
`pos1`	Vector
`ref1`	Vector
`alt1`	Vector
`chr2`	Vector
`pos2`	Vector
`ref2`	Vector
`alt2`	Vector
`rsid2`	Optional vector
`indel_recode`	Default = FALSE. If TRUE then attempts to recode D/I
`strand_flip`	Default = FALSE. If TRUE then attempts to flip strand when alignment is not otherwise possible

Details

0: stick
1: swap
2: rename indel
3: rename indel and swap
4: flip
5: flip and swap
6: drop (no match)
7: drop (no reference)

Value

Dataframe of outcomes
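
The action codes listed under Details can be illustrated with a simplified base R classifier for SNPs. This is a sketch under stated assumptions, not the package's implementation: rows are assumed to be already aligned by chr:pos, the indel codes 2 and 3 are omitted, and `complement` and `classify_alleles` are hypothetical helper names.

```r
# Simplified sketch of the SNP action codes (indel codes 2 and 3 omitted).
# `complement` and `classify_alleles` are hypothetical helpers, not package functions.
complement <- function(x) chartr("ACGT", "TGCA", x)

classify_alleles <- function(ref1, alt1, ref2, alt2, strand_flip = FALSE) {
  code <- rep(6L, length(ref1))  # 6: drop (no match) unless matched below
  flip_swap <- strand_flip & complement(ref1) == alt2 & complement(alt1) == ref2
  flip      <- strand_flip & complement(ref1) == ref2 & complement(alt1) == alt2
  swap      <- ref1 == alt2 & alt1 == ref2
  stick     <- ref1 == ref2 & alt1 == alt2
  # Assign in reverse priority so that the simplest action wins for palindromic variants
  code[which(flip_swap)] <- 5L
  code[which(flip)]      <- 4L
  code[which(swap)]      <- 1L
  code[which(stick)]     <- 0L
  code[is.na(ref2) | is.na(alt2)] <- 7L  # 7: drop (no reference)
  code
}

ref1 <- c("A", "A", "A", "A", "A")
alt1 <- c("G", "G", "G", "G", "G")
ref2 <- c("A", "G", "T", "C", NA)
alt2 <- c("G", "A", "C", "T", NA)

codes <- classify_alleles(ref1, alt1, ref2, alt2, strand_flip = TRUE)
print(codes)
```

With `strand_flip = FALSE` the third and fourth variants in this example would fall through to code 6, because they can only be matched on the opposite strand.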

Harmonise gwas alleles to be same as reference

Description

Harmonise gwas alleles to be same as reference

Usage

harmonise_against_ref(gwas, reference)

Arguments

`gwas`	<what param does>
`reference`	<what param does>

Value

data frame with attributes

Generate coloc dataset from the IEU GWAS database

Description

Generate coloc dataset from the IEU GWAS database

Usage

ieugwasr_to_coloc(id1, id2, chrompos, type1 = NULL, type2 = NULL)

Arguments

`id1`	ID for trait 1
`id2`	ID for trait 2
`chrompos`	Character of chr:pos1-pos2
`type1`	Provide "cc" or "quant" to override automatic lookup of trait type for trait 1
`type2`	Provide "cc" or "quant" to override automatic lookup of trait type for trait 2

Value

List of datasets to feed into coloc

Generate data for analysis in various finemapping methods

Description

Uses the finemapr package https://github.com/variani/finemapr

Usage

ieugwasr_to_finemapr(region, id, bfile = NULL, plink_bin = NULL)

Arguments

`region`	Region of the genome to extract, e.g. "1:109317192-110317192"
`id`	Array of GWAS studies to query. See `gwasinfo` for available studies
`bfile`	Path to a local LD reference panel in plink format. If NULL (the default), the API is used instead
`plink_bin`	If null and bfile is not null then will detect packaged plink binary for specific OS. Otherwise specify path to plink binary. Default = NULL

Value

Each id will be a list of z score data, ld matrix, and sample size

Generate regional plot for ieugwasr

Description

Uses James Staley's gassocplot package https://github.com/jrs95/gassocplot

Usage

ieugwasr_to_gassocplot(chrpos, id, bfile = NULL, plink_bin = NULL)

Arguments

`chrpos`	A window range to plot e.g. 16:3349655-3849655
`id`	Vector of one or more IEU GWAS db study IDs
`bfile`	If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink dataset here.
`plink_bin`	If number of SNPs > 500 then need to provide your own LD reference panel. Provide plink executable here

Value

assoc_plot or stack_assoc_plot if multiple markers given

Convert output from query to TwoSampleMR format

Description

Convert output from query to TwoSampleMR format

Usage

ieugwasr_to_TwoSampleMR(x, type = "exposure")

Arguments

`x`	Output from ieugwasr query e.g. associations, tophits, phewas
`type`	"exposure" (default) or "outcome"

Value

data frame
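
A typical use is to convert both an exposure and an outcome query before harmonising them. The sketch below wraps that workflow in a function so that nothing is executed until it is called. It assumes the ieugwasr, gwasglue and TwoSampleMR packages are installed, that network access to the IEU GWAS database is available, and that the `tophits` output has an `rsid` column. The wrapper name `run_mr_sketch` and the study IDs are illustrative choices, not part of this manual.

```r
# Usage sketch only: requires ieugwasr, gwasglue, TwoSampleMR and network access.
# The default study IDs are illustrative; check what is available with ieugwasr::gwasinfo().
run_mr_sketch <- function(exposure_id = "ieu-a-2", outcome_id = "ieu-a-7") {
  # Top hits for the exposure, then the same variants looked up in the outcome
  top <- ieugwasr::tophits(exposure_id)
  out <- ieugwasr::associations(top$rsid, outcome_id)  # assumes an `rsid` column

  exposure_dat <- gwasglue::ieugwasr_to_TwoSampleMR(top, type = "exposure")
  outcome_dat  <- gwasglue::ieugwasr_to_TwoSampleMR(out, type = "outcome")

  # Hand over to TwoSampleMR for harmonisation
  TwoSampleMR::harmonise_data(exposure_dat, outcome_dat)
}
```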

Check a GWAS dataset against a reference known to be on the forward strand

Description

Assuming the reference data is all on the forward strand, check whether the GWAS is too. Use some threshold, e.g. if more than 90% of alleles match the reference without needing to be flipped, then it is likely that the dataset is on the forward strand

Usage

is_forward_strand(
  gwas_snp,
  gwas_a1,
  gwas_a2,
  ref_snp,
  ref_a1,
  ref_a2,
  threshold = 0.9
)

Arguments

`gwas_snp`	Vector of SNP names for the dataset being checked
`gwas_a1`	Vector of alleles
`gwas_a2`	Vector of alleles
`ref_snp`	Vector of SNP names for the reference dataset
`ref_a1`	Vector of alleles
`ref_a2`	Vector of alleles
`threshold`	Default = 0.9. If the proportion of variants whose allele strands match is above this threshold, then declare the dataset to be on the forward strand

Details

This function can be used to evaluate how strict harmonisation should be. The trade-off is that if you assume the data are not on the forward strand, then palindromic SNPs within a particular frequency range are dropped. Alternatively, you could accept some small probability of error in whether palindromic SNPs are on the forward strand, and avoid dropping too many variants.

Value

1 = Forward strand; 2 = Not on forward strand
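
The thresholding idea can be sketched in base R as follows. This is an illustration of the logic described above, not the package's code: `check_forward_sketch` is a hypothetical name, and skipping palindromic SNPs (which cannot inform strand) is an assumption made here for simplicity.

```r
# Simplified forward-strand check (not the package's implementation).
# Palindromic SNPs (A/T, C/G) are skipped because they carry no strand information.
check_forward_sketch <- function(gwas_snp, gwas_a1, gwas_a2,
                                 ref_snp, ref_a1, ref_a2, threshold = 0.9) {
  idx <- match(gwas_snp, ref_snp)
  keep <- !is.na(idx)
  g1 <- gwas_a1[keep]
  g2 <- gwas_a2[keep]
  r1 <- ref_a1[idx[keep]]
  r2 <- ref_a2[idx[keep]]

  palindromic <- chartr("ACGT", "TGCA", g1) == g2
  stopifnot(any(!palindromic))  # need at least one informative variant

  # A variant matches if its alleles agree with the reference in either orientation
  matches <- (g1 == r1 & g2 == r2) | (g1 == r2 & g2 == r1)
  prop <- mean(matches[!palindromic])
  if (prop > threshold) 1L else 2L
}

snps   <- c("rs1", "rs2", "rs3", "rs4", "rs5")
ref_a1 <- c("A", "C", "A", "T", "A")
ref_a2 <- c("G", "T", "G", "C", "T")

# Same strand as the reference (rs3 has its alleles swapped, rs5 is palindromic)
fwd <- check_forward_sketch(snps, c("A", "C", "G", "T", "A"), c("G", "T", "A", "C", "T"),
                            snps, ref_a1, ref_a2)

# The same dataset reported on the reverse strand
rev <- check_forward_sketch(snps, c("T", "G", "C", "A", "T"), c("C", "A", "T", "G", "A"),
                            snps, ref_a1, ref_a2)
print(c(fwd, rev))
```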

Create a harmonised dataset from lists of vcf files

Description

This mimics the TwoSampleMR::make_dat function, which automatically looks up exposure and outcome datasets and harmonises them, except this function uses GWAS-VCF datasets instead. The supporting reference datasets can be accessed by UoB users on BC4 using set_bc4_files()

Usage

make_TwoSampleMR_dat(
  id1,
  id2,
  proxies = TRUE,
  nthreads = 1,
  vcfdir = options()$gwasglue.vcfdir,
  proxydb = options()$gwasglue.proxydb,
  rsidx = options()$gwasglue.rsidx,
  bfile = options()$gwasglue.bfile,
  action = 1,
  plink_bin = genetics.binaRies::get_plink_binary()
)

Arguments

`id1`	Exposure datasets. Either an array of vcf files, or array of IDs if vcfdir is set
`id2`	Outcome datasets. Either an array of vcf files, or array of IDs if vcfdir is set
`proxies`	Look up proxies? Default = TRUE, but requires either bfile or proxydb to be set
`nthreads`	Number of threads to use for parallelisation. Default = 1
`vcfdir`	Location of vcf files if id1 and id2 are just IDs. Default = options()$gwasglue.vcfdir
`proxydb`	Location of LD proxy database. Default = options()$gwasglue.proxydb
`rsidx`	Location of rsidx index database. Default = options()$gwasglue.rsidx
`bfile`	Location of LD reference panel. Default = options()$gwasglue.bfile

Value

harmonised dataset
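
Because several defaults are read from `options()`, the reference file locations can be set once per session. A minimal sketch, using placeholder paths that are purely hypothetical and must be replaced with the locations of your own files:

```r
# Hypothetical placeholder paths; replace with your own reference file locations.
options(
  gwasglue.vcfdir  = "/path/to/gwas-vcf",
  gwasglue.proxydb = "/path/to/proxydb",
  gwasglue.rsidx   = "/path/to/rsidx",
  gwasglue.bfile   = "/path/to/ld_ref/EUR"
)

# The function defaults read these values back via options()$gwasglue.<name>
print(options()$gwasglue.vcfdir)
```

Setting the options this way means the `vcfdir`, `proxydb`, `rsidx` and `bfile` arguments can be left at their defaults in later calls.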

For a set of variants map to LD regions

Description

LD regions defined here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4731402/

Usage

map_variants_to_regions(chrpos, pop)

Arguments

`chrpos`	Array of chr:pos
`pop`	EUR, AFR or ASN
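
Conceptually, mapping variants to regions is an interval lookup. The base R sketch below shows the idea with a toy region table. The boundaries are invented for illustration (the real LD regions come from the publication linked above), and `assign_region` is a hypothetical helper rather than a package function.

```r
# Toy region table with made-up boundaries.
regions <- data.frame(
  chr   = c("1", "1", "2"),
  start = c(1, 2000001, 1),
  end   = c(2000000, 4000000, 3000000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row index of the region containing each "chr:pos" variant (NA if none)
assign_region <- function(chrpos, regions) {
  parts <- strsplit(chrpos, ":", fixed = TRUE)
  chr <- vapply(parts, `[`, character(1), 1)
  pos <- as.numeric(vapply(parts, `[`, character(1), 2))
  vapply(seq_along(chr), function(i) {
    hit <- which(regions$chr == chr[i] & regions$start <= pos[i] & pos[i] <= regions$end)
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1))
}

region_idx <- assign_region(c("1:1500000", "1:2500000", "2:10", "3:5"), regions)
print(region_idx)
```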

Figure out specific files and IDs depending on what files exist and whether vcfdir is set

Description

Figure out specific files and IDs depending on what files exist and whether vcfdir is set

Usage

organise_ids(id, vcfdir)

Arguments

`id`	List of IDs within the vcfdir structure, or a list of GWAS VCF files, or a mixture
`vcfdir`	Location of GWAS VCF files, or NULL if id is a list of actual files

Value

File paths to all datasets

Read in GWAS dataset

Description

Read in GWAS dataset

Usage

read_gwas(
  filename,
  skip,
  delimiter,
  gzipped,
  snp,
  nea,
  ea,
  ea_af,
  effect,
  se,
  pval,
  n,
  info,
  z
)

Arguments

`filename`	<what param does>
`skip`	<what param does>
`delimiter`	<what param does>
`gzipped`	<what param does>
`snp`	<what param does>
`nea`	<what param does>
`ea`	<what param does>
`ea_af`	<what param does>
`effect`	<what param does>
`se`	<what param does>
`pval`	<what param does>
`n`	<what param does>
`info`	<what param does>
`z`	<what param does>

Value

data frame with log attributes

Read in reference dataset

Description

Read in reference dataset

Usage

read_reference(
  reference_file,
  rsid = NULL,
  chrompos = NULL,
  remove_dup_rsids = TRUE
)

Arguments

`reference_file`	Reference vcf
`rsid`	List of variants to read
`chrompos`	List of chrompos to read
`remove_dup_rsids`	Default = TRUE. Remove duplicates from output

Value

data frame

Determine locations of useful reference datasets on bluecrystal4

Description

This is a convenience function for members at the University of Bristol to automatically set file locations for various reference datasets. It relates only to paths on bc4

Usage

set_bc4_files()

Perform fine mapping pipeline using susieR

Description

Clumps data, then maps those to LD representative regions. Within each detected LD representative region, fine mapping is performed

Usage

susieR_pipeline(
  vcffile,
  bfile,
  plink_bin,
  pop,
  threads = 1,
  clump_kb = 1000,
  clump_r2 = 0.001,
  clump_p = 5e-08,
  ...
)

Arguments

`vcffile`	Path to vcf file
`bfile`	Path to ld reference panel
`plink_bin`	Path to plink
`pop`	EUR, ASN or AFR
`clump_kb`	Clumping kb window. Default = 1000
`clump_r2`	Clumping r2 threshold. Default = 0.001
`clump_p`	Clumping significance level for index variants. Default = 5e-08
`...`	Optional arguments to be passed to susie_rss

Value

List

Create format for HPC pipeline

Description

Takes raw files and aligns them to reference. Important if files don't have chr:pos already

Usage

write_out(harmonised, path)

Arguments

`harmonised`	Output from `harmonise_against_ref`
`path`	Path to write out json file and txt file
