Title: | GenEpi Utility Functions |
---|---|
Description: | The genepi.utils package is a collection of utility functions for working with genetic epidemiology data. |
Authors: | Nicholas Sunderland [aut, cre] |
Maintainer: | Nicholas Sunderland <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.33 |
Built: | 2024-11-24 06:14:42 UTC |
Source: | https://github.com/nicksunderland/genepi.utils |
as.data.table
as.data.table(object, ...)
object |
GWAS object to convert to data.table |
... |
argument for data.table generic, ignored in this implementation |
Chromosome & position data to variant RSID
chrpos_to_rsid( dt, chr_col, pos_col, ea_col = NULL, nea_col = NULL, flip = "allow", alt_rsids = FALSE, build = "b37_dbsnp156", dbsnp_dir = genepi.utils::which_dbsnp_directory(), parallel_cores = parallel::detectCores(), verbose = TRUE )
dt |
a data.frame like object, or file path, with at least columns (chrom |
chr_col |
a string column name; chromosome |
pos_col |
a string column name; base position |
ea_col |
a string column name; effect allele |
nea_col |
a string column name; non effect allele |
flip |
a string, options: "report", "allow", "no_flip" |
alt_rsids |
a logical, whether to return additional alternate RSIDs |
build |
a string, options: "b37_dbsnp156", "b38_dbsnp156" (corresponds to the appropriate data directory) |
dbsnp_dir |
a string file path to the dbSNP .fst file directory - see setup documentation |
parallel_cores |
an integer, the number of cores/workers to set up the |
verbose |
a logical, runtime reporting |
a data.table with an RSID column (or a list: 1-data.table; 2-list of alternate RSIDs)
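Conceptually, the lookup is a join between the input table and a dbSNP reference keyed on chromosome and position. The base R sketch below illustrates that idea only: the `ref` table and its RSIDs are made-up toy values standing in for the dbSNP .fst files, and allele matching and flipping are not handled. It is not the package's implementation.

```r
# Toy input: variants identified only by chromosome and base position
dt <- data.frame(chr = c("1", "1", "2"),
                 pos = c(1000L, 2000L, 3000L),
                 stringsAsFactors = FALSE)

# Toy reference standing in for a dbSNP build (illustrative values only)
ref <- data.frame(chr  = c("1", "2"),
                  pos  = c(1000L, 3000L),
                  RSID = c("rs0000001", "rs0000002"),
                  stringsAsFactors = FALSE)

# Left join on chr + pos: variants absent from the reference get RSID = NA
out <- merge(dt, ref, by = c("chr", "pos"), all.x = TRUE, sort = FALSE)
out <- out[order(out$chr, out$pos), ]
print(out)
```

Variants with no match in the reference keep an NA RSID, so they can be reported or handled separately.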
Clump variants in a GWAS using PLINK2 and an appropriate reference panel.
For example, the 1000 genomes phase 3 data can be downloaded from the PLINK
website (https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg). To remove
duplicates you can run:
plink2 --pfile all_phase3 --rm-dup force-first --make-pgen --out all_phase3_nodup
The path to the reference (without the plink extensions) should be passed as the plink_ref argument. The path to the plink2 executable should be passed as the plink2 argument.
clump( gwas, p1 = 1, p2 = 1, r2 = 0.1, kb = 250, plink2 = genepi.utils::which_plink2(), plink_ref = genepi.utils::which_1000G_reference(build = "GRCh37"), logging = TRUE, parallel_cores = parallel::detectCores() )
gwas |
a data.frame like object with at least columns rsid, ea, oa, and p |
p1 |
a numeric, the p-value threshold for inclusion as a clump |
p2 |
a numeric, the p-value threshold for incorporation into a clump |
r2 |
a numeric, the r2 value |
kb |
an integer, the window for clumping (kb) |
plink2 |
a string, path to the plink executable |
plink_ref |
a string, path to the pfile genome reference |
logging |
a logical, whether to set the plink logging information as attributes ( |
parallel_cores |
an integer, how many cores / threads to use |
a data.table with additional columns index (logical, whether the variant is an index SNP) and clump (integer, the clump the variant belongs to)
Clump MR object exposure
clump_mr( x, p1 = 1, p2 = 1, r2 = 0.001, kb = 250, plink2 = genepi.utils::which_plink2(), plink_ref = genepi.utils::which_1000G_reference(build = "GRCh37"), parallel_cores = parallel::detectCores() )
x |
an object of class MR |
p1 |
a numeric, the p-value threshold for inclusion as a clump |
p2 |
a numeric, the p-value threshold for incorporation into a clump |
r2 |
a numeric, the r2 value |
kb |
an integer, the window for clumping (kb) |
plink2 |
a string, path to the plink executable |
plink_ref |
a string, path to the pfile genome reference |
parallel_cores |
an integer, how many cores / threads to use |
Run collider bias assessment
collider_bias( x, bias_method = "dudbridge", r2 = 0.001, p1 = 5e-08, kb = 250, plink2 = genepi.utils::which_plink2(), plink_ref = genepi.utils::which_1000G_reference(build = "GRCh37"), ip = 0.001, pi0 = 0.6, sxy1 = 1e-05, bootstraps = 100, weighted = TRUE, method = "Simex", B = 1000, seed = 2023 )
x |
an object of class MR |
bias_method |
a character or character vector, one or more of c("dudbridge", "slopehunter", "mr_ivw", "mr_egger", "mr_weighted_median", "mr_weighted_mode") |
r2 |
a numeric 0-1, r2 used for clumping - set all clumping params to NA to turn off |
p1 |
a numeric 0-1, p1 used for clumping - set all clumping params to NA to turn off |
kb |
an integer, kb used for clumping - set all clumping params to NA to turn off |
plink2 |
a path, the plink2 binary |
plink_ref |
a path, the reference genome pfile |
ip |
a numeric 0-1, threshold for removing incidence variants; see |
pi0 |
a numeric 0-1, proportion of SNPs in the incidence only cluster; see |
sxy1 |
a numeric, the covariance between incidence and progression Gip SNPs; see |
bootstraps |
an integer, number of bootstraps to estimate SE; see |
weighted |
see |
method |
see |
B |
see |
seed |
seed, for reproducibility |
Column object
Column(name = class_missing, alias = class_missing, type = class_missing)
name |
the standard column name |
alias |
a character vector of aliases (other column names) for this column |
type |
a character, an atomic R type |
an S7 class genepi.utils::Column object
name
the standard column name
alias
a character vector of aliases (other column names) for this column
type
a character, an atomic R type
A mapping to the standardised column names used in this package. Available names: 'rsid', 'chr', 'bp', 'ea', 'oa', 'eaf', 'p', 'beta', 'se', 'or', 'or_se', 'or_lb', 'or_ub', 'beta_lb', 'beta_ub', 'z', 'q_stat', 'i2', 'nstudies', 'n'
ColumnMap(x)
x |
either a list of |
an S7 class genepi.utils::ColumnMap object
map
a list of Column class objects
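The idea behind a column map (a standard name plus a set of aliases for each column) can be sketched in base R as follows. The alias table and the `standardise_names()` helper are hypothetical and only illustrate the renaming step; they are not the S7 Column / ColumnMap classes themselves.

```r
# Hypothetical alias table: standard name -> other names seen in input files
aliases <- list(
  rsid = c("SNP", "MarkerName"),
  chr  = c("CHR", "chromosome"),
  bp   = c("BP", "pos"),
  p    = c("P", "pval")
)

# Rename the first column matching a standard name or one of its aliases
standardise_names <- function(df, aliases) {
  for (std in names(aliases)) {
    hit <- which(names(df) %in% c(std, aliases[[std]]))
    if (length(hit) >= 1) {
      names(df)[hit[1]] <- std
    }
  }
  df
}

raw <- data.frame(SNP = "rs1", chromosome = "1", pos = 100L, pval = 0.01)
std <- standardise_names(raw, aliases)
names(std)
```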
Corrected Weighted Least Squares collider bias method
cwls(x, ...)
x |
an object of class MR |
... |
parameter sink, additional ignored parameters |
an object of class MRResult
Dudbridge collider bias method
dudbridge( x, weighted = TRUE, prune = NULL, method = "Simex", B = 1000, lambda = seq(0.25, 5, 0.25), seed = 2018, ... )
x |
an object of class MR |
weighted |
see indexevent::indexevent() |
prune |
see indexevent::indexevent() |
method |
see indexevent::indexevent() |
B |
see indexevent::indexevent() |
lambda |
see indexevent::indexevent() |
seed |
see indexevent::indexevent() |
... |
parameter sink, additional ignored parameters |
an object of class MRResult
Plotting reported effect allele frequencies (EAF) against a reference set to identify study variants which significantly deviate from the expected population frequencies.
eaf_plot( gwas, eaf_col = "EAF", ref_eaf_col = "EUR_EAF", tolerance = 0.2, colours = list(missing = "#5B1A18", outlier = "#FD6467", within = "#7294D4"), title = NULL, facet_grid_row_col = NULL, facet_grid_col_col = NULL )
gwas |
a data.table |
eaf_col |
a string, the column containing the study EAF data |
ref_eaf_col |
a string, the column containing the reference EAF data |
tolerance |
a numeric, frequency difference that determines outliers |
colours |
a 3 element list of colour codes, e.g. list(missing="#5B1A18", outlier="#FD6467", within="#7294D4") |
title |
a string, the plot title |
facet_grid_row_col |
(optional), a column by which to facet the plot by rows |
facet_grid_col_col |
(optional), a column by which to facet the plot by columns |
a ggplot
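A minimal sketch of how the three plotted categories (missing, outlier, within) could be derived from the tolerance. It assumes an outlier is any variant whose absolute difference between study and reference EAF exceeds the tolerance; the exact rule used inside eaf_plot() is not stated here, so treat this as an illustration.

```r
gwas <- data.frame(
  EAF     = c(0.10, 0.55, 0.30, 0.80),
  EUR_EAF = c(0.12, 0.20, NA,   0.78)
)
tolerance <- 0.2

# Absolute frequency difference; NA when either frequency is missing
diff <- abs(gwas$EAF - gwas$EUR_EAF)

# Assumed rule: NA -> "missing", difference above tolerance -> "outlier"
gwas$status <- ifelse(is.na(diff), "missing",
                      ifelse(diff > tolerance, "outlier", "within"))
table(gwas$status)
```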
Generates rows of synthetic GWAS summary stats data. Useful for developing plotting and other methods. No attempt is made to make this data at all realistic.
generate_random_gwas_data(n, seed = 2023)
n |
number of fake variants to generate |
seed |
seed, for reproducibility |
a data.table with columns SNP, CHR, BP, OA, EA, EAF, BETA, P, EUR_EAF
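Because the generated columns include EAF and EUR_EAF, the synthetic data should be usable directly with the default arguments of eaf_plot(). The sketch below relies only on the signatures documented on this page and is guarded so that it does nothing when the package is not installed; the behaviour of a particular installed version is an assumption.

```r
p <- NULL
if (requireNamespace("genepi.utils", quietly = TRUE)) {
  # 1,000 synthetic variants: SNP, CHR, BP, OA, EA, EAF, BETA, P, EUR_EAF
  gwas <- genepi.utils::generate_random_gwas_data(1000, seed = 2023)
  # Defaults eaf_col = "EAF" and ref_eaf_col = "EUR_EAF" match those columns
  p <- genepi.utils::eaf_plot(gwas, tolerance = 0.2)
}
```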
Extract variants from plink binary
get_pfile_variants( snp, win_kb, chr, from_bp, to_bp, plink2 = genepi.utils::which_plink2(), pfile = genepi.utils::which_1000G_reference(build = "GRCh37") )
snp |
character, an rsid |
win_kb |
numeric, window size around snp in kb |
chr |
character, the chromosome (use instead of snp and win_kb, not in addition) |
from_bp |
numeric, the start base position (use instead of snp and win_kb, not in addition) |
to_bp |
numeric, the end base position (use instead of snp and win_kb, not in addition) |
plink2 |
character / path, the plink2 executable |
pfile |
character / path, the plink pfile set |
a data.table
Get proxies for variants from plink binary
get_proxies( x, stat = "r2-unphased", win_kb = 125, win_r2 = 0.8, win_ninter = Inf, proxy_eaf = NULL, plink2 = genepi.utils::which_plink2(), pfile = genepi.utils::which_1000G_reference(build = "GRCh37"), ... )
x |
a character vector of rsids or a GWAS object |
stat |
character, the R stat to calculate, one of "r2-unphased", "r2-phased", "r-unphased", "r-phased" |
win_kb |
numeric, the window to look in around the variants |
win_r2 |
numeric, the lower r2 limit to include in output (for --r-phased and --r-unphased, this means |r| ≥ sqrt(win_r2)) |
win_ninter |
numeric, controls the maximum number of other variants allowed between variant-pairs in the report. Inf = off. |
proxy_eaf |
numeric, the minimal effect allele frequency for proxy variants. NULL = eaf filtering off. |
plink2 |
character / path, the plink2 executable |
pfile |
character / path, the plink pfile set |
... |
other arguments (see below) |
snps |
a character vector (available if |
then |
a string (available if |
a data.table of variants and their proxies (if x is a character vector) or a GWAS object (if x is a GWAS object).
A GWAS object is a container for vectors of GWAS data, a correlation matrix, and meta-data regarding quality control procedures applied at the point of object creation / data import.
GWAS( dat, map = "default", drop = FALSE, fill = FALSE, fill_rsid = FALSE, missing_rsid = "fill_CHR:BP", parallel_cores = parallel::detectCores(), dbsnp_dir = genepi.utils::which_dbsnp_directory(), filters = list(beta_invalid = "!is.infinite(beta) & abs(beta) < 20", eaf_invalid = "eaf > 0 & eaf < 1", p_invalid = "!is.infinite(p)", se_invalid = "!is.infinite(se)", alleles_invalid = "!is.na(ea) & !is.na(oa)", chr_missing = "!is.na(chr)", bp_missing = "!is.na(bp)", beta_missing = "!is.na(beta)", se_missing = "!is.na(se)", p_missing = "!is.na(p)", eaf_missing = "!is.na(eaf)"), reference = NULL, ref_map = NULL, verbose = TRUE, ... )
dat |
a valid string file path to be read by |
map |
a valid input to the |
drop |
a logical, whether to drop data source columns not in the column |
fill |
a logical, whether to add (NAs) missing columns present in the column |
fill_rsid |
either FALSE or a valid argument for the chrpos_to_rsid |
missing_rsid |
a string, how to handle missing rsids: one of "fill_CHR:BP", "fill_CHR:BP_OA_EA", "overwrite_CHR:BP", "overwrite_CHR:BP:OA:EA", "none", or "leave" |
parallel_cores |
an integer, number of cores to use for RSID mapping, default is maximum machine cores |
dbsnp_dir |
path to the dbsnp directory of fst files see chrpos_to_rsid |
filters |
a list of named strings, each to be evaluated as an expression to filter the data during the quality control steps (above) |
reference |
a valid string file path to be read by |
ref_map |
a valid input to the |
verbose |
a logical, whether to print details |
... |
variable capture to be passed to the constructor, e.g. individual vectors for the slots, rather than |
an S7 class genepi.utils::GWAS object
rsid
character, variant ID - usually in rs12345 format, however this can be changed with the missing_rsid argument
chr
character, chromosome identifier
bp
integer, base position
ea
character, effect allele
oa
character, other allele
eaf
numeric, effect allele frequency
beta
numeric, effect size
se
numeric, effect size standard error
p
numeric, p-value
n
integer, total number of samples
ncase
integer, number of cases
strand
character, the strand + or -
imputed
logical, whether imputed
info
numeric, the info score
q
numeric, the Q statistic for meta analysis results
q_p
numeric, the Q statistic P-value
i2
numeric, the I2 statistic
proxy_rsid
character, proxy variant ID
proxy_chr
character, proxy chromosome identifier
proxy_bp
integer, proxy base position
proxy_ea
character, proxy effect allele
proxy_oa
character, proxy other allele
proxy_eaf
numeric, proxy effect allele frequency
proxy_r2
numeric, proxy r2 with rsid
trait
character, the GWAS trait
id
character, the GWAS identifier
source
character, data source; either the file path, or "data.table" if loaded directly
correlation
matrix, a correlation matrix of signed R values between variants
map
an object of class ColumnMap, the column mapping
qc
list, a named list of filters; name is the filter expression and value is an integer vector of rows that fail the filter
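The filters argument holds expressions as strings, and the qc slot records which rows fail each one. A base R sketch of that mechanism, assuming each string is parsed and evaluated with the data columns in scope and that a row fails when the expression is FALSE or NA. For brevity the result is named by filter name rather than by expression, and this is not the package's own code.

```r
dat <- data.frame(
  beta = c(0.1, 25, NA, -0.3),
  eaf  = c(0.2, 0.5, 0.4, 1.0),
  p    = c(1e-3, 0.5, 0.2, 0.9)
)

# A subset of the default filters from the GWAS() signature
filters <- list(
  beta_invalid = "!is.infinite(beta) & abs(beta) < 20",
  eaf_invalid  = "eaf > 0 & eaf < 1",
  beta_missing = "!is.na(beta)"
)

# For each filter, record the rows where the expression is not TRUE
qc <- lapply(filters, function(expr) {
  keep <- eval(parse(text = expr), envir = dat)
  which(is.na(keep) | !keep)
})
qc
```

Here row 2 fails beta_invalid (|beta| >= 20), row 3 fails both beta filters (missing beta), and row 4 fails eaf_invalid (eaf = 1).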
Harmonise GWAS
harmonise_gwas(gwas, ref, join = "chr:bp", action = 2, ...)
gwas |
a GWAS object, data.table, or file path |
ref |
a GWAS object, data.table, or file path |
join |
a character, either 'chr:bp' (default) or 'rsid', the columns to perform the join on |
action |
an integer, 1-, 2-, or 3- |
... |
additional parameters below |
rmap |
a named vector or list, mapping reference input, standard name = old name (active if using data.table or file path inputs) |
gmap |
a named vector or list, mapping gwas input, standard name = old name (active if using data.table or file path inputs) |
a data.table, harmonised GWAS data
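The core of allele harmonisation is aligning effect alleles to the reference and flipping effect sizes where the alleles are reversed. The sketch below shows only that step, on toy data joined by rsid. Strand flips and palindromic variants, which the action levels usually govern, are deliberately left out, and nothing here is taken from the harmonise_gwas() source.

```r
gwas <- data.frame(rsid = c("rs1", "rs2"), ea = c("A", "G"), oa = c("C", "T"),
                   beta = c(0.2, -0.1), eaf = c(0.3, 0.6),
                   stringsAsFactors = FALSE)
ref  <- data.frame(rsid = c("rs1", "rs2"), ea = c("A", "T"), oa = c("C", "G"),
                   stringsAsFactors = FALSE)

# Join study and reference; reference alleles get the "_ref" suffix
m <- merge(gwas, ref, by = "rsid", suffixes = c("", "_ref"))

# A variant is swapped when its effect/other alleles are reversed vs the reference
swapped <- m$ea == m$oa_ref & m$oa == m$ea_ref

# Flip swapped variants so the effect allele matches the reference
m$beta[swapped] <- -m$beta[swapped]
m$eaf[swapped]  <- 1 - m$eaf[swapped]
tmp             <- m$ea[swapped]
m$ea[swapped]   <- m$oa[swapped]
m$oa[swapped]   <- tmp

m[, c("rsid", "ea", "oa", "beta", "eaf")]
```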
Based on the ieugwasr function (see reference)
ld_matrix( dat, colmap = NULL, method = "r", plink2 = genepi.utils::which_plink2(), plink_ref = genepi.utils::which_1000G_reference(build = "GRCh37"), ukbb_ref = NULL )
dat |
data.frame like object, or file path, with at least column |
colmap |
a list, mapping to columns list(rsid=?,ea=?,oa=?,beta=?,eaf=?) where ? can be a character vector in the case of harmonised datasets. Warning - it is assumed that harmonised datasets are indeed harmonised, if not, any unharmonised variants will be inappropriately removed. |
method |
a string, either |
plink2 |
a string, path to the plink executable |
plink_ref |
a string, path to the pfile genome reference |
ukbb_ref |
path to a UKBB reference file |
an LD matrix if only variants provided, else if alleles provided a list(dat=harmonised data, ld_mat=ld_matrix)
Determine GWAS build and liftover to required build. This is the same function from the GwasDataImport package, the only difference being that you can specify the build rather than it trying to guess the build (which fails if you are trying to liftover small segments of the genome).
lift( gwas, from = "Hg19", to = "Hg38", snp_col = "snp", chr_col = "chr", pos_col = "pos", ea_col = "ea", oa_col = "oa", remove_duplicates = TRUE )
gwas |
a data.table, or file path, chr, pos, snp name, effect allele, non-effect allele columns |
from |
which build to lift from, one of c("Hg18", "Hg19", "Hg38") |
to |
which build to lift over to, one of c("Hg18", "Hg19", "Hg38") |
snp_col |
Name of the SNP column. Optional. Uses a less certain method of matching if not available |
chr_col |
Name of the chromosome column. Required |
pos_col |
Name of the position column. Required |
ea_col |
Name of the effect allele column. Optional. Might lead to duplicated rows if not provided |
oa_col |
Name of the other allele column. Optional. Might lead to duplicated rows if not provided |
remove_duplicates |
a logical, whether to remove duplicate IDs |
data.table with updated position columns
https://github.com/MRCIEU/GwasDataImport
Create a Manhattan plot with ggplot2 geom_point.
manhattan( gwas, highlight_snps = NULL, highlight_win = 100, annotate_snps = NULL, colours = c("#d9d9d9", "#bfbfbf"), highlight_colour = "#e15758", highlight_shape = 16, highlight_alpha = 1, sig_line_1 = 5e-08, sig_line_2 = NULL, y_limits = c(NULL, NULL), title = NULL, subtitle = NULL, base_text_size = 14, hit_table = FALSE, max_table_hits = 10, downsample = 0.9, downsample_pval = 0.7 )
gwas |
a data.table with a minimum of columns SNP, CHR, BP, and P |
highlight_snps |
(optional) a character vector of SNPs to highlight |
highlight_win |
(optional) a numeric, the number of kb either side of the highlight_snps to also highlight (i.e. create peaks) |
annotate_snps |
(optional) a character vector of SNPs to annotate |
colours |
(optional) a character vector of colour codes to be replicated along the chromosomes |
highlight_colour |
(optional) a character colour code; the colour to highlight points in |
highlight_shape |
(optional) a numeric shape code; the shape of the highlight points (see ggplot2 shape codes) |
highlight_alpha |
(optional) a numeric value between 0 and 1; the alpha of the highlighted points colour |
sig_line_1 |
(optional) a numeric value (-log10(P)) for where to draw a horizontal line |
sig_line_2 |
(optional) a numeric value (-log10(P)) for where to draw a second horizontal line |
y_limits |
(optional) a numeric length 2 vector c(min-Y, max-Y) |
title |
(optional) a string title |
subtitle |
(optional) a string subtitle |
base_text_size |
an integer, |
hit_table |
(optional) a logical, whether to display a table of top hits (lowest P values) |
max_table_hits |
(optional) an integer, how many top hits to show in the table |
downsample |
(optional) a numeric between 0 and 1, the proportion by which to downsample by, e.g. 0.6 will remove 60% of points above the downsample_pval threshold (can help increase plotting speed with minimal impact on plot appearance) |
downsample_pval |
(optional) a numeric between 0 and 1, the p-value threshold above which points are eligible for downsampling (default 0.7) |
a ggplot
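The downsampling described above (drop a proportion of the points whose p-value lies above a threshold, where they add little to the plot) can be sketched in base R. This follows the parameter descriptions only, using the manhattan() defaults; how the function selects points internally is an assumption.

```r
set.seed(2023)
gwas <- data.frame(P = runif(10000))

downsample      <- 0.9  # proportion of eligible points to drop
downsample_pval <- 0.7  # only points with P above this are eligible

# Rows eligible for removal, and a random subset of them to drop
eligible <- which(gwas$P > downsample_pval)
n_drop   <- floor(length(eligible) * downsample)
drop     <- eligible[sample.int(length(eligible), size = n_drop)]

thinned <- if (n_drop > 0) gwas[-drop, , drop = FALSE] else gwas
c(before = nrow(gwas), after = nrow(thinned))
```

All points at or below the threshold are kept, so the significant peaks of the plot are unaffected.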
Create a Miami plot. Please look carefully at the parameters as these largely map to the manhattan() parameters, the main difference being that you need to supply a 2 element list of the parameter, one for the upper and one for the lower plot aspect of the Miami plot. Some parameters are not duplicated however - see the example defaults below.
miami( gwases, highlight_snps = list(top = NULL, bottom = NULL), highlight_win = list(top = 100, bottom = 100), annotate_snps = list(top = NULL, bottom = NULL), colours = list(top = c("#d9d9d9", "#bfbfbf"), bottom = c("#bfbfbf", "#d9d9d9")), highlight_colour = list(top = "#e15758", bottom = "#4f79a7"), highlight_shape = list(top = 16, bottom = 16), sig_line_1 = list(top = 5e-08, bottom = 5e-08), sig_line_2 = list(top = NULL, bottom = NULL), y_limits = list(top = c(NULL, NULL), bottom = c(NULL, NULL)), title = NULL, subtitle = list(top = NULL, bottom = NULL), base_text_size = 14, hit_table = FALSE, max_table_hits = 10, downsample = 0.1, downsample_pval = 0.1 )
gwases |
a list of 2 data.tables |
highlight_snps |
(optional) a character vector of SNPs to highlight |
highlight_win |
(optional) a numeric, the number of kb either side of the highlight_snps to also highlight (i.e. create peaks) |
annotate_snps |
(optional) a character vector of SNPs to annotate |
colours |
(optional) a character vector of colour codes to be replicated along the chromosomes |
highlight_colour |
(optional) a character colour code; the colour to highlight points in |
highlight_shape |
(optional) a numeric shape code; the shape of the highlight points (see ggplot2 shape codes) |
sig_line_1 |
(optional) a numeric value (-log10(P)) for where to draw a horizontal line |
sig_line_2 |
(optional) a numeric value (-log10(P)) for where to draw a second horizontal line |
y_limits |
(optional) a numeric length 2 vector c(min-Y, max-Y) |
title |
(optional) a string title |
subtitle |
(optional) a string subtitle |
base_text_size |
an integer, |
hit_table |
(optional) a logical, whether to display a table of top hits (lowest P values) |
max_table_hits |
(optional) an integer, how many top hits to show in the table |
downsample |
(optional) a numeric between 0 and 1, the proportion by which to downsample by, e.g. 0.6 will remove 60% of points above the downsample_pval threshold (can help increase plotting speed with minimal impact on plot appearance) |
downsample_pval |
(optional) a numeric between 0 and 1, the p-values affected by downsampling, default >0.1 |
a ggplot
An MR object is a container for vectors and matrices of data from 2 or more GWAS.
MR( exposure, outcome, harmonise_strictness = 2, correlation = NULL, verbose = TRUE )
exposure |
a |
outcome |
a |
harmonise_strictness |
an integer (1,2,3) corresponding to the TwoSampleMR harmonisation options of the same name. |
correlation |
a matrix, correlation matrix of signed R values between variants |
verbose |
a logical, print more information |
an S7 class genepi.utils::MR object
snps
character, variant ID
chr
character, chromosome identifier
bp
integer, base position
ea
character, effect allele
oa
character, other allele
eafx
numeric, exposure effect allele frequency
nx
integer, exposure total number of samples
ncasex
integer, exposure number of cases
bx
numeric, exposure effect size
bxse
numeric, exposure effect size standard error
px
numeric, exposure p-value
eafy
numeric, outcome effect allele frequency
ny
integer, outcome total number of samples
ncasey
integer, outcome number of cases
by
numeric, outcome effect size
byse
numeric, outcome effect size standard error
py
numeric, outcome p-value
exposure_id
character, the GWAS identifier
exposure
character, the GWAS exposure
outcome_id
character, the GWAS identifier
outcome
character, the GWAS outcome
group
integer, grouping variable used for plotting
index_snp
logical, whether the variant is an index variant (via clumping)
proxy_snp
character, the id of the proxy snp
ld_info
logical, whether there is LD information
info
data.frame, information about the loaded GWAS objects
correlation
matrix, a correlation matrix of signed R values between variants
Run Egger MR
mr_egger(x, corr = FALSE, ...)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
... |
parameter sink, not used |
Run IVW MR
mr_ivw(x, corr = FALSE, ...)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
... |
parameter sink, not used |
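For orientation, the standard fixed-effect inverse-variance weighted (IVW) estimate for uncorrelated variants can be computed by hand from the bx, by and byse vectors of an MR object. The sketch below uses toy numbers and the textbook formula. It is not necessarily how mr_ivw() is implemented, and it ignores the corr option.

```r
# Toy harmonised summary statistics for three independent variants
bx   <- c(0.10, 0.20, 0.15)   # exposure effects
by   <- c(0.05, 0.11, 0.07)   # outcome effects
byse <- c(0.01, 0.02, 0.015)  # outcome standard errors

# Fixed-effect IVW: weighted regression of by on bx through the origin
w        <- 1 / byse^2
ivw_beta <- sum(w * bx * by) / sum(w * bx^2)
ivw_se   <- sqrt(1 / sum(w * bx^2))

# The same point estimate from lm() with no intercept
fit <- lm(by ~ 0 + bx, weights = w)
c(ivw_beta = ivw_beta, lm_beta = unname(coef(fit)))
```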
Run PC-GMM MR
mr_pcgmm(x, corr = TRUE, ...)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
... |
parameter sink, not used |
MR results to data.table
mr_results_to_data_table(x)
x |
MRResult object to convert to data.table |
Run weighted median MR
mr_weighted_median(x, corr = FALSE, ...)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
... |
parameter sink, not used |
Run weighted mode MR
mr_weighted_mode(x, corr = FALSE, ...)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
... |
parameter sink, not used |
A plotting wrapper for the coloc package. Produces a ggplot for either the prior or posterior probability sensitivity analyses. See the coloc package vignettes for details.
plot_coloc_probabilities(coloc, rule = "H4 > 0.5", type = "prior", row = 1)
coloc |
coloc object, output from |
rule |
a string, a valid rule indicating success e.g. "H4 > 0.5" |
type |
a string, either |
row |
an integer, row in a |
a ggplot
Plot MR results
plot_mr(mr, res)
mr |
an object of class MR |
res |
a data.table output from run_mr or other MR methods |
QQ plot
qq_plot( gwas, pval_col = "p", colours = list(raw = "#2166AC"), title = NULL, subtitle = NULL, plot_corrected = FALSE, facet_grid_row_col = NULL, facet_grid_col_col = NULL, facet_nrow = NULL, facet_ncol = NULL )
gwas |
a data.frame like object or valid file path |
pval_col |
the P value column |
colours |
a 2 element list of colour codes (1-the uncorrected points, 2-the GC corrected points) |
title |
a string, the title for the plot |
subtitle |
a string, the subtitle for the plot |
plot_corrected |
a logical, whether to apply and plot the lambda correction |
facet_grid_row_col |
a string, the column name in |
facet_grid_col_col |
a string, the column name in |
facet_nrow |
an integer, passed to facet_wrap, the number of rows to facet by (if only facet_grid_row_col is provided) |
facet_ncol |
an integer, passed to facet_wrap, the number of cols to facet by (if only facet_grid_col_col is provided) |
a ggplot
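The lambda (genomic control) correction referred to by plot_corrected is commonly computed from the median chi-square statistic. The sketch below shows that standard calculation on simulated null p-values; whether qq_plot() uses exactly this estimator is an assumption.

```r
set.seed(2023)
p <- runif(5000)  # simulated p-values under the null

# Convert to 1-df chi-square statistics and compare the observed median
# with the expected median under the null
chisq  <- qchisq(p, df = 1, lower.tail = FALSE)
lambda <- median(chisq) / qchisq(0.5, df = 1)

# Genomic-control corrected p-values
p_gc <- pchisq(chisq / lambda, df = 1, lower.tail = FALSE)
round(lambda, 3)
```

For null data lambda should sit close to 1; values well above 1 suggest inflation of the test statistics.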
Reset index SNP
reset_index_snp(x)
x |
an object of class MR |
Run MR
run_mr( x, corr = FALSE, methods = c("mr_ivw", "mr_egger", "mr_weighted_median", "mr_weighted_mode"), ... )
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
methods |
a character vector, one or more of c('mr_ivw','mr_egger','mr_weighted_median','mr_weighted_mode','mr_pcgmm') |
... |
parameter sink, not used |
Set the 1000G reference path
set_1000G_reference(path, build = "GRCh37")
path |
path to the 1000G reference pfile |
build |
one of c("GRCh37", "GRCh38") |
NULL, updated config file
Set dbSNP directory
set_dbsnp_directory(path)
path |
path to the |
NULL, updated config file
Set the LD matrix
set_ld_mat(x, correlation)
x |
an object of class MR |
correlation |
a matrix, the correlation ('r') matrix |
Set the PLINK2 path
set_plink2(path)
path |
path to the PLINK2 executable |
NULL, updated config file
Slope-Hunter collider bias method
slopehunter( x, ip = 0.001, pi0 = 0.6, sxy1 = 1e-05, bootstraps = 100, seed = 777, ... )
x |
an object of class MR |
ip |
see |
pi0 |
see |
sxy1 |
see |
bootstraps |
see |
seed |
see |
... |
parameter sink, additional ignored parameters |
an object of class MRResult
subset_gwas
subset_gwas(x, snps)
x |
GWAS object |
snps |
a vector, either row indices (integers) into the GWAS object (e.g. obtained with filters such as which(GWAS@p < 5e-8)), or rsids (characters) to be found in the GWAS rsid slot. |
GWAS object subsetted by snps
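The two accepted forms of snps (integer row indices or character rsids) select the same rows when they refer to the same variants. A base R sketch on a plain data.frame, using a hypothetical `subset_rows()` helper in place of the GWAS object and its slots:

```r
gwas <- data.frame(rsid = c("rs1", "rs2", "rs3"),
                   p    = c(1e-9, 0.2, 3e-8),
                   stringsAsFactors = FALSE)

# Accept either integer row indices or character rsids
subset_rows <- function(df, snps) {
  if (is.character(snps)) {
    df[df$rsid %in% snps, , drop = FALSE]
  } else {
    df[snps, , drop = FALSE]
  }
}

by_index <- subset_rows(gwas, which(gwas$p < 5e-8))
by_rsid  <- subset_rows(gwas, c("rs1", "rs3"))
identical(by_index$rsid, by_rsid$rsid)
```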
Convert to MendelianRandomization::MRInput object
to_MRInput(x, corr = FALSE)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
Convert to MendelianRandomization::MRMVInput object
to_MRMVInput(x, corr = FALSE)
x |
an object of class MR |
corr |
a logical, whether to use the correlation matrix when running MR |
Get 1000G reference path(s)
which_1000G_reference(build = NULL)
build |
one of "GRCh37" or "GRCh38", or NULL to return both |
a string file path, the currently set 1000G reference path
Get available dbSNP builds
which_dbsnp_builds(build = NULL)
build |
a dbSNP build |
a list of available dbSNP builds - name(dbSNP build): value(directory_path)
Get dbSNP directory
which_dbsnp_directory()
a string file path, the currently set dbSNP directory path
Get plink2 path
which_plink2()
a string file path, the currently set plink2 path