Title: | Latent Heritable Confounder - Mendelian Randomisation |
---|---|
Description: | lhcMR estimates a causal effect between two traits while accounting for a possible latent heritable confounder acting on them, as well as for sample overlap. |
Authors: | Liza Darrous [aut, cre] |
Maintainer: | Liza Darrous <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2024-11-20 03:35:10 UTC |
Source: | https://github.com/LizaDarrous/lhcMR |
Calculate starting points to be used in the likelihood function optimisation
calculate_SP( input.df, trait.names, run_ldsc = TRUE, run_MR = TRUE, saveRFiles = TRUE, hm3 = NA, ld = NA, nStep = 2, SP_single = 3, SP_pair = 50, SNP_filter = 10, SNP_filter_ldsc = NA, nCores = 1, M = 1e+07 )
input.df |
The resulting data frame from merge_sumstats(), where the effect size, SE, RSID and other columns are present, in addition to columns representing LD scores, weights and local LD structure |
trait.names |
Vector containing the trait names in the order they were used in merge_sumstats(): Exposure, Outcome |
run_ldsc |
Boolean. Whether GenomicSEM::ldsc should be run to obtain the cross-trait intercept (i_XY). If FALSE, a random value will be generated. Default value = TRUE |
run_MR |
Boolean. Whether TwoSampleMR::mr should be run to obtain the bidirectional causal effects (axy_MR, ayx_MR). If FALSE, random values will be generated. Default value = TRUE |
saveRFiles |
Boolean. Whether to write the results of GenomicSEM::ldsc, TwoSampleMR::mr, and the single trait analysis of LHC-MR (which returns the trait intercepts and polygenicity). Default value = TRUE |
hm3 |
Path to the input file (HAPMAP3 SNPs) required by GenomicSEM::ldsc |
ld |
Path to the input file (LD scores) required by GenomicSEM::ldsc |
nStep |
Numerical value, either 1 or 2, giving the number of steps the lhcMR analysis will take. With one step, all 9 parameters are estimated simultaneously while only the traits' intercepts (iX, iY) are fixed. With two steps, the traits' intercepts and polygenicity (iX, piX, iY, piY) are first estimated in the single trait analysis and then fixed, so that the likelihood optimisation estimates the remaining 7 parameters |
SP_single |
Numerical value indicating how many starting points the single trait analysis should use in the likelihood optimisation. Best kept between 3 and 5. Default value = 3 |
SP_pair |
Numerical value indicating how many starting points the pair trait analysis should use in the likelihood optimisation. Best kept between 50 and 100. Default value = 50 |
SNP_filter |
Numerical value indicating the filtering of every nth SNP to reduce large datasets and speed up analysis. Default value = 10 |
SNP_filter_ldsc |
Numerical value indicating the filtering of every nth SNP to reduce large datasets and speed up the LDSC analysis. Set to 1 if no filtering is needed, otherwise default = 10 |
nCores |
Numerical value indicating the number of cores to be used in 'mclapply' to parallelise the analysis. If set to NA, it will be calculated as 2/3 of the available cores. Default value = 1, which avoids parallelisation |
M |
Numerical value indicating the number of SNPs used to calculate the LD reported in the LD file (for genotyped SNPs). Default value = 1e7 |
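The SNP_filter and nCores arguments can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper names `thin_every_nth` and `resolve_cores`, the toy columns, the choice to start thinning from the first row, and the rounding down of the 2/3 rule are all assumptions made for illustration.

```r
# Toy data frame standing in for merged summary statistics
# (hypothetical columns, not the package's real input format).
toy_df <- data.frame(
  RSID = paste0("rs", 1:25),
  BETA = seq(-0.12, 0.12, length.out = 25)
)

# Keep every n-th row, starting from the first row
# (one possible thinning scheme; the starting row is an assumption).
thin_every_nth <- function(df, n) {
  df[seq(1, nrow(df), by = n), , drop = FALSE]
}

# Resolve a core count: return the requested value, or two thirds of the
# detected cores (rounded down, at least 1) when nCores is NA.
resolve_cores <- function(nCores = NA, available = parallel::detectCores()) {
  if (is.na(nCores)) {
    if (is.na(available)) available <- 1  # detectCores() can return NA
    return(max(1L, as.integer(floor(available * 2 / 3))))
  }
  as.integer(nCores)
}

thinned <- thin_every_nth(toy_df, 10)
print(as.character(thinned$RSID))
print(resolve_cores(NA, available = 12))
```

With SNP_filter = 1 the thinning helper returns the data unchanged, which mirrors the "no filtering" setting described for SNP_filter_ldsc.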
Returns a list containing the filtered dataset (by every 'SNP_filter'th SNP), the starting points to be used in the pair trait optimisation, the traits' intercepts, the traits' polygenicity if nStep = 2, as well as some extra parameters like the cross-trait intercept and bidirectional causal effect estimated by IVW.
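For orientation, the standard fixed-effect inverse-variance weighted (IVW) estimate can be sketched in a few lines of base R. This is the textbook formula, not a reproduction of what TwoSampleMR::mr does internally. The function `ivw_estimate` and the toy effect sizes are hypothetical. "Bidirectional" here simply means applying the same estimator twice with the roles of the two traits swapped.

```r
# Fixed-effect IVW estimate of the causal effect of X on Y from per-SNP
# effects on X (bx) and Y (by), weighted by the inverse variance of by.
ivw_estimate <- function(bx, by, se_by) {
  w <- 1 / se_by^2
  est <- sum(w * bx * by) / sum(w * bx^2)
  se <- sqrt(1 / sum(w * bx^2))
  c(estimate = est, se = se)
}

# Toy per-SNP effects (made up): Y effects are exactly half the X effects.
bx <- c(0.10, 0.20, 0.15, 0.30)
by <- 0.5 * bx
se_bx <- c(0.020, 0.030, 0.025, 0.040)
se_by <- c(0.020, 0.030, 0.025, 0.040)

forward <- ivw_estimate(bx, by, se_by)  # X -> Y
reverse <- ivw_estimate(by, bx, se_bx)  # Y -> X, roles swapped
print(forward)
print(reverse)
```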
Main trait pair analysis using LHC-MR
lhc_mr( SP_list, trait.names, partition = NA, account = NA, param = "comp", paral_method = "rslurm", nCores = NA, nBlock = 200, M = 1e+07 )
SP_list |
List resulting from calculate_SP. Contains the filtered dataset (by every 'SNP_filter'th SNP), the starting points to be used in the pair trait optimisation, the traits' intercepts, the traits' polygenicity if nStep = 2, as well as some extra parameters like the cross-trait intercept and bidirectional causal effect estimated by IVW |
trait.names |
Vector containing the trait names in the order they were used in merge_sumstats(): Exposure, Outcome |
partition |
String indicating the partition name to be used for the "rslurm" parallelisation - equivalent to '-p, --partition' in SLURM commands |
account |
String indicating the account name to be used for the "rslurm" parallelisation - equivalent to '-A, --account' in SLURM commands |
param |
String indicating which model the likelihood function will be optimised with, either "comp" by default or "U" for a no-confounder model |
paral_method |
String indicating how to parallelise the optimisation over the sets of starting points. "rslurm" submits the calculation to a cluster running the SLURM workload manager. "lapply" parallelises the optimisation using 'mclapply' over a set number of cores, but goes through the sets of starting points sequentially and thus takes more time |
nCores |
Numerical value indicating the number of cores to be used in 'mclapply' to parallelise the analysis. If not set (default value = NA), it will be calculated as 2/3 of the available cores |
nBlock |
Numerical value indicating the number of blocks used in the block jackknife analysis. At each iteration one block is left out and the optimisation is run again from a single starting point, eventually yielding 'nBlock' estimates from which the SE of the parameter estimates is calculated |
M |
Numerical value indicating the number of SNPs used to calculate the LD reported in the LD file (for genotyped SNPs). Default value = 1e7 |
Prints out a summary of the results
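The block jackknife idea behind nBlock can be sketched generically in base R. The sketch uses the standard delete-one-block jackknife variance formula and a simple mean as a stand-in estimator. The source does not state the exact formula or block assignment that lhcMR uses, so the helper `block_jackknife_se` and the contiguous, near-equal blocks are assumptions.

```r
# Delete-one-block jackknife standard error for a generic estimator.
block_jackknife_se <- function(x, nBlock, estimator = mean) {
  # Assign observations to contiguous blocks of near-equal size.
  block_id <- ceiling(seq_along(x) * nBlock / length(x))
  # Re-estimate with each block left out in turn.
  loo <- vapply(
    seq_len(nBlock),
    function(b) estimator(x[block_id != b]),
    numeric(1)
  )
  # Standard delete-one jackknife variance over the nBlock estimates.
  se <- sqrt((nBlock - 1) / nBlock * sum((loo - mean(loo))^2))
  list(leave_one_out = loo, se = se)
}

set.seed(1)
x <- rnorm(1000)  # simulated data standing in for per-SNP contributions
jk <- block_jackknife_se(x, nBlock = 200)
print(length(jk$leave_one_out))
print(jk$se)
```

In lhc_mr the re-estimation step is a full likelihood optimisation rather than a mean, which is why each leave-one-out iteration uses only a single starting point.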
Merge summary statistics into a single input data frame
merge_sumstats( input.files, trait.names, LD.filepath, rho.filepath, mafT = 0.005, infoT = 0.99 )
input.files |
|
trait.names |
|
mafT |
|
infoT |
|
LD.filepath |
|
rho.filepath |
|
Returns a data frame where the summary statistics file, the LD file, and the SNP-specific LD file are merged
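The merging step can be pictured as an inner join of three tables on a shared SNP identifier. The base R sketch below uses toy data frames. All column names (RSID, BETA, SE, LDSC, RHO) and values are hypothetical, and the sketch does not reproduce the file formats or the mafT and infoT handling of merge_sumstats().

```r
# Toy stand-ins for the three inputs (column names are assumptions).
sumstats <- data.frame(
  RSID = c("rs1", "rs2", "rs3", "rs4"),
  BETA = c(0.01, -0.02, 0.03, 0.00),
  SE   = c(0.005, 0.006, 0.004, 0.005)
)
ld_scores <- data.frame(
  RSID = c("rs1", "rs2", "rs4"),
  LDSC = c(12.5, 30.1, 8.7)
)
local_ld <- data.frame(
  RSID = c("rs1", "rs2", "rs3"),
  RHO  = c(0.8, 1.4, 0.6)
)

# Inner-join all three tables on RSID; only SNPs present in every
# table survive the merge.
merged <- Reduce(
  function(a, b) merge(a, b, by = "RSID"),
  list(sumstats, ld_scores, local_ld)
)
print(merged)
```

Only rs1 and rs2 appear in all three toy tables, so the merged data frame keeps those two SNPs with the columns of all inputs side by side.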