Standard errors and confidence intervals of adjusted estimates

library(winnerscurse)

This third vignette briefly illustrates how our R package, winnerscurse, can be used to generate standard errors and confidence intervals for certain adjusted association estimates, obtained using the ‘discovery only’ Winner’s Curse correction methods. Here, we demonstrate the functionality of the functions, se_adjust and cl_interval, using a simulated data set of GWAS summary statistics and discuss the properties of each function. We create the same toy data set as that used in the first vignette:

set.seed(1998)
sim_dataset <- sim_stats(nsnp=10^6,h2=0.4,prop_effect=0.01,nid=50000)
## simulated GWAS summary statistics
summary_stats <- sim_dataset$disc

Standard errors of adjusted estimates

out_EB <- se_adjust(summary_data = summary_stats, method = "empirical_bayes", n_boot=10)
head(out_EB)
#>   rsid        beta          se     beta_EB      adj_se
#> 1 7815  0.04771727 0.006568299  0.04599530 0.008669139
#> 2 3965  0.04609999 0.006346232  0.04443623 0.008500261
#> 3 4998 -0.04527792 0.006458823 -0.03999539 0.010010348
#> 4 9917  0.04164616 0.006334188  0.03998556 0.004601265
#> 5 7261  0.04162686 0.006343244  0.03996389 0.008016920
#> 6 6510  0.04254046 0.006500057  0.04083638 0.009074620

out_BR_ss <- se_adjust(summary_data = summary_stats, method = "BR_ss", n_boot=10)
head(out_BR_ss)
#>   rsid        beta          se  beta_BR_ss      adj_se
#> 1 7815  0.04771727 0.006568299  0.03499261 0.006790542
#> 2 3965  0.04609999 0.006346232  0.03380467 0.006879602
#> 3 4998 -0.04527792 0.006458823 -0.03203960 0.002972575
#> 4 9917  0.04164616 0.006334188  0.02836485 0.004280070
#> 5 7261  0.04162686 0.006343244  0.02830782 0.004225294
#> 6 6510  0.04254046 0.006500057  0.02886472 0.004706990

out_FIQT <- se_adjust(summary_data = summary_stats, method = "FDR_IQT", n_boot=10)
head(out_FIQT)
#>   rsid        beta          se   beta_FIQT      adj_se
#> 1 7815  0.04771727 0.006568299  0.03422825 0.009403474
#> 2 3965  0.04609999 0.006346232  0.03307103 0.007694662
#> 3 4998 -0.04527792 0.006458823 -0.03188772 0.009541773
#> 4 9917  0.04164616 0.006334188  0.02823824 0.007188336
#> 5 7261  0.04162686 0.006343244  0.02827861 0.006387643
#> 6 6510  0.04254046 0.006500057  0.02897770 0.004577303

Note: Due to the nature of the conditional likelihood methods, i.e. adjustments are only made to association estimates of SNPs with p-values less than a chosen significance threshold, it is not possible to obtain standard errors of the conditional likelihood adjusted association estimates in the above manner.

Note: Currently, this package does not provide a way to create confidence intervals for adjusted association estimates obtained using the three methods mentioned above, empirical Bayes, FDR Inverse Quantile Transformation and bootstrap. However, one option that could be considered is to take an approach similar to that of se_adjust and obtain quantiles by means of parametric bootstrap. Unfortunately, in order to ensure accurate quantiles, this would require many n_boot simulated data sets to which the correction method of choice is applied, resulting in a very computationally intensive process. For this reason, a function that would generate suitable quantiles for each adjusted association estimate has been omitted from the R package for now.

Confidence intervals for estimates adjusted by conditional likelihood methods

out <- cl_interval(summary_data=summary_stats, alpha = 5e-8, conf_level=0.95)
head(out)
#>   rsid        beta          se    beta.cl1    beta.cl2    beta.cl3       lower
#> 1 7815  0.04771727 0.006568299  0.04709173  0.04575084  0.04642129  0.02855506
#> 2 3965  0.04609999 0.006346232  0.04549476  0.04419814  0.04484645  0.02758022
#> 3 4998 -0.04527792 0.006458823 -0.04421716 -0.04238468 -0.04330092 -0.05716545
#> 4 9917  0.04164616 0.006334188  0.03909824  0.03598923  0.03754373  0.01349641
#> 5 7261  0.04162686 0.006343244  0.03900942  0.03584977  0.03742959  0.01319315
#> 6 6510  0.04254046 0.006500057  0.03975867  0.03645266  0.03810567  0.01304579
#>         upper
#> 1  0.06008004
#> 2  0.05804426
#> 3 -0.02376903
#> 4  0.05248926
#> 5  0.05245207
#> 6  0.05358307

Note: Similar to condlike_rep discussed in the second vignette, cl_interval also uses the R function uniroot which generates values for upper and lower numerically and thus, we advise users to be cautious of uniroot’s possible failure to obtain appropriate values in certain circumstances. In practice, this failure has been witnessed to occur very rarely. However, in order to ensure a greater possibility that cl_interval will produce appropriate confidence intervals for each significant SNPs, cl_interval can only be used with data sets in which the absolute value of the largest naive z-statistic is less than 150.

In the final part of this vignette, we will evaluate and visualise the above obtained confidence intervals. Firstly, we will consider computing the coverage of these confidence intervals for the significant SNPs, i.e. we calculate what percentage of these confidence intervals contain the true association estimate, as shown below.

## Simulated true effect sizes:
true_beta <- sim_dataset$true$true_beta[out$rsid]

## coverage
coverage <- (sum((true_beta > out$lower) & (true_beta < out$upper))/length(true_beta))*100
coverage
#> [1] 96.9697

Next, we construct a simple forest plot to illustrate the adjusted association estimates and corresponding confidence intervals for each SNP in our simulated data set. Of the three conditional likelihood adjusted estimates, we include beta.cl2 as the effect size in our plot.

library(ggplot2)
library(RColorBrewer)
col <- brewer.pal(8,"Dark2")
out$index <- c(33:1)
out$pos <- out$beta > 0
ggplot(data=out, aes(y=index, x=beta.cl2, xmin=lower, xmax=upper, color=pos)) +
  geom_point() +
  geom_errorbarh(height=.5) +
  scale_color_manual(values=c(col[1],col[2])) +
  scale_y_continuous(name = "SNP ID", breaks=1:nrow(out), labels=out$rsid) + xlab(expression(paste("Effect size of sig. SNPs at  ", 5%*%10^-8))) + theme_bw() + geom_vline(xintercept=0, colour="black",linetype='dashed', alpha=.5) +  theme(text = element_text(size=10),legend.position="none")