---
title: "Get started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{get-started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(mrSampleOverlap)
```

<!-- WARNING - This vignette is generated by {fusen} from /dev/estimate_overlap_bias.Rmd: do not edit by hand -->

<!-- 
Run this 'development' chunk

Store every call to library() that you need to run chunks line by line, as in a classical Rmd for analysis
-->


<!--

# Description of your package

This will fill the description of your package.
Fill and run the content of this chunk, before anything else. 

Note: when you will use other flat templates, this part will be in a separate file. Do not be surprised!
--> 


# Estimate bias due to sample overlap

This overall goal of this package is to allow the user to estimate bias due to participant overlap in Mendelian randomization studies using the equations published in Burgess et. al. 2016 (DOI: [10.1002/gepi.21998](https://doi.org/10.1002/gepi.21998)).

<!-- 
This first section shows:

- the three parts necessary for a package: 'function', 'examples' and 'tests'.  
  + Note that the three following chunks have names accordingly.

-->


<!--
Here is an example on how to use the function.
This should be a reproducible and working example
-->


```{r examples-estimate_overlap_bias}
# Binary outcome
estimate_overlap_bias(
  samplesize_exposure = 361194,
  samplesize_outcome = 1125328,
  case_prop = 0.035,
  rsq_exposure = 0.068,
  n_variants = 196,
  ols_bias = 0.2,
  overlap_prop = 0.3
)

# Continuous outcome
estimate_overlap_bias(
  samplesize_exposure = 361194,
  samplesize_outcome = 1125328,
  rsq_exposure = 0.068,
  n_variants = 196,
  ols_bias = 0.2,
  overlap_prop = 0.3
)
```

<!--
Here are some unit tests to verify the function works as expected.
-->


## Complex Example

Here, we will use the `estimate_overlap_bias` function in a more complex example, to estimate the bias across a range of possible values of sample overlap and observational bias. 

First, we will use the `TwoSampleMR` package to query the [MRC-IEU OpenGWAS Project](https://gwas.mrcieu.ac.uk/) for summary GWAS data to use for our exposure and outcome. We will consider LDL cholesterol (Willer et. al. 2013) as our exposure, and Coronary Artery Disease (Van der Harst et. al. 2017; 122,733 cases and 424,528 controls) as our outcome:

```{r eval = FALSE}
library(tidyverse)
library(TwoSampleMR)
library(mrSampleOverlap)

# extract genetic instruments for BMI
ldl_exposure <- extract_instruments(outcomes = "ieu-a-300")

# extract corresponding outcome data for coronary artery disease
cad_outcome <- extract_outcome_data(snps = ldl_exposure$SNP, outcomes = "ebi-a-GCST005195")

# harmonize effect alleles, and keep only alleles present in both exposure and outcome data
dat_harmonized <- harmonise_data(ldl_exposure, cad_outcome) %>%
  filter(mr_keep == TRUE)
```

Next, we use `TwoSampleMR::add_rsq()` to add the R^2^ value necessary to calculate bias, and summarize:

```{r eval = FALSE}
dat_summarized <- dat_harmonized %>%
  add_rsq() %>%
  group_by(exposure) %>%
  summarize(rsq_exposure = sum(rsq.exposure), n_variants = n(), samplesize_exposure = max(samplesize.exposure), samplesize_outcome = max(samplesize.outcome))
```

We can use the `tidyr::crossing()` function to generate a grid containing a range of values for sample overlap and observational bias

```{r eval = FALSE}
grid <- tidyr::crossing(overlap_prop = seq(0, 1, 0.1),
                        ols_bias = seq(0, 1, 0.2))
```

Finally, we can estimate bias in our MR estimates using the `estimate_overlap_bias()` function:

```{r eval = FALSE}
bias_res <- dat_summarized %>%
  crossing(grid) %>%
  mutate(res = estimate_overlap_bias(samplesize_exposure = samplesize_exposure, samplesize_outcome = samplesize_outcome, n_variants = n_variants, rsq_exposure = rsq_exposure, overlap_prop = overlap_prop, ols_bias = ols_bias, case_prop = 122733/547261)) %>%
  unnest(res)
```

We can optionally plot our results. We see that as the proportion of sample overlap increases, so does type 1 error, while bias remains relatively small. Type 1 error and bias are also magnified as the bias in the observational estimate increases:

```{r eval = FALSE}
bias_res %>%
  split_exposure() %>%
  pivot_longer(cols = c(bias, type1_error)) %>%
  ggplot(aes(overlap_prop, value, group = ols_bias, color = as.character(ols_bias))) +
    geom_point() +
    geom_line() +
    facet_grid(rows = vars(exposure), 
               cols = vars(name),
               scales = "free_y") +
  labs(x = "Proportion of Overlapping Participants",
       y = "Value") +
  scale_color_discrete(name = "Bias in \nObservational \nEstimate") +
  theme_bw(base_size = 14) 
```

# Estimate F-statistic

This is a generic function which can be used to estimate the F-statistic. Optionally, the function will return the lower limit of the one-sided 95% confidence interval of the F-statistic, which may represent a more conservative/less optimistic estimate.
    

```{r example-estimate_f}
estimate_f(samplesize_exposure = 361194, n_variants = 196, rsq_exposure = 0.068, lci_95 = FALSE)

# return lower bound of one-sided 95% confidence interval of F-statistic
estimate_f(samplesize_exposure = 361194, n_variants = 196, rsq_exposure = 0.068, lci_95 = TRUE)
```

  
<!-- 

# Inflate your package

You're one inflate from paper to box.
Build your package from this very Rmd using `fusen::inflate()` 
-->


<!-- 
- Verify your `"DESCRIPTION"` file has been updated
- Verify your function is in `"R/"` directory
- Verify your test is in `"tests/testthat/"` directory
- Verify this Rmd appears in `"vignettes/"` directory 
-->