---
title: "Estimating phenotypic correlations"
output: html_vignette
vignette: >
  %\VignetteIndexEntry{Estimating phenotypic correlations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: refs.bib
---

## Covariance and Phenotypic correlations in MVMR

Summary data multivariable Mendelian randomization (MVMR) requires estimation of the covariance between the effect of each instrument on each exposure included in the MVMR model for effective strength testing, heterogeneity estimation, and pleiotropy robust estimation (@sanderson-statsmed-2021).

This covariance is dependent on the level of sample overlap between the GWAS used for each exposure and is zero if there is no sample overlap. In that case `gencov = 0` can be used in the `pleiotropy_mvmr()` and `strength_mvmr()` functions.

If there is any level of sample overlap between the samples used for the exposure GWAS then accurate estimation of the Q-statistic, conditional F-statistics and robust estimation with weak or pleiotropic SNPs (under the assumption of balanced pleiotropy) relies on an estimation of the covariance between the errors in the GWAS results between the exposures for each SNP. In a MVMR estimation with two exposures $X_1$ and $X_2$ the estimated effects $\hat{\beta}_1$ and $\hat{\beta}_2$ are distributed as;

$$\left[\begin{matrix}
\hat{\beta}_{X_1}\\
\hat{\beta}_{X_2}
\end{matrix}\right] \sim N\left(\left[
\begin{matrix}
\beta_{X_1}\\
\beta_{X_2}
\end{matrix}\right],\left[
\begin{matrix}
\sigma_{x_1}^2 & \sigma_{X_1,X_2}\\
\sigma_{X_1,X_2} & \sigma_{X_2}^2
\end{matrix}\right]\right)$$

The estimate of the covariance $\hat{\sigma}_{X_1,X_2}$ is not available from standard GWAS summary statistics and so needs to be obtained elsewhere. This can be done in a couple of different ways:

1.  *Estimating the covariances from individual level data.*

    If there is complete sample overlap between the exposure samples, and the individual level data on which the GWAS summary statistics were generated is available, then the covariance required can be estimated directly from the individual level data by obtaining the covariance ($\hat{\sigma}_{12,i}$) between the residuals $\varepsilon_{1i}$ and $\varepsilon_{2i}$ from the regressions;

    $$x_1 = \alpha_1 + \gamma_{1i} G_i + \varepsilon_{1i}$$

    and

    $$x_2 = \alpha_2 + \gamma_{2i}G_i + \varepsilon_{2i}$$

    These estimates can be obtained from the `snpcov_mvmr()` function. Estimation in this way does not account for population structure controls included in the original GWAS which may affect the estimates of the covariance of the resulting residuals.

2.  ***RECOMMENDED:** Estimate* $\lambda$

    The distribution of the estimated effects can be rewritten as;

    $$\left[\begin{matrix}
    \hat{\beta}_{X_1}\\
    \hat{\beta}_{X_2}
    \end{matrix}\right] \sim N\left(\left[
    \begin{matrix}
    \beta_{X_1}\\
    \beta_{X_2}
    \end{matrix}\right],\left[
    \begin{matrix}
    \sigma_{x_1}^2 & \lambda\sigma_{X_1}\sigma_{X_2}\\
    \lambda\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2
    \end{matrix}\right]\right)$$

    where;

    $$\lambda = \frac{n_{\text{overlap}}\rho}{\sqrt{n_{X_1}n_{X_2}}}$$

    $\rho$ is the correlation between $X_1$ and $X_2$, $n_{X_1}$ is the sample size for $X_1$, $n_{X_2}$ is the sample size for $X_2$ and $n_{\text{overlap}}$ is the sample size for the overlap between the samples for $X_1$ and $X_2$.

    @forde-plosgen-2026 provide a function to estimate this value of $\lambda$ using the null SNPs for both $X_1$ and $X_2$ from the GWAS summary statistics. It therefore does not require access to any individual level data. This method works best if the full summary statistics are available. The function is `est_lambda()` in the [mr.simss package](https://amandaforde.github.io/mr.simss/).

    This value of $\lambda$ should be used in the `phenocov_mvmr()` function as the `pcor` argument to calculate the values $\hat{\sigma}_{X_1,X_2}$ for use in the required functions in the MVMR package.

3.  *Obtain* $\rho$ *from phenotypic data.*

    If there is full sample overlap between the GWAS for $X_1$ and $X_2$ then

    $$\lambda = \rho $$

    and so an estimate of $\rho$ obtained from the phenotypic data that was used to generate the GWAS summary statistics can be used in place of $\lambda$. This requires access to the individual level data that was used to estimate the GWAS summary statistics.

## References