Summary data multivariable Mendelian randomization (MVMR) requires estimation of the covariance between the effect of each instrument on each exposure included in the MVMR model for effective strength testing, heterogeneity estimation, and pleiotropy robust estimation (Sanderson et al. (2021)).
This covariance is dependent on the level of sample overlap between
the GWAS used for each exposure and is zero if there is no sample
overlap. In that case gencov = 0 can be used in the
pleiotropy_mvmr() and strength_mvmr()
functions.
If there is any level of sample overlap between the samples used for the exposure GWAS then accurate estimation of the Q-statistic, conditional F-statistics and robust estimation with weak or pleiotropic SNPs (under the assumption of balanced pleiotropy) relies on an estimation of the covariance between the errors in the GWAS results between the exposures for each SNP. In a MVMR estimation with two exposures \(X_1\) and \(X_2\) the estimated effects \(\hat{\beta}_1\) and \(\hat{\beta}_2\) are distributed as;
\[\left[\begin{matrix} \hat{\beta}_{X_1}\\ \hat{\beta}_{X_2} \end{matrix}\right] \sim N\left(\left[ \begin{matrix} \beta_{X_1}\\ \beta_{X_2} \end{matrix}\right],\left[ \begin{matrix} \sigma_{x_1}^2 & \sigma_{X_1,X_2}\\ \sigma_{X_1,X_2} & \sigma_{X_2}^2 \end{matrix}\right]\right)\]
The estimate of the covariance \(\hat{\sigma}_{X_1,X_2}\) is not available from standard GWAS summary statistics and so needs to be obtained elsewhere. This can be done in a couple of different ways:
Estimating the covariances from individual level data.
If there is complete sample overlap between the exposure samples, and the individual level data on which the GWAS summary statistics were generated is available, then the covariance required can be estimated directly from the individual level data by obtaining the covariance (\(\hat{\sigma}_{12,i}\)) between the residuals \(\varepsilon_{1i}\) and \(\varepsilon_{2i}\) from the regressions;
\[x_1 = \alpha_1 + \gamma_{1i} G_i + \varepsilon_{1i}\]
and
\[x_2 = \alpha_2 + \gamma_{2i}G_i + \varepsilon_{2i}\]
These estimates can be obtained from the snpcov_mvmr()
function. Estimation in this way does not account for population
structure controls included in the original GWAS which may affect the
estimates of the covariance of the resulting residuals.
RECOMMENDED: Estimate \(\lambda\)
The distribution of the estimated effects can be rewritten as;
\[\left[\begin{matrix} \hat{\beta}_{X_1}\\ \hat{\beta}_{X_2} \end{matrix}\right] \sim N\left(\left[ \begin{matrix} \beta_{X_1}\\ \beta_{X_2} \end{matrix}\right],\left[ \begin{matrix} \sigma_{x_1}^2 & \lambda\sigma_{X_1}\sigma_{X_2}\\ \lambda\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2 \end{matrix}\right]\right)\]
where;
\[\lambda = \frac{n_{\text{overlap}}\rho}{\sqrt{n_{X_1}n_{X_2}}}\]
\(\rho\) is the correlation between \(X_1\) and \(X_2\), \(n_{X_1}\) is the sample size for \(X_1\), \(n_{X_2}\) is the sample size for \(X_2\) and \(n_{\text{overlap}}\) is the sample size for the overlap between the samples for \(X_1\) and \(X_2\).
Forde et al. (2026) provide a function
to estimate this value of \(\lambda\)
using the null SNPs for both \(X_1\)
and \(X_2\) from the GWAS summary
statistics. It therefore does not require access to any individual level
data. This method works best if the full summary statistics are
available. The function is est_lambda() in the mr.simss package.
This value of \(\lambda\) should be
used in the phenocov_mvmr() function as the
pcor argument to calculate the values \(\hat{\sigma}_{X_1,X_2}\) for use in the
required functions in the MVMR package.
Obtain \(\rho\) from phenotypic data.
If there is full sample overlap between the GWAS for \(X_1\) and \(X_2\) then
\[\lambda = \rho \]
and so an estimate of \(\rho\) obtained from the phenotypic data that was used to generate the GWAS summary statistics can be used in place of \(\lambda\). This requires access to the individual level data that was used to estimate the GWAS summary statistics.