| Field | Value |
|---|---|
| Title | Adjustment for index event bias |
| Description | Adjusts association statistics for index event bias in the context of a genome-wide association study for a subsequent event. |
| Authors | Frank Dudbridge |
| Maintainer | Frank Dudbridge |
| License | GPL-3 |
| Version | 0.2.0 |
| Built | 2025-01-10 05:16:46 UTC |
| Source | https://github.com/DudbridgeLab/indexevent |
Given effect sizes and standard errors for predictors of an index trait and a subsequent trait, indexevent adjusts the statistics for the subsequent trait for the selection bias induced by selection on the index trait.
indexevent( xbeta, xse, ybeta, yse, weighted = T, prune = NULL, method = c("CWLS", "Hedges-Olkin", "Simex"), tol = 1e-06, B = 10, lambda = seq(0.25, 5, 0.25), seed = 2018 )
| Argument | Description |
|---|---|
| xbeta | Vector of effects on the index trait |
| xse | Vector of standard errors of xbeta |
| ybeta | Vector of effects on the subsequent trait |
| yse | Vector of standard errors of ybeta |
| weighted | If TRUE (default), the regression of ybeta on xbeta is weighted by the inverse variances 1/yse^2; otherwise it is unweighted |
| prune | Vector containing the indices of an approximately independent subset of the predictors in xbeta and ybeta |
| method | Method to adjust for regression dilution (weak instruments) in the regression of ybeta on xbeta: one of "CWLS" (default), "Hedges-Olkin" or "Simex" |
| B | Number of simulations performed in each stage of the Simex adjustment |
| lambda | Vector of lambdas for which the Simex simulations are performed |
| seed | Random number seed for the Simex adjustment |
Effect sizes are on a linear scale, so they could be coefficients from linear regression, log odds ratios or log hazard ratios. Effects on the subsequent trait are regressed on the effects on the index trait; by default, the regression is weighted by the inverse variances of the subsequent-trait effects. The regression slope is then corrected for sampling variation in the index-trait effects (regression dilution), and the residuals from this regression are used to obtain adjusted effect sizes and standard errors for the subsequent trait.
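The steps above can be sketched in base R on simulated summary statistics. This is a minimal illustration under stated assumptions, not the code inside indexevent: the simulation assumes that selection adds b.true times the true index-trait effect to each subsequent-trait effect, the dilution correction uses an assumed CWLS-style formula (subtracting the weighted sampling variance of xbeta from its weighted sum of squares), the standard error of b is crudely rescaled by the same factor, and the adjusted standard errors use a simple error-propagation formula that treats b and xbeta as independent. All variable names are hypothetical.

```r
# Minimal base-R sketch of the adjustment (not the package implementation).
# Assumed model: observed ybeta = true effect + b.true * true xbeta + noise.
set.seed(1)
n <- 5000
x.true <- rnorm(n, 0, 0.05)       # true effects on the index trait
y.true <- rnorm(n, 0, 0.05)       # true effects on the subsequent trait
b.true <- -0.4                    # hypothetical index event bias slope
xse <- rep(0.01, n)
yse <- rep(0.01, n)
xbeta <- x.true + rnorm(n, 0, xse)
ybeta <- y.true + b.true * x.true + rnorm(n, 0, yse)

# Step 1: weighted regression of ybeta on xbeta, weights 1/yse^2
w <- 1 / yse^2
fit <- summary(lm(ybeta ~ xbeta, weights = w))
b.raw <- fit$coefficients[2, 1]

# Step 2: correct for regression dilution (assumed CWLS-style formula):
# remove the weighted sampling variance of xbeta from its weighted sum of squares
xc <- xbeta - sum(w * xbeta) / sum(w)
sxx <- sum(w * xc^2)
correction <- sxx / (sxx - sum(w * xse^2))
b <- b.raw * correction
b.se <- fit$coefficients[2, 2] * correction   # crude assumption for the SE

# Step 3: residuals give the adjusted effects; SEs by simple error propagation
ybeta.adj <- ybeta - b * xbeta
yse.adj <- sqrt(yse^2 + b^2 * xse^2 + xbeta^2 * b.se^2 + xse^2 * b.se^2)
```

Because the correction factor exceeds 1, the corrected slope b is further from zero than b.raw, and the adjusted effects should be roughly uncorrelated with the true index-trait effects.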
The regression should be performed on an approximately independent subset of predictors; in a genome-wide association study, these would be LD-pruned SNPs. In terms of the input parameters, the regression command is lm(ybeta[prune]~xbeta[prune],weights=1/yse[prune]^2).
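A sketch of the regression command above on simulated inputs, assuming that weighted = FALSE simply drops the weights. The prune vector here is hypothetical: every tenth predictor stands in for an LD-pruned set. The closed-form slope shows what lm computes with inverse-variance weights.

```r
# Simulated inputs for illustration only
set.seed(2)
n <- 1000
xbeta <- rnorm(n, 0, 0.05)
yse <- runif(n, 0.005, 0.02)
ybeta <- -0.3 * xbeta + rnorm(n, 0, yse)

# Hypothetical pruned subset: every 10th predictor stands in for LD-pruned SNPs
prune <- seq(1, n, by = 10)

# Weighted regression, as in the command above
b.weighted <- unname(coef(lm(ybeta[prune] ~ xbeta[prune],
                             weights = 1 / yse[prune]^2))[2])
# Presumed behaviour of weighted = FALSE: ordinary least squares on the subset
b.unweighted <- unname(coef(lm(ybeta[prune] ~ xbeta[prune]))[2])

# Weighted slope in closed form: weighted covariance over weighted variance
w <- 1 / yse[prune]^2
xp <- xbeta[prune]
yp <- ybeta[prune]
xm <- sum(w * xp) / sum(w)
ym <- sum(w * yp) / sum(w)
b.closed <- sum(w * (xp - xm) * (yp - ym)) / sum(w * (xp - xm)^2)
```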
The effects in xbeta and ybeta should be aligned to the same variables and in the same direction before running indexevent.
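A minimal base-R sketch of this alignment, assuming each study supplies a SNP identifier, an effect allele and an other allele (the column names snp, ea, oa, beta and se are hypothetical): merge on the identifier, then flip the sign of the subsequent-trait effect wherever the two studies report opposite effect alleles. Strand-ambiguous SNPs (A/T, C/G) are not handled here.

```r
# Hypothetical summary statistics from the two studies
idx <- data.frame(snp = c("rs1", "rs2", "rs3"), ea = c("A", "C", "G"),
                  oa = c("G", "T", "A"), beta = c(0.10, -0.05, 0.02),
                  se = c(0.01, 0.01, 0.01), stringsAsFactors = FALSE)
sub <- data.frame(snp = c("rs3", "rs1", "rs2"), ea = c("G", "G", "C"),
                  oa = c("A", "A", "T"), beta = c(0.03, 0.04, 0.01),
                  se = c(0.02, 0.02, 0.02), stringsAsFactors = FALSE)

# Same variables: merge on the SNP identifier (rows come back sorted by snp)
m <- merge(idx, sub, by = "snp", suffixes = c(".x", ".y"))

# Same direction: flip the subsequent-trait effect where the alleles are swapped
same <- m$ea.x == m$ea.y & m$oa.x == m$oa.y
swapped <- m$ea.x == m$oa.y & m$oa.x == m$ea.y
sgn <- ifelse(same, 1, ifelse(swapped, -1, NA))
m$beta.y <- sgn * m$beta.y
m <- m[!is.na(sgn), ]   # drop SNPs whose alleles cannot be matched

xbeta <- m$beta.x; xse <- m$se.x
ybeta <- m$beta.y; yse <- m$se.y
```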
The default value of B is 10 to give a quick result, but higher values, e.g. 1000, are recommended.
An object of class "indexevent" containing the following components:

| Component | Description |
|---|---|
| ybeta.adj | Adjusted effects on the subsequent trait |
| yse.adj | Adjusted standard errors of ybeta.adj |
| ychisq.adj | Chi-square statistics, equal to (ybeta.adj/yse.adj)^2 |
| yp.adj | P-values for ychisq.adj on 1 df |
| b | Coefficient of the regression of ybeta[prune] on xbeta[prune], after correction for regression dilution |
| b.se | Standard error of b |
| b.ci | Lower and upper 95% confidence limits for b |
| b.raw | Regression coefficient without correction for regression dilution |
| simex.estimates | Regression coefficients under simulated measurement error |
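The test statistics in this list follow directly from ybeta.adj and yse.adj. The sketch below uses hypothetical values for those two vectors; it also checks that the CWLS interval printed in the examples equals b +/- qnorm(0.975) * b.se. The Simex interval printed there is not symmetric about its estimate, so it presumably comes from the profile likelihood (simexprofileCI) instead.

```r
# Hypothetical adjusted effects and standard errors, for illustration only
ybeta.adj <- c(0.05, -0.02, 0.10)
yse.adj <- c(0.01, 0.02, 0.05)

ychisq.adj <- (ybeta.adj / yse.adj)^2                      # 1-df Wald chi-squares
yp.adj <- pchisq(ychisq.adj, df = 1, lower.tail = FALSE)   # upper-tail p-values

# Symmetric 95% interval for b, using the CWLS b and b.se printed in the examples
b <- -0.416773273239147
b.se <- 0.0196993218284169
b.ci <- b + c(-1, 1) * qnorm(0.975) * b.se
```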
Frank Dudbridge
Cai S, Hartley A, Mahmoud O, Tilling K, Dudbridge F (2022) Adjusting for collider bias in genetic association studies using instrumental variable methods. Genet Epidemiol 46:303-316
Dudbridge F, Allen RJ, Sheehan NA, Schmidt AF, Lee JC, Jenkins RG, Wain LV, Hingorani AD, Patel RS (2019) Adjustment for index event bias in genome-wide association studies of subsequent events. Nat Commun 10:1561
Calculates the log-likelihood in a simple linear regression model with measurement error in the predictor, using the SIMEX method.
simexllhd(pvar, pmean, simex.estimates)
| Argument | Description |
|---|---|
| pvar | Ratio of the sampling variance to the variance of the true predictors, on the log scale |
| pmean | Slope of the simple linear regression |
| simex.estimates | Matrix containing data simulated by SIMEX |
simex.estimates is a matrix with three columns. Column 1 contains the values of lambda under which measurement error is simulated, column 2 the estimated slope for each value of lambda, and column 3 the sampling variances of the estimated slopes.
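A hypothetical helper that builds a matrix with this three-column layout, following the usual SIMEX recipe: for each lambda, add extra noise with variance lambda * xse^2 to xbeta, refit the weighted regression B times, and average. Whether indexevent includes a lambda = 0 row, and how it defines the column 3 variances (here, the mean squared lm standard error), are assumptions.

```r
# Hypothetical helper, not a function exported by the package
simex_matrix <- function(xbeta, xse, ybeta, yse, B = 10,
                         lambda = seq(0.25, 5, 0.25), seed = 2018) {
  set.seed(seed)
  w <- 1 / yse^2
  lams <- c(0, lambda)   # assumption: include the naive fit at lambda = 0
  out <- matrix(NA_real_, nrow = length(lams), ncol = 3,
                dimnames = list(NULL, c("lambda", "slope", "variance")))
  for (i in seq_along(lams)) {
    reps <- if (lams[i] == 0) 1 else B
    est <- numeric(reps)
    v <- numeric(reps)
    for (r in seq_len(reps)) {
      # add extra error with variance lambda * xse^2 to the index-trait effects
      xsim <- xbeta + rnorm(length(xbeta), 0, sqrt(lams[i]) * xse)
      cf <- summary(lm(ybeta ~ xsim, weights = w))$coefficients
      est[r] <- cf[2, 1]
      v[r] <- cf[2, 2]^2
    }
    out[i, ] <- c(lams[i], mean(est), mean(v))   # average over the B refits
  }
  out
}

# Simulated data: true slope -0.4, sampling/true variance ratio 0.0004/0.0025
set.seed(3)
n <- 2000
x.true <- rnorm(n, 0, 0.05)
xse <- rep(0.02, n)
yse <- rep(0.01, n)
xbeta <- x.true + rnorm(n, 0, xse)
ybeta <- -0.4 * x.true + rnorm(n, 0, yse)
sim <- simex_matrix(xbeta, xse, ybeta, yse)
```

The estimated slopes shrink towards zero as lambda grows, which is the trend that SIMEX extrapolates back to lambda = -1.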
The likelihood is a function of two parameters, the true slope of the simple linear regression and a parameter representing the ratio of the sampling variance to the variance of the true predictors. As this parameter must be positive, it is estimated on the log scale.
Log-likelihood evaluated at pvar and pmean for the data in simex.estimates.
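One plausible form of this likelihood, written as a hypothetical simex_loglik rather than the package's simexllhd. It assumes that, with true slope pmean and variance ratio exp(pvar), the expected slope at lambda is pmean / (1 + (1 + lambda) * exp(pvar)), because adding lambda times the sampling variance makes the total measurement error variance (1 + lambda) times the original; each simulated slope is taken as normal with the variance in column 3.

```r
# Hypothetical re-implementation for illustration; not the package's simexllhd
simex_loglik <- function(pvar, pmean, simex.estimates) {
  lam <- simex.estimates[, 1]
  # attenuated slope expected under total error variance (1 + lambda) * sigma^2
  expected <- pmean / (1 + (1 + lam) * exp(pvar))
  sum(dnorm(simex.estimates[, 2], mean = expected,
            sd = sqrt(simex.estimates[, 3]), log = TRUE))
}

# Noise-free SIMEX output generated from the model with slope -0.5, ratio 0.2
lam <- c(0, seq(0.25, 5, 0.25))
est <- cbind(lam, -0.5 / (1 + (1 + lam) * 0.2), 1e-4)
ll.true <- simex_loglik(log(0.2), -0.5, est)    # every residual is zero
ll.wrong <- simex_loglik(log(0.2), -0.4, est)   # wrong slope, lower value
```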
Obtains a maximum likelihood estimate of the slope in a simple linear regression model with measurement error in the predictor, together with a 95% profile-likelihood confidence interval, using the SIMEX method.
simexprofileCI(simex.estimates, variance.ratio)
| Argument | Description |
|---|---|
| simex.estimates | Matrix containing data simulated by SIMEX |
| variance.ratio | Ratio of the variance of the predictor to the variance of the outcome |
simex.estimates is a matrix with three columns. Column 1 contains the values of lambda under which measurement error is simulated, column 2 the estimated slope for each value of lambda, and column 3 the sampling variances of the estimated slopes.
The likelihood is a profile likelihood for the true regression slope, with the profile taken over a nuisance parameter representing the ratio of the sampling variance to the variance of the true predictors. The profiling step requires a value for variance.ratio.
A vector with three elements, the estimated slope and its lower and upper 95% confidence limits.
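A sketch of a profile-likelihood estimate and 95% interval under the same assumed likelihood form as for simexllhd (expected slope pmean / (1 + (1 + lambda) * exp(pvar))). The nuisance log variance ratio is maximised out with optimize over a fixed search range, which replaces the variance.ratio-based lower bound used by simexprofileCI; the interval is the set of slopes whose profile log-likelihood lies within qchisq(0.95, 1)/2 of the maximum. All function names are hypothetical.

```r
# Assumed log-likelihood (hypothetical; see the assumptions above)
simex_loglik <- function(pvar, pmean, est) {
  lam <- est[, 1]
  expected <- pmean / (1 + (1 + lam) * exp(pvar))
  sum(dnorm(est[, 2], mean = expected, sd = sqrt(est[, 3]), log = TRUE))
}

# Profile log-likelihood of slope p: maximise over the nuisance log-ratio
profile_loglik <- function(p, est, pvar.range = c(-10, 5)) {
  optimize(function(v) simex_loglik(v, p, est), interval = pvar.range,
           maximum = TRUE)$objective
}

# Estimate plus 95% interval from the chi-square(1) drop in the profile
profile_ci <- function(est, p.range, width) {
  opt <- optimize(function(p) profile_loglik(p, est), interval = p.range,
                  maximum = TRUE)
  cut <- opt$objective - qchisq(0.95, df = 1) / 2
  f <- function(p) profile_loglik(p, est) - cut
  lower <- uniroot(f, c(opt$maximum - width, opt$maximum))$root
  upper <- uniroot(f, c(opt$maximum, opt$maximum + width))$root
  c(estimate = opt$maximum, lower = lower, upper = upper)
}

# Noise-free SIMEX output generated with slope -0.5 and variance ratio 0.2
lam <- c(0, seq(0.25, 5, 0.25))
est <- cbind(lam, -0.5 / (1 + (1 + lam) * 0.2), 1e-4)
ci <- profile_ci(est, p.range = c(-2, 0), width = 1)
```

The search ranges passed to optimize and uniroot must bracket the maximum and both interval limits; they are chosen by hand here for the simulated input.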
Calculates the profile log-likelihood of the slope in a simple linear regression model with measurement error in the predictor, using the SIMEX method.
simexprofilellhd(p, simex.estimates, variance.ratio)
| Argument | Description |
|---|---|
| p | Slope of the simple linear regression |
| simex.estimates | Matrix containing data simulated by SIMEX |
| variance.ratio | Ratio of the variance of the predictor to the variance of the outcome |
simex.estimates is a matrix with three columns. Column 1 contains the values of lambda under which measurement error is simulated, column 2 the estimated slope for each value of lambda, and column 3 the sampling variances of the estimated slopes.
The likelihood is a profile likelihood for the true regression slope, with the profile taken over a nuisance parameter representing the ratio of the sampling variance to the variance of the true predictors. The lower bound for this nuisance parameter depends on p and variance.ratio.
Profile log-likelihood evaluated at p for the data in simex.estimates.
A simulated dataset consisting of regression coefficients on incidence and prognosis, with their standard errors, for 10,000 variables (e.g. SNPs). 500 variables have effects on incidence only, 500 on prognosis only, and 500 on both; the effects on incidence and prognosis are independent. The estimates are obtained from linear regression in a simulated dataset of 20,000 individuals.
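For experimenting without the package, a data frame with the same column layout and composition as testData can be simulated at the summary level. This is not how testData was generated: no individuals are simulated and no selection is applied, so there is no index event bias in these values, and the common standard error of 1/sqrt(20000) is an assumption.

```r
# Hypothetical summary-level stand-in for testData
set.seed(4)
n <- 10000
x.true <- numeric(n)
y.true <- numeric(n)
x.true[1:500] <- rnorm(500, 0, 0.05)        # effects on incidence only
y.true[501:1000] <- rnorm(500, 0, 0.05)     # effects on prognosis only
x.true[1001:1500] <- rnorm(500, 0, 0.05)    # effects on both, drawn independently
y.true[1001:1500] <- rnorm(500, 0, 0.05)
se <- 1 / sqrt(20000)                       # assumed common standard error
sim <- data.frame(xbeta = x.true + rnorm(n, 0, se), xse = se,
                  ybeta = y.true + rnorm(n, 0, se), yse = se)
```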
testData
A data frame with 10,000 rows and 4 variables:

| Variable | Description |
|---|---|
| xbeta | Regression coefficient on incidence |
| xse | Standard error of xbeta |
| ybeta | Regression coefficient on prognosis |
| yse | Standard error of ybeta |
library(indexevent)

# Default analysis with CWLS
indexevent(testData$xbeta,testData$xse,testData$ybeta,testData$yse)
# [1] "Coefficient -0.416773273239147"
# [1] "Standard error 0.0196993218284169"
# [1] "95% CI -0.455383234542707 -0.378163311935586"

# Hedges-Olkin adjustment for regression dilution
# Equivalent to an unweighted regression with CWLS
indexevent(testData$xbeta,testData$xse,testData$ybeta,testData$yse, method="Hedges-Olkin")
# [1] "Coefficient -0.441061156526639"
# [1] "Standard error 0.0211910391231297"
# [1] "95% CI -0.482594830002953 -0.399527483050326"

# SIMEX adjustment with 100 simulations for each step
indexevent(testData$xbeta,testData$xse,testData$ybeta,testData$yse,method="SIMEX",B=100)
# [1] "Coefficient -0.446543628582032"
# [1] "Standard error 0.011576233488927"
# [1] "95% CI -0.470301533547 -0.424923532117153"

# First few unadjusted effects on prognosis
testData$ybeta[1:5]
# [1] 0.032240 0.057070 -0.006959 0.080460 0.032820

# Adjusted effects
indexevent(testData$xbeta,testData$xse,testData$ybeta,testData$yse)$ybeta.adj[1:5]
# [1] 0.05109482 0.06088181 -0.01446092 0.08931226 0.01435694