Package 'screening'

Title: What the Package Does (one line, title case)
Description: What the package does (one paragraph).
Authors: First Last [aut, cre]
Maintainer: First Last <[email protected]>
License: What license is it under?
Version: 0.0.0.9000
Built: 2025-01-09 03:08:08 UTC
Source: https://github.com/wwrechard/screening

Help Index


A test function for linear models

Description

This function tests screening performance on linear models with a user-specified dimension. It generates synthetic data from a linear model and prints the variables screened by each method.

Usage

linearModelTest(n, p, beta.not.null = c(1, 2, 3), num.select = 5 *
  length(beta.not.null), ebic = FALSE)

Arguments

n

the sample size

p

the model dimension

beta.not.null

the non-zero coefficient indexes

num.select

the same as in "screening"

ebic

the same as in "screening"
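The data-generating step described above can be sketched in base R. This is only an illustration, not the package's own code: the helper name simulate_linear, the standard normal design, the unit noise variance and the coefficient value signal = 1 are all assumptions.

```r
# Hypothetical helper (not part of the package): simulate a sparse linear model.
simulate_linear <- function(n, p, beta.not.null = c(1, 2, 3), signal = 1) {
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n observations, p predictors
  beta <- numeric(p)
  beta[beta.not.null] <- signal                  # assumed value of the non-zero coefficients
  y <- as.vector(x %*% beta + rnorm(n))          # assumed Gaussian noise, unit variance
  list(x = x, y = y, beta = beta)
}

set.seed(1)
sim <- simulate_linear(n = 50, p = 100)
dim(sim$x)            # 50 100
which(sim$beta != 0)  # 1 2 3
```

A screening method performs well on such data when the indexes in beta.not.null appear among the retained variables.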


A test function for logistic models

Description

This function tests screening performance on logistic models with a user-specified dimension. It generates synthetic data from a logistic model and prints the variables screened by each method.

Usage

logisticTest(n, p, beta.not.null = c(1, 2, 3), num.select = 5 *
  length(beta.not.null), ebic = FALSE)

Arguments

n

the sample size

p

the model dimension

beta.not.null

the non-zero coefficient indexes

num.select

the same as in "screening"

ebic

the same as in "screening"
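A comparable sketch for the logistic case, again in base R. The helper name simulate_logistic, the standard normal design and the coefficient value signal = 1 are assumptions made for illustration; the package may generate its data differently.

```r
# Hypothetical helper (not part of the package): simulate a sparse logistic model.
simulate_logistic <- function(n, p, beta.not.null = c(1, 2, 3), signal = 1) {
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n observations, p predictors
  beta <- numeric(p)
  beta[beta.not.null] <- signal                  # assumed value of the non-zero coefficients
  prob <- plogis(as.vector(x %*% beta))          # inverse logit of the linear predictor
  y <- rbinom(n, size = 1, prob = prob)          # binary response
  list(x = x, y = y, beta = beta)
}

set.seed(1)
sim_bin <- simulate_logistic(n = 50, p = 100)
table(sim_bin$y)  # counts of 0s and 1s in the simulated response
```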


An efficient variable screening method

Description

This function implements four screening methods (SIS, HOLP, RRCS and forward regression) for linear models and three (all except RRCS) for generalized linear models.

Usage

screening(x, y, method = "holp", num.select = floor(dim(x)[1]/2),
  family = "gaussian", ebic = FALSE, ebic.gamma = 1)

Arguments

x

the predictor variables; each row corresponds to an observation. Should be a numeric matrix, not a data.frame.

y

the observed response, one value per row of x.

method

the screening method to use. Choices are "sis", "holp", "rrcs" and "forward". Defaults to "holp".

num.select

the number of variables to keep after screening. Defaults to half of the sample size. Ignored if ebic is TRUE.

family

the model type. The choices are the same as in glmnet. Defaults to "gaussian".

ebic

Indicates whether the extended BIC (EBIC) should be used to determine the number of variables to keep. If ebic is TRUE, the algorithm uses EBIC to terminate the screening procedure and num.select is ignored.

ebic.gamma

tuning parameter for EBIC (between 0 and 1). ebic.gamma = 0 corresponds to the usual BIC. Defaults to 1.
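To make the role of ebic.gamma concrete, here is a sketch of a common formulation of the extended BIC for a Gaussian linear model. The helper ebic_gaussian is hypothetical, and the criterion the package actually evaluates may differ in constants or in how the model is fitted. What the sketch shows is that the gamma term adds a penalty that grows with the number of candidate models of the same size, and that gamma = 0 reduces to the usual BIC.

```r
# Hypothetical helper (not part of the package): EBIC of a Gaussian linear
# model fitted by least squares on the columns of x listed in `vars`.
ebic_gaussian <- function(x, y, vars, gamma = 1) {
  n <- nrow(x)
  p <- ncol(x)
  k <- length(vars)
  fit <- lm.fit(cbind(1, x[, vars, drop = FALSE]), y)  # intercept plus selected columns
  rss <- sum(fit$residuals^2)
  # BIC part + extra penalty for the number of size-k subsets of p variables
  n * log(rss / n) + k * log(n) + 2 * gamma * lchoose(p, k)
}

set.seed(1)
x_demo <- matrix(rnorm(50 * 100), nrow = 50)
y_demo <- as.vector(3 * rowSums(x_demo[, 1:3]) + rnorm(50))
ebic_gaussian(x_demo, y_demo, vars = 1:3, gamma = 1)  # EBIC with the full extra penalty
ebic_gaussian(x_demo, y_demo, vars = 1:3, gamma = 0)  # plain BIC
```

A screening procedure that stops by EBIC would typically grow the candidate set and keep the size at which this value is smallest.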

Value

a list with two elements, "screen" and "method". "screen" contains the indexes of the selected variables and "method" names the screening method that was used.
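For intuition about two of the methods and about the shape of the return value, the following base R sketch ranks predictors by marginal correlation (the idea behind SIS) and by the HOLP estimate X'(XX')^{-1}y, and returns a list with the same two elements. The functions sis_sketch and holp_sketch are hypothetical re-implementations for the Gaussian case only; they are not the package's code, and the HOLP version assumes p > n with XX' invertible and columns on comparable scales.

```r
# Hypothetical re-implementations for illustration only (Gaussian case).
sis_sketch <- function(x, y, num.select) {
  score <- abs(as.vector(cor(x, y)))  # absolute marginal correlation per predictor
  keep <- order(score, decreasing = TRUE)[seq_len(num.select)]
  list(screen = keep, method = "sis")
}

holp_sketch <- function(x, y, num.select) {
  # HOLP estimate t(X) %*% solve(X %*% t(X)) %*% y; needs p > n and X X' invertible
  beta <- as.vector(crossprod(x, solve(tcrossprod(x), y)))
  keep <- order(abs(beta), decreasing = TRUE)[seq_len(num.select)]
  list(screen = keep, method = "holp")
}

set.seed(1)
x_sk <- matrix(rnorm(50 * 100), nrow = 50)
y_sk <- as.vector(x_sk[, 1:3] %*% c(3, 3, 3) + rnorm(50))
res_sis <- sis_sketch(x_sk, y_sk, num.select = 20)
res_holp <- holp_sketch(x_sk, y_sk, num.select = 20)
res_sis$method        # "sis"
length(res_holp$screen)  # 20
```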

References

Fan, Jianqing, and Jinchi Lv. "Sure independence screening for ultrahigh dimensional feature space." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70.5 (2008): 849-911.

Wang, Xiangyu, and Chenlei Leng. "High-dimensional ordinary least-squares projection for screening variables." arXiv preprint arXiv:1506.01782 (2015).

Li, Gaorong, et al. "Robust rank correlation based screening." The Annals of Statistics 40.3 (2012): 1846-1877.

Wang, Hansheng. "Forward regression for ultra-high dimensional variable screening." Journal of the American Statistical Association 104.488 (2009): 1512-1524.

Examples

The package provides one unit test function and two integrated test functions. The two integrated functions test screening on a linear model and on a logistic model. The user specifies the sample size, the dimension and the indexes of the true non-zero coefficients. Each function generates simulated data and coefficients and prints the screening results for all methods.

linearModelTest(n = 50, p = 100, beta.not.null = c(1, 2, 3), num.select = 20)
logisticTest(n = 50, p = 100, beta.not.null = c(1, 2, 3), num.select = 20)
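The main function can also be called directly on a user-supplied matrix. The sketch below uses simulated data (the design, coefficients and noise are assumptions chosen for illustration) and the argument names from the Usage section. The call is guarded so that the script still runs when the package is not installed.

```r
set.seed(42)
n_obs <- 50
n_var <- 100
x_use <- matrix(rnorm(n_obs * n_var), nrow = n_obs)  # numeric matrix, rows are observations
y_use <- as.vector(x_use[, 1:3] %*% c(1, 1, 1) + rnorm(n_obs))

# Run only if the 'screening' package is available.
if (requireNamespace("screening", quietly = TRUE)) {
  res <- screening::screening(x_use, y_use, method = "sis", num.select = 20)
  print(res$screen)  # indexes of the retained variables
  print(res$method)  # the screening method that was used
}
```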