Given a set of training data, this function builds the HDRDA classifier from Ramey, Stein, and Young (2017). Specially designed for small-sample, high-dimensional data, the HDRDA classifier incorporates dimension reduction and covariance-matrix shrinkage, yielding a computationally efficient classifier.

For a given `rda_high_dim` object, the `predict` method predicts the class of each observation (row) of the matrix given in `newdata`.
```r
rda_high_dim(x, ...)

# S3 method for default
rda_high_dim(
  x,
  y,
  lambda = 1,
  gamma = 0,
  shrinkage_type = c("ridge", "convex"),
  prior = NULL,
  tol = 1e-06,
  ...
)

# S3 method for formula
rda_high_dim(formula, data, ...)

# S3 method for rda_high_dim
predict(
  object,
  newdata,
  projected = FALSE,
  type = c("class", "prob", "score"),
  ...
)
```
| Argument | Description |
|---|---|
| `x` | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
| `...` | Additional arguments (not currently used). |
| `y` | Vector of class labels, one for each training observation. |
| `lambda` | The HDRDA pooling parameter. Must be between 0 and 1, inclusive. |
| `gamma` | A numeric value used for the shrinkage parameter. |
| `shrinkage_type` | The type of covariance-matrix shrinkage to apply. By default, a ridge-like shrinkage is applied. If `"convex"` is given, a convex-combination shrinkage is applied instead (see Ramey et al., 2017). |
| `prior` | Vector with prior probabilities for each class. If `NULL` (default), the sample proportions of the classes in `y` are used. |
| `tol` | A threshold for determining nonzero eigenvalues. |
| `formula` | A formula of the form `groups ~ x1 + x2 + ...`, where the response is the grouping factor and the right-hand side specifies the discriminators. |
| `data` | Data frame from which the variables specified in `formula` are to be taken. |
| `object` | Object of class `rda_high_dim` containing the trained HDRDA classifier. |
| `newdata` | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
| `projected` | Logical indicating whether `newdata` has already been projected onto the reduced subspace. |
| `type` | Prediction type: either `"class"`, `"prob"`, or `"score"`. |
`rda_high_dim` returns an object of class `rda_high_dim` that contains the trained HDRDA classifier.

`predict` returns a list with the predicted class and the discriminant scores for each of the K classes.
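As a quick illustration of the interface above, here is a minimal sketch that fits the classifier on part of the built-in `iris` data and predicts the held-out rows. It assumes the sparsediscrim package, which provides `rda_high_dim()`, is installed; the train/test split and tuning values are arbitrary.

```r
# Minimal sketch: fit HDRDA on a subset of iris and predict the rest.
# Assumes the sparsediscrim package (providing rda_high_dim) is installed.
library(sparsediscrim)

set.seed(42)
train_idx <- sample(seq_len(nrow(iris)), 100)

# Default interface: feature matrix plus a vector of class labels
fit <- rda_high_dim(x = iris[train_idx, -5], y = iris$Species[train_idx],
                    lambda = 0.5, gamma = 0.1)

# Equivalent formula interface
fit_formula <- rda_high_dim(Species ~ ., data = iris[train_idx, ])

# Predict the class of each held-out observation
pred <- predict(fit, newdata = iris[-train_idx, -5], type = "class")
```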
The HDRDA classifier utilizes a covariance-matrix estimator that is a convex
combination of the covariance-matrix estimators used in the Linear
Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
classifiers. For each of the \(K\) classes given in `y` \((k = 1, \ldots, K)\), we first define this convex combination as
$$\hat{\Sigma}_k(\lambda) = (1 - \lambda) \hat{\Sigma}_k
+ \lambda \hat{\Sigma},$$
where \(\lambda \in [0, 1]\) is the pooling parameter. We then
calculate the covariance-matrix estimator
$$\tilde{\Sigma}_k = \alpha_k \hat{\Sigma}_k(\lambda) + \gamma I_p,$$
where \(I_p\) is the \(p \times p\) identity matrix and \(\alpha_k\) is a constant defined in Ramey et al. (2017). The matrix
\(\tilde{\Sigma}_k\) is substituted into the HDRDA classifier. See Ramey et
al. (2017) for more details.
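To make the two-stage estimator concrete, the sketch below computes \(\tilde{\Sigma}_k\) for a single class in base R, taking \(\alpha_k = 1\) for simplicity. The helper name and the pooled-covariance convention are ours for illustration, not the package's internals.

```r
# Illustrative sketch of the shrunken covariance estimator described above,
# with alpha_k taken to be 1. Not the package's internal implementation.
hdrda_cov_k <- function(x, y, k, lambda = 0.5, gamma = 0.1) {
  x <- as.matrix(x)
  y <- factor(y)
  p <- ncol(x)

  # Class-specific (QDA-like) covariance estimator for class k
  x_k <- x[y == levels(y)[k], , drop = FALSE]
  sigma_k <- cov(x_k)

  # Pooled (LDA-like) covariance estimator across all K classes
  centered <- do.call(rbind, lapply(levels(y), function(lvl) {
    scale(x[y == lvl, , drop = FALSE], center = TRUE, scale = FALSE)
  }))
  sigma_pool <- crossprod(centered) / (nrow(x) - nlevels(y))

  # Convex combination of the two estimators, then ridge-like shrinkage
  sigma_lambda <- (1 - lambda) * sigma_k + lambda * sigma_pool
  sigma_lambda + gamma * diag(p)
}

# Example: shrunken covariance estimate for the first class of iris
hdrda_cov_k(iris[, -5], iris$Species, k = 1, lambda = 0.5, gamma = 0.1)
```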
The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation. The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
The vector `prior` contains the a priori class membership probabilities for each class. If `prior` is `NULL` (default), the class membership probabilities are estimated as the sample proportion of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The prior probabilities should be nonnegative and sum to one. The order of the prior probabilities is assumed to match the levels of `factor(y)`.
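A short sketch of supplying prior probabilities explicitly, again assuming the sparsediscrim package; the values are arbitrary and are ordered to match `levels(factor(y))` as described above.

```r
# Supplying explicit prior probabilities (arbitrary values for illustration).
# Their order must match levels(factor(y)), and they must sum to one.
library(sparsediscrim)

y <- iris$Species
levels(factor(y))   # "setosa" "versicolor" "virginica"

fit <- rda_high_dim(x = iris[, -5], y = y,
                    prior = c(0.50, 0.25, 0.25))
```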
Ramey, J. A., Stein, C. K., and Young, D. M. (2017), "High-Dimensional Regularized Discriminant Analysis," https://arxiv.org/abs/1602.01182.

Friedman, J. H. (1989), "Regularized Discriminant Analysis," Journal of the American Statistical Association, 84(405), 165-175. http://www.jstor.org/pss/2289860 (requires full-text access).