Source: `R/lda-pseudo.r`, `lda_pseudo.Rd`

Given a set of training data, this function builds the Linear Discriminant Analysis (LDA) classifier, where the distributions of each class are assumed to be multivariate normal and share a common covariance matrix. When the pooled sample covariance matrix is singular, the linear discriminant function is incalculable. A common method to overcome this issue is to replace the inverse of the pooled sample covariance matrix with the Moore-Penrose pseudo-inverse, which is unique and always exists. Note that when the pooled sample covariance matrix is nonsingular, its inverse is equal to the pseudo-inverse.
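To make the pseudo-inverse substitution concrete, here is a minimal R sketch. It assumes the pooled covariance is the within-class scatter divided by n - K (n observations, K classes) and computes the pseudo-inverse of that symmetric matrix from its eigendecomposition, zeroing eigenvalues below a tolerance. This is one common approach and not necessarily how `lda_pseudo()` computes it internally; the helpers `pooled_cov()` and `pseudo_inverse_sym()` are hypothetical.

```r
# Hypothetical helper (not part of sparsediscrim): Moore-Penrose
# pseudo-inverse of a symmetric positive semidefinite matrix via its
# eigendecomposition. Eigenvalues at or below `tol` are treated as 0.
pseudo_inverse_sym <- function(S, tol = 1e-08) {
  eig <- eigen(S, symmetric = TRUE)
  keep <- eig$values > tol
  # Invert only the numerically nonzero eigenvalues; drop the rest.
  V <- eig$vectors[, keep, drop = FALSE]
  V %*% diag(1 / eig$values[keep], nrow = sum(keep)) %*% t(V)
}

# Hypothetical helper: pooled sample covariance, i.e. the within-class
# scatter matrix divided by (n - K).
pooled_cov <- function(x, y) {
  y <- factor(y)
  x <- as.matrix(x)
  scatter <- Reduce(`+`, lapply(levels(y), function(k) {
    xk <- x[y == k, , drop = FALSE]
    centered <- sweep(xk, 2, colMeans(xk))  # subtract class means
    crossprod(centered)                     # t(centered) %*% centered
  }))
  scatter / (nrow(x) - nlevels(y))
}

set.seed(1)
# Nonsingular case: n = 30 observations, p = 3 features, K = 3 classes.
x_small <- matrix(rnorm(90), nrow = 30)
y_small <- rep(c("a", "b", "c"), each = 10)
S_small <- pooled_cov(x_small, y_small)
all.equal(pseudo_inverse_sym(S_small), solve(S_small))

# Singular case: p = 10 features but only n - K = 4 degrees of freedom,
# so S_wide has rank 4 and has no ordinary inverse.
x_wide <- matrix(rnorm(60), nrow = 6)
y_wide <- rep(c("a", "b"), each = 3)
S_wide <- pooled_cov(x_wide, y_wide)
S_plus <- pseudo_inverse_sym(S_wide)
```

In the nonsingular case the two matrices agree up to floating-point error, matching the note above. In the wide case `S_plus` still exists and satisfies the Moore-Penrose conditions `S %*% S_plus %*% S == S` and `S_plus %*% S %*% S_plus == S_plus` (numerically).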


```r
lda_pseudo(x, ...)

# S3 method for default
lda_pseudo(x, y, prior = NULL, tol = 1e-08, ...)

# S3 method for formula
lda_pseudo(formula, data, prior = NULL, tol = 1e-08, ...)

# S3 method for lda_pseudo
predict(object, newdata, type = c("class", "prob", "score"), ...)
```

Argument | Description |
---|---|
`x` | Matrix or data frame containing the training data. The rows are the sample observations, and the columns are the features. Only complete data are retained. |
`...` | Additional arguments (not currently used). |
`y` | Vector of class labels for each training observation. Only complete data are retained. |
`prior` | Vector with prior probabilities for each class. If NULL (default), the sample proportions of the classes in `y` are used. See Details. |
`tol` | Tolerance value below which eigenvalues are considered numerically equal to 0. |
`formula` | A formula of the form `groups ~ x1 + x2 + ...`; that is, the response is the grouping factor (class labels) and the right-hand side specifies the features. |
`data` | Data frame from which the variables specified in `formula` are taken. |
`object` | Fitted `lda_pseudo` model object. |
`newdata` | Matrix or data frame of observations to predict. Each row corresponds to a new observation. |
`type` | Prediction type: either `"class"`, `"prob"`, or `"score"`. |
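The three `type` values can be related through the standard linear discriminant rule with the pseudo-inverse in place of the inverse. The sketch below is an illustration under assumptions, not sparsediscrim's code: it uses the textbook score delta_k(x) = x' S+ mu_k - 0.5 mu_k' S+ mu_k + log(pi_k), where larger is better, converts scores to probabilities with a softmax, and picks the class with the largest score. The scaling and sign convention that `predict()` actually uses for `"score"` may differ. The functions `lda_scores_sketch()` and `predict_sketch()` are hypothetical.

```r
# Hypothetical sketch: linear discriminant scores with a pseudo-inverse
# `S_plus` substituted for the inverse pooled covariance.
# `means` is a K x p matrix of class means with class names as rownames.
lda_scores_sketch <- function(newdata, means, S_plus, prior) {
  newdata <- as.matrix(newdata)
  linear <- newdata %*% S_plus %*% t(means)                   # x' S+ mu_k
  const <- -0.5 * rowSums((means %*% S_plus) * means) + log(prior)
  sweep(linear, 2, const, `+`)                                # n x K scores
}

predict_sketch <- function(newdata, means, S_plus, prior,
                           type = c("class", "prob", "score")) {
  type <- match.arg(type)
  scores <- lda_scores_sketch(newdata, means, S_plus, prior)
  colnames(scores) <- rownames(means)
  if (type == "score") return(scores)
  # Softmax over classes; subtracting each row's maximum avoids overflow.
  probs <- exp(scores - apply(scores, 1, max))
  probs <- probs / rowSums(probs)
  if (type == "prob") return(probs)
  factor(colnames(scores)[max.col(scores, ties.method = "first")],
         levels = rownames(means))
}

# Toy example: two classes in two dimensions, S_plus = identity.
means <- rbind(a = c(0, 0), b = c(3, 3))
newdata <- rbind(c(0.2, -0.1), c(2.9, 3.4))
predict_sketch(newdata, means, diag(2), prior = c(0.5, 0.5), type = "class")
predict_sketch(newdata, means, diag(2), prior = c(0.5, 0.5), type = "prob")
```

Because the term x' S+ x is shared by all classes, dropping it from the scores does not change either the predicted class or the softmax probabilities.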

`lda_pseudo` object that contains the trained `lda_pseudo` classifier.

The matrix of training observations is given in `x`. The rows of `x` contain the sample observations, and the columns contain the features for each training observation.

The vector of class labels given in `y` is coerced to a `factor`. The length of `y` should match the number of rows in `x`.
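The input handling described above, together with the note in the argument table that only complete data are retained, can be sketched as follows. This is an illustration under assumptions, not the package's implementation; the helper `prepare_training_data()` is hypothetical.

```r
# Hypothetical helper: check that length(y) matches nrow(x), keep only
# complete observations, and coerce the class labels to a factor.
prepare_training_data <- function(x, y) {
  x <- as.matrix(x)
  if (length(y) != nrow(x)) {
    stop("The length of 'y' must equal the number of rows in 'x'.")
  }
  complete <- stats::complete.cases(x) & !is.na(y)
  list(x = x[complete, , drop = FALSE], y = factor(y[complete]))
}

x <- data.frame(f1 = c(1.2, NA, 3.1, 0.7), f2 = c(2.0, 1.5, NA, 0.9))
y <- c("a", "a", "b", "b")
prepared <- prepare_training_data(x, y)
nrow(prepared$x)   # rows 2 and 3 contain NA, so 2 rows remain
levels(prepared$y)
```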

An error is thrown if a given class has fewer than 2 observations because the variance for each feature within a class cannot be estimated with fewer than 2 observations.
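A minimal sketch of this class-size check, assuming it is applied to the class labels after incomplete observations are dropped; `check_class_sizes()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: stop if any class has fewer than `min_n`
# observations, since within-class variances need at least 2 points.
check_class_sizes <- function(y, min_n = 2) {
  counts <- table(factor(y))
  too_small <- names(counts)[counts < min_n]
  if (length(too_small) > 0) {
    stop("Classes with fewer than ", min_n, " observations: ",
         paste(too_small, collapse = ", "))
  }
  invisible(counts)
}

check_class_sizes(c("a", "a", "b", "b", "b"))  # passes silently
try(check_class_sizes(c("a", "a", "b")))       # errors: class "b" has 1 observation
```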

The vector `prior` contains the *a priori* class membership probabilities for each class. If `prior` is NULL (default), the class membership probabilities are estimated as the sample proportions of observations belonging to each class. Otherwise, `prior` should be a vector with the same length as the number of classes in `y`. The `prior` probabilities should be nonnegative and sum to one.
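The rules for `prior` can be expressed as a small validation function. This is a sketch of the documented behavior, not the package's code; `resolve_prior()` is hypothetical, and the sum-to-one check uses `all.equal()` so that floating-point rounding in user-supplied values is tolerated.

```r
# Hypothetical helper: NULL -> sample proportions; otherwise one
# nonnegative probability per class, summing to one.
resolve_prior <- function(y, prior = NULL) {
  y <- factor(y)
  if (is.null(prior)) {
    return(as.vector(table(y)) / length(y))
  }
  if (length(prior) != nlevels(y)) {
    stop("'prior' must have one entry per class in 'y'.")
  }
  if (any(prior < 0) || !isTRUE(all.equal(sum(prior), 1))) {
    stop("'prior' must be nonnegative and sum to one.")
  }
  prior
}

y <- c("a", "a", "a", "b")
resolve_prior(y)                       # sample proportions: 0.75 0.25
resolve_prior(y, prior = c(0.5, 0.5))  # user-supplied priors returned as-is
```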

```r
library(sparsediscrim)
library(modeldata)
data(penguins, package = "modeldata")

# Hold out every 20th observation for prediction
pred_rows <- seq(1, 344, by = 20)
penguins <- penguins[, c("species", "body_mass_g", "flipper_length_mm")]

# Formula interface
lda_pseudo_out <- lda_pseudo(species ~ ., data = penguins[-pred_rows, ])
predicted <- predict(lda_pseudo_out, penguins[pred_rows, -1], type = "class")

# Default interface with x and y supplied separately
lda_pseudo_out2 <- lda_pseudo(x = penguins[-pred_rows, -1], y = penguins$species[-pred_rows])
predicted2 <- predict(lda_pseudo_out2, penguins[pred_rows, -1], type = "class")

all.equal(predicted, predicted2)
#> [1] TRUE
```