The caret Package

10 Random Hyperparameter Search

The default method for optimizing tuning parameters in train is grid search. This approach is usually effective, but it can be inefficient when there are many tuning parameters. One alternative is a combination of grid search and racing; another is a random selection of tuning parameter combinations, which covers the parameter space less completely but at lower cost.

There are a number of models where random search can find reasonable values of the tuning parameters in a relatively short time. However, for some models the efficiency gained from a smaller search can be cancelled out by the loss of other optimizations. For example, a number of models in caret utilize the “sub-model trick”: although M tuning parameter combinations are evaluated, potentially far fewer than M model fits are required. This approach is best leveraged when a simple grid search is used, so it may be inefficient to use random search for the following model codes: ada, AdaBag, AdaBoost.M1, bagEarth, blackboost, blasso, BstLm, bstSm, bstTree, C5.0, C5.0Cost, cubist, earth, enet, foba, gamboost, gbm, glmboost, glmnet, kernelpls, lars, lars2, lasso, lda2, leapBackward, leapForward, leapSeq, LogitBoost, pam, partDSA, pcr, PenalizedLDA, pls, relaxo, rfRules, rotationForest, rotationForestCp, rpart, rpart2, rpartCost, simpls, spikeslab, superpc, widekernelpls, xgbDART, xgbTree.
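
To make the trick concrete, here is a minimal sketch using the gbm package directly (this example is not from the original text and the parameter values are illustrative only): a single boosted model fit at the largest number of trees yields predictions for every smaller ensemble size, so evaluating a grid over n.trees costs roughly one model fit.

library(gbm)
library(mlbench)
data(Sonar)

# encode the two-class outcome as 0/1 and separate it from the predictors
sonar_x <- Sonar[, names(Sonar) != "Class"]
sonar_y <- ifelse(Sonar$Class == "M", 1, 0)

set.seed(1)  # illustrative seed
gbm_fit <- gbm.fit(sonar_x, sonar_y,
                   distribution = "bernoulli",
                   n.trees = 500,          # fit once at the largest value
                   interaction.depth = 2,
                   shrinkage = 0.1,
                   verbose = FALSE)

# the single 500-tree fit gives predictions for three "sub-models" at once
preds <- predict(gbm_fit, newdata = sonar_x, n.trees = c(100, 300, 500))
dim(preds)  # one column of predictions per requested ensemble size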

Finally, many of the models wrapped by train have only a small number of tuning parameters (the average across models is two), so an exhaustive grid is often inexpensive anyway.

To use random search, set the search option of trainControl; possible values of this argument are "grid" and "random". The built-in models in caret include code to generate random tuning parameter combinations, and the total number of unique combinations to evaluate is specified by the tuneLength option to train.
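
As an aside, the random-grid code attached to each model can be inspected with caret's getModelInfo function. A brief sketch (the seed and the iris inputs are placeholders; the grid element is the function train calls internally when search = "random"):

library(caret)

# pull the model module for regularized discriminant analysis
rda_info <- getModelInfo("rda", regex = FALSE)[[1]]

set.seed(123)  # placeholder seed
# draw five random gamma/lambda combinations, as train() would internally
rda_info$grid(x = iris[, 1:4], y = iris$Species, len = 5, search = "random")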

Again, we will use the Sonar data from the previous training page to demonstrate random search with regularized discriminant analysis, looking at a total of 30 tuning parameter combinations:

library(mlbench)
data(Sonar)

library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           search = "random")  # draw random parameter combinations

set.seed(825)
rda_fit <- train(Class ~ ., data = training, 
                  method = "rda",
                  metric = "ROC",
                  tuneLength = 30,
                  trControl = fitControl)
rda_fit
## Regularized Discriminant Analysis 
## 
## 157 samples
##  60 predictor
##   2 classes: 'M', 'R' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 141, 142, 141, 142, 141, 142, ... 
## Resampling results across tuning parameters:
## 
##   gamma       lambda       ROC        Sens       Spec     
##   0.03177874  0.767664044  0.8662029  0.7983333  0.7600000
##   0.03868192  0.499283304  0.8526513  0.8120833  0.7600000
##   0.11834801  0.974493793  0.8379266  0.7780556  0.7428571
##   0.12391186  0.018063038  0.8321825  0.8112500  0.7233929
##   0.13442487  0.868918547  0.8590501  0.8122222  0.7528571
##   0.19249104  0.335761243  0.8588070  0.8577778  0.7030357
##   0.23568481  0.064135040  0.8465402  0.8372222  0.7026786
##   0.23814584  0.986270274  0.8363070  0.7623611  0.7532143
##   0.25082994  0.674919744  0.8700918  0.8588889  0.7010714
##   0.28285931  0.576888058  0.8706250  0.8650000  0.6871429
##   0.29099029  0.474277013  0.8681548  0.8687500  0.6844643
##   0.29601805  0.002963208  0.8465476  0.8419444  0.6973214
##   0.31717364  0.943120266  0.8440030  0.7863889  0.7444643
##   0.33633553  0.283586169  0.8650794  0.8626389  0.6878571
##   0.41798776  0.881581948  0.8540253  0.8076389  0.7346429
##   0.45885413  0.701431940  0.8704588  0.8413889  0.7026786
##   0.48684373  0.545997273  0.8713442  0.8638889  0.6758929
##   0.48845661  0.377704420  0.8700818  0.8783333  0.6566071
##   0.51491517  0.592224877  0.8705903  0.8509722  0.6789286
##   0.53206420  0.339941226  0.8694320  0.8795833  0.6523214
##   0.54020648  0.253930177  0.8673239  0.8747222  0.6546429
##   0.56009903  0.183772303  0.8652059  0.8709722  0.6573214
##   0.56472058  0.995162379  0.8354911  0.7550000  0.7489286
##   0.58045730  0.773613530  0.8612922  0.8262500  0.7089286
##   0.67085142  0.287354882  0.8686062  0.8781944  0.6444643
##   0.69503284  0.348973440  0.8694742  0.8805556  0.6417857
##   0.72206263  0.653406920  0.8635937  0.8331944  0.6735714
##   0.76035804  0.183676074  0.8642560  0.8769444  0.6303571
##   0.86234436  0.272931617  0.8545412  0.8588889  0.6030357
##   0.98847635  0.580160726  0.7383358  0.7097222  0.6169643
## 
## ROC was used to select the optimal model using the largest value.
## The final values used for the model were gamma = 0.4868437 and lambda
##  = 0.5459973.
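
The selected combination is also stored in the fitted object, and the final model (which train refits on the full training set) can be used for prediction directly. A brief sketch (output not shown):

# tuning parameter values chosen by resampling
rda_fit$bestTune

# predict() automatically uses the final model
rda_pred <- predict(rda_fit, newdata = testing)
confusionMatrix(rda_pred, testing$Class)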

There is currently only a ggplot method for these results (no basic plot method). Its output with random searching depends on the number and type of tuning parameters; in this case, it produces a scatter plot of the two continuous parameters.

ggplot(rda_fit) + theme(legend.position = "top")