The caret Package

10 Random Hyperparameter Search

The default method for optimizing tuning parameters in train is grid search. This approach is usually effective, but it can be inefficient when there are many tuning parameters. One alternative is a combination of grid search and racing; another is a random selection of tuning parameter combinations, which covers the parameter space less completely but at lower cost.

There are a number of models where random search can find reasonable values of the tuning parameters in a relatively short time. However, for some models the efficiency gained from a smaller search can be cancelled out by the loss of other optimizations. For example, a number of models in caret utilize the “sub-model trick”: although M tuning parameter combinations are evaluated, potentially far fewer than M model fits are required. This approach is best leveraged when a simple grid search is used, so it may be inefficient to use random search for the following model codes: ada, AdaBag, AdaBoost.M1, bagEarth, blackboost, blasso, BstLm, bstSm, bstTree, C5.0, C5.0Cost, cubist, earth, enet, foba, gamboost, gbm, glmboost, glmnet, kernelpls, lars, lars2, lasso, lda2, leapBackward, leapForward, leapSeq, LogitBoost, pam, partDSA, pcr, PenalizedLDA, pls, relaxo, rfRules, rotationForest, rotationForestCp, rpart, rpart2, rpartCost, simpls, spikeslab, superpc, widekernelpls, xgbDART, xgbTree.
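
To make the trick concrete, here is a minimal sketch using the gbm package directly (this example is not from the original text and the parameter values are illustrative only): a single boosted model fit at the largest number of trees yields predictions for every smaller ensemble size, so evaluating a grid over n.trees costs roughly one model fit.

library(gbm)
library(mlbench)
data(Sonar)

# encode the two-class outcome as 0/1 and separate it from the predictors
sonar_x <- Sonar[, names(Sonar) != "Class"]
sonar_y <- ifelse(Sonar$Class == "M", 1, 0)

set.seed(1)  # illustrative seed
gbm_fit <- gbm.fit(sonar_x, sonar_y,
                   distribution = "bernoulli",
                   n.trees = 500,          # fit once at the largest value
                   interaction.depth = 2,
                   shrinkage = 0.1,
                   verbose = FALSE)

# the single 500-tree fit gives predictions for three "sub-models" at once
preds <- predict(gbm_fit, newdata = sonar_x, n.trees = c(100, 300, 500))
dim(preds)  # one column of predictions per requested ensemble size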

Finally, many of the models wrapped by train have only a small number of tuning parameters (the average across models is two), so an exhaustive grid is often inexpensive anyway.

To use random search, set the search option of trainControl; possible values of this argument are "grid" and "random". The built-in models in caret include code to generate random tuning parameter combinations, and the total number of unique combinations to evaluate is specified by the tuneLength option to train.
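
As an aside, the random-grid code attached to each model can be inspected with caret's getModelInfo function. A brief sketch (the seed and the iris inputs are placeholders; the grid element is the function train calls internally when search = "random"):

library(caret)

# pull the model module for regularized discriminant analysis
rda_info <- getModelInfo("rda", regex = FALSE)[[1]]

set.seed(123)  # placeholder seed
# draw five random gamma/lambda combinations, as train() would internally
rda_info$grid(x = iris[, 1:4], y = iris$Species, len = 5, search = "random")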

Again, we will use the Sonar data from the previous training page to demonstrate random search with regularized discriminant analysis, looking at a total of 30 tuning parameter combinations:

library(mlbench)
data(Sonar)

library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           search = "random")  # draw random parameter combinations

set.seed(825)
rda_fit <- train(Class ~ ., data = training, 
                  method = "rda",
                  metric = "ROC",
                  tuneLength = 30,
                  trControl = fitControl)
rda_fit
## Regularized Discriminant Analysis 
## 
## 157 samples
##  60 predictor
##   2 classes: 'M', 'R' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 141, 142, 141, 142, 141, 142, ... 
## Resampling results across tuning parameters:
## 
##   gamma       lambda       ROC        Sens       Spec     
##   0.03177874  0.767664044  0.8662029  0.7983333  0.7600000
##   0.03868192  0.499283304  0.8526513  0.8120833  0.7600000
##   0.11834801  0.974493793  0.8379266  0.7780556  0.7428571
##   0.12391186  0.018063038  0.8321825  0.8112500  0.7233929
##   0.13442487  0.868918547  0.8590501  0.8122222  0.7528571
##   0.19249104  0.335761243  0.8588070  0.8577778  0.7030357
##   0.23568481  0.064135040  0.8465402  0.8372222  0.7026786
##   0.23814584  0.986270274  0.8363070  0.7623611  0.7532143
##   0.25082994  0.674919744  0.8700918  0.8588889  0.7010714
##   0.28285931  0.576888058  0.8706250  0.8650000  0.6871429
##   0.29099029  0.474277013  0.8681548  0.8687500  0.6844643
##   0.29601805  0.002963208  0.8465476  0.8419444  0.6973214
##   0.31717364  0.943120266  0.8440030  0.7863889  0.7444643
##   0.33633553  0.283586169  0.8650794  0.8626389  0.6878571
##   0.41798776  0.881581948  0.8540253  0.8076389  0.7346429
##   0.45885413  0.701431940  0.8704588  0.8413889  0.7026786
##   0.48684373  0.545997273  0.8713442  0.8638889  0.6758929
##   0.48845661  0.377704420  0.8700818  0.8783333  0.6566071
##   0.51491517  0.592224877  0.8705903  0.8509722  0.6789286
##   0.53206420  0.339941226  0.8694320  0.8795833  0.6523214
##   0.54020648  0.253930177  0.8673239  0.8747222  0.6546429
##   0.56009903  0.183772303  0.8652059  0.8709722  0.6573214
##   0.56472058  0.995162379  0.8354911  0.7550000  0.7489286
##   0.58045730  0.773613530  0.8612922  0.8262500  0.7089286
##   0.67085142  0.287354882  0.8686062  0.8781944  0.6444643
##   0.69503284  0.348973440  0.8694742  0.8805556  0.6417857
##   0.72206263  0.653406920  0.8635937  0.8331944  0.6735714
##   0.76035804  0.183676074  0.8642560  0.8769444  0.6303571
##   0.86234436  0.272931617  0.8545412  0.8588889  0.6030357
##   0.98847635  0.580160726  0.7383358  0.7097222  0.6169643
## 
## ROC was used to select the optimal model using the largest value.
## The final values used for the model were gamma = 0.4868437 and lambda
##  = 0.5459973.
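
The selected combination is also stored in the fitted object, and the final model (which train refits on the full training set) can be used for prediction directly. A brief sketch (output not shown):

# tuning parameter values chosen by resampling
rda_fit$bestTune

# predict() automatically uses the final model
rda_pred <- predict(rda_fit, newdata = testing)
confusionMatrix(rda_pred, testing$Class)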

There is currently only a ggplot method for these results (no basic plot method). Its output with random searching depends on the number and type of tuning parameters; in this case, it produces a scatter plot of the two continuous parameters.

ggplot(rda_fit) + theme(legend.position = "top")