This function calculates the variable importance (aka attribute usage) for C5.0 models.
C5imp(object, metric = "usage", pct = TRUE, ...)
Returns a data frame with a column Overall holding the predictor usage
values. The row names indicate the predictor.
By default, C5.0 measures predictor importance by determining the percentage
of training set samples that fall into all the terminal nodes after the
split (this is used when metric = "usage"). For example, the
predictor in the first split automatically has an importance measurement of
100 percent, because every training set sample passes through that split.
Other predictors may be used frequently in splits, but if the
terminal nodes cover only a handful of training set samples, the importance
scores may be close to zero. The same strategy is applied to rule-based
models as well as the corresponding boosted versions of the model.
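The usage calculation described above can be sketched on a toy example. This is only an illustration in base R, not the package's internal code: the `paths` list, the predictor names A to D, and the ten samples are all hypothetical. Each element of `paths` lists the predictors tested between the root and the terminal node that one training sample lands in.

```r
# Hypothetical data: predictors tested on each training sample's path
# from the root to its terminal node (10 samples in total).
paths <- list(
  "A", "A", "A", "A", "A", "A",          # stop after the first split
  c("A", "B"), c("A", "B"), c("A", "B"), # pass through a second split on B
  c("A", "B", "C")                       # one sample also reaches a split on C
)
predictors <- c("A", "B", "C", "D")      # D is never used in a split

# Usage = percentage of training samples whose path tests the predictor.
usage <- sapply(predictors, function(p) {
  100 * mean(vapply(paths, function(path) p %in% path, logical(1)))
})

# Same shape as the documented return value: one Overall column,
# predictors as row names, sorted from most to least used.
imp <- data.frame(Overall = usage, row.names = predictors)
imp <- imp[order(-imp$Overall), , drop = FALSE]
print(imp)
```

In this sketch the root predictor A covers all ten samples and scores 100, B covers four and scores 40, C covers one and scores 10, and the unused predictor D scores 0.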
The attribute usage numbers in this output differ from those in the standard command-line output. The calculations are almost exactly the same (we do not add 1/2 to everything), but the C code does not report an attribute as used if the percentage of training samples covered by the corresponding splits is very low. Here, the threshold was lowered and the fractional usage is shown.
When metric = "splits", the percentage of splits associated with each
predictor is calculated.
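As a rough sketch of the "splits" metric, the share of splits per predictor can be computed from a list of the variables used at each split. The vector `split_vars` and the predictor names below are hypothetical, and this is not the package's own implementation.

```r
# Hypothetical data: the predictor tested at each of five splits in a tree.
predictors <- c("A", "B", "C", "D")
split_vars <- c("A", "B", "A", "C", "A")

# Count splits per predictor, keeping predictors that were never used.
counts <- table(factor(split_vars, levels = predictors))

# Convert the counts to a percentage of all splits.
split_pct <- 100 * as.numeric(counts) / sum(counts)
names(split_pct) <- predictors

imp_splits <- data.frame(Overall = split_pct, row.names = predictors)
print(imp_splits)
```

Here A accounts for three of five splits (60), B and C for one each (20), and D for none (0). The values in the "splits" output in the examples are consistent with the same arithmetic: 26.923077 matches 7 of 26 splits, and 3.846154 matches 1 of 26.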
Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html
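library(C50)        # provides C5.0() and C5imp()
library(modeldata)  # provides the mlc_churn data set
data(mlc_churn)
# Fit a single tree on the first 3333 rows; column 20 (churn) is the outcome
treeModel <- C5.0(x = mlc_churn[1:3333, -20], y = mlc_churn$churn[1:3333])
# Default metric: percentage of training samples covered ("usage")
C5imp(treeModel)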
#>                               Overall
#> total_day_minutes              100.00
#> number_customer_service_calls   93.67
#> international_plan              87.73
#> total_eve_charge                20.73
#> voice_mail_plan                  8.97
#> total_intl_calls                 8.01
#> total_intl_minutes               6.48
#> total_night_minutes              6.33
#> total_eve_minutes                4.74
#> total_eve_calls                  0.57
#> account_length                   0.18
#> total_day_charge                 0.18
#> state                            0.00
#> area_code                        0.00
#> number_vmail_messages            0.00
#> total_day_calls                  0.00
#> total_night_calls                0.00
#> total_night_charge               0.00
#> total_intl_charge                0.00
C5imp(treeModel, metric = "splits")
#>                                 Overall
#> total_day_minutes             26.923077
#> total_eve_charge              15.384615
#> total_night_minutes           15.384615
#> international_plan             7.692308
#> voice_mail_plan                7.692308
#> account_length                 3.846154
#> number_customer_service_calls  3.846154
#> total_day_charge               3.846154
#> total_eve_calls                3.846154
#> total_eve_minutes              3.846154
#> total_intl_calls               3.846154
#> total_intl_minutes             3.846154
#> state                          0.000000
#> area_code                      0.000000
#> number_vmail_messages          0.000000
#> total_day_calls                0.000000
#> total_night_calls              0.000000
#> total_night_charge             0.000000
#> total_intl_charge              0.000000
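Because the result is an ordinary data frame with predictors as row names, it can be filtered with base R. The sketch below rebuilds a few rows of the usage output shown above by hand so that it runs without a fitted model. In practice `imp` would come from `C5imp(treeModel)`, and the names `used` and `top` are illustrative only.

```r
# A few rows copied from the "usage" output above.
imp <- data.frame(
  Overall = c(100.00, 93.67, 0.18, 0.00),
  row.names = c("total_day_minutes", "number_customer_service_calls",
                "account_length", "state")
)

# Predictors that cover at least some training samples.
used <- rownames(imp)[imp$Overall > 0]
print(used)

# The two most important predictors (rows are already sorted).
top <- head(imp, 2)
print(top)
```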