Various parameters that control aspects of the C5.0 fit.
C5.0Control(
  subset = TRUE,
  bands = 0,
  winnow = FALSE,
  noGlobalPruning = FALSE,
  CF = 0.25,
  minCases = 2,
  fuzzyThreshold = FALSE,
  sample = 0,
  seed = sample.int(4096, size = 1) - 1L,
  earlyStopping = TRUE,
  label = "outcome"
)
subset: A logical: should the model evaluate groups of discrete predictors for splits? Note: the C5.0 command-line version defaults this parameter to FALSE, meaning no attempted groupings are evaluated during the tree-growing stage.
bands: An integer between 2 and 1000. When set, the model orders the rules by their effect on the error rate and groups the rules into the specified number of bands. This modifies the output so that the effect on the error rate can be seen for the groups of rules within a band. If this option is used together with rules = FALSE in C5.0(), a warning is issued and rules is changed to TRUE. The default of 0 means no banding.
winnow: A logical: should predictor winnowing (i.e., feature selection) be used?
noGlobalPruning: A logical: if TRUE, the final, global pruning step that simplifies the tree is skipped.
CF: A number in (0, 1) for the confidence factor used in pruning; smaller values cause more of the tree to be pruned.
minCases: An integer for the smallest number of samples that must be put in at least two of the splits.
fuzzyThreshold: A logical toggle to evaluate possible advanced splits of the data. See Quinlan (1993) for details and examples.
sample: A value between 0 and 0.999 that specifies the random proportion of the data that should be used to train the model. By default (sample = 0), all the samples are used for model training. Samples not used for training are used to evaluate the accuracy of the model in the printed output.
seed: An integer for the random number seed within the C code.
earlyStopping: A logical to toggle whether the internal method for stopping boosting early should be used.
label: A character label for the outcome used in the output.
C5.0Control() returns a list of these options.
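As a small sketch of the return value, the call below builds a control list with a stricter confidence factor, a larger minimum number of cases, and winnowing turned on, then inspects it. It assumes the C50 package is installed and that the list elements are named after the arguments (check names(ctrl) if your installed version differs); the specific values are illustrative only.

```r
library(C50)

# Collect non-default settings. A smaller CF prunes more aggressively,
# and minCases = 10 asks for larger groups in the splits.
# Assumption: seed = 1 is an arbitrary illustrative value.
ctrl <- C5.0Control(CF = 0.1, minCases = 10, winnow = TRUE, seed = 1)

# Nothing is fit here: the settings are only stored in a list that is
# later passed to C5.0() through its `control` argument.
str(ctrl)
```

The same list can then be reused across several C5.0() calls so that every fit shares identical settings.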
Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html
library(C50)
library(modeldata)
data(mlc_churn)

# Column 20 is the churn outcome, so it is dropped from the predictors.
treeModel <- C5.0(x = mlc_churn[1:3333, -20],
                  y = mlc_churn$churn[1:3333],
                  control = C5.0Control(winnow = TRUE))
summary(treeModel)
#>
#> Call:
#> C5.0.default(x = mlc_churn[1:3333, -20], y = mlc_churn$churn[1:3333], control
#> = C5.0Control(winnow = TRUE))
#>
#>
#> C5.0 [Release 2.07 GPL Edition] Wed Feb 8 19:59:18 2023
#> -------------------------------
#>
#> Class specified by attribute `outcome'
#>
#> Read 3333 cases (20 attributes) from undefined.data
#>
#> 4 attributes winnowed
#> Estimated importance of remaining attributes:
#>
#> 51% total_day_minutes
#> 40% international_plan
#> 32% total_eve_charge
#> 25% voice_mail_plan
#> 22% number_customer_service_calls
#> 20% total_intl_calls
#> 18% total_intl_minutes
#> 16% total_day_charge
#> 9% total_eve_minutes
#> <1% state
#> <1% account_length
#> <1% area_code
#> <1% total_eve_calls
#> <1% total_night_minutes
#> <1% total_night_calls
#>
#> Decision tree:
#>
#> total_day_minutes > 264.4:
#> :...voice_mail_plan = yes:
#> :   :...international_plan = no: no (45/1)
#> :   :   international_plan = yes: yes (8/3)
#> :   voice_mail_plan = no:
#> :   :...total_eve_minutes > 187.7:
#> :       :...total_night_minutes > 126.9: yes (94/1)
#> :       :   total_night_minutes <= 126.9:
#> :       :   :...total_day_minutes <= 277: no (4)
#> :       :       total_day_minutes > 277: yes (3)
#> :       total_eve_minutes <= 187.7:
#> :       :...total_eve_charge <= 12.26: no (15/1)
#> :           total_eve_charge > 12.26:
#> :           :...total_day_minutes <= 277:
#> :               :...total_night_minutes <= 224.8: no (13)
#> :               :   total_night_minutes > 224.8: yes (5/1)
#> :               total_day_minutes > 277:
#> :               :...total_night_minutes > 151.9: yes (18)
#> :                   total_night_minutes <= 151.9:
#> :                   :...account_length <= 123: no (4)
#> :                       account_length > 123: yes (2)
#> total_day_minutes <= 264.4:
#> :...number_customer_service_calls > 3:
#>     :...total_day_minutes <= 160.2:
#>     :   :...total_eve_charge <= 19.83: yes (79/3)
#>     :   :   total_eve_charge > 19.83:
#>     :   :   :...total_day_minutes <= 120.5: yes (10)
#>     :   :       total_day_minutes > 120.5: no (13/3)
#>     :   total_day_minutes > 160.2:
#>     :   :...total_eve_charge > 12.05: no (130/24)
#>     :       total_eve_charge <= 12.05:
#>     :       :...total_eve_calls <= 125: yes (16/2)
#>     :           total_eve_calls > 125: no (3)
#>     number_customer_service_calls <= 3:
#>     :...international_plan = yes:
#>         :...total_intl_calls <= 2: yes (51)
#>         :   total_intl_calls > 2:
#>         :   :...total_intl_minutes <= 13.1: no (173/7)
#>         :       total_intl_minutes > 13.1: yes (43)
#>         international_plan = no:
#>         :...total_day_minutes <= 223.2: no (2221/60)
#>             total_day_minutes > 223.2:
#>             :...total_eve_charge <= 20.5: no (295/22)
#>                 total_eve_charge > 20.5:
#>                 :...voice_mail_plan = yes: no (20)
#>                     voice_mail_plan = no:
#>                     :...total_night_minutes > 174.2: yes (50/8)
#>                         total_night_minutes <= 174.2:
#>                         :...total_day_minutes <= 246.6: no (12)
#>                             total_day_minutes > 246.6:
#>                             :...total_day_charge <= 43.33: yes (4)
#>                                 total_day_charge > 43.33: no (2)
#>
#>
#> Evaluation on training data (3333 cases):
#>
#>      Decision Tree
#>    ----------------
#>    Size      Errors
#>
#>      27  136( 4.1%)   <<
#>
#>
#>     (a)   (b)    <-classified as
#>    ----  ----
#>     365   118    (a): class yes
#>      18  2832    (b): class no
#>
#>
#> Attribute usage:
#>
#> 100.00% total_day_minutes
#> 93.67% number_customer_service_calls
#> 87.73% international_plan
#> 20.73% total_eve_charge
#> 8.97% voice_mail_plan
#> 8.01% total_intl_calls
#> 6.48% total_intl_minutes
#> 6.33% total_night_minutes
#> 4.74% total_eve_minutes
#> 0.57% total_eve_calls
#> 0.18% account_length
#> 0.18% total_day_charge
#>
#>
#> Time: 0.0 secs
#>
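The sample and seed arguments work together: the training subsample is drawn inside the C code using seed, so fixing seed should make the subsample, and therefore the fitted tree, repeatable. The sketch below assumes the same mlc_churn data as the example above; the 70% proportion and the seed value 2023 are arbitrary illustrative choices. It fits the model twice with one control list and checks that the class predictions agree.

```r
library(C50)
library(modeldata)
data(mlc_churn)

x <- mlc_churn[1:3333, -20]   # predictors (column 20 is churn)
y <- mlc_churn$churn[1:3333]  # outcome

# Train on a random 70% of the rows; the remaining rows are used to
# report accuracy in the printed summary. Fixing the seed is expected
# to reproduce the same subsample on every call.
ctrl <- C5.0Control(sample = 0.7, seed = 2023)

fit_a <- C5.0(x = x, y = y, control = ctrl)
fit_b <- C5.0(x = x, y = y, control = ctrl)

# Predict classes for all rows (training and held-out alike).
pred_a <- predict(fit_a, newdata = x)
pred_b <- predict(fit_b, newdata = x)
identical(pred_a, pred_b)
```

Leaving seed at its default draws a new value with sample.int() on each call, so repeated fits with sample > 0 may then use different subsamples.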