model <- linear_reg() %>% set_engine("lm") %>% fit(mpg ~ ., mtcars)
Why tidymodels? Consistency
With glmnet:
model <- glmnet(as.matrix(mtcars[2:11]), mtcars$mpg)
With tidymodels:
model <- linear_reg() %>% set_engine("glmnet") %>% fit(mpg ~ ., mtcars)
With h2o:
h2o.init()
as.h2o(mtcars, "cars")
model <- h2o.glm(x = colnames(mtcars[2:11]), y = "mpg", "cars")
With tidymodels:
model <- linear_reg() %>% set_engine("h2o") %>% fit(mpg ~ ., mtcars)
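The consistency argument can be illustrated with a minimal sketch that swaps engines while everything else stays fixed. It assumes the parsnip package is installed and uses the "lm" and "glm" engines only because both ship with base R; the helper name fit_with_engine is hypothetical and not part of tidymodels.

```r
library(parsnip)

# Hypothetical helper: the same model specification, with only the
# engine name changing between calls.
fit_with_engine <- function(engine) {
  linear_reg() %>%
    set_engine(engine) %>%
    fit(mpg ~ ., data = mtcars)
}

fit_lm  <- fit_with_engine("lm")   # ordinary least squares via stats::lm
fit_glm <- fit_with_engine("glm")  # the same linear model via stats::glm

# predict() takes the same arguments and returns the same shape for both
# engines: a tibble with a .pred column, one row per row of new_data.
pred_lm  <- predict(fit_lm,  new_data = mtcars)
pred_glm <- predict(fit_glm, new_data = mtcars)
```

Because the interface is shared, code that consumes pred_lm works unchanged on pred_glm.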
Why tidymodels? Safety
Overfitting leads to analysts believing models are more performant than they actually are.
A 2023 review found data leakage to be "a widespread failure mode in machine-learning (ML)-based science."
Different implementations of the same machine learning model can give differing results, which makes modeling results irreproducible.
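One way these failure modes are commonly guarded against is to hold out a test set and estimate every preprocessing step on the training data only. The following is a rough sketch of that pattern, assuming the tidymodels meta-package is installed; the object names (car_split, car_rec, car_wf) and the choice of a normalization step are illustrative assumptions, not taken from the talk.

```r
library(tidymodels)

set.seed(1)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# The recipe is only a plan here; its centering and scaling statistics are
# estimated later, when the workflow is fit on the training data.
car_rec <- recipe(mpg ~ ., data = car_train) %>%
  step_normalize(all_numeric_predictors())

car_wf <- workflow() %>%
  add_recipe(car_rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Preprocessing and model are estimated together, on training rows only.
car_fit <- fit(car_wf, data = car_train)

# The test rows are transformed with the training statistics, so no
# information from the test set leaks into the fitted workflow.
test_pred <- predict(car_fit, new_data = car_test)
```

Bundling the recipe and the model in one workflow means there is no separate, manually applied preprocessing step that could accidentally see the test rows.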
Some of the resistance I've seen to tidymodels comes from a place of "This makes it too easy; you're not thinking carefully about what the code is doing!" But I think this is getting it backwards.
By removing the burden of writing procedural logic, I get to focus on scientific and statistical questions about my data and model.
collect_metrics(lm_res)
#> # A tibble: 2 × 6
#>   .metric .estimator  mean     n std_err .config
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>
#> 1 rmse    standard   2.39      1      NA Preprocessor1_Model1
#> 2 rsq     standard   0.881     1      NA Preprocessor1_Model1
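For readers unfamiliar with the two metrics in this table, here is a small base R sketch of how they are defined. It does not reproduce lm_res, whose construction is not shown here; it fits a plain lm() to mtcars purely as a stand-in, so the numbers will differ from those above. The rsq line follows yardstick's definition, the squared correlation between truth and estimate.

```r
# Stand-in model and data (an assumption; lm_res itself is not rebuilt here).
ols_fit  <- lm(mpg ~ ., data = mtcars)
truth    <- mtcars$mpg
estimate <- unname(predict(ols_fit, newdata = mtcars))

# Root mean squared error: the typical size of a prediction error,
# expressed in the units of the outcome.
rmse_value <- sqrt(mean((truth - estimate)^2))

# R-squared as the squared correlation between observed and predicted values.
rsq_value <- cor(truth, estimate)^2
```

With n equal to 1, as in the table, each metric comes from a single set of predictions, so no standard error can be computed and std_err is NA.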
Does it work better?
cal_plot_regression(lm_res)
Not perfect.
The probably package has better tools to visualize and mitigate calibration issues.
Next Steps
We would:
do some exploratory data analyses to figure out better features.
tune the model (e.g., optimize the number of spline terms) (TMwR)
try a different estimation method (Bayesian, robust, etc.)
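The tuning step in the list above could look roughly like the following sketch. It assumes the tidymodels meta-package is installed and uses mtcars with a natural spline on disp purely for illustration; the data, the predictors, and the grid of 2 to 5 spline terms are assumptions and are not taken from the analysis behind lm_res.

```r
library(tidymodels)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# deg_free = tune() marks the number of spline terms as a value to optimize
# instead of fixing it in advance.
spline_rec <- recipe(mpg ~ disp + wt, data = mtcars) %>%
  step_ns(disp, deg_free = tune())

spline_wf <- workflow() %>%
  add_recipe(spline_rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Each candidate value of deg_free is evaluated on every resample.
spline_res <- tune_grid(
  spline_wf,
  resamples = folds,
  grid = tibble(deg_free = 2:5)
)

# Pick the candidate with the lowest resampled RMSE.
best_params <- select_best(spline_res, metric = "rmse")
```

From here, finalize_workflow(spline_wf, best_params) would fill in the chosen value before a final fit.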