class: title-slide, left, middle
background-image: url("images/tidymodels.svg")
background-position: 85% 50%
background-size: 30%
background-color: #F9F8F3

.pull-left[

# What's New and Upcoming in tidymodels

### Max Kuhn (RStudio, PBC)

### https://github.com/topepo/2021_NY_RUG

]

---

# What is tidymodels?

The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.

If you are getting started, we suggest taking a look at:

.code70[

.pull-left-a-lot[

```r
library(tidymodels)
```

```
## ── Attaching packages ────────────────── tidymodels 0.1.4 ──
```

```
## ✓ broom        0.7.9            ✓ recipes      0.1.17.9000
## ✓ dials        0.0.10.9000      ✓ rsample      0.1.0
## ✓ dplyr        1.0.7            ✓ tibble       3.1.6
## ✓ ggplot2      3.3.5            ✓ tidyr        1.1.4
## ✓ infer        1.0.0            ✓ tune         0.1.6.9001
## ✓ modeldata    0.1.1            ✓ workflows    0.2.4.9000
## ✓ parsnip      0.1.7.9003       ✓ workflowsets 0.1.0
## ✓ purrr        0.3.4            ✓ yardstick    0.0.8
```

```
## ── Conflicts ───────────────────── tidymodels_conflicts() ──
## x purrr::discard() masks scales::discard()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## x recipes::step()  masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
```

]

]

.pull-right-a-little[

* [`tidymodels.org`](https://www.tidymodels.org/)

* _Tidy Modeling with R_ ([`tmwr.org`](https://www.tmwr.org/))

]

---
layout: false
class: inverse, middle, center

# What's New?

---

# Documentation

* We now do quarterly blog posts at [`tidyverse.org/blog`](https://www.tidyverse.org/blog/)

* `TMwR` is nearly complete (and we are working with a publisher)

* <span class="pkg">parsnip</span> model documentation

  - There are now engine-specific pages for each model/engine combination.
  - Dynamic based on loaded packages (this may change).
  - We will update to include the new mode of "censored regression".

---

# Default engines

All model specifications in <span class="pkg">parsnip</span> now have default engines.
For example:

```r
args(linear_reg)
```

```
## function (mode = "regression", engine = "lm", penalty = NULL,
##     mixture = NULL)
## NULL
```

```r
args(decision_tree)
```

```
## function (mode = "unknown", engine = "rpart", cost_complexity = NULL,
##     tree_depth = NULL, min_n = NULL)
## NULL
```

---

# Common interface for object extraction

Previously, we had a smattering of `pull_*()` functions that could get a specific item from an object (e.g., `pull_workflow()`).

There is now a set of S3 generics, `extract_*()`, that provides a single, consistent interface:

.pull-left[

* `extract_fit_engine()`
* `extract_fit_parsnip()`
* `extract_mold()`
* `extract_preprocessor()`

]

.pull-right[

* `extract_recipe()`
* `extract_spec_parsnip()`
* `extract_workflow()`
* `extract_workflow_set_result()`

]

---

# Recipes updates

* Can retain original predictors (with a common interface)
* More specific selectors
* New steps for sparse PCA (in <span class="pkg">embed</span>), multi-choice dummy variables (<span class="pkg">textrecipes</span>) and more.

```r
library(beans)
library(embed)

recipe(class ~ ., data = beans) %>%
  step_pca(all_predictors())
```

---

# Recipes updates

* Can retain original predictors (with a common interface)
* More specific selectors
* New steps for sparse PCA (in <span class="pkg">embed</span>), multi-choice dummy variables (<span class="pkg">textrecipes</span>) and more.

```r
library(beans)
library(embed)

recipe(class ~ ., data = beans) %>%
* step_pca(all_numeric_predictors())
```

---

# Recipes updates

* Can retain original predictors (with a common interface)
* More specific selectors
* New steps for sparse PCA (in <span class="pkg">embed</span>), multi-choice dummy variables (<span class="pkg">textrecipes</span>) and more.
```r
library(beans)
library(embed)

recipe(class ~ ., data = beans) %>%
* step_pca(all_numeric_predictors(), keep_original_cols = TRUE)
```

---

# Recipes updates

* Can retain original predictors (with a common interface)
* More specific selectors
* New steps for sparse PCA (in <span class="pkg">embed</span>), multi-choice dummy variables (<span class="pkg">textrecipes</span>) and more.

```r
library(beans)
library(embed)

recipe(class ~ ., data = beans) %>%
* step_pca_sparse_bayes(all_numeric_predictors(), keep_original_cols = TRUE)
```

---
layout: false
class: inverse, middle, center

# What's Cooking?

---

# Case weights

There is a lot of interest in this feature, but it is deceptively difficult:

* How do we resample?
* Do we require case weights for performance metrics?
* What do we do about recipe steps that don't handle weights?

The answers depend on the type of weights (e.g., frequency weights versus importance weights).

The work is in progress and affects all of our core packages.

---

# Censored regression

Censoring is usually a characteristic of time-to-event data. For example:

> We know that the dog has been at the shelter for 10 days but is not yet adopted.

In this case, the time-to-adoption is recorded as 10 days and is _censored_ (the true value is larger).

R is particularly strong in _survival analysis_, but

* The quality of the packages is extremely uneven.
* Some of the core packages were written without regard for developers who program with them.
* User interfaces differ wildly.

This kind of inconsistency is the _raison d'être_ for tidymodels.
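---

# Censored regression

As a rough sketch, the shelter example could be encoded with `Surv()` from the <span class="pkg">survival</span> package. The `shelter` data frame and its values are hypothetical and only illustrate the idea:

```r
library(survival)

# Hypothetical shelter records: days observed so far, and whether the
# dog was adopted (1) or is still waiting (0, i.e., censored).
shelter <- data.frame(
  days    = c(10, 4, 23),
  adopted = c(0, 1, 1)
)

# Surv() pairs each time with its event indicator.
dog_times <- with(shelter, Surv(days, adopted))
dog_times
```

Censored times print with a trailing `+`, so the first dog's outcome reads as "more than 10 days".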
---

# The censored package

The package adds a new mode of `"censored regression"`:

```r
# remotes::install_github("tidymodels/censored")
library(censored)
```

```
## Loading required package: survival
```

```r
# Fit a penalized Cox model, holding out the first three rows
sparse_cox <-
  proportional_hazards(penalty = 0.01) %>%
  set_mode("censored regression") %>%
  set_engine("glmnet") %>%
  fit(Surv(time, status) ~ age + ph.ecog + strata(sex), data = lung[-(1:3),])

# Predict survival probabilities at days 1 to 1000 for the held-out rows
time_pred <- predict(sparse_cox, lung[1:3,], type = "survival", time = 1:1000)
time_pred
```

```
## # A tibble: 3 × 1
##   .pred
##   <list>
## 1 <tibble [1,000 × 2]>
## 2 <tibble [1,000 × 2]>
## 3 <tibble [1,000 × 2]>
```

---

# The censored package

.pull-left[

```r
time_pred %>%
  mutate(sample = letters[1:3]) %>%
  unnest(cols = .pred) %>%
  ggplot(
    aes(
      x = .time,
      y = .pred_survival,
      col = sample
    )
  ) +
  geom_step()
```

]

.pull-right[

<img src="images/surv-plot-1.svg" width="90%" style="display: block; margin: auto;" />

]

---

# Current model slate

| model                 | engine   | time | survival | linear\_pred | raw | quantile | hazard |
|:----------------------|:---------|:-----|:---------|:-------------|:----|:---------|:-------|
| bag\_tree             | rpart    | ✓    | ✓        | x            | x   | x        | x      |
| boost\_tree           | mboost   | x    | ✓        | ✓            | x   | x        | x      |
| decision\_tree        | rpart    | ✓    | ✓        | x            | x   | x        | x      |
| decision\_tree        | party    | ✓    | ✓        | x            | x   | x        | x      |
| proportional\_hazards | survival | ✓    | ✓        | ✓            | x   | x        | x      |
| proportional\_hazards | glmnet   | x    | ✓        | ✓            | ✓   | x        | x      |
| rand\_forest          | party    | ✓    | ✓        | x            | x   | x        | x      |
| survival\_reg         | survival | ✓    | ✓        | x            | x   | ✓        | ✓      |
| survival\_reg         | flexsurv | ✓    | ✓        | x            | x   | ✓        | ✓      |

Give us feedback at [`rstd.io/censored-feedback`](https://rstd.io/censored-feedback)!

---

# Model operations

Julia Silge is working on this and has an initial release of the <span class="pkg">vetiver</span> package on CRAN.

* Can publish, version, and deploy different types of models using the <span class="pkg">pins</span> and <span class="pkg">plumber</span> packages.
* We will be adding features related to monitoring and other operations.

The development version works with `lm`, <span class="pkg">xgboost</span>, <span class="pkg">caret</span>, <span class="pkg">mlr3</span>, and tidymodels objects.

---

# vetiver model storage

```r
# remotes::install_github("tidymodels/vetiver")
library(vetiver)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

# Wrap the fitted workflow in a deployable model object
vet_mod <- vetiver_model(rf_fit, "sacramento_rf")
vet_mod
```

```
##
## ── sacramento_rf ─ <butchered_workflow> model for deployment
## A ranger regression modeling workflow using 4 features
```

---

# Version, share, and deploy a model

```r
library(pins)

# A temporary board; the pin is versioned on each write
model_board <- board_temp()
model_board %>% vetiver_pin_write(vet_mod)
```

```
## Creating new version '20211201T173401Z-75ac1'
```

```
## Writing to pin 'sacramento_rf'
```

```r
library(plumber)

# Serve predictions from a local plumber API
pr() %>%
  vetiver_pr_predict(vet_mod) %>%
  pr_run(port = 8088)
```

Let's look at a deployed model for predicting "movie or TV show" from a textual description.

---

# 2021 User survey results

<img src="images/survey-time.svg" width="80%" style="display: block; margin: auto;" />

---

# User experience

<img src="images/survey-users.svg" width="80%" style="display: block; margin: auto;" />

---

# User roles

<img src="images/survey-role.svg" width="80%" style="display: block; margin: auto;" />

---

# Spend a hypothetical $100

<img src="images/survey-results.svg" width="75%" style="display: block; margin: auto;" />

---

# Thanks

Thanks for the invitation to speak today!

The tidymodels team: Davis Vaughan, Julia Silge, and Hannah Frick. Emil Hvitfeldt starts on 2021/12/08.

Special thanks to the other folks who contributed so much to tidymodels: Edgar Ruiz, Alison Hill, Desirée De Leon, and the tidyverse team.
These slides were made with the [`xaringan`](https://bookdown.org/yihui/rmarkdown/xaringan.html) package and styled by Alison Hill.