- Hyperparameters are not optimized by the learning algorithm itself.
- The data analyst/ML engineer “tunes” hyperparameters by experimenting with combinations of values, one value per hyperparameter.
- Each ML model / algorithm has a unique set of hyperparameters.
- There are several popular hyperparameter tuning techniques, covered below.
- Hyperparameter tuning controls two tradeoffs
#Grid Search
- Simplest technique. Use when possible.
- Used with a few hyperparameters and short value ranges.
- Discretize each hyperparameter's range of values, then evaluate every combination of values.
- Train the final model with the best combination of hyperparameter values.
#Evaluation Criteria
- Configuring a pipeline with one combination of hyperparameter values.
- Applying the pipeline to the training data and training a model.
- Computing the performance metric for the model on the validation dataset (see the sketch below).
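A minimal sketch of this grid-search evaluation loop, assuming a scikit-learn SVC pipeline, a held-out validation set, and two hyperparameters (C and gamma); the dataset, metric, and value grids are illustrative, not from the original notes.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Discretized value ranges, one short list per hyperparameter (illustrative values).
C_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1]

best_score, best_params = -1.0, None
for C, gamma in product(C_values, gamma_values):
    # 1. Configure the pipeline with one combination of hyperparameter values.
    pipeline = make_pipeline(StandardScaler(), SVC(C=C, gamma=gamma))
    # 2. Apply the pipeline to the training data and train a model.
    pipeline.fit(X_train, y_train)
    # 3. Compute the performance metric on the validation dataset.
    score = accuracy_score(y_val, pipeline.predict(X_val))
    if score > best_score:
        best_score, best_params = score, {"C": C, "gamma": gamma}

print(best_params, best_score)
# Train the final model with the best combination of hyperparameter values.
final_model = make_pipeline(StandardScaler(), SVC(**best_params)).fit(X_train, y_train)
```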
#Random Search
- Avoids the combinatorial explosion of grid search.
- Provide a statistical distribution for each hyperparameter (e.g., uniform).
- Randomly sample values from those distributions.
- Set the total number of combinations (candidates) we want to evaluate; see the sketch below.
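A minimal random-search sketch, assuming scikit-learn's RandomizedSearchCV and SciPy distributions; note that RandomizedSearchCV scores each candidate with cross-validation (covered later in these notes). The estimator, distributions, and candidate budget are illustrative.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# A statistical distribution per hyperparameter instead of a fixed grid.
param_distributions = {
    "C": loguniform(1e-3, 1e3),      # log-uniform suits scale-like parameters
    "gamma": loguniform(1e-4, 1e0),
}

# n_iter sets the total number of candidate combinations to evaluate.
search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=25,
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```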
#Coarse-to-Fine Search
- Combination of grid search and random search
- Use a coarse random search to find the regions of high potential
- Use a fine grid search in one or more of those regions (see the sketch below).
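A minimal coarse-to-fine sketch, assuming a single hyperparameter C for brevity: a coarse random search over a wide log-uniform range locates a promising region, then a fine grid search zooms in around the best value found. All ranges and budgets are illustrative.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Coarse stage: random search over a wide range to find the region of high potential.
coarse = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-4, 1e4)},
    n_iter=20,
    cv=5,
    random_state=42,
).fit(X, y)
best_C = coarse.best_params_["C"]

# Fine stage: grid search on a narrow band around the coarse optimum.
fine_grid = {"C": np.geomspace(best_C / 10, best_C * 10, num=10)}
fine = GridSearchCV(SVC(), fine_grid, cv=5).fit(X, y)
print(fine.best_params_, fine.best_score_)
```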
#Bayesian Techniques
- Use past evaluation results to choose the next values to evaluate.
- Can be more efficient, requiring fewer evaluations overall.
- Typically only slightly better than random search in final quality, but much faster to get there (see the sketch below).
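A minimal sketch using Optuna's default TPE sampler, a Bayesian-style technique that uses past trial results to propose the next values to evaluate; the objective, search space, and trial budget are illustrative.

```python
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Each trial's values are proposed based on the results of previous trials.
    C = trial.suggest_float("C", 1e-3, 1e3, log=True)
    gamma = trial.suggest_float("gamma", 1e-4, 1e0, log=True)
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```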
#Gradient-based
- Most modern ML libraries implement one or more such techniques.
- Hyperparameter tuning libraries can be used to tune virtually any ML algorithm.
#Cross-Validation
- Grid search and other techniques require properly sized datasets.
- Rule of thumb: 100+ records, and 12+ records per class.
- With smaller datasets, we cannot afford separate training, validation, and test sets.
- We split into training and test sets, then use cross-validation on the training set.
- Use any technique with cross-validation to find the best hyperparameter values.
- We use the best values to train the final model on the entire training set.
- We assess the final model using the test set (see the sketch below).
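A minimal sketch of this workflow, assuming scikit-learn's GridSearchCV: split into training and test sets, run cross-validated grid search on the training set only, refit the best model on the entire training set, and assess it once on the test set. The grid values are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Split into training and test sets only (no separate validation set).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cross-validation on the training set replaces the validation set.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# refit=True (the default) retrains the best model on the entire training set.
final_model = search.best_estimator_

# Assess the final model once, using the held-out test set.
print("Test accuracy:", final_model.score(X_test, y_test))
```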
Be pragmatic: aim for good-enough tuning given the available time, not perfection.
#Automating Hyperparameter Optimization
- Optuna: Best for general ML optimization tasks, especially with pruning and Bayesian optimization.
- Hyperopt: Similar to Optuna, but slower for large-scale parallel tuning.
- Ray Tune: Best for deep learning and large-scale distributed tuning.
- Scikit-optimize: Quick, small-scale optimizations (see the sketch below).
- Keras Tuner: Optimized for TensorFlow/Keras models.
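As one example of these libraries, a minimal scikit-optimize sketch using gp_minimize (Gaussian-process Bayesian optimization); the search space, cross-validated objective, and call budget are illustrative.

```python
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(params):
    C, gamma = params
    # gp_minimize minimizes, so return the negated cross-validation score.
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

space = [
    Real(1e-3, 1e3, prior="log-uniform", name="C"),
    Real(1e-4, 1e0, prior="log-uniform", name="gamma"),
]
result = gp_minimize(objective, space, n_calls=30, random_state=42)
print(result.x, -result.fun)
```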