Photograph by Jorge Royan, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported
Created by Joris Gillis using Maple 10, distributed as public domain.
In some cases, hyperparameters determine the complexity or flexibility of the model (e.g. regularization parameters), which helps prevent overfitting (high variance).
In other cases, hyperparameters affect how the algorithm learns (e.g. the learning rate or the number of training iterations).
[A hyperparameter optimizer is] a functional from data to classifier (taking classification problems as an example)...
-- James Bergstra, Algorithms for Hyper-Parameter Optimization [0]
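To make the distinction concrete, here is a minimal sketch (these estimators are only illustrative and are not part of the benchmark below): Ridge's alpha is a regularization hyperparameter that constrains model complexity, while SGDClassifier's learning-rate settings shape how the optimizer learns.

from sklearn.linear_model import Ridge, SGDClassifier

# Complexity/regularization hyperparameter: a larger alpha shrinks the
# coefficients harder, trading variance (overfitting) for bias.
ridge = Ridge(alpha=1.0)

# Hyperparameters that control how the algorithm learns: the learning-rate
# schedule and its initial step size.
sgd = SGDClassifier(learning_rate="constant", eta0=0.01)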
Illustration from Random Search for Hyper-Parameter Optimization by Bergstra and Bengio.[1]
Random Search vs Grid Search
How many random samples would we need to have a good chance (>95% probability) of landing close to an optimal configuration?
If each random sample independently has a 5% chance of landing within 5% of the global optimum, then the probability of missing on $n$ successive samples will be
$$ P_{miss} = (1 - 0.05)^n. $$

Hence the probability of a hit will be

$$ P_{hit} = 1 - P_{miss} = 1 - (1 - 0.05)^n. $$

Setting the probability of a hit to 95% and solving for $n$ gives

$$ 1 - (1 - 0.05)^n = 0.95 \quad\Longrightarrow\quad n \ge \frac{\ln(0.05)}{\ln(0.95)} \approx 58.4. $$
Hence, using only 60 samples, we have a 95% probability of getting a sample within 5% of the global optimum. Very nice!
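A quick sanity check of that number in code (just the arithmetic above, not part of the benchmark):

import numpy as np

# Solve (1 - 0.05)**n <= 0.05 for n: the smallest number of random samples
# that lands in the top 5% of configurations with at least 95% probability.
n = np.log(1 - 0.95) / np.log(1 - 0.05)
print(int(np.ceil(n)))  # -> 59, i.e. roughly 60 samples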
Core illustration from Random Search for Hyper-Parameter Optimization by Bergstra and Bengio. [1]
import numpy as np
from time import time
from operator import itemgetter
from scipy.stats import randint as sp_randint
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
digits = load_digits()  # get some data
X, y = digits.data, digits.target
clf = RandomForestClassifier(n_estimators=20) # build a classifier
def report(grid_scores, n_top=3):
    # Utility function to report the best scores
    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
    for i, score in enumerate(top_scores):
        print("Model with rank: {0}".format(i + 1))
        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
            score.mean_validation_score,
            np.std(score.cv_validation_scores)))
        print("Parameters: {0}".format(score.parameters))
        print("")
# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
"max_features": sp_randint(1, 11),
"min_samples_split": sp_randint(1, 11),
"min_samples_leaf": sp_randint(1, 11),
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)
start = time()
random_search.fit(X, y)
finish = (time() - start)
print("RandomizedSearchCV took {time} seconds for {candidate} candidates"
" parameter settings.".format(time=finish, candidate=n_iter_search))
report(random_search.grid_scores_)
RandomizedSearchCV took 4.0490076541900635 seconds for 20 candidate parameter settings.

Model with rank: 1
Mean validation score: 0.921 (std: 0.013)
Parameters: {'max_features': 8, 'criterion': 'entropy', 'max_depth': None, 'min_samples_split': 5, 'bootstrap': True, 'min_samples_leaf': 3}

Model with rank: 2
Mean validation score: 0.920 (std: 0.007)
Parameters: {'max_features': 4, 'criterion': 'gini', 'max_depth': None, 'min_samples_split': 3, 'bootstrap': True, 'min_samples_leaf': 3}

Model with rank: 3
Mean validation score: 0.919 (std: 0.017)
Parameters: {'max_features': 8, 'criterion': 'entropy', 'max_depth': None, 'min_samples_split': 10, 'bootstrap': False, 'min_samples_leaf': 5}
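As a small aside (not part of the original benchmark report), the fitted search object also exposes the winning configuration directly, so you don't have to dig through grid_scores_:

print(random_search.best_score_)    # mean cross-validated score of the best candidate
print(random_search.best_params_)   # the hyperparameter setting that achieved it
best_model = random_search.best_estimator_  # refit on the full data (refit=True by default)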
# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
"max_features": [1, 3, 10],
"min_samples_split": [1, 3, 10],
"min_samples_leaf": [1, 3, 10],
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)
finish = (time() - start)
print("GridSearchCV took {time} seconds for {candidates} candidate parameter settings.".format(
time=finish, candidates=len(grid_search.grid_scores_)))
report(grid_search.grid_scores_)
GridSearchCV took 37.768978118896484 seconds for 216 candidate parameter settings.

Model with rank: 1
Mean validation score: 0.936 (std: 0.013)
Parameters: {'max_features': 10, 'criterion': 'entropy', 'max_depth': None, 'min_samples_split': 3, 'bootstrap': True, 'min_samples_leaf': 1}

Model with rank: 2
Mean validation score: 0.932 (std: 0.009)
Parameters: {'max_features': 3, 'criterion': 'gini', 'max_depth': None, 'min_samples_split': 1, 'bootstrap': False, 'min_samples_leaf': 1}

Model with rank: 3
Mean validation score: 0.931 (std: 0.012)
Parameters: {'max_features': 10, 'criterion': 'gini', 'max_depth': None, 'min_samples_split': 3, 'bootstrap': False, 'min_samples_leaf': 3}
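The 216 candidates are simply the size of the grid: grid search evaluates every combination of the listed values, so the cost grows multiplicatively with each new hyperparameter, while random search lets you cap the budget with n_iter. A quick check of that count:

# 2 * 3 * 3 * 3 * 2 * 2 = 216 combinations in param_grid
print(np.prod([len(values) for values in param_grid.values()]))  # -> 216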
Feel free to check out the TotalGood Hyperparameter Optimization repo (HyperParameterOptimization/benchmarks) on GitHub to learn more about our current research (we love pull requests, FYI).
[0] Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems (NIPS). Retrieved August 21, 2015, from http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
[1] Bergstra, James, and Yoshua Bengio. "Random Search for Hyper-Parameter Optimization." Journal of Machine Learning Research 13 (2012): 281-305. Web. 20 Aug. 2015. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Zheng, Alice. "How to Evaluate Machine Learning Models: Hyperparameter Tuning." 27 May 2015. Web. 21 Aug. 2015.
"Automatic Hyperparameter Tuning Methods." John Myles White. Web. 21 Aug. 2015.
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
Presentation made using Jupyter, IPython, and React.js