The most surprising thing about Keras Tuner is that it doesn’t “tune” your hyperparameters in any clever analytical sense; it simply searches for the best ones, running trial after trial with strategies such as random search, Bayesian optimization, and Hyperband.
Let’s watch it in action. Imagine you’re building a simple neural network for image classification using the MNIST dataset. You want to find the optimal number of units in a dense layer and the best learning rate for your Adam optimizer.
import tensorflow as tf
from tensorflow import keras
import keras_tuner as kt
# Load and preprocess the data
(img_train, label_train), (img_test, label_test) = keras.datasets.mnist.load_data()
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0
img_train = img_train.reshape(-1, 28, 28, 1)
img_test = img_test.reshape(-1, 28, 28, 1)
# Define the model-building function
def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28, 1)))
    # Tune the number of units in the first dense layer
    units = hp.Int('units', min_value=32, max_value=128, step=32)
    model.add(keras.layers.Dense(units, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    # Tune the learning rate for the Adam optimizer
    learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# Initialize the Hyperband tuner
tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=10,
    factor=3,  # Controls the reduction factor for Hyperband
    directory='my_dir',
    project_name='intro_to_kt'
)
# Search for the best hyperparameters
tuner.search(img_train, label_train, epochs=10, validation_split=0.2)
# Get the best hyperparameters and model
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"""
The optimal number of units is {best_hps.get('units')}
and the optimal learning rate is {best_hps.get('learning_rate')}.
""")
model = tuner.hypermodel.build(best_hps)
model.fit(img_train, label_train, epochs=10, validation_split=0.2) # Train the final model
Keras Tuner solves the problem of manually trying out hyperparameter combinations, which is tedious and often leads to suboptimal models. It automates the process: you define a search space for your hyperparameters, and the tuner employs a search algorithm to explore that space efficiently.
Internally, the tuner.search() method orchestrates the entire process. For each trial, it:
- Samples Hyperparameters: Based on the chosen tuner (e.g., Hyperband, RandomSearch, BayesianOptimization), it selects a combination of hyperparameter values from the search space defined in build_model.
- Builds the Model: It calls your build_model function with the sampled hyperparameters to construct a Keras model.
- Trains the Model: It trains this model on your training data for a specified number of epochs, evaluating it on a validation split.
- Records Results: It records the performance metric (e.g., val_accuracy) for that trial.
- Iterates: It repeats this process for many trials, systematically exploring the hyperparameter space.
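The sample → build → train → record loop can be sketched in plain Python. Everything below is a toy stand-in (the search space mirrors the hp.Int and hp.Choice calls above, and evaluate is a hypothetical objective, not real model training), but the control flow is the same cycle the tuner runs:

```python
import random

# Toy search space, mirroring hp.Int('units', 32, 128, step=32)
# and hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])
SEARCH_SPACE = {
    'units': list(range(32, 129, 32)),   # 32, 64, 96, 128
    'learning_rate': [1e-2, 1e-3, 1e-4],
}

def evaluate(config):
    """Stand-in for training a model; returns a made-up validation score."""
    # Hypothetical objective that happens to peak at units=96, lr=1e-3
    return 1.0 - abs(config['units'] - 96) / 128 - abs(config['learning_rate'] - 1e-3)

def random_search(space, n_trials, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = {name: rng.choice(values)           # 1. sample hyperparameters
                  for name, values in space.items()}
        score = evaluate(config)                     # 2-3. build and "train"
        trials.append((score, config))               # 4. record the result
    return max(trials, key=lambda t: t[0])           # 5. best trial wins

best_score, best_config = random_search(SEARCH_SPACE, n_trials=20)
```

A real tuner differs mainly in step 1, where smarter strategies (Bayesian optimization, Hyperband) use the recorded results to decide what to sample next.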
The kt.Hyperband tuner, as used above, is particularly interesting because it implements the Hyperband algorithm. It’s an early-stopping-based algorithm that allocates a fixed budget of resources (epochs in this case) across a large number of configurations. It starts by training many models for a few epochs and progressively eliminates the worst-performing ones, allocating more resources to the promising candidates. The factor argument controls how aggressively it reduces the number of configurations at each bracket.
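The bracket schedule that falls out of max_epochs and factor can be computed directly. This sketch follows the schedule from the original Hyperband paper; Keras Tuner's exact rounding may differ slightly:

```python
import math

def hyperband_brackets(max_epochs, factor):
    """(configs, starting epochs per config) for each Hyperband bracket."""
    s_max = int(math.log(max_epochs) / math.log(factor))   # deepest bracket index
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * factor**s)   # configs at the first rung
        r = max_epochs / factor**s                         # epochs each gets initially
        brackets.append((n, r))
    return brackets

# With max_epochs=10 and factor=3, as in the tuner above, the first bracket
# starts 9 configs at roughly 1 epoch each, keeps the best third for 3x more
# epochs, and so on; later brackets start fewer configs with more epochs.
print(hyperband_brackets(10, 3))
```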
The hp object passed to your build_model function is the key to defining your search space. hp.Int, hp.Choice, hp.Float, and hp.Boolean allow you to specify different types of hyperparameters and their ranges or discrete values. The tuner then intelligently samples from these defined spaces.
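As a rough picture of what those four types describe, here is a toy sampler over an equivalent search space. This is plain Python, not the Keras Tuner API; the real hp object also records every value under its name so each trial is reproducible:

```python
import random

def sample_space(rng):
    """Draw one configuration from a toy search space.

    Each entry mirrors one hp call:
      hp.Int('units', 32, 128, step=32)    -> integer on a step grid
      hp.Choice('learning_rate', [...])    -> one of a discrete set
      hp.Float('dropout', 0.0, 0.5)        -> real number in a range
      hp.Boolean('use_batch_norm')         -> True or False
    """
    return {
        'units': rng.randrange(32, 129, 32),
        'learning_rate': rng.choice([1e-2, 1e-3, 1e-4]),
        'dropout': rng.uniform(0.0, 0.5),
        'use_batch_norm': rng.choice([True, False]),
    }

config = sample_space(random.Random(42))
```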
When you call tuner.get_best_hyperparameters(), the tuner analyzes the results of all its trials and returns the combination of hyperparameters that yielded the best objective metric. You then use this best_hps object to build and train your final, optimized model.
A common pitfall is not having a sufficiently diverse or well-defined search space. If your min_value and max_value for hp.Int are too close, or if your values for hp.Choice don’t include the truly optimal setting, the tuner might miss the best combination. You also need to ensure your objective metric is appropriate for your task and that your max_epochs in the tuner is high enough to allow models to converge and show meaningful performance differences.
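The cost of a too-narrow range is easy to demonstrate with a toy objective (hypothetical numbers, no Keras Tuner involved): if accuracy peaks at 96 units but the space is capped at 64, no search strategy can recover the optimum.

```python
def toy_accuracy(units):
    """Hypothetical validation accuracy that peaks at units=96."""
    return 1.0 - abs(units - 96) / 200

def best_on_grid(min_value, max_value, step):
    """Exhaustively pick the best grid point, like an ideal tuner would."""
    grid = range(min_value, max_value + 1, step)
    return max(grid, key=toy_accuracy)

narrow = best_on_grid(32, 64, 32)   # space stops at 64: the optimum is unreachable
wide = best_on_grid(32, 128, 32)    # space contains 96: the optimum is found
```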
The next step after hyperparameter tuning is often exploring different model architectures or incorporating more advanced regularization techniques.