A W&B sweep can find a better model than random search in fewer trials, but it isn’t magic: under the hood, a statistical model guesses where the good hyperparameters live based on the trials it has already seen.

Let’s watch a sweep in action. Imagine we’re training a simple Keras model to classify MNIST digits. We’ve got a sweep configuration that’s looking for the best learning_rate and batch_size.

program: train.py
method: bayes
metric:
  goal: maximize
  name: accuracy
parameters:
  learning_rate:
    distribution: uniform
    max: 0.1
    min: 0.0001
  batch_size:
    distribution: int_uniform
    max: 128
    min: 16

When you run wandb agent YOUR_ENTITY/YOUR_PROJECT/SWEEP_ID, it doesn’t just pick random learning_rate and batch_size values forever. The first few trials are effectively exploratory, because the optimizer has no data yet to build a model from. For each trial, the agent runs train.py and logs the accuracy (our goal metric) along with the hyperparameters used.
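A minimal train.py compatible with the sweep config above might look like the following sketch. The layer sizes, epoch count, and number of epochs are arbitrary choices, not W&B requirements, and the script assumes wandb and tensorflow are installed:

```python
# Hypothetical train.py sketch for the sweep config above; the architecture
# and epoch count are arbitrary illustration choices.

def main():
    # Imported inside main() so the sketch can be read as a self-contained file.
    import wandb
    from tensorflow import keras

    run = wandb.init()  # the sweep agent injects learning_rate and batch_size
    lr = run.config.learning_rate
    batch_size = run.config.batch_size

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, batch_size=batch_size, epochs=5, verbose=0)

    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    wandb.log({"accuracy": accuracy})  # key must match metric.name in the config

if __name__ == "__main__":
    main()
```

The only hard contracts here are reading hyperparameters from run.config and logging a metric whose key matches metric.name; everything else is up to you.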

Once it has a few results, the Bayesian optimization engine kicks in. It builds a probabilistic model (often a Gaussian Process) of the relationship between hyperparameters and the metric. This surrogate predicts the metric, with an uncertainty estimate, at any point in the hyperparameter space. An acquisition function then scores candidate points, and the sweep selects the set of hyperparameters with the highest expected improvement, meaning it’s most likely to beat the best result seen so far, while balancing exploration (trying new, uncertain regions) against exploitation (focusing on promising areas).

Here’s what that looks like under the hood. Let’s say our first few trials looked like this:

  • Trial 1: learning_rate=0.01, batch_size=64, accuracy=0.85
  • Trial 2: learning_rate=0.001, batch_size=32, accuracy=0.90
  • Trial 3: learning_rate=0.05, batch_size=128, accuracy=0.70

The Bayesian model will take these points and create a surface representing the predicted accuracy for all possible learning_rate/batch_size combinations. It also quantifies the uncertainty in these predictions. The acquisition function (e.g., Expected Improvement) uses both the predicted mean and variance to guide the search. It will suggest a new trial in a region that’s either predicted to be good (high mean) or where the uncertainty is high (high variance), indicating a potentially undiscovered good region.
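To make that concrete, here is a toy, pure-Python stand-in for the surrogate-plus-acquisition step, seeded with the three trials above. A real sweep fits a Gaussian Process; the inverse-distance surrogate, the 0.05 uncertainty scale, and the candidate list here are all invented for illustration:

```python
import math

# The three observed trials from the walkthrough above.
trials = [
    {"lr": 0.01,  "bs": 64,  "acc": 0.85},
    {"lr": 0.001, "bs": 32,  "acc": 0.90},
    {"lr": 0.05,  "bs": 128, "acc": 0.70},
]
best = max(t["acc"] for t in trials)

def _dist(a, b):
    # Compare learning rates in log-space; scale batch size by its range (128-16).
    return math.hypot(math.log10(a[0]) - math.log10(b[0]), (a[1] - b[1]) / 112)

def predict(lr, bs):
    """Toy surrogate: inverse-distance-weighted mean, uncertainty grows with distance."""
    ds = [_dist((lr, bs), (t["lr"], t["bs"])) for t in trials]
    if min(ds) < 1e-9:  # exactly on an observed point: no uncertainty left
        return trials[ds.index(min(ds))]["acc"], 0.0
    ws = [1.0 / d for d in ds]
    mean = sum(w * t["acc"] for w, t in zip(ws, trials)) / sum(ws)
    std = 0.05 * min(ds)  # crude: uncertainty proportional to distance from data
    return mean, std

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))      # Phi(z)
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # phi(z)
    return (mu - best) * cdf + sigma * pdf

# Score a few hypothetical candidate configs and pick the highest-EI one.
candidates = [(0.001, 48), (0.005, 32), (0.0001, 16), (0.08, 100)]
scored = {c: expected_improvement(*predict(*c), best) for c in candidates}
next_trial = max(scored, key=scored.get)
```

With this toy surrogate, the candidate far from all three observations, (0.0001, 16), ends up with the highest EI: its predicted mean is mediocre, but its uncertainty is large enough that exploring it is worth more than nudging an already-sampled region.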

The key levers you control are:

  • program: The Python script that trains your model.
  • method: bayes for Bayesian optimization.
  • metric.name: The metric you want to optimize (e.g., accuracy, loss).
  • metric.goal: maximize or minimize.
  • parameters: Defines the hyperparameters to tune, their type (uniform, int_uniform, categorical, etc.), and their search space (min, max, values).
  • early_terminate: Conditions for stopping unpromising trials early. W&B implements this with the Hyperband algorithm (type: hyperband, tuned via min_iter or max_iter). This is crucial for efficiency, as it kills trials that are clearly not going to perform well before they run to completion.
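For instance, a Hyperband early-termination block can be appended to the sweep config shown earlier. Here min_iter: 3 means no run is considered for termination before its third logged iteration; this fragment assumes your training script logs the target metric repeatedly during training, not just once at the end:

```yaml
early_terminate:
  type: hyperband
  min_iter: 3
```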

The surprise is that the "acquisition function" doesn’t just pick the single point with the highest predicted performance. It’s a trade-off between exploitation (picking the spot that looks best based on current data) and exploration (picking a spot that’s uncertain but could be even better). A common acquisition function is Expected Improvement (EI), which calculates the expected amount of improvement over the current best observed result. EI is high in regions that are predicted to have high performance and also in regions where the model is very uncertain, encouraging the search to venture into new territory.
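A minimal sketch of the EI formula makes this trade-off visible. The current best of 0.85 and the two candidate predictions below are made-up numbers for illustration:

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form Expected Improvement (maximization) under a Gaussian posterior."""
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))      # Phi(z)
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # phi(z)
    return (mu - best) * cdf + sigma * pdf

best = 0.85  # best accuracy observed so far (hypothetical)

# Exploitation: slightly better predicted mean, very confident prediction.
exploit = expected_improvement(mu=0.86, sigma=0.005, best=best)
# Exploration: predicted mean below the best, but highly uncertain.
explore = expected_improvement(mu=0.84, sigma=0.10, best=best)
```

Despite a predicted mean below the current best, the uncertain candidate wins here (EI of roughly 0.035 vs 0.010), which is exactly the "venture into new territory" behavior described above.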

When the sweep is running, you’ll see new runs appearing in your W&B project, each with different hyperparameter combinations. The beauty of W&B is that it visualizes this process, showing you the hyperparameter landscape and how the sweep is navigating it to find the optimal configuration.

Once you’ve optimized your hyperparameters, the next logical step is to explore how different model architectures might perform with those optimized settings.
