By default, Weights & Biases stores all the data logged by your runs, and this can quickly become a significant cost.

Let’s see W&B in action. Imagine you’re training a PyTorch model. You’ve got your standard training loop, but you want to log metrics, artifacts, and even model checkpoints to W&B.

import wandb
import torch
from torch import nn
from torch.optim import Adam

# Initialize W&B
run = wandb.init(project="cost-optimization-demo")

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
optimizer = Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Simulate training data
X_train = torch.randn(100, 10)
y_train = torch.randn(100, 1)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    # Log metrics
    wandb.log({"epoch": epoch, "loss": loss.item()})

    # Log an artifact (e.g., a small dataset snapshot)
    if epoch == 5:
        artifact = wandb.Artifact('training-data-snapshot', type='dataset')
        artifact.add_file('dummy_data.csv') # Assume dummy_data.csv exists
        run.log_artifact(artifact)

    # Log a model checkpoint
    if epoch % 3 == 0:
        torch.save(model.state_dict(), f"model_epoch_{epoch}.pt")
        wandb.save(f"model_epoch_{epoch}.pt")

# Finish the run
run.finish()

This code logs epoch loss, a dataset artifact, and model checkpoints. Without any configuration, W&B will store all of these. Note the distinction: wandb.save() uploads the specified file as a run file, stored alongside the run's logs, while run.log_artifact() creates a versioned artifact. Both count toward your storage.

The core problem W&B addresses is the difficulty in tracking and visualizing experiments. It provides a centralized platform for logging metrics, hyperparameters, code versions, and model artifacts, making reproducibility and comparison straightforward. However, this convenience comes at a cost: storage for all logged data and compute for processing and rendering it.

The key to controlling W&B costs lies in being judicious about what you log and for how long you retain it.

Storage Optimization: What to Keep

  1. Raw Logs & Metrics: Essential for tracking progress. These are usually small.
  2. Hyperparameters: Crucial for understanding experiment configurations. Also small.
  3. Artifacts (Datasets, Models): This is where storage costs skyrocket.
    • Datasets: Logging the entire dataset for every run is rarely necessary. Consider logging only subsets, or if the dataset is static, log it once and reference it.
    • Model Checkpoints: Logging every single checkpoint during training is often wasteful. You typically only need the best-performing one, or a few key milestones.
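
The checkpoint advice above can be captured in a small gating helper. This is an illustrative sketch, not a W&B API; the function name and milestone interval are assumptions you would tune to your training schedule:

```python
def keep_checkpoint(epoch, loss, best_loss, milestone_every=10):
    """Decide whether a checkpoint is worth uploading: keep new bests
    and periodic milestone epochs, skip everything else."""
    return loss < best_loss or epoch % milestone_every == 0
```

Calling this before torch.save()/wandb.save() in the training loop means most per-epoch checkpoints never leave the machine.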

Compute Optimization: What to Process

  1. Large Media Logs: Logging high-resolution images, videos, or audio for every step can consume significant processing power and bandwidth. Downsample or log less frequently.
  2. Large Artifacts: While storage is the primary concern, downloading and analyzing very large artifacts can also impact your team’s compute.
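
Both points come down to logging heavy data on a schedule instead of at every step. A minimal gate; the interval is an assumed tunable:

```python
LOG_MEDIA_EVERY = 50  # assumed interval; tune to your run length

def should_log_media(step, every=LOG_MEDIA_EVERY):
    """Return True only on steps where heavy media should be logged."""
    return step % every == 0

# Hypothetical usage inside a training loop:
# if should_log_media(step):
#     run.log({"sample": wandb.Image(frame)}, step=step)
```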

Controlling Storage with wandb.init() and Run Configuration

You can influence what gets stored before a run starts and during a run.

  • Settings Object: This is your primary tool for fine-grained control. You can pass a wandb.Settings object to wandb.init().

    settings = wandb.Settings(
        save_code=False,   # Don't upload a copy of your code
        disable_git=True,  # Skip git commit/branch info if not needed
    )
    run = wandb.init(project="cost-optimization-demo", settings=settings)
    # To avoid streaming anything during training, you can also run
    # offline and upload later with `wandb sync`:
    # run = wandb.init(project="cost-optimization-demo", mode="offline")

    • save_code=False: Prevents W&B from uploading a copy of your training script, saving storage and upload time. This is a common optimization. (Uploading your wider project directory is opt-in via run.log_code(), so simply don’t call it.)
    • disable_git=True: If your project isn’t under Git control or you don’t need to track commit hashes, this skips that metadata collection.
  • Selective Artifact Logging: Instead of wandb.save(), use run.log_artifact() with explicit control.

    # Track the best loss seen so far (initialize before the loop)
    best_loss = float("inf")

    # Inside your training loop, only save the best model
    if loss.item() < best_loss:
        best_loss = loss.item()
        torch.save(model.state_dict(), "best_model.pt")
        # Log only the best model artifact (each call creates a new version)
        artifact = wandb.Artifact('best-model', type='model')
        artifact.add_file("best_model.pt")
        run.log_artifact(artifact)
    

    This ensures only the truly valuable model state is stored.

  • Artifact TTL (time-to-live): If an artifact only needs to live for a limited time (e.g., intermediate checkpoints), set a TTL on it (artifact.ttl = timedelta(days=30)) before logging, and W&B will delete that version automatically once the retention window expires.

  • run.log_artifact() with local_path and name: Explicitly define the artifact name and type.

    artifact = wandb.Artifact(name="my-dataset-v1", type="dataset")
    artifact.add_dir("data/processed") # Log a directory
    run.log_artifact(artifact)
    

Controlling Storage with W&B UI and API (Post-Run)

Once runs are complete, you can manage storage via the W&B UI or API.

  1. Artifact Management:

    • UI: Navigate to the "Artifacts" tab in your project. You can browse, delete, or mark artifacts for archival.

    • API: Use the W&B SDK to programmatically delete artifacts.

      api = wandb.Api()
      # Delete a specific artifact version (replace the path with your own)
      artifact_to_delete = api.artifact("username/project/my-dataset-v1:latest")
      artifact_to_delete.delete()
      

      Be cautious with programmatic deletion; it’s irreversible.

  2. Run Deletion: Deleting a run removes its logged history, metrics, and run files. Artifacts the run logged are versioned separately and must be deleted on their own if you want that storage back.

  3. Storage Tiers (Enterprise Feature): For larger organizations, W&B offers storage tiers where older or less frequently accessed data can be moved to cheaper archival storage. This is configured via your W&B team settings.
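
Artifact versions accumulate quickly, since every log_artifact() call with the same name creates a new version. A common cleanup pattern is to delete versions that carry no alias, because aliases like latest or best mark the versions worth keeping. The selection helper below is pure Python; the entity/project names in the commented usage are placeholders:

```python
def stale_versions(versions):
    """Return artifact versions that are safe to delete: those with no
    aliases. Aliased versions ('latest', 'best', ...) are kept."""
    return [v for v in versions if not v.aliases]

# Hypothetical usage with the W&B public API:
# import wandb
# api = wandb.Api()
# for v in stale_versions(api.artifact_versions("model", "username/project/best-model")):
#     v.delete()
```

As with the earlier deletion example, this is irreversible, so dry-run it (print instead of delete) before letting it loose on a project.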

Controlling Compute and Bandwidth

  • Images: wandb.Image uploads the file you give it as-is, so downsample high-resolution images before logging them, and log them at intervals rather than at every step.
  • wandb.log({"video": wandb.Video("short_clip.mp4", fps=4, format="mp4")}): Downsample videos by controlling fps and format.
  • wandb.config.update({"learning_rate": 0.0001}, allow_val_change=True): While not directly a storage/compute cost, updating config frequently can add to processing overhead. Do this sparingly.
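
For images in particular, shrinking them locally before logging cuts upload size at the source. A sketch using Pillow; the max_side default is an assumption:

```python
from PIL import Image

def downsample(image_path, max_side=256):
    """Shrink an image so its longest side is at most max_side,
    preserving aspect ratio, before it is logged."""
    img = Image.open(image_path)
    img.thumbnail((max_side, max_side))  # in-place, keeps aspect ratio
    return img

# Hypothetical usage:
# run.log({"sample": wandb.Image(downsample("render.png"))}, step=step)
```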

The most impactful control you have over storage costs is artifact management. Logging only essential checkpoints and smaller dataset representations, and then cleaning up old runs and artifacts, will yield the biggest savings.

Beyond storage, consider data retention policies. W&B allows you to set default retention periods for artifacts. For instance, you might configure artifacts to be automatically deleted after 90 days unless explicitly marked for long-term storage. This is a crucial step for ongoing cost management.

Want structured learning?

Take the full Wandb course →