W&B Tables aren’t just a fancy spreadsheet for your ML metrics; they’re a fundamental shift in how you debug and understand your models by treating data and predictions as first-class citizens.
Let’s see this in action. Imagine you’re training an image classification model. You’ve logged your predictions, ground truth, and some image data.
```python
import wandb

# Assume `image_paths` is a list of paths to your image files,
# `ground_truth` and `predictions` are lists of labels, and
# `model_output` holds richer per-example output (bounding boxes,
# segmentation masks, etc.).

# Initialize a W&B run
run = wandb.init(project="my-image-classifier")

# Create a W&B Table with typed columns
table = wandb.Table(columns=["image", "ground_truth", "prediction", "model_output"])

# Populate the table, one row per example
for img_path, gt_label, pred_label, model_output_data in zip(
    image_paths, ground_truth, predictions, model_output
):
    table.add_data(
        wandb.Image(img_path),  # upload and display the image itself
        gt_label,
        pred_label,
        model_output_data,  # a dict, list, or other serializable object
    )

# Log the table to W&B
run.log({"predictions_table": table})
run.finish()
```
Once this code runs, you’ll have a "predictions_table" in your W&B run. In the W&B UI, you can click on this table and see each row containing an image, its ground truth label, the predicted label, and any associated model output. You can sort, filter, and search this table directly.
The core problem W&B Tables solve is the disconnect between your raw data, your model’s outputs, and your evaluation metrics. Traditionally, you’d look at aggregate metrics (accuracy, F1 score) and then maybe manually inspect a few misclassified examples. Tables bring these together, allowing you to filter for specific classes, identify common misclassifications, or even pinpoint examples where your model’s confidence is low.
Internally, W&B Tables are structured as a collection of typed columns (image, text, number, etc.). Each call to add_data appends a row to this structured dataset. The wandb.Image object is crucial here: it tells W&B not just to store a path but to upload the file and render the image inside the table cell. For more complex outputs like bounding boxes or segmentation masks, you'd typically log them as JSON or a dictionary in the "model_output" column, then use W&B's visualization features on the frontend to render them directly in the table.
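To make the bounding-box case concrete, here is a minimal sketch of the dictionary schema W&B expects when boxes are attached to an image via wandb.Image's `boxes` argument. The class names, coordinates, and the helper function are illustrative placeholders, not part of the original example:

```python
# Build one bounding-box entry in the schema wandb.Image accepts via
# its `boxes` argument. Coordinates here are relative (0.0 to 1.0).
def make_box_payload(min_x, min_y, max_x, max_y, class_id, caption, confidence):
    """One box entry: position, class, caption, and a score dictionary."""
    return {
        "position": {"minX": min_x, "minY": min_y, "maxX": max_x, "maxY": max_y},
        "class_id": class_id,
        "box_caption": caption,
        "scores": {"confidence": confidence},
    }

# Group the boxes under a named layer ("predictions") with a class-id
# to label mapping, which W&B uses for the frontend overlay.
boxes = {
    "predictions": {
        "box_data": [make_box_payload(0.1, 0.2, 0.6, 0.9, 1, "dog", 0.87)],
        "class_labels": {0: "cat", 1: "dog"},
    }
}

# This dict would then be attached when logging the image, e.g.:
# wandb.Image(img_path, boxes=boxes)
```

Passing the boxes to wandb.Image (rather than stuffing raw coordinates into the "model_output" column) is what lets the UI draw them on top of the image in each table cell.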
The real power comes from the interactive filtering and sorting. Suppose you want to see all images where the ground truth was "cat" but the prediction was "dog." You can simply filter the "ground_truth" column for "cat" and the "prediction" column for "dog." This immediately shows you exactly which images your model is confusing, and you can then visually inspect them to understand why. Was it an ambiguous image? A lighting issue? A subtle feature the model missed?
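The same cat-vs-dog filter can also be done in plain Python before logging, if you want the confused subset available in your script as well. This is a sketch with placeholder data, assuming the label lists from the earlier snippet:

```python
# Placeholder data standing in for the real image paths and labels.
image_paths = ["img0.png", "img1.png", "img2.png"]
ground_truth = ["cat", "cat", "dog"]
predictions = ["dog", "cat", "dog"]

# Keep only examples whose ground truth is "cat" but whose
# prediction is "dog" -- the UI filter, expressed in code.
confused = [
    path
    for path, gt, pred in zip(image_paths, ground_truth, predictions)
    if gt == "cat" and pred == "dog"
]
# confused -> ["img0.png"]
```

The UI filter is usually faster for exploration, but a code-side filter like this is handy when you want to log only the failure cases to keep the table small.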
Most users understand that Tables let you log data and visualize it. What they often miss is the ability to create derived columns dynamically within the W&B UI after the table has been logged. For instance, you could create a new column that flags images where prediction != ground_truth or another column that calculates the difference between the predicted class probability and the second-highest class probability. This allows for on-the-fly debugging and exploration without needing to re-run your training script with modified logging.
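If you prefer to precompute those derived values in your script and log them as extra table columns (instead of, or in addition to, building them in the UI), the computation is straightforward. The label lists and `probs_per_example` below are illustrative placeholders:

```python
# Placeholder per-example data: labels and softmax outputs over
# the classes ["cat", "dog"].
ground_truth = ["cat", "cat", "dog"]
predictions = ["dog", "cat", "dog"]
probs_per_example = [
    [0.30, 0.70],
    [0.80, 0.20],
    [0.10, 0.90],
]

def top2_margin(probs):
    """Gap between the highest and second-highest class probability."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

# One derived value per row; each list could be logged as an
# additional column when the table is built.
is_error = [gt != pred for gt, pred in zip(ground_truth, predictions)]
margins = [top2_margin(p) for p in probs_per_example]
```

A low top-2 margin flags examples where the model was nearly indifferent between two classes, which pairs well with the error flag when sorting the table for inspection.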
The next concept you’ll likely dive into is using these Tables for model comparison, allowing you to directly compare the outputs of different model checkpoints or architectures side-by-side.