W&B Reports can feel like just a fancy dashboard, but they’re actually a dynamic, programmatic way to build living documentation for your ML projects, directly from your live experiment data.
Let’s see how this plays out with a simple example. Imagine you’re training a classification model and want to show its performance metrics over time, along with some example predictions.
```python
import wandb
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Initialize a W&B run
run = wandb.init(project="my-ml-docs-demo", job_type="report-example")

# Simulate training metrics over ten epochs
for epoch in range(10):
    accuracy = 0.7 + (epoch * 0.02)
    loss = 1.0 - (epoch * 0.1)
    wandb.log({"epoch": epoch, "accuracy": accuracy, "loss": loss})

# Simulate some predictions for the report
predictions = pd.DataFrame({
    "true_label": [0, 1, 0, 1, 0, 1, 0, 1],
    "predicted_label": [0, 0, 0, 1, 1, 1, 0, 1],
    "image": ["img1.png", "img2.png", "img3.png", "img4.png",
              "img5.png", "img6.png", "img7.png", "img8.png"],
})
wandb.log({"predictions_table": wandb.Table(dataframe=predictions)})

# Generate and log a confusion matrix plot
y_true = predictions["true_label"]
y_pred = predictions["predicted_label"]
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Class 0", "Class 1"])
fig, ax = plt.subplots()
disp.plot(ax=ax)
ax.set_title("Confusion Matrix")
wandb.log({"confusion_matrix": wandb.Image(fig)})
plt.close(fig)  # release the figure once it has been logged

run.finish()
```
This code logs basic metrics and then, crucially, logs a wandb.Table of predictions and a wandb.Image of a confusion matrix. These aren’t just static artifacts; they are structured data that W&B Reports can directly query and render.
The problem W&B Reports solves is the disconnect between live, evolving ML experiments and static documentation. Traditionally, you’d export plots, manually curate tables, and write narrative text in a separate document (like a Jupyter Notebook or a Markdown file). This documentation quickly becomes stale as you retrain models or collect more data. Reports bridge this gap by embedding the documentation directly within the W&B ecosystem, linked to the very runs that generated the data.
Internally, a W&B Report is a document stored in your project that holds references to runs and the data they logged, rather than copies of that data. When you create a report, you’re defining a canvas. Into this canvas, you can pull:
- Metrics: time-series data logged with wandb.log(). You can plot these directly, showing trends like accuracy or loss over epochs.
- Tables: structured data logged as wandb.Table. This is perfect for displaying model predictions, evaluation results, or dataset samples. You can sort, filter, and even link to individual runs within a table.
- Artifacts: any logged file, including images, plots, models, and even arbitrary data files. Reports can display these directly or use them as input for further analysis within the report itself.
- System metrics: CPU usage, GPU utilization, memory consumption, and so on, which are logged automatically during a run.
The "magic" lies in the ability to use W&B’s query language (or a simplified UI builder) to select data from one or more runs. For instance, you can tell a report to "show the 'accuracy' metric from all runs in the 'my-ml-docs-demo' project where 'job_type' is 'report-example' and order them by 'creation_time'." This dynamic linking means your report automatically updates as new runs complete.
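That selection logic is easy to picture in plain Python. The sketch below uses hypothetical local run metadata as a stand-in for what a report query (or the W&B public API) would return; the field names and values are illustrative, not the real schema:

```python
# Hypothetical run metadata, standing in for what a report query returns.
runs = [
    {"name": "run-1", "job_type": "report-example", "accuracy": 0.86},
    {"name": "run-2", "job_type": "sweep", "accuracy": 0.91},
    {"name": "run-3", "job_type": "report-example", "accuracy": 0.88},
]

# The report's query, expressed as a plain filter: keep runs whose
# job_type matches, then order the results (here by accuracy, descending).
selected = sorted(
    (r for r in runs if r["job_type"] == "report-example"),
    key=lambda r: r["accuracy"],
    reverse=True,
)
print([r["name"] for r in selected])  # → ['run-3', 'run-1']
```

Against live data, roughly the same filter can be passed to the public API, e.g. wandb.Api().runs("entity/project", filters={...}), which accepts MongoDB-style filter expressions.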
To build a report, navigate to your W&B project, open the Reports tab, and create a new report. You’ll see a rich text editor where you can add text, markdown, and, most importantly, panels backed by your live data. You can add a line plot panel and configure it to pull the accuracy metric from your runs, add a table panel and point it to your predictions_table, and add a media panel that renders the logged confusion_matrix image.
Reports also go further than static screenshots. Alongside the built-in panels, you can add code blocks for formatted snippets, and custom chart panels let you write your own Vega-Lite specifications over data queried from your runs, enabling visualizations the standard panels don’t cover. And because a logged wandb.Table stays queryable, you can filter it inside the report itself: for example, narrowing the predictions_table down to just the misclassified examples and inspecting their associated images, with no retraining or re-exporting involved.
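That misclassification filter is ordinary pandas once the table is in hand. Here it is rebuilt locally from the same toy data as the logging example above:

```python
import pandas as pd

# Same toy predictions as in the logging example.
predictions = pd.DataFrame({
    "true_label":      [0, 1, 0, 1, 0, 1, 0, 1],
    "predicted_label": [0, 0, 0, 1, 1, 1, 0, 1],
    "image": [f"img{i}.png" for i in range(1, 9)],
})

# Keep only the rows where the model got it wrong.
misclassified = predictions[predictions["true_label"] != predictions["predicted_label"]]
print(misclassified["image"].tolist())  # → ['img2.png', 'img5.png']
```

In a report, the same boolean condition would be expressed in the table panel’s filter expression rather than in pandas, but the semantics are identical.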
Once your report is built and published, it gets a shareable URL, making it easy to communicate your findings to stakeholders who may not have direct access to W&B or the technical background to interpret raw logs.
The next step after mastering reports is often exploring how to programmatically generate them using the W&B SDK, rather than solely relying on the UI.
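To make that concrete without tying the example to a particular SDK release (the Report API has lived under both wandb.apis.reports and the newer wandb-workspaces package), here is a library-free sketch of the structure you end up assembling: a report is essentially an ordered list of blocks, each pointing at project data. The block types and field names below are illustrative, not the real schema.

```python
# Illustrative only: mirrors the shape of a programmatically built report.
report_spec = {
    "project": "my-ml-docs-demo",
    "title": "Classifier training report",
    "blocks": [
        {"type": "markdown", "text": "## Training curves"},
        {"type": "line_plot", "x": "epoch", "y": ["accuracy", "loss"]},
        {"type": "table", "key": "predictions_table"},
        {"type": "image", "key": "confusion_matrix"},
    ],
}

block_types = [b["type"] for b in report_spec["blocks"]]
print(block_types)  # → ['markdown', 'line_plot', 'table', 'image']
```

With the real Report API you would construct equivalent block objects and call the report’s save method; check the current W&B documentation for exact class names, since the API has moved between releases.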