Logging media in Weights & Biases (W&B) doesn’t just save files; it changes how you debug and understand your model’s behavior by making raw data a first-class citizen alongside metrics.

Let’s see it in action. Imagine you’re training an object detection model and want to visualize its predictions on a few sample images.

import wandb
import numpy as np

# Initialize a W&B run
wandb.init(project="media-logging-demo")

# Simulate some training data and model predictions
for i in range(5):
    # Generate a dummy grayscale image (replace with your actual image loading)
    image_data = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

    # Simulated model predictions: coordinates are fractions of the image
    # dimensions, class_labels maps integer class ids to names, and scores
    # is a dictionary of named confidence values.
    boxes = {
        "predictions": {
            "box_data": [
                {"position": {"minX": 0.1, "minY": 0.2, "maxX": 0.5, "maxY": 0.6},
                 "class_id": 0, "box_caption": "cat", "scores": {"score": 0.9}},
                {"position": {"minX": 0.7, "minY": 0.3, "maxX": 0.9, "maxY": 0.8},
                 "class_id": 1, "box_caption": "dog", "scores": {"score": 0.85}},
            ],
            "class_labels": {0: "cat", 1: "dog"},
        }
    }

    # Log the image with the predicted boxes overlaid
    image = wandb.Image(image_data, boxes=boxes, caption=f"Sample Image {i}")
    wandb.log({"detection_output": image})

    # Log audio from raw samples (replace with your actual audio).
    # wandb.Audio accepts a 1-D array of samples plus a sample rate.
    audio_data = np.random.uniform(-1, 1, 16000)
    audio = wandb.Audio(audio_data, sample_rate=16000, caption=f"Sample Audio {i}")
    wandb.log({"audio_output": audio})

    # Log video from raw frames (replace with your actual video).
    # wandb.Video accepts an array shaped (time, channels, height, width).
    frames = np.random.randint(0, 256, size=(16, 3, 64, 64), dtype=np.uint8)
    video = wandb.Video(frames, fps=4, caption=f"Sample Video {i}")
    wandb.log({"video_output": video})

wandb.finish()

This code snippet shows how to log images with bounding-box overlays, audio clips, and video segments directly within your training loop. When you run it, W&B creates a project and a run. In the run’s dashboard, you’ll see a media panel for each logged key (detection_output, audio_output, video_output). The images display the samples with the predicted bounding boxes drawn on them, the audio is playable directly in the browser, and the videos stream.

The core problem W&B media logging solves is the disconnect between abstract metrics (like accuracy or loss) and concrete data. When your model performs poorly, you often need to inspect why by looking at the actual data it’s processing. Traditional methods involve saving files manually, often to scattered directories, making it hard to correlate specific data samples with specific training steps or metric values. W&B integrates this directly into the experiment tracking workflow.

Internally, W&B handles media by uploading the files to cloud storage and creating references in your W&B project’s metadata. When you log wandb.Image("path/to/image.png"), W&B uploads image.png, stores it, and associates it with the current wandb.run. For richer media, W&B provides specific classes (wandb.Image, wandb.Video, wandb.Audio) that accept not just file paths but also raw data such as NumPy arrays or PIL Images. The boxes dictionary passed to wandb.Image is a structured format W&B understands for drawing bounding boxes on the image, and a similar masks argument handles segmentation overlays. This structured logging lets the W&B UI render these annotations interactively.
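As a concrete sketch, the payload below builds that structured box format as a plain dictionary; the make_box helper is hypothetical (not part of the wandb API), but the resulting dictionary is what you would pass as the boxes= argument to wandb.Image:

```python
# Sketch of the box-annotation payload wandb.Image accepts.
# make_box is a hypothetical convenience helper, not a wandb function.

def make_box(x0, y0, x1, y1, class_id, caption, score):
    """Build one box entry; coordinates are fractions of the image size."""
    return {
        "position": {"minX": x0, "minY": y0, "maxX": x1, "maxY": y1},
        "class_id": class_id,
        "box_caption": caption,
        "scores": {"score": score},
    }

boxes = {
    "predictions": {
        "box_data": [
            make_box(0.1, 0.2, 0.5, 0.6, 0, "cat", 0.9),
            make_box(0.7, 0.3, 0.9, 0.8, 1, "dog", 0.85),
        ],
        # class_labels maps integer class ids to display names
        "class_labels": {0: "cat", 1: "dog"},
    }
}

# Used as: wandb.Image(image_data, boxes=boxes)
```

Keeping the box construction in a helper like this makes it easy to convert whatever format your detector emits into the W&B schema in one place.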

When you log audio, you can pass a file path or raw audio data. W&B uploads the file and provides an in-browser player. Similarly, for video, W&B uploads the video file and allows streaming playback. The caption argument is crucial for adding context to each logged media item, making it easier to identify what you’re looking at later.
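If you want to try the file-path route without real recordings, you can synthesize a valid WAV file with just the standard library. This is a sketch; the filename tone.wav and the tone parameters are arbitrary choices for the example:

```python
import math
import struct
import wave

# Synthesize one second of a 440 Hz sine tone as a real, playable WAV
# file; it can then be logged with wandb.Audio("tone.wav", caption=...).
SAMPLE_RATE = 16000

samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
    for t in range(SAMPLE_RATE)
]

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)           # mono
    wav.setsampwidth(2)           # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Unlike writing placeholder text into a .wav file, this produces a file the in-browser player can actually decode.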

A detail often overlooked is how the overlays are rendered. The boxes dictionary is stored as structured metadata alongside the image, and the W&B UI draws the annotations client-side in the browser, which is why they stay interactive: you can toggle classes and filter boxes by confidence directly in the dashboard. Also note that box coordinates are interpreted as fractions of the image dimensions by default; if your model emits pixel coordinates, add "domain": "pixel" to each position dictionary so W&B interprets the values as pixels instead.

After successfully logging images, audio, and video, your next step is often to explore how these media outputs change over training epochs, correlating them with metric improvements or degradations.
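A lightweight pattern for that follow-up is to log media and metrics in the same wandb.log call each epoch, so the media panel’s step slider lines up with the metric charts. The sketch below builds those per-epoch payloads; the loss values are made up, and the image entries are stand-in strings where a wandb.Image would go:

```python
# Sketch: pair media with metrics in one payload per epoch so they share
# a step axis in W&B. Loss values are fabricated for illustration.

def epoch_payload(epoch, loss, image):
    # Logged as: wandb.log(epoch_payload(...), step=epoch)
    return {"epoch": epoch, "loss": loss, "predictions": image}

payloads = [
    epoch_payload(e, loss, image=f"<wandb.Image for epoch {e}>")
    for e, loss in enumerate([0.9, 0.5, 0.3])
]
```

Because everything in one payload lands on the same step, scrubbing to the epoch where loss spiked shows you exactly the predictions the model produced at that point.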
