W&B Git integration lets you tie each experiment run to the exact code that generated it.

Here’s a quick demo:

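import subprocess

import wandb

# Log in to your Weights & Biases account first.
# If you haven't already, run `wandb login` in your terminal,
# or uncomment the next line.
# wandb.login()

# Start a new run. W&B records the Git state at this point.
wandb.init(project="git-integration-demo")


def git(*args):
    """Return the output of a git command run in the current directory."""
    return subprocess.run(["git", *args], capture_output=True, text=True).stdout.strip()


# Simulate some model training. The commit and diff are read from Git
# directly here for display; W&B captures the same state on its own.
print("Training model with code version:", git("rev-parse", "HEAD"))
print("Code diff:\n", git("diff", "HEAD"))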

# Log some dummy metrics
wandb.log({"accuracy": 0.95})

# Finish the run
wandb.finish()

When you run this, Weights & Biases automatically captures your Git commit hash and any uncommitted changes. You’ll see this information attached to your run in the W&B UI.

The Problem It Solves

Reproducibility is a cornerstone of scientific research and robust software development. In machine learning, this means being able to rerun an experiment and get the same results. Code is a critical, often overlooked, component of this reproducibility puzzle. If you change your code between runs, even slightly, it can be impossible to pinpoint which version of the code produced a specific result. Manually tracking Git commits alongside experiment logs is tedious and error-prone.

How It Works Internally

When wandb.init() is called, the W&B client library checks if it’s running within a Git repository. If it is, it queries the Git command-line interface to retrieve:

  1. Commit Hash: The unique identifier for the current state of your repository.
  2. Repository Root: The absolute path to the top level of the working tree, that is, the directory that contains .git.
  3. Uncommitted Changes (Diff): A patch representing any modifications to tracked files that haven’t been committed yet.
  4. Remote URL: The URL of the configured remote repository (e.g., origin).

This information is then associated with the W&B run object and uploaded to the W&B backend. In the W&B UI, you can see the commit hash for each run. Clicking on it takes you to that commit on your Git hosting platform (such as GitHub, GitLab, or Bitbucket), provided the remote is correctly configured and accessible. The diff is stored with the run's files as a patch (diff.patch), so you can see exactly how your working directory differed from the recorded commit at the time of the run.
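The four lookups listed above map onto ordinary Git commands. The following is a rough sketch of gathering the same information with the git CLI, not the W&B client's actual code. The helper names git_output and collect_git_state are made up for this illustration, and comparing against HEAD with `git diff HEAD` is an assumption about which diff best matches the recorded patch.

```python
import subprocess


def git_output(*args, cwd=None, strip=True):
    """Run `git <args>` and return stdout, or None if the command fails."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, not a repository, or no such ref/remote
    return result.stdout.strip() if strip else result.stdout


def collect_git_state(cwd=None, remote="origin"):
    """Collect commit, root, diff, and remote URL for the repo at `cwd`."""
    root = git_output("rev-parse", "--show-toplevel", cwd=cwd)
    if root is None:
        return None  # not inside a Git working tree
    return {
        "commit": git_output("rev-parse", "HEAD", cwd=cwd),
        "root": root,
        # Keep the patch byte-for-byte so it could still be applied later.
        "diff": git_output("diff", "HEAD", cwd=cwd, strip=False),
        "remote_url": git_output("remote", "get-url", remote, cwd=cwd),
    }
```

Calling collect_git_state() outside a repository returns None. Inside one, any individual field can still be None, for example remote_url when no remote named origin exists.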

Controlling the Integration

Git capture works without configuration, but a few options adjust what gets recorded. Some are arguments to wandb.init(), others are fields on wandb.Settings or methods on the run:

  • git_remote (a wandb.Settings field): The name of the remote whose URL is recorded. It defaults to origin; set it to another name, such as upstream, if you have multiple remotes configured and want a different one recorded.
  • save_code=True (an argument to wandb.init()): Saves the main script or notebook that launched the run. It is off unless you pass it here or enable code saving in your W&B account settings.
  • run.log_code(root="."): Uploads source files under the given directory as a code artifact. This captures files whether or not Git tracks them, which matters for full reproducibility. By default it picks up Python source files.
  • include_fn and exclude_fn (arguments to run.log_code()): Functions that take a file path and return True to include or exclude that file, for example to skip data/ or .ipynb_checkpoints/.

For example, to record the upstream remote and upload code while skipping a data directory:

import os

run = wandb.init(
    project="git-integration-advanced",
    settings=wandb.Settings(git_remote="upstream"),
)
# Upload Python files under the current directory, skipping anything under data/
run.log_code(root=".", exclude_fn=lambda path: "data" in path.split(os.sep))
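Glob patterns like the ones above can be turned into a predicate function, which is the form run.log_code() takes for its include_fn and exclude_fn arguments. The sketch below is one way to do that. make_exclude_fn is a hypothetical helper, not part of wandb, and the convention that a pattern ending in "/" names a directory is an assumption made for this example.

```python
import fnmatch
import os


def make_exclude_fn(patterns):
    """Build a predicate that returns True for paths matching any pattern.

    Patterns ending in "/" match a directory name anywhere in the path;
    all other patterns are matched against the file name only.
    """
    dir_globs = [p.rstrip("/") for p in patterns if p.endswith("/")]
    file_globs = [p for p in patterns if not p.endswith("/")]

    def exclude(path, root=None):
        # Work on the path relative to root when a root is supplied.
        rel = os.path.relpath(path, root) if root else path
        parts = os.path.normpath(rel).split(os.sep)
        directories, file_name = parts[:-1], parts[-1]
        if any(fnmatch.fnmatch(d, g) for d in directories for g in dir_globs):
            return True
        return any(fnmatch.fnmatch(file_name, g) for g in file_globs)

    return exclude


exclude = make_exclude_fn(["*.pyc", "data/", ".ipynb_checkpoints/"])
```

The resulting function could then be passed as run.log_code(root=".", exclude_fn=exclude). Note that a file such as database.py is kept, because "data/" only matches a directory named data.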

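You can also turn off Git capture for a run if needed, though this is generally discouraged because it costs you reproducibility. Pass disable_git through wandb.Settings, or set the WANDB_DISABLE_GIT environment variable to true before the run starts:

wandb.init(project="no-git-tracking", settings=wandb.Settings(disable_git=True))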

The One Thing Most People Don’t Know

The diff Weights & Biases stores is not limited to unstaged edits. If you have staged changes (added to the index but not yet committed), these are also included in the recorded diff. This matters because staged changes are code you intend to commit soon, and they can explain subtle differences in run behavior even though the recorded commit hash is the same. New files that Git is not tracking yet are a different case: they are not modifications to tracked files, so they do not appear in a diff.
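The difference between staged and unstaged changes is easy to check with Git itself. The sketch below builds a throwaway repository with one staged edit and one unstaged edit, then compares three views of the uncommitted work. It illustrates Git's behavior only. That the patch W&B records corresponds to the `git diff HEAD` view is an assumption here, and the helper names git, diff_views, and demo are made up for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout."""
    cmd = ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(
        cmd, cwd=repo, capture_output=True, text=True, check=True
    ).stdout


def diff_views(repo):
    """Return three views of the uncommitted work in `repo`."""
    return {
        "unstaged": git(repo, "diff"),            # working tree vs. index
        "staged": git(repo, "diff", "--cached"),  # index vs. HEAD
        "vs_head": git(repo, "diff", "HEAD"),     # working tree vs. HEAD
    }


def demo():
    """Build a temporary repo with one staged and one unstaged edit."""
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        script = repo / "train.py"
        git(repo, "init", "--quiet")
        script.write_text("lr = 0.1\nepochs = 1\n")
        git(repo, "add", "train.py")
        git(repo, "commit", "--quiet", "-m", "initial")
        script.write_text("lr = 0.01\nepochs = 1\n")  # first edit ...
        git(repo, "add", "train.py")                    # ... gets staged
        script.write_text("lr = 0.01\nepochs = 5\n")  # second edit stays unstaged
        return diff_views(repo)
```

In the result of demo(), the learning-rate change shows up only in the staged view and the epochs change only in the unstaged view, while the comparison against HEAD contains both.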

Next Steps

Once you’re comfortable tracking code versions, you’ll want to explore how to automatically link these code versions to specific model artifacts or datasets.
