Polygraphy’s accuracy debugging is a powerful tool for pinpointing discrepancies between your model’s output and a reference, discrepancies that often stem from TensorRT’s optimizations.
Let’s see Polygraphy in action. Imagine you’ve converted a PyTorch model to TensorRT and are seeing different results. You’ve run your original PyTorch model and your TensorRT engine with the same input, and the outputs don’t match. This is where Polygraphy shines.
First, you need to set up your environment. This involves installing Polygraphy and ensuring you have your model files (e.g., ONNX, TensorRT engine) and a way to run your original model (e.g., PyTorch, TensorFlow).
pip install tensorrt polygraphy
Now, let’s craft a Polygraphy script to compare the outputs. We’ll assume you have a TorchScript export of your PyTorch model (model.pt) and its ONNX export (model.onnx).
# compare_outputs.py
import numpy as np

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Run your original model (e.g., PyTorch).
# This part will be specific to your framework.
def run_pytorch_model(input_data):
    # Replace with your actual PyTorch model inference.
    import torch

    model = torch.jit.load("model.pt")  # model.pt must be a TorchScript export
    model.eval()
    with torch.no_grad():
        output = model(torch.from_numpy(input_data))
    return output.numpy()

# Lazily build a TensorRT engine from the ONNX model.
build_engine = EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))

# Generate some dummy input data.
input_data = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)  # Example input

# Run both models with the same input.
pytorch_output = run_pytorch_model(input_data)

with TrtRunner(build_engine) as runner:
    # infer() returns an OrderedDict mapping output names to NumPy arrays.
    # The feed_dict key must match your model's input tensor name.
    trt_output = list(runner.infer(feed_dict={"input": input_data}).values())[0]

# Compare the outputs element-wise within a tolerance.
if not np.allclose(pytorch_output, trt_output, rtol=1e-3, atol=1e-5):
    print("Outputs do not match!")
else:
    print("Outputs match!")
To run this comparison effectively and get detailed diagnostics, you’d typically use the Polygraphy CLI. The most convenient built-in reference is ONNX Runtime: given just model.onnx, Polygraphy can run the model under both ONNX Runtime and TensorRT and compare the results. (If you need your PyTorch model as the reference instead, save its outputs in Polygraphy’s RunResults JSON format and pass the file via --load-outputs.)
polygraphy run model.onnx \
    --trt \
    --onnxrt \
    --save-inputs inputs.json \
    --save-outputs outputs.json \
    --atol 1e-4 --rtol 1e-4
In this command:
polygraphy run model.onnx: Specifies the model to run.
--trt: Instructs Polygraphy to build and run a TensorRT engine from the ONNX model.
--onnxrt: Runs the same model under ONNX Runtime as the reference; Polygraphy automatically compares the outputs of all active runners.
--save-inputs inputs.json and --save-outputs outputs.json: Crucial for debugging, these save the exact inputs and outputs of each run so a failure can be replayed later with --load-inputs and --load-outputs.
--atol 1e-4 and --rtol 1e-4: The absolute and relative tolerances used for the element-wise comparison.
Polygraphy will then execute both runners with the same inputs and compare their outputs. If they differ beyond the tolerances, it logs which output tensors mismatch, along with statistics such as the maximum absolute and relative errors, so you can see which elements deviate and by how much.
The mental model for Polygraphy’s accuracy debugging is centered around establishing a ground truth and systematically verifying that your optimized model adheres to it. It acts as an impartial auditor. You provide Polygraphy with a reference (another runner such as ONNX Runtime, or saved outputs from your original framework) and your TensorRT engine. Polygraphy then orchestrates running both with identical inputs, captures the outputs (including intermediate ones, if you mark them), and performs a rigorous element-wise comparison.
The core problem Polygraphy solves is the "black box" nature of model optimization. When TensorRT converts and optimizes your model, it can perform operations like layer fusion, kernel selection, and precision calibration. These optimizations, while boosting performance, can sometimes introduce subtle numerical differences. Polygraphy allows you to trace these differences back to their source. It doesn’t just tell you "the outputs are different"; it tells you where and by how much.
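To make “where and by how much” concrete, here is a minimal sketch, in plain NumPy and independent of Polygraphy itself, of the kind of element-wise statistics such a comparison reports: whether two tensors match within tolerance, the maximum absolute and relative errors, and the index of the worst-offending element. The function name and tolerances are illustrative, not part of any Polygraphy API.

```python
import numpy as np

def summarize_mismatch(reference, candidate, rtol=1e-3, atol=1e-5):
    """Report where and by how much two output tensors diverge."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_err = np.abs(reference - candidate)
    # Relative error, guarding against division by zero.
    rel_err = abs_err / np.maximum(np.abs(reference), 1e-12)
    # Location of the largest absolute error.
    worst = np.unravel_index(np.argmax(abs_err), abs_err.shape)
    return {
        "matches": bool(np.allclose(reference, candidate, rtol=rtol, atol=atol)),
        "max_abs_err": float(abs_err.max()),
        "max_rel_err": float(rel_err.max()),
        "worst_index": worst,
    }

ref = np.array([[1.0, 2.0, 3.0]])
cand = np.array([[1.0, 2.0, 3.5]])  # deliberate deviation in the last element
print(summarize_mismatch(ref, cand))
```

A report like this turns “the outputs are different” into an actionable lead: the worst index tells you which element (and, with intermediate outputs, which layer) to investigate first.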
The levers you control are primarily command-line arguments: the input data, the choice of reference runner, the comparison tolerances (--atol, --rtol), and where inputs and outputs are saved. You can also guide the TensorRT build process itself using flags like --trt-min-shapes, --trt-opt-shapes, and --trt-max-shapes to ensure the engine is built for the expected input dimensions.
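For instance, for a model with a dynamic batch dimension, the shape flags look like this (the tensor name “input” and the dimensions are placeholders; substitute your model’s actual input name and shape):

```shell
# Build the TensorRT engine for batch sizes 1 to 32, optimized for 8.
# "input" must match your model's input tensor name.
polygraphy run model.onnx --trt --onnxrt \
    --trt-min-shapes input:[1,3,224,224] \
    --trt-opt-shapes input:[8,3,224,224] \
    --trt-max-shapes input:[32,3,224,224]
```

A mismatch that appears only at certain batch sizes often points to a shape-dependent optimization, so it is worth comparing across the min/opt/max range rather than a single shape.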
One of the most powerful, yet often overlooked, aspects of Polygraphy is its ability to compare intermediate layer outputs. By default, it compares only the final outputs. However, by marking additional tensors as outputs, using --trt-outputs and --onnx-outputs with specific tensor names (or mark all to compare every tensor), you can pinpoint exactly which layer in the TensorRT graph starts deviating from the reference model’s behavior. This is invaluable for determining whether the discrepancy arises from a specific operator or a sequence of operations.
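As a sketch of that workflow (assuming an ONNX Runtime reference), marking every tensor as an output makes Polygraphy compare all intermediate activations, so the first layer that diverges stands out in the comparison log:

```shell
# Mark all tensors as outputs in both the ONNX Runtime and TensorRT runs,
# so every intermediate activation is compared, not just the final outputs.
polygraphy run model.onnx --onnxrt --trt \
    --onnx-outputs mark all \
    --trt-outputs mark all
```

Note that marking all tensors as outputs can itself inhibit some TensorRT layer fusions, so once a suspect region is found, narrow down to a few specific tensor names for the final diagnosis.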
The next concept you’ll likely encounter is performance profiling using Polygraphy, where you’ll analyze the latency and throughput of your TensorRT engine across different input shapes and batch sizes.