GraphSurgeon is a library for manipulating and optimizing ONNX models, often used as a preprocessing step before TensorRT conversion.
Let’s see it in action. Imagine we have a simple ONNX model for adding two tensors, a and b, and we want to add a constant bias to the output.
import onnx
import numpy as np
import onnx_graphsurgeon as gs

# Create a dummy ONNX model: output = input_a + input_b
# Node inputs/outputs must be tensor objects, not strings
input_a = gs.Variable("input_a", dtype=np.float32, shape=[1, 3, 224, 224])
input_b = gs.Variable("input_b", dtype=np.float32, shape=[1, 3, 224, 224])
output = gs.Variable("output", dtype=np.float32, shape=[1, 3, 224, 224])
add_node = gs.Node(op="Add", inputs=[input_a, input_b], outputs=[output])
graph = gs.Graph(nodes=[add_node], inputs=[input_a, input_b], outputs=[output])
model = gs.export_onnx(graph)
# Save the original model
onnx.save(model, "original_model.onnx")
print("Original model saved to original_model.onnx")
Now, we’ll use GraphSurgeon to modify this model. We’ll insert a constant tensor and add it to the output of the original Add node.
# Load the model with GraphSurgeon (import_onnx takes an onnx.ModelProto)
graph = gs.import_onnx(onnx.load("original_model.onnx"))
# Define the constant bias (broadcast across the output)
bias_value = np.array([1.0], dtype=np.float32)
bias_tensor = gs.Constant(name="bias", values=bias_value)
# Find the output tensor of the original Add node
original_output = graph.outputs[0]
# Create a new Add node that adds the bias to that output
final_output = gs.Variable("final_output", dtype=np.float32, shape=[1, 3, 224, 224])
new_add_node = gs.Node(
    op="Add",
    inputs=[original_output, bias_tensor],
    outputs=[final_output],
)
# Make the new node's output the graph output
graph.outputs = [final_output]
# Add the new node to the graph; the bias constant is picked up
# automatically because the node references it
graph.nodes.append(new_add_node)
# Clean up the graph (removes unused nodes/tensors) and reorder nodes
graph.cleanup().topological_sort()
# Export the modified model
modified_model = gs.export_onnx(graph)
onnx.save(modified_model, "modified_model.onnx")
print("Modified model saved to modified_model.onnx")
The problem this solves is the need for fine-grained control over ONNX graphs. Libraries like ONNX Runtime or TensorRT often perform automatic optimizations, but sometimes you need to insert specific operations, modify layer parameters, or prune parts of the graph before those automatic tools get involved. GraphSurgeon gives you that programmatic access.
Internally, GraphSurgeon represents an ONNX model as a Python object graph. Node objects and Tensor objects (with Variable and Constant as the two tensor subclasses) are first-class citizens. You can traverse this graph, inspect node inputs and outputs, modify attributes, and add or remove nodes and tensors. The cleanup() method is crucial; it removes any dangling connections and any nodes or tensors that don't contribute to the graph outputs, ensuring the resulting ONNX graph is valid. topological_sort() then reorders the nodes correctly, which is essential for execution.
The exact levers you control are the Node and Variable objects. You can change their op (operation type), inputs, outputs, and attrs (attributes). You can also directly manipulate the graph.nodes, graph.inputs, and graph.outputs lists, and query graph.tensors(), a method that returns a dictionary mapping tensor names to tensor objects. For example, to change the shape of an input variable, you'd assign graph.inputs[i].shape = new_shape.
When you add a constant, GraphSurgeon doesn’t just dump the values into a node; it creates a gs.Constant object. This object is then referenced by the gs.Node that needs it. This separation allows GraphSurgeon to manage constants efficiently, especially when they might be shared across multiple operations or when you want to modify their values programmatically.
The next concept you’ll likely encounter is integrating this modified ONNX model into a TensorRT engine, which involves understanding TensorRT’s builder and optimization profiles.