You can define custom operations in TensorRT, but it’s not about adding them to TensorRT itself; it’s about telling TensorRT how to execute an operation it doesn’t natively understand, by mapping it to a sequence of existing TensorRT operations or a custom CUDA kernel.
Let’s see this in action. Imagine we have a simple ONNX model that uses a custom ONNX operator, say `com.example.MyCustomOp`. TensorRT, by default, won’t know what to do with this. We need to give it instructions.
Here’s a snippet of what the ONNX graph might look like conceptually, before TensorRT gets involved:
```
Input(tensor) --> com.example.MyCustomOp --> Output(tensor)
```

When TensorRT encounters `com.example.MyCustomOp` during network import, it will usually throw an error like `Unimplemented operator com.example.MyCustomOp`. This is where we step in.
The primary way to handle this is through TensorRT’s plugin system. A plugin is essentially a piece of code (usually a CUDA kernel and some C++ logic) that TensorRT can call to perform an operation it doesn’t recognize.
Here’s how you’d typically set it up:
- **Define the Plugin in C++:** You’ll write a C++ class that inherits from `nvinfer1::IPluginV2Ext` (or a similar plugin interface). This class needs to implement several methods:
  - `initialize()`: for any setup needed when the plugin is created.
  - `destroy()`: for cleanup.
  - `getOutputDimensions()`: to inform TensorRT about the shape of the plugin’s output (with `IPluginV2Ext`, output data types are reported separately, via `getOutputDataType()`).
  - `configurePlugin()`: to validate input/output dimensions and size any workspace.
  - `enqueue()`: this is the core. It’s where you launch your custom CUDA kernel to perform the actual computation.
  - `clone()`: to create a copy of the plugin.
  - `getPluginType()`, `getPluginVersion()`: to identify your plugin.
  - `serialize()` and `getSerializationSize()`, paired with a deserializing constructor: for saving and loading the network.
  Let’s say our `MyCustomOp` performs an element-wise addition with a learned bias.

  ```cpp
  // Simplified example
  class MyCustomOpPlugin : public nvinfer1::IPluginV2Ext
  {
  public:
      // Constructor, destructor, clone, etc.

      nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs,
                                         int nbInputs) noexcept override
      {
          // For element-wise ops, output dims usually match input dims
          return inputs[0];
      }

      int enqueue(int batchSize, const void* const* inputs, void* const* outputs,
                  void* workspace, cudaStream_t stream) noexcept override
      {
          // Get input/output tensor data pointers and dimensions
          const float* input_data = static_cast<const float*>(inputs[0]);
          float* output_data = static_cast<float*>(outputs[0]);
          const int num_elements = /* ... calculate from input dimensions ... */;

          // Launch your custom CUDA kernel
          my_custom_add_kernel<<<blocks, threads, 0, stream>>>(
              input_data, bias_data, output_data, num_elements);
          return 0; // Success
      }

      // ... other required methods ...

  private:
      float* bias_data; // Assuming bias is a member variable
  };
  ```
- **Create a Plugin Creator:** TensorRT uses a registry of `nvinfer1::IPluginCreator` objects to find and instantiate plugins. You’ll implement a creator that acts as the factory for your `MyCustomOpPlugin`:

  ```cpp
  // Simplified creator
  class MyCustomOpPluginCreator : public nvinfer1::IPluginCreator
  {
  public:
      // Name and version must match what the parser looks up for the custom op
      const char* getPluginName() const noexcept override { return "MyCustomOp"; }
      const char* getPluginVersion() const noexcept override { return "1"; }

      nvinfer1::IPluginV2* createPlugin(const char* name,
                                        const nvinfer1::PluginFieldCollection* fc) noexcept override
      {
          // Parse fields from fc to configure the plugin (e.g., bias values)
          // ...
          return new MyCustomOpPlugin(/* initialized parameters */);
      }

      nvinfer1::IPluginV2* deserializePlugin(const char* name, const void* serialData,
                                             size_t serialLength) noexcept override
      {
          // Rebuild the plugin from the bytes written by serialize()
          return new MyCustomOpPlugin(serialData, serialLength);
      }

      // ... getFieldNames(), setPluginNamespace(), getPluginNamespace(), etc. ...
  };
  ```
- **Register Plugins with the Builder:** Before building your TensorRT engine, your plugin creator must be registered so the parser can resolve the custom op. Registration goes through the global plugin registry (`nvinfer1::getPluginRegistry()`), and it must happen before the ONNX model is parsed:

  ```cpp
  // In your TensorRT engine building code
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(0U);

  // Register your custom plugin creator BEFORE parsing, so the parser
  // can resolve com.example.MyCustomOp to your plugin
  static MyCustomOpPluginCreator creator;
  auto& pluginRegistry = *nvinfer1::getPluginRegistry();
  pluginRegistry.registerCreator(creator, "com.example"); // second argument is the plugin namespace

  // ... parse ONNX or C++ network definition ...
  // ... build engine ...
  ```
- **Register Plugins Before Deserialization:** If you save a TensorRT engine that uses plugins, the plugin implementations must be available again when you load it. The `IRuntime` resolves plugins through the same global registry during `deserializeCudaEngine`, so register your creators first:

  ```cpp
  // When loading an engine with plugins
  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);

  // The runtime looks plugins up in the global registry, so the
  // creator must be registered before deserializing
  static MyCustomOpPluginCreator creator;
  nvinfer1::getPluginRegistry()->registerCreator(creator, "com.example");

  // Load the engine
  nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(
      serialized_engine.data(), serialized_engine.size());
  ```
The critical part is that TensorRT doesn’t learn a new operation. Instead, it delegates the execution of that operation to your custom plugin code. Your plugin must correctly define inputs, outputs, and perform the computation using CUDA kernels. The plugin system is a powerful escape hatch for operations that aren’t natively supported by TensorRT or the ONNX standard it consumes.
The next hurdle you’ll likely face is managing complex plugin configurations, especially when dealing with multiple custom ops or ops with many parameters that need to be serialized and deserialized correctly.