TensorRT doesn’t let you simply add new operations to its graph; unsupported operations have to be implemented as plugins inside its C++ framework.

Let’s say you’ve got a model with a custom operation that TensorRT doesn’t natively support. This usually happens when you’re using a newer framework feature or a very niche operation. TensorRT needs to know how to execute this operation on the GPU. It can’t just magically infer it.

Here’s how you bring your unsupported operation into TensorRT’s world:

The Core Idea: A Plugin Library

You’ll create a shared library (a .so file on Linux, .dll on Windows) that TensorRT can load at runtime. This library will contain your custom plugin. A plugin is essentially a C++ class that inherits from nvinfer1::IPluginV2Ext. This class tells TensorRT:

  1. What your operation does: This is the enqueue method, where you launch the CUDA kernel(s) that perform the computation.
  2. Its input/output shapes and data types: This is handled by getOutputDimensions, getOutputDataType, and configurePlugin.
  3. How to serialize/deserialize it: This is crucial for saving and loading your engine.

Step-by-Step: Building Your Plugin

1. Define Your Plugin Class

You need a C++ class that inherits from nvinfer1::IPluginV2Ext. It needs to implement several virtual methods.

#include "NvInfer.h"
#include <vector>
#include <string>
#include <cassert>

// Host-side launcher for the CUDA kernel (defined in the .cu file; a __global__
// kernel can't be called directly from g++-compiled code)
void launchCustomOpKernel(const float* input, float* output, int batchSize, int numElements, cudaStream_t stream);

class MyCustomPlugin : public nvinfer1::IPluginV2Ext {
public:
    MyCustomPlugin(const std::string& name, int batchSize); // Constructor
    MyCustomPlugin(const void* data, size_t length);        // Deserialization constructor

    // Core methods (from IPluginV2)
    int getNbOutputs() const noexcept override;
    nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims) noexcept override;
    int enqueue(int batchSize, const void* const* inputs, void* const* outputs,
                void* workspace, cudaStream_t stream) noexcept override;

    // Configuration (IPluginV2Ext signature)
    void configurePlugin(const nvinfer1::Dims* inputDims, int nbInputs,
                         const nvinfer1::Dims* outputDims, int nbOutputs,
                         const nvinfer1::DataType* inputTypes, const nvinfer1::DataType* outputTypes,
                         const bool* inputIsBroadcast, const bool* outputIsBroadcast,
                         nvinfer1::PluginFormat format, int maxBatchSize) noexcept override;

    // Serialization
    size_t getSerializationSize() const noexcept override;
    void serialize(void* buffer) const noexcept override;

    // Plugin management
    const char* getPluginNamespace() const noexcept override;
    void setPluginNamespace(const char* pluginNamespace) noexcept override;
    const char* getPluginType() const noexcept override;
    const char* getPluginVersion() const noexcept override;
    void destroy() noexcept override;
    nvinfer1::IPluginV2Ext* clone() const noexcept override;

    // Further required overrides (initialize, terminate, getWorkspaceSize,
    // supportsFormat, getOutputDataType, attachToContext, detachFromContext, ...)
    // follow the same pattern and are omitted here for brevity.

private:
    std::string mName;
    std::string mNamespace;
    int mBatchSize{};
    int mNumElements{}; // per-sample element count, cached in configurePlugin
};

2. Implement the Plugin Methods

  • MyCustomPlugin(const std::string& name, int batchSize): Your constructor. Here you store any parameters your operation needs (kernel sizes, strides, etc.) and initialize internal state. For this example, we just store the name and batch size.
  • MyCustomPlugin(const void* data, size_t length): This is crucial for deserialization. When TensorRT loads an engine, it needs to reconstruct your plugin object. You’ll read the serialized data here to repopulate your plugin’s state.
  • getOutputDimensions(...): This tells TensorRT the shape and data type of your plugin’s output tensor(s) given the input tensor shapes. You’ll do shape inference here. For our simple example, the output shape is the same as the input shape.
  • configurePlugin(...): Per-configuration setup. You might allocate device memory for weights or intermediate buffers here, and verify that the input/output dimensions and data types are compatible with your operation. This is also the place to cache anything enqueue will need, such as the per-sample element count (mNumElements below).
  • enqueue(...): This is the heart of your plugin. You launch your CUDA kernel here. inputs and outputs are pointers to the device memory for your tensors, workspace is a buffer for any temporary memory your kernel needs, and stream is the CUDA stream to enqueue the kernel on. Note that IPluginV2Ext’s enqueue receives the runtime batch size rather than per-tensor descriptors; shape information comes from what you cached in configurePlugin.
    int MyCustomPlugin::enqueue(int batchSize, const void* const* inputs, void* const* outputs,
                                void* workspace, cudaStream_t stream) noexcept {
        // Assuming a single input and output, float32
        const float* input = static_cast<const float*>(inputs[0]);
        float* output = static_cast<float*>(outputs[0]);
    
        // mNumElements (elements per sample) was cached in configurePlugin
        int totalElements = batchSize * mNumElements;
    
        // Launch the CUDA kernel via the host-side launcher, on the given stream
        launchCustomOpKernel(input, output, batchSize, totalElements, stream);
    
        return 0; // 0 indicates success
    }
    
    
  • getSerializationSize() and serialize(...): These methods handle saving your plugin’s state when the engine is serialized. You’ll write your plugin’s parameters (like mBatchSize) into the buffer.
  • clone(): Creates a deep copy of your plugin object.
  • destroy(): Cleans up any resources allocated by your plugin.
  • getPluginNamespace(), getPluginType(), getPluginVersion(): These return strings identifying your plugin. Crucial for TensorRT to find and load it.
  • attachToContext(...): Called when the plugin is attached to an execution context. TensorRT passes in cuDNN and cuBLAS handles and a GPU allocator the plugin may use; detachFromContext() is its counterpart.
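The serialization contract described above can be sketched as a standalone round trip. This sketch uses a plain struct that mirrors the plugin’s state rather than the real nvinfer1 types, so it compiles on its own; in the actual plugin, the same memcpy logic lives in serialize() and the deserialization constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Standalone sketch of the serialize/deserialize round trip the plugin
// performs. Method names mirror the plugin methods; the only persisted
// state in this example is batchSize.
struct PluginState {
    int batchSize{};

    std::size_t getSerializationSize() const { return sizeof(int); }

    void serialize(void* buffer) const {
        std::memcpy(buffer, &batchSize, sizeof(int));
    }

    // Stands in for the deserialization constructor
    static PluginState deserialize(const void* data, std::size_t length) {
        assert(length >= sizeof(int));
        PluginState s;
        std::memcpy(&s.batchSize, data, sizeof(int));
        return s;
    }
};
```

Whatever you write in serialize() must be read back in exactly the same order and sizes during deserialization; a mismatch corrupts every engine built with the plugin.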

3. Implement the CUDA Kernel

This is standard CUDA programming.

__global__ void customOpKernel(const float* input, float* output, int batchSize, int numElements) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numElements) {
        // Example: Simple element-wise addition of batchSize
        output[idx] = input[idx] + static_cast<float>(batchSize);
    }
}

// Host-side launcher, callable from the .cpp plugin code (a __global__
// kernel cannot be launched from a file compiled by g++)
void launchCustomOpKernel(const float* input, float* output, int batchSize, int numElements, cudaStream_t stream) {
    const int blockSize = 256;
    const int gridSize = (numElements + blockSize - 1) / blockSize;
    customOpKernel<<<gridSize, blockSize, 0, stream>>>(input, output, batchSize, numElements);
}
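Since the kernel is simple element-wise math, a CPU reference implementation is a cheap way to validate the GPU output during development (plain C++, no CUDA required; the function name is just this example’s convention):

```cpp
#include <cstddef>
#include <vector>

// CPU reference for customOpKernel: adds batchSize to every element.
// Compare its output against the device results to catch kernel bugs.
std::vector<float> customOpReference(const std::vector<float>& input, int batchSize)
{
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = input[i] + static_cast<float>(batchSize);
    }
    return output;
}
```

Running both paths on the same random input and checking element-wise closeness is usually enough to catch indexing and launch-configuration mistakes early.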

4. Register Your Plugin

TensorRT needs to know about your plugin. You do this by implementing an IPluginCreator and registering it with the IPluginRegistry.

class MyCustomPluginCreator : public nvinfer1::IPluginCreator {
public:
    nvinfer1::IPluginV2* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) noexcept override {
        // Parse fields from fc to get parameters like batchSize
        int batchSize = 1; // Default
        // ... logic to extract batchSize from fc ...
        return new MyCustomPlugin(name, batchSize);
    }

    nvinfer1::IPluginV2* deserializePlugin(const char* name, const void* serialData, size_t serialLength) noexcept override {
        // Reconstruct a plugin object from serialized data
        return new MyCustomPlugin(serialData, serialLength);
    }

    const char* getPluginNamespace() const noexcept override { return "my_plugin_namespace"; }
    const char* getPluginName() const noexcept override { return "MyCustomPlugin"; }
    const char* getPluginVersion() const noexcept override { return "1"; }

    void setPluginNamespace(const char* pluginNamespace) noexcept override { /* ... */ }

    // Describes the fields createPlugin expects (may be an empty collection)
    const nvinfer1::PluginFieldCollection* getFieldNames() noexcept override { return &mFC; }

private:
    nvinfer1::PluginFieldCollection mFC{};
};
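The elided field-parsing logic in createPlugin typically walks the field collection by name. Here is a standalone sketch: the mock structs mirror only the name/data members of nvinfer1::PluginField so it compiles without the TensorRT headers, and the field name "batchSize" is this example’s convention.

```cpp
#include <cstring>

// Mocks mirroring the relevant members of nvinfer1::PluginField /
// PluginFieldCollection, for a self-contained illustration.
struct MockField { const char* name; const void* data; };
struct MockFieldCollection { int nbFields; const MockField* fields; };

// The same loop appears inside createPlugin against the real types.
int extractBatchSize(const MockFieldCollection* fc)
{
    int batchSize = 1; // default when the field is absent
    for (int i = 0; i < fc->nbFields; ++i) {
        if (std::strcmp(fc->fields[i].name, "batchSize") == 0) {
            batchSize = *static_cast<const int*>(fc->fields[i].data);
        }
    }
    return batchSize;
}
```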

// Global registration function (usually in a .cpp file). Alternatively, the
// REGISTER_TENSORRT_PLUGIN(MyCustomPluginCreator) macro registers the creator
// via a static initializer when the library is loaded.
extern "C" void registerPlugins(nvinfer1::IPluginRegistry* registry) {
    static MyCustomPluginCreator myPluginCreator;
    registry->registerCreator(myPluginCreator, myPluginCreator.getPluginNamespace());
}

5. Build and Load the Plugin Library

Compile your C++ code and CUDA code into a shared library.

# Example for Linux
nvcc -I/usr/local/tensorrt/include -Xcompiler -fPIC -c my_plugin.cu -o my_plugin_kernels.o
g++ -I/usr/local/tensorrt/include -fPIC -c my_plugin.cpp -o my_plugin.o
g++ -shared my_plugin_kernels.o my_plugin.o -o libmycustomplugin.so -L/usr/local/cuda/lib64 -lcudart -lnvinfer

When building your TensorRT engine, you’ll need to tell it about your plugin library.

# Python example: building the engine
import tensorrt as trt
import os

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine_with_plugin(model_path, plugin_library_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    config = builder.create_builder_config()

    # A local import keeps this snippet self-contained; importing ctypes at
    # the top of the file is also fine.
    import ctypes

    if not os.path.exists(plugin_library_path):
        raise FileNotFoundError(f"Plugin library not found at {plugin_library_path}")

    # THIS IS THE KEY PART: loading the library runs its static initializers
    # (e.g. REGISTER_TENSORRT_PLUGIN), which register your plugin creator. If
    # your library exposes an explicit registration entry point instead, call
    # it here after loading.
    ctypes.CDLL(plugin_library_path)
    trt.init_libnvinfer_plugins(TRT_LOGGER, "")  # register TensorRT's built-in plugins

    runtime = trt.Runtime(TRT_LOGGER)

    # For ONNX: the parser looks up registered plugin creators by name, version,
    # and namespace. If your model has a node with type "MyCustomPlugin" in
    # namespace "my_plugin_namespace" and the library above is loaded, the
    # parser will resolve it to your plugin.
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(model_path, "rb") as model:
        if not parser.parse(model.read()):
            print("ERROR: Failed to parse ONNX file")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    # If you are building the network programmatically instead of parsing ONNX,
    # you would create the plugin yourself and add it as a layer, e.g.:
    # import numpy as np
    # registry = trt.get_plugin_registry()
    # creator = registry.get_plugin_creator("MyCustomPlugin", "1", "my_plugin_namespace")
    # fc = trt.PluginFieldCollection(
    #     [trt.PluginField("batchSize", np.array([4], dtype=np.int32), trt.PluginFieldType.INT32)])
    # custom_plugin = creator.create_plugin("my_custom_layer", fc)
    # layer = network.add_plugin_v2([input_tensor], custom_plugin)  # input_tensor from the network

    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB (replaces the deprecated max_workspace_size)

    # Add plugin to builder config if necessary (e.g., for specific optimizations)
    # This is more for plugins that need builder-level configuration, less common for basic ops.
    
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine:
        return runtime.deserialize_cuda_engine(serialized_engine)
    else:
        return None

# --- Usage ---
# Assume model.onnx exists and contains a node that TensorRT can map to "MyCustomPlugin"
# Assume libmycustomplugin.so exists and is in a location accessible by the runtime.
# You might need to set LD_LIBRARY_PATH or provide the full path.

# Ensure the plugin library is discoverable at runtime: put the directory
# containing the .so on LD_LIBRARY_PATH, or pass an absolute path. Loading
# the library with ctypes triggers static plugin registration
# (REGISTER_TENSORRT_PLUGIN); if your library instead exposes an explicit
# C entry point, call that after loading:
# import ctypes
# ctypes.CDLL("path/to/libmycustomplugin.so")

plugin_lib_path = "./libmycustomplugin.so" # Adjust path as needed
onnx_model_path = "model.onnx"

engine = build_engine_with_plugin(onnx_model_path, plugin_lib_path)

if engine:
    print("Engine built successfully with custom plugin!")
    # Proceed to create an execution context and run inference
else:
    print("Failed to build engine.")

The key is that when TensorRT parses your model (e.g., ONNX), it looks for nodes that match registered plugin types. If it finds one, and your plugin library has been loaded and registered correctly, it will use your IPluginV2Ext implementation.

The next hurdle is often debugging your CUDA kernels within the TensorRT pipeline or optimizing the enqueue method for maximum performance.

Want structured learning?

Take the full TensorRT course →