You can define custom operations in TensorRT, but it’s not about adding them to TensorRT itself; it’s about telling TensorRT how to execute an operation it doesn’t natively understand, by mapping it to a sequence of existing TensorRT operations or a custom CUDA kernel.
Let’s see this in action. Imagine we have a simple ONNX model that uses a custom ONNX operator, say `com.example.MyCustomOp`. TensorRT, by default, won’t know what to do with this. We need to give it instructions.
Here’s a snippet of what the ONNX graph might look like conceptually, before TensorRT gets involved:
```
Input(tensor) --> com.example.MyCustomOp --> Output(tensor)
```

When TensorRT encounters `com.example.MyCustomOp` during network import, it will usually throw an error like `Unimplemented operator com.example.MyCustomOp`. This is where we step in.
The primary way to handle this is through TensorRT’s plugin system. A plugin is essentially a piece of code (usually a CUDA kernel and some C++ logic) that TensorRT can call to perform an operation it doesn’t recognize.
Here’s how you’d typically set it up:
- **Define the Plugin in C++:** You’ll write a C++ class that inherits from `nvinfer1::IPluginV2Ext` (or a similar plugin interface). This class needs to implement several methods:
  - `initialize()`: for any setup needed when the plugin is created.
  - `destroy()`: for cleanup.
  - `getOutputDimensions()`: to inform TensorRT about the shape of the plugin’s output (with `IPluginV2Ext`, output data types are reported separately, via `getOutputDataType()`).
  - `configurePlugin()`: to validate input/output dimensions and size any workspace.
  - `enqueue()`: this is the core. It’s where you launch your custom CUDA kernel to perform the actual computation.
  - `clone()`: to create a copy of the plugin.
  - `getPluginType()`, `getPluginVersion()`: to identify your plugin.
  - `serialize()` and `getSerializationSize()`, paired with a deserializing constructor: for saving and loading the network.
  Let’s say our `MyCustomOp` performs an element-wise addition with a learned bias.

  ```cpp
  // Simplified example
  class MyCustomOpPlugin : public nvinfer1::IPluginV2Ext
  {
  public:
      // Constructor, destructor, clone, etc.

      nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs,
                                         int nbInputs) noexcept override
      {
          // For element-wise ops, output dims usually match input dims
          return inputs[0];
      }

      int enqueue(int batchSize, const void* const* inputs, void* const* outputs,
                  void* workspace, cudaStream_t stream) noexcept override
      {
          // Get input/output tensor data pointers and dimensions
          const float* input_data = static_cast<const float*>(inputs[0]);
          float* output_data = static_cast<float*>(outputs[0]);
          const int num_elements = /* ... calculate from input dimensions ... */;

          // Launch your custom CUDA kernel
          my_custom_add_kernel<<<blocks, threads, 0, stream>>>(
              input_data, bias_data, output_data, num_elements);
          return 0; // Success
      }

      // ... other required methods ...

  private:
      float* bias_data; // Assuming bias is a member variable
  };
  ```
- **Create a Plugin Creator:** TensorRT uses a registry of `nvinfer1::IPluginCreator` objects to find and instantiate plugins. You’ll implement a creator that acts as the factory for your `MyCustomOpPlugin`:

  ```cpp
  // Simplified creator
  class MyCustomOpPluginCreator : public nvinfer1::IPluginCreator
  {
  public:
      // Name and version must match what the parser looks up for the custom op
      const char* getPluginName() const noexcept override { return "MyCustomOp"; }
      const char* getPluginVersion() const noexcept override { return "1"; }

      nvinfer1::IPluginV2* createPlugin(const char* name,
                                        const nvinfer1::PluginFieldCollection* fc) noexcept override
      {
          // Parse fields from fc to configure the plugin (e.g., bias values)
          // ...
          return new MyCustomOpPlugin(/* initialized parameters */);
      }

      nvinfer1::IPluginV2* deserializePlugin(const char* name, const void* serialData,
                                             size_t serialLength) noexcept override
      {
          // Rebuild the plugin from the bytes written by serialize()
          return new MyCustomOpPlugin(serialData, serialLength);
      }

      // ... getFieldNames(), setPluginNamespace(), getPluginNamespace(), etc. ...
  };
  ```
- **Register Plugins with the Builder:** Before building your TensorRT engine, your plugin creator must be registered so the parser can resolve the custom op. Registration goes through the global plugin registry (`nvinfer1::getPluginRegistry()`), and it must happen before the ONNX model is parsed:

  ```cpp
  // In your TensorRT engine building code
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(0U);

  // Register your custom plugin creator BEFORE parsing, so the parser
  // can resolve com.example.MyCustomOp to your plugin
  static MyCustomOpPluginCreator creator;
  auto& pluginRegistry = *nvinfer1::getPluginRegistry();
  pluginRegistry.registerCreator(creator, "com.example"); // second argument is the plugin namespace

  // ... parse ONNX or C++ network definition ...
  // ... build engine ...
  ```
- **Register Plugins Before Deserialization:** If you save a TensorRT engine that uses plugins, the plugin implementations must be available again when you load it. The `IRuntime` resolves plugins through the same global registry during `deserializeCudaEngine`, so register your creators first:

  ```cpp
  // When loading an engine with plugins
  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);

  // The runtime looks plugins up in the global registry, so the
  // creator must be registered before deserializing
  static MyCustomOpPluginCreator creator;
  nvinfer1::getPluginRegistry()->registerCreator(creator, "com.example");

  // Load the engine
  nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(
      serialized_engine.data(), serialized_engine.size());
  ```
The critical part is that TensorRT doesn’t learn a new operation. Instead, it delegates the execution of that operation to your custom plugin code. Your plugin must correctly define inputs, outputs, and perform the computation using CUDA kernels. The plugin system is a powerful escape hatch for operations that aren’t natively supported by TensorRT or the ONNX standard it consumes.
The next hurdle you’ll likely face is managing complex plugin configurations, especially when dealing with multiple custom ops or ops with many parameters that need to be serialized and deserialized correctly.