DeepStream is a powerful SDK from NVIDIA for building efficient video analytics pipelines, leveraging TensorRT for hardware-accelerated AI inference.

Let’s set up a basic pipeline to demonstrate its capabilities. We’ll use a sample video, a pre-trained object detection model (YOLOv3-tiny), and display the results.

First, ensure you have the DeepStream SDK installed. The installation process varies slightly depending on your CUDA and TensorRT versions. You can find detailed instructions on the NVIDIA Developer website.

Here’s a typical DeepStream pipeline configuration using a *.txt file, which defines the components and their connections.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=60

[source0]
enable=1
type=3
uri=file://../../samples/streams/sample_720p.h264
num-sources=1
gpu-id=0

[sink0]
enable=1
type=3
container=1
codec=1
output-file=../../output/sample_output.mp4
source-id=0
gpu-id=0

[sink1]
enable=1
type=2
sync=0
source-id=0
gpu-id=0

This configuration file defines three components: source0 (the input), sink0 (a file output), and sink1 (an on-screen display).

source0 is configured to read from a local H.264 video file (sample_720p.h264). type=3 selects a multi-URI file source (type=2 is a single-URI source, type=4 an RTSP source), uri gives the file path, and num-sources=1 reads a single copy of the stream. gpu-id=0 assigns this component to the first GPU.

sink0 is configured as a file sink (type=3). It will encode the processed frames using H.264 (codec=1), wrap them in an MP4 container (container=1), and save them to ../../output/sample_output.mp4. source-id=0 selects which stream this sink records. sink1 (type=2) renders the same stream to an on-screen EGL window, with sync=0 disabling clock synchronization so the pipeline runs as fast as it can.
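Because the file is plain INI-style key=value text (with ';' or '#' comments), it is easy to inspect programmatically. A minimal sketch using Python's standard configparser, with a trimmed copy of the config inlined for illustration:

```python
import configparser

# A trimmed copy of the pipeline config above, inlined for illustration.
CONFIG_TEXT = """
[application]
enable-perf-measurement=1

[source0]
enable=1
uri=file://../../samples/streams/sample_720p.h264

[sink0]
enable=1
output-file=../../output/sample_output.mp4
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONFIG_TEXT)

# List every enabled section and, where present, its file path.
for section in cfg.sections():
    if cfg.get(section, "enable", fallback="0") == "1":
        path = cfg.get(section, "uri",
                       fallback=cfg.get(section, "output-file", fallback="-"))
        print(section, path)
```

This prints each enabled section alongside its input or output path, which is handy for quickly auditing which components a large config actually activates.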

To add AI inference, we’ll introduce an nvinfer plugin. This requires a TensorRT engine file. If you don’t have one, you can generate it from a model (such as YOLOv3-tiny) using the trtexec tool or by following the DeepStream SDK’s model-conversion guides. Let’s assume you have yolov3-tiny.trt in a models directory.
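For reference, a small helper that assembles a typical trtexec invocation; the ONNX filename here is hypothetical, while --onnx, --saveEngine, and --fp16 are standard trtexec options:

```python
import shlex

# Sketch: assemble the trtexec command that builds a TensorRT engine from
# an ONNX export of the model. Paths are hypothetical; adjust to your setup.
def trtexec_cmd(onnx_path: str, engine_path: str, fp16: bool = True) -> str:
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        args.append("--fp16")  # build a half-precision engine if the GPU supports it
    return " ".join(shlex.quote(a) for a in args)

print(trtexec_cmd("yolov3-tiny.onnx", "../../models/yolov3-tiny.trt"))
```

Note that engines are specific to the GPU and TensorRT version they were built with, so the .trt file generally has to be regenerated when either changes.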

We’ll insert an nvinfer element after the source.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=60

[source0]
enable=1
type=3
uri=file://../../samples/streams/sample_720p.h264
num-sources=1
gpu-id=0

[primary-gie]
enable=1
gpu-id=0
model-engine-file=../../models/yolov3-tiny.trt
config-file=../../models/yolov3-tiny.txt
batch-size=8
gie-unique-id=1

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-bg-color=0.3;0.3;0.3;0.5

[sink0]
enable=1
type=3
container=1
codec=1
output-file=../../output/sample_output.mp4
source-id=0
gpu-id=0

[sink1]
enable=1
type=2
sync=0
source-id=0
gpu-id=0

The new section [primary-gie] (primary GPU Inference Engine) is the core of our AI processing.

  • enable=1 turns it on.
  • gpu-id=0 assigns it to the first GPU.
  • model-engine-file points to the compiled TensorRT engine.
  • config-file points to a configuration file for the nvinfer plugin, which specifies details about the model (pixel scaling, color format, class labels, output parsing, and whether it runs as a primary or secondary detector).
  • batch-size=8 is crucial for performance; it groups multiple frames for simultaneous inference.
  • gie-unique-id=1 gives this engine a unique ID so downstream elements can attribute its metadata.

We’ve also added an [osd] (On-Screen Display) element. This plugin draws bounding boxes and labels on the video frames based on the inference metadata: border-width and text-size control how the boxes and labels are rendered, and gpu-id=0 keeps the drawing on the GPU.

Now we need to connect these components. The nvstreammux plugin batches frames from one or more input streams before they reach the inference engine, and deepstream-app always places it between the sources and the primary inference engine, so even our single stream passes through it. In multi-source pipelines it is what combines source0, source1, and so on into one batched buffer; with a single source it simply forwards frames, but configuring it explicitly gives us control over batching and resolution.

Let’s refine the configuration with an explicit [streammux] section so these settings are visible:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=60

[source0]
enable=1
type=3
uri=file://../../samples/streams/sample_720p.h264
num-sources=1
gpu-id=0

[streammux]
gpu-id=0
live-source=0
width=1920
height=1080
batch-size=8
batched-push-timeout=40000

[primary-gie]
enable=1
gpu-id=0
model-engine-file=../../models/yolov3-tiny.trt
config-file=../../models/yolov3-tiny.txt
batch-size=8
gie-unique-id=1

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-bg-color=0.3;0.3;0.3;0.5

[sink0]
enable=1
type=3
container=1
codec=1
output-file=../../output/sample_output.mp4
source-id=0
gpu-id=0

[sink1]
enable=1
type=2
sync=0
source-id=0
gpu-id=0

In this version:

  • [streammux] is introduced. It collects decoded frames from the sources and assembles them into batched buffers. width and height set the resolution the muxer scales its output to, batch-size matches the inference batch size, batched-push-timeout is the maximum time in microseconds to wait before pushing a partially filled batch, and live-source=0 marks the input as a file rather than a live feed.
  • No explicit wiring between sections is needed. deepstream-app links the enabled components in a fixed order (sources into the stream muxer, then inference, OSD, and finally the sinks), so adding or removing a section changes the pipeline without any connection syntax.
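The interaction between batch-size and batched-push-timeout bounds the muxer's latency: with a single source at (say) 30 fps, filling a batch of 8 would take about 267 ms, so the 40 ms timeout fires first and partially filled batches are pushed. A quick sanity check in plain arithmetic (no DeepStream required):

```python
# Worst-case time (in microseconds) for the sources to fill one batch,
# compared against the muxer's batched-push-timeout.
def batch_fill_time_us(batch_size: int, num_sources: int, fps: float) -> float:
    frames_needed_per_source = batch_size / num_sources
    return frames_needed_per_source * (1_000_000 / fps)

fill_us = batch_fill_time_us(batch_size=8, num_sources=1, fps=30)
timeout_us = 40_000  # batched-push-timeout from the config

# With one 30 fps source, filling 8 slots takes far longer than the timeout,
# so the muxer pushes partially filled batches roughly every 40 ms.
print(fill_us > timeout_us)
```

With eight sources the batch would fill in one frame interval (~33 ms), which is why batch-size is usually set to the number of streams in live deployments.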

To run this pipeline, you would typically use the deepstream-app utility, pointing it to your configuration file:

deepstream-app -c your_pipeline_config.txt

The deepstream-app utility is a generic reference application that loads and runs a pipeline defined by a configuration file. It handles the creation and linking of the plugins based on the enabled sections, following a fixed order: sources, stream muxer, inference, OSD, sinks.

The surprising thing about DeepStream’s configuration is how it abstracts the underlying GStreamer framework. While it uses GStreamer elements under the hood, the .txt file format and the deepstream-app utility allow you to build complex pipelines without writing C/C++ code or directly manipulating GStreamer bin and element objects. The declarative sections take the place of GStreamer’s manual gst_element_link calls.

When deepstream-app parses this file, it creates instances of the corresponding GStreamer plugins (e.g., uridecodebin, nvstreammux, nvinfer, nvdsosd, nvv4l2h264enc, filesink, nveglglessink) and links them together in the standard order. The gpu-id settings ensure operations are offloaded to the GPU, and enable-perf-measurement reports the throughput of your pipeline in frames per second.
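That mapping from config sections to elements can be sketched as a simple table (illustrative only; the real application wraps several elements per stage inside GStreamer bins):

```python
# Illustrative: the main GStreamer element behind each pipeline stage
# (simplified; deepstream-app actually builds bins with several elements
# per stage, e.g. the file sink is preceded by an nvv4l2h264enc encoder).
PIPELINE_STAGES = [
    ("file source",       "uridecodebin"),
    ("stream muxer",      "nvstreammux"),
    ("primary inference", "nvinfer"),
    ("on-screen display", "nvdsosd"),
]

print(" -> ".join(element for _, element in PIPELINE_STAGES))
```

Printing the chain makes the fixed linking order explicit: decoded frames flow through the muxer into inference, then to the OSD, and from there fan out to each enabled sink.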

The [primary-gie] section requires a config-file (e.g., yolov3-tiny.txt). This file is specific to the nvinfer plugin and contains crucial details like the pixel-scaling factor, the label file, the number of classes, and how to parse the network’s output into detections. For example:

[property]
gpu-id=0
net-scale-factor=0.00392156862745098
model-color-format=0
model-engine-file=../../models/yolov3-tiny.trt
labelfile-path=../../models/coco_labels.txt
batch-size=8
process-mode=1
num-detected-classes=80
gie-unique-id=1
interval=0
# YOLO models need a custom bounding-box parser, supplied by the SDK's
# objectDetector_Yolo sample:
parse-bbox-func-name=NvDsInferParseCustomYoloV3Tiny
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# ... other properties

The labelfile-path points to a text file where each line is a class name corresponding to an index in the model’s output. net-scale-factor is 1/255, which rescales 8-bit pixel values into the [0, 1] range the network was trained on, and parse-bbox-func-name together with custom-lib-path tells the nvinfer plugin how to convert the raw YOLO output tensors into bounding boxes.
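Two of these values are easy to sanity-check: net-scale-factor is just 1/255, and the label file is a plain index-to-name list. A small illustration (the label contents shown are a hypothetical three-line excerpt):

```python
# net-scale-factor rescales 8-bit pixel values (0-255) into [0, 1]:
# nvinfer computes pixel_out = net-scale-factor * (pixel_in - mean),
# with the mean offsets defaulting to 0.
print(round(1 / 255, 10))  # matches 0.00392156862745098 to this precision

# A label file has one class name per line; class id i -> line i.
labels = "person\nbicycle\ncar\n".splitlines()
print(labels[2])  # class id 2 maps to the third line
```

If detections come out with the wrong names, an off-by-one or reordered label file is one of the first things worth checking.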

The next hurdle you’ll encounter is handling multiple input streams or integrating custom plugins.
