MIG allows you to carve up a single GPU into smaller, isolated instances, each with its own dedicated compute, memory, and cache.
Let’s see this in action. Imagine we have an NVIDIA A100 GPU. We want to create two separate GPU instances from it.
First, we need to check the current MIG mode and partition status.
nvidia-smi -L
This command lists all the GPUs and their associated MIG devices. If MIG is not enabled, you’ll see something like:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
If MIG is enabled, you might see something like:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  MIG 2g.10gb     Device  0: (UUID: MIG-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)
  MIG 3g.20gb     Device  1: (UUID: MIG-zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz)
To enable MIG mode, we use nvidia-smi with the -mig flag. This is a disruptive operation: all workloads on the GPU must be stopped first, and on some platforms a GPU reset (or reboot) is required for the change to take effect.
sudo nvidia-smi -i 0 -mig 1
After enabling MIG, the GPU is ready to be partitioned. Now, we need to create specific MIG instance configurations. MIG instances are defined by predefined profiles that pair a number of compute slices with an amount of dedicated GPU memory; the profile name encodes both. In 2g.10gb, for example, 2g means two of the GPU's seven compute slices (a fixed fraction of its SMs) and 10gb is the dedicated memory. On an A100-40GB, the available GPU instance profiles are 1g.5gb, 2g.10gb, 3g.20gb, 4g.20gb, and 7g.40gb. You can list the available GPU instance profiles for a given GPU with:
nvidia-smi mig -i 0 --list-gpu-instance-profiles
Let’s say we want to create two instances from GPU 0: a 2g.10gb instance (10 GB of memory) and a 3g.20gb instance (20 GB). We’ll use the nvidia-smi mig command for this.
First, create a GPU instance. This is the logical container that owns the MIG device’s memory and cache slices.
sudo nvidia-smi mig -i 0 -cgi 2g.10gb
This creates a GPU instance on GPU 0 using the 2g.10gb profile. nvidia-smi assigns the instance an ID and prints it; note the ID, since the next step refers to it.
Next, create a compute instance within that GPU instance. A GPU instance can host multiple compute instances, but for simplicity, we’ll create one compute instance per GPU instance. Assuming the GPU instance above was assigned ID 1:
sudo nvidia-smi mig -i 0 -gi 1 -cci
Omitting a compute instance profile after -cci creates the default, full-size compute instance spanning the whole GPU instance. This compute instance is what your applications will see as a distinct GPU device.
Repeat for the second instance (again, substitute the GPU instance ID that nvidia-smi reports; here we assume 2):
sudo nvidia-smi mig -i 0 -cgi 3g.20gb
sudo nvidia-smi mig -i 0 -gi 2 -cci
As a shortcut, sudo nvidia-smi mig -i 0 -cgi 2g.10gb,3g.20gb -C creates both GPU instances and their default compute instances in one go.
After creating these instances, running nvidia-smi -L again will show your new MIG devices:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  MIG 2g.10gb     Device  0: (UUID: MIG-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)
  MIG 3g.20gb     Device  1: (UUID: MIG-zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz)
You can now see two MIG devices. Their UUIDs are what you’d use to target a specific instance from application frameworks like PyTorch or TensorFlow, typically by exporting CUDA_VISIBLE_DEVICES.
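For example, to pin a process to one MIG device, export its UUID through CUDA_VISIBLE_DEVICES before any CUDA framework initializes. A minimal sketch (the UUID below is a placeholder; substitute a real one from nvidia-smi -L):

```python
import os

# Placeholder UUID; replace with a real one from `nvidia-smi -L`.
mig_uuid = "MIG-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"

# Must be set before the CUDA runtime initializes, i.e. before the
# first torch.cuda / TensorFlow GPU call in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = mig_uuid

# Frameworks started after this point see exactly one device: the MIG
# instance appears as device 0 (e.g. torch.device("cuda:0")).
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Setting the variable inside the process (rather than in the shell) works as long as it happens before the framework touches the GPU.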
The core problem MIG solves is resource contention and noisy neighbors. Without MIG, multiple applications sharing a single GPU might interfere with each other’s performance due to shared L2 cache or memory bandwidth. By isolating resources, MIG guarantees a predictable performance baseline for each instance.
The key to understanding MIG partitioning lies in the hierarchy: GPU -> GPU Instance -> Compute Instance. A physical GPU can be divided into multiple GPU Instances, and each GPU Instance can host one or more Compute Instances. However, for maximum isolation, it’s common to have one Compute Instance per GPU Instance. The nvidia-smi mig commands manage this structure.
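The containment hierarchy can be sketched as a toy data model (illustrative only; these classes and fields are made up for clarity and are not an NVIDIA API):

```python
from dataclasses import dataclass, field

# Toy model of the MIG hierarchy: GPU -> GPU Instance -> Compute Instance.
@dataclass
class ComputeInstance:
    ci_id: int
    sm_slices: int  # SM slices carved from the parent GPU instance

@dataclass
class GPUInstance:
    gi_id: int
    profile: str    # e.g. "2g.10gb"
    mem_gb: int     # dedicated memory owned by this GPU instance
    compute_instances: list = field(default_factory=list)

@dataclass
class GPU:
    name: str
    gpu_instances: list = field(default_factory=list)

# Example layout: a GPU split into a 2g.10gb and a 3g.20gb instance,
# each hosting a single full-size compute instance.
gpu = GPU("A100-SXM4-40GB")
gpu.gpu_instances = [
    GPUInstance(1, "2g.10gb", 10, [ComputeInstance(0, 2)]),
    GPUInstance(2, "3g.20gb", 20, [ComputeInstance(0, 3)]),
]

for gi in gpu.gpu_instances:
    for ci in gi.compute_instances:
        print(f"GI {gi.gi_id} ({gi.profile}) -> CI {ci.ci_id}, {ci.sm_slices} SM slices")
```

The point of the model: memory lives on the GPU instance, while compute instances only subdivide its SMs, which is why one compute instance per GPU instance gives the strongest isolation.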
When you create a GPU instance, you’re carving off a slice of the GPU’s resources: its memory, L2 cache slices, and memory bandwidth are dedicated to that instance and never shared with other GPU instances on the same physical GPU. Compute instances then partition the SMs within a GPU instance (compute instances inside the same GPU instance share its memory). With one compute instance per GPU instance, each MIG device gets exclusive SMs, cache, and bandwidth, and this isolation is enforced at the hardware level by the GPU’s architecture.
When you create an instance, you are not just allocating memory; you are also allocating a portion of the streaming multiprocessors (SMs) and the associated L2 cache slices. The GPU instance profile dictates the ratio of these resources. For instance, a 2g.10gb profile allocates two compute slices’ worth of SMs and a proportional amount of L2 cache and memory bandwidth, while 3g.20gb allocates more of each. The exact mapping of SMs and cache to profiles is determined by the GPU architecture and driver.
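To make the ratios concrete, here is a back-of-the-envelope sketch for the A100-40GB. Slice counts come from NVIDIA’s published MIG profile tables; treat the SMs-per-slice figure (14, i.e. 98 SMs across all 7 slices) as approximate:

```python
# A100-40GB GPU instance profiles: (compute slices, dedicated memory in GB).
SMS_PER_SLICE = 14  # approx.: 98 SMs are exposed across 7 compute slices
TOTAL_SLICES = 7

profiles = {
    "1g.5gb":  (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}

for name, (slices, mem_gb) in profiles.items():
    sms = slices * SMS_PER_SLICE
    frac = slices / TOTAL_SLICES
    print(f"{name}: {sms} SMs ({frac:.0%} of compute), {mem_gb} GB memory")
```

Note that compute and memory don’t scale in lockstep: 4g.20gb gets more SMs than 3g.20gb but the same memory, which is why you pick a profile by workload shape, not just size.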
To remove MIG instances and revert the GPU to its default, non-MIG mode, you first destroy the compute instances, then the GPU instances, and finally disable MIG mode.
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi
Run without -ci or -gi arguments, -dci and -dgi destroy all compute instances and then all GPU instances on GPU 0; you can target a single instance with, e.g., -dci -ci 0 -gi 1.
sudo nvidia-smi -i 0 -mig 0
The next challenge you’ll encounter is how to programmatically manage these MIG instances, especially in dynamic environments like Kubernetes.