Triton’s model repository can live on S3 or GCS, and it’s not just about saving disk space.

Let’s see Triton load a model from S3.

# Assuming you have a model named 'my_model' in S3 at 's3://my-bucket/models/'
tritonserver --model-repository s3://my-bucket/models/

And from GCS:

# Assuming you have a model named 'my_model' in GCS at 'gs://my-bucket/models/'
tritonserver --model-repository gs://my-bucket/models/

When you point Triton at s3://bucket-name/path/ or gs://bucket-name/path/, it does not mirror the entire model repository onto local disk. It lists the remote storage directly to discover models, and when a model is actually loaded it fetches that model’s files — and only that model’s files — via S3 or GCS API calls into a temporary local directory so the backend can read them. Startup cost therefore scales with the models you load, not with everything sitting in the bucket, which matters for large repositories and resource-constrained environments. The benefits are manifold:

  • Scalability: You can store a massive number of models without filling up the local disk of your Triton instances.
  • Centralization: A single source of truth for all your models, simplifying management and deployment.
  • Durability: Leveraging the inherent durability and availability of S3 and GCS.
  • Flexibility: Easily swap models by updating the S3/GCS repository without touching running Triton instances.
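Triton fetches a model’s files on demand rather than mirroring the whole bucket. A rough sketch of that per-model "localize" step — illustrative, not Triton’s actual code, with a local directory standing in for the bucket (against a real bucket each copy would be a GetObject call):

```shell
# Copy one model's files (not the whole repository) into a scratch
# directory before handing them to a backend -- a stand-in for what
# Triton's cloud filesystem layer does when loading a remote model.
localize_model() {  # usage: localize_model <repo-root> <model-name>
  scratch=$(mktemp -d)
  cp -r "$1/$2" "$scratch/"
  echo "$scratch/$2"      # the backend reads from this local path
}
```

The important property is the granularity: one model directory at a time, never the full repository.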

The --model-repository flag is the key. It accepts a URI that Triton understands: s3://bucket-name/path/to/repository/ for S3, gs://bucket-name/path/to/repository/ for GCS. Triton’s SDKs pick up the same credential sources the AWS CLI and Google Cloud SDK use on the host machine, so if aws s3 ls or gsutil ls works in your Triton server’s environment, Triton should be able to access the repository.
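That aws s3 ls sanity check also shows you exactly which model names Triton will see: the top-level common prefixes. A small sketch — the helper name is mine, and the listing below is captured sample output rather than a live call:

```shell
# Extract model names from `aws s3 ls` output: the CLI prints each
# common prefix ("directory") as a line of the form "PRE model_a/".
discover_models() {
  awk '$1 == "PRE" { sub(/\/$/, "", $2); print $2 }'
}

# Live usage would be:  aws s3 ls s3://my-bucket/models/ | discover_models
# A captured listing keeps the sketch self-contained:
printf '%s\n' \
  'PRE model_a/' \
  'PRE model_b/' \
  '2024-05-01 10:00:00   1234 notes.txt' | discover_models
# prints:
# model_a
# model_b
```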

Internally, Triton uses aws-sdk-cpp for S3 and google-cloud-cpp for GCS. Listing models maps to ListObjectsV2 calls on S3 and objects.list calls on GCS; reading a model file maps to GetObject (S3) or objects.get (GCS). These operations are lazy: Triton does not pre-fetch the whole repository, only the listings and files it needs for the models it is asked to load.

The structure of your model repository on S3/GCS must be the same as a local one. This means each model should have its own directory, containing the model files and a config.pbtxt file.

s3://my-bucket/models/
├── model_a/
│   ├── 1/
│   │   └── model.onnx
│   └── config.pbtxt
└── model_b/
    ├── 1/
    │   └── model.plan
    └── config.pbtxt

When Triton starts with --model-repository s3://my-bucket/models/, it lists the top-level directories (model_a, model_b) under s3://my-bucket/models/ to discover the available models. For each discovered model, it then looks for numeric version subdirectories (e.g., 1/) holding the model files, and for the config.pbtxt sitting alongside them in the model’s directory.
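For reference, a minimal config.pbtxt for model_a above might look like the following — the tensor names, shapes, and batch size are illustrative, not tied to any real model:

```
name: "model_a"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```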

The credentials for accessing S3 or GCS are handled by the underlying SDKs. For S3, this typically means IAM roles (if running on EC2/EKS), environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), or the ~/.aws/credentials file. For GCS, it’s usually service account keys (often mounted as a file and referenced via GOOGLE_APPLICATION_CREDENTIALS environment variable) or the metadata server if running on GCE/GKE.
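The usual resolution order on the S3 side can be sketched as follows. This is a simplification of the SDK’s real default provider chain (which also covers IAM roles via the instance metadata service, web identity tokens, and more), and the function name is mine:

```shell
# Simplified view of where S3 credentials come from, checked in roughly
# the order the AWS SDK's default provider chain checks them.
aws_credential_source() {
  if [ -n "$AWS_ACCESS_KEY_ID" ] && [ -n "$AWS_SECRET_ACCESS_KEY" ]; then
    echo "environment variables"
  elif [ -f "${HOME}/.aws/credentials" ]; then
    echo "shared credentials file"
  else
    echo "instance metadata (IAM role), or no credentials"
  fi
}
```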

If you’re using S3-compatible storage (MinIO, Ceph, and the like), Triton’s documented approach is to embed the endpoint in the repository URI, placing the host and port between the s3:// prefix and the bucket name:

tritonserver --model-repository s3://my-s3-compatible-storage.local:9000/my-bucket/models/

(Newer AWS SDKs also honor an AWS_ENDPOINT_URL environment variable, but the in-URI form is the one Triton documents.) Either way, the effect is to point aws-sdk-cpp at your custom endpoint instead of the default AWS S3 endpoints.

Whether Triton notices changes to a model in S3 or GCS depends on its model control mode. In the default mode (none), the repository is read once at startup and subsequent changes are ignored. With --model-control-mode poll, Triton periodically re-lists the remote repository (every --repository-poll-secs seconds, 15 by default) and loads, reloads, or unloads models whose files or config.pbtxt have changed:

tritonserver --model-repository s3://my-bucket/models/ --model-control-mode poll --repository-poll-secs 30

With --model-control-mode explicit, nothing changes until you ask. You can (re)load a specific model with an HTTP POST to the model-repository API:

curl -X POST http://localhost:8000/v2/repository/models/my_model/load

Or inspect the repository’s current contents:

curl -X POST http://localhost:8000/v2/repository/index

The fine-grained details of change detection on S3/GCS can depend on the underlying SDK versions and the specific object-storage configuration, but the general principle holds: in poll mode Triton periodically re-lists the remote storage, and in explicit mode it refreshes only when signaled.

One subtle point: while Triton treats S3/GCS like a filesystem, it is still an object store. Directory listings have different performance characteristics than a local filesystem, especially with very large numbers of objects, and there is no atomic rename or file locking. The safe way to update a model is therefore additive: upload a new version directory first, and only then update config.pbtxt (or simply let Triton’s version policy pick up the new version).
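This additive flow works because Triton’s default version policy serves the latest (highest-numbered) version directory. Picking "latest" is just a numeric max over the directory names — a sketch, not Triton’s code:

```shell
# Given version directory names, pick the one the default "latest"
# version policy would serve: the highest numeric name.
latest_version() {
  printf '%s\n' "$@" | sort -n | tail -n 1
}

latest_version 1 2 10   # prints 10 (numeric sort, so 10 beats 2)
```

Note the numeric sort: a lexical comparison would wrongly put 2 ahead of 10, which is why version directories must be plain integers.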

The next hurdle you’ll likely encounter is managing model versioning and updates effectively in a distributed or dynamic environment, especially when dealing with frequent model deployments.

Want structured learning?

Take the full Triton course →