Qdrant can serve vector datasets far larger than the RAM available to it: memory-mapped storage and optional quantization mean that only a fraction of the data needs to live in memory at once.
Let’s get Qdrant up and running. We’ll start with a basic Docker Compose setup and then explore some key configuration options.
```yaml
version: '3.9'
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
    environment:
      - QDRANT__LOG_LEVEL=INFO
      - QDRANT__TELEMETRY_DISABLED=false
```
To start this, save the above as `docker-compose.yml` and run `docker compose up -d` (or `docker-compose up -d` with the standalone binary). You should see the Qdrant container starting.
You can interact with Qdrant using its REST API (port 6333) or gRPC (port 6334). For a quick check, let’s use curl against the liveness endpoint:

```bash
curl http://localhost:6333/healthz
```

This should return a plain-text success message (recent versions respond with `healthz check passed`).
Now, let’s talk about configuration. The `environment` section in `docker-compose.yml` sets parameters that map onto Qdrant’s `config/config.yaml`: a variable named `QDRANT__SECTION__KEY` overrides the `section.key` entry, with double underscores separating nesting levels.

- `QDRANT__LOG_LEVEL`: Controls the verbosity of Qdrant’s logs. `INFO` is standard for production; other options include `DEBUG`, `WARN`, and `ERROR`.
- `QDRANT__TELEMETRY_DISABLED`: When `false` (the default), Qdrant sends anonymized usage data to the Qdrant team, helping them improve the product. Set it to `true` to opt out.
- `QDRANT__CLUSTER__ENABLED`: Enables distributed mode, which you will need if you plan to run multiple Qdrant nodes for high availability or sharding (covered below).
The `volumes` section mounts a local directory, `./qdrant_storage`, to `/qdrant/storage` inside the container. This is where Qdrant persists its data, including your collections, points, and payloads. If you stop and remove the container (`docker compose down`), your data remains in `./qdrant_storage` and is reattached when you start it again.
Qdrant’s performance and memory footprint are heavily influenced by how it stores vectors. By default, raw vectors are kept in RAM, but Qdrant can switch to memory-mapped storage, either explicitly per collection (`on_disk: true` in the vector configuration) or automatically once a segment grows past the optimizer’s `memmap_threshold`. With memory-mapped storage, the operating system loads pages of vector data from disk into RAM as needed. This lets you serve datasets larger than your available RAM, as only the actively used portions of the vectors occupy memory.
Write throughput is governed on two sides. Server-side, every write first lands in a write-ahead log, whose size can be tuned (e.g. `QDRANT__STORAGE__WAL__WAL_CAPACITY_MB`), and background optimizers later merge segments. Client-side, the biggest lever is how you batch upserts: sending points in larger batches improves throughput at the cost of per-request latency, and the `wait` flag on an upsert controls whether the call returns only after the change is applied. Tuning these can balance write speed against real-time availability of new data.
For production, note that persistence is not optional in a server deployment: Qdrant always appends incoming operations to its write-ahead log and flushes segments to the storage directory, so a crash loses at most the operations not yet flushed. There is no periodic-snapshot interval to configure; instead, Qdrant exposes an explicit snapshot API (`POST /collections/<name>/snapshots`) for backups, which you trigger on your own schedule and copy off the host.
When deploying for high availability or to handle large datasets, you can run Qdrant in distributed mode. Set `QDRANT__CLUSTER__ENABLED=true`; Qdrant then uses Raft consensus to replicate cluster metadata across nodes. The first node is started with a `--uri` pointing at its own peer-to-peer address (port 6335 by default), and each additional node joins by passing `--bootstrap` with the URI of an existing node. Collection shards and replicas are then distributed across the cluster.
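A minimal two-node sketch of this, following the pattern in Qdrant’s distributed-deployment documentation (the service names and the 6335 p2p port are defaults; adjust for your network):

```yaml
services:
  qdrant-node-1:
    image: qdrant/qdrant:latest
    command: ./qdrant --uri http://qdrant-node-1:6335
    environment:
      - QDRANT__CLUSTER__ENABLED=true
    ports:
      - "6333:6333"
  qdrant-node-2:
    image: qdrant/qdrant:latest
    # joins the cluster by bootstrapping from the first node's p2p URI
    command: ./qdrant --bootstrap http://qdrant-node-1:6335 --uri http://qdrant-node-2:6335
    environment:
      - QDRANT__CLUSTER__ENABLED=true
    depends_on:
      - qdrant-node-1
```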
Index configuration is crucial for search performance. While Qdrant offers good out-of-the-box performance, you can tune the HNSW index (`hnsw_config`) per collection, and you can set a `quantization_config` in the collection’s configuration (not its payload) to reduce memory usage and speed up searches, especially for high-dimensional vectors. Scalar quantization represents each vector component with 8 bits (roughly a 4x memory reduction) and binary quantization with a single bit, at the cost of some accuracy; Qdrant can rescore candidates against the original vectors to recover precision.
The most subtle aspect of Qdrant’s memory management is its use of memory-mapped files for vector storage. Unlike a traditional in-memory database where all data must fit into RAM, Qdrant leverages the OS’s virtual memory system. This allows Qdrant to address a dataset much larger than physical RAM. When a vector is needed for a search or retrieval, the OS page cache brings the relevant data from disk into RAM. If RAM is scarce, the OS will evict less-recently-used pages. This means that while Qdrant can access vast amounts of vector data, performance degrades significantly as the working set of vectors exceeds available RAM, because the system starts heavily relying on disk I/O for vector lookups.
Once you have Qdrant running and configured, the next logical step is to explore advanced indexing techniques like HNSW tuning for specific recall/latency trade-offs.