The following steps install Qdrant with Docker Compose and cover performance tuning, security, data persistence outside the container, and further recommendations.
---
### Step 1: Prepare Directory Structure
Create the necessary directory structure for Qdrant:
```bash
mkdir -p /home/sudo-samurai/Documents/projects/llm-tools/qdrant/{config,data,snapshots}
cd /home/sudo-samurai/Documents/projects/llm-tools/qdrant
```
### Directory Explanation:
- `config`: Qdrant configuration files (optional/customized configurations).
- `data`: Persistent storage of Qdrant data.
- `snapshots`: Store Qdrant snapshots/backups (recommended).
---
### Step 2: Docker Compose File
Create a file named `docker-compose.yml` in the project root directory with the following content:
```yaml
version: '3.9'
services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333" # REST API port
      - "6334:6334" # gRPC API port
    environment:
      QDRANT__STORAGE__STORAGE_PATH: "/qdrant/storage"
      QDRANT__STORAGE__SNAPSHOTS_PATH: "/qdrant/snapshots"
      QDRANT__LOG_LEVEL: "INFO"
      QDRANT__SERVICE__GRPC_MAX_MESSAGE_SIZE: "104857600" # 100 MB for large requests
      QDRANT__SERVICE__MAX_REQUEST_SIZE_MB: "100"
      QDRANT__STORAGE__ON_DISK_PAYLOAD: "true" # Keep payloads on disk: lowers RAM use at some cost in read latency
      QDRANT__STORAGE__OPTIMIZERS__MEMMAP_THRESHOLD_KB: "5120" # 5 MB, expressed in kilobytes; larger segments are memory-mapped
    volumes:
      - ./data:/qdrant/storage
      - ./snapshots:/qdrant/snapshots
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: '16G'
        reservations:
          cpus: '2'
          memory: '8G'
```
---
### Step 3: Adjust System Settings (Performance Optimization)
Given your large dataset (~5TB, multi-modal), optimize system and kernel settings for better performance.
#### Recommended OS-Level Optimizations:
Add the following kernel parameters to `/etc/sysctl.conf` to enhance performance:
```bash
sudo nano /etc/sysctl.conf
```
Append these lines:
```bash
# Network performance optimizations
net.core.somaxconn = 1024
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
fs.file-max = 1000000
# Memory and swap management (for improved responsiveness)
vm.swappiness = 10
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
```
Apply the settings:
```bash
sudo sysctl -p
```
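To confirm the kernel accepted the values, the running settings can be read back with `sysctl -n` and compared against what was written. The sketch below assumes Bash; `normalize` and `check_sysctl` are hypothetical helper names, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare live kernel settings with the expected values.

# Collapse tabs and repeated spaces so "1024<TAB>65535" matches "1024 65535".
normalize() {
  local words
  read -r -a words <<< "$1"
  printf '%s\n' "${words[*]}"
}

# Usage: check_sysctl <key> <expected value>
# Prints OK or MISMATCH and returns non-zero on a mismatch.
check_sysctl() {
  local key="$1" expected actual
  expected="$(normalize "$2")"
  actual="$(normalize "$(sysctl -n "$key" 2>/dev/null)")"
  if [ "$actual" = "$expected" ]; then
    echo "OK       $key = $actual"
  else
    echo "MISMATCH $key: expected '$expected', got '$actual'"
    return 1
  fi
}

# Example calls (values taken from the sysctl.conf lines above):
#   check_sysctl net.core.somaxconn 1024
#   check_sysctl net.ipv4.ip_local_port_range "1024 65535"
#   check_sysctl vm.swappiness 10
```

Normalizing whitespace matters because `sysctl -n` prints multi-value settings such as the port range separated by a tab.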
#### Adjust System ulimits (Critical for heavy I/O and large datasets):
Edit `/etc/security/limits.conf` and append:
```bash
* soft nofile 1000000
* hard nofile 1000000
root soft nofile 1000000
root hard nofile 1000000
```
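Log out and back in to apply these limits. They cover login sessions only; containers receive their limits from the Docker daemon, so if Qdrant itself needs a higher `nofile` limit, set `ulimits` on the service in `docker-compose.yml` (or `default-ulimits` in the daemon configuration).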
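Because a container's limits come from the Docker daemon rather than from the login session, it is worth checking what the Qdrant process actually received. A minimal sketch, assuming a Linux host running a native Docker Engine and the container name `qdrant` from the Compose file; `soft_nofile` and `container_nofile` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: report the open-file limit of a running container.

# Read /proc/<pid>/limits content on stdin and print the soft "Max open files" value.
soft_nofile() {
  awk '/^Max open files/ { print $4 }'
}

# Usage: container_nofile <container name>
# Resolves the container's main process ID, then reads its limits from /proc on the host.
container_nofile() {
  local pid
  pid="$(docker inspect -f '{{.State.Pid}}' "$1")" || return 1
  soft_nofile < "/proc/${pid}/limits"
}

# Example call (may need sudo, depending on how Docker is set up):
#   container_nofile qdrant
```

If the printed value is lower than expected, the limit has to be raised on the Docker side rather than in `/etc/security/limits.conf`.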
---
### Step 4: Security Considerations
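- **Firewall**: Limit port access to required IPs or the internal network. Note that ports published by Docker are handled by Docker's own iptables rules and are not filtered by `ufw` by default, so also bind the ports in `docker-compose.yml` to a specific host address (for example `"127.0.0.1:6333:6333"` when only a local reverse proxy needs access). For host-level rules with `ufw`: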
```bash
sudo ufw allow from 192.168.48.0/24 to any port 6333 proto tcp # REST API
sudo ufw allow from 192.168.48.0/24 to any port 6334 proto tcp # gRPC API
sudo ufw reload
```
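- **Authentication**:
Qdrant does not enable authentication by default. It supports a static API key (the `service.api_key` setting, or `QDRANT__SERVICE__API_KEY` as an environment variable), but the key is sent in plain text over HTTP, so it is still recommended to run Qdrant behind a reverse proxy such as Nginx with TLS (HTTPS). Here's a simple setup: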
**Example Nginx configuration:**
```nginx
server {
    listen 443 ssl;
    server_name qdrant.yourdomain.com;
    ssl_certificate /etc/nginx/ssl/qdrant.crt;
    ssl_certificate_key /etc/nginx/ssl/qdrant.key;
    location / {
        proxy_pass http://localhost:6333;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
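Alongside the proxy, Qdrant's built-in API key check can be switched on. The sketch below generates a random key and shows how a client would send it. It assumes the key is handed to the container as `QDRANT__SERVICE__API_KEY` (the environment form of the `service.api_key` setting) and that clients pass it in the `api-key` header; `generate_api_key` and `qdrant_get` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for API-key setup; not part of Qdrant itself.

# Print 32 random bytes as 64 lowercase hex characters.
generate_api_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}

# Usage: qdrant_get <base URL> <API key> <path>
# Sends an authenticated GET request; the key travels in the "api-key" header.
qdrant_get() {
  curl -fsS -H "api-key: $2" "${1%/}$3"
}

# Example:
#   key="$(generate_api_key)"
#   # Store the key somewhere safe and set QDRANT__SERVICE__API_KEY to it in the Compose file.
#   qdrant_get https://qdrant.yourdomain.com "$key" /collections
```

Since the key is a bearer secret, requests carrying it should only go over the HTTPS endpoint, not over plain HTTP on port 6333.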
---
### Step 5: Start Qdrant Container
Start your Qdrant service:
```bash
docker compose up -d
```
To check logs and status:
```bash
docker compose ps        # container status
docker logs -f qdrant    # follow the logs
```
---
### Step 6: Verify Installation & Functionality
Access the REST API endpoint:
```bash
curl http://localhost:6333
```
Access the gRPC endpoint (use tools like `grpcurl`):
```bash
grpcurl -plaintext localhost:6334 qdrant.Collections/List
```
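Right after `docker compose up -d` the API may take a few seconds to accept connections, so scripted checks usually poll before testing. A small sketch, assuming `curl` is installed; `wait_for_http` is a hypothetical helper name and the 30-second default is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers successfully or a timeout expires.

# Usage: wait_for_http <url> [timeout in seconds, default 30]
# The timeout is counted in one-second sleeps, so it is approximate.
wait_for_http() {
  local url="$1" timeout="${2:-30}" waited=0
  until curl -fsS -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for ${url}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ready: ${url}"
}

# Example:
#   wait_for_http http://localhost:6333 60 && curl http://localhost:6333/collections
```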
---
### Additional Recommendations for Performance and Compatibility:
- **SSD Storage**: For multi-TB datasets, keep the data directory (`data`) on a fast SSD or NVMe drive for significantly better read/write performance.
- **Backup Strategy**: Regular snapshots and backups are essential:
  - Use the Qdrant snapshots API regularly (`/collections/{collection_name}/snapshots` for a single collection, `/snapshots` for the whole storage).
  - Automate backups to cloud storage or an external NAS.
- **Monitoring**:
  - Deploy Prometheus/Grafana to track Qdrant performance metrics, which Qdrant exposes in Prometheus format at `/metrics`.
  - Enable Docker container monitoring to detect resource bottlenecks early.
- **Resource Management**:
  - Adjust RAM and CPU allocation based on load-testing results. Consider starting above the minimum recommended values and tuning according to actual usage.
- **Scaling & Cluster Setup**:
  - Given your large data size, evaluate Qdrant's clustering mode. Horizontal scaling (Qdrant Cluster) may become necessary for performance and fault tolerance.
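
The snapshot recommendation above can be scripted for use from cron or a systemd timer. This sketch assumes the REST API is reachable without an API key and uses Qdrant's per-collection snapshot endpoint, `POST /collections/{collection_name}/snapshots`; the collection name `my_collection` and the helper names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: trigger a snapshot of one collection through the REST API.

# Usage: snapshot_url <base URL> <collection name>
# Builds the snapshot endpoint URL, tolerating a trailing slash on the base URL.
snapshot_url() {
  printf '%s/collections/%s/snapshots\n' "${1%/}" "$2"
}

# Usage: create_snapshot <base URL> <collection name>
# Prints the JSON response, which describes the snapshot that was created.
create_snapshot() {
  curl -fsS -X POST "$(snapshot_url "$1" "$2")"
}

# Example:
#   create_snapshot http://localhost:6333 my_collection
```

With the volume mapping from Step 2, the resulting files should appear under `./snapshots` on the host, from where they can be copied to cloud storage or a NAS.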