Agentic RAG - Component Choices


1. Objective

This document outlines a complete technical strategy and architecture for implementing an Agentic Retrieval-Augmented Generation (RAG) system. The final chosen technology stack comprises:

  • Flowise for no-code multimodal workflow orchestration.
  • SigLIP (timm/ViT-B-16-SigLIP2-256) for multimodal embeddings.
  • Qdrant for vector database management and semantic retrieval.
  • DeepSeek 3.1 LLM for reasoning over retrieved multimodal context and answer generation.
  • n8n for automation and orchestration of backend workflows.

The system is optimized for local hardware: a desktop with AMD 12-core CPU, 64 GB RAM, and an Nvidia RTX 2080 Super GPU (8 GB VRAM), running Garuda Linux.

2. Data Profile

  • Total dataset size: ~5 TB
  • Data types: Text (DOCX, Markdown), Images (PNG, JPG), Videos, XLSX, PDFs, complex multimodal documents.

3. Embedding Strategy

The selected embedding model is timm/ViT-B-16-SigLIP2-256, an open-source multimodal model optimized for cross-modal retrieval tasks.

Embedding Process

  • Textual data (DOCX, Markdown, XLSX): Convert to plain text, split into short segments, and embed each segment with the SigLIP text encoder (which has a short context window, roughly 64 tokens).
  • Images: Direct embedding with SigLIP image encoder.
  • Videos: Extract keyframes (1 every 10–30 seconds) and embed each frame with SigLIP image encoder.
  • Complex documents (PDF, DOCX with visuals): Render visually-rich pages as images and embed them via SigLIP, in addition to embedding the textual content separately.
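The per-modality fan-out above can be sketched as a simple router plus a keyframe sampler. The extension sets and the 15-second keyframe interval are illustrative assumptions, not fixed choices of the pipeline:

```python
# Sketch of the ingestion fan-out: route each file to its embedding branch,
# and pick keyframe indices for videos. Extension sets and the 15 s default
# interval are assumptions; adjust to the real dataset.
from pathlib import Path

TEXT_EXT = {".docx", ".md", ".txt", ".xlsx"}
IMAGE_EXT = {".png", ".jpg", ".jpeg"}
VIDEO_EXT = {".mp4", ".mkv", ".avi"}
RENDERED_EXT = {".pdf"}  # visually rich pages also rendered to images

def route(path: str) -> str:
    """Map a file to the embedding branch it should take."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXT:
        return "text"
    if ext in IMAGE_EXT:
        return "image"
    if ext in VIDEO_EXT:
        return "video-keyframes"
    if ext in RENDERED_EXT:
        return "page-render+text"
    return "skip"

def keyframe_indices(duration_s: float, fps: float, every_s: float = 15.0) -> list[int]:
    """Frame indices for one keyframe every `every_s` seconds."""
    step = max(1, round(every_s * fps))
    total_frames = int(duration_s * fps)
    return list(range(0, total_frames, step))
```

Each returned keyframe index can then be decoded (e.g., via ffmpeg) and passed to the SigLIP image encoder like any other image.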

Advantages of SigLIP ViT-B-16-256

  • Compact ViT-B backbone at 256×256 input resolution (the "256" in the model name), keeping inference cost and embedding storage modest.
  • Superior cross-modal embedding quality, enabling effective multimodal search.
  • Small enough to run comfortably within 8 GB of VRAM, giving fast local inference.

4. Vector Database: Qdrant

Key Reasons for Choosing Qdrant

  • Highly efficient local performance via scalar quantization and disk-based vector storage.
  • Robust HNSW indexing, optimized for large-scale datasets (5 TB).
  • Excellent Flowise integration.

Qdrant Structure

  • Separate collections for different modalities (e.g., texts, images, videos).
  • Metadata-rich records, including original filenames, timestamps, modality type, and semantic labels.
  • Scalar quantization to optimize RAM usage and speed up searches.

5. Large Language Model (LLM): DeepSeek 3.1

Role in Architecture

  • Performs complex multimodal reasoning tasks based on retrieved content.
  • Generates detailed, human-readable answers and supports visual question answering over retrieved image context and metadata.

Integration with Stack

  • Direct integration via Flowise pipelines.
  • Consumes context retrieved from Qdrant and grounds its responses in that material.
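The Qdrant-to-LLM hand-off amounts to assembling retrieved chunks into a grounded prompt. A minimal sketch, assuming hits carry the payload fields from ingestion (source_path, text); the template wording is an assumption:

```python
# Minimal sketch of prompt assembly from Qdrant search hits before calling the
# LLM. Hit structure and template wording are assumptions.
def build_prompt(question: str, hits: list[dict], max_chunks: int = 5) -> str:
    """Concatenate top retrieved chunks into a grounded prompt."""
    context = "\n\n".join(
        f"[{h['payload']['source_path']}]\n{h['payload'].get('text', '')}"
        for h in hits[:max_chunks]
    )
    return (
        "Answer using only the context below. Cite the source path.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In Flowise the equivalent wiring is done visually; this just shows what the pipeline passes to DeepSeek.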

6. Workflow Automation: n8n

Automation Capabilities

  • Data ingestion and embedding pipelines.
  • Scheduled embedding updates and Qdrant data synchronization.
  • Automated backup and maintenance tasks.
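The scheduled-update task above reduces to an incremental-sync check: re-embed only files modified since the last successful run. A sketch of the logic an n8n schedule could call (the scan root and state handling are assumptions):

```python
# Sketch of incremental sync for scheduled embedding updates: collect
# (path, mtime) pairs, then keep only files newer than the last run.
import os

def scan(root: str) -> list[tuple[str, float]]:
    """Collect (path, mtime) pairs for every file under root."""
    out = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            out.append((p, os.path.getmtime(p)))
    return out

def files_to_reembed(entries: list[tuple[str, float]], last_run: float) -> list[str]:
    """Return paths whose mtime is newer than the last sync timestamp."""
    return [path for path, mtime in entries if mtime > last_run]
```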

Advantages

  • No-code approach aligning perfectly with Flowise.
  • Easy automation of repetitive workflows, simplifying operations significantly.

7. Workflow Orchestration: Flowise

Role in System

  • Provides no-code orchestration for embedding processes, Qdrant integration, and DeepSeek LLM response workflows.
  • Visual management and intuitive pipeline creation.

8. Complete Hardware Setup

  • CPU: AMD 12-core
  • RAM: 64 GB
  • GPU: Nvidia RTX 2080 Super (8 GB VRAM)
  • OS: Garuda Linux

Performance Expectations

  • Local inference of SigLIP embeddings within milliseconds per item.
  • Efficient batch embedding, leveraging GPU parallelism and CPU optimization.
  • Comfortable handling of the ~5 TB multimodal dataset.
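As a back-of-envelope check on the batch-embedding expectation, a throughput sketch; the per-item latency and batch overhead are placeholders to be measured on the real pipeline, not benchmarks:

```python
# Rough wall-clock estimate for batch embedding. ms_per_item and the batch
# overhead are assumptions; measure them on the actual GPU before planning.
def embed_hours(num_items: int, ms_per_item: float, batch_size: int = 32,
                batch_overhead_ms: float = 5.0) -> float:
    """Approximate hours to embed `num_items` in batches."""
    batches = -(-num_items // batch_size)  # ceiling division
    total_ms = num_items * ms_per_item + batches * batch_overhead_ms
    return total_ms / 3.6e6  # ms per hour
```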

9. Comparison with Alternative Solutions

  • OpenAI Embedding API: Slightly better text accuracy, but text-only (images would need captioning first), significantly higher cost, higher latency, and lower privacy.
  • Weaviate vs. Qdrant: Qdrant provides better local performance, lower resource demands, simpler integration, and greater multimodal embedding efficiency.
  • Higher-Resolution SigLIP Variants (384 or 512 pixel input): Provide marginal accuracy improvements at the expense of throughput, VRAM, and scalability. The chosen 256-pixel variant balances accuracy and efficiency optimally.
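The efficiency side of this trade-off is simple storage arithmetic: raw float32 vectors versus int8 scalar-quantized ones. A minimal sketch, with illustrative vector counts (the real count depends on chunking and keyframe density):

```python
# Storage arithmetic for the vector index: float32 vs int8 scalar quantization.
# Vector counts in any example are illustrative assumptions; index and payload
# overhead are ignored.
def vector_storage_gb(num_vectors: int, dim: int, quantized: bool = False) -> float:
    """Approximate raw vector storage in GB."""
    bytes_per_value = 1 if quantized else 4  # int8 vs float32
    return num_vectors * dim * bytes_per_value / 1e9
```

This is why scalar quantization (Section 4) matters at this scale: it cuts vector storage roughly 4x.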

10. MacBook Pro M4 Performance Comparison

  • MacBook Pro M4 (48 GB Unified Memory) could perform embedding tasks faster and with higher batch throughput due to unified memory and efficiency of Apple Silicon.
  • For desktop workloads, current RTX 2080 Super is sufficient, though the M4 MacBook could be superior for portable embedding workflows.

11. Privacy and Compliance

  • Local-only setup keeps all data on-premises, ensuring data privacy and simplifying compliance with data protection regulations.
  • No external API calls or cloud dependencies, thus maintaining stringent data security.

12. Final Technical Stack

The final technology stack of Flowise + SigLIP (timm/ViT-B-16-SigLIP2-256) + Qdrant + DeepSeek 3.1 + n8n is recommended: it balances performance, scalability, ease of use, cost efficiency, and data privacy for the multimodal Agentic RAG implementation.