The **Model Context Protocol (MCP)** is a relatively new concept in the realm of large language models (LLMs) that focuses on how models interact with **external tools, APIs, agents, and memory systems** in a structured and standardized way. It's essentially a **communication protocol** for managing the context that surrounds a model, enabling **persistent, tool-augmented reasoning** across different sessions or tasks.
---
## What is MCP (Model Context Protocol)?
At a high level, **MCP** is a **standardized way to manage and exchange context** between an AI model (like GPT-4) and a set of tools, memories, and user interfaces.
It allows:
- **Stateful interactions**: the model can recall past actions, user preferences, memory, and tool outputs.
- **Composition of multiple tools**: like search, databases, web browsers, vector stores, file systems, and more.
- **Interoperability**: making models more modular and easily integrated into different applications and environments.
### Goals of MCP:
1. **Context Management**: How can LLMs keep track of what's going on across different interactions or tools?
2. **Tool Invocation**: How do LLMs invoke external tools in a structured way (APIs, DBs, vector stores)?
3. **Agentic Behavior**: How do LLMs become more like _thinking agents_ that act in steps, remember things, and learn?
---
## What is an MCP Server?
An **MCP Server** is a backend system that acts as a **middleware** between the LLM and the tools/memory systems. It:
- **Manages context**: Stores and retrieves memory or previous interactions.
- **Routes tool calls**: The model can call tools via the server using structured messages (like JSON).
- **Logs and audits interactions**: For debugging, introspection, or reproducibility.
- **Enables persistent state** across sessions.
Think of the MCP Server as the "brain stem" that keeps the AI's memories, tools, and capabilities wired together.
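To make the routing concrete, here is a minimal, hypothetical Python sketch of what an MCP-style server does internally: it receives a structured JSON message, dispatches it to a tool or memory handler, and returns a structured response. The handler names and the in-memory store are illustrative assumptions, not Smithery's actual implementation.

```python
# Minimal, hypothetical MCP-style dispatcher (illustrative only, not Smithery's real code).
import json

MEMORY = {}  # toy in-process store standing in for persistent agent memory

def search_documents(query: str, top_k: int = 3) -> list:
    # Stand-in for a real vector-store lookup (e.g., against Qdrant).
    return [f"doc about '{query}' #{i}" for i in range(top_k)]

TOOLS = {"search_documents": search_documents}

def handle_message(raw: str) -> dict:
    """Route one MCP-style JSON message to the right handler."""
    msg = json.loads(raw)
    role = msg.get("role")
    if role == "tool_call":
        result = TOOLS[msg["name"]](**msg.get("args", {}))
        return {"role": "tool_response", "name": msg["name"], "content": result}
    if role == "memory.update":
        MEMORY[msg["args"]["key"]] = msg["args"]["value"]
        return {"role": "memory.ack", "key": msg["args"]["key"]}
    if role == "memory.get":
        return {"role": "memory.value", "content": MEMORY.get(msg["args"]["key"])}
    return {"role": "error", "content": f"unknown role: {role}"}

print(handle_message('{"role": "tool_call", "name": "search_documents", "args": {"query": "espresso"}}'))
```

A real MCP server layers sessions, logging, and persistent storage on top of this, but the dispatch loop is the core idea.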
---
## Practical Example: AI Assistant with MCP
Let's say you're building a **custom AI personal assistant** for Mangesh that:
1. Answers questions from a private PDF knowledge base (via Qdrant).
2. Automates workflows via n8n.
3. Writes blog posts and schedules them in WordPress.
4. Remembers user preferences (coffee brewing tips, favorite LLM settings).
5. Talks to a calendar API.
Without MCP:
- Each component needs to be glued together manually.
- Context is ephemeral; once the chat is closed, all memory is lost.
- Tool invocation is ad hoc (via plugins or custom API calls).
With MCP:
- The assistant knows the tools available (`Qdrant`, `n8n`, `calendar`, `wordpress`).
- The assistant sends a structured `tool_call` request to fetch data from Qdrant.
- It updates the user's memory via `memory.update`.
- Next time you ask about espresso, it recalls you use a Lelit Bianca V3 with medium roast at 93°C.
- MCP Server logs all tool calls, memory updates, and context transitions.
It's like giving your AI both a **short-term memory (for reasoning)** and a **long-term memory (for persistence)**, with the ability to **call APIs** as easily as thinking.
---
## Protocol Format (Simplified)
```json
{
  "role": "tool_call",
  "name": "search_documents",
  "args": {
    "query": "dialing in espresso",
    "top_k": 3
  }
}
```
The model can also use:
- `memory.get`, `memory.update`
- `tool_call`, `tool_response`
- `observe` (to introspect its own past behavior)
These structured calls go to the MCP Server, which interprets and executes them.
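As a rough illustration of the client side, the sketch below posts one such `tool_call` message to a locally running MCP server over HTTP. The endpoint URL (`http://localhost:8000/mcp`) and the response shape are assumptions made for this example; the actual transport depends on the MCP server you run.

```python
# Hypothetical client-side call; endpoint and payload shape are assumptions, not a fixed API.
import requests

MCP_SERVER = "http://localhost:8000/mcp"  # assumed local MCP endpoint

tool_call = {
    "role": "tool_call",
    "name": "search_documents",
    "args": {"query": "dialing in espresso", "top_k": 3},
}

response = requests.post(MCP_SERVER, json=tool_call, timeout=30)
response.raise_for_status()
print(response.json())  # expected to come back as a tool_response message
```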
---
## Smithery: Tools for MCP
**[Smithery](https://smithery.tools/)** is an open-source project by [Phillip Wang](https://twitter.com/philipn_w) and the community that:
- Implements an **MCP-compliant server**.
- Supports **agentic workflows** (multi-step plans).
- Manages **tool definitions, memory, sessions**.
- Allows **local or cloud deployment**.
### Features of Smithery:
- Fully open-source and self-hostable.
- Comes with built-in tools (browser, shell, vector DBs).
- Supports **tool composition**: Tools can call other tools recursively.
- Great for **building personal AI agents**, especially when paired with:
- Ollama (for running LLMs locally),
- LangChain or LlamaIndex (for orchestration),
- n8n or Airflow (for automation).
### Example Use Case with Smithery:
You're building an Agentic RAG system with:
- Flowise frontend (your RAG UI),
- Qdrant backend (your vector DB),
- DeepSeek LLM (your inference engine),
- and n8n for automations.
With Smithery:
- You register Qdrant and n8n as tools in MCP.
- Your LLM runs via Ollama or LM Studio locally.
- The agent plans multi-step actions:
    - Search Qdrant → Summarize → Create automation task → Send email via n8n.
- Each of these steps is coordinated via the MCP server.
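The chain itself can also be sketched directly against each service's HTTP API. The hostnames, ports, collection name, model names, and the n8n webhook path below are placeholder assumptions for a typical local setup, not fixed values:

```python
# Sketch of the search -> summarize -> automate chain using local service APIs.
# All URLs, the collection name, and the model names are assumptions for a local setup.
import requests

QDRANT = "http://localhost:6333"
OLLAMA = "http://localhost:11434"
N8N_WEBHOOK = "http://localhost:5678/webhook/send-summary"  # hypothetical n8n workflow
COLLECTION = "knowledge_base"

def embed(text: str) -> list:
    # Ollama embeddings endpoint; assumes an embedding model has been pulled locally.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text}, timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

def search_qdrant(query: str, top_k: int = 3) -> list:
    # Vector search against a Qdrant collection whose points carry a "text" payload.
    r = requests.post(f"{QDRANT}/collections/{COLLECTION}/points/search",
                      json={"vector": embed(query), "limit": top_k, "with_payload": True},
                      timeout=30)
    r.raise_for_status()
    return [hit["payload"]["text"] for hit in r.json()["result"]]

def summarize(chunks: list) -> str:
    # Non-streaming generation call to a local model served by Ollama.
    prompt = "Summarize the following notes:\n\n" + "\n---\n".join(chunks)
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "deepseek-r1:7b", "prompt": prompt, "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["response"]

query = "dialing in espresso"
summary = summarize(search_qdrant(query))
# Hand the result to an n8n workflow (e.g., create a task and send an email).
requests.post(N8N_WEBHOOK, json={"query": query, "summary": summary}, timeout=30)
```

With Smithery in the loop, these individual calls become registered tools, and the agent decides when to invoke each one instead of you hard-coding the sequence.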
---
## Section Summary
|Concept|Description|
|---|---|
|**MCP**|A protocol to manage context, tool use, memory, and reasoning across LLM sessions.|
|**MCP Server**|The backend that stores memory, executes tool calls, and maintains agent state.|
|**Smithery**|A fully open-source implementation of an MCP server with built-in agents and tool support.|
---
## Where the MCP Server Runs (Local vs Remote)
If you're using **Smithery**, then:
**Yes, the MCP Server runs locally** (by default).
It acts as the local brain/bridge that your LLM connects to and communicates through using the **MCP protocol**.
So **every LLM invocation or interaction** goes through this local MCP Server for:
- Tool routing (`tool_call`)
- Memory access (`memory.get`, `memory.update`)
- Context logging
- Multi-step reasoning
> Think of Smithery as **your local hub** that sits between your LLM (Ollama, LM Studio, or OpenAI) and your external tools (Qdrant, n8n, shell, browser, etc).
---
## Smithery Architecture (Simplified)
```plaintext
+--------------+      MCP       +--------------+
|  You (LLM)   | <------------> |  MCP Server  |  ----->  tools / memory / db
+--------------+    protocol    +--------------+
```
- You interact with the **LLM interface** (can be terminal, GUI, browser, or agent framework).
- The LLM sends **MCP-formatted messages** to the **Smithery MCP server**.
- Smithery then:
- Routes requests to tools (`tool_call`)
- Fetches/stores memories (`memory.*`)
- Orchestrates multi-step actions if the LLM is in "agent mode"
- Finally, it sends the **tool responses** back to the LLM for reasoning or continuation.
---
## Does it _have_ to be local?
Not at all. You **can also deploy Smithery remotely**, like:
- On a cloud VM
- In Docker on a private server
- Even containerized inside your Flowise or n8n stack
If deployed remotely, your model/agent **connects to the MCP Server's endpoint** via HTTP/WebSocket and uses the same MCP protocol.
But for your use case (Agentic RAG + Flowise + n8n + Qdrant), running Smithery locally is perfectly fine, and it gives you **low latency and full data control**.
---
## Per-Invocation Connection?
**Yes and no**:
- The **LLM doesn't have to reconnect for every single tool call**, as long as the **session is active** and the LLM is running in agent mode or under a persistent wrapper.
- But behind the scenes, **each `tool_call`, `memory.get`, `memory.update`** is a discrete request to the MCP server, even in a persistent session.
- If you're using a framework like LangGraph or LangChain Agents, they manage this under the hood.
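A hedged sketch of that pattern: one long-lived HTTP session, several discrete MCP-style requests inside it. The endpoint and message shapes are assumptions for illustration, not a documented API.

```python
# One persistent session, multiple discrete tool/memory requests (endpoint is assumed).
import requests

MCP = "http://localhost:8000/mcp"  # assumed local MCP server endpoint
session = requests.Session()       # reused across all calls in the agent session

def send(message: dict) -> dict:
    """Each tool/memory message is its own request, but the session persists."""
    r = session.post(MCP, json=message, timeout=30)
    r.raise_for_status()
    return r.json()

send({"role": "memory.update", "args": {"key": "preferred_roast", "value": "medium"}})
docs = send({"role": "tool_call", "name": "search_documents",
             "args": {"query": "dialing in espresso", "top_k": 3}})
prefs = send({"role": "memory.get", "args": {"key": "preferred_roast"}})
print(docs, prefs)
```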
---
## Example (Local Dev Setup)
If you're running:
```bash
smithery server
```
It starts the local MCP server on something like `http://localhost:8000`.
Then your LLM agent (e.g., via Flowise or a Python script) connects and sends MCP JSON messages to this local server to:
- Access memories (`get`, `update`)
- Invoke tools (`tool_call`)
- Track context
You're now in agent territory.
---
## Section Summary
|Question|Answer|
|---|---|
|Is the MCP server running locally in Smithery?|Yes, by default.|
|Do I connect to it for each invocation?|Yes, each tool/memory invocation routes through the MCP server.|
|Can it be remote?|Yes, if you deploy it on a VM or container.|
|Does it need to be running at all times?|Yes, for agentic behavior and memory/tools to work properly.|
---
That's a great question, Mangesh: comparing **MCP Servers** and **REST APIs** is like comparing a **"thinking agent's brainstem"** to **individual service terminals**. They both route information, but their roles and capabilities are fundamentally different.
Let's break it down clearly:
---
## MCP Server vs REST APIs
|Feature|**MCP Server**|**REST APIs**|
|---|---|---|
|**Purpose**|Manages the **context**, memory, and tool usage of an AI agent|Exposes **resources and services** for client-server communication|
|**Client**|Typically an **LLM or agent** that communicates via structured messages|Any HTTP client (browser, app, curl, etc.)|
|**Communication Format**|Structured **MCP protocol messages** (JSON with roles like `tool_call`, `memory.update`)|HTTP methods: `GET`, `POST`, `PUT`, `DELETE`|
|**State Management**|Designed to manage **contextual state and memory across sessions**|Generally **stateless** (each request is independent)|
|**Tool Invocation**|Central to its design: the MCP Server **orchestrates and dispatches** tool calls|Tools/APIs are independent; no orchestration layer|
|**Memory Handling**|Supports `memory.get`, `memory.update` (persistent agent memory)|Not built-in; memory must be externally managed (e.g., a DB)|
|**Multi-tool Composition**|Supports tool chaining and agentic behavior (e.g., search → summarize → email)|Composition must be coded manually by the client or backend|
|**Designed For**|**Agentic LLMs**, AI assistants, autonomous workflows|General-purpose **web APIs** and microservices|
|**Orchestration**|Yes; it can decide which tools to use and in what order|No; you have to orchestrate calls manually|
|**Example Tools**|Qdrant, n8n, shell, file system, browser, calendar APIs|Stripe, Twitter API, weather API, any web service|
---
## Metaphor to Understand the Difference
- **REST APIs**: Think of them as individual **counters in a government office**. Each does one thing (passport, ID card, birth certificate), and you (the client) must know which counter to go to, and in what order.
- **MCP Server**: Think of it as an **AI-powered concierge** who knows your case, remembers past visits, decides which counters to go to, and fills the forms for you.
---
## Where They Intersect
An **MCP Server actually calls REST APIs** as tools.
### Example:
Let's say your LLM agent needs to schedule a meeting.
1. The LLM sends a `tool_call` to `calendar_api`.
2. The MCP Server routes it to a REST API like Google Calendar.
3. The result is returned to the LLM, possibly stored in memory, and used for the next reasoning step.
So:
The **MCP Server uses REST APIs as its tools**.
But it adds memory, state, orchestration, and agentic reasoning on top.
---
## Can I use REST APIs without MCP?
Absolutely; you've probably done that 100 times with Python scripts, Postman, or JS fetch calls.
But without MCP:
- You must manage context and state yourself.
- No agentic chaining unless you manually script it.
- Memory must be implemented separately.
With MCP:
- You delegate that complexity to the MCP server.
- Your LLM becomes **intelligent and persistent**, capable of using tools like a human would.
Now let's walk through a **concrete example** of an LLM agent using an MCP Server (like Smithery) to call a REST API tool.
---
## Scenario:
You want your agent to **get the current weather** using a public weather API (like Open-Meteo or WeatherAPI).
You've registered this weather API as a **tool** in your MCP server (Smithery), like this:
```json
{
  "name": "get_weather",
  "description": "Fetches current weather info for a city.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": { "type": "string", "description": "City name to fetch weather for" }
    },
    "required": ["city"]
  }
}
```
---
## Step-by-Step Agentic Interaction
### Step 1: LLM decides to use a tool
The model receives a user message like:
> "What's the weather in Pune right now?"
It then produces a **`tool_call`** using the MCP protocol:
```json
{
  "role": "tool_call",
  "name": "get_weather",
  "args": {
    "city": "Pune"
  }
}
```
This message is sent to the **MCP Server (Smithery)**.
---
### Step 2: MCP Server executes the tool call
The MCP Server receives the tool call and routes it to your **tool handler**, which internally calls a REST API (Open-Meteo works on coordinates rather than city names, so the handler first resolves "Pune" to latitude/longitude):
```http
GET https://api.open-meteo.com/v1/forecast?latitude=18.52&longitude=73.86&current_weather=true
```
It parses the JSON response, for example:
```json
{
  "temperature": 32,
  "weathercode": 2,
  "windspeed": 12.3
}
```
And returns a **`tool_response`** back to the model:
```json
{
  "role": "tool_response",
  "name": "get_weather",
  "content": {
    "temperature": 32,
    "condition": "Partly Cloudy",
    "windspeed": 12.3
  }
}
```
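For illustration, the tool handler behind `get_weather` might look like the hypothetical sketch below. Open-Meteo's forecast endpoint takes coordinates, so the handler resolves the city to latitude/longitude first; a small lookup table stands in for a real geocoding step here.

```python
# Hypothetical handler the MCP server could invoke for the "get_weather" tool.
import requests

CITY_COORDS = {"Pune": (18.52, 73.86)}  # illustrative; a real handler would geocode

# WMO weather interpretation codes (subset) as used by Open-Meteo.
WMO_CONDITIONS = {0: "Clear", 1: "Mainly Clear", 2: "Partly Cloudy", 3: "Overcast"}

def get_weather(city: str) -> dict:
    lat, lon = CITY_COORDS[city]
    r = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": lat, "longitude": lon, "current_weather": "true"},
        timeout=30,
    )
    r.raise_for_status()
    current = r.json()["current_weather"]
    return {
        "role": "tool_response",
        "name": "get_weather",
        "content": {
            "temperature": current["temperature"],
            "condition": WMO_CONDITIONS.get(current["weathercode"], "Unknown"),
            "windspeed": current["windspeed"],
        },
    }

print(get_weather("Pune"))
```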
---
### Step 3: LLM responds to the user
Now the LLM receives that tool response and generates a reply to the user:
> "The current temperature in Pune is 32Β°C with partly cloudy skies and wind speed around 12.3 km/h."
---
## Behind the scenes
|Component|Role|
|---|---|
|**LLM**|Thinks, decides, generates `tool_call`|
|**MCP Server (Smithery)**|Routes call to the actual API tool, manages memory/logs|
|**REST API**|Delivers the raw weather data|
|**LLM (again)**|Synthesizes a natural response using tool output|
---
## Bonus: Add Memory
Let's say the user now asks:
> "How does it compare to yesterday?"
If you had previously stored weather results in the **MCP memory**, the LLM could call:
```json
{
  "role": "memory.get",
  "args": {
    "key": "weather_yesterday_pune"
  }
}
```
It can then compare the stored value with today's data and reply intelligently.
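A hedged sketch of that memory round-trip, again assuming a local MCP server endpoint and simple `memory.*` message shapes (a real agent would likely use dated keys rather than a fixed `weather_yesterday_pune`):

```python
# Illustrative memory round-trip; the endpoint and message shapes are assumptions.
import requests

MCP = "http://localhost:8000/mcp"  # assumed local MCP server endpoint

def send(message: dict) -> dict:
    r = requests.post(MCP, json=message, timeout=30)
    r.raise_for_status()
    return r.json()

# After answering today's weather question, persist the reading for later comparison:
send({"role": "memory.update",
      "args": {"key": "weather_yesterday_pune", "value": {"temperature": 32}}})

# The next day, the agent retrieves it to answer "How does it compare to yesterday?":
yesterday = send({"role": "memory.get", "args": {"key": "weather_yesterday_pune"}})
print(yesterday)
```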
---
## Summary Flow
```plaintext
User → LLM → [tool_call JSON] → MCP Server → REST API
REST API → MCP Server → [tool_response JSON] → LLM → formats reply → User
```
---
## Section Summary (TL;DR)
|MCP Server|REST API|
|---|---|
|Orchestrates context, memory, and tools for agents|Serves data or services on demand|
|Used by LLMs to act like agents|Used by developers and clients|
|High-level "agent brain"|Low-level "service endpoint"|
---
## Final Thoughts
MCP is part of a larger movement to make LLMs **less like chatbots** and more like **thinking agents**: autonomous, stateful, context-aware, and tool-augmented. It's especially powerful when paired with self-hosted setups like Smithery, n8n, and Flowise, giving you full control over your AI workflows.