MCP and MCP Servers
The Model Context Protocol (MCP) is a relatively new concept in the realm of large language models (LLMs) that focuses on how models interact with external tools, APIs, agents, and memory systems in a structured and standardized way. It’s essentially a communication protocol for managing the context that surrounds a model and enables persistent, tool-augmented reasoning across different sessions or tasks.
🧠 What is MCP (Model Context Protocol)?
At a high level, MCP is a standardized way to manage and exchange context between an AI model (like GPT-4) and a set of tools, memories, and user interfaces.
It allows:
- Stateful interactions: the model can recall past actions, user preferences, memory, and tool outputs.
- Composition of multiple tools: like search, databases, web browsers, vector stores, file systems, and more.
- Interoperability: making models more modular and easily integrated into different applications and environments.
🎯 Goals of MCP:
- Context Management: How can LLMs keep track of what’s going on across different interactions or tools?
- Tool Invocation: How do LLMs invoke external tools in a structured way (APIs, DBs, vector stores)?
- Agentic Behavior: How do LLMs become more like thinking agents that act in steps, remember things, and learn?
🖥️ What is an MCP Server?
An MCP Server is a backend system that acts as middleware between the LLM and the tools/memory systems. It:
- Manages context: Stores and retrieves memory or previous interactions.
- Routes tool calls: The model can call tools via the server using structured messages (like JSON).
- Logs and audits interactions: For debugging, introspection, or reproducibility.
- Enables persistent state across sessions.
Think of the MCP Server as the “brain stem” that keeps the AI’s memories, tools, and capabilities wired together.
🧰 Practical Example: AI Assistant with MCP
Let’s say you’re building a custom AI personal assistant for Mangesh that:
- Answers questions from a private PDF knowledge base (via Qdrant).
- Automates workflows via n8n.
- Writes blog posts and schedules them in WordPress.
- Remembers user preferences (coffee brewing tips, favorite LLM settings).
- Talks to a calendar API.
Without MCP:
- Each component needs to be glued together manually.
- Context is ephemeral: once the chat is closed, all memory is lost.
- Tool invocation is ad hoc (via plugins or custom API calls).
With MCP:
- The assistant knows the tools available (`Qdrant`, `n8n`, `calendar`, `wordpress`).
- The assistant sends a structured `tool_call` request to fetch data from Qdrant.
- It updates the user’s memory via `memory.update`.
- Next time you ask about espresso, it recalls you use a Lelit Bianca V3 with medium roast at 93°C.
- MCP Server logs all tool calls, memory updates, and context transitions.
🧩 It’s like giving your AI both a short-term memory (for reasoning) and a long-term memory (for persistence), with the ability to call APIs as easily as thinking.
🛠️ Protocol Format (Simplified)
```json
{
  "role": "tool_call",
  "name": "search_documents",
  "args": {
    "query": "dialing in espresso",
    "top_k": 3
  }
}
```

The model can also use:
- `memory.get`, `memory.update`
- `tool_call`, `tool_response`
- `observe` (to introspect its own past behavior)
These structured calls go to the MCP Server which interprets and executes them.
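To make that routing concrete, here is a minimal Python sketch of how an MCP-style server might interpret these messages. The handler logic, storage, and response roles are hypothetical; a real server such as Smithery would add sessions, authentication, and richer logging.

```python
# Minimal sketch of MCP-style message dispatch (hypothetical server internals).
import json

TOOLS = {}       # tool name -> callable
MEMORY = {}      # key -> stored value
AUDIT_LOG = []   # every message, kept for debugging/reproducibility

def handle_message(message: dict) -> dict:
    """Interpret one MCP-style message and return the response payload."""
    AUDIT_LOG.append(message)  # log every call for introspection/replay
    role = message["role"]
    if role == "tool_call":
        tool = TOOLS[message["name"]]
        return {"role": "tool_response",
                "name": message["name"],
                "content": tool(**message.get("args", {}))}
    if role == "memory.get":
        return {"role": "memory.value",
                "content": MEMORY.get(message["args"]["key"])}
    if role == "memory.update":
        MEMORY[message["args"]["key"]] = message["args"]["value"]
        return {"role": "memory.ack"}
    raise ValueError(f"unknown role: {role}")

# Example: register a trivial tool and dispatch the message shown above.
TOOLS["search_documents"] = lambda query, top_k=3: [f"doc matching {query!r}"] * top_k
print(json.dumps(handle_message({
    "role": "tool_call",
    "name": "search_documents",
    "args": {"query": "dialing in espresso", "top_k": 3},
}), indent=2))
```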
🧪 Smithery: Tools for MCP
Smithery is an open-source project by Phillip Wang and the community that:
- Implements an MCP-compliant server.
- Supports agentic workflows (multi-step plans).
- Manages tool definitions, memory, sessions.
- Allows local or cloud deployment.
Features of Smithery:
- Fully open-source and self-hostable.
- Comes with built-in tools (browser, shell, vector DBs).
- Supports tool composition: Tools can call other tools recursively.
- Great for building personal AI agents, especially when paired with:
- Ollama (for running LLMs locally),
- LangChain or LlamaIndex (for orchestration),
- n8n or Airflow (for automation).
Example Use Case with Smithery:
You’re building an Agentic RAG system with:
- Flowise frontend (your RAG UI),
- Qdrant backend (your vector DB),
- DeepSeek LLM (your inference engine),
- and n8n for automations.
With Smithery:
- You register Qdrant and n8n as tools in MCP.
- Your LLM runs via Ollama or LM Studio locally.
- The agent plans multi-step actions:
- Search Qdrant → Summarize → Create automation task → Send email via n8n.
- Each of these steps is coordinated via the MCP server, as sketched below.
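A rough sketch of that plan as a sequence of MCP-style messages. The HTTP endpoint, the tool names (`qdrant_search`, `summarize`, `n8n_webhook`), and the payload shapes are all assumptions for illustration; the real client API depends on your setup.

```python
# Hypothetical multi-step plan driven through a local MCP server over HTTP.
import requests

MCP_URL = "http://localhost:8000/mcp"  # hypothetical endpoint

def mcp(message: dict) -> dict:
    """Send one MCP-style message and return the server's response."""
    return requests.post(MCP_URL, json=message).json()

# 1. Search Qdrant for relevant chunks.
docs = mcp({"role": "tool_call", "name": "qdrant_search",
            "args": {"query": "quarterly sales summary", "top_k": 5}})

# 2. Summarize what came back.
summary = mcp({"role": "tool_call", "name": "summarize",
               "args": {"text": docs["content"]}})

# 3. Create the automation task and send the email via an n8n webhook.
mcp({"role": "tool_call", "name": "n8n_webhook",
     "args": {"workflow": "send_email", "body": summary["content"]}})
```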
📌 Section Summary
| Concept | Description |
|---|---|
| MCP | A protocol to manage context, tool use, memory, and reasoning across LLM sessions. |
| MCP Server | The backend that stores memory, executes tool calls, and maintains agent state. |
| Smithery | A fully open-source implementation of an MCP server with built-in agents and tool support. |
❓ Where the MCP Server runs (Local vs Remote)
If you’re using Smithery, then:
🖥️ Yes, the MCP Server runs locally (by default).
It acts as the local brain/bridge that your LLM connects to and communicates through using the MCP protocol.
So every LLM invocation or interaction goes through this local MCP Server for:
- Tool routing (`tool_call`)
- Memory access (`memory.get`, `memory.update`)
- Context logging
- Multi-step reasoning
Think of Smithery as your local hub that sits between your LLM (Ollama, LM Studio, or OpenAI) and your external tools (Qdrant, n8n, shell, browser, etc.).
🧩 Smithery Architecture (Simplified)
```
┌────────────┐     MCP      ┌────────────┐
│ You (LLM)  │─────────────►│ MCP Server │────► tools/memory/db
└────────────┘   protocol   └────────────┘
```

- You interact with the LLM interface (can be terminal, GUI, browser, or agent framework).
- The LLM sends MCP-formatted messages to the Smithery MCP server.
- Smithery then:
  - Routes requests to tools (`tool_call`)
  - Fetches/stores memories (`memory.*`)
  - Orchestrates multi-step actions if the LLM is in “agent mode”
- Finally, it sends the tool responses back to the LLM for reasoning or continuation (see the loop sketch below).
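That request/response cycle is easiest to see as a loop. Below is a minimal sketch, assuming a hypothetical `llm_step` function (your model call via Ollama, LM Studio, etc.) and the same hypothetical local endpoint as above.

```python
# Minimal agent loop: the LLM emits MCP messages, the server executes them,
# and results are fed back until the LLM produces a final answer.
import requests

MCP_URL = "http://localhost:8000/mcp"  # hypothetical endpoint

def llm_step(history: list[dict]) -> dict:
    """Call your model (Ollama, LM Studio, OpenAI...) and return its next MCP message."""
    raise NotImplementedError  # wire up your inference engine here

def run_agent(user_message: str) -> str:
    history = [{"role": "user", "content": user_message}]
    while True:
        msg = llm_step(history)
        if msg["role"] == "assistant":   # final natural-language answer
            return msg["content"]
        # Otherwise it's a tool_call / memory.* message: route it to the
        # server and hand the result back for the next reasoning step.
        result = requests.post(MCP_URL, json=msg).json()
        history += [msg, result]
```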
🕸️ Does it have to be local?
Not at all. You can also deploy Smithery remotely, like:
- On a cloud VM
- In Docker on a private server
- Even containerized inside your Flowise or n8n stack
If deployed remotely, your model/agent connects to the MCP Server’s endpoint via HTTP/WebSocket and uses the same MCP protocol.
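In practice, the only client-side change is the endpoint you connect to; for example (hypothetical URL and path):

```python
# Local development vs. remote deployment: same MCP messages, different endpoint.
MCP_URL = "http://localhost:8000/mcp"        # local Smithery (hypothetical path)
# MCP_URL = "https://mcp.example.com/mcp"    # cloud VM or Docker host
```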
But for your use case (Agentic RAG + Flowise + n8n + Qdrant), running Smithery locally is perfectly fine, and gives you low latency and full data control.
🔁 Per-Invocation Connection?
Yes and no:
- The LLM doesn’t have to reconnect for every single tool call, as long as the session is active and the LLM is running in agent mode or under a persistent wrapper.
- But behind the scenes, each `tool_call`, `memory.get`, or `memory.update` is a discrete request to the MCP server, even in a persistent session.
- If you’re using a framework like LangGraph or LangChain Agents, they manage this under the hood.
⚙️ Example (Local Dev Setup)
If you’re running:
```bash
smithery server
```

It starts the local MCP server on something like http://localhost:8000.
Then your LLM agent (e.g., via Flowise or a Python script) connects and sends MCP JSON messages to this local server to:
- Access memories (`memory.get`, `memory.update`)
- Invoke tools (`tool_call`)
- Track context
You’re now in agent territory 🚀.
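For instance, a minimal Python client for that local server might look like this. The `/mcp` path and payload shapes are assumptions; check Smithery’s docs for the real API.

```python
# Hypothetical local client: send MCP JSON messages to the Smithery server.
import requests

MCP_URL = "http://localhost:8000/mcp"  # assumed path on the local server

# Recall a stored preference...
prefs = requests.post(MCP_URL, json={
    "role": "memory.get",
    "args": {"key": "espresso_settings"},
}).json()

# ...then invoke a tool; each call is a discrete request to the server.
answer = requests.post(MCP_URL, json={
    "role": "tool_call",
    "name": "search_documents",
    "args": {"query": "dialing in espresso", "top_k": 3},
}).json()

print(prefs, answer)
```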
📌 Section Summary
| Question | Answer |
|---|---|
| Is the MCP server running locally in Smithery? | ✅ Yes, by default. |
| Do I connect to it for each invocation? | ✅ Yes, each tool/memory invocation routes through the MCP server. |
| Can it be remote? | ✅ Yes, if you deploy it on a VM or container. |
| Is it required to be always running? | ✅ Yes, for agentic behavior and memory/tools to work properly. |
Comparing MCP Servers and REST APIs is like comparing a “thinking agent’s brainstem” to individual service terminals. They both route information, but their roles and capabilities are fundamentally different.
Let’s break it down clearly:
🧠 MCP Server vs 🌐 REST APIs
| Feature | MCP Server | REST APIs |
|---|---|---|
| Purpose | Manages the context, memory, and tool usage of an AI agent | Exposes resources and services for client-server communication |
| Client | Typically an LLM or agent that communicates via structured messages | Any HTTP client (browser, app, curl, etc.) |
| Communication Format | Structured MCP protocol messages (JSON with roles like tool_call, memory.update) | HTTP methods: GET, POST, PUT, DELETE |
| State Management | Designed to manage contextual state and memory across sessions | Generally stateless (each request is independent) |
| Tool Invocation | Central to its design: the MCP Server orchestrates and dispatches tool calls | Tools/APIs are independent, no orchestration layer |
| Memory Handling | Supports memory.get, memory.update for persistent agent memory | Not built-in; memory must be externally managed (e.g., DB) |
| Multi-tool Composition | Supports tool chaining and agentic behavior (e.g., search → summarize → email) | Composition must be coded manually by client or backend |
| Designed For | Agentic LLMs, AI assistants, autonomous workflows | General-purpose web APIs and microservices |
| Orchestration | Yes: it can decide which tools to use in what order | No: you have to orchestrate calls manually |
| Example Tools | Qdrant, n8n, shell, file system, browser, calendar APIs | Stripe, Twitter API, weather API, any web service |
📦 Metaphor to Understand the Difference
- REST APIs: Think of them as individual counters in a government office. Each does one thing (passport, ID card, birth certificate), and you (the client) must know which counter to go to, and in what order.
- MCP Server: Think of it as an AI-powered concierge who knows your case, remembers past visits, decides which counters to go to, and fills the forms for you.
✅ Where They Intersect
An MCP Server actually calls REST APIs as tools.
Example:
Let’s say your LLM agent needs to schedule a meeting.
- The LLM sends a `tool_call` to `calendar_api`.
- The MCP Server routes it to a REST API like Google Calendar.
- The result is returned to the LLM, possibly stored in memory, and used for the next reasoning step.
So:
➡️ MCP Server uses REST APIs as its tools.
But it adds memory, state, orchestration, and agentic reasoning on top.
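As a sketch, the server-side handler behind that `calendar_api` tool might simply wrap an HTTP call. The endpoint and payload below are hypothetical; a real integration would use the Google Calendar API with OAuth.

```python
# Hypothetical MCP tool handler that wraps a calendar REST API.
import requests

def calendar_api(title: str, start: str, end: str) -> dict:
    # Stand-in URL; a real handler would target Google Calendar's API.
    resp = requests.post(
        "https://calendar.example.com/v1/events",
        json={"title": title, "start": start, "end": end},
    )
    resp.raise_for_status()
    return resp.json()  # becomes the content of the tool_response
```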
🔁 Can I use REST APIs without MCP?
Absolutely: you’ve probably done that 100 times with Python scripts, Postman, or JS fetch calls.
But without MCP:
- You must manage context and state yourself.
- No agentic chaining unless you manually script it.
- Memory must be implemented separately.
With MCP:
- You delegate that complexity to the MCP server.
- Your LLM becomes intelligent and persistent, capable of using tools like a human would.
Let’s walk through a concrete example of an LLM agent using an MCP Server (like Smithery) to call a REST API tool.
🧠 Scenario:
You want your agent to get the current weather using a public weather API (like Open-Meteo or WeatherAPI).
You’ve registered this weather API as a tool in your MCP server (Smithery), like this:
```json
{
  "name": "get_weather",
  "description": "Fetches current weather info for a city.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": { "type": "string", "description": "City name to fetch weather for" }
    },
    "required": ["city"]
  }
}
```

🔄 Step-by-Step Agentic Interaction
🧠 Step 1: LLM decides to use a tool
The model receives a user message like:
“What’s the weather in Pune right now?”
It then produces a tool_call using the MCP protocol:
```json
{
  "role": "tool_call",
  "name": "get_weather",
  "args": {
    "city": "Pune"
  }
}
```

This message is sent to the MCP Server (Smithery).
🧠 Step 2: MCP Server executes tool call
The MCP Server receives the tool call and routes it to your tool handler, which internally calls a REST API. Open-Meteo takes latitude/longitude rather than a city name, so the handler first geocodes “Pune” and then requests:

```
GET https://api.open-meteo.com/v1/forecast?latitude=18.52&longitude=73.86&current_weather=true
```

It parses the JSON response, for example:
```json
{
  "temperature": 32,
  "weathercode": 3,
  "windspeed": 12.3
}
```

And returns a tool_response back to the model:
```json
{
  "role": "tool_response",
  "name": "get_weather",
  "content": {
    "temperature": 32,
    "condition": "Partly Cloudy",
    "windspeed": 12.3
  }
}
```
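Under the hood, the tool handler that produced this response might look like the sketch below. The two Open-Meteo endpoints are real; the function name, the condition lookup, and the return shape are assumptions for this walkthrough.

```python
# Hypothetical handler behind the get_weather tool: geocode the city,
# fetch current conditions, and shape the result for the agent.
import requests

# Partial WMO weather-code table (codes 0-3); a full handler would map more.
CONDITIONS = {0: "Clear", 1: "Mainly Clear", 2: "Partly Cloudy", 3: "Overcast"}

def get_weather(city: str) -> dict:
    # 1. Resolve the city name to coordinates (Open-Meteo's geocoding API).
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
    ).json()["results"][0]

    # 2. Fetch current weather for those coordinates.
    current = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "current_weather": "true",
        },
    ).json()["current_weather"]

    # 3. Return just what the agent needs; the MCP server wraps this dict
    #    in the tool_response message shown above.
    return {
        "temperature": current["temperature"],
        "condition": CONDITIONS.get(current["weathercode"], "Unknown"),
        "windspeed": current["windspeed"],
    }
```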
🧠 Step 3: LLM responds to the user
Now the LLM receives that tool response and generates a reply to the user:
“The current temperature in Pune is 32°C with partly cloudy skies and wind speed around 12.3 km/h.”
🛠️ Behind the scenes
| Component | Role |
|---|---|
| LLM | Thinks, decides, generates tool_call |
| MCP Server (Smithery) | Routes call to the actual API tool, manages memory/logs |
| REST API | Delivers the raw weather data |
| LLM (again) | Synthesizes a natural response using tool output |
🤖 Bonus: Add Memory
Let’s say the user now asks:
“How does it compare to yesterday?”
If you had previously stored weather results in the MCP memory, the LLM could call:
```json
{
  "role": "memory.get",
  "args": {
    "key": "weather_yesterday_pune"
  }
}
```

It can then compare yesterday’s value with today’s data and reply intelligently.
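For that comparison to work, the agent must have stored yesterday’s result in the first place; a matching `memory.update` call might look like this (same hypothetical endpoint as before):

```python
# Hypothetical: persist today's weather so tomorrow's comparison can work.
import requests

requests.post("http://localhost:8000/mcp", json={
    "role": "memory.update",
    "args": {
        "key": "weather_yesterday_pune",   # read back by the memory.get above
        "value": {"temperature": 32, "condition": "Partly Cloudy"},
    },
})
```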
🔁 Summary Flow
```
User → LLM → [tool_call JSON] → MCP Server → REST API
                                     │
       [tool_response JSON] ◄────────┘
                │
                └───► LLM formats reply → User
```

🧩 Section Summary (TL;DR)
| MCP Server | REST API |
|---|---|
| Orchestrates context, memory, and tools for agents | Serves data or services on demand |
| Used by LLMs to act like agents | Used by developers and clients |
| High-level “agent brain” | Low-level “service endpoint” |
📘 Final Thoughts
MCP is part of a larger movement to make LLMs less like chatbots and more like thinking agents: autonomous, stateful, context-aware, and tool-augmented. It’s especially powerful when paired with self-hosted setups like Smithery, n8n, and Flowise, giving you full control over your AI workflows.