MCP and MCP Servers
The Model Context Protocol (MCP) is a relatively new concept in the realm of large language models (LLMs) that focuses on how models interact with external tools, APIs, agents, and memory systems in a structured and standardized way. It’s essentially a communication protocol for managing the context that surrounds a model and enables persistent, tool-augmented reasoning across different sessions or tasks.
🧠 What is MCP (Model Context Protocol)?
At a high level, MCP is a standardized way to manage and exchange context between an AI model (like GPT-4) and a set of tools, memories, and user interfaces.
It allows:
- Stateful interactions: the model can recall past actions, user preferences, memory, and tool outputs.
- Composition of multiple tools: like search, databases, web browsers, vector stores, file systems, and more.
- Interoperability: making models more modular and easily integrated into different applications and environments.
🎯 Goals of MCP:
- Context Management: How can LLMs keep track of what’s going on across different interactions or tools?
- Tool Invocation: How do LLMs invoke external tools in a structured way (APIs, DBs, vector stores)?
- Agentic Behavior: How do LLMs become more like thinking agents that act in steps, remember things, and learn?
🖥️ What is an MCP Server?
An MCP Server is a backend system that acts as middleware between the LLM and the tools/memory systems. It:
- Manages context: Stores and retrieves memory or previous interactions.
- Routes tool calls: The model can call tools via the server using structured messages (like JSON).
- Logs and audits interactions: For debugging, introspection, or reproducibility.
- Enables persistent state across sessions.
Think of the MCP Server as the “brain stem” that keeps the AI’s memories, tools, and capabilities wired together.
🧰 Practical Example: AI Assistant with MCP
Let’s say you’re building a custom AI personal assistant for Mangesh that:
- Answers questions from a private PDF knowledge base (via Qdrant).
- Automates workflows via n8n.
- Writes blog posts and schedules them in WordPress.
- Remembers user preferences (coffee brewing tips, favorite LLM settings).
- Talks to a calendar API.
Without MCP:
- Each component needs to be glued together manually.
- Context is ephemeral: once the chat is closed, all memory is lost.
- Tool invocation is ad hoc (via plugins or custom API calls).
With MCP:
- The assistant knows the tools available (`Qdrant`, `n8n`, `calendar`, `wordpress`).
- The assistant sends a structured `tool_call` request to fetch data from Qdrant.
- It updates the user’s memory via `memory.update`.
- Next time you ask about espresso, it recalls you use a Lelit Bianca V3 with medium roast at 93°C.
- MCP Server logs all tool calls, memory updates, and context transitions.
🧩 It’s like giving your AI both a short-term memory (for reasoning) and a long-term memory (for persistence), with the ability to call APIs as easily as thinking.
🛠️ Protocol Format (Simplified)
```json
{
  "role": "tool_call",
  "name": "search_documents",
  "args": {
    "query": "dialing in espresso",
    "top_k": 3
  }
}
```

The model can also use:
- `memory.get`, `memory.update`
- `tool_call`, `tool_response`
- `observe` (to introspect its own past behavior)
These structured calls go to the MCP Server which interprets and executes them.
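To make that routing concrete, here is a minimal Python sketch of how an MCP-style server might interpret these messages. The handler logic, storage, and response roles are hypothetical; a real server such as Smithery would add sessions, authentication, and richer logging.

```python
# Minimal sketch of MCP-style message dispatch (hypothetical server internals).
import json

TOOLS = {}       # tool name -> callable
MEMORY = {}      # key -> stored value
AUDIT_LOG = []   # every message, kept for debugging/reproducibility

def handle_message(message: dict) -> dict:
    """Interpret one MCP-style message and return the response payload."""
    AUDIT_LOG.append(message)  # log every call for introspection/replay
    role = message["role"]
    if role == "tool_call":
        tool = TOOLS[message["name"]]
        return {"role": "tool_response",
                "name": message["name"],
                "content": tool(**message.get("args", {}))}
    if role == "memory.get":
        return {"role": "memory.value",
                "content": MEMORY.get(message["args"]["key"])}
    if role == "memory.update":
        MEMORY[message["args"]["key"]] = message["args"]["value"]
        return {"role": "memory.ack"}
    raise ValueError(f"unknown role: {role}")

# Example: register a trivial tool and dispatch the message shown above.
TOOLS["search_documents"] = lambda query, top_k=3: [f"doc matching {query!r}"] * top_k
print(json.dumps(handle_message({
    "role": "tool_call",
    "name": "search_documents",
    "args": {"query": "dialing in espresso", "top_k": 3},
}), indent=2))
```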
🧪 Smithery: Tools for MCP
Smithery is an open-source project by Phillip Wang and the community that:
- Implements an MCP-compliant server.
- Supports agentic workflows (multi-step plans).
- Manages tool definitions, memory, sessions.
- Allows local or cloud deployment.
Features of Smithery:
- Fully open-source and self-hostable.
- Comes with built-in tools (browser, shell, vector DBs).
- Supports tool composition: Tools can call other tools recursively.
- Great for building personal AI agents, especially when paired with:
- Ollama (for running LLMs locally),
- LangChain or LlamaIndex (for orchestration),
- n8n or Airflow (for automation).
Example Use Case with Smithery:
You’re building an Agentic RAG system with:
- Flowise frontend (your RAG UI),
- Qdrant backend (your vector DB),
- DeepSeek LLM (your inference engine),
- and n8n for automations.
With Smithery:
- You register Qdrant and n8n as tools in MCP.
- Your LLM runs via Ollama or LM Studio locally.
- The agent plans multi-step actions:
- Search Qdrant → Summarize → Create automation task → Send email via n8n.
- Each of these steps is coordinated via the MCP server, as sketched below.
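A rough sketch of that plan as a sequence of MCP-style messages. The HTTP endpoint, the tool names (`qdrant_search`, `summarize`, `n8n_webhook`), and the payload shapes are all assumptions for illustration; the real client API depends on your setup.

```python
# Hypothetical multi-step plan driven through a local MCP server over HTTP.
import requests

MCP_URL = "http://localhost:8000/mcp"  # hypothetical endpoint

def mcp(message: dict) -> dict:
    """Send one MCP-style message and return the server's response."""
    return requests.post(MCP_URL, json=message).json()

# 1. Search Qdrant for relevant chunks.
docs = mcp({"role": "tool_call", "name": "qdrant_search",
            "args": {"query": "quarterly sales summary", "top_k": 5}})

# 2. Summarize what came back.
summary = mcp({"role": "tool_call", "name": "summarize",
               "args": {"text": docs["content"]}})

# 3. Create the automation task and send the email via an n8n webhook.
mcp({"role": "tool_call", "name": "n8n_webhook",
     "args": {"workflow": "send_email", "body": summary["content"]}})
```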
📌 Section Summary
| Concept | Description |
|---|---|
| MCP | A protocol to manage context, tool use, memory, and reasoning across LLM sessions. |
| MCP Server | The backend that stores memory, executes tool calls, and maintains agent state. |
| Smithery | A fully open-source implementation of an MCP server with built-in agents and tool support. |
❓ Where the MCP Server runs (Local vs Remote)
If you’re using Smithery, then:
🖥️ Yes, the MCP Server runs locally (by default).
It acts as the local brain/bridge that your LLM connects to and communicates through using the MCP protocol.
So every LLM invocation or interaction goes through this local MCP Server for:
- Tool routing (`tool_call`)
- Memory access (`memory.get`, `memory.update`)
- Context logging
- Multi-step reasoning
Think of Smithery as your local hub that sits between your LLM (Ollama, LM Studio, or OpenAI) and your external tools (Qdrant, n8n, shell, browser, etc.).
🧩 Smithery Architecture (Simplified)
```
┌────────────┐     MCP      ┌────────────┐
│ You (LLM)  │─────────────►│ MCP Server │────► tools/memory/db
└────────────┘   protocol   └────────────┘
```

- You interact with the LLM interface (can be terminal, GUI, browser, or agent framework).
- The LLM sends MCP-formatted messages to the Smithery MCP server.
- Smithery then:
  - Routes requests to tools (`tool_call`)
  - Fetches/stores memories (`memory.*`)
  - Orchestrates multi-step actions if the LLM is in “agent mode”
- Finally, it sends the tool responses back to the LLM for reasoning or continuation (see the loop sketch below).
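That request/response cycle is easiest to see as a loop. Below is a minimal sketch, assuming a hypothetical `llm_step` function (your model call via Ollama, LM Studio, etc.) and the same hypothetical local endpoint as above.

```python
# Minimal agent loop: the LLM emits MCP messages, the server executes them,
# and results are fed back until the LLM produces a final answer.
import requests

MCP_URL = "http://localhost:8000/mcp"  # hypothetical endpoint

def llm_step(history: list[dict]) -> dict:
    """Call your model (Ollama, LM Studio, OpenAI...) and return its next MCP message."""
    raise NotImplementedError  # wire up your inference engine here

def run_agent(user_message: str) -> str:
    history = [{"role": "user", "content": user_message}]
    while True:
        msg = llm_step(history)
        if msg["role"] == "assistant":   # final natural-language answer
            return msg["content"]
        # Otherwise it's a tool_call / memory.* message: route it to the
        # server and hand the result back for the next reasoning step.
        result = requests.post(MCP_URL, json=msg).json()
        history += [msg, result]
```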
🕸️ Does it have to be local?
Not at all. You can also deploy Smithery remotely, like:
- On a cloud VM
- In Docker on a private server
- Even containerized inside your Flowise or n8n stack
If deployed remotely, your model/agent connects to the MCP Server’s endpoint via HTTP/WebSocket and uses the same MCP protocol.
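In practice, the only client-side change is the endpoint you connect to; for example (hypothetical URL and path):

```python
# Local development vs. remote deployment: same MCP messages, different endpoint.
MCP_URL = "http://localhost:8000/mcp"        # local Smithery (hypothetical path)
# MCP_URL = "https://mcp.example.com/mcp"    # cloud VM or Docker host
```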
But for your use case (Agentic RAG + Flowise + n8n + Qdrant), running Smithery locally is perfectly fine, and gives you low latency and full data control.
🔁 Per-Invocation Connection?
Yes and no:
- The LLM doesn’t have to reconnect for every single tool call, as long as the session is active and the LLM is running in agent mode or under a persistent wrapper.
- But behind the scenes, each `tool_call`, `memory.get`, or `memory.update` is a discrete request to the MCP server, even in a persistent session.
- If you’re using a framework like LangGraph or LangChain Agents, they manage this under the hood.
⚙️ Example (Local Dev Setup)
If you’re running:
```bash
smithery server
```

It starts the local MCP server on something like http://localhost:8000.
Then your LLM agent (e.g., via Flowise or a Python script) connects and sends MCP JSON messages to this local server to:
- Access memories (`memory.get`, `memory.update`)
- Invoke tools (`tool_call`)
- Track context
You’re now in agent territory 🚀.
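For instance, a minimal Python client for that local server might look like this. The `/mcp` path and payload shapes are assumptions; check Smithery’s docs for the real API.

```python
# Hypothetical local client: send MCP JSON messages to the Smithery server.
import requests

MCP_URL = "http://localhost:8000/mcp"  # assumed path on the local server

# Recall a stored preference...
prefs = requests.post(MCP_URL, json={
    "role": "memory.get",
    "args": {"key": "espresso_settings"},
}).json()

# ...then invoke a tool; each call is a discrete request to the server.
answer = requests.post(MCP_URL, json={
    "role": "tool_call",
    "name": "search_documents",
    "args": {"query": "dialing in espresso", "top_k": 3},
}).json()

print(prefs, answer)
```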
📌 Section Summary
| Question | Answer |
|---|---|
| Is the MCP server running locally in Smithery? | ✅ Yes, by default. |
| Do I connect to it for each invocation? | ✅ Yes, each tool/memory invocation routes through the MCP server. |
| Can it be remote? | ✅ Yes, if you deploy it on a VM or container. |
| Is it required to be always running? | ✅ Yes, for agentic behavior and memory/tools to work properly. |
Comparing MCP Servers and REST APIs is like comparing a “thinking agent’s brainstem” to individual service terminals. They both route information, but their roles and capabilities are fundamentally different.
Let’s break it down clearly:
🧠 MCP Server vs 🌐 REST APIs
| Feature | MCP Server | REST APIs |
|---|---|---|
| Purpose | Manages the context, memory, and tool usage of an AI agent | Exposes resources and services for client-server communication |
| Client | Typically an LLM or agent that communicates via structured messages | Any HTTP client (browser, app, curl, etc.) |
| Communication Format | Structured MCP protocol messages (JSON with roles like tool_call, memory.update) | HTTP methods: GET, POST, PUT, DELETE |
| State Management | Designed to manage contextual state and memory across sessions | Generally stateless (each request is independent) |
| Tool Invocation | Central to its design: the MCP Server orchestrates and dispatches tool calls | Tools/APIs are independent, no orchestration layer |
| Memory Handling | Supports memory.get, memory.update for persistent agent memory | Not built-in; memory must be externally managed (e.g., DB) |
| Multi-tool Composition | Supports tool chaining and agentic behavior (e.g., search → summarize → email) | Composition must be coded manually by client or backend |
| Designed For | Agentic LLMs, AI assistants, autonomous workflows | General-purpose web APIs and microservices |
| Orchestration | Yes: it can decide which tools to use in what order | No: you have to orchestrate calls manually |
| Example Tools | Qdrant, n8n, shell, file system, browser, calendar APIs | Stripe, Twitter API, weather API, any web service |
📦 Metaphor to Understand the Difference
- REST APIs: Think of them as individual counters in a government office. Each does one thing (passport, ID card, birth certificate), and you (the client) must know which counter to go to, and in what order.
- MCP Server: Think of it as an AI-powered concierge who knows your case, remembers past visits, decides which counters to go to, and fills the forms for you.
✅ Where They Intersect
An MCP Server actually calls REST APIs as tools.
Example:
Let’s say your LLM agent needs to schedule a meeting.
- The LLM sends a `tool_call` to `calendar_api`.
- The MCP Server routes it to a REST API like Google Calendar.
- The result is returned to the LLM, possibly stored in memory, and used for the next reasoning step.
So:
➡️ MCP Server uses REST APIs as its tools.
But it adds memory, state, orchestration, and agentic reasoning on top.
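As a sketch, the server-side handler behind that `calendar_api` tool might simply wrap an HTTP call. The endpoint and payload below are hypothetical; a real integration would use the Google Calendar API with OAuth.

```python
# Hypothetical MCP tool handler that wraps a calendar REST API.
import requests

def calendar_api(title: str, start: str, end: str) -> dict:
    # Stand-in URL; a real handler would target Google Calendar's API.
    resp = requests.post(
        "https://calendar.example.com/v1/events",
        json={"title": title, "start": start, "end": end},
    )
    resp.raise_for_status()
    return resp.json()  # becomes the content of the tool_response
```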
🔁 Can I use REST APIs without MCP?
Absolutely: you’ve probably done that 100 times with Python scripts, Postman, or JS fetch calls.
But without MCP:
- You must manage context and state yourself.
- No agentic chaining unless you manually script it.
- Memory must be implemented separately.
With MCP:
- You delegate that complexity to the MCP server.
- Your LLM becomes intelligent and persistent, capable of using tools like a human would.
Let’s walk through a concrete example of an LLM agent using an MCP Server (like Smithery) to call a REST API tool.
🧠 Scenario:
You want your agent to get the current weather using a public weather API (like Open-Meteo or WeatherAPI).
You’ve registered this weather API as a tool in your MCP server (Smithery), like this:
```json
{
  "name": "get_weather",
  "description": "Fetches current weather info for a city.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": { "type": "string", "description": "City name to fetch weather for" }
    },
    "required": ["city"]
  }
}
```

🔄 Step-by-Step Agentic Interaction
🧠 Step 1: LLM decides to use a tool
The model receives a user message like:
“What’s the weather in Pune right now?”
It then produces a tool_call using the MCP protocol:
```json
{
  "role": "tool_call",
  "name": "get_weather",
  "args": {
    "city": "Pune"
  }
}
```

This message is sent to the MCP Server (Smithery).
🧠 Step 2: MCP Server executes tool call
The MCP Server receives the tool call and routes it to your tool handler, which internally calls a REST API. Open-Meteo takes latitude/longitude rather than a city name, so the handler first geocodes “Pune” and then requests:

```
GET https://api.open-meteo.com/v1/forecast?latitude=18.52&longitude=73.86&current_weather=true
```

It parses the JSON response, for example:
```json
{
  "temperature": 32,
  "weathercode": 3,
  "windspeed": 12.3
}
```

And returns a tool_response back to the model:
```json
{
  "role": "tool_response",
  "name": "get_weather",
  "content": {
    "temperature": 32,
    "condition": "Partly Cloudy",
    "windspeed": 12.3
  }
}
```
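Under the hood, the tool handler that produced this response might look like the sketch below. The two Open-Meteo endpoints are real; the function name, the condition lookup, and the return shape are assumptions for this walkthrough.

```python
# Hypothetical handler behind the get_weather tool: geocode the city,
# fetch current conditions, and shape the result for the agent.
import requests

# Partial WMO weather-code table (codes 0-3); a full handler would map more.
CONDITIONS = {0: "Clear", 1: "Mainly Clear", 2: "Partly Cloudy", 3: "Overcast"}

def get_weather(city: str) -> dict:
    # 1. Resolve the city name to coordinates (Open-Meteo's geocoding API).
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
    ).json()["results"][0]

    # 2. Fetch current weather for those coordinates.
    current = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "current_weather": "true",
        },
    ).json()["current_weather"]

    # 3. Return just what the agent needs; the MCP server wraps this dict
    #    in the tool_response message shown above.
    return {
        "temperature": current["temperature"],
        "condition": CONDITIONS.get(current["weathercode"], "Unknown"),
        "windspeed": current["windspeed"],
    }
```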
🧠 Step 3: LLM responds to the user
Now the LLM receives that tool response and generates a reply to the user:
“The current temperature in Pune is 32°C with partly cloudy skies and wind speed around 12.3 km/h.”
🛠️ Behind the scenes
| Component | Role |
|---|---|
| LLM | Thinks, decides, generates tool_call |
| MCP Server (Smithery) | Routes call to the actual API tool, manages memory/logs |
| REST API | Delivers the raw weather data |
| LLM (again) | Synthesizes a natural response using tool output |
🤖 Bonus: Add Memory
Let’s say the user now asks:
“How does it compare to yesterday?”
If you had previously stored weather results in the MCP memory, the LLM could call:
```json
{
  "role": "memory.get",
  "args": {
    "key": "weather_yesterday_pune"
  }
}
```

It can then compare yesterday’s value with today’s data and reply intelligently.
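For that comparison to work, the agent must have stored yesterday’s result in the first place; a matching `memory.update` call might look like this (same hypothetical endpoint as before):

```python
# Hypothetical: persist today's weather so tomorrow's comparison can work.
import requests

requests.post("http://localhost:8000/mcp", json={
    "role": "memory.update",
    "args": {
        "key": "weather_yesterday_pune",   # read back by the memory.get above
        "value": {"temperature": 32, "condition": "Partly Cloudy"},
    },
})
```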
🔁 Summary Flow
```
User → LLM → [tool_call JSON] → MCP Server → REST API
                                     │
       [tool_response JSON] ◄────────┘
                │
                └───► LLM formats reply → User
```

🧩 Section Summary (TL;DR)
| MCP Server | REST API |
|---|---|
| Orchestrates context, memory, and tools for agents | Serves data or services on demand |
| Used by LLMs to act like agents | Used by developers and clients |
| High-level “agent brain” | Low-level “service endpoint” |
📘 Final Thoughts
MCP is part of a larger movement to make LLMs less like chatbots and more like thinking agents: autonomous, stateful, context-aware, and tool-augmented. It’s especially powerful when paired with self-hosted setups like Smithery, n8n, and Flowise, giving you full control over your AI workflows.