MCP in Production: Connector Limits and Retrieval Architecture

A good deal of recent enterprise AI discussion treats MCP as if it described a complete retrieval architecture. It does not.

The Model Context Protocol is a standardized way for an LLM application to discover and invoke capabilities exposed by an external server. It’s defined around three primitives: Tools (model-invoked actions), Resources (application-provided context), and Prompts (user-invoked templates). The protocol itself is agnostic to what happens behind the server interface. An MCP server can wrap a thin source-system API. It can also sit in front of a deeply pre-indexed corpus. Both are valid implementations.

What has actually shaped the conversation about MCP's strengths and limitations is not the protocol. It's the prevailing implementation pattern: connectors that wrap an enterprise SaaS API and pass through its native search and document-fetch endpoints, without adding their own retrieval intelligence on top.

This is a genuinely useful pattern for many workloads. It's also been over-applied, partly because it's the easiest pattern to ship, and partly because the distinction between "MCP the protocol" and "thin-connector MCP server" rarely surfaces in vendor messaging. For teams making architectural decisions, that distinction matters.

What follows covers where retrieval intelligence lives in the connector pattern, what the pattern is good at, where it runs into structural limits, and why those limits are properties of the architecture rather than of MCP.

What the Connector Pattern Actually Does

A typical thin-connector MCP server exposes a small number of Tools, usually a search tool and a fetch tool, that map directly onto the source system's API. When a user asks something like "find recent broker notes that discuss competitive threats to Company C," the LLM application invokes the search tool, the connector forwards a query to the source system's search endpoint, the source returns a ranked list of documents, and the model selects which to fetch. Each fetch call retrieves a document's contents. The contents arrive in the model's context window as text, with whatever metadata the source API surfaces alongside.
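The flow described above can be sketched in a few lines. This is a minimal illustration, not a real MCP SDK or vendor client: `FakeSourceAPI`, `ThinConnector`, `search_tool`, and `fetch_tool` are hypothetical names, and the naive keyword match only stands in for whatever ranking a real source system applies.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    title: str


class FakeSourceAPI:
    """Hypothetical stand-in for a SaaS search/fetch API (not a real client)."""

    def __init__(self, docs: dict[str, tuple[str, str]]):
        self._docs = docs  # doc_id -> (title, body)

    def search(self, query: str, limit: int = 10) -> list[Hit]:
        # Naive keyword match, standing in for the source's native ranking.
        terms = query.lower().split()
        hits = [
            Hit(doc_id, title)
            for doc_id, (title, body) in self._docs.items()
            if any(t in (title + " " + body).lower() for t in terms)
        ]
        return hits[:limit]

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id][1]


class ThinConnector:
    """Exposes two tools that map one-to-one onto the source API."""

    def __init__(self, source: FakeSourceAPI):
        self._source = source

    def search_tool(self, query: str) -> list[Hit]:
        # No re-ranking, no embeddings, no filtering: the source's order is the result.
        return self._source.search(query)

    def fetch_tool(self, doc_id: str) -> str:
        # The full document body goes straight back to the caller.
        return self._source.fetch(doc_id)


source = FakeSourceAPI({
    "n1": ("Broker note: Company C", "Competitive threats from new entrants."),
    "n2": ("Marketing brochure", "Our product overview."),
})
connector = ThinConnector(source)
hits = connector.search_tool("competitive threats")
bodies = [connector.fetch_tool(h.doc_id) for h in hits]
```

Everything the connector returns is decided inside `FakeSourceAPI.search`; the connector class itself contributes no retrieval logic.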

Two structural properties fall out of this pattern, regardless of which source system is behind the connector or which model is in front of it.

The retrieval quality is whatever the source system's search produces. The connector doesn't re-rank, doesn't apply its own embedding model, doesn't filter by document type or sentiment because it has no notion of either. If the source system's search returns the right document on page seven, the connector returns it on page seven. If the source's search is keyword-biased and the user's query uses different terminology than the document, the document doesn't surface. Agentic systems can partially compensate — retrying with reformulated queries, broadening scope when initial recall is low — but that shifts the cost to latency and token budget rather than eliminating it.
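The compensation loop mentioned above might look like the following sketch. The function name `search_with_retries`, the toy `INDEX`, and the hand-written reformulations are all assumptions for illustration; in a real agent the model would generate the reformulations itself. The call counter is a rough proxy for the latency and token cost that the retries add.

```python
from typing import Callable


def search_with_retries(
    search: Callable[[str], list[str]],
    queries: list[str],
    min_hits: int,
) -> tuple[list[str], int]:
    """Try reformulated queries in order until enough distinct hits accumulate.

    Returns the hits plus the number of search calls made.
    """
    seen: list[str] = []
    calls = 0
    for query in queries:
        calls += 1
        for doc_id in search(query):
            if doc_id not in seen:
                seen.append(doc_id)
        if len(seen) >= min_hits:
            break
    return seen, calls


# Toy exact-phrase index standing in for a keyword-biased source search.
INDEX = {
    "competitive threats": ["note-1"],
    "rival pressure": ["note-2"],
    "market share loss": ["note-2", "note-3"],
}


def toy_search(query: str) -> list[str]:
    return INDEX.get(query, [])


found, calls = search_with_retries(
    toy_search,
    ["competitive threats", "rival pressure", "market share loss"],
    min_hits=3,
)
```

Here the first query surfaces only one note, and it takes three round trips to reach all three. Recall improves, but each extra call is paid for at query time.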

Document understanding happens at synthesis time, not before. Some connectors attach lightweight metadata to fetched documents. Google Drive surfaces auto-tags, Notion exposes page properties, SharePoint passes site context. That's not nothing. But it's generic surface-level signal: author, date, file type, maybe a coarse category. It's not a domain-aware classification of what the document means.

A pitch deck, an investment committee memo, an analyst initiation note, and a marketing brochure arrive at the model as text with thin file metadata. The interpretive work — figuring out what kind of document this is, what conventions it follows, how its claims should be weighted — happens at query time, against whatever fits in the context window.

These aren't accidents. They follow from a connector pattern that pushes retrieval intelligence to query time.

What the Pattern Is Genuinely Good At

The thin-connector pattern is excellent for a real and growing class of workloads.

Live structured data. Questions where the answer is "what does the source system show right now" — current ARR by account, today's open ticket count, this morning's pipeline state — are perfectly suited to the pattern. Pre-indexing this kind of data is the wrong move; the value is in retrieving live state.

Real-time enrichment. When the goal is to attach a fresh signal to an answer being generated elsewhere — the latest commit when discussing a code change, today's calendar when planning a meeting, a current invoice balance — the runtime fetch model is exactly right. Stale would be worse than thin.

Long-tail and low-priority sources. A connector can be stood up against a new source in days. For systems where the cost of a multi-week ingestion project isn't justified by the value of the data, that velocity is a real advantage.

Small, highly structured corpora. When the corpus is a localized repository such as a codebase, lexical search is often sufficient on its own. Pointing a thin connector at these sources works well because native tools (like grep or basic file-search APIs) are well suited to exact-match lookups of variable or function names. For these limited scopes, keyword-biased search is exactly the behavior you want, and it delivers that velocity without complex pre-indexing.
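As a small illustration of why exact-match lookup is enough for this case, here is a grep-style sketch over an in-memory "repository". The function `find_symbol` and the two file contents are made up for the example.

```python
def find_symbol(files: dict[str, str], symbol: str) -> list[tuple[str, int]]:
    """Return (path, line_number) pairs where the exact symbol text appears."""
    matches = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if symbol in line:
                matches.append((path, lineno))
    return matches


# Hypothetical two-file repository.
repo = {
    "billing.py": "def compute_invoice_total(items):\n    return sum(items)\n",
    "api.py": (
        "from billing import compute_invoice_total\n"
        "\n"
        "total = compute_invoice_total([1, 2])\n"
    ),
}
locations = find_symbol(repo, "compute_invoice_total")
```

Because identifiers are spelled the same way everywhere they are used, a literal match finds the definition, the import, and the call site with no semantic layer involved.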

These workloads are why the connector pattern proliferated, and they're why it should remain part of any serious enterprise AI stack. Thin connectors are useful. The question is whether they're sufficient for every workload they're being applied to.

Where the Pattern Runs into Structural Limits

The limits show up most clearly in workloads that demand depth: research that requires synthesis across many documents, comparisons that hinge on document-type-specific conventions, queries phrased in different terminology than the documents that hold the answer.

Some of these limits are properties of thin implementations and can be addressed by a more sophisticated MCP server. Others, particularly around how identity is handled, are structural to the connector pattern itself, independent of how sophisticated the server is.

Inherited search quality. The connector inherits the source system's search behavior wholesale. Most enterprise SaaS platforms aren't search companies; their search components are built to support browsing within a UI, not to maximize recall on natural-language queries. When the platform's search is keyword-biased, the AI on top of it is keyword-biased. When the platform's search ignores document type or freshness in ranking, the AI on top of it does too.

Rate limits and pagination caps. Third-party APIs throttle request rates and cap result counts per call, and for workloads where the model needs to scan a large set of documents to produce a complete answer, those caps become a recall ceiling. The model has no way to introspect what the source excluded.
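The recall ceiling can be made concrete with a short simulation. The page size, page limit, and match count below are assumed figures chosen for illustration, and `capped_search` is a hypothetical helper rather than any real API's behavior.

```python
def capped_search(
    matching_ids: list[str], page_size: int, max_pages: int
) -> tuple[list[str], int]:
    """Simulate paging through an API that stops serving results after max_pages.

    Returns the ids the caller can see and the count of matches it never sees.
    """
    visible: list[str] = []
    for page in range(max_pages):
        start = page * page_size
        batch = matching_ids[start:start + page_size]
        if not batch:
            break
        visible.extend(batch)
    hidden = len(matching_ids) - len(visible)
    return visible, hidden


# Assumed scenario: 250 documents match, but the API serves 5 pages of 20.
all_matches = [f"doc-{i}" for i in range(250)]
visible, hidden = capped_search(all_matches, page_size=20, max_pages=5)
recall_ceiling = len(visible) / len(all_matches)
```

Under these assumptions the caller sees 100 of 250 matching documents, so recall cannot exceed 0.4 no matter how well the model reasons over what it receives. In this simulation `hidden` is known only because the full match list is available; a real connector typically has no equivalent signal.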

Whole-document context loading. When the model decides which documents to inspect, the connector returns full document contents into the context window. If the answer lives in a single paragraph of a 30-page document, the model gets the 30 pages and has to find the paragraph. As more documents arrive in context, relevance density drops, and the model has more room to fall back on its priors rather than the retrieved facts during synthesis.
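A back-of-the-envelope calculation shows how quickly relevance density falls. Every figure here is an illustrative assumption (500 tokens per page, a 150-token answer paragraph), not a measurement.

```python
def relevance_density(relevant_tokens: int, context_tokens: int) -> float:
    """Fraction of the loaded context that actually bears on the answer."""
    return relevant_tokens / context_tokens


# All figures below are illustrative assumptions.
TOKENS_PER_PAGE = 500
PAGES_PER_DOC = 30
ANSWER_TOKENS = 150  # one relevant paragraph

doc_tokens = TOKENS_PER_PAGE * PAGES_PER_DOC  # 15,000 tokens per document

one_doc = relevance_density(ANSWER_TOKENS, doc_tokens)
five_docs = relevance_density(ANSWER_TOKENS, 5 * doc_tokens)
passage_only = relevance_density(ANSWER_TOKENS, ANSWER_TOKENS)
```

With these numbers, one whole document puts the answer at 1% of the context, five whole documents at 0.2%, and a passage-level fetch at 100%.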

Large, unstructured corpora. Thin implementations hit severe limits when scaled to sprawling enterprise repositories that demand semantic depth. Across tens of thousands of documents and beyond, finding the relevant answer often depends on signals that never appear in the user's prompt, such as document type, the entities involved, or synonymous terminology. Basic keyword search cannot bridge this semantic gap, and in agentic systems the missed context compounds because each later step builds on an incomplete retrieval.

Per-user authentication friction. When the connector operates as the authenticated user, every user has to walk through OAuth for every connector, and again at every scope change. That is correct for permissions, but it scales poorly to large user populations and many connectors. This limit isn't specific to thin implementations: it's a consequence of how the pattern handles identity, and a more sophisticated server doesn't change it.
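The scaling problem is multiplicative, which a short calculation makes visible. The user and connector counts are assumed figures, and `oauth_flows` is a hypothetical helper that counts consent flows under the simplifying assumption that every user authorizes every connector.

```python
def oauth_flows(users: int, connectors: int, scope_changes_per_connector: int = 0) -> int:
    """Count consent flows when every user authorizes every connector.

    Each scope change on a connector sends every user through that
    connector's flow one more time.
    """
    return users * connectors * (1 + scope_changes_per_connector)


# Assumed deployment: 2,000 users and 12 connectors.
initial = oauth_flows(users=2000, connectors=12)
with_changes = oauth_flows(users=2000, connectors=12, scope_changes_per_connector=2)
```

Under these assumptions the initial rollout requires 24,000 individual consent flows, and two scope changes per connector bring the total to 72,000.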

None of these are claims about MCP the protocol. They're claims about the consequence of putting retrieval intelligence at query time, against a search layer the application doesn't control.

The Architectural Alternative

The alternative isn't "don't use MCP." It's to do the retrieval intelligence work upstream of the query.

Documents go through processing at ingestion: classification against a domain-aware taxonomy, structure-aware chunking that respects the conventions of the document type (footnotes in filings preserved as units, earnings calls chunked by speaker turn and Q&A pair, contracts respecting clause boundaries), metadata extraction and inference, entity resolution against a canonical graph, sentiment scoring per statement, boilerplate identification, redundancy clustering. The enriched corpus is indexed in a system that supports hybrid lexical and dense retrieval with metadata-driven filtering. By the time a user query arrives, the heavy interpretive work is done.
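One of the ingestion steps above, chunking earnings calls by speaker turn, can be sketched as follows. This assumes a simple transcript convention in which each turn starts a line with "Name:"; real transcripts vary, and pairing questions with answers would need additional logic. The `Chunk` type and `chunk_by_speaker` function are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    speaker: str
    text: str
    doc_type: str


# Assumed convention: a turn begins at the start of a line with "Name:".
TURN = re.compile(r"^([A-Z][\w .'-]*):\s*", re.MULTILINE)


def chunk_by_speaker(transcript: str, doc_type: str = "earnings_call") -> list[Chunk]:
    """Split a transcript into one chunk per speaker turn."""
    # With one capture group, re.split yields
    # [preamble, speaker1, text1, speaker2, text2, ...].
    parts = TURN.split(transcript)
    chunks = []
    for i in range(1, len(parts) - 1, 2):
        text = " ".join(parts[i + 1].split())  # collapse line wraps
        if text:
            chunks.append(Chunk(parts[i].strip(), text, doc_type))
    return chunks


transcript = """Operator: We will now begin the Q&A session.
Analyst: How do you see pricing pressure
evolving next quarter?
CFO: We expect margins to hold steady.
"""
chunks = chunk_by_speaker(transcript)
```

Each chunk carries its speaker and document type as metadata, so a later query can filter on them instead of asking the model to infer them from raw text.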

This architecture can be exposed through MCP. It can be exposed through other interfaces. The protocol is orthogonal to where the intelligence lives; where the intelligence lives is not orthogonal to workload fit. For high-stakes research, where finding the right document and grounding answers in specific spans matters more than freshness, the upstream approach produces materially better recall, precision, and answer fidelity.
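To illustrate how a search tool backed by a pre-indexed corpus differs from a pass-through, here is a sketch that applies a metadata filter and then fuses a lexical ranking with a dense ranking. Reciprocal rank fusion is used as one common way to combine rankings; the source does not specify a fusion method, so treat that choice, the constant `k=60`, and all names and data here as assumptions.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine ranked lists with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; doc_id breaks ties deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


def search_tool(
    lexical: list[str],
    dense: list[str],
    metadata: dict[str, dict[str, str]],
    doc_type: str,
) -> list[str]:
    """Filter both rankings by ingestion-time metadata, then fuse them."""
    allowed = {d for d, m in metadata.items() if m["doc_type"] == doc_type}
    filtered = [[d for d in ranking if d in allowed] for ranking in (lexical, dense)]
    return rrf_fuse(filtered)


# Hypothetical index state produced at ingestion.
metadata = {
    "a": {"doc_type": "broker_note"},
    "b": {"doc_type": "broker_note"},
    "c": {"doc_type": "marketing"},
    "d": {"doc_type": "broker_note"},
}
lexical = ["c", "a", "b"]  # keyword ranking puts the brochure first
dense = ["a", "d", "b"]    # embedding ranking

results = search_tool(lexical, dense, metadata, doc_type="broker_note")
```

The marketing brochure ranks first lexically but never reaches the caller, because the document-type label assigned at ingestion filters it out before fusion.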

Choosing Per Workload, Not Per Ideology

The architectural debate isn't connector-pattern versus pre-indexed in the abstract. It's which pattern fits which workload, in a system that probably needs both. Live structured data favors the connector pattern. Document corpora where depth matters favor pre-indexed retrieval. Most production systems serving real users will need to route between the two depending on what's being asked.
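A routing layer of the kind described above could be as simple as the following sketch. The two workload flags and the fallback rule are assumptions made for illustration; a production router would more likely classify the query itself.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CONNECTOR = "connector"
    PRE_INDEXED = "pre_indexed"


@dataclass
class Workload:
    needs_live_state: bool  # the answer is what the source shows right now
    needs_depth: bool       # the answer requires synthesis across documents


def choose_routes(workload: Workload) -> list[Route]:
    """Pick one or both retrieval paths for a request."""
    routes: list[Route] = []
    if workload.needs_depth:
        routes.append(Route.PRE_INDEXED)
    # Assumption: fall back to the connector when neither flag is set.
    if workload.needs_live_state or not routes:
        routes.append(Route.CONNECTOR)
    return routes


arr_question = choose_routes(Workload(needs_live_state=True, needs_depth=False))
research_question = choose_routes(Workload(needs_live_state=False, needs_depth=True))
mixed_question = choose_routes(Workload(needs_live_state=True, needs_depth=True))
```

A question that needs both depth and a fresh signal gets both routes, which matches the case where a research answer is enriched with live data at generation time.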

Ingest-time intelligence isn't free either. Ingestion pipelines have to be maintained as sources change shape, taxonomies drift and need periodic recalibration, permissions have to be replicated and kept in sync with the source of truth, and there's always some lag between what the source system shows right now and what the index reflects. For workloads where freshness is the value, those costs aren't worth paying. For workloads where depth and auditability are the value, they are. The point isn't that one architecture is cheaper. The costs land in different places, and the workload should determine where you're willing to pay them.

The mistake worth avoiding is conflating the protocol with the architecture. MCP is a perfectly good way to expose either pattern. The question that matters is where the retrieval intelligence sits, and whether that placement matches the depth the workload demands. That's the lens worth bringing to vendor evaluation, not "do they use MCP." A platform's retrieval depth is determined by what it does with documents at ingestion, what it does with metadata at query time, and how it handles the workloads where the connector pattern's structural limits show up most.