STDIO transport, structured error guidance, in-memory AI cache, and parallel schema loading — four concrete improvements that separate a demo from a system you can trust in production.
MCP (Model Context Protocol) servers are deceptively simple to start but surprisingly tricky to harden. After shipping the initial version of the PeopleworksGPT MCP Server we ran a focused sprint to address four distinct pain points that every serious deployment eventually hits: local transport friction, opaque error messages, redundant AI API calls, and sequential I/O bottlenecks. This article walks through each fix — the problem, the solution, and the actual code.
Option D · Transport Layer · Zero Config
The PeopleworksGPT MCP Server was born as an HTTP server — designed to be deployed on IIS or a cloud VM so ChatGPT, Microsoft Copilot Studio, and Google Gemini can reach it over HTTPS. That's the right architecture for multi-tenant deployments. But developers who want to use Claude Desktop locally don't want a running web server, a configured port, or a JWT flow. They want to double-click the app and start chatting.
MCP's STDIO transport solves exactly this. Claude Desktop spawns the server process directly, communicates via stdin/stdout using JSON-RPC, and everything stays on the local machine. No network, no authentication ceremony, no firewall rules.
Here's the problem most tutorials skip: if your server uses Serilog with a Console sink, those log lines are written to stdout — the same channel the MCP SDK uses for its JSON-RPC messages. The result is a corrupted transport stream and a frustrated Claude Desktop showing "server disconnected."
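To make the failure concrete, here is an illustrative sketch of the stdout stream when a Serilog console line lands between two JSON-RPC frames (frame contents are invented for the example, not captured from a real session):

```text
{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{}}}
[12:00:01 INF] Application started            ← Serilog console line, not JSON-RPC
{"jsonrpc":"2.0","id":2,"result":{"tools":[]}}
```

Claude Desktop reads the second line, fails to parse it as a JSON-RPC message, and tears down the session.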
The fix requires detecting STDIO mode before Serilog is configured — which means before WebApplication.CreateBuilder(args), since appsettings.json isn't loaded yet. Only args and environment variables are available at that point.
// ── Step 1: Detect BEFORE Serilog — never let log lines reach stdout in STDIO mode ──
var isStdio = args.Contains("--stdio", StringComparer.OrdinalIgnoreCase) ||
    (Environment.GetEnvironmentVariable("MCP_TRANSPORT") ?? string.Empty)
        .Equals("stdio", StringComparison.OrdinalIgnoreCase);

// ── Step 2: Conditional Console sink ──
var logConfig = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .Enrich.FromLogContext();

if (!isStdio)
{
    // HTTP mode: logs to console AND file
    logConfig = logConfig.WriteTo.Console(
        outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj}{NewLine}{Exception}");
}

Log.Logger = logConfig
    .WriteTo.File("logs/mcp-server-.log", rollingInterval: RollingInterval.Day)
    .CreateLogger();

// ── Step 3: Suppress Kestrel in STDIO mode (no TCP port needed) ──
var builder = WebApplication.CreateBuilder(args);
builder.Host.UseSerilog();

if (isStdio)
{
    // UseSetting instead of UseUrls — the correct API in .NET 9 ConfigureWebHostBuilder
    builder.WebHost.UseSetting("urls", string.Empty);
}

// ── Step 4: Choose the right MCP transport ──
var mcpBuilder = builder.Services.AddMcpServer(options =>
{
    options.ServerInfo = new Implementation { Name = "PeopleWorksGPT", Version = "1.0.0" };
});

if (isStdio)
    mcpBuilder.WithStdioServerTransport();                 // Claude Desktop / local AI tools
else
    mcpBuilder.WithHttpTransport(o => o.Stateless = true); // ChatGPT / Copilot / Gemini

mcpBuilder.WithToolsFromAssembly()
    .WithPromptsFromAssembly()
    .WithResourcesFromAssembly();
After building in Release mode, register the server in %APPDATA%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "peopleworks-gpt": {
      "command": "C:\\publish\\PeopleworksGPT.MCP.Server.exe",
      "args": ["--stdio"],
      "env": {}
    }
  }
}
Alternatively, set the environment variable MCP_TRANSPORT=stdio instead of passing the --stdio flag. Both are checked at startup, and either one enables STDIO mode.
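The environment-variable route looks like this in the same Claude Desktop config file (executable path reused from the example above):

```json
{
  "mcpServers": {
    "peopleworks-gpt": {
      "command": "C:\\publish\\PeopleworksGPT.MCP.Server.exe",
      "args": [],
      "env": { "MCP_TRANSPORT": "stdio" }
    }
  }
}
```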
| Feature | HTTP Mode | STDIO Mode |
|---|---|---|
| Claude Desktop | ✗ Not supported | ✔ Direct process spawn |
| ChatGPT / Copilot / Gemini | ✔ HTTPS required | ✗ Local only |
| Kestrel TCP listener | ✔ Active | ✗ Suppressed |
| Console logging | ✔ Enabled | ✗ File-only |
| JWT ceremony | ✔ Required | ✗ Skipped |
| Multi-client | ✔ Many at once | ✗ One process |
| Rate limiting / CORS | ✔ Active | ✗ N/A |
Option A · Agent UX · 20+ tool methods enriched
When an AI agent calls an MCP tool and receives an error, it has two options: report the failure to the user, or try to fix it and retry. The difference between those two outcomes often comes down to how much information the error message contains.
A bare "Invalid session token" string leaves the agent with no guidance. But if the same error carries error_type: "authentication_failed" and next_steps: ["Call authenticate() with your username and API key"], the agent can immediately call authenticate() and retry — all without surfacing the failure to the user.
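As a concrete sketch (field values are illustrative, not the server's exact payload), the difference on the wire looks like this:

```json
{
  "success": false,
  "error": "Invalid session token",
  "error_type": "authentication_failed",
  "next_steps": [
    "Call authenticate() with your username and API key",
    "Then retry the original tool call with the new session token"
  ]
}
```

The agent can match on error_type programmatically and treat next_steps as a ready-made recovery plan.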
We standardized every tool response around a consistent vocabulary of seven error types (authentication_failed among them), covering the full surface area of failure modes in a real MCP server.
The real value is in next_steps — a JSON array of actionable strings that vary based on context. For execute_query, the guidance changes depending on the result:
// Context-aware next_steps — the agent knows exactly what to do next
NextSteps = rowsReturned == 0
    ? new[]
    {
        "No rows returned — try rephrasing the question with different filters",
        "Call get_schema() to verify table and column names",
        "Call explain_query() to understand why the query returned nothing"
    }
    : totalCount > offset + maxRows
    ? new[]
    {
        $"More rows available — call execute_query() with page={page + 1} to continue",
        "Call analyze_query_results() for an AI summary of these rows",
        "Call execute_query_with_export() to download all pages as CSV/Excel"
    }
    : new[]
    {
        "Call analyze_query_results() to get AI insights on this data",
        "Ask a follow-up question to drill deeper into a specific row or pattern",
        "Call execute_query_with_export() to export the results"
    };
Every typed response class gets two new optional properties — nullable so existing success responses aren't required to populate them:
public sealed class QueryExecutionResult
{
    [JsonPropertyName("success")] public bool Success { get; set; }
    [JsonPropertyName("rows")] public List<Dictionary<string, object?>> Rows { get; set; } = new();
    [JsonPropertyName("total_count")] public int TotalCount { get; set; }
    // ... other fields ...

    // ── New in this sprint ──
    [JsonPropertyName("error_type")] public string? ErrorType { get; set; }
    [JsonPropertyName("next_steps")] public string[]? NextSteps { get; set; }
}
Option B · Performance · Cost Reduction
Two of the most AI-heavy MCP tools in PeopleworksGPT are get_suggested_questions and explain_query. Both call an LLM synchronously as part of the tool response — which means every call costs tokens, adds 1-3 seconds of latency, and competes for API rate limits.
The usage pattern reveals an obvious optimization: both tools produce deterministic output for the same inputs. If the schema hasn't changed, the suggested questions for connection #5 in Spanish will be the same this call as they were five minutes ago. Caching is free money.
The cache key encodes connection, language, and count. Notice the placement: the cache check happens after connection validation (we still need to verify the user has access) but before the expensive schema loading + AI call sequence:
// Inject IMemoryCache — already registered via AddMemoryCache() in Program.cs
public SuggestedQuestionsTool(
    ApplicationDbContext context,
    // ... other dependencies ...
    IMemoryCache cache)
{
    _cache = cache;
}

// Inside GetSuggestedQuestionsAsync():
// Connection validated ↑ — now check cache before loading schema
var cacheKey = $"pwgpt:suggest:{connectionId}:{language}:{count}";

if (_cache.TryGetValue(cacheKey, out List<string>? cached) && cached != null)
{
    return new SuggestedQuestionsResult
    {
        Success = true,
        ConnectionId = connectionId,
        ConnectionName = connection.DbName,
        Suggestions = cached,
        Count = cached.Count,
        NextSteps = new[] { "Pick one suggestion and call execute_query() with it" }
    };
}

// Cache miss — load schema + call AI
var suggestions = await CallAiForSuggestionsAsync(prompt, count);
_cache.Set(cacheKey, suggestions, TimeSpan.FromMinutes(30));
For explain_query, the inputs include the original question, generated SQL, and explanation type — a wider input space. We hash the raw key string rather than embedding it directly into the cache key to keep key lengths predictable:
// Hash the inputs to form a stable, compact cache key
var rawKey = $"{connectionId}:{explanationType}:{language}:{originalQuestion}:{generatedSql}";
// Cast to uint rather than Math.Abs — Math.Abs(int.MinValue) throws OverflowException
var cacheKey = $"pwgpt:explain:{(uint)rawKey.GetHashCode()}";

if (_cache.TryGetValue(cacheKey, out string? cachedExplanation) && cachedExplanation != null)
{
    return JsonSerializer.Serialize(new
    {
        success = true,
        explanation = cachedExplanation,
        next_steps = explanationType == "error"
            ? new[] { "Rephrase the question based on the explanation and retry execute_query()" }
            : new[] { "Continue exploring the data with follow-up questions" }
    });
}

var explanation = await CallAiForExplanationAsync(systemPrompt, userPrompt);
_cache.Set(cacheKey, explanation, TimeSpan.FromHours(1));
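One caveat worth knowing: string.GetHashCode() in modern .NET is randomized per process. That is harmless for IMemoryCache, whose keys only live inside one process, but it would silently fragment a shared cache. If this key scheme ever moves to Redis, a stable hash is a small change; a sketch (StableCacheKey is a hypothetical helper, not part of the server today):

```csharp
using System.Security.Cryptography;
using System.Text;

static string StableCacheKey(string rawKey)
{
    // SHA-256 is stable across processes and machines, unlike GetHashCode()
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(rawKey));

    // 8 bytes (16 hex chars) keeps keys compact while collisions stay negligible
    return $"pwgpt:explain:{Convert.ToHexString(bytes, 0, 8)}";
}
```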
For suggestions and explanations, the data is user-scoped (connection access is validated per request) and the loss of cache on process restart is acceptable — the AI simply regenerates. IMemoryCache has zero infrastructure dependencies and is already registered in the DI container via builder.Services.AddMemoryCache(). Redis makes sense when you need cross-instance sharing or persistence; here, single-process simplicity wins.
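As a design note, the manual TryGetValue / Set pair can also be collapsed into IMemoryCache's GetOrCreateAsync extension, which keeps the lookup, the miss path, and the expiration policy in one place. A sketch reusing the same helper names as the snippets earlier in this section:

```csharp
var suggestions = await _cache.GetOrCreateAsync(cacheKey, async entry =>
{
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(30);

    // Cache miss: load schema and call the AI exactly as before
    return await CallAiForSuggestionsAsync(prompt, count);
});
```

Be aware that GetOrCreateAsync does not deduplicate concurrent misses, so two simultaneous cold calls can both invoke the AI; for this workload that is an acceptable trade-off.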
Option C · Async Optimization · Internal Performance
When a user asks a natural language question, the MCP server needs to gather several pieces of context before it can call the AI: the database schema, any MCP hints (table descriptions, business rules), and the AI configuration settings. The naïve implementation does this sequentially — three round trips to the database before the AI call even starts.
The schema and hints fetches are independent of each other — they can run concurrently. We start the hints Task immediately after starting the schema load, then await both before proceeding. Because both call IDbConnection (stateless reads), there's no EF Core DbContext concurrency concern:
// Before — sequential:
var schema = await GetSchemaAsync(connection);
var hints = await GetHintsAsync(connectionId);
var result = await CallAiAsync(schema, hints, question);
// After — parallel:
var schemaTask = GetSchemaAsync(connection); // ← starts immediately
var hintsTask = GetHintsAsync(connectionId); // ← starts immediately, no await yet
await Task.WhenAll(schemaTask, hintsTask); // ← wait for both
var schema = await schemaTask;
var hints = await hintsTask;
var result = await CallAiAsync(schema, hints, question);
The latency savings are modest here (schema was already cached in many cases), but the pattern becomes more impactful when schema cache is cold or when additional parallel fetches are added — user rules, conversation history, etc.
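Extending the pattern is mechanical: start every independent fetch before the first await. A sketch with a third concurrent fetch, where GetUserRulesAsync is a hypothetical name for the user-rules lookup mentioned above:

```csharp
// Start all three independent fetches before the first await
var schemaTask = GetSchemaAsync(connection);
var hintsTask  = GetHintsAsync(connectionId);
var rulesTask  = GetUserRulesAsync(userId);   // hypothetical additional fetch

await Task.WhenAll(schemaTask, hintsTask, rulesTask);

// Total wall time ≈ the slowest fetch, not the sum of all three
var schema = await schemaTask;
var hints  = await hintsTask;
var rules  = await rulesTask;
```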
Four improvements, one focused sprint — the server that comes out the other side handles more deployment scenarios, guides agents through failures automatically, spends less money on AI APIs, and responds faster under load.
Build result after all four changes: 0 errors · 0 warnings — the ModelContextProtocol 0.9.0-preview.1 NuGet package cleanly supports both WithHttpTransport() and WithStdioServerTransport() as separate chain methods on IMcpServerBuilder.
These four improvements address the most immediate production gaps. On the roadmap: distributed caching (Redis for multi-instance deployments), streaming responses for large query results, and a retry_after signal in rate-limited responses so agents can back off gracefully.
If you're building your own MCP server on .NET, the four patterns here — transport detection before Serilog, structured error taxonomy, IMemoryCache for AI calls, and parallel I/O — are worth lifting directly into any production codebase. The problems they solve are universal.