MCP Server Production Readiness

Validation checklist for Recall's MCP server — what to run before shipping changes that touch the LLM-facing surface.

Recall exposes its own MCP server at /api/mcp/copilot (consumed by Claude Desktop, Cursor, and any external MCP client). This page is the checklist for verifying that surface is production-ready after changes.

This is the server-side readiness check. For configuring MCP clients (external tools you connect to Recall), see Using MCP Tools.

What the MCP server exposes

| Tool | Resource | Operations |
|---|---|---|
| recall_workspaces | Workspaces | list, get, create, rename, update |
| recall_workflows | Workflows + workflow folders | 25 ops including run, deploy, version, folder CRUD |
| recall_docs | Workspace docs + doc folders | 17 ops including read, write, patch, glob, search, folder CRUD |
| recall_tables | Workspace tables | 22 ops including row CRUD, schema, column management, import |
| recall_knowledge | Knowledge bases | 18 ops including KB CRUD, documents, tags, connectors |
| recall_jobs | Scheduled jobs | 9 ops including create, pause, resume, logs |
| recall_env | Environment variables | list, set, delete |
| recall_credentials | OAuth credentials + API keys | list, auth_link, rename, delete, generate_api_key |
| recall_mcp_servers | Workflow + external MCP servers | 8 ops |
| recall_skills | Workspace skills | list, add, edit, delete |
| recall_memory | Workspace memory | list, search, add, correct, delete |
| recall_logs | Execution / workflow / job logs | execution_summary, workflow_logs, job_logs |
| recall_platform | Platform discovery (blocks, triggers, docs search, VFS) | 8 read-only ops |

Plus a subagent layer (recall_auth, recall_workflow, recall_docs_agent, etc.) for natural-language tasks.
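Every domain tool is invoked through the standard MCP tools/call request, with the operation field selecting the branch of the tool's per-op schema. A minimal sketch of building that JSON-RPC message; the helper name toolsCall is hypothetical, and recall_workspaces.list is used only because it appears in the table above. Other operations may require extra arguments defined in their per-op schemas.

```typescript
// JSON-RPC 2.0 message shape that MCP uses for tool invocation.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string; // e.g. "recall_workspaces"
    arguments: { operation: string; [key: string]: unknown };
  };
}

// Hypothetical helper: `operation` is spread last so it always wins over args.
function toolsCall(
  id: number,
  name: string,
  operation: string,
  args: Record<string, unknown> = {},
): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: { ...args, operation } },
  };
}

const listWorkspaces = toolsCall(1, "recall_workspaces", "list");
console.log(JSON.stringify(listWorkspaces));
```

The same message works over any MCP transport; how the API key travels with it (header or bearer token) is a transport concern not shown here.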

Pre-ship checklist

1. Type check passes

# Run from the repo root; subshells keep each cd from leaking into the next line
(cd apps/web && bun run type-check)
(cd apps/copilot && bun run type-check)

Both must exit cleanly. TypeScript catches the most common breakage class (renamed types, missing imports, schema/handler drift).

2. Unit + integration tests pass

# bun:test files (MCP definitions, auth, tool-id-aliases)
(cd apps/web && bun test ./lib/copilot/tools/mcp/definitions.test.ts \
                         ./lib/copilot/tools/tool-id-aliases.test.ts \
                         ./lib/copilot/auth/extract-oauth-urls.test.ts)

# vitest files (routes, hooks, services)
(cd apps/web && bunx vitest run \
  lib/copilot/ \
  lib/folders/ \
  app/api/mcp/ \
  app/api/v1/ \
  app/api/workspaces/ \
  app/api/files/ \
  app/api/chats/ \
  hooks/queries/)

Expected: ~900 tests pass. Two dashboard test failures (message-content.component.test.tsx, agent-group.test.tsx) are pre-existing on main (QueryClient setup issue) and are tracked separately.

3. End-to-end certification harness against a live deployment

mcp-certify.ts is the production gate. It calls every operation of every tool against a real workspace and validates response envelopes, error codes, confirmation gates, and structured output contracts.

cd apps/web

# Required environment:
export RECALL_BASE_URL="https://www.tryrecall.com"   # or staging
export RECALL_COPILOT_API_KEY="rk_..."                # workspace API key
# Optional:
export RECALL_WORKSPACE_ID="wsp_..."                  # reuse existing workspace
export RECALL_OAUTH_BEARER_TOKEN="..."                # if testing bearer auth path

bun run mcp:certify
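The environment contract above can be read as a small config loader that fails fast before any network call. A sketch under that reading; loadCertConfig and the CertConfig shape are hypothetical names, not the actual mcp-certify.ts code. The RECALL_REQUIRE_STRUCTURED_AUTH default (strict unless set to 0) follows the note after the harness list.

```typescript
interface CertConfig {
  baseUrl: string;
  apiKey: string;
  workspaceId?: string; // reuse an existing workspace when set
  oauthBearerToken?: string; // exercise the bearer-token auth path when set
  requireStructuredAuth: boolean; // false only when RECALL_REQUIRE_STRUCTURED_AUTH=0
}

type Env = Record<string, string | undefined>;

// Hypothetical loader: throws if a required variable is missing or empty.
function loadCertConfig(env: Env): CertConfig {
  const required = ["RECALL_BASE_URL", "RECALL_COPILOT_API_KEY"] as const;
  const missing = required.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required env: ${missing.join(", ")}`);
  }
  return {
    baseUrl: env.RECALL_BASE_URL!.replace(/\/+$/, ""), // drop trailing slashes
    apiKey: env.RECALL_COPILOT_API_KEY!,
    workspaceId: env.RECALL_WORKSPACE_ID || undefined,
    oauthBearerToken: env.RECALL_OAUTH_BEARER_TOKEN || undefined,
    requireStructuredAuth: env.RECALL_REQUIRE_STRUCTURED_AUTH !== "0",
  };
}
```

In a Bun or Node script this would be called as loadCertConfig(process.env).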

The harness exercises (in order):

  • Auth: unauthenticated requests rejected; invalid API key rejected; optional bearer-token auth
  • tools/list public contract
  • Workspace CRUD via recall_workspaces
  • All domain operation envelopes
  • Workflows + workflow folders
  • Docs + doc folders (including create_folder, rename_folder, move_folder, delete_folder cascade)
  • Tables
  • Knowledge bases
  • Jobs
  • Environment + credentials
  • MCP servers (workflow + external)
  • Skills + memory
  • Logs + platform discovery
  • recall_auth subagent (verifies structured oauthUrls field — strict-by-default since PR #80)
  • Response envelope + redaction invariants

Set RECALL_REQUIRE_STRUCTURED_AUTH=0 only when running against a pre-C4 deploy. Otherwise the harness requires the structured data.oauthUrls field.
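The structured-auth check boils down to validating data.oauthUrls as Array<{provider, url}>. A sketch of that validation with a strict/lenient switch; checkAuthResponse is a hypothetical name, and tolerating a missing field (but not a malformed one) in lenient mode is an assumption about how the pre-C4 escape hatch behaves.

```typescript
interface OAuthUrl {
  provider: string;
  url: string;
}

function isOAuthUrl(x: unknown): x is OAuthUrl {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  if (typeof o.provider !== "string" || typeof o.url !== "string") return false;
  try {
    new URL(o.url); // must parse as an absolute URL
    return true;
  } catch {
    return false;
  }
}

// Returns null on pass, or a failure reason. With strict=false a missing
// oauthUrls field is tolerated; a present-but-malformed one still fails.
function checkAuthResponse(data: unknown, strict: boolean): string | null {
  const urls = (data as { oauthUrls?: unknown } | null)?.oauthUrls;
  if (urls === undefined) return strict ? "data.oauthUrls missing" : null;
  if (!Array.isArray(urls)) return "data.oauthUrls is not an array";
  const bad = urls.findIndex((u) => !isOAuthUrl(u));
  return bad === -1 ? null : `data.oauthUrls[${bad}] is not {provider, url}`;
}
```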

4. Smoke test (lighter weight)

cd apps/web && bun run mcp:smoke

The smoke harness runs a single happy-path call against each tool. Useful for fast verification after small changes; not a replacement for the cert harness.

Production-readiness criteria

The MCP server is production-ready when:

  • Every domain tool has a rich description (when-to-use, when-not, ops, errors, examples, side effects); see apps/web/lib/copilot/tools/mcp/domain-tools.ts
  • Every domain tool has per-op discriminated JSON Schema (oneOf branches keyed on operation); see apps/web/lib/copilot/tools/mcp/per-op-schemas.ts
  • Every destructive operation requires confirm: true and returns confirmation_required when missing
  • Every response is a structured envelope: {success, message?, data?, error?}
  • recall_auth returns a structured data.oauthUrls: Array<{provider, url}> field
  • No legacy recall_files references in tool descriptions or prompts
  • No legacy 'workspace_file' etc. canonical tool IDs (alias map handles historical chat replay)
  • Folder ops live on the parent resource tool as *_folder operations (e.g. recall_docs.create_folder, recall_workflows.create_folder)
  • Public REST /api/v1/docs and /api/v1/docs/folders match MCP semantics
  • OpenAPI spec documents folderId on all relevant ops + DocMetadata.folderId
  • Audit log writes on every folder mutation (create, update, delete) regardless of surface

All of these are enforced by the unit tests, the cert harness, or both.
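Two of these invariants lend themselves to a compact sketch: the response envelope and the confirm gate. The Envelope type mirrors {success, message?, data?, error?}; the inner error shape ({code, message?, details?.output}) is inferred from the error code reference on this page. envelopeProblem, requireConfirm, and the DESTRUCTIVE set (built only from op names mentioned on this page) are hypothetical, not the server's actual code.

```typescript
type Envelope<T = unknown> = {
  success: boolean;
  message?: string;
  data?: T;
  error?: { code: string; message?: string; details?: { output?: unknown } };
};

// success=true must carry message or data; success=false must carry error.code.
function envelopeProblem(e: Envelope): string | null {
  if (e.success) {
    return e.message !== undefined || e.data !== undefined
      ? null
      : "success=true without message or data";
  }
  return e.error?.code ? null : "success=false without error.code";
}

// Illustrative subset of destructive ops named on this page.
const DESTRUCTIVE = new Set(["delete", "deploy_api", "deploy_chat", "deploy_mcp", "revert_to_version"]);

// Returns a confirmation_required envelope when a destructive op lacks
// confirm: true (the literal boolean), or null to let the handler proceed.
function requireConfirm(operation: string, args: { confirm?: unknown }): Envelope | null {
  if (!DESTRUCTIVE.has(operation) || args.confirm === true) return null;
  return {
    success: false,
    message: `${operation} is destructive; re-send with confirm: true to proceed`,
    error: { code: "confirmation_required" },
  };
}
```

Checking confirm === true rather than truthiness means a string "true" from a sloppy client still hits the gate.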

When something fails

  • Type check fails → fix the type error. Don't ship.
  • Unit test fails (new failure) → fix. Pre-existing failures on main are tracked but don't block.
  • Cert harness fails on a domain tool → that tool has a regression. The harness output tells you which operation. Check the dispatch in apps/web/app/api/mcp/copilot/route.ts for that tool's case.
  • Cert harness fails on response envelope (returned success=true without message or data) → the response builder for that operation forgot to set message or data. Common when extending a tool.
  • Cert harness fails on recall_auth structured shape → the auth subagent's <credential> tag parsing broke. Check apps/web/lib/copilot/auth/extract-oauth-urls.ts and its 13 unit tests.

Building production-ready workflows + agents via MCP

The tool surface is intentionally designed so an LLM client can construct a full Recall application without falling through to natural-language subagents:

  1. Discover the workspace: recall_platform.blocks to learn block types, recall_platform.triggers for trigger types, recall_platform.tool_search to discover integration tools by regex.
  2. Set up infrastructure: recall_credentials.auth_link to OAuth-connect external services; recall_env.set for API keys; recall_knowledge.create for KBs.
  3. Build workflows: recall_workflows.create_folder to organize, recall_workflows.create to make a workflow, then the recall_workflow subagent for the actual canvas edits (it accepts natural language and dispatches to edit_workflow internally).
  4. Iterate: recall_workflows.run for a full run, or run_until_block / run_from_block for partial execution; recall_workflows.get_logs for execution logs.
  5. Deploy: recall_workflows.deploy_api, deploy_chat, or deploy_mcp (all destructive and confirm-gated).
  6. Manage runtime artifacts: recall_docs.{write,read,patch} for prompts/configs/outputs; recall_tables.{insert_row,query_rows} for structured data; recall_jobs.create for scheduled runs.
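The build sequence can be written down as an ordered plan of tool calls, which makes the confirm requirement on deploys easy to check before anything is sent. In this sketch only the tool and operation names come from this page; every other argument (provider, name, workflowId, and the subagent's request field) is a hypothetical placeholder to be replaced with fields from the per-op schemas.

```typescript
interface Step {
  tool: string;
  arguments: { operation?: string; [key: string]: unknown };
}

// Placeholder: in practice this ID comes from the response to the create step.
const WORKFLOW_ID = "<workflowId from create>";

const buildPlan: Step[] = [
  // 1. Discover
  { tool: "recall_platform", arguments: { operation: "blocks" } },
  { tool: "recall_platform", arguments: { operation: "triggers" } },
  // 2. Infrastructure (`provider` is a hypothetical field name)
  { tool: "recall_credentials", arguments: { operation: "auth_link", provider: "google" } },
  // 3. Build (`name` is a hypothetical field name)
  { tool: "recall_workflows", arguments: { operation: "create_folder", name: "Support bots" } },
  { tool: "recall_workflows", arguments: { operation: "create", name: "Ticket triage" } },
  // Canvas edits via the natural-language subagent (`request` is a hypothetical field)
  { tool: "recall_workflow", arguments: { request: "Add an agent block that classifies tickets" } },
  // 4. Iterate
  { tool: "recall_workflows", arguments: { operation: "run", workflowId: WORKFLOW_ID } },
  { tool: "recall_workflows", arguments: { operation: "get_logs", workflowId: WORKFLOW_ID } },
  // 5. Deploy: destructive, so confirm: true is required
  { tool: "recall_workflows", arguments: { operation: "deploy_api", workflowId: WORKFLOW_ID, confirm: true } },
];

// Pre-flight check: every deploy_* step must carry confirm: true.
const unconfirmed = buildPlan.filter(
  (s) => (s.arguments.operation ?? "").startsWith("deploy_") && s.arguments.confirm !== true,
);
console.log(unconfirmed.length === 0 ? "plan ok" : "missing confirm on a deploy step");
```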

The MCP surface is the canonical way to programmatically build on Recall. Every operation here has a matching dashboard control, but the dashboard is for humans; MCP is for agents and external code.

Capabilities the server advertises

The MCP server advertises only the tools primitive in its initialize response:

{ "capabilities": { "tools": {} } }

resources and prompts are not advertised today. This is intentional:

  • Tools are the right primitive for Recall's API surface — every operation is an action the LLM can take.
  • Resources would let MCP clients reference docs/tables as MCP-native URIs. Not currently needed because every doc and table is already addressable via recall_docs and recall_tables tools. Adding resources would create two ways to do the same thing.
  • Prompts would expose canned prompt templates. Not currently needed because the LLM-facing prompt-building happens client-side.

If a future use case requires either, both can be added without breaking existing tools/* callers.
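Because only tools is advertised, a forward-compatible client can branch on the capabilities object from the initialize response instead of assuming resources or prompts exist. A sketch using a hand-written subset of the initialize result (not the MCP SDK's own types); the helper name advertised is hypothetical.

```typescript
// Minimal subset of an MCP initialize result; only capability keys matter here.
interface InitializeResult {
  capabilities: {
    tools?: Record<string, unknown>;
    resources?: Record<string, unknown>;
    prompts?: Record<string, unknown>;
  };
}

type Primitive = "tools" | "resources" | "prompts";

// A primitive counts as advertised if its key is present, even as an empty object.
function advertised(init: InitializeResult, p: Primitive): boolean {
  return init.capabilities[p] !== undefined;
}

// The capabilities object Recall's server returns today.
const recallInit: InitializeResult = { capabilities: { tools: {} } };

for (const p of ["tools", "resources", "prompts"] as const) {
  console.log(`${p}: ${advertised(recallInit, p) ? "advertised" : "not advertised"}`);
}
```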

Idempotency and replay

The MCP server does not implement idempotency keys. Duplicate tools/call requests execute twice. This is by design: every mutating operation is either:

  1. Naturally idempotent (e.g. rename to the same name, set env var to the same value)
  2. Auto-deduplicated by the underlying handler (e.g. create_folder auto-suffixes on name collision)
  3. Confirm-gated and destructive (e.g. delete, deploy_*, revert_to_version) — the LLM must explicitly opt in with confirm: true, making accidental double-execution unlikely
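The auto-suffix behaviour in item 2 can be sketched as a bounded search for a free name, ending in duplicate_name when the cap is exhausted. The suffix format (" (2)", " (3)", ...) and the cap of 10 are assumptions for illustration, not Recall's actual values, and dedupeName is a hypothetical helper.

```typescript
// Hypothetical cap; the real retry limit is not documented here.
const MAX_SUFFIX_ATTEMPTS = 10;

type NameResult = { ok: true; name: string } | { ok: false; code: "duplicate_name" };

// Picks the first free name among: base, "base (2)", "base (3)", ...
// `taken` holds the names of sibling folders under the same parent.
function dedupeName(base: string, taken: ReadonlySet<string>): NameResult {
  if (!taken.has(base)) return { ok: true, name: base };
  for (let n = 2; n <= MAX_SUFFIX_ATTEMPTS; n++) {
    const candidate = `${base} (${n})`;
    if (!taken.has(candidate)) return { ok: true, name: candidate };
  }
  return { ok: false, code: "duplicate_name" };
}
```

Under this sketch a replayed create_folder avoids the name collision by producing a second, suffixed folder rather than returning the first one, which is one reason to check with a list operation before retrying.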

If your client retries on transient failures, prefer exponential backoff over immediate retry, and check the previous call's outcome via the equivalent list/get operation before retrying a destructive write.
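A sketch of that retry policy: exponential backoff with full jitter, plus an optional caller-supplied check (for example a get or list call) that runs before each retry so a destructive write that actually landed is not sent again. callWithRetry, its option names, and the defaults are hypothetical.

```typescript
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // ceiling for the first retry delay, doubled per retry
  maxDelayMs?: number;
  // For destructive writes: resolves true if the failed attempt already took effect.
  alreadyApplied?: () => Promise<boolean>;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function callWithRetry<T>(
  call: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T | "already_applied"> {
  const {
    maxAttempts = 4,
    baseDelayMs = 250,
    maxDelayMs = 8000,
    alreadyApplied,
    sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = opts;
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (attempt > 0) {
      // Full jitter: wait a random time in [0, min(maxDelay, base * 2^(attempt-1))).
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
      // Verify the previous outcome before re-sending.
      if (alreadyApplied && (await alreadyApplied())) return "already_applied";
    }
    try {
      return await call();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

The call callback should throw only for transport-level failures. A structured failure envelope such as invalid_params is not transient and should be returned as-is rather than retried.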

Error code reference

The MCP server emits these structured error codes via error.code on failed envelopes:

| Code | When |
|---|---|
| invalid_params | Required field missing or fails JSON Schema validation |
| permission_denied | Caller lacks the required read/write/admin permission |
| not_found | Targeted resource doesn't exist or is soft-deleted |
| workspace_mismatch | Resource exists but belongs to a different workspace |
| confirmation_required | Destructive op called without confirm: true |
| duplicate_name | Auto-suffix retry cap exhausted on a folder/resource name |
| cycle_detected | Folder move would create a parent-child cycle |
| ambiguous_table_name | tableName resolves to multiple tables in the workspace |
| unsupported_operation | Operation name not recognized for this tool |
| tool_failed | Underlying server tool raised; original error in error.details.output |
| internal_error | Unexpected error; check logs |

Tool descriptions reference these by name in their ERRORS: section. Errors from the underlying tool layer (e.g. embedding failures, OAuth misconfiguration, storage quota) are surfaced as tool_failed with the original message in error.details.output rather than getting their own top-level codes, which keeps the public contract small.
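On the client side, these codes map onto a small set of reactions. A sketch of one such mapping; the action names and the grouping are a suggested client policy, not something the server prescribes.

```typescript
type RecallErrorCode =
  | "invalid_params" | "permission_denied" | "not_found" | "workspace_mismatch"
  | "confirmation_required" | "duplicate_name" | "cycle_detected"
  | "ambiguous_table_name" | "unsupported_operation" | "tool_failed" | "internal_error";

type ClientAction =
  | "fix_arguments" // change the call before sending it again
  | "ask_user_confirm" // re-send with confirm: true only after explicit approval
  | "inspect_output" // read error.details.output for the underlying cause
  | "stop"; // not resolvable by adjusting the call

// Exhaustive switch: adding a code to the union without a case is a type error.
function actionFor(code: RecallErrorCode): ClientAction {
  switch (code) {
    case "confirmation_required":
      return "ask_user_confirm";
    case "invalid_params":
    case "not_found":
    case "workspace_mismatch":
    case "duplicate_name":
    case "cycle_detected":
    case "ambiguous_table_name":
    case "unsupported_operation":
      return "fix_arguments";
    case "tool_failed":
      return "inspect_output";
    case "permission_denied":
    case "internal_error":
      return "stop";
  }
}
```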
