MCP Server Production Readiness
Validation checklist for Recall's MCP server — what to run before shipping changes that touch the LLM-facing surface.
Recall exposes its own MCP server at /api/mcp/copilot (consumed by Claude Desktop, Cursor, and any external MCP client). This page is the checklist for verifying that surface is production-ready after changes.
This is the server-side readiness check. For configuring MCP clients (external tools you connect to Recall), see Using MCP Tools.
What the MCP server exposes
| Tool | Resource | Operations |
|---|---|---|
| recall_workspaces | Workspaces | list, get, create, rename, update |
| recall_workflows | Workflows + workflow folders | 25 ops including run, deploy, version, folder CRUD |
| recall_docs | Workspace docs + doc folders | 17 ops including read, write, patch, glob, search, folder CRUD |
| recall_tables | Workspace tables | 22 ops including row CRUD, schema, column management, import |
| recall_knowledge | Knowledge bases | 18 ops including KB CRUD, documents, tags, connectors |
| recall_jobs | Scheduled jobs | 9 ops including create, pause, resume, logs |
| recall_env | Environment variables | list, set, delete |
| recall_credentials | OAuth credentials + API keys | list, auth_link, rename, delete, generate_api_key |
| recall_mcp_servers | Workflow + external MCP servers | 8 ops |
| recall_skills | Workspace skills | list, add, edit, delete |
| recall_memory | Workspace memory | list, search, add, correct, delete |
| recall_logs | Execution / workflow / job logs | execution_summary, workflow_logs, job_logs |
| recall_platform | Platform discovery (blocks, triggers, docs search, VFS) | 8 read-only ops |
Plus a subagent layer (recall_auth, recall_workflow, recall_docs_agent, etc.) for natural-language tasks.
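As a quick illustration of the table above, a client or test could compare a tools/list result against the thirteen domain tool names. This is a minimal sketch, not the harness's actual check: the `ToolsListResult` shape follows the MCP tools/list result (only `name` is used), and `EXPECTED_DOMAIN_TOOLS`, `missingDomainTools`, and `sample` are hypothetical names introduced here.

```typescript
// Domain tool names copied from the table above.
const EXPECTED_DOMAIN_TOOLS = [
  "recall_workspaces", "recall_workflows", "recall_docs", "recall_tables",
  "recall_knowledge", "recall_jobs", "recall_env", "recall_credentials",
  "recall_mcp_servers", "recall_skills", "recall_memory", "recall_logs",
  "recall_platform",
] as const;

// Minimal shape of an MCP tools/list result; only `name` is needed here.
interface ToolsListResult {
  tools: Array<{ name: string }>;
}

// Returns the expected domain tools that the server did not advertise.
function missingDomainTools(result: ToolsListResult): string[] {
  return EXPECTED_DOMAIN_TOOLS.filter(
    (name) => !result.tools.some((tool) => tool.name === name),
  );
}

// Hypothetical response that omits the last two tools in the table.
const sample: ToolsListResult = {
  tools: EXPECTED_DOMAIN_TOOLS.slice(0, 11).map((name) => ({ name })),
};
console.log(missingDomainTools(sample)); // recall_logs and recall_platform
```

Subagent tools (recall_auth, recall_workflow, and so on) are deliberately left out of the expected list, so extra names in the response do not count as failures.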
Pre-ship checklist
1. Type check passes
cd apps/web && bun run type-check
cd apps/copilot && bun run type-check
Both must exit cleanly. TypeScript catches the most common breakage class (renamed types, missing imports, schema/handler drift).
2. Unit + integration tests pass
# bun:test files (MCP definitions, auth, tool-id-aliases)
cd apps/web && bun test ./lib/copilot/tools/mcp/definitions.test.ts \
./lib/copilot/tools/tool-id-aliases.test.ts \
./lib/copilot/auth/extract-oauth-urls.test.ts
# vitest files (routes, hooks, services)
cd apps/web && bunx vitest run \
lib/copilot/ \
lib/folders/ \
app/api/mcp/ \
app/api/v1/ \
app/api/workspaces/ \
app/api/files/ \
app/api/chats/ \
hooks/queries/
Expected: ~900 tests pass. Two dashboard test failures (message-content.component.test.tsx, agent-group.test.tsx) are pre-existing on main (QueryClient setup issue) and are tracked separately.
3. End-to-end certification harness against a live deployment
mcp-certify.ts is the production gate. It calls every tool, every operation, against a real workspace and validates response envelopes, error codes, confirmation gates, and structured output contracts.
cd apps/web
# Required environment:
export RECALL_BASE_URL="https://www.tryrecall.com" # or staging
export RECALL_COPILOT_API_KEY="rk_..." # workspace API key
# Optional:
export RECALL_WORKSPACE_ID="wsp_..." # reuse existing workspace
export RECALL_OAUTH_BEARER_TOKEN="..." # if testing bearer auth path
bun run mcp:certify
The harness exercises (in order):
- Auth: unauthenticated requests rejected; invalid API key rejected; optional bearer-token auth
- tools/list public contract
- Workspace CRUD via recall_workspaces
- All domain operation envelopes
  - Workflows + workflow folders
  - Docs + doc folders (including create_folder, rename_folder, move_folder, delete_folder cascade)
  - Tables
  - Knowledge bases
  - Jobs
  - Environment + credentials
  - MCP servers (workflow + external)
  - Skills + memory
  - Logs + platform discovery
- recall_auth subagent (verifies structured oauthUrls field; strict-by-default since PR #80)
- Response envelope + redaction invariants
Set RECALL_REQUIRE_STRUCTURED_AUTH=0 only when running against a pre-C4 deploy. Otherwise the harness requires the structured data.oauthUrls field.
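To make the strict-versus-lenient behavior concrete, here is a minimal sketch of a check on the recall_auth payload. It assumes only the documented shape data.oauthUrls: Array<{provider, url}>; the function name `checkOauthUrls` and the `requireStructured` parameter (standing in for RECALL_REQUIRE_STRUCTURED_AUTH) are hypothetical, and the real harness may differ.

```typescript
interface OauthUrl {
  provider: string;
  url: string;
}

// Returns a list of problems; an empty list means the payload passes.
// `requireStructured` stands in for RECALL_REQUIRE_STRUCTURED_AUTH
// (pass false to model the "=0" setting). Hypothetical helper.
function checkOauthUrls(data: unknown, requireStructured: boolean): string[] {
  const urls = (data as { oauthUrls?: unknown } | null)?.oauthUrls;
  if (urls === undefined) {
    // A missing field is only a failure in strict mode.
    return requireStructured ? ["data.oauthUrls is missing"] : [];
  }
  if (!Array.isArray(urls)) return ["data.oauthUrls is not an array"];

  const problems: string[] = [];
  urls.forEach((entry: unknown, index: number) => {
    const candidate = entry as Partial<OauthUrl> | null;
    if (typeof candidate?.provider !== "string" || typeof candidate?.url !== "string") {
      problems.push(`data.oauthUrls[${index}] must have string provider and url`);
    }
  });
  return problems;
}

// Strict mode rejects a payload without the structured field.
console.log(checkOauthUrls({}, true));
// Lenient mode (pre-C4 deploy) accepts the same payload.
console.log(checkOauthUrls({}, false));
```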
4. Smoke test (lighter weight)
cd apps/web && bun run mcp:smoke
The smoke harness runs a single happy-path call against each tool. Useful for fast verification after small changes; not a replacement for the cert harness.
Production-readiness criteria
The MCP server is production-ready when:
- Every domain tool has a rich description (when-to-use, when-not, ops, errors, examples, side effects) - see apps/web/lib/copilot/tools/mcp/domain-tools.ts
- Every domain tool has per-op discriminated JSON Schema (oneOf branches keyed on operation) - see apps/web/lib/copilot/tools/mcp/per-op-schemas.ts
- Every destructive operation requires confirm: true and returns confirmation_required when missing
- Every response is a structured envelope: {success, message?, data?, error?}
- recall_auth returns a structured data.oauthUrls: Array<{provider, url}> field
- No legacy recall_files references in tool descriptions or prompts
- No legacy 'workspace_file' etc. canonical tool IDs (alias map handles historical chat replay)
- Folder ops live on the parent resource tool (recall_docs.{,*}_folder, recall_workflows.{,*}_folder)
- Public REST /api/v1/docs and /api/v1/docs/folders match MCP semantics
- OpenAPI spec documents folderId on all relevant ops + DocMetadata.folderId
- Audit log writes on every folder mutation (create, update, delete) regardless of surface
All of these are enforced by the unit tests, the cert harness, or both.
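Two of the criteria above (the structured envelope and the confirm gate) are easy to express as small predicates. The following is a sketch under stated assumptions, not the cert harness's code: the `Envelope` type mirrors {success, message?, data?, error?}, the rule that a failed envelope carries error.code is inferred from the error code reference, and `checkEnvelope` and `isConfirmationRequired` are hypothetical names.

```typescript
// Mirrors the documented envelope {success, message?, data?, error?}.
// The inner shape of `error` is an assumption based on error.code and
// error.details.output being mentioned in the error code reference.
interface Envelope {
  success: boolean;
  message?: string;
  data?: unknown;
  error?: { code: string; details?: { output?: unknown } };
}

// Returns a list of invariant violations; empty means the envelope passes.
function checkEnvelope(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["envelope is not an object"];
  }
  const envelope = value as Partial<Envelope>;
  const problems: string[] = [];
  if (typeof envelope.success !== "boolean") {
    problems.push("success must be a boolean");
  }
  if (envelope.success === true && envelope.message === undefined && envelope.data === undefined) {
    problems.push("returned success=true without message or data");
  }
  if (envelope.success === false && typeof envelope.error?.code !== "string") {
    problems.push("failed envelope is missing error.code");
  }
  return problems;
}

// True when a destructive op was rejected for lacking confirm: true.
function isConfirmationRequired(value: unknown): boolean {
  const envelope = value as Partial<Envelope> | null;
  return envelope?.success === false && envelope?.error?.code === "confirmation_required";
}

console.log(checkEnvelope({ success: true })); // one violation
console.log(isConfirmationRequired({ success: false, error: { code: "confirmation_required" } })); // true
```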
When something fails
- Type check fails → fix the type error. Don't ship.
- Unit test fails (new failure) → fix. Pre-existing failures on main are tracked but don't block.
- Cert harness fails on a domain tool → that tool has a regression. The harness output tells you which operation. Check the dispatch in apps/web/app/api/mcp/copilot/route.ts for that tool's case.
- Cert harness fails on response envelope (returned success=true without message or data) → the response builder for that operation forgot to set message or data. Common when extending a tool.
- Cert harness fails on recall_auth structured shape → the auth subagent's <credential> tag parsing broke. Check apps/web/lib/copilot/auth/extract-oauth-urls.ts + its 13 unit tests.
Building production-ready workflows + agents via MCP
The tool surface is intentionally designed so an LLM client can construct a full Recall application without falling through to natural-language subagents:
- Discover the workspace → recall_platform.blocks to learn block types, recall_platform.triggers for trigger types, recall_platform.tool_search to discover integration tools by regex.
- Set up infrastructure → recall_credentials.auth_link to OAuth-connect external services; recall_env.set for API keys; recall_knowledge.create for KBs.
- Build workflows → recall_workflows.create_folder to organize, recall_workflows.create to make a workflow, then use the recall_workflow subagent for the actual canvas edits (it accepts natural language and dispatches to edit_workflow internally).
- Iterate → recall_workflows.run / run_until_block / run_from_block for partial execution; recall_workflows.get_logs for execution logs.
- Deploy → recall_workflows.deploy_api, deploy_chat, or deploy_mcp (all destructive + confirm-gated).
- Manage runtime artifacts → recall_docs.{write,read,patch} for prompts/configs/outputs; recall_tables.{insert_row,query_rows} for structured data; recall_jobs.create for scheduled runs.
The MCP surface is the canonical way to programmatically build on Recall. Every operation here has a matching dashboard control, but the dashboard is for humans; MCP is for agents and external code.
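For external code, each step in the sequence above becomes a JSON-RPC tools/call request whose arguments carry the operation name. The sketch below only builds request bodies and sends nothing. The JSON-RPC framing follows the MCP tools/call method, and `operation` and `confirm` come from this page; every other argument name (`key`, `value`, `name`, `workflowId`) is a hypothetical placeholder, so check the per-op schemas for the real fields.

```typescript
// JSON-RPC 2.0 request for the MCP tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Builds one tools/call body; `operation` selects the per-op schema branch.
function toolCall(
  tool: string,
  operation: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: tool, arguments: { ...args, operation } },
  };
}

// Argument names other than `operation` and `confirm` are illustrative only.
const plan: ToolCallRequest[] = [
  toolCall("recall_platform", "blocks"),
  toolCall("recall_env", "set", { key: "EXAMPLE_API_KEY", value: "<secret>" }),
  toolCall("recall_workflows", "create_folder", { name: "Example folder" }),
  toolCall("recall_workflows", "create", { name: "Example workflow" }),
  toolCall("recall_workflows", "deploy_api", { workflowId: "<workflow id>", confirm: true }),
];

console.log(JSON.stringify(plan, null, 2));
```

Only the deploy step sets confirm: true, matching the note that deploy operations are destructive and confirm-gated.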
Capabilities the server advertises
The MCP server advertises only the tools primitive in its initialize response:
{ "capabilities": { "tools": {} } }
resources and prompts are not advertised today. This is intentional:
- Tools are the right primitive for Recall's API surface — every operation is an action the LLM can take.
- Resources would let MCP clients reference docs/tables as MCP-native URIs. Not currently needed because every doc and table is already addressable via recall_docs and recall_tables tools. Adding resources would create two ways to do the same thing.
- Prompts would expose canned prompt templates. Not currently needed because the LLM-facing prompt-building happens client-side.
If a future use case requires either, both can be added without breaking existing tools/* callers.
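A client that wants to guard against this changing can inspect the capabilities object from the initialize result. This is a minimal sketch: the `InitializeResult` type keeps only the `capabilities` field of the MCP initialize result, and `advertisedPrimitives` is a hypothetical helper name.

```typescript
type Primitive = "tools" | "resources" | "prompts";

// Only the part of the MCP initialize result that this check needs.
interface InitializeResult {
  capabilities: Partial<Record<Primitive, object>>;
}

// Lists which of the three primitives the server advertised.
function advertisedPrimitives(result: InitializeResult): Primitive[] {
  const all: Primitive[] = ["tools", "resources", "prompts"];
  return all.filter((primitive) => result.capabilities[primitive] !== undefined);
}

// The capabilities object shown above advertises tools only.
const initResult: InitializeResult = { capabilities: { tools: {} } };
console.log(advertisedPrimitives(initResult));
```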
Idempotency and replay
The MCP server does not implement idempotency keys. Duplicate tools/call requests execute twice. This is by design: every mutating operation is one of the following:
- Naturally idempotent (e.g. rename to the same name, set env var to the same value)
- Auto-deduplicated by the underlying handler (e.g. create_folder auto-suffixes on name collision)
- Confirm-gated and destructive (e.g. delete, deploy_*, revert_to_version); the LLM must explicitly opt in with confirm: true, making accidental double-execution unlikely
If your client retries on transient failures, prefer exponential backoff over immediate retry, and check the previous call's outcome via the equivalent list/get operation before retrying a destructive write.
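The retry advice can be sketched as a small client-side helper: wait with exponential backoff, then check whether the earlier attempt already landed before issuing the destructive write again. Everything here is illustrative. `retryDestructive`, `backoffDelayMs`, and the delay constants are hypothetical, and `alreadyApplied` stands in for whatever list/get call fits the operation.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `write` on failure. Before each retry it waits, then asks
// `alreadyApplied` (a list/get check) whether the earlier call succeeded.
async function retryDestructive<T>(
  write: () => Promise<T>,
  alreadyApplied: () => Promise<boolean>,
  maxAttempts = 4,
  baseMs = 250,
): Promise<T | "already_applied"> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (attempt > 0) {
      await sleep(backoffDelayMs(attempt - 1, baseMs));
      // The previous call may have succeeded server-side despite the error.
      if (await alreadyApplied()) return "already_applied";
    }
    try {
      return await write();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Demo with hypothetical stubs: the first write throws, but the follow-up
// check reports that the write had already landed.
let writeCalls = 0;
retryDestructive(
  async () => {
    writeCalls += 1;
    throw new Error("simulated transient failure");
  },
  async () => true,
  4,
  1,
).then((outcome) => console.log(outcome, writeCalls)); // already_applied 1
```

The write is attempted only once in the demo because the outcome check short-circuits the retry, which is the point of checking before re-sending a destructive call.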
Error code reference
The MCP server emits these structured error codes via error.code on failed envelopes:
| Code | When |
|---|---|
| invalid_params | Required field missing or fails JSON Schema validation |
| permission_denied | Caller lacks the required read/write/admin permission |
| not_found | Targeted resource doesn't exist or is soft-deleted |
| workspace_mismatch | Resource exists but belongs to a different workspace |
| confirmation_required | Destructive op called without confirm: true |
| duplicate_name | Auto-suffix retry cap exhausted on a folder/resource name |
| cycle_detected | Folder move would create a parent-child cycle |
| ambiguous_table_name | tableName resolves to multiple tables in the workspace |
| unsupported_operation | Operation name not recognized for this tool |
| tool_failed | Underlying server tool raised; original error in error.details.output |
| internal_error | Unexpected error; check logs |
Tool descriptions reference these by name in their ERRORS: section. Errors from the underlying tool layer (e.g. embedding failures, OAuth misconfiguration, storage quota) are surfaced as tool_failed with the original message in error.details.output rather than getting their own top-level codes, which keeps the public contract small.
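On the client side, the codes in the table can be modeled as a closed union so that unknown codes are caught explicitly. The sketch below is an assumption-laden illustration: the code list is copied from the table, while `FailedEnvelope`, `isErrorCode`, `describeFailure`, and the returned messages are hypothetical and not part of the server contract.

```typescript
// Error codes copied from the table above.
const ERROR_CODES = [
  "invalid_params", "permission_denied", "not_found", "workspace_mismatch",
  "confirmation_required", "duplicate_name", "cycle_detected",
  "ambiguous_table_name", "unsupported_operation", "tool_failed",
  "internal_error",
] as const;

type ErrorCode = (typeof ERROR_CODES)[number];

// Assumed shape of a failed envelope, based on error.code and
// error.details.output as described on this page.
interface FailedEnvelope {
  success: false;
  error: { code: string; details?: { output?: unknown } };
}

// Type guard: narrows an arbitrary string to a documented code.
function isErrorCode(code: string): code is ErrorCode {
  return (ERROR_CODES as readonly string[]).indexOf(code) !== -1;
}

// Turns a failed envelope into a short hint for the caller.
function describeFailure(envelope: FailedEnvelope): string {
  const { code, details } = envelope.error;
  if (!isErrorCode(code)) return `unknown error code: ${code}`;
  switch (code) {
    case "confirmation_required":
      return "destructive op: re-send with confirm: true after an explicit decision";
    case "tool_failed":
      return `underlying tool failed: ${String(details?.output ?? "no output")}`;
    case "internal_error":
      return "unexpected server error: check logs";
    default:
      return `request rejected: ${code}`;
  }
}

console.log(
  describeFailure({
    success: false,
    error: { code: "tool_failed", details: { output: "storage quota exceeded" } },
  }),
);
```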