Give your AI agent document editing – in under 60 seconds
AgentDoc is a public Model Context Protocol (MCP) server. Any LLM agent that speaks MCP – Gemini, Claude, GPT, or your own – can connect, authenticate, and use a complete typed document-editing API: read, write, format, navigate, export PDFs. No SDK to vendor, no schema to maintain on your side, no human in the loop.
This page is the canonical onboarding path for agents and the people who run them. If you (or your model) want a working document editor available as a tool, this is everything you need.
MCP Endpoint
Standard Model Context Protocol over Server-Sent Events. JWT bearer-token auth (see Quick start below). Free per-account token budget; no credit card required.
Quick start (one HTTP call)
Register an isolated agent account and receive its API key in a single request. No browser, no email, no human in the loop. Each registered agent is its own user with its own document scope – different agents never see each other's documents.
curl -X POST https://agent-doc-edit.com/api/agents/register \
-H "Content-Type: application/json" \
-d '{"name": "my-research-agent"}'
# Response
# {
# "user_id": "...",
# "username": "agent_AbCdEfGh",
# "name": "my-research-agent",
# "api_key": "ak_...", <-- shown ONLY here, store it
# "api_key_prefix": "ak_AbCdEfGh",
# "created_at": "2026-04-25T..."
# }
That's it. Use the api_key as a bearer token against /mcp/sse and the agent has roughly 35 typed tools to read, write, format, paginate, and export documents – fully scoped to its own account.
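For agents that provision themselves from Python rather than a shell, the same registration call can be sketched with the standard library. This is a minimal sketch, not an official client: the helper names (build_register_request, extract_api_key, register_agent) are hypothetical, and only the URL, the request body, and the api_key response field come from the example above.

```python
import json
import urllib.request

# URL taken from the curl example above.
REGISTER_URL = "https://agent-doc-edit.com/api/agents/register"


def build_register_request(name: str) -> urllib.request.Request:
    """Build the POST request. Nothing is sent until urlopen() is called."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        REGISTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_api_key(response_body: str) -> str:
    """Pull the one-time api_key out of the registration response."""
    payload = json.loads(response_body)
    key = payload.get("api_key")
    if not key:
        raise ValueError("registration response did not contain an api_key")
    return key


def register_agent(name: str) -> str:
    """Perform the network call and return the API key (store it right away)."""
    with urllib.request.urlopen(build_register_request(name), timeout=30) as resp:
        return extract_api_key(resp.read().decode("utf-8"))
```

Because the key is shown only once, a caller would typically write the return value of register_agent("my-research-agent") to its secret store before doing anything else.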
Two ways to authenticate
Option A – Agent self-registers (recommended for autonomous workflows)
Use POST /api/agents/register as shown above. The agent gets its own user account and its own document namespace. Different agents never collide. Rate limit: 5 registrations per hour per IP. This is the right path for letter pipelines, batch processors, scheduled jobs, multi-agent workflows.
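An orchestrator that spins up many agents from one IP may want to stay under the registration limit on its own side instead of discovering it through rejected requests. The following is a minimal client-side sketch of a sliding-window budget; the class name RegistrationBudget is hypothetical, and the server's actual enforcement may differ in detail from this model.

```python
import time
from collections import deque


class RegistrationBudget:
    """Client-side sliding window: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 3600.0, clock=time.monotonic):
        self.limit = limit      # 5 registrations ...
        self.window = window    # ... per hour, as stated above
        self.clock = clock      # injectable so the logic can be tested
        self._stamps = deque()  # timestamps of recent registrations

    def try_acquire(self) -> bool:
        """Return True and record the call if a registration is allowed now."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

A provisioning loop would call try_acquire() before each POST /api/agents/register and defer the registration when it returns False.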
Option B – Use your own human account (for "give my own assistant document editing")
Open /app, sign in, sidebar → "API Keys for Agents" → "+ Create New Key". The key is shown once. Use it as a bearer token. The agent shares your account, your documents, and your active-document state. Useful when you want a co-pilot agent to operate alongside you on a single corpus.
What gets billed (and what doesn't)
Agents bring their own LLM. You pay your model provider for reasoning tokens. We do not see those, do not charge for those, and do not throttle on those. Our service hosts the MCP server, the document storage, and the rendering pipeline. The token_limit column on agent accounts is set to 0 as a defensive safeguard: if any future code path ever attempted to run our internal Gemini agent under agent-account auth, it would refuse – agents stay strictly on the MCP-tool path.
Important: this is autonomous, not collaborative
This path is built for autonomous agent workflows – your agent reasons with its own LLM, calls our MCP tools directly, edits documents on its own account, and exports a result. The same battle-tested tool surface that our voice and text agents use in production powers your agent – but your agent never talks to ours. There is no AI-to-AI hop, no internal LLM call on your behalf, no shared session with our in-browser editor.
If you want a human and our voice/text agent to co-edit live, use /app directly – that's a different path. If you want your own agent to operate the editor without a human, the MCP endpoint described here is the right surface.
Connect from any MCP client
Python (mcp client / Anthropic / Google ADK)
import asyncio
import json

from mcp import ClientSession
from mcp.client.sse import sse_client

AGENTDOC_TOKEN = "ak_..."  # from POST /api/agents/register


async def edit_document():
    headers = {"Authorization": f"Bearer {AGENTDOC_TOKEN}"}
    async with sse_client("https://agent-doc-edit.com/mcp/sse",
                          headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Workflow T (~35 tools) is applied automatically. Token is
            # injected from the Authorization header – do NOT pass `token`
            # in tool arguments.
            tools = await session.list_tools()

            # Create a document
            res = await session.call_tool("create_document",
                                          {"title": "My Report"})
            payload = json.loads(res.content[0].text)
            doc_id = payload["doc_id"]  # structured field, no regex

            # Insert content
            await session.call_tool("insert_string", {
                "doc_id": doc_id,
                "text": "# Hello\n\nFirst paragraph.",
                "index": 0,
            })

            # Trigger PDF; response includes a self-describing fetch URL
            res = await session.call_tool("trigger_pdf_download",
                                          {"doc_id": doc_id})
            pdf_meta = json.loads(res.content[0].text)
            print(pdf_meta["pdf_url"])  # → "/api/doc/<doc_id>/pdf"


asyncio.run(edit_document())
TypeScript (Anthropic SDK)
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const TOKEN = "ak_..."; // from POST /api/agents/register

const transport = new SSEClientTransport(
  new URL("https://agent-doc-edit.com/mcp/sse"),
  { requestInit: { headers: { Authorization: `Bearer ${TOKEN}` } } }
);
const client = new Client({ name: "my-agent", version: "1.0" }, { capabilities: {} });
await client.connect(transport);

const tools = await client.listTools();
const result = await client.callTool({
  name: "insert_string",
  arguments: { text: "Hello from my agent.", index: 0 }
});
curl (raw exploration)
curl -N -H "Authorization: Bearer $TOKEN" \
-H "Accept: text/event-stream" \
https://agent-doc-edit.com/mcp/sse
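What the curl command prints is a text/event-stream: blocks of "event:" and "data:" lines separated by blank lines. When exploring by hand it can help to split that stream into events. The sketch below handles only the event and data fields of the Server-Sent Events format and is not a substitute for an MCP client; the sample path in the usage note is made up for illustration.

```python
def parse_sse(stream_text: str):
    """Yield (event, data) tuples from a text/event-stream payload."""
    event, data_lines = "message", []  # "message" is the SSE default event name
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
```

For instance, list(parse_sse("event: endpoint\ndata: /hypothetical/path\n\n")) yields a single ("endpoint", "/hypothetical/path") tuple.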
Tool catalogue – Workflow T is applied automatically
External agents (i.e. requests authenticated with an ak_* API key) are automatically restricted to the Workflow T tool surface – the Pareto-optimal production default our own voice and text agents use. You don't apply this filter; the MCP server applies it server-side on both tools/list and tools/call. This gives you the curated ~35-tool subset (typed primitives + macros + observing feedback on index shifts), removes the scratchpad and FSM-intent tools that don't belong in T, and excludes the atomic "exploded" variants only used by our tool-bloat benchmark. Each tool has a JSON schema for arguments and returns a structured response with explicit success/error markers; index-shifting operations include observing feedback ("observation": "INDEX SHIFT – re-read before next mutation") so the agent stays grounded between turns.
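An agent loop can act on the observing feedback mechanically rather than leaving it to the model to notice. The sketch below assumes the tool result text is a JSON object whose top-level "status" and "observation" fields look like the examples on this page; the exact shape may vary per tool, and the helper name summarize_tool_result is hypothetical.

```python
import json


def summarize_tool_result(text: str) -> dict:
    """Classify one tool response: did it succeed, and must we re-read?"""
    payload = json.loads(text)
    # ASSUMPTION: "observation" sits at the top level of the response.
    observation = str(payload.get("observation", ""))
    return {
        "ok": payload.get("status") == "success",
        # Indices held from earlier reads are stale after an index shift.
        "needs_reread": "INDEX SHIFT" in observation,
        "payload": payload,
    }
```

When needs_reread is true, the loop would re-read the document before issuing the next index-based mutation instead of reusing cached offsets.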
See the full developer documentation →
Seed a document from an existing DOCX
The MCP tool surface lets your agent build documents from scratch. For workflows that start from a pre-authored Word file – corporate letterhead templates, contract boilerplate, an incoming draft to revise – an additional one-shot HTTP endpoint accepts .docx uploads, creates a fresh Document on the agent's account, and switches it to active so the next MCP call lands on the imported content:
# Upload a .docx; response is the new {id, title, ...}
curl -X POST https://agent-doc-edit.com/api/docs/import/docx \
-H "Authorization: Bearer $API_KEY" \
-F "[email protected]" \
-F "title=Q3 Customer Letter"
Page breaks, hyperlinks, headers / footers, fonts, colours and line spacing all survive the import. Full technical write-up: DOCX Import – Round-Tripping Word Documents.
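The same upload can be sketched from Python with only the standard library by assembling the multipart body by hand. Note the assumptions: the form field name for the file is not legible in the curl example above, so FILE_FIELD = "file" is a guess that needs checking against the developer documentation, and the helper name build_docx_import_request is hypothetical. Only the URL, the bearer header, and the title field come from this page.

```python
import uuid
import urllib.request

IMPORT_URL = "https://agent-doc-edit.com/api/docs/import/docx"
FILE_FIELD = "file"  # ASSUMPTION: the real upload field name may differ
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def build_docx_import_request(api_key: str, filename: str,
                              docx_bytes: bytes, title: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the .docx and its title."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; '
        f'filename="{filename}"\r\n'.encode(),
        f"Content-Type: {DOCX_MIME}\r\n\r\n".encode(),
        docx_bytes,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="title"\r\n\r\n',
        title.encode("utf-8"),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),  # closing boundary
    ])
    return urllib.request.Request(
        IMPORT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the request to urllib.request.urlopen() would perform the upload; the response body is the new document's {id, title, ...} object described above.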
Use cases your agent can take on autonomously
- Letter generation pipeline. Agent receives a structured event ("apologise to customer X for delay Y"), drafts the letter, formats it, exports a PDF, attaches it to an outgoing email.
- Report generator. Agent ingests a CSV, summarises findings, structures the report into headings, inserts a table-of-contents, formats key numbers in colour, exports.
- Document refactor. Agent reads an existing draft, reorganises paragraphs, applies consistent heading hierarchy, fixes inconsistent terminology with a single macro_replace_all sweep, exports.
- Multi-agent workflows. A planning agent decides what to write; a writing agent calls the AgentDoc tools to produce the artefact; a verification agent reads the result and triggers revisions.
- Voice handoff. A voice agent takes spoken instructions from the user, hands off structured task descriptions to a text agent that performs the actual document operations against the same MCP server.
- Template-filling. Agent uploads a corporate letterhead .docx via /api/docs/import/docx, fills in placeholder fields using macro_replace_text, exports the result – the user's collaborator opens a Word file in the same Word they started with.
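The letter pipeline from the list above reduces to three tool calls. The sketch below keeps the transport out of the picture by taking a call_tool coroutine as a parameter, so the control flow can be exercised without a network; run_letter_pipeline and fake_call_tool are hypothetical names, and the fake's responses only mimic the response shapes shown on this page.

```python
import asyncio
import json


async def run_letter_pipeline(call_tool, title: str, body: str) -> str:
    """Create a document, insert the letter text, and return the PDF URL.

    `call_tool(name, arguments)` must return the tool's JSON response text.
    """
    created = json.loads(await call_tool("create_document", {"title": title}))
    doc_id = created["doc_id"]
    await call_tool("insert_string", {"doc_id": doc_id, "text": body, "index": 0})
    pdf = json.loads(await call_tool("trigger_pdf_download", {"doc_id": doc_id}))
    return pdf["pdf_url"]


async def fake_call_tool(name: str, arguments: dict) -> str:
    """Stand-in for the MCP server, for local dry runs only."""
    if name == "create_document":
        return json.dumps({"status": "success", "doc_id": 7,
                           "title": arguments["title"]})
    if name == "trigger_pdf_download":
        return json.dumps({"pdf_url": f"/api/doc/{arguments['doc_id']}/pdf",
                           "method": "GET"})
    return json.dumps({"status": "success"})


if __name__ == "__main__":
    print(asyncio.run(run_letter_pipeline(fake_call_tool, "Apology", "Dear X, ...")))
```

Against the real server, call_tool would be a thin wrapper that awaits session.call_tool(name, arguments) and returns res.content[0].text, as in the Python client example.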
What makes this agent-friendly (specifically)
- Typed tools, not free-text prompts. Every operation is a JSON-schema-validated tool. The agent cannot "almost" call a tool – arguments parse or they don't.
- Structured tool returns. create_document returns {"status":"success","doc_id":N,"title":...} – no regex over prose. trigger_pdf_download returns a self-describing {"pdf_url":"/api/doc/N/pdf","method":"GET"} so a single follow-up HTTP GET fetches the bytes.
- Token auto-injection. The bearer token from your Authorization header is injected into every tool call automatically – your tool arguments stay free of auth. (The internal voice/text agent uses an older convention with explicit token args; that path remains supported for backwards compat.)
- Stand-off formatting. The agent never writes raw HTML. Formatting is a typed call (format_text(..., format_type="color", format_value="blue")) – the surface that's easiest to hallucinate is removed.
- Integer document IDs. Auto-incrementing integers, not UUIDs – eliminates character-drop hallucinations when the agent passes an ID between tool calls.
- Observing feedback on every mutation. Returns include explicit observations when indices shift, so the agent doesn't need to maintain a mental model of cumulative offsets.
- Real-time visual mirror. The same document the agent is editing is rendered live at /app. Useful for human verification, demos, and multi-modal handoffs.
- Voice-and-text symmetry. The voice agent and text agent see the exact same tool surface. If your agent works as a text client, it works as a voice client too.
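The self-describing PDF response mentioned in the list above can be turned into a download request with a few lines. This sketch assumes two things the page does not state outright: that the relative pdf_url resolves against the same host as the MCP endpoint, and that the PDF endpoint accepts the same bearer token. The helper name build_pdf_request is hypothetical.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://agent-doc-edit.com"  # ASSUMPTION: same host as /mcp/sse


def build_pdf_request(pdf_meta: dict, api_key: str) -> urllib.request.Request:
    """Turn a trigger_pdf_download response into a ready-to-send request."""
    url = urljoin(BASE_URL, pdf_meta["pdf_url"])
    return urllib.request.Request(
        url,
        method=pdf_meta.get("method", "GET"),
        # ASSUMPTION: the PDF endpoint uses the same bearer auth as the MCP server.
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the result to urllib.request.urlopen() and reading the response would yield the PDF bytes.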
Limits and honest constraints
- LLM billing is on you. Agents bring their own model. You pay your provider for reasoning tokens. Our service hosts the MCP server and document storage; we do not bill for LLM usage and do not see it.
- Registration rate limit. 5 agent self-registrations per hour per IP. Designed for legitimate provisioning, not bulk abuse.
- Tool-call rate. Per-account soft limit on the proxy layer. No aggressive throttling today, but unbounded loops will eventually hit nginx-level caps.
- Document size. Practical sweet spot is 1–50 pages (A4). Very large documents (200+ pages) work but agent reasoning slows because the read tool returns the full state.
- Account isolation. Each registered agent is its own user. Documents are scoped to that user only – agents never read or write across accounts. To share documents between agents intentionally, share the API key (Option B above).
- Service availability. This is a research-grade public service, not a production SLA. We aim for high uptime but offer no formal guarantee.
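Given the soft tool-call limits and the absence of an SLA, a long-running agent may want to retry transient failures with exponential backoff rather than loop tightly. The sketch below is generic: how the service signals throttling is not specified on this page, so it retries on any exception, which a real client would narrow to the errors it actually observes. The name call_with_backoff is hypothetical.

```python
import asyncio
import random


async def call_with_backoff(op, retries: int = 4, base_delay: float = 0.5,
                            sleep=asyncio.sleep):
    """Await `op()`; on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return await op()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Jitter spreads out retries from agents that failed together.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await sleep(delay)
```

A tool call would be wrapped as call_with_backoff(lambda: session.call_tool("insert_string", args)), where session is an initialized MCP ClientSession.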
Discoverability metadata
- llms.txt – a plain-text site map for LLM crawlers at /llms.txt.
- OpenGraph / JSON-LD – every page exposes WebAPI / TechArticle / FAQPage schema where appropriate.
- Stable URLs – /agents, /developers, /research are canonical and won't move.
Try it now
Open the editor in one tab, run your agent in another. The agent's edits appear in real time on the same screen – useful for debugging, demos, or running a human + agent collaboration.
Open the Editor → Developer docs & architecture →
Engineering write-ups
- Tool Granularity in LLM Agents – design principles for the MCP tool surface your agent will be calling.
- Rebuilding PDF + DOCX Export – how the export endpoints reproduce the on-screen layout faithfully when an agent triggers a download.
- April 2026 Patch Notes – recent fixes to renderer / decoration / toggle semantics agents rely on.