How do I let my LLM agent edit documents?

Point any MCP-compatible client at https://agent-doc-edit.com/mcp/sse, authenticate with a JWT bearer token from your AgentDoc account, and the agent gains access to typed tools for reading, writing, formatting, navigating, and exporting documents. Works with Gemini, Claude, GPT, and any custom client speaking the Model Context Protocol over SSE.

Which LLMs work with this MCP server?

Any LLM whose runtime supports MCP tool calling: Google Gemini (3.x and Live), Anthropic Claude (via the official Claude Code MCP client), OpenAI GPT (via tool-calling SDKs that bridge to MCP), and any open-source model running through a host that speaks MCP. The server is model-agnostic – it exposes typed tools, not model-specific endpoints.

Is the MCP endpoint free to use?

Yes. The default per-account token budget is 3 million tokens/month, and no credit card is required to start. Agents that connect over MCP bring their own LLM, so their reasoning tokens are billed by your model provider, not by AgentDoc.

What document operations can my agent perform?

Read raw Markdown and rendered HTML, find text by regex, insert and delete substrings by index, format with 15 colors, 12 fonts, 7 sizes, headings, bold/italic/underline/strikethrough/sub/superscript, page breaks, table styling, table-of-contents generation, header/footer editing, page navigation, and PDF export. Header and footer are isolated areas with their own indexed addresses.

Give your AI agent document editing – in under 60 seconds

AgentDoc is a public Model Context Protocol (MCP) server. Any LLM agent that speaks MCP – Gemini, Claude, GPT, or your own – can connect, authenticate, and use a complete typed document-editing API: read, write, format, navigate, export PDFs. No SDK to vendor, no schema to maintain on your side, no human in the loop.

This page is the canonical onboarding path for agents and the people who run them. If you (or your model) want a working document editor available as a tool, this is everything you need.

MCP Endpoint

https://agent-doc-edit.com/mcp/sse

Standard Model Context Protocol over Server-Sent Events. Bearer-token auth with an account API key (see Quick start below). Free per-account token budget; no credit card required.

Quick start (one HTTP call)

Register an isolated agent account and receive its API key in a single request. No browser, no email, no human in the loop. Each registered agent is its own user with its own document scope – different agents never see each other's documents.

curl -X POST https://agent-doc-edit.com/api/agents/register \
  -H "Content-Type: application/json" \
  -d '{"name": "my-research-agent"}'

# Response
# {
#   "user_id":        "...",
#   "username":       "agent_AbCdEfGh",
#   "name":           "my-research-agent",
#   "api_key":        "ak_...",          <-- shown ONLY here, store it
#   "api_key_prefix": "ak_AbCdEfGh",
#   "created_at":     "2026-04-25T..."
# }

That's it. Use the api_key as a bearer token against /mcp/sse and the agent has roughly 35 typed tools to read, write, format, paginate, and export documents – fully scoped to its own account.
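The same registration call can be made from Python. The sketch below uses only the standard library. The helper names (build_register_request, extract_api_key, register_agent) are hypothetical, and the check for the ak_ prefix is an assumption based on the sample response above.

```python
import json
import urllib.request

REGISTER_URL = "https://agent-doc-edit.com/api/agents/register"


def build_register_request(name: str) -> urllib.request.Request:
    """Build the POST request that mirrors the curl call above."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        REGISTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_api_key(response_body: str) -> str:
    """Pull the api_key out of the JSON response; it is shown only once."""
    payload = json.loads(response_body)
    key = payload["api_key"]
    # Assumption: keys always start with "ak_", as in the sample response.
    if not key.startswith("ak_"):
        raise ValueError("unexpected api_key format")
    return key


def register_agent(name: str) -> str:
    """Perform the network call and return the key (store it securely)."""
    with urllib.request.urlopen(build_register_request(name), timeout=30) as resp:
        return extract_api_key(resp.read().decode("utf-8"))
```

Calling register_agent("my-research-agent") performs the real request, so keep the 5-per-hour registration limit in mind and persist the returned key instead of registering again on every run.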

Two ways to authenticate

Option A – Agent self-registers (recommended for autonomous workflows)

Use POST /api/agents/register as shown above. The agent gets its own user account and its own document namespace, so different agents never collide. Rate limit: 5 registrations per hour per IP. This is the right path for letter pipelines, batch processors, scheduled jobs, and multi-agent workflows.

Option B – Use your own human account (for "give my own assistant document editing")

Open /app, sign in, sidebar → "API Keys for Agents" → "+ Create New Key". The key is shown once. Use it as a bearer token. The agent shares your account, your documents, and your active-document state. Useful when you want a co-pilot agent to operate alongside you on a single corpus.

What gets billed (and what doesn't)

Agents bring their own LLM. You pay your model provider for reasoning tokens; we do not see them, charge for them, or throttle on them. Our service hosts the MCP server, the document storage, and the rendering pipeline. The token_limit column on agent accounts is set to 0 as a safeguard: if any future code path ever tried to run our internal Gemini agent under agent-account auth, the call would be refused, so agents stay strictly on the MCP-tool path.

Important: this is autonomous, not collaborative

This path is built for autonomous agent workflows – your agent reasons with its own LLM, calls our MCP tools directly, edits documents on its own account, and exports a result. The same battle-tested tool surface that our voice and text agents use in production powers your agent – but your agent never talks to ours. There is no AI-to-AI hop, no internal LLM call on your behalf, no shared session with our in-browser editor.

If you want a human and our voice/text agent to co-edit live, use /app directly – that's a different path. If you want your own agent to operate the editor without a human, the MCP endpoint described here is the right surface.

Connect from any MCP client

Python (mcp client / Anthropic / Google ADK)

from mcp.client.sse import sse_client
from mcp import ClientSession
import json

AGENTDOC_TOKEN = "ak_..."  # from POST /api/agents/register

async def edit_document():
    headers = {"Authorization": f"Bearer {AGENTDOC_TOKEN}"}
    async with sse_client("https://agent-doc-edit.com/mcp/sse",
                          headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Workflow T (~35 tools) is applied automatically. Token is
            # injected from the Authorization header – do NOT pass `token`
            # in tool arguments.
            tools = await session.list_tools()

            # Create a document
            res = await session.call_tool("create_document",
                                          {"title": "My Report"})
            payload = json.loads(res.content[0].text)
            doc_id = payload["doc_id"]   # structured field, no regex

            # Insert content
            await session.call_tool("insert_string", {
                "doc_id": doc_id,
                "text":   "# Hello\n\nFirst paragraph.",
                "index":  0,
            })

            # Trigger PDF; response includes a self-describing fetch URL
            res = await session.call_tool("trigger_pdf_download",
                                          {"doc_id": doc_id})
            pdf_meta = json.loads(res.content[0].text)
            print(pdf_meta["pdf_url"])  # → "/api/doc/N/pdf"

# Run the example end to end
if __name__ == "__main__":
    import asyncio
    asyncio.run(edit_document())

TypeScript (@modelcontextprotocol/sdk)

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const TOKEN = "ak_...";  // from POST /api/agents/register

const transport = new SSEClientTransport(
  new URL("https://agent-doc-edit.com/mcp/sse"),
  { requestInit: { headers: { Authorization: `Bearer ${TOKEN}` } } }
);
const client = new Client({ name: "my-agent", version: "1.0" }, { capabilities: {} });
await client.connect(transport);

const tools = await client.listTools();
const result = await client.callTool({
  name: "insert_string",
  arguments: { text: "Hello from my agent.", index: 0 }
});

curl (raw exploration)

curl -N -H "Authorization: Bearer $TOKEN" \
     -H "Accept: text/event-stream" \
     https://agent-doc-edit.com/mcp/sse

Tool catalogue – Workflow T is applied automatically

External agents (i.e. requests authenticated with an ak_* API key) are automatically restricted to the Workflow T tool surface – the Pareto-optimal production default our own voice and text agents use. You don't apply this filter; the MCP server applies it server-side on both tools/list and tools/call. This gives you the curated ~35-tool subset (typed primitives + macros + observing feedback on index shifts), removes the scratchpad and FSM-intent tools that don't belong in T, and excludes the atomic "exploded" variants only used by our tool-bloat benchmark. Each tool has a JSON schema for arguments and returns a structured response with explicit success/error markers; index-shifting operations include observing feedback ("observation": "INDEX SHIFT – re-read before next mutation") so the agent stays grounded between turns.
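One way an agent loop might act on that feedback is sketched below. It assumes the tool result arrives as a JSON object carrying the "observation" string quoted above. The helper names and the decision to match on the words INDEX SHIFT are illustrative assumptions, not part of the documented API.

```python
import json


def parse_tool_result(text: str) -> dict:
    """Decode the JSON text returned by a tool call."""
    return json.loads(text)


def must_reread(payload: dict) -> bool:
    """True when the server signalled that indices moved after a mutation."""
    observation = str(payload.get("observation", ""))
    return "INDEX SHIFT" in observation.upper()


def next_tool(payload: dict, planned_tool: str) -> str:
    """Insert a fresh read before the planned mutation when indices shifted."""
    return "get_document_context" if must_reread(payload) else planned_tool
```

With this guard, cached indices are discarded whenever the server reports a shift, and the next mutation is computed from a fresh read.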

get_document_context

Returns raw Markdown + rendered HTML in one call. Primary read tool.

find

Regex-powered search. Returns all matches with [start, end) indices and 150-char context.
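To make the [start, end) convention concrete, here is a local model of such a search built on Python's re module. It is only an illustration: the result field names are hypothetical, the server's regex dialect may differ from Python's, and treating the 150 characters as context on each side is an assumption.

```python
import re


def local_find(text: str, pattern: str, context: int = 150) -> list[dict]:
    """Return every regex match with half-open [start, end) indices."""
    matches = []
    for m in re.finditer(pattern, text):
        start, end = m.span()  # half-open: text[start:end] is exactly the match
        matches.append({
            "start": start,
            "end": end,
            "match": text[start:end],
            # Assumption: context is taken on both sides of the match.
            "context": text[max(0, start - context): end + context],
        })
    return matches
```

Because the interval is half-open, end - start is the match length and end is the index of the first character after the match, which is the value to pass when inserting text directly behind it.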

insert_string / delete_substring

Index-based text mutations. Header/footer variants for isolated areas.

replace_substring

Atomic delete + insert. Avoids index drift between two separate calls.
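The drift that replace_substring avoids can be reproduced on a plain Python string. The functions below reuse the tool names for readability but are local stand-ins, not the server implementation.

```python
def delete_substring(text: str, start: int, end: int) -> str:
    """Remove the half-open range [start, end)."""
    return text[:start] + text[end:]


def insert_string(text: str, value: str, index: int) -> str:
    """Insert value so that it begins at index."""
    return text[:index] + value + text[index:]


def replace_substring(text: str, start: int, end: int, value: str) -> str:
    """Delete [start, end) and insert value in one step."""
    return text[:start] + value + text[end:]


doc = "The quick brown fox"          # "quick" is [4, 9), "brown" is [10, 15)

# Two-step edit: after the delete, every later index has moved left by 5.
after_delete = delete_substring(doc, 4, 9)
stale_brown = after_delete[10:15]     # the old "brown" span now points elsewhere
two_step = insert_string(after_delete, "slow", 4)

# One-step edit: a single, predictable shift of len(value) - (end - start).
one_step = replace_substring(doc, 4, 9, "slow")
shift = len("slow") - (9 - 4)
```

Both routes end at the same text, but the two-step route leaves a window in which any index computed from the original document is wrong.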

format_text

15 colors, 12 fonts, 7 sizes, bold/italic/underline/strike/sub/sup, alignment, indentation, links.

format_table

Border style/color/width, backgrounds, alignment, column widths, padding, striping.

macro_replace_all / macro_format_all_matches

Atomic bulk operations. Process matches in reverse index order to prevent drift.
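Why reverse index order prevents drift can be shown with a few lines of plain Python. This is a local sketch of the technique on an in-memory string; the function names are hypothetical and it makes no claim about how the server implements the macros.

```python
import re


def replace_all_reverse(text: str, pattern: str, replacement: str) -> str:
    """Apply replacements from the last match to the first.

    Editing the tail first leaves the indices of earlier matches untouched.
    """
    spans = [m.span() for m in re.finditer(pattern, text)]
    for start, end in reversed(spans):
        text = text[:start] + replacement + text[end:]
    return text


def replace_all_forward_stale(text: str, pattern: str, replacement: str) -> str:
    """Counter-example: forward order with indices computed once up front."""
    spans = [m.span() for m in re.finditer(pattern, text)]
    for start, end in spans:
        text = text[:start] + replacement + text[end:]
    return text
```

With "cat, cat, cat" and the replacement "tiger", the reverse pass yields the expected text, while the forward pass corrupts it because each longer replacement shifts the matches that follow.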

insert_page_break / delete_page_break / find_page_breaks

Page break primitives – invisible DOM markers, not character substrings.

generate_table_of_contents

Auto-injects a hyperlinked TOC at a given index based on existing heading structure.

create_document / rename_document / set_active_document / list_documents

Document management. The agent's session is auto-routed to the active document.

navigate_to_page / set_page_layout

Page navigation, margin and page-size adjustment.

trigger_pdf_download

Emits a PDF-export event and returns a pdf_url that the user (or a downstream agent) can fetch with a single GET.

Seed a document from an existing DOCX

The MCP tool surface lets your agent build documents from scratch. For workflows that start from a pre-authored Word file – corporate letterhead templates, contract boilerplate, an incoming draft to revise – an additional one-shot HTTP endpoint accepts .docx uploads, creates a fresh Document on the agent's account, and switches it to active so the next MCP call lands on the imported content:

# Upload a .docx; response is the new {id, title, ...}
# Assumed form field name: file; letter.docx is a placeholder filename.
curl -X POST https://agent-doc-edit.com/api/docs/import/docx \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@letter.docx" \
  -F "title=Q3 Customer Letter"

Page breaks, hyperlinks, headers / footers, fonts, colours and line spacing all survive the import. Full technical write-up: DOCX Import – Round-Tripping Word Documents.

Use cases your agent can take on autonomously

Letter generation pipeline. Agent receives a structured event ("apologise to customer X for delay Y"), drafts the letter, formats it, exports a PDF, attaches it to an outgoing email.
Report generator. Agent ingests a CSV, summarises findings, structures the report into headings, inserts a table-of-contents, formats key numbers in colour, exports.
Document refactor. Agent reads an existing draft, reorganises paragraphs, applies consistent heading hierarchy, fixes inconsistent terminology with a single macro_replace_all sweep, exports.
Multi-agent workflows. A planning agent decides what to write; a writing agent calls the AgentDoc tools to produce the artefact; a verification agent reads the result and triggers revisions.
Voice handoff. A voice agent takes spoken instructions from the user, hands off structured task descriptions to a text agent that performs the actual document operations against the same MCP server.
Template-filling. Agent uploads a corporate letterhead .docx via /api/docs/import/docx, fills in placeholder fields using macro_replace_all, and exports the result, so the user's collaborator can open the file in the same Word they started with.

What makes this agent-friendly (specifically)

Typed tools, not free-text prompts. Every operation is a JSON-schema-validated tool. The agent cannot "almost" call a tool – arguments parse or they don't.
Structured tool returns. create_document returns {"status":"success","doc_id":N,"title":...} – no regex over prose. trigger_pdf_download returns a self-describing {"pdf_url":"/api/doc/N/pdf","method":"GET"} so a single follow-up HTTP GET fetches the bytes.
Token auto-injection. The bearer token from your Authorization header is injected into every tool call automatically – your tool arguments stay free of auth. (The internal voice/text agent uses an older convention with explicit token args; that path remains supported for backwards compat.)
Stand-off formatting. The agent never writes raw HTML. Formatting is a typed call (format_text(..., format_type="color", format_value="blue")) – the surface that's easiest to hallucinate is removed.
Integer document IDs. Auto-incrementing integers, not UUIDs – eliminates character-drop hallucinations when the agent passes an ID between tool calls.
Observing feedback on every mutation. Returns include explicit observations when indices shift, so the agent doesn't need to maintain a mental model of cumulative offsets.
Real-time visual mirror. The same document the agent is editing is rendered live at /app. Useful for human verification, demos, and multi-modal handoffs.
Voice-and-text symmetry. The voice agent and text agent see the exact same tool surface. If your agent works as a text client, it works as a voice client too.
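The follow-up GET mentioned under "Structured tool returns" could look like the sketch below. It assumes the relative pdf_url resolves against the site origin and that the same bearer token is accepted on that endpoint; neither is confirmed here, and the helper names are hypothetical.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://agent-doc-edit.com"


def build_pdf_request(pdf_meta: dict, token: str) -> urllib.request.Request:
    """Turn the trigger_pdf_download payload into an authenticated request."""
    url = urljoin(BASE_URL, pdf_meta["pdf_url"])
    return urllib.request.Request(
        url,
        # Assumption: the export endpoint accepts the same bearer token.
        headers={"Authorization": f"Bearer {token}"},
        method=pdf_meta.get("method", "GET"),
    )


def download_pdf(pdf_meta: dict, token: str, path: str) -> int:
    """Fetch the PDF bytes, write them to path, and return the byte count."""
    with urllib.request.urlopen(build_pdf_request(pdf_meta, token), timeout=60) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Because the payload carries both the path and the method, the agent does not need to hard-code the export route.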

Limits and honest constraints

LLM billing is on you. Agents bring their own model. You pay your provider for reasoning tokens. Our service hosts the MCP server and document storage; we do not bill for LLM usage and do not see it.
Registration rate limit. 5 agent self-registrations per hour per IP. Designed for legitimate provisioning, not bulk abuse.
Tool-call rate. Per-account soft limit on the proxy layer. No aggressive throttling today, but unbounded loops will eventually hit nginx-level caps.
Document size. Practical sweet spot is 1–50 pages (A4). Very large documents (200+ pages) work but agent reasoning slows because the read tool returns the full state.
Account isolation. Each registered agent is its own user. Documents are scoped to that user only – agents never read or write across accounts. To share documents between agents intentionally, have them use the same API key, for example one created under a single account as in Option B above.
Service availability. This is a research-grade public service, not a production SLA. We aim for high uptime but offer no formal guarantee.
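Given the soft tool-call limit noted above, a client may want to slow down instead of looping. The sketch below wraps any awaitable call in exponential backoff with jitter. How the server signals throttling is not documented here, so retrying on every exception is a deliberately crude assumption that should be narrowed in real code, and call_with_backoff is a hypothetical helper.

```python
import asyncio
import random


async def call_with_backoff(call, *, retries=4, base_delay=0.5,
                            max_delay=8.0, sleep=asyncio.sleep):
    """Await call(); on failure wait 0.5s, 1s, 2s, ... (capped) and retry."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:
            # Assumption: any exception may be a throttle; re-raise when out of retries.
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps several agents from retrying in lockstep.
            await sleep(delay * random.uniform(0.5, 1.0))
```

A read could then be wrapped as call_with_backoff(lambda: session.call_tool("get_document_context", {})), where session is an already initialised MCP client session. Retrying reads is safe; retrying a mutation that may have partly succeeded is not, so re-read the document before repeating an edit.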

Discoverability metadata

llms.txt – a plain-text site map for LLM crawlers at /llms.txt.
OpenGraph / JSON-LD – every page exposes WebAPI / TechArticle / FAQPage schema where appropriate.
Stable URLs – /agents is canonical and won't move.

Try it now

Open the editor in one tab, run your agent in another. The agent's edits appear in real time on the same screen – useful for debugging, demos, or running a human + agent collaboration.

Open the Editor →

Engineering write-ups

Tool Granularity in LLM Agents – design principles for the MCP tool surface your agent will be calling.
Rebuilding PDF + DOCX Export – how the export endpoints reproduce the on-screen layout faithfully when an agent triggers a download.
April 2026 Patch Notes – recent fixes to renderer / decoration / toggle semantics agents rely on.