AgentDoc writes, formats, and structures your document live as you talk β on your phone or your laptop. No typing, no menus.
Whether you're commuting, dictating a draft on your phone, or simply faster at talking than typing β AgentDoc is built around the way you already think.
Anyone who'd rather speak than type. Dictate a cover letter on the bus, draft a report between meetings, or write a long email without unlocking the keyboard. The agent listens, drafts, and formats live.
Designed for people with motor disabilities, repetitive strain injuries, or anyone for whom a mouse-and-keyboard interface is tiring or impossible. Every operation is reachable by voice alone.
The same backend exposes every document operation as a typed MCP tool, so autonomous LLM agents can read, write, format, and navigate documents with zero human intervention. See /agents and /developers.
A focused, carefully designed toolset β built around the primitives that matter most to agents and to voice-first users.
Powered by Google Gemini Live. Speak naturally β the agent understands context, remembers previous edits, and confirms every action out loud.
Every operation is exposed as an MCP tool. AI agents can autonomously create, edit, and navigate documents via a standardized protocol β no hacks required.
Colors, fonts, sizes, highlights, bold, italic, subscript, superscript, indentation β all applied with natural language, no toolbar needed.
Automatic A4 pagination. Export pixel-perfect PDFs or native Word (.docx) on demand. Import existing Word documents too β fonts, colours, page breaks, headers and footers all preserved.
Dual WebSocket architecture ensures the display updates the instant the agent mutates the document β no polling, no refresh required.
JWT-based authentication, per-user document isolation, DOMPurify sanitization, and Cloudflare edge protection β production-hardened from day one.
Whether you type, speak, or run an automated agent β the flow is always the same simple loop.
Open the chat panel or press the microphone. Describe what you want in plain language: "Create a heading called Introduction" or "Make the second paragraph italic."
The AI agent translates your intent into precise MCP tool calls β finding the right character indices, inserting or deleting strings, and applying formatting tokens.
The backend publishes a WebSocket event. The editor re-renders the paginated A4 view in real time. The agent confirms what it did β in voice or text.
AgentDoc is the empirical testbed for an ongoing thesis evaluating how tool design affects AI agent reliability.
This project underpins a scientific thesis evaluating tool granularity, tool bloat, and workflow constraints on LLM agents operating in a document editing context. The benchmarking harness measures agent accuracy via Levenshtein distance, token consumption, and hallucination rates across 20 controlled workflow configurations (AβT) and 13 benchmark scenarios.