Who uses it

Made for people who'd rather speak.

Whether you're commuting, dictating a draft on your phone, or simply faster at talking than typing — AgentDoc is built around the way you already think.

🎤

Voice-First Writers

Anyone who'd rather speak than type. Dictate a cover letter on the bus, draft a report between meetings, or write a long email without unlocking the keyboard. The agent listens, drafts, and formats live.

Full voice control — no clicking, no menus
Real-time formatting while you speak
Works on phone, tablet, and laptop
Export to .docx or PDF when you're done

♿️

Accessibility Use-Cases

Designed for people with motor disabilities, repetitive strain injuries, or anyone for whom a mouse-and-keyboard interface is tiring or impossible. Every operation is reachable by voice alone.

No clicking, dragging, or typing required
Spoken feedback after every change
Screen-reader-friendly DOM structure
Free, web-based, no install

🤖

For Developers: an Agent-First API

The same backend exposes every document operation as a typed MCP tool, so autonomous LLM agents can read, write, format, and navigate documents with zero human intervention. See /agents and /developers.

Full MCP tool suite (read, insert, delete, format)
Real-time WebSocket sync after every mutation
Benchmarking harness for evaluating agent accuracy

Capabilities

Everything you need, nothing you don't

A focused, carefully designed toolset – built around the primitives that matter most to agents and to voice-first users.

🎤

Native Voice Control

Powered by Google Gemini Live. Speak naturally – the agent understands context, remembers previous edits, and confirms every action out loud.

🤖

Agent-First Architecture

Every operation is exposed as an MCP tool. AI agents can autonomously create, edit, and navigate documents via a standardized protocol – no hacks required.

🌟

Rich Text Formatting

Colors, fonts, sizes, highlights, bold, italic, subscript, superscript, indentation – all applied with natural language, no toolbar needed.

📄

A4 Pagination, PDF & Word

Automatic A4 pagination. Export pixel-perfect PDFs or native Word (.docx) on demand. Import existing Word documents too – fonts, colours, page breaks, headers and footers all preserved.

⚡

Real-Time Sync

Dual WebSocket architecture ensures the display updates the instant the agent mutates the document – no polling, no refresh required.

🔒

Secure Multi-User

JWT-based authentication, per-user document isolation, DOMPurify sanitization, and Cloudflare edge protection – production-hardened from day one.

How It Works

Three steps to a completed document

Whether you type, speak, or run an automated agent – the flow is always the same simple loop.

Say or type your intent

Open the chat panel or press the microphone. Describe what you want in plain language: "Create a heading called Introduction" or "Make the second paragraph italic."

The agent executes the tools

The AI agent translates your intent into precise MCP tool calls – finding the right character indices, inserting or deleting strings, and applying formatting tokens.

Your document updates instantly

The backend publishes a WebSocket event. The editor re-renders the paginated A4 view in real time. The agent confirms what it did – in voice or text.

Academic Context

Built as a research platform

AgentDoc is the empirical testbed for an ongoing thesis evaluating how tool design affects AI agent reliability.

Agent-Driven Voice-Only Interfaces

This project underpins a scientific thesis evaluating tool granularity, tool bloat, and workflow constraints on LLM agents operating in a document editing context. The benchmarking harness measures agent accuracy via Levenshtein distance, token consumption, and hallucination rates across 20 controlled workflow configurations (A–T) and 13 benchmark scenarios.

Gemini 3 Flash MCP / FastMCP ReAct FSM Tool Bloat Levenshtein Distance Index Drift

Try the editor →

Write letters and documents by voice.