🎤 Speak, don't type

Write letters and documents by voice.

AgentDoc writes, formats, and structures your document live as you talk β€” on your phone or your laptop. No typing, no menus.

Sign up & dictate β†’ See how it works

Made for people who'd rather speak.

Whether you're commuting, dictating a draft on your phone, or simply faster at talking than typing β€” AgentDoc is built around the way you already think.

🎤

Voice-First Writers

Anyone who'd rather speak than type. Dictate a cover letter on the bus, draft a report between meetings, or write a long email without unlocking the keyboard. The agent listens, drafts, and formats live.

  • Full voice control β€” no clicking, no menus
  • Real-time formatting while you speak
  • Works on phone, tablet, and laptop
  • Export to .docx or PDF when you're done
♿️

Accessibility Use-Cases

Designed for people with motor disabilities, repetitive strain injuries, or anyone for whom a mouse-and-keyboard interface is tiring or impossible. Every operation is reachable by voice alone.

  • No clicking, dragging, or typing required
  • Spoken feedback after every change
  • Screen-reader-friendly DOM structure
  • Free, web-based, no install
🤖

For Developers: an Agent-First API

The same backend exposes every document operation as a typed MCP tool, so autonomous LLM agents can read, write, format, and navigate documents with zero human intervention. See /agents and /developers.

  • Full MCP tool suite (read, insert, delete, format)
  • Real-time WebSocket sync after every mutation
  • Benchmarking harness for evaluating agent accuracy

Everything you need, nothing you don't

A focused, carefully designed toolset – built around the primitives that matter most to agents and to voice-first users.

🎤

Native Voice Control

Powered by Google Gemini Live. Speak naturally – the agent understands context, remembers previous edits, and confirms every action out loud.

🤖

Agent-First Architecture

Every operation is exposed as an MCP tool. AI agents can autonomously create, edit, and navigate documents via a standardized protocol – no hacks required.

🌟

Rich Text Formatting

Colors, fonts, sizes, highlights, bold, italic, subscript, superscript, indentation – all applied with natural language, no toolbar needed.

📄

A4 Pagination, PDF & Word

Automatic A4 pagination. Export pixel-perfect PDFs or native Word (.docx) on demand. Import existing Word documents too – fonts, colours, page breaks, headers and footers all preserved.

Real-Time Sync

Dual WebSocket architecture ensures the display updates the instant the agent mutates the document – no polling, no refresh required.

🔒

Secure Multi-User

JWT-based authentication, per-user document isolation, DOMPurify sanitization, and Cloudflare edge protection – production-hardened from day one.

Three steps to a completed document

Whether you type, speak, or run an automated agent – the flow is always the same simple loop.

1

Say or type your intent

Open the chat panel or press the microphone. Describe what you want in plain language: "Create a heading called Introduction" or "Make the second paragraph italic."

2

The agent executes the tools

The AI agent translates your intent into precise MCP tool calls – finding the right character indices, inserting or deleting strings, and applying formatting tokens.

3

Your document updates instantly

The backend publishes a WebSocket event. The editor re-renders the paginated A4 view in real time. The agent confirms what it did – in voice or text.

Built as a research platform

AgentDoc is the empirical testbed for an ongoing thesis evaluating how tool design affects AI agent reliability.

Agent-Driven Voice-Only Interfaces

This project underpins a scientific thesis evaluating tool granularity, tool bloat, and workflow constraints on LLM agents operating in a document editing context. The benchmarking harness measures agent accuracy via Levenshtein distance, token consumption, and hallucination rates across 20 controlled workflow configurations (A–T) and 13 benchmark scenarios.

Gemini 3 Flash MCP / FastMCP ReAct FSM Tool Bloat Levenshtein Distance Index Drift
Try the editor β†’

Engineering notes & release patches

Short, dense write-ups about how AgentDoc is built – architecture decisions, benchmarks, and the bugs we ship fixes for.

All posts β†’

Ready to edit without a mouse or keyboard?

Open the editor and speak your first instruction. Your document will respond.

Open the Editor β†’