Extract PDF text in your browser with LiteParse for the web

TLDR

  • Simon Willison ported LlamaIndex’s LiteParse PDF-to-text CLI into a pure browser app in 59 minutes using Claude Code and Opus 4.7.

Key Takeaways

  • LiteParse uses “spatial text parsing” heuristics to handle multi-column PDF layouts without any AI models, falling back to Tesseract.js OCR for image-based PDFs.
  • The browser port runs entirely client-side on PDF.js and Tesseract.js; no data leaves the browser and no network requests are made during parsing.
  • Willison used Claude Code with a plan-first workflow: write notes.md from initial research, generate plan.md before coding, then prompt “build it” and queue follow-up prompts.
  • Cross-browser bugs, such as a ReadableStream failure in Safari, were caught through Playwright-driven TDD and fixed without Willison reviewing any of the HTML or TypeScript directly.
  • The app is deployed to GitHub Pages with a Vite build step that Claude Code configured; CI runs the tests on every push before deploying.
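The summary does not spell out LiteParse’s actual heuristics. As a rough illustration of what “spatial text parsing” can mean, here is a minimal TypeScript sketch, assuming each text fragment carries a position and width (PDF.js text items expose similar data through str, width, and a transform matrix). The TextItem shape, the findGutter, toLines, and spatialText helpers, and the minGap and yTolerance defaults are all hypothetical. The sketch looks for one vertical gutter that no fragment crosses, reads the left column before the right, and groups fragments into lines by baseline.

```typescript
// Hypothetical positioned text fragment. PDF.js text items carry similar data
// (str, width, and x/y in transform[4] / transform[5]).
interface TextItem {
  str: string;
  x: number; // left edge in PDF units
  y: number; // baseline; PDF origin is bottom-left, so larger y is higher up
  width: number;
}

// Return the x-coordinate of the widest vertical gap (wider than minGap)
// that no fragment overlaps, or null if the page looks single-column.
function findGutter(items: TextItem[], minGap: number): number | null {
  const spans = items
    .map((it) => [it.x, it.x + it.width] as [number, number])
    .sort((a, b) => a[0] - b[0]);
  if (spans.length === 0) return null;
  let best: number | null = null;
  let bestGap = minGap;
  let reach = spans[0][1]; // rightmost edge covered so far
  for (const [start, end] of spans.slice(1)) {
    const gap = start - reach;
    if (gap > bestGap) {
      bestGap = gap;
      best = reach + gap / 2; // midpoint of the empty band
    }
    reach = Math.max(reach, end);
  }
  return best;
}

// Group fragments into lines (top to bottom), then order each line left to right.
function toLines(items: TextItem[], yTolerance: number): string[] {
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: TextItem[][] = [];
  for (const it of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - it.y) <= yTolerance) {
      current.push(it);
    } else {
      lines.push([it]);
    }
  }
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((it) => it.str)
      .join(" ")
  );
}

// Read the left column fully before the right one instead of interleaving rows.
function spatialText(items: TextItem[], minGap = 20, yTolerance = 2): string {
  const gutter = findGutter(items, minGap);
  if (gutter === null) return toLines(items, yTolerance).join("\n");
  const left = items.filter((it) => it.x + it.width <= gutter);
  const right = items.filter((it) => it.x >= gutter);
  return [...toLines(left, yTolerance), ...toLines(right, yTolerance)].join("\n");
}
```

On a two-column page, sorting every fragment top to bottom would interleave the columns row by row; this sketch returns the whole left column first. A full-width heading that crosses the gutter would defeat the single-gutter check, which is one reason real layout parsers need more elaborate rules.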
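The OCR fallback can be sketched as a per-page decision. This version assumes, hypothetically, that a page whose embedded text layer yields fewer than minChars non-whitespace characters is image-based; LiteParse’s real criterion may differ. The extraction and OCR functions are injected, so in a browser they might wrap PDF.js (page.getTextContent()) and Tesseract.js (worker.recognize()), while tests can pass stubs. parsePages, PageTextFn, PageResult, and the threshold are illustrative names, not LiteParse’s API.

```typescript
// Hypothetical signatures: in a browser app, extractText might wrap PDF.js
// (page.getTextContent()) and ocr might wrap Tesseract.js (worker.recognize()).
// Page numbers are 1-based, matching PDF.js getPage().
type PageTextFn = (pageNumber: number) => Promise<string>;

interface PageResult {
  page: number;
  text: string;
  source: "text-layer" | "ocr";
}

// Parse pages in order, falling back to OCR when the embedded text layer is
// nearly empty (an assumed sign of a scanned, image-only page).
async function parsePages(
  pageCount: number,
  extractText: PageTextFn,
  ocr: PageTextFn,
  minChars = 20 // hypothetical threshold
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const text = await extractText(page);
    if (text.replace(/\s+/g, "").length >= minChars) {
      results.push({ page, text, source: "text-layer" });
    } else {
      results.push({ page, text: await ocr(page), source: "ocr" });
    }
  }
  return results;
}
```

OCR is far slower than reading an embedded text layer, so deciding page by page means only the pages that need it pay that cost.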

Why It Matters

  • A working, privacy-safe, zero-cost browser PDF parser now exists that any developer can fork or embed without a server or API key.
  • The build log is a concrete benchmark: one developer, one 59-minute Claude Code session, red/green TDD, small commits, cross-browser QA, and a deployed GitHub Pages app.
  • Willison defines “vibe coding” by whether the developer reviews the output, not by whether AI wrote it; he argues that low-blast-radius static tools tolerate that tradeoff better than server-side code does.

Simon Willison, Simon Willison’s Weblog · 2026-04-23