A custom parser, not a wrapper
A from-scratch Rust PDF engine with its own tokenizer, font decoding, and CMap/ToUnicode handling — no pdfium, no poppler, no native bindings.
Dongler is a from-scratch Rust engine that turns PDFs and 15+ other formats into clean Markdown, LaTeX, and typed JSON. It runs locally, in milliseconds, from Python, TypeScript, Rust, or the CLI.
import dongler
doc = dongler.load("report.pdf")
print(doc.to_markdown())
pip install dongler
One API, identical output across every binding.
Load a path once. The engine parses, lays out, and structures the document, then renders it in the format your pipeline needs.
PDF and 15+ formats — born-digital or messy.
Parse · font metrics · reading order · tables.
Markdown, LaTeX, or a typed JSON document.
No cloud, no OCR fallback by default, no third-party PDF runtime — just a purpose-built Rust core measured against real benchmarks.
Glyph boxes derived from real font ascent/descent and the text matrix, rotation-aware, so geometry stays tight under scaling and /Rotate.
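To make the geometry concrete, here is a minimal sketch of how a glyph box could be derived from font metrics and a text matrix. It is an illustration under stated assumptions, not Dongler's internal code: `Matrix`, `glyph_quad`, and `envelope` are hypothetical names, metrics are assumed to be in 1/1000 em units, and `tm` stands for an already-combined text rendering matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matrix:
    """PDF-style affine matrix [a b c d e f] (hypothetical helper)."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x, y):
        # Maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
        return (self.a * x + self.c * y + self.e,
                self.b * x + self.d * y + self.f)


def glyph_quad(advance, ascent, descent, font_size, tm):
    """Four corners of one glyph box in page space.

    advance, ascent, and descent are in 1/1000 em units;
    descent is normally negative.
    """
    width = advance * font_size / 1000
    top = ascent * font_size / 1000
    bottom = descent * font_size / 1000
    corners = [(0, bottom), (width, bottom), (width, top), (0, top)]
    return [tm.apply(x, y) for x, y in corners]


def envelope(quad):
    """Axis-aligned bounding box (x0, y0, x1, y1) around a quad."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


# Upright text placed at (100, 200), and the same glyph rotated 90 degrees.
upright = Matrix(1, 0, 0, 1, 100, 200)
rotated = Matrix(0, 1, -1, 0, 100, 200)
upright_box = envelope(glyph_quad(500, 800, -200, 10, upright))
rotated_box = envelope(glyph_quad(500, 800, -200, 10, rotated))
```

Because the corners are transformed before the envelope is taken, the box follows the glyph under rotation instead of assuming horizontal text.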
Multi-column layouts are detected and re-sequenced into natural reading order instead of raw stream order.
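The idea of re-sequencing columns can be sketched in a few lines. This is a simplified illustration, not the engine's actual algorithm: `Block` and `reading_order` are hypothetical names, the `min_gap` threshold is an assumed parameter, and the coordinate system is assumed to have y growing downward.

```python
from typing import NamedTuple


class Block(NamedTuple):
    text: str
    x0: float  # left edge
    y0: float  # top edge (assumption: y grows downward)
    x1: float  # right edge


def reading_order(blocks, min_gap=10.0):
    """Group blocks into columns by horizontal gaps, then read each
    column top to bottom, columns left to right."""
    columns = []  # each entry is [right_edge, [blocks in that column]]
    for block in sorted(blocks, key=lambda b: b.x0):
        if columns and block.x0 < columns[-1][0] + min_gap:
            # Overlaps or nearly touches the current column: merge into it.
            columns[-1][0] = max(columns[-1][0], block.x1)
            columns[-1][1].append(block)
        else:
            # A clear horizontal gap starts a new column.
            columns.append([block.x1, [block]])
    ordered = []
    for _, members in columns:
        ordered.extend(sorted(members, key=lambda b: b.y0))
    return ordered


# Stream order interleaves the two columns row by row.
stream = [
    Block("A", 50, 100, 280),
    Block("B", 320, 100, 550),
    Block("C", 50, 300, 280),
    Block("D", 320, 300, 550),
]
ordered_text = [b.text for b in reading_order(stream)]
```

A full-width heading would bridge the gap and collapse everything into one column in this sketch, which then falls back to plain top-to-bottom order; a production layout pass would need to handle that case separately.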
Ruled, aligned, and implied tables are recovered into a real cell grid — including merged column headers.
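For the simplest of these cases, an aligned table with no ruling lines, grid recovery can be sketched as clustering word positions into row and column anchors. This is a toy illustration only: `Word`, `cluster`, and `build_grid` are hypothetical names, the tolerances are assumed values, and merged headers and ruled tables are deliberately out of scope here.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x0: float  # left edge
    y: float   # baseline position


def cluster(values, tol):
    """Group 1-D values whose neighbours differ by at most tol;
    return the mean of each group."""
    centers, group = [], []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centers.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        centers.append(sum(group) / len(group))
    return centers


def nearest(centers, value):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def build_grid(words, x_tol=5.0, y_tol=3.0):
    """Place each word into a row/column cell based on alignment."""
    rows = cluster([w.y for w in words], y_tol)
    cols = cluster([w.x0 for w in words], x_tol)
    grid = [["" for _ in cols] for _ in rows]
    for w in sorted(words, key=lambda w: w.x0):
        r, c = nearest(rows, w.y), nearest(cols, w.x0)
        grid[r][c] = (grid[r][c] + " " + w.text).strip()
    return grid


words = [
    Word("Name", 50, 100), Word("Qty", 200, 100),
    Word("Widget", 50, 120), Word("4", 201, 120),
    Word("Gadget", 51, 140), Word("12", 200, 140),
]
grid = build_grid(words)
```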
Accuracy is measured with TEDS, GriTS, CER/WER, edit-similarity, and bbox IoU across a 1,400-PDF benchmark suite.
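Three of these metrics are simple enough to sketch directly. The functions below follow the standard textbook definitions of CER, WER, and box IoU; they are not taken from Dongler's benchmark harness, and the function names are illustrative. TEDS and GriTS compare table structure and are not covered here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: edits per reference word."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

For example, `cer("kitten", "sitting")` is 0.5 (three edits over six reference characters), and two 2×2 boxes offset by one unit in each direction have an IoU of 1/7.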
No hosted service, API key, or model dependency in the default path. The same input always yields the same structured output.
The Python and TypeScript packages are thin wrappers over the Rust core, so every binding returns the identical document model.
pip install dongler
npm install @cristianexer/dongler
cargo add dongler-core
cargo install dongler
Install, point it at a document, and read structured output back.