We built LiteParse, the fastest document parsing solution on the planet, and made it open source.
And it just hit 10k GitHub stars. 🦙
Fast to run. Fast to love.
Thanks for building with us. If you haven't tried it already, repo at: github.com/run-llama/lite…
LlamaIndex 🦙
The world's best AI Document OCR
LlamaParse: cloud.llamaindex.ai
Docs: developers.llamaindex.ai/python/cloud/
Joined December 2022
- Bring your hot takes and your drop shots. 🔥 Next week, we're co-hosting The Agent Open: an afternoon of pickleball, food, drinks, and industry-leading speakers who ace their code and commit to their serves. 🏓 Catch our co-founder and CEO, Jerry Liu, after Day 2 of AIE World's
- LlamaIndex 🦙 reposted: We parsed this SpaceX equity research PDF faster than the time it took for Screen Studio to zoom in ⚡️🔥 liteparse is now the best open-source document parsing tool out there. There’s no reason to not use it as a first pass, even if you do have docs that require heavier VLM
- We built the fastest PDF -> markdown parser in the world 🚀⚡️ AND it’s more accurate than any other open-source, model-free parser (pymupdf4llm, opendataloader, pdf-inspector, markitdown) on 3 standardized benchmarks: olmOCR0-bench, opendataloader-bench, ParseBench Introducing
- LlamaIndex 🦙 reposted: As agents are generating more and more documents, they need a better agent-native document format 🤖📄 So far the two main containers are markdown and HTML: 1️⃣ Markdown: Easily readable/reviewable by humans, but lacks rich visual output/interactivity 2️⃣ HTML: Provides richer
- LlamaIndex 🦙 reposted: It's kind of crazy how well LiteParse does on markdown document parsing even compared against frontier VLMs - when it doesn't use VLMs or any AI/OCR models at all. It's pure code. On ParseBench, it outperforms Qwen 3.5-9B / GLM-OCR. There's still a gap vs. models like Gemma 4
- LiteParse v2.1 is here, and it's bringing the fastest markdown output possible. In this release, we are fulfilling our top request: markdown output. But in the spirit of "lite"-ness, we are doing this completely LLM-free and fast. Not only is it fast, it also beats all other
- ~90% of enterprise data is unstructured, locked in the documents that power the majority of knowledge work. The next massive frontier? AI agents that can deeply understand, reason over, and edit these files at scale to automate entire workflows. Our CEO, Jerry Liu, is speaking at
- Vector databases or pure grep? Teams are split on the right retrieval architecture for agents. The reality? You need both. Semantic search for a fast first pass; grep and file reads for surgical precision when top-k chunks cut off mid-answer. On June 29, our Head of
- Our CEO Jerry Liu is joining founders from LangChain, CrewAI, and others at @databricks #DataAISummit today for a panel on The Agentic Stack: what the stack looks like, where it's headed, and what happens when agents become the primary consumers of infrastructure, not
- How much can good documentation save an AI agent in cost and time? Turns out, a lot. We built a custom skill that teaches Claude how to parse PDFs more efficiently, then used real usage traces to find where it was wasting time and money (re-reading the same file over and over,
- Contracts are where business commitments live, but most organizations still manage them manually, searching PDFs for renewal dates, chasing down payment terms, and hoping nothing slips through the cracks. The problem isn't just volume. Legacy OCR treats contracts like flat text,
- We're headed back to @databricks #DataAISummit to parse your PDFs next week 🦙 Catch our co-founder & CEO @jerryjliu0 twice: 📄 Automating Document Work with Long-Horizon AI Agents — databricks.com/dataaisummit/s… 🧱 The Agentic Stack: founder panel with LangChain, CrewAI, Agno +











