LlamaIndex 🦙

3,917 posts

@llama_index

The world's best AI Document OCR LlamaParse: cloud.llamaindex.ai Docs: developers.llamaindex.ai/python/cloud/

Joined December 2022

LlamaIndex 🦙
@llama_index
16h
We built LiteParse, the fastest document parsing solution on the planet, and made it open source. And it just hit 10k GitHub stars. 🦙 Fast to run. Fast to love. Thanks for building with us. If you haven't tried it already, the repo is at: github.com/run-llama/lite…
2.8K
LlamaIndex 🦙
@llama_index
Jun 22
Bring your hot takes and your drop shots. 🔥 Next week, we're co-hosting The Agent Open: an afternoon of pickleball, food, drinks, and industry-leading speakers who ace their code and commit to their serves. 🏓 Catch our co-founder and CEO, Jerry Liu, after Day 2 of AIE World's
The Agent Open: AI's pickleball tournament · Luma
From luma.com
6.7K
LlamaIndex 🦙 reposted
Jerry Liu
@jerryjliu0
Jun 21
We parsed this SpaceX equity research PDF faster than the time it took for Screen Studio to zoom in ⚡️🔥 liteparse is now the best open-source document parsing tool out there. There’s no reason to not use it as a first pass, even if you do have docs that require heavier VLM
Jerry Liu
@jerryjliu0
Jun 18
We built the fastest PDF -> markdown parser in the world 🚀⚡️ AND it’s more accurate than any other open-source, model-free parser (pymupdf4llm, opendataloader, pdf-inspector, markitdown) on 3 standardized benchmarks: olmOCR-bench, opendataloader-bench, ParseBench Introducing
42K
LlamaIndex 🦙 reposted
Jerry Liu
@jerryjliu0
Jun 20
As agents are generating more and more documents, they need a better agent-native document format 🤖📄 So far the two main containers are markdown and HTML: 1️⃣ Markdown: Easily readable/reviewable by humans, but lacks rich visual output/interactivity 2️⃣ HTML: Provides richer
17K
LlamaIndex 🦙 reposted
Jerry Liu
@jerryjliu0
Jun 19
It's kind of crazy how well LiteParse does on markdown document parsing even compared against frontier VLMs - when it doesn't use VLMs or any AI/OCR models at all. It's pure code. On ParseBench, it outperforms Qwen 3.5-9B / GLM-OCR. There's still a gap vs. models like Gemma 4
20K
LlamaIndex 🦙 reposted
Jerry Liu
@jerryjliu0
Jun 18
We built the fastest PDF -> markdown parser in the world 🚀⚡️ AND it’s more accurate than any other open-source, model-free parser (pymupdf4llm, opendataloader, pdf-inspector, markitdown) on 3 standardized benchmarks: olmOCR-bench, opendataloader-bench, ParseBench Introducing
LlamaIndex 🦙
@llama_index
Jun 18
LiteParse v2.1 is here, and it's bringing the fastest markdown output possible. In this release, we are fulfilling our top request: markdown output. But in the spirit of "lite"-ness, we are doing this completely LLM-free and fast. Not only is it fast, it also beats all other
319K
LlamaIndex 🦙
@llama_index
Jun 18
~90% of enterprise data is unstructured, locked in the documents that power the majority of knowledge work. The next massive frontier? AI agents that can deeply understand, reason over, and edit these files at scale to automate entire workflows. Our CEO, Jerry Liu, is speaking at
5.9K
LlamaIndex 🦙
@llama_index
Jun 17
Vector databases or pure grep? Teams are split on the right retrieval architecture for agents. The reality? You need both. Semantic search for a fast first pass; grep and file reads for surgical precision when top-k chunks cut off mid-answer. On June 29, our Head of
13K
LlamaIndex 🦙
@llama_index
Jun 17
Our CEO Jerry Liu is joining founders from LangChain, CrewAI, and others at @databricks #DataAISummit today for a panel on The Agentic Stack: what the stack looks like, where it's headed, and what happens when agents become the primary consumers of infrastructure, not
2.9K
LlamaIndex 🦙
@llama_index
Jun 16
How much can good documentation save an AI agent in cost and time? Turns out, a lot. We built a custom skill that teaches Claude how to parse PDFs more efficiently, then used real usage traces to find where it was wasting time and money (re-reading the same file over and over,
13K
LlamaIndex 🦙
@llama_index
Jun 15
Contracts are where business commitments live, but most organizations still manage them manually, searching PDFs for renewal dates, chasing down payment terms, and hoping nothing slips through the cracks. The problem isn't just volume. Legacy OCR treats contracts like flat text,
11K
LlamaIndex 🦙
@llama_index
Jun 12
We're headed back to @databricks #DataAISummit to parse your PDFs next week 🦙 Catch our co-founder & CEO @jerryjliu0 twice: 📄 Automating Document Work with Long-Horizon AI Agents — databricks.com/dataaisummit/s… 🧱 The Agentic Stack: founder panel with LangChain, CrewAI, Agno +
4.5K