Hrishi (@hrishioa) / X

4,189 posts

Hrishi

@hrishioa

Trying to build systems of lasting value at Southbridge.ai. Previously CTO, Greywing (YC W21). Chop wood carry water.

Long form thoughts 🫱

Joined June 2013

Pinned
Hrishi
@hrishioa
Mar 20
I wasn't sure if we were going to share this, because knowing what doesn't work is often more valuable than seeing what worked. That - and being nervous about sharing your failures. Here's a technical retrospective on our 2025:
RISC won: Building towards Data AGI
From southbridge.ai
4.3K
Hrishi
@hrishioa
Jul 12, 2025
Kimi K2 is genuinely impressive. On the same tasks and the same agentic harness, it beats Grok 4 one-on-one. It also looks like it does this without CoT or thinking tokens.
GitHub - MoonshotAI/Kimi-K2: Kimi K2 is the large language model series developed by Moonshot AI...
From github.com
278K
Hrishi
@hrishioa
Jul 13, 2025
Kimi is the real deal. Unless it's really Sonnet in a trench coat, this is the best agentic open-source model I've tested - BY A MILE. Here's a slice* of a 4 HOUR run (~1 second per minute) with not much more than 'keep going' from me every 90 minutes or so. The task involved
211K
Hrishi
@hrishioa
Dec 11, 2023
A week ago one of our customers handed us 1000 pages of this (10,000 more to come), and asked us for a RAG solution. We said yes - because we said yes before we saw the document. But we've solved it - and there's a chance it's a strong improvement on all RAG SoTA.
455K
Hrishi
@hrishioa
Dec 13, 2023
No more waiting - we finally have a demo for multi-modal, 'walking' RAG! Still blows my mind - this is an AI that's reading complex diagrams in a document like a human, 'looking' at pages, then 'walking' to more relevant pages until it's found an answer. More details below
392K
Hrishi
@hrishioa
Dec 10, 2023
How many years until open source models take over? Is it never? I've been manually testing 20-30 different models all claiming impressive scores on benchmarks against OpenAI and Anthropic. What I've found: * Super tiny models are becoming insanely good * Medium models are
470K
Hrishi
@hrishioa
Dec 29, 2023
How exactly do Language Models perceive time? This is one of the best papers I've read this year (from Kai Nylund, @ssgrn, @nlpnoah), and here's what it suggests (IMO) 👇
222K
Hrishi
@hrishioa
Nov 28, 2023
This is genuinely blowing my mind - four years of everything we've done at Greywing, finished in 60 seconds. The rest is just me fooling around. Before you ask, it's not the Assistants API - that's why we have interactive charts, abort, <200ms latency.
453K
Hrishi
@hrishioa
Jun 1, 2025
I decompiled Claude Code from just the minified code. Took me 8-10 hours, multiple subagents, and every flagship model from every provider. Holy shit there's a lot in there. Claude Code is NOT just Claude in a loop - there's so much to learn from.
Claude Code: An analysis | Notion
From southbridge-research.notion.site
130K
Hrishi
@hrishioa
Jan 19, 2024
This is scary - ETL pipelines and ORMs are likely going away - or at least I shouldn't be getting paid for doing them anymore. This is AI generating thousands of lines of typespecs and DDLs (with no more context than the dataset), and somehow it's all 100% correct. Rant?👇
166K
Hrishi
@hrishioa
Mar 24, 2024
github.com/hrishioa/lumen… leave this here
142K
Hrishi
@hrishioa
Jun 21, 2025
Another way to make Claude Code a 10x engineer for a complex change: 1. Make a plan for the change (if you need it) with Gemini. 2. Open a new branch. 3. Ask Claude to implement the change and maintain a scratchpad.md that is an APPEND-ONLY log with gotchas, judgement
140K
Hrishi
@hrishioa
Nov 3, 2023
Mindblown - this is a 7B local model with 128K context combining Metamorphosis with The Last Question to write a new story using just 10 GB of RAM. Even two months ago this would be unfathomable. Next is to try 20k tokens of SQL DDLs, for complex data (Model below)
134K
Hrishi
@hrishioa
Oct 15, 2024
Turns out I was wrong. Gemini is 30x cheaper for transcription (same quality) if you prompt right and segment to stay under 128k. So how good is it? It's crazy for clean audio (source+code in 🧵) AssemblyAI: 92.06% ($0.21) Flash-002: 92.68% ($0.00679) 🤯 Let me say more 👇
137K
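As a quick sanity check on the "30x cheaper" figure, the arithmetic can be sketched from the two numbers quoted in the post. The dictionary layout and the `cost_ratio` helper are illustrative assumptions, not part of the original thread or its linked code; only the accuracy and cost values come from the post.

```python
# Figures quoted in the post (accuracy in percent, cost in USD).
# The structure and names below are hypothetical, for illustration only.
results = {
    "AssemblyAI": {"accuracy": 92.06, "cost_usd": 0.21},
    "Flash-002": {"accuracy": 92.68, "cost_usd": 0.00679},
}


def cost_ratio(baseline: str, candidate: str) -> float:
    """Return how many times cheaper the candidate is than the baseline."""
    return results[baseline]["cost_usd"] / results[candidate]["cost_usd"]


ratio = cost_ratio("AssemblyAI", "Flash-002")
accuracy_gap = results["Flash-002"]["accuracy"] - results["AssemblyAI"]["accuracy"]

print(f"{ratio:.1f}x cheaper, accuracy gap {accuracy_gap:+.2f} points")
```

With the quoted costs the ratio comes out at roughly 31x, which is consistent with the "30x" claim, while the accuracy difference is well under one percentage point.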