As AI becomes more capable, the question is not only who builds the best models.
It is who gets to build them at all.
A founder’s view from @eisokant on where Poolside stands today.
another banger from @pupposandro and the @luceboxai team
Luce Spark runs Laguna XS.2 in 14.6 GiB at ~100 tok/s on an RTX 3090, versus ~119 tok/s fully resident.
you can now run Laguna below the 16 GiB line and use it for local evals, agent traces, routing analysis,
Excited to launch Luce Spark: now a 35B MoE runs on a 16GB GPU, with no offload tax.
An A3B model fires ~8 of its 256 experts per token, but to keep it resident you pay VRAM for all 256. Spark pins the experts your traffic actually hits, offloads the rest to CPU, and decodes the
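A rough sketch of the pinning idea in the post above, assuming a hypothetical routing trace (one list of routed expert IDs per token) and a fixed budget of GPU-resident experts. The function names and trace format are illustrative only; this is not Luce Spark's actual implementation.

```python
from collections import Counter


def pick_pinned_experts(routing_trace, budget):
    """Return the `budget` most frequently routed expert IDs.

    routing_trace: iterable of per-token lists of expert IDs (hypothetical format).
    budget: how many experts fit in VRAM (an assumed, user-chosen number).
    """
    counts = Counter(e for token_experts in routing_trace for e in token_experts)
    return {expert for expert, _ in counts.most_common(budget)}


def hit_rate(routing_trace, pinned):
    """Fraction of expert activations served by the pinned (resident) set."""
    total = 0
    hits = 0
    for token_experts in routing_trace:
        for e in token_experts:
            total += 1
            hits += e in pinned
    return hits / total if total else 0.0


# Toy trace: 4 tokens, 3 experts routed per token.
trace = [[1, 2, 3], [1, 2, 7], [1, 9, 2], [1, 2, 3]]
pinned = pick_pinned_experts(trace, budget=2)
print(pinned, hit_rate(trace, pinned))
```

The higher the hit rate on real traffic, the fewer activations have to fall back to the CPU-offloaded experts.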
Just finished reading the latest technical report released by @poolsideai for Laguna. It is so well written and information-dense, covering all stages of a large-scale training run. Each decision and assumption was clearly explained and concisely referenced.
The Model Factory
Love seeing the work @RedHat_AI and @vllm_project are doing to make Laguna XS.2 easier to run.
Red Hat AI trained a DFlash speculator: a 0.6B drafter that predicts 8 tokens per pass, with Laguna verifying the output.
So builders get faster generation without changing the output.
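The drafter-plus-verifier setup can be illustrated with a generic greedy speculative decoding step. This is a minimal sketch, not the DFlash method itself: `target_next_token` is a hypothetical stand-in for the large model's greedy choice, and a real engine would score all drafted positions in one batched forward pass rather than one call per token.

```python
def verify_draft(draft_tokens, target_next_token, prefix):
    """Greedy speculative verification (generic sketch).

    draft_tokens: tokens proposed by the small drafter.
    target_next_token: callable(list[int]) -> int, the target model's greedy
        next token for a context (hypothetical stand-in).
    prefix: tokens generated so far.

    Returns the tokens to append. The result matches what the target alone
    would have produced greedily, which is why output is unchanged.
    """
    accepted = []
    context = list(prefix)
    for tok in draft_tokens:
        expected = target_next_token(context)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # Every draft token matched; the target contributes one extra token.
    accepted.append(target_next_token(context))
    return accepted


# Toy target model: always predicts (last token + 1) mod 10.
def toy_target(ctx):
    return (ctx[-1] + 1) % 10


print(verify_draft([1, 2, 5, 6], toy_target, prefix=[0]))
print(verify_draft([1, 2], toy_target, prefix=[0]))
```

In this sketch the speedup comes from accepting several drafted tokens per target pass; a wrong draft costs time but never changes the result.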
Laguna XS.2 from @poolsideai is a 33B MoE built for agentic coding.
Red Hat AI trained a DFlash speculator for it: 0.6B drafter, 8 tokens per pass, no quality loss.
FP8, NVFP4, and INT4 checkpoints via LLM Compressor.
Models in comments. Speedup with @vllm_project:
Super comprehensive writeup that covers many frameworks & case studies on async RL. I learned a lot from the discussion of adding bias to the objective and how techniques that introduce bias (e.g., TIS + CISPO) help stabilize smaller batches but scale more poorly.
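For readers unfamiliar with TIS (truncated importance sampling), here is a minimal sketch of the weight computation, assuming per-token log-probabilities from the trainer and from the inference engine are available. The function name and the cap value are illustrative and not taken from the blog.

```python
import math


def tis_weights(train_logprobs, infer_logprobs, cap=2.0):
    """Per-token truncated importance weights: min(pi_train / pi_infer, cap).

    Capping the ratio bounds the variance of the gradient estimate, at the
    cost of bias whenever the raw ratio exceeds the cap.
    """
    return [
        min(math.exp(lt - li), cap)
        for lt, li in zip(train_logprobs, infer_logprobs)
    ]


# Token 1: both policies agree (ratio 1). Token 2: ratio 9, truncated to 2.
w = tis_weights(
    [math.log(0.5), math.log(0.9)],
    [math.log(0.5), math.log(0.1)],
    cap=2.0,
)
print(w)
```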
New blog! Is frontier asynchronous RL solved?
The blog covers Async RL theory and infrastructure, surveying 8 open-weight frontier labs for the algorithmic techniques and systems fixes to handle train-inference mismatch. Also answered: why do current methods still fail at high
this week I was at the @poolsideai talk hosted by @CrusoeAI and heard @varunrandery discuss what he calls the "agent API."
tl;dr: we stop sending text and getting text back, and start sending a unit of work and getting the finished result back. technically it's clean, and I liked
What a weekend. Around 30 teams showed up to build on Laguna XS.2, and the bar was very, very high.
Winners below 🏆
1st: Overthinking Machines Labs
@emilfristed
Pseudo-full-duplex with text-only models through dialogue modeling with silence tokens.
huggingface.co/spaces/poolsid…
Loving the @latentspacepod breakdown of our Laguna M.1/XS.2 Technical Report! The Latent Space paper club just did a deep dive, and their takeaways perfectly capture what we set out to build with our Model Factory. A few quotes from the video 🧵👇 (1/6)