Together AI (@togethercompute) / X

Together AI

2,799 posts

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA

Joined November 2022

Pinned
Together AI
@togethercompute
Jun 17
Closed-source models aren't worth the premium. We generated 12 landing pages with Kimi K2.7 Code and Claude Fable 5. Kimi came in 16x cheaper with comparable quality, especially once we gave it visual context through a design MCP server. Open-source models are already a
Hassan
@nutlope
Jun 17
Article
Kimi K2.7 Code vs Claude Fable 5: Landing pages that cost 94% less
We ran an experiment where we had Kimi K2.7 Code and Claude Fable 5 each produce 12 landing pages for a side‑by‑side comparison. Overall, Kimi K2.7 Code cost about 94% less (16x less) than Fable 5 and...
18K
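The post above quotes the same saving two ways: "16x cheaper" and "94% less". A minimal sketch of how the two figures relate, assuming "16x cheaper" means the cheaper run cost 1/16 of the pricier one; the function names are hypothetical and not from the article:

```python
def multiple_to_percent_saved(cost_multiple: float) -> float:
    """Percent saved when the cheaper option costs 1/cost_multiple of the pricier one."""
    if cost_multiple <= 0:
        raise ValueError("cost_multiple must be positive")
    return (1.0 - 1.0 / cost_multiple) * 100.0


def percent_saved_to_multiple(percent_saved: float) -> float:
    """Inverse conversion: how many times cheaper a given percent saving implies."""
    if not 0 <= percent_saved < 100:
        raise ValueError("percent_saved must be in [0, 100)")
    return 1.0 / (1.0 - percent_saved / 100.0)


if __name__ == "__main__":
    # 16x cheaper works out to 93.75% less, which rounds to the quoted 94%.
    print(multiple_to_percent_saved(16))  # 93.75
    # A saving of exactly 94% would correspond to roughly 16.7x.
    print(round(percent_saved_to_multiple(94), 1))
```

Under that reading the two numbers agree after rounding: 16x corresponds to 93.75%, and "about 94%" is the rounded form.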
Together AI
@togethercompute
18h
This is what open-model tokenomics look like in production. When teams are running billions of tokens, small differences in caching, throughput, and serving efficiency become product-level economics. MiniMax M3 on Together AI is a strong example: frontier-adjacent quality,
Julian Pscheid
@JulianPscheid
Jun 17
M3 by @MiniMax_AI is the best value in AI. At @HedyAI_ we run close to a billion tokens through the model each day, and @togethercompute's input caching brings our cost down to $0.128/million input tokens. For a model that is close to the frontier in intelligence and the
5.8K
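For a rough sense of scale, the quoted rate can be turned into a daily figure. This is a back-of-the-envelope sketch only: it assumes, purely for illustration, that all of the roughly one billion daily tokens are input tokens billed at the quoted $0.128 per million. The post does not give the input/output split or the cache hit rate, so the function and its inputs are hypothetical.

```python
def daily_input_cost_usd(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Daily cost for a token volume at a flat per-million-token price."""
    if tokens_per_day < 0 or usd_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Assumption: 1e9 input tokens per day at the quoted $0.128 per million.
    cost = daily_input_cost_usd(1_000_000_000, 0.128)
    print(f"${cost:.2f} per day")  # $128.00 per day
```

Output tokens and uncached input would be priced differently, so an actual bill would not match this number.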
Together AI
@togethercompute
21h
🤝
ollama
@ollama
Jun 17
let's go open models! ❤️
2K
Together AI
@togethercompute
23h
Open models are what make collective agent intelligence possible. James Zou from Together AI and Venkat Srinivasan from NVIDIA are joining us July 1 at AI Engineer World's Fair to dig into exactly that.
1.5K
Together AI
@togethercompute
23h
Link to attend:
What open agents can actually do: a happy hour with Together AI & NVIDIA · Luma
From luma.com
956
Together AI reposted
Hassan
@nutlope
Jun 18
Replying to @nutlope
GLM 5.2 is available now on @togethercompute! Very fast speeds (200+ tps), try it out & let me know what you think! Video is not sped up. api.together.ai/playground/zai…
20K
Together AI
@togethercompute
Jun 18
Introducing GLM-5.2 from @Zai_org, Z.ai’s latest flagship open model for long-horizon tasks with 1M context, flexible thinking effort, and stronger agentic coding. Now available on Together AI, GLM-5.2 runs on research-powered inference for long-context,
9.2K
Together AI
@togethercompute
Jun 18
Highlights:
👉 1M context built to sustain long-horizon work
👉 Stronger coding with flexible effort levels to balance latency and depth
👉 Improved architecture with IndexShare, reducing per-token FLOPs 2.9x at 1M context
👉 MIT-licensed open weights for broad technical access
1.3K
Together AI
@togethercompute
Jun 18
GLM-5.2 is live on Together AI. Try it now:
GLM-5.2 API | Together AI
From together.ai
1.2K
Together AI reposted
Vipul Ved Prakash
@vipulved
Jun 17
Just added gobs of H100s, H200s and B200s on our on-demand compute platform.
NVIDIA GPU Clusters: H100, H200, B200, GB200 | Together AI
From together.ai
2.4K
Together AI reposted
Decagon
@DecagonAI
Jun 16
🤝
Together AI
@togethercompute
Jun 16
Article
How Decagon Engineered Sub-Second Voice AI with Together AI
The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...
2.9K
Together AI
@togethercompute
Jun 16
We tested closed and open models by asking them to build small, playable games. Open models were much cheaper and faster, while producing games that were often close in quality.
→ Opus 4.8 was 15x more expensive than MiniMax M3
→ GPT-5.5 was 10x more expensive than Nemotron
7.9K
Together AI reposted
Hassan
@nutlope
Jun 16
Built a visual benchmark where I asked closed and open source models to build small games. Main takeaway: OSS models were a lot faster, cheaper, & produced games with similar quality. Specifically:
* Opus 4.8 was 15x more expensive than MiniMax M3
* GPT-5.5 was 10x more
10K
Together AI
@togethercompute
Jun 16
.@DecagonAI cut voice agent cost per turn nearly 6x with Together AI. They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice:
→ <400ms p95 model latency per turn
→ custom speculators and prompt caching
→ optimized
Together AI
@togethercompute
Jun 16
Article
How Decagon Engineered Sub-Second Voice AI with Together AI
The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...
5K
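The "<400ms p95" figure in the post above is a percentile target: 95% of turns should finish within the budget. A minimal sketch of checking such a target with the nearest-rank method follows. The latency samples are invented for illustration and are not Decagon's data, and the post does not say which percentile method was used.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-turn model latencies in milliseconds (illustrative only).
latencies_ms = [210, 250, 180, 390, 220, 240, 205, 260, 310, 230,
                275, 215, 235, 245, 520, 225, 255, 265, 285, 295]

if __name__ == "__main__":
    p95 = percentile_nearest_rank(latencies_ms, 95)
    print(p95)        # 390
    print(p95 < 400)  # True: this sample set meets a 400 ms p95 budget
```

Note that one slow turn (520 ms here) does not break a p95 target on its own. That is why tail percentiles, rather than averages or maxima, are the usual way to state a latency budget.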
Together AI
@togethercompute
Jun 15
Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput
The bigger win was in the indexer. Once we fixed that, the rest made
15K
Together AI
@togethercompute
Jun 15
Try GLM 5.1 today:
GLM-5.1 API | Together AI
From together.ai
1.9K