Together AI
2,799 posts
Together AI
@togethercompute
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
San Francisco, CA
together.ai
Joined November 2022
375 Following
56.5K Followers
  • Pinned
    Together AI
    @togethercompute
    Jun 17
    Closed-source models aren't worth the premium. We generated 12 landing pages with Kimi K2.7 Code and Claude Fable 5. Kimi came in 16x cheaper with comparable quality, especially once we gave it visual context through a design MCP server. Open-source models are already a
    Hassan
    Together AI
    @nutlope
    Jun 17
    Article
    Kimi K2.7 Code vs Claude Fable 5: Landing pages that cost 94% less
    We ran an experiment where we had Kimi K2.7 Code and Claude Fable 5 each produce 12 landing pages for a side‑by‑side comparison. Overall, Kimi K2.7 Code cost about 94% less (16x less) than Fable 5 and...
    18K
  • Together AI
    @togethercompute
    18h
    This is what open-model tokenomics look like in production. When teams are running billions of tokens, small differences in caching, throughput, and serving efficiency become product-level economics. MiniMax M3 on Together AI is a strong example: frontier-adjacent quality,
    Julian Pscheid
    @JulianPscheid
    Jun 17
    M3 by @MiniMax_AI is the best value in AI. At @HedyAI_ we run close to a billion tokens through the model each day, and @togethercompute's input caching brings our cost down to $0.128/million input tokens. For a model that is close to the frontier in intelligence and the
    5.8K
  • Together AI
    @togethercompute
    21h
    🤝
    ollama
    @ollama
    Jun 17
    let's go open models! ❤️
    2K
  • Together AI
    @togethercompute
    23h
    Open models are what make collective agent intelligence possible. James Zou from Together AI and Venkat Srinivasan from NVIDIA are joining us July 1 at AI Engineer World's Fair to dig into exactly that.
    1.5K
    Together AI
    @togethercompute
    23h
    Link to attend:
    What open agents can actually do: a happy hour with Together AI & NVIDIA · Luma
    From luma.com
    956
  • Together AI reposted
    Hassan
    Together AI
    @nutlope
    Jun 18
    Replying to @nutlope
    GLM 5.2 is available now on @togethercompute! Very fast speeds (200+ tps), try it out & let me know what you think! Video is not sped up. api.together.ai/playground/zai…
    20K
  • Together AI
    @togethercompute
    Jun 18
    Introducing GLM-5.2 from @Zai_org, Z.ai’s latest flagship open model for long-horizon tasks with 1M context, flexible thinking effort, and stronger agentic coding. Now available on Together AI, GLM-5.2 runs on research-powered inference for long-context,
    9.2K
    Together AI
    @togethercompute
    Jun 18
    Highlights: 👉 1M context built to sustain long-horizon work 👉 Stronger coding with flexible effort levels to balance latency and depth 👉 Improved architecture with IndexShare, reducing per-token FLOPs 2.9x at 1M context 👉 MIT-licensed open weights for broad technical access
    1.3K
    Together AI
    @togethercompute
    Jun 18
    GLM-5.2 is live on Together AI. Try it now:
    GLM-5.2 API | Together AI
    From together.ai
    1.2K
  • Together AI reposted
    Vipul Ved Prakash
    Together AI
    @vipulved
    Jun 17
    Just added gobs of H100s, H200s and B200s on our on-demand compute platform.
    NVIDIA GPU Clusters: H100, H200, B200, GB200 | Together AI
    From together.ai
    2.4K
  • Together AI reposted
    Decagon
    @DecagonAI
    Jun 16
    🤝
    Together AI
    @togethercompute
    Jun 16
    Article
    How Decagon Engineered Sub-Second Voice AI with Together AI
    The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...
    2.9K
  • Together AI
    @togethercompute
    Jun 16
    We tested closed and open models by asking them to build small, playable games. Open models were much cheaper and faster, while producing games that were often close in quality. → Opus 4.8 was 15x more expensive than MiniMax M3 → GPT-5.5 was 10x more expensive than Nemotron
    7.9K
  • Together AI reposted
    Hassan
    Together AI
    @nutlope
    Jun 16
    Built a visual benchmark where I asked closed and open source models to build small games. Main takeaway: OSS models were a lot faster, cheaper, & produced games with similar quality. Specifically: * Opus 4.8 was 15x more expensive than MiniMax M3 * GPT-5.5 was 10x more
    10K
  • Together AI
    @togethercompute
    Jun 16
    .@DecagonAI cut voice agent cost per turn nearly 6x with Together AI. They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice: → <400ms p95 model latency per turn → custom speculators and prompt caching → optimized
    Together AI
    @togethercompute
    Jun 16
    Article
    How Decagon Engineered Sub-Second Voice AI with Together AI
    The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...
    5K
  • Together AI
    @togethercompute
    Jun 16
    Article
    How Decagon Engineered Sub-Second Voice AI with Together AI
    The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...
    7.8K
  • Together AI
    @togethercompute
    Jun 15
    Optimizing GLM 5.1 came down to three things: -> Rewrote the indexer topk kernel -> Fused the indexer kernel to reduce memory and launch overhead -> Eliminated CPU overhead that was gating prefill throughput The bigger win was in the indexer. Once we fixed that, the rest made
    15K
    Together AI
    @togethercompute
    Jun 15
    Try GLM 5.1 today:
    GLM-5.1 API | Together AI
    From together.ai
    1.9K
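
Several posts above quote cost gaps both as a multiple ("16x cheaper") and as a percentage ("94% less"). The two are the same figure in different units. The sketch below shows the conversion, assuming "Nx cheaper" means the pricier option costs N times as much as the cheaper one. The function names percent_savings and cost_ratio are illustrative and do not come from any of the posts.

```python
def percent_savings(ratio: float) -> float:
    """Percent saved when the pricier option costs `ratio` times the cheaper one."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0


def cost_ratio(percent_saved: float) -> float:
    """Inverse conversion: how many times pricier, given the percent saved."""
    if not 0.0 <= percent_saved < 100.0:
        raise ValueError("percent_saved must be in [0, 100)")
    return 1.0 / (1.0 - percent_saved / 100.0)


if __name__ == "__main__":
    # Multiples quoted in the posts: 16x, 15x, 10x, and "nearly 6x".
    for ratio in (16, 15, 10, 6):
        print(f"{ratio}x cheaper = {percent_savings(ratio):.2f}% less")
```

For the pinned post, a 16x gap works out to 93.75% less, which rounds to the quoted 94%.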