Vipul Ved Prakash (@vipulved) / X

Vipul Ved Prakash

2,692 posts

Vipul Ved Prakash

@vipulved

Co-founder, CEO @togethercompute

San Francisco, CA

Joined April 2008

Vipul Ved Prakash
@vipulved
Dec 22, 2023
When it comes to LLMs, 2023 was the year of Open Source AI. At the end of 2022, the quality delta between best open (Bloom) and closed (GPT-3.5) LLM, as measured by MMLU scores, was 90%. At the end of 2023, this delta between GPT-4 and Mixtral-MoE-7B stands at 13%.
89K
Vipul Ved Prakash
@vipulved
Jun 30, 2023
The era of sub-quadratic LLMs is about to begin. At @togethercompute we've been building next gen models with large space state architectures and training them on very long sequences and the results from the recent builds are... incredible. Will share more as we get closer to
80K
Vipul Ved Prakash
@vipulved
Aug 22, 2023
Together.ai API now offers a 32K context model, built with FlashAttention-2 for $0.20 per 1 M tokens. 300x cheaper than closest commercial model at 32K context (GPT-4). Smaller, but for many long context tasks like RAG, it’s excellent. And you can fine tune it.
71K
Vipul Ved Prakash
@vipulved
Jul 3, 2023
Now hearing fairly regularly how well RedPajama-INCITE-7B performs across enterprise use cases. Several companies have replaced OpenAI with it, and we will soon announce a new partner who is deploying solutions in regulated industries based on the model. huggingface.co/togethercomput…
103K
Vipul Ved Prakash
@vipulved
Aug 17, 2023
We just got 1024 A100s up and running at @togethercompute!! We are offering short-term dedicated access to AI startups anywhere from 16-128 GPUs. Clusters come pre-configured with distributed training software. Available immediately (while supplies last) 🚀🚀🚀
66K
Vipul Ved Prakash
@vipulved
Apr 1, 2025
The @togethercompute inference team achieved another performance milestone. Now serving 140 TPS on 671B param R1 model, ~3x faster than Azure, ~5.5x faster than DeepSeek API on @nvidia GPUs. APIs @ api.together.xyz and chat + web search @ chat.together.ai
31K
Vipul Ved Prakash
@vipulved
Jun 14, 2025
.@togethercompute is building 2 gigawatts of AI factories (~100,000 GPUs) in the EU over the next 4 years with the first phase live in H2 '2025. AI compute is at <1% saturation relative to our 2035 forecast and we are starting early to build a large-scale sustainable AI cloud
NVIDIA GPU Clusters: H100, H200, B200, GB200 | Together AI
From together.ai
35K
Vipul Ved Prakash
@vipulved
Dec 12, 2023
OpenAI API compatibility shipped for 100+ models on @togethercompute API. Replace GPT calls with Mixtral or Llama-70B, get faster responses and for less $$ 🚀🚀🚀
Together AI
@togethercompute
Dec 12, 2023
Transitioning from OpenAI to Mixtral? Simply add your TOGETHER_API_KEY, change the base URL to api.together.xyz, and swap the model name. Oh, and Mixtral Instruct v0.1 is now live on Together API 🙌
63K
Vipul Ved Prakash
@vipulved
Nov 22, 2023
The RedPajama-V2 dataset has been downloaded 1.2M times in the last month on @huggingface. It’s a great metric of the level of agency in core AI development today, and how vast the open source (and custom) AI surface is going to be.
togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face
From huggingface.co
59K
Vipul Ved Prakash
@vipulved
Apr 24, 2024
Llama-3 is Linux.
22K
Vipul Ved Prakash
@vipulved
Jul 2, 2025
The first @togethercompute GB200 cluster CDUs imbibing coolant in prep to go live next week! Each rack here is 1.4 exaflops of inference performance!
9.9K
Vipul Ved Prakash
@vipulved
Feb 8, 2025
Rolling out a new inference stack for DeepSeek R1 @togethercompute that gets up to 110 t/s on the 671B parameter model!
23K
Vipul Ved Prakash
@vipulved
Dec 22, 2023
Wow @anyscalecompute is benchmark washing their API’s terrible performance. All you need is curl and time. Same request @togethercompute 3x faster for Llama2 70B model — 72 t/s vs 23 t/s (7.04s vs 21.87s) And this model is under heavy load! Our dedicated instances are
Anyscale
@anyscalecompute
Dec 21, 2023
📈We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market. Our goal with this leaderboard is to equip users and developers with a clear understanding of the
78K
Vipul Ved Prakash
@vipulved
Jul 18, 2024
We released Turbo and Lite versions of Llama-3 today that incorporate our latest research in optimization and quantization. Lite models are 6x cheaper than GPT-4o mini, possibly the most cost efficient inference in the world right now. Turbo models provide best
Together AI
@togethercompute
Jul 18, 2024
Replying to @togethercompute
Together Lite endpoints provide the lowest cost for Llama 3, making high-quality AI models more affordable than ever, with Llama 3 8B Lite priced at $0.10 per million tokens, 6x lower cost than GPT-4o-mini. Together Lite leverages a number of optimizations including INT4
26K