Distributed fine-tuning of LLMs is more cost-effective than fine-tuning on a single instance! Check out the blog post on how to fine-tune and serve LLMs simply and cost-effectively using Ray + DeepSpeed and 🤗
ray
A distributed compute framework for scaling AI workloads. Created and developed by @anyscalecompute.
Joined August 2019
- Ray is a powerful ML framework, but with great power comes massive documentation. How can we make it more accessible? Now, using @langchain and Ray, we can build and deploy a doc search engine in about 100 lines of code -- with a self-hosted LLM! 1/n
- Announcing a new Ray + 🤗 @huggingface integration! RAG is a new NLP model that uses external documents to augment its knowledge. We’ve integrated Ray with RAG:
  - 🚄 Speeding up retrieval calls by 2x
  - 💫 Improving the scalability of fine-tuning

  Blog:
- We're releasing RaySGD, a PyTorch library that makes distributed training cheap and simple! Features:
  - fp16 training support
  - Elastic training (automatic fault tolerance)
  - Integrated distributed HPO (w/ RayTune)
  - Intuitive and PyTorch-friendly APIs
- Announcing Ray 2.4.0: Infrastructure for LLM training, tuning, inference, and serving.
  - 🧠 LLM features
  - 💽 Ray Data for ease of use & stability
  - 📊 Serve observability
  - 🤖 RLlib’s module for custom reinforcement learning
  - 🏢 Ray scalability for large clusters
- ML serving infra has evolved, and there are 3 key requirements:
  - Framework agnostic (@TensorFlow, @PyTorch, pure Python, ...)
  - Pure Python (intuitive for developers)
  - Out-of-the-box scalability

  Why? How does this relate to Ray and @huggingface? 🤗 👇
- @BytedanceTalk, the company behind TikTok, uses Ray for fast & cheap offline inference with multi-modal #LLMs. They generate embeddings for a staggering 200 TB of image and text data using a model with >10B parameters. anyscale.com/blog/how-byted… 🧵 Thread below 👇
- You can now tune your @huggingface transformer Trainer with RayTune (tune.io) in 1 line of code! ⚡️Access Bayesian Optimization, Population-based Training to superpower your model 🧙♂️Use Multi-GPU and Multi-node support Blog post: anyscale.com/blog/hyperpara…
- Ray 1.0 is up on GitHub and PyPI (w/ beautiful new docs - docs.ray.io/en/latest/inde…)! 🎉 This is a huge and important release, with many new APIs and tons of new committers! 🔖 Read about Ray 1.0 in our blog post (anyscale.com/blog/announcin…)
- 🎉 Say hello to Ray Lightning — a faster and simpler path to multi-node distributed training for @pytorchlightnin⚡️. Change 1 line to scale your PyTorch Lightning training to a multi-node GPU cluster. Give it a try and let us know what you think!
- Part 2 of our Ray + LangChain series is ready. In this part, we’ll show you how to turbocharge the generation of embeddings. See the video (9 minutes) at hubs.ly/Q01Np5sh0 and the blog post at hubs.ly/Q01Np8090
- ByteScale is a new LLM training framework:
  - Evaluated 7B to 141B param models
  - 256K to 2048K context lengths
  - 12,000 GPUs
  - Optimized for mixed long and short sequences

  The crux of it is a much more dynamic parallelism strategy (as opposed to a static mesh) to account for
- vLLM + Ray is a powerful combo for post-training. OpenRLHF is a pioneering framework in using vLLM for RLHF, driving the design and implementation of many of vLLM's RLHF features and making vLLM a popular choice for many RLHF frameworks. Learn more about the story at blog.vllm.ai/2025/04/23/ope…
- Hyperparameter tuning for #NLProc is often overlooked, but by using @huggingface transformers + tuning techniques such as PBT, you can increase model accuracy by up to 5% on certain fine-tuning tasks *without increasing your compute budget*! 🔖 Read it: medium.com/@amog_97444/c4…




