Unsloth AI
687 posts
Unsloth AI
@UnslothAI
Train and run models locally! 🦥 github.com/unslothai/unsl…
San Francisco, CA
unsloth.ai
Joined November 2023
468 Following · 75.5K Followers
  • Pinned
    Unsloth AI
    @UnslothAI
    Mar 17
    Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs. • Run models locally on Mac, Windows, Linux • Train 500+ models 2x faster with 70% less VRAM • Supports GGUF, vision, audio, embedding models • Auto-create datasets from PDF, CSV, DOCX •
    1.7M
  • Unsloth AI
    @UnslothAI
    19h
    What’s your go-to local model right now?
    95K
  • Unsloth AI
    @UnslothAI
    Jun 23
    1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5 We gave 3 models the same prompt and compared one-shot outputs. The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s. Which output do you like best? GGUF: huggingface.co/unsloth/GLM-5.…
    Unsloth AI
    @UnslothAI
    Jun 18
    GLM-5.2 can now be run locally!🔥 The 2-bit model retains ~82% accuracy after we shrank it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Guide: unsloth.ai/docs/models/gl… GGUF: huggingface.co/unsloth/GLM-5.…
    1.5M
  • Unsloth AI
    @UnslothAI
    Jun 18
    GLM-5.2 can now be run locally!🔥 The 2-bit model retains ~82% accuracy after we shrank it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Guide: unsloth.ai/docs/models/gl… GGUF: huggingface.co/unsloth/GLM-5.…
    Z.ai
    @Zai_org
    Jun 16
    Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong
    1.8M
    Unsloth AI
    @UnslothAI
    Jun 18
    You can run GLM-5.2 and other models directly in Unsloth Studio:
    GitHub - unslothai/unsloth: Unsloth Studio is a web UI for training and running open models like...
    From github.com
    25K
  • Unsloth AI
    @UnslothAI
    Jun 15
    You can now run Kimi K2.7 Code locally! 🌘 We shrank the 1T model to 325GB (-48%) via Dynamic 2-bit where important layers are upcasted. Run at >40 tok/s on 330GB RAM/VRAM setups. Run full precision on 610 GB. Guide: unsloth.ai/docs/models/ki… GGUF: huggingface.co/unsloth/Kimi-K…
    Kimi.ai
    @Kimi_Moonshot
    Jun 12
    🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower
    1.4M
  • Unsloth AI reposted
    Ivan Fioravanti ᯅ
    @ivanfioravanti
    Jun 12
    Local AI in action! MiniMax M3 running locally on a single M3 Ultra 512GB in Unsloth Studio! 🔥 Here UD-Q5_K_XL decoding at 32.5 toks/s!
    31K
  • Unsloth AI
    @UnslothAI
    Jun 12
    MiniMax M3 can now be run locally!🔥 MiniMax-M3 is a new 428B (23B active) open model with 1M context that performs on par with Gemini 3.1 Pro. Run Dynamic 2-bit GGUF on 138GB RAM/VRAM or 3-bit on 165GB. GGUF: huggingface.co/unsloth/MiniMa… Guide: unsloth.ai/docs/models/mi…
    MiniMax (official)
    @MiniMax_AI
    Jun 12
    MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters Weights: huggingface.co/MiniMaxAI/Mini… MiniMax Sparse Attention: huggingface.co/papers/2606.13…
    184K
  • Unsloth AI
    @UnslothAI
    Jun 12
    DiffusionGemma can now run at 2000+ tokens/sec! ⚡ We made local DiffusionGemma inference 1.8× faster. Run it on 18GB RAM via Unsloth Studio. GitHub: github.com/unslothai/unsl… Guide: unsloth.ai/docs/models/di…
    Unsloth AI
    @UnslothAI
    Jun 10
    Google releases DiffusionGemma.✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio. GGUF: huggingface.co/unsloth/diffus… Guide: unsloth.ai/docs/models/di…
    176K
  • Unsloth AI
    @UnslothAI
    Jun 11
    Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs + Guide: unsloth.ai/docs/models/mtp
    219K
  • Unsloth AI
    @UnslothAI
    Jun 10
    Google releases DiffusionGemma.✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio. GGUF: huggingface.co/unsloth/diffus… Guide: unsloth.ai/docs/models/di…
    Google Gemma
    Google for Developers
    @googlegemma
    Jun 10
    Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
    331K
  • Unsloth AI
    @UnslothAI
    Jun 5
    Google releases Gemma 4 QAT. ✨ You can now run Gemma 4 at 3x less memory with near original performance. Quantization-Aware Training (QAT) makes it possible to run Gemma 4 26B-A4B on 16GB RAM. GGUFs: huggingface.co/collections/un… QAT Guide: unsloth.ai/docs/models/ge…
    Google Gemma
    Google for Developers
    @googlegemma
    Jun 5
    We just dropped Gemma 4 Quantization-Aware Training (QAT) checkpoints on Hugging Face! All Gemma 4 model sizes and their drafters are now optimized with QAT to cut memory requirements and maximize on-device performance!
    252K
  • Unsloth AI
    @UnslothAI
    Jun 4
    You can now run NVIDIA Nemotron 3 Ultra, a new 550B open model. Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat. Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB. GGUF: huggingface.co/unsloth/NVIDIA… Guide: unsloth.ai/docs/models/ne…
    NVIDIA AI
    NVIDIA
    @NVIDIAAI
    Jun 4
    Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.
    40K
  • Unsloth AI
    @UnslothAI
    Jun 4
    2-bit Gemma 4 12B GGUF, only 4.66 GB on disk, managed to cite 15 sites from a single prompt. Try this locally on >6GB RAM via Unsloth Studio. GitHub: github.com/unslothai/unsl…
    Unsloth AI
    @UnslothAI
    Jun 3
    Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs. Google's new model, Gemma 4 12B Unified supports image, audio and 256K context. You can run and train the model via Unsloth Studio. GGUF: huggingface.co/unsloth/gemma-… Guide: unsloth.ai/docs/models/ge…
    143K
