You Jiacheng
14.7K posts
You Jiacheng
@YouJiacheng
github.com/YouJiacheng
Joined August 2015
2,315
Following
11.9K
Followers
  • Pinned
    You Jiacheng
    @YouJiacheng
    Jun 16
    I didn't expect that Evaluation is not only the core of training, but also crucial in deployment (harness).
    You Jiacheng
    @YouJiacheng
    Jul 25, 2024
    My Bold Prediction: Data-Centric AI Era: 2022-2024; Evaluation-Centric AI Era: 2025-?
    2.5K
  • You Jiacheng
    @YouJiacheng
    Mar 11, 2025
    1) WHAT
    Image
    1.9M
  • You Jiacheng
    @YouJiacheng
    Apr 14, 2025
    1) WHAT
    Image
    433K
  • You Jiacheng
    @YouJiacheng
    Nov 12, 2024
    wow.
    Image
    598K
  • You Jiacheng
    @YouJiacheng
    Nov 9, 2024
    Based on my investigation, they are using a CNC lathe with milling functionality. The first video shows the actual manufacturing process (partial). The second video will show the simulated full process. (1/2)
    Image
    Samuel Stone
    @SamWStone
    Nov 8, 2024
    Replying to @AWeirdPhysicist and @boxcardavid
    It's literally impossible, they're being sold for 0.70$ each. If they were 70% manufacturing fee, and you wanted 10$/hr, on a 100k machine(minimum), you still wouldnt get there.
    502K
  • You Jiacheng
    @YouJiacheng
    Apr 7, 2025
    wtf? a user reported a bug at 17:49, and DeepSeek fixed it at 17:59???
    Image
    137K
  • You Jiacheng
    @YouJiacheng
    Sep 30, 2024
    Replying to @francoisfleuret
    Here is an article about how to train a single model with 4k H100s.
    Image
    From bare metal to a 70B model: infrastructure set-up and scripts
    From imbue.com
    38K
  • You Jiacheng
    @YouJiacheng
    Oct 13, 2025
    sorry, but it's not random access memory. it's an append-only log.
    Jeffrey Wang
    Exa
    @jeffzwang
    Oct 13, 2025
    Context is the new RAM
    Image
    122K
  • You Jiacheng
    @YouJiacheng
    Jun 24, 2025
    > LlamaBarn
    > literally ZERO llama models
    Georgi Gerganov
    @ggerganov
    Jun 23, 2025
    LlamaBarn (sneak peek 👀)
    68K
  • You Jiacheng
    @YouJiacheng
    Dec 26, 2024
    Bros, this table approximately means a SoTA model can be trained in ONE DAY on xAI's Colossus.
    Image
    106K
  • You Jiacheng
    @YouJiacheng
    Feb 24, 2025
    3→3.5→3.7→3.8→3.9→3.95→3.97→3.98→3.99→3.995→3.997...
    48K
  • You Jiacheng
    @YouJiacheng
    Oct 1, 2025
    what a beautiful theory!
    Image
    Image
    Jeremy Cohen
    @deepcohen
    Oct 1, 2025
    Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
    73K
  • You Jiacheng
    @YouJiacheng
    Dec 22, 2024
    NVIDIA will be the biggest loser if o-series take over GPT-series. (scaling inference compute instead of model size)
    168K
  • You Jiacheng
    @YouJiacheng
    May 1, 2025
    WHAT??? Microsoft processed 50T tokens in 202503. ByteDance Doubao processed 12.7T tokens *per day* in 202503.
    73K
