DatologyAI
210 posts
DatologyAI
@datologyai
DatologyAI builds tools to automatically select and optimize the best data on which to train AI models, leading to better, smaller models which train faster.
Redwood City, CA
datologyai.com
Joined September 2023
10 Following · 3,049 Followers
  • user avatar
    DatologyAI
    @datologyai
    5h
    The "you can only catch up by distilling from a frontier model" narrative is wrong. We curated the data for @Arceeai's Trinity Large entirely from public sources, zero closed-model APIs, and it's competitive with the open frontier. Better data does the work.
    687
    DatologyAI
    @datologyai
    5h
    Full episode:
    137
  • user avatar
    DatologyAI
    @datologyai
    Jun 19
    Compute scarcity is about to force the reckoning the frontier labs have avoided: efficiency. You don't need trillion-parameter models for frontier-class capability. With better data, far smaller models match the best of a year or two ago, at a fraction of the cost to serve.
    1.1K
    DatologyAI
    @datologyai
    Jun 19
    Full episode:
    589
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    5/ Alexander Gurung from the University of Edinburgh presented his work on learning to reason for long-form generation. What does a reward signal look like when the goal is a good story? 📺
    358
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    14/ Sukjun Hwang (@sukjun_hwang) from CMU presented his work on H-Nets: Dynamic chunking for end-to-end hierarchical sequence modeling 📄
    arxiv.org
    Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
    Major progress on language models (LMs) in recent years has largely resulted from moving away from specialized models designed for specific tasks, to general models based on powerful architectures...
    187
    DatologyAI
    @datologyai
    Jun 18
    15/ What a lineup, and that was only year one. Summer of Data is back for 2026 and we're just getting started. Keep an eye out for our lineup announcement and new talks every week. Want to present? DM us 👀 Stay data-obsessed 🤓
    165
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    8/ Xindi Wu (@cindy_x_wu) from Princeton presented her work on data efficiencies for multimodal ML (COMPACT). How do you teach a model to compose visual capabilities, atomic to complex? 📺
    203
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    7/ Suhas Kotha (@kothasuhas) from Stanford presented his work on why standard fine-tuning inefficiently uses rare data. How do you get a model to learn from the examples that matter most? 📺
    193
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    6/ Jacob Springer (@jacspringer) from CMU presented his work on echo embeddings and why overtrained language models are harder to fine-tune. Is more pretraining always a free lunch? 📺
    773
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    4/ Shizhe Diao (@shizhediao) from Thinking Machines presented his work on CLIMB, clustering-based iterative data selection for pretraining. Can a model find its own best data blend? 📺
    283
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    3/ Maximilian Böther (@MaxiBoether) from ETH Zurich presented his work on Mixtera, a data plane for foundation model training. How do you manage what your model eats at scale? He is now working at @datologyai on cool dataloader improvements 📺
    412
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    Replying to @datologyai
    2/ Charlie Snell (@sea_snell) from UC Berkeley presented his work on scaling test-time compute and predicting emergent capabilities by finetuning. When does it pay to let a model think longer? 📺
    628
  • user avatar
    DatologyAI
    @datologyai
    Jun 18
    1/ 🌞 Our Summer of Data Seminar brought together some of the sharpest minds in data curation last year. We are bringing it back in 2026! Let's recap the great talks from 2025!
    4.2K
  • DatologyAI reposted
    DatologyAI
    @datologyai
    Jun 15
    A spicy take from @arimorcos on @jacobeffron's Unsupervised Learning: frontier APIs may not always be there. The teams that can build their own models won't be exposed when that happens.
    1.3K
