Max Bain
325 posts
Max Bain
@maxhbain
research scientist @googledeepmind gemini, large scale pretraining
maxbain.com
Joined April 2021
760
Following
2,141
Followers
  • Pinned
    Max Bain
    @maxhbain
    Jan 29, 2023
    WhisperX version 2.0 out, now with speaker diarization and character-level timestamps. github.com/m-bain/whisperX 🧵
    179K
  • Max Bain
    @maxhbain
    Dec 18, 2022
    Are you using @OpenAI's Whisper for speech recognition and finding the timestamps are out of sync? Just dropped: WhisperX github.com/m-bain/whisperX with word-level timestamp accuracy by force-aligning Whisper with wav2vec 2.0 🧵 [1/n]
    78K
  • Max Bain
    @maxhbain
    Feb 23, 2024
    RIP WebVid dataset, 23 Feb 2024. Today I received a cease and desist letter from @Shutterstock stating that I must take down WebVid, an academic video captioning dataset, and can no longer provide the URLs and captions to the research community.
    95K
  • Max Bain
    @maxhbain
    Nov 12, 2021
    Our work on Automated Audiovisual Behaviour Recognition in Wild Primates is finally out. An end-to-end detect, track and behaviour recognition pipeline, using both the audio and visual inputs (helpful for robustness in wild footage) science.org/content/articl…
  • Max Bain
    @maxhbain
    Aug 20, 2021
    Currently working on a demo for our Frozen-in-Time model, retrieving videos amongst millions in the WebVid dataset. Cool to see how sensitive our model is to small changes in the text query!
  • Max Bain
    @maxhbain
    Jun 17, 2021
    WebVid: large scale text-video dataset now available. 2.5mil text-video pairs (10mil coming soon). Pretrain your E2E video-language models. m-bain.github.io/webvid-dataset/ github.com/m-bain/webvid
  • Max Bain
    @maxhbain
    Dec 11, 2024
    ✨ @RekaAILabs Vibe-Eval Leaderboard Update ✨ Updated results: 🥇 Gemini Flash 2.0 @GoogleDeepMind 🥈 Sonnet 3.5 (leads on hard prompts) @AnthropicAI 🥉 GPT-4o @OpenAI 6 months in, big gains on normal prompts, but hard prompts still show little improvement. 🤔
    22K
  • Max Bain
    @maxhbain
    May 14, 2024
    New leader on the Reka Vibe-Eval multimodal benchmark. It actually solves some of the anti-scaling examples, nice work @OpenAI. But the hard-set is still hard (only 54%). @RekaAILabs
    31K
  • Max Bain
    @maxhbain
    Mar 5, 2024
    💡Advice: if you are building yourself a long-term training codebase, then avoid heavy external libraries at all costs: (HF, hydra, lightning, even wandb etc.)
    28K
  • Max Bain
    @maxhbain
    May 1, 2024
    Yi Tay
    @YiTayML
    May 1, 2024
    New paper from @RekaAILabs 🔥 (yes an actual paper). This time we're releasing part of our internal evals which we call Vibe-Eval 😃 This comprises a hard set which imo is pretty challenging for frontier models today. The fun part here is that we constructed it by trying to
    27K
  • Max Bain
    @maxhbain
    Feb 23, 2024
    Replying to @maxhbain
    RIP, we had a good run, and helped a lot of open text-video research
    GitHub - m-bain/webvid: Large-scale text-video dataset. 10 million captioned short videos.
    From github.com
    21K
  • Max Bain
    @maxhbain
    Feb 23, 2024
    Replying to @maxhbain
    So: only big companies who can afford to pay for the Shutterstock license get to train on those videos, making it increasingly difficult for academic and independent researchers. prnewswire.com/news-releases/… investor.shutterstock.com/news-releases/…
    Shutterstock Expands Long-standing Relationship with Meta
    From prnewswire.com
    7.4K
  • Max Bain
    @maxhbain
    Jun 19, 2023
    Come say hi at #CVPR23 Will be presenting the project behind WhisperX 😎🎬 AutoAD: Movie Description in Context. June 22, Thu AM (Highlight, Poster 234). We train a model to automatically generate audio descriptions robots.ox.ac.uk/~vgg/research/…
    11K
  • Max Bain
    @maxhbain
    Apr 8, 2024
    A good day. Testing our new ✨Reka Core✨ model and it's showing promising capabilities. Complex table understanding is one of them. Lmk if you are interested in early access @RekaAILabs
    31K
