Max Bain (@maxhbain) / X

Max Bain

325 posts

@maxhbain

research scientist @googledeepmind gemini, large scale pretraining

Joined April 2021

Pinned
Max Bain
@maxhbain
Jan 29, 2023
WhisperX version 2.0 out, now with speaker diarization and character-level timestamps. github.com/m-bain/whisperX 🧵
179K
Max Bain
@maxhbain
Dec 18, 2022
Are you using @OpenAI's Whisper for speech recognition and finding the timestamps are out of sync? Just dropped: WhisperX github.com/m-bain/whisperX with word-level timestamp accuracy by force aligning whisper with wav2vec2.0 🧵 [1/n]
78K
Max Bain
@maxhbain
Feb 23, 2024
RIP webvid dataset, 23 Feb 2024. Today I received a cease and desist letter from @Shutterstock that I must take down WebVid, an academic video captioning dataset, and can no longer provide the urls and captions to the research community.
95K
Max Bain
@maxhbain
Nov 12, 2021
Our work on Automated Audiovisual Behaviour Recognition in Wild Primates is finally out. An end-to-end detect, track and behaviour recognition pipeline, using both the audio and visual inputs (helpful for robustness in wild footage) science.org/content/articl…
Max Bain
@maxhbain
Aug 20, 2021
Currently working on a demo for our Frozen-in-Time model, retrieving videos amongst millions in the WebVid dataset. Cool to see how sensitive our model is to small changes in the text query!
Max Bain
@maxhbain
Jun 17, 2021
WebVid: large scale text-video dataset now available. 2.5mil text-video pairs (10mil coming soon). Pretrain your E2E video-language models. m-bain.github.io/webvid-dataset/ github.com/m-bain/webvid
Max Bain
@maxhbain
Dec 11, 2024
✨ @RekaAILabs Vibe-Eval Leaderboard Update ✨ Updated results: 🥇 Gemini Flash 2.0 @GoogleDeepMind 🥈 Sonnet 3.5 (leads on hard prompts) @AnthropicAI 🥉 GPT-4o @OpenAI 6 months in, big gains on normal prompts, but hard prompts still show little improvement. 🤔
22K
Max Bain
@maxhbain
May 14, 2024
New leader on the Reka Vibe-Eval multimodal benchmark. It actually solves some of the anti-scaling examples, nice work @OpenAI. But the hard-set is still hard (only 54%). @RekaAILabs
31K
Max Bain
@maxhbain
Mar 5, 2024
💡Advice: if you are building yourself a long-term training codebase, then avoid heavy external libraries at all costs: (HF, hydra, lightning, even wandb etc.)
28K
Max Bain
@maxhbain
May 1, 2024
Yi Tay
@YiTayML
May 1, 2024
New paper from @RekaAILabs 🔥 (yes an actual paper). This time we're releasing part of our internal evals which we call Vibe-Eval 😃 This comprises a hard set which imo is pretty challenging for frontier models today. The fun part here is that we constructed it by trying to
27K
Max Bain
@maxhbain
Feb 23, 2024
Replying to @maxhbain
RIP, we had a good run, and helped a lot of open text-video research
GitHub - m-bain/webvid: Large-scale text-video dataset. 10 million captioned short videos.
From github.com
21K
Max Bain
@maxhbain
Feb 23, 2024
Replying to @maxhbain
So: only big companies who can afford to pay for the Shutterstock license get to train on those videos. Making it increasingly difficult for academic and independent researchers. prnewswire.com/news-releases/… investor.shutterstock.com/news-releases/…
Shutterstock Expands Long-standing Relationship with Meta
From prnewswire.com
7.4K
Max Bain
@maxhbain
Jun 19, 2023
Come say hi at #CVPR23. Will be presenting the project behind WhisperX 😎🎬 AutoAD: Movie Description in Context. June 22, Thu AM (Highlight, Poster 234). We train a model to automatically generate audio descriptions robots.ox.ac.uk/~vgg/research/…
11K
Max Bain
@maxhbain
Apr 8, 2024
A good day. Testing our new ✨Reka Core✨ model and it's showing promising capabilities. Complex table understanding is one of them. Lmk if you are interested in early access @RekaAILabs
31K