Are you using @OpenAI's Whisper for speech recognition and finding the timestamps are out of sync?
Just dropped: WhisperX github.com/m-bain/whisperX with word-level timestamp accuracy, obtained by force-aligning Whisper with wav2vec 2.0
🧵 [1/n]
RIP WebVid dataset, 23 Feb 2024.
Today I received a cease and desist letter from @Shutterstock stating that I must take down WebVid, an academic video captioning dataset, and can no longer provide the URLs and captions to the research community.
Our work on Automated Audiovisual Behaviour Recognition in Wild Primates is finally out. An end-to-end detection, tracking and behaviour recognition pipeline that uses both audio and visual inputs (helpful for robustness in wild footage)
science.org/content/articl…
Currently working on a demo for our Frozen-in-Time model, retrieving videos amongst millions in the WebVid dataset. Cool to see how sensitive our model is to small changes in the text query!
✨ @RekaAILabs Vibe-Eval Leaderboard Update ✨
Updated results:
🥇 Gemini Flash 2.0 @GoogleDeepMind
🥈 Sonnet 3.5 (leads on hard prompts) @AnthropicAI
🥉 GPT-4o @OpenAI
6 months in, big gains on normal prompts, but hard prompts still show little improvement. 🤔
New leader on the Reka Vibe-Eval multimodal benchmark. It actually solves some of the anti-scaling examples, nice work @OpenAI.
But the hard set is still hard (only 54%).
@RekaAILabs
💡Advice: if you are building yourself a long-term training codebase, then avoid heavy external libraries at all costs (HF, Hydra, Lightning, even wandb, etc.)
New paper from @RekaAILabs 🔥 (yes an actual paper).
This time we're releasing part of our internal evals, which we call Vibe-Eval 😃 This includes a hard set which imo is pretty challenging for frontier models today.
The fun part here is that we constructed it by trying to
Come say hi at #CVPR23
Will be presenting the project behind WhisperX 😎🎬
AutoAD: Movie Description in Context. June 22, Thu AM
(Highlight, Poster 234).
We train a model to automatically generate audio descriptions
robots.ox.ac.uk/~vgg/research/…
A good day. Testing our new ✨Reka Core✨ model and it's showing promising capabilities.
Complex table understanding is one of them.
Lmk if you are interested in early access @RekaAILabs