Check out zlai0.github.io/CorrFlow/
We are excited to share the code & model for Self-supervised Correspondence Flow (BMVC 2019 Oral) @bmvc2019.
State-of-the-art performance on video segmentation and pose tracking. @Oxford_VGG
Personal update:
After spending seven wonderful years at Oxford, I've decided to take on a new adventure.
I'm joining Shanghai Jiao Tong University from this year 🐯.
Tracking objects is among the first skills human infants learn; surely this must be a task that does not require semantic understanding.
We present a SOTA self-supervised tracking approach: all you need is 10 minutes of raw video, with zero annotations required.
arxiv.org/pdf/2006.12480… @Oxford_VGG
A tiny milestone in my academic journey.
I know these metrics do not carry much significance in today's academic landscape.
Nevertheless, they serve as a personal gauge, allowing me to assess the papers' impact and reflect on whether I've contributed something meaningful.
Code:
Self-supervised Video Object Segmentation by Motion Grouping:
github.com/charigyang/mot…
We show that self-supervised segmentation can be done purely from motion.
ICCV23 work on Open-vocabulary Object Segmentation with Diffusion Models
- we do visual instruction tuning on a pre-trained diffusion model to simultaneously generate images and open-vocabulary masks.
- it can create synthetic datasets for training discriminative models for free.
Can GPT-4V(vision) serve medical applications?
We present recent efforts on assessing GPT-4V for multimodal medical diagnosis through case studies covering 17 human body systems across 8 clinical imaging modalities, e.g., radiology, pathology.
🔥Report: drive.google.com/file/d/1kPDWgw…
Just read Med-PaLM 2; the progress of LLMs in medical question answering is incredible! But I think multimodal medical question answering is quite far behind, so here I present:
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering: arxiv.org/pdf/2305.10415…
Happy to share the work, "Visual-Language Models for Efficient Video Understanding" at ECCV2022.
We benchmark on 10 different datasets across various tasks; it turns out that simply prompting CLIP can already achieve comparable or SOTA results on many video tasks. #ECCV2022
arxiv.org/abs/1905.00875 We investigate self-supervised learning on video correspondence flow. If done properly, self-supervised learning can be surprisingly powerful (closing the gap to supervised learning). We demonstrate state-of-the-art results on video segmentation.
We are presenting our new paper at the LUV2020 workshop today at 16:15 - 16:30.
MAST: A Memory-Augmented Self-Supervised Tracker,
by @LaiZihang, @erika_lu_, @Oxford_VGG.
A strong tracking model trained with no manual annotation.
Code: github.com/zlai0/MAST #VGGatCVPR2020
Also best paper at the CVPR RVSU Workshop.
TL;DR:
We propose a self-supervised learning approach for segmentation based on motion, i.e., the Gestalt principle.
It achieves performance comparable to strong supervision on several popular benchmarks, e.g. DAVIS2016, MoCA (camouflage detection).
Happy to share the paper "Self-supervised Tumor Segmentation with Sim2Real Adaptation", published in IEEE Journal of Biomedical and Health Informatics.
The model enables zero-shot tumor segmentation with Sim2Real training, requiring zero or few annotations from physicians.