A Closer Look at the Few-Shot Adaptation of
Large Vision-Language Models

Julio Silva-Rodríguez · Sina Hajimiri · Ismail Ben Ayed · Jose Dolz - ÉTS Montréal.

CVPR'24   -   Paper   -   Code

Highlights

  • Adapter-style efficient transfer learning allows fast, black-box few-shot transfer of VLMs.
  • Existing Adapters learn a combination of zero-shot prototypes and support embeddings to produce task-specific predictions.
  • Pitfall: prior Adapters require a validation subset to fix key hyperparameters, which is unrealistic in the few-shot data regime.
  • Proposed: few-shot Adapters with a model selection strategy based only on the support set.
    • Zero-shot Linear Probe (ZS-LP): a surprisingly strong, well-initialized Linear Probe.
    • Class-Adaptive Linear Probe (CLAP): constrains the learned prototypes to remain close to the zero-shot weights.
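
To make the ZS-LP idea concrete, here is a minimal NumPy sketch of a linear probe whose class weights start at the zero-shot text prototypes instead of a random initialization. It is an illustration under stated assumptions, not the released code: the random vectors stand in for real VLM text and image embeddings, and the function names, the temperature of 100, the learning rate, and the choice not to re-normalize the weights inside the logits are all hypothetical simplifications.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def ce_loss_and_grad(weights, feats, labels, temperature=100.0):
    """Cross-entropy of a linear probe on frozen features, and its gradient w.r.t. weights."""
    n, k = len(labels), weights.shape[0]
    logp = log_softmax(temperature * feats @ weights.T)
    loss = -np.mean(logp[np.arange(n), labels])
    grad = temperature * (np.exp(logp) - np.eye(k)[labels]).T @ feats / n
    return loss, grad

# Toy stand-ins for frozen VLM embeddings (assumption: real ones come from the encoders).
rng = np.random.default_rng(0)
num_classes, dim, shots = 3, 8, 4
text_prototypes = l2_normalize(rng.normal(size=(num_classes, dim)))
labels = np.repeat(np.arange(num_classes), shots)
support = l2_normalize(text_prototypes[labels] + 0.5 * rng.normal(size=(len(labels), dim)))

# ZS-LP: initialize the probe at the zero-shot prototypes, then fit on the support set only.
weights = text_prototypes.copy()
loss_before, _ = ce_loss_and_grad(weights, support, labels)
for _ in range(50):
    _, grad = ce_loss_and_grad(weights, support, labels)
    weights -= 1e-4 * grad  # plain gradient descent with a small, fixed step
loss_after, _ = ce_loss_and_grad(weights, support, labels)
```

Before any update, the probe reproduces the zero-shot classifier exactly, so training starts from the pre-trained knowledge rather than from scratch.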

Few-shot VLMs Adaptation


The adaptation of Vision-Language Models (VLMs) using a few labeled shots as supervision benefits from the efficient transfer of pre-trained features. Two families of methods are currently popular: Prompt Learning and Adapters.



Pitfalls on Existing Adapters


Existing Adapters exhibit strong performance only in narrowly defined experimental setups, and only after careful adjustment of hyperparameters based on a large corpus of labeled samples. To outperform a carefully designed Linear Probing baseline (ZS-LP), these methods must tune their hyperparameters on each target task, which is unrealistic.



Class-Adaptive Linear Probing (CLAP)


We propose a novel approach that meets the requirements of real-world scenarios. We introduce a CLass-Adaptive linear Probe (CLAP) objective that constrains the learned prototypes to retain prior zero-shot knowledge, adaptively and based only on the few support shots, and that uses a homogeneous learning configuration across tasks.
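
The constraint described above can be sketched as a cross-entropy loss plus a class-wise penalty that pulls each learned prototype back toward its zero-shot counterpart. The NumPy code below is only an illustration of that idea. The quadratic penalty, the rule that derives each class weight from the zero-shot confidence on that class's support samples, and all names and values are assumptions made for this sketch; the exact objective and the way the class-wise weights are set are given in the paper and the released code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def class_wise_lambda(zs_weights, feats, labels, temperature=100.0):
    """Hypothetical rule: mean zero-shot probability of the true class, per class."""
    probs = np.exp(log_softmax(temperature * feats @ zs_weights.T))
    return np.array([probs[labels == c, c].mean() for c in range(zs_weights.shape[0])])

def penalized_loss_and_grad(weights, zs_weights, feats, labels, lam, temperature=100.0):
    """Cross-entropy plus sum_c lam_c * ||w_c - t_c||^2, with the gradient w.r.t. weights."""
    n, k = len(labels), weights.shape[0]
    logp = log_softmax(temperature * feats @ weights.T)
    ce = -np.mean(logp[np.arange(n), labels])
    diff = weights - zs_weights
    penalty = np.sum(lam * np.sum(diff ** 2, axis=1))
    grad = temperature * (np.exp(logp) - np.eye(k)[labels]).T @ feats / n
    grad = grad + 2.0 * lam[:, None] * diff
    return ce + penalty, grad

# Toy stand-ins for frozen VLM embeddings (assumption: real ones come from the encoders).
rng = np.random.default_rng(0)
num_classes, dim, shots = 3, 8, 4
zs_weights = l2_normalize(rng.normal(size=(num_classes, dim)))
labels = np.repeat(np.arange(num_classes), shots)
support = l2_normalize(zs_weights[labels] + 0.5 * rng.normal(size=(len(labels), dim)))

# Class weights come from the support set alone, so no validation subset is needed.
lam = class_wise_lambda(zs_weights, support, labels)

weights = zs_weights.copy()  # same zero-shot initialization as ZS-LP
loss_before, _ = penalized_loss_and_grad(weights, zs_weights, support, labels, lam)
for _ in range(50):
    _, grad = penalized_loss_and_grad(weights, zs_weights, support, labels, lam)
    weights -= 1e-4 * grad
loss_after, _ = penalized_loss_and_grad(weights, zs_weights, support, labels, lam)
```

In this sketch a class that the zero-shot classifier already handles well receives a larger weight and therefore stays closer to its text prototype, while the same learning rate and number of steps are used regardless of the task.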



Citation


Please cite our paper if it is helpful to your work:

@inproceedings{clap24,
    title={A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models},
    author={Julio Silva-Rodr\'iguez and Sina Hajimiri and Ismail {Ben Ayed} and Jose Dolz},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}

Contact


Please feel free to contact us: julio-jose.silva-rodriguez@etsmtl.ca.