CLAP

A Closer Look at the Few-Shot Adaptation of
Large Vision-Language Models

Julio Silva-Rodríguez · Sina Hajimiri · Ismail Ben Ayed · Jose Dolz - ÉTS Montréal.

CVPR'24 - Paper - Code

Highlights

Adapter-style efficient transfer learning allows black-box, fast few-shot transfer of VLMs.
Existing Adapters learn a combination of zero-shot prototypes and support embeddings to produce task-specific predictions.
Pitfalls: prior Adapters require a validation subset to fix key hyperparameters, which is unrealistic in the few-shot data regime.
Proposed: few-shot Adapters with a model selection strategy based only on the support set.

Zero-shot Linear Probe (ZS-LP): a surprisingly strong well-initialized Linear Probe.
Class-Adaptive Linear Probe (CLAP): constraining the learnt prototypes to remain close to zero-shot weights.
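
To make the "well-initialized" part of ZS-LP concrete, here is a minimal sketch in plain Python. It assumes the class prototypes are initialized from the text embeddings of the zero-shot prompts and that predictions use temperature-scaled cosine similarity. The function names, the toy 2-D embeddings, and the temperature of 100 are illustrative assumptions, not the repository's code.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def init_prototypes(text_embeddings):
    """Assumed ZS-LP initialization: start the probe from the zero-shot text embeddings."""
    return [l2_normalize(t) for t in text_embeddings]

def logits(image_feature, prototypes, temperature=100.0):
    """Temperature-scaled cosine similarity between one image feature and each prototype."""
    f = l2_normalize(image_feature)
    return [
        temperature * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in prototypes
    ]

def predict(image_feature, prototypes):
    """Return the index of the highest-scoring class."""
    scores = logits(image_feature, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: two classes with hypothetical 2-D text embeddings.
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
prototypes = init_prototypes(text_embeddings)
pred = predict([0.9, 0.1], prototypes)
```

Before any training, the probe reproduces the zero-shot classifier; few-shot training would then update `prototypes` with a cross-entropy loss on the support set.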

Few-shot VLM Adaptation

Adapting Vision-Language Models with only a few shots as supervision benefits from the efficient transfer of pre-trained features. Two alternatives are currently popular: Prompt Learning and Adapters.

Pitfalls of Existing Adapters

Existing Adapters exhibit strong performance only in narrowly defined experimental setups, and only after careful hyperparameter adjustment based on a large corpus of labeled samples. To outperform a carefully designed Linear Probing (ZS-LP) baseline, these methods must tune their hyperparameters on each target task, which is unrealistic in a few-shot setting.

Class-Adaptive Linear Probing (CLAP)

We propose a novel approach that meets the requirements of real-world scenarios. We introduce a CLass-Adaptive linear Probe (CLAP) objective that constrains the learned prototypes to retain prior zero-shot knowledge adaptively, based only on the few support shots, and uses a homogeneous learning configuration across tasks.
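
The idea of keeping prototypes close to the zero-shot weights can be illustrated with a simple class-weighted squared-distance penalty. This is a simplified sketch and not necessarily the exact objective used in the paper: the per-class weights `class_weights`, the helper names, and the toy values are assumptions made for illustration, and how the weights would be derived from the support set is left out.

```python
def zero_shot_penalty(prototypes, zero_shot, class_weights):
    """Sum over classes of weight_c * ||w_c - t_c||^2 (hypothetical penalty form)."""
    total = 0.0
    for w, t, lam in zip(prototypes, zero_shot, class_weights):
        total += lam * sum((a - b) ** 2 for a, b in zip(w, t))
    return total

def penalty_gradient(prototypes, zero_shot, class_weights):
    """Gradient of the penalty with respect to each prototype."""
    return [
        [2.0 * lam * (a - b) for a, b in zip(w, t)]
        for w, t, lam in zip(prototypes, zero_shot, class_weights)
    ]

def penalty_step(prototypes, zero_shot, class_weights, lr=0.1):
    """One gradient-descent step on the penalty alone (no classification loss)."""
    grads = penalty_gradient(prototypes, zero_shot, class_weights)
    return [
        [a - lr * g for a, g in zip(w, gw)]
        for w, gw in zip(prototypes, grads)
    ]

# Toy example: class 0 still equals its zero-shot weight, class 1 has drifted.
zero_shot = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.5, 1.0]]
class_weights = [1.0, 2.0]  # assumed values; a larger weight pulls harder

before = zero_shot_penalty(prototypes, zero_shot, class_weights)
updated = penalty_step(prototypes, zero_shot, class_weights)
after = zero_shot_penalty(updated, zero_shot, class_weights)
```

In a full training loop this term would be added to the cross-entropy loss on the support set, so that a class with a larger weight stays closer to its zero-shot prototype while a class with a smaller weight is free to adapt.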

Citation

Please cite our paper if it is helpful to your work:

@inproceedings{clap24,
    title={A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models},
    author={Julio Silva-Rodr\'iguez and Sina Hajimiri and Ismail {Ben Ayed} and Jose Dolz},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}

Contact

Please feel free to contact us: julio-jose.silva-rodriguez@etsmtl.ca.