Abstract: Despite its significant progress, cross-modal retrieval still suffers from one-to-many matching cases, where the multiplicity of semantic instances in another modality could be acquired by a ...
Abstract: Benefiting from the ability to process and integrate data from various modalities, multi-modal foundation models (FMs) facilitate potential applications across a range of fields, including ...