---
language: en
license: mit
tags:
- clip
- multimodal
- contrastive-learning
- cultural-heritage
- reevaluate
- information-retrieval
datasets:
- xuemduan/reevaluate-image-text-pairs
model-index:
- name: REEVALUATE CLIP Fine-tuned Models
  results:
  - task:
      type: image-text-retrieval
      name: Image-Text Retrieval
    dataset:
      name: Cultural Heritage Hybrid Dataset
      type: xuemduan/reevaluate-image-text-pairs
    metrics:
    - name: I2T R@1
      type: recall@1
      value: <TOBE_FILL_IN>
    - name: I2T R@5
      type: recall@5
      value: <TOBE_FILL_IN>
    - name: T2I R@1
      type: recall@1
      value: <TOBE_FILL_IN>
---
# Domain-Adaptive CLIP for Multimodal Retrieval
A fine-tuned CLIP (ViT-L/14) model used in **Knowledge-Enhanced Multimodal Retrieval**.

---
## 📦 Available Models
| Model | Description | Data Type |
|--------|--------------|-----------|
| `reevaluate-clip` | Fine-tuned on images, query texts, and description texts | Image+Text |
---
## 🧾 Dataset
The models were trained and evaluated on the **REEVALUATE Image-Text Pair Dataset**, which contains **43,500 image–text pairs** derived from Wikidata and Pilot Museums.

Each artefact is described by:

- `Image`: artefact image
- `Description text`: a BLIP-generated natural-language portion plus a metadata portion
- `Query text`: user query-like text

Dataset: [xuemduan/reevaluate-image-text-pairs](https://huggingface.co/datasets/xuemduan/reevaluate-image-text-pairs)

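
The retrieval metrics listed for this model (I2T and T2I recall@k) can be illustrated with a small, dependency-free sketch. It assumes one matching text per image, stored at the same index, so the correct candidate for query `i` is candidate `i`. The helper `recall_at_k` and the toy similarity values are hypothetical and are not taken from the authors' evaluation code, whose exact protocol is not described here.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired candidate appears in the top-k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Indices of the k highest-scoring candidates for this query.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(similarity)


# Toy 3x3 image-to-text similarity matrix (made-up numbers).
toy_similarity = [
    [0.9, 0.1, 0.3],  # image 0: paired text ranked 1st
    [0.6, 0.5, 0.2],  # image 1: paired text ranked 2nd
    [0.1, 0.4, 0.8],  # image 2: paired text ranked 1st
]

i2t_r1 = recall_at_k(toy_similarity, k=1)
i2t_r2 = recall_at_k(toy_similarity, k=2)
# Text-to-image retrieval scores the transposed matrix.
t2i_r1 = recall_at_k([list(col) for col in zip(*toy_similarity)], k=1)
print(i2t_r1, i2t_r2, t2i_r1)
```

In practice the similarity matrix would come from normalized image and text embeddings, and a tensor library would replace the Python loops for a dataset of this size.
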
---
## 🚀 Usage
```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip")
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

# Convert to RGB so grayscale or RGBA files are handled consistently
image = Image.open("artefact.jpg").convert("RGB")
text = "yellow flower paintings"

# Encode the image and the text; no gradients are needed for inference
with torch.no_grad():
    image_embeds = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_embeds = model.get_text_features(**processor(text=[text], return_tensors="pt"))

# L2-normalize so the dot product equals cosine similarity
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = image_embeds @ text_embeds.T
print(similarity)
```