Tags: Sentence Similarity, Transformers, Safetensors, sentence-transformers, Arabic, bert, feature-extraction, embeddings, egyptian-arabic, arabic, triplet-loss, information-retrieval, text-embeddings-inference
How to use Ahmedhisham/queen_of_embedded_egy_20k with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Ahmedhisham/queen_of_embedded_egy_20k")
model = AutoModel.from_pretrained("Ahmedhisham/queen_of_embedded_egy_20k")
```
How to use Ahmedhisham/queen_of_embedded_egy_20k with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Ahmedhisham/queen_of_embedded_egy_20k")

sentences = [
    "هذا شخص سعيد",       # "This is a happy person"
    "هذا كلب سعيد",       # "This is a happy dog"
    "هذا شخص سعيد جدا",   # "This is a very happy person"
    "اليوم هو يوم مشمس",  # "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
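In the sentence-transformers snippet, `model.similarity` returns a 4 x 4 matrix with one score for every pair of sentences. Assuming the model keeps the library's default cosine similarity (the card does not say otherwise), that matrix can be reproduced with the minimal pure-Python sketch below. The helper name `cosine_similarity_matrix` and the toy vectors are illustrative only, not real model outputs.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]


# Four made-up 3-D vectors standing in for the four sentence embeddings
toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
sims = cosine_similarity_matrix(toy)
print(len(sims), len(sims[0]))  # 4 4
print(sims[0][2])               # 1.0 (same direction, different length)
```

Cosine similarity ignores vector length, so rows 0 and 2 score 1.0 even though one vector is twice as long as the other.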
masri-embed-student-10k
A text-embedding model for colloquial Egyptian Arabic, based on:
- Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- Fine-tuned on a subset (~20,000 samples) of the EgyTriplets-250K dataset
- Training objective: triplet loss over (anchor, positive, negative) triplets of Egyptian Arabic sentences
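The triplet objective listed above pushes each anchor closer to its positive than to its negative by at least a margin. The card does not state the distance function or margin used in training, so the sketch below assumes Euclidean distance and a hypothetical margin of 1.0. `euclidean` and `triplet_loss` are illustrative helper names, and the 2-D vectors are made up.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin=1.0 is an assumed value).

    Zero once the negative is at least `margin` farther from the anchor
    than the positive is; otherwise grows linearly with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1.0 from the anchor
negative = [0.0, 4.0]   # distance 4.0 from the anchor
print(triplet_loss(anchor, positive, negative))    # 0.0: negative already far enough
print(triplet_loss(anchor, negative, positive))    # 4.0: roles swapped
print(triplet_loss(anchor, positive, [0.0, 1.5]))  # 0.5: negative inside the margin
```

Only triplets that violate the margin contribute a gradient, which is why well-separated pairs stop influencing training.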
Usage (Python)
```python
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

model_id = "Ahmedhisham/queen_of_embedded_egy_20k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)


def encode(texts, max_length=128, device="cuda" if torch.cuda.is_available() else "cpu"):
    model.to(device)
    model.eval()
    with torch.no_grad():
        enc = tokenizer(
            texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        ).to(device)
        out = model(**enc)
        last_hidden = out.last_hidden_state  # (batch, seq_len, hidden)

        # Mean pooling: average the token vectors, ignoring padding positions
        mask = enc["attention_mask"].unsqueeze(-1).expand(last_hidden.size()).float()
        masked = last_hidden * mask
        summed = masked.sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        emb = summed / counts

        # L2-normalize so that dot products equal cosine similarities
        emb = F.normalize(emb, p=2, dim=-1)
    return emb


# Example
texts = [
    "عايز أروح الساحل أغير جو وأرتاح شوية",  # "I want to go to the coast for a change of scene and some rest"
    "محتاج أجازة على البحر كام يوم",          # "I need a few days' vacation by the sea"
    "بحب أقرأ كتب عن الذكاء الاصطناعي",       # "I love reading books about AI"
]
embs = encode(texts)
sim = torch.matmul(embs, embs.T)  # (3, 3) cosine similarity matrix
print(sim)
```
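The `encode` function above turns per-token hidden states into one sentence vector by masked mean pooling followed by L2 normalization. The pure-Python sketch below mirrors those two steps on a toy sequence so the effect of the attention mask is visible. `mean_pool` and `l2_normalize` are hypothetical helper names; the 1e-12 epsilon is chosen to match the default of `torch.nn.functional.normalize`.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1; padded positions are skipped."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            summed = [s + x for s, x in zip(summed, vec)]
            count += 1
    count = max(count, 1e-9)  # mirrors .clamp(min=1e-9) in encode()
    return [s / count for s in summed]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


# One toy sequence: two real tokens plus one padding token (mask 0)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]; the padding vector does not affect the mean

# After normalization, a plain dot product is the cosine similarity
a = l2_normalize(pooled)
b = l2_normalize([4.0, 6.0])  # same direction, twice the length
print(round(sum(x * y for x, y in zip(a, b)), 6))  # 1.0
```

Because every embedding is unit length, `torch.matmul(embs, embs.T)` in the example yields cosine similarities directly, with values on the diagonal equal to 1.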