embedding_supper_legal

A Vietnamese embedding model specialized for semantic search and information retrieval in the legal domain.

It is fine-tuned from bkai-foundation-models/vietnamese-bi-encoder and optimized for legal document retrieval, legal question answering, and Vietnamese RAG systems.

Model Details

Base model: bkai-foundation-models/vietnamese-bi-encoder
Language: Vietnamese
Embedding dimension: 768
Max sequence length: 256
Similarity: Cosine Similarity
Training samples: 57,371
Evaluation samples: 7,172
Domain: Legal / Law / Vietnamese Legal Documents
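
Since the maximum sequence length is 256 tokens, Sentence Transformers will truncate longer inputs, so long legal documents are usually split into passages before encoding. The sketch below shows one way to do that. It assumes whitespace-separated words as a rough proxy for model tokens (the real tokenizer produces subword tokens, so the count differs), and the helper name `chunk_words` and its defaults `max_words=200` and `overlap=20` are hypothetical values chosen for illustration, not settings from this model.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words.

    Words are whitespace-separated, which only approximates the model's
    subword token count; max_words is an assumed, conservative budget.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Toy document of 450 "words" -> 3 overlapping passages
passages = chunk_words(" ".join(str(i) for i in range(450)))
print(len(passages))  # 3
```

Each passage can then be encoded separately. The overlap keeps a clause that straddles a boundary intact in at least one passage.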

Evaluation

Task: Information Retrieval
Metrics: MRR@k and NDCG@k for k = 3, 5, 10
Dataset: another-symato/VMTEB-Zalo-legel-retrieval-wseg
Evaluated with: InformationRetrievalEvaluator (sentence-transformers)

Model	MRR@3	MRR@5	MRR@10	NDCG@3	NDCG@5	NDCG@10
tanh17042004/dta_legal_model_sup	0.8712	0.8764	0.8785	0.8920	0.9013	0.9064
huyydangg/DEk21_hcmute_embedding	0.8632	0.8688	0.8721	0.8826	0.8927	0.9004
AITeamVN/Vietnamese_Embedding	0.8221	0.8290	0.8334	0.8427	0.8550	0.8650
BAAI/bge-m3	0.7633	0.7759	0.7803	0.7841	0.8067	0.8170
bkai-foundation-models/vietnamese-bi-encoder	0.7671	0.7765	0.7810	0.7880	0.8049	0.8155
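
To make the columns above easier to interpret, here is a minimal sketch of how MRR@k and NDCG@k can be computed for a single query with binary relevance. It is not the evaluator's own code. The function names and the toy document IDs are assumptions for illustration, and the scores in the table are averages of such per-query values over the whole evaluation set.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # Ideal ranking puts every relevant document at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy query: the only relevant document is retrieved at rank 2.
ranked = ["d3", "d1", "d7"]
relevant = {"d1"}
print(mrr_at_k(ranked, relevant, 3))             # 0.5
print(round(ndcg_at_k(ranked, relevant, 3), 4))  # 0.6309
```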

Usage

Installation

pip install -U sentence-transformers

Load Model

from sentence_transformers import SentenceTransformer

# Repository ID as listed in the evaluation table above
model = SentenceTransformer("tanh17042004/dta_legal_model_sup")

Example Inference

sentences = [
    "Labor contract signing under new regulations from 22/02/2023",
    "Article 1 of Decision 1788/QD-UBND on temporary construction cost norms"
]

embeddings = model.encode(sentences)

print(embeddings.shape)
# (2, 768)

Similarity

# Pairwise cosine similarity between all encoded sentences, shape (2, 2)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
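
For readers who want to see what the similarity matrix contains, the following is a plain-Python sketch of pairwise cosine similarity. It uses small hand-written vectors instead of real 768-dimensional embeddings, and the helper names are hypothetical. It illustrates the formula only and is not how the library implements it.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Entry [i][j] is the cosine similarity between vectors[i] and vectors[j]."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in similarity_matrix(toy_embeddings):
    print([round(x, 4) for x in row])
```

The diagonal is 1 because every vector is identical to itself, and the matrix is symmetric.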

Use Cases

Semantic Search
Legal Document Retrieval
Question Answer Matching
Regulation / Clause Retrieval
Vietnamese RAG Systems
Dense Retrieval for Legal QA
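
Most of the use cases above reduce to the same retrieval step: embed a query, score it against precomputed passage embeddings, and keep the best matches. A minimal sketch of that ranking step follows. It assumes the embeddings are already available as lists of floats (for example, from model.encode), and the `top_k` helper and toy vectors are hypothetical. A production system would typically use a vector index instead of a linear scan.

```python
import heapq
import math


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k best (doc_index, cosine_score) pairs, highest score first."""
    q = _normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = _normalize(doc)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]


# Toy 2-dimensional "embeddings" standing in for 768-dimensional ones
passage_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
hits = top_k([1.0, 0.0], passage_vecs, k=2)
print([idx for idx, _ in hits])  # [0, 2]
```

In a RAG pipeline, the passages at the returned indices would then be passed to the generator as context.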

License

Apache-2.0

Model size: 0.1B params (Safetensors)
Tensor type: F32
