embedding_supper_legal

A Vietnamese embedding model specialized for semantic search and information retrieval in the legal domain.

This model is fine-tuned from bkai-foundation-models/vietnamese-bi-encoder and optimized for legal document retrieval, legal question answering (QA), and Vietnamese retrieval-augmented generation (RAG) systems.

Model Details

  • Base model: bkai-foundation-models/vietnamese-bi-encoder
  • Language: Vietnamese
  • Embedding dimension: 768
  • Max sequence length: 256 tokens (longer inputs are truncated)
  • Similarity function: cosine similarity
  • Training samples: 57,371
  • Evaluation samples: 7,172
  • Domain: Vietnamese law and legal documents
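
Inputs longer than the 256-token limit are truncated when encoded, so long legal documents are usually split into chunks before indexing. The sketch below is a hypothetical helper (`chunk_words` is not part of sentence-transformers) that splits on whitespace into overlapping windows. Word counts only approximate subword token counts, so the 150-word window and 30-word overlap are assumed safety margins, not tuned values.

```python
def chunk_words(text: str, max_words: int = 150, overlap: int = 30) -> list[str]:
    """Split text into overlapping windows of at most max_words words.

    Hypothetical helper: words are only a rough proxy for subword tokens,
    so max_words is kept well below the model's 256-token limit.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window reaches the end of the text
        if start + max_words >= len(words):
            break
    return chunks


# Usage: each chunk can then be passed to model.encode(...)
doc = " ".join(f"w{i}" for i in range(400))
chunks = chunk_words(doc)
print(len(chunks))  # 4
```

Overlapping windows reduce the chance that a clause is cut in half at a chunk boundary, at the cost of indexing some text twice.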

Evaluation

  • Task: Information retrieval
  • Metrics: MRR@k and NDCG@k for k = 3, 5, 10
  • Dataset: another-symato/VMTEB-Zalo-legel-retrieval-wseg
  • Evaluated with: sentence-transformers InformationRetrievalEvaluator
Model                                         MRR@3   MRR@5   MRR@10  NDCG@3  NDCG@5  NDCG@10
tanh17042004/dta_legal_model_sup              0.8712  0.8764  0.8785  0.8920  0.9013  0.9064
huyydangg/DEk21_hcmute_embedding              0.8632  0.8688  0.8721  0.8826  0.8927  0.9004
AITeamVN/Vietnamese_Embedding                 0.8221  0.8290  0.8334  0.8427  0.8550  0.8650
BAAI/bge-m3                                   0.7633  0.7759  0.7803  0.7841  0.8067  0.8170
bkai-foundation-models/vietnamese-bi-encoder  0.7671  0.7765  0.7810  0.7880  0.8049  0.8155
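
For reference, MRR@k and NDCG@k can be computed as in the sketch below. It assumes binary relevance (a document is either relevant to a query or not) and uses the same data shape that sentence-transformers' InformationRetrievalEvaluator takes for relevance labels: query IDs mapped to sets of relevant document IDs. This is an independent reimplementation for illustration, so details such as tie-breaking may differ from the evaluator that produced the table.

```python
import math


def mrr_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant doc within the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    # The ideal ranking puts all relevant docs first (at most k of them count)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_metric(metric, rankings: dict[str, list[str]],
                relevant_docs: dict[str, set[str]], k: int) -> float:
    """Average a per-query metric over all queries in relevant_docs."""
    scores = [metric(rankings.get(qid, []), rel, k) for qid, rel in relevant_docs.items()]
    return sum(scores) / len(scores)


# Toy example: retrieved doc IDs per query, best first
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d9"]}
relevant_docs = {"q1": {"d1"}, "q2": {"d2"}}
print(mean_metric(mrr_at_k, rankings, relevant_docs, k=3))   # 0.75
print(mean_metric(ndcg_at_k, rankings, relevant_docs, k=3))  # approximately 0.8155
```

MRR@k only rewards the position of the first relevant hit. NDCG@k discounts every relevant hit by log2 of its rank, which explains why the two metrics can order models slightly differently, as with BAAI/bge-m3 and the base bi-encoder in the table.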

Usage

Installation

pip install -U sentence-transformers

Load Model

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("tanh17042004/dta_legal_model_sup")

Example Inference

sentences = [
    "Labor contract signing under new regulations from 22/02/2023",
    "Article 1 of Decision 1788/QD-UBND on temporary construction cost norms",
]

# Encode the batch; returns a NumPy array with one 768-dimensional row per sentence
embeddings = model.encode(sentences)

print(embeddings.shape)
# (2, 768)

Similarity

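# Pairwise similarity matrix (a 2 x 2 tensor) using the model's similarity function, cosine here;
# SentenceTransformer.similarity requires sentence-transformers >= 3.0
similarities = model.similarity(embeddings, embeddings)
print(similarities)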
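
With the cosine setting listed under Model Details, each entry of the printed matrix is the cosine of the angle between two embeddings: their dot product divided by the product of their norms. The pure-Python sketch below reproduces that computation on toy 3-dimensional vectors (stand-ins for the real 768-dimensional embeddings) so the numbers can be checked by hand. It assumes non-zero vectors; in practice `model.similarity` does this work on tensors.

```python
import math


def cosine_similarity_matrix(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Return M where M[i][j] = dot(a[i], b[j]) / (|a[i]| * |b[j]|); assumes non-zero vectors."""
    def norm(v: list[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy 3-dimensional vectors standing in for real 768-dimensional embeddings
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
for row in cosine_similarity_matrix(toy, toy):
    print([round(x, 4) for x in row])
# [1.0, 0.7071]
# [0.7071, 1.0]
```

The diagonal is always 1 (each vector compared with itself), and the matrix is symmetric when both arguments are the same batch, just like the output of `model.similarity(embeddings, embeddings)`.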

Use Cases

  • Semantic Search
  • Legal Document Retrieval
  • Question-Answer Matching
  • Regulation / Clause Retrieval
  • Vietnamese RAG Systems
  • Dense Retrieval for Legal QA
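
For semantic search and RAG, the usual dense-retrieval pattern is to embed the corpus once, embed each query, and return the corpus entries with the highest cosine similarity. The minimal sketch below shows only the ranking step over precomputed vectors. The function names `cosine` and `top_k` and the toy vectors are illustrative assumptions. In a real system the vectors would come from `model.encode(...)`, and sentence-transformers also provides a `util.semantic_search` helper for this step.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def top_k(query_vec: list[float], corpus_vecs: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus index, score) pairs for the k most similar entries, best first."""
    scored = ((i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy corpus of three "documents" and one query (stand-ins for model.encode output)
corpus_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vec = [0.8, 0.6]
for idx, score in top_k(query_vec, corpus_vecs, k=2):
    print(idx, round(score, 2))
# 1 0.96
# 0 0.8
```

`heapq.nlargest` avoids sorting the whole corpus when only a few results are needed. For large legal corpora, an approximate nearest-neighbor index would typically replace this linear scan.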

License

Apache-2.0

Model Files

  • Format: Safetensors
  • Model size: 0.1B parameters
  • Tensor type: F32