embedding_supper_legal
A Vietnamese embedding model specialized for semantic search and information retrieval in the legal domain.
It is fine-tuned from bkai-foundation-models/vietnamese-bi-encoder and optimized for legal document retrieval, legal question answering (QA), and Vietnamese retrieval-augmented generation (RAG) systems.
Model Details
- Base model: bkai-foundation-models/vietnamese-bi-encoder
- Language: Vietnamese
- Embedding dimension: 768
- Max sequence length: 256
- Similarity: Cosine Similarity
- Training samples: 57,371
- Evaluation samples: 7,172
- Domain: Legal / Law / Vietnamese Legal Documents
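The maximum sequence length is 256 tokens, so longer inputs are truncated when they are encoded. For long legal documents, one common workaround is to split the text into overlapping chunks and embed each chunk separately. The sketch below uses a hypothetical helper, chunk_words, that splits on whitespace. The 150-word window and 100-word stride are assumptions chosen to leave headroom, because word counts only approximate the tokenizer's subword counts.

```python
# Hypothetical helper (not part of the model card): split a long legal text
# into overlapping word windows. Window and stride values are illustrative
# assumptions; word counts only approximate subword-token counts.
def chunk_words(text: str, window: int = 150, stride: int = 100) -> list[str]:
    """Return overlapping chunks of at most `window` words, starting every `stride` words."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    words = text.split()
    if not words:
        return []
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        # Stop once the current window already reaches the end of the text.
        if start + window >= len(words):
            break
    return chunks
```

Each chunk can then be embedded with model.encode(chunk_words(long_text)). The model.max_seq_length attribute reports the token limit the loaded model actually applies.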
Evaluation
- Task: information retrieval (metrics: MRR@k and NDCG@k for k = 3, 5, 10)
- Dataset: another-symato/VMTEB-Zalo-legel-retrieval-wseg
- Evaluated with: InformationRetrievalEvaluator (sentence-transformers)
| Model | MRR@3 | MRR@5 | MRR@10 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| tanh17042004/dta_legal_model_sup | 0.8712 | 0.8764 | 0.8785 | 0.8920 | 0.9013 | 0.9064 |
| huyydangg/DEk21_hcmute_embedding | 0.8632 | 0.8688 | 0.8721 | 0.8826 | 0.8927 | 0.9004 |
| AITeamVN/Vietnamese_Embedding | 0.8221 | 0.8290 | 0.8334 | 0.8427 | 0.8550 | 0.8650 |
| BAAI/bge-m3 | 0.7633 | 0.7759 | 0.7803 | 0.7841 | 0.8067 | 0.8170 |
| bkai-foundation-models/vietnamese-bi-encoder | 0.7671 | 0.7765 | 0.7810 | 0.7880 | 0.8049 | 0.8155 |
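As a reminder of what the table measures, here is a minimal sketch of the standard MRR@k and NDCG@k definitions for a single query. It assumes binary relevance (a document is either relevant or not). InformationRetrievalEvaluator may differ in implementation details, and the reported scores are means of these per-query values over all evaluation queries. The function names are illustrative.

```python
import math

def mrr_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    # Each relevant hit at 1-based rank r contributes 1 / log2(r + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking places all relevant documents (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, if the only relevant document is ranked second, MRR@3 is 0.5 and NDCG@3 is 1 / log2(3), about 0.63. MRR rewards only the first hit, while NDCG also credits additional relevant documents lower in the list.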
Usage
Installation
```bash
pip install -U sentence-transformers
```
Load Model
```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned legal embedding model from the Hugging Face Hub
model = SentenceTransformer("tanh17042004/dta_legal_model_sup")
```
Example Inference
```python
sentences = [
    "Labor contract signing under new regulations from 22/02/2023",
    "Article 1 of Decision 1788/QD-UBND on temporary construction cost norms",
]

# Encode each sentence into a 768-dimensional vector
embeddings = model.encode(sentences)
print(embeddings.shape)
# (2, 768)
```
Similarity
```python
# Pairwise similarity matrix between all encoded sentences (shape [2, 2])
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
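The card lists cosine similarity as the similarity function, so model.similarity should return a pairwise cosine matrix. To make that computation concrete, here is a NumPy sketch. cosine_similarity_matrix is a hypothetical helper, not part of sentence-transformers, and it assumes no embedding is an all-zero vector.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # L2-normalize every row (assumes no all-zero rows), then take dot products.
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T
```

With the embeddings from the example, np.allclose(cosine_similarity_matrix(embeddings, embeddings), similarities.numpy(), atol=1e-5) should hold. This assumes the model keeps its default cosine setting and the returned tensor is on the CPU.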
Use Cases
- Semantic Search
- Legal Document Retrieval
- Question Answer Matching
- Regulation / Clause Retrieval
- Vietnamese RAG Systems
- Dense Retrieval for Legal QA
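For semantic search, dense retrieval, or the retrieval step of a RAG pipeline, a typical pattern is to encode the corpus of legal passages once, encode each query, and rank passages by cosine similarity. Here is a minimal sketch that uses a hypothetical helper, top_k_passages, on precomputed embeddings. sentence-transformers also provides sentence_transformers.util.semantic_search for the same purpose on tensors.

```python
import numpy as np

def top_k_passages(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Rank corpus rows by cosine similarity to one query vector; return (index, score) pairs."""
    # Normalize so the dot product equals cosine similarity (assumes non-zero vectors).
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q  # one score per passage
    k = min(k, len(scores))
    # Sort by descending score and keep the k best passages.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

In practice, corpus_embs would come from model.encode(passages) and query_emb from model.encode(query) with a single query string. The returned indices map back to the passages list, and those passages can then be passed to a generator in a RAG system.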
License
Apache-2.0