How to use HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
What's the instruction you added for each task in training?
You mentioned that you added different instructions for tasks like retrieval and reranking, could you help provide those instructions? Thanks!
For retrieval and reranking, we use the following instruction:
Instruct: Given a query, retrieve documents that answer the query. \n Query: {query}
For other tasks (including evaluation and potential training), please refer to the instructions provided in this previous issue.
We plan to release the full set of instructions and the complete testing code in the future, which will make them easier to reference.
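As a minimal sketch of how the retrieval instruction above might be applied, the snippet below fills the template for each query before encoding. The helper names `format_query` and `encode_queries` are hypothetical, and reading the `\n` in the template as a newline character is an assumption. `model` is expected to be a loaded `SentenceTransformer`, or any object with a compatible `encode` method.

```python
# Template copied from the answer above. Treating "\n" as a newline
# character (with the surrounding spaces kept) is an assumption.
RETRIEVAL_TEMPLATE = (
    "Instruct: Given a query, retrieve documents that answer the query. \n Query: {query}"
)


def format_query(query: str) -> str:
    """Insert the raw query text into the retrieval instruction template."""
    return RETRIEVAL_TEMPLATE.format(query=query)


def encode_queries(model, queries):
    """Prefix every query with the instruction, then encode the batch.

    `model` is any object exposing `encode(list_of_str)`, such as a
    sentence-transformers `SentenceTransformer` instance.
    """
    return model.encode([format_query(q) for q in queries])
```

With the model from the snippet at the top of this page, usage would look like `query_embeddings = encode_queries(model, ["how do solar panels work?"])`.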
What if the query is multilingual? Do you use English instructions?
Yes, instructions for multilingual tasks are generally provided in English, as it is more universally understood.
You can also specify the language requirements in the instruction to help the model better recognize the language.
Of course, I have also seen developers use other languages directly as instructions (such as Russian), which could also be worth trying.
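To illustrate the suggestion about stating the language in the instruction, here is a small sketch that keeps the instruction in English and optionally appends a language hint. The function `build_prompt` and the wording of the hint sentence are hypothetical and do not come from the model authors. The `Instruct: ... \n Query: ...` layout follows the retrieval template quoted earlier in this thread.

```python
from typing import Optional

DEFAULT_TASK = "Given a query, retrieve documents that answer the query."


def build_prompt(query: str, task: str = DEFAULT_TASK, language: Optional[str] = None) -> str:
    """Build an English instruction prompt, optionally naming the query language.

    The language hint sentence is an illustrative assumption, not an
    instruction taken from the model's training data.
    """
    instruction = task
    if language:
        instruction = f"{task} The query is written in {language}."
    return f"Instruct: {instruction} \n Query: {query}"


# Example: an English instruction wrapped around a Russian query.
prompt = build_prompt("Как работает солнечная панель?", language="Russian")
```

Whether the extra hint helps is worth checking on your own data, for example by comparing retrieval quality with and without it.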
News: For the complete instructions on the MTEB evaluation, please refer to our technical report (in the appendix) and code.