This is a sentence-transformers model finetuned from google-bert/bert-base-uncased on the sentence-transformers/gooaq dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: PeftModelForFeatureExtraction
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
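The Pooling module above has `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation. It assumes padding tokens are excluded through an attention mask, and `mean_pool` is a hypothetical helper name.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (hypothetical helper)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]


# Toy example with 2-dimensional "token embeddings"; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model the same averaging would run over 768-dimensional token vectors, giving the 768-dimensional output described above.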
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/bert-base-uncased-gooaq-peft")
# Run inference
sentences = [
    'what health services are covered by medicare?',
    'Medicare Part A hospital insurance covers inpatient hospital care, skilled nursing facility, hospice, lab tests, surgery, home health care.',
    "Elephants have the longest gestation period of all mammals. These gentle giants' pregnancies last for more than a year and a half. The average gestation period of an elephant is about 640 to 660 days, or roughly 95 weeks.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
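To make the `[3, 3]` similarity matrix more concrete, here is a small sketch that computes pairwise cosine similarities by hand and picks the best match for the first text. It assumes the similarity function is cosine similarity. The 3-dimensional vectors are made-up stand-ins for the real 768-dimensional embeddings, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Made-up vectors, for illustration only.
toy_embeddings = [
    [1.0, 0.0, 0.0],  # stands in for the question
    [0.8, 0.6, 0.0],  # stands in for a related answer
    [0.0, 0.0, 1.0],  # stands in for an unrelated passage
]
scores = similarity_matrix(toy_embeddings)

# Rank the other texts against the first one and keep the best index.
query_scores = scores[0][1:]
best = max(range(len(query_scores)), key=query_scores.__getitem__) + 1
print(best)  # 1
```

With real embeddings, the same ranking step is what turns the similarity matrix into a simple semantic search.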
Results on gooaq-dev with InformationRetrievalEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.576 |
| cosine_accuracy@3 | 0.7295 |
| cosine_accuracy@5 | 0.7824 |
| cosine_accuracy@10 | 0.8462 |
| cosine_precision@1 | 0.576 |
| cosine_precision@3 | 0.2432 |
| cosine_precision@5 | 0.1565 |
| cosine_precision@10 | 0.0846 |
| cosine_recall@1 | 0.576 |
| cosine_recall@3 | 0.7295 |
| cosine_recall@5 | 0.7824 |
| cosine_recall@10 | 0.8462 |
| cosine_ndcg@10 | 0.7089 |
| cosine_mrr@10 | 0.6653 |
| cosine_map@100 | 0.6709 |
| dot_accuracy@1 | 0.5263 |
| dot_accuracy@3 | 0.6922 |
| dot_accuracy@5 | 0.7494 |
| dot_accuracy@10 | 0.8175 |
| dot_precision@1 | 0.5263 |
| dot_precision@3 | 0.2307 |
| dot_precision@5 | 0.1499 |
| dot_precision@10 | 0.0818 |
| dot_recall@1 | 0.5263 |
| dot_recall@3 | 0.6922 |
| dot_recall@5 | 0.7494 |
| dot_recall@10 | 0.8175 |
| dot_ndcg@10 | 0.6697 |
| dot_mrr@10 | 0.6226 |
| dot_map@100 | 0.6291 |
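In the table above, accuracy@k equals recall@k and precision@k is accuracy@k divided by k (for example, 0.7295 / 3 ≈ 0.2432). That pattern is what one would expect when every query has exactly one relevant answer, although the card does not state this explicitly. Under that assumption, the metrics can be sketched from the rank of the single relevant answer. The function name is hypothetical and this is not the evaluator's own code.

```python
import math
from typing import Dict, List, Optional


def single_relevant_metrics(ranks: List[Optional[int]], k: int) -> Dict[str, float]:
    """Metrics at cutoff k when each query has exactly one relevant document.

    ranks[i] is the 1-based rank of that document for query i,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    # Reciprocal rank counts only when the document appears in the top k.
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    # With one relevant document the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,
        "recall": accuracy,
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Four toy queries: relevant answer at rank 1, rank 2, missing, and rank 4.
ranks = [1, 2, None, 4]
metrics = single_relevant_metrics(ranks, k=3)
print(metrics["accuracy"], metrics["mrr"])  # 0.5 0.375
```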
Columns: question and answer

| | question | answer |
|---|---|---|
| type | string | string |

| question | answer |
|---|---|
| can dogs get pregnant when on their period? | 2. Female dogs can only get pregnant when they're in heat. Some females will show physical signs of readiness – their discharge will lighten in color, and they will “flag,” or lift their tail up and to the side. |
| are there different forms of als? | ['Sporadic ALS is the most common form. It affects up to 95% of people with the disease. Sporadic means it happens sometimes without a clear cause.', 'Familial ALS (FALS) runs in families. About 5% to 10% of people with ALS have this type. FALS is caused by changes to a gene.'] |
| what is the difference between stayman and jacoby transfer? | 1. The Stayman Convention is used only with a 4-Card Major suit looking for a 4-Card Major suit fit. Jacoby Transfer bids are used with a 5-Card suit looking for a 3-Card fit. |

MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
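As a rough illustration of these parameters: the loss treats each question's paired answer as the positive and every other answer in the batch as a negative, then applies cross-entropy to the cosine similarities multiplied by the scale. The pure-Python sketch below follows that description. It is not the library's implementation, and the function names and toy vectors are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(questions: List[Sequence[float]], answers: List[Sequence[float]],
             scale: float = 20.0) -> float:
    """In-batch negatives loss: answers[i] is the positive for questions[i]."""
    losses = []
    for i, question in enumerate(questions):
        logits = [scale * cos_sim(question, answer) for answer in answers]
        # Numerically stable log-sum-exp over all answers in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(value - top) for value in logits))
        # Cross-entropy with the paired answer as the target class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional embeddings: perfectly aligned pairs versus swapped pairs.
toy_questions = [[1.0, 0.0], [0.0, 1.0]]
toy_answers = [[1.0, 0.0], [0.0, 1.0]]
low = mnr_loss(toy_questions, toy_answers)
high = mnr_loss(toy_questions, list(reversed(toy_answers)))
print(low < high)  # True
```

A larger batch supplies more in-batch negatives per question, which is one reason this kind of loss is often paired with large batch sizes.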
Columns: question and answer

| | question | answer |
|---|---|---|
| type | string | string |

| question | answer |
|---|---|
| is there a season 5 animal kingdom? | the good news for the fans is that the season five was confirmed by TNT in July, 2019. The season five of Animal Kingdom was expected to release in May, 2020. |
| what are cmos voltage levels? | CMOS gate circuits have input and output signal specifications that are quite different from TTL. For a CMOS gate operating at a power supply voltage of 5 volts, the acceptable input signal voltages range from 0 volts to 1.5 volts for a “low” logic state, and 3.5 volts to 5 volts for a “high” logic state. |
| dangers of drinking coke when pregnant? | Drinking it during pregnancy was linked to poorer fine motor, visual, spatial and visual motor abilities in early childhood (around age 3). By mid-childhood (age 7), kids whose moms drank diet sodas while pregnant had poorer verbal abilities, the study findings reported. |

MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | loss | gooaq-dev_cosine_map@100 |
|---|---|---|---|---|
| 0 | 0 | - | - | 0.2017 |
| 0.0000 | 1 | 2.584 | - | - |
| 0.0213 | 500 | 2.4164 | - | - |
| 0.0426 | 1000 | 1.1421 | - | - |
| 0.0639 | 1500 | 0.5215 | - | - |
| 0.0853 | 2000 | 0.3645 | 0.2763 | 0.6087 |
| 0.1066 | 2500 | 0.3046 | - | - |
| 0.1279 | 3000 | 0.2782 | - | - |
| 0.1492 | 3500 | 0.2601 | - | - |
| 0.1705 | 4000 | 0.2457 | 0.2013 | 0.6396 |
| 0.1918 | 4500 | 0.2363 | - | - |
| 0.2132 | 5000 | 0.2291 | - | - |
| 0.2345 | 5500 | 0.2217 | - | - |
| 0.2558 | 6000 | 0.2137 | 0.1770 | 0.6521 |
| 0.2771 | 6500 | 0.215 | - | - |
| 0.2984 | 7000 | 0.2057 | - | - |
| 0.3197 | 7500 | 0.198 | - | - |
| 0.3410 | 8000 | 0.196 | 0.1626 | 0.6594 |
| 0.3624 | 8500 | 0.1938 | - | - |
| 0.3837 | 9000 | 0.195 | - | - |
| 0.4050 | 9500 | 0.1895 | - | - |
| 0.4263 | 10000 | 0.186 | 0.1542 | 0.6628 |
| 0.4476 | 10500 | 0.1886 | - | - |
| 0.4689 | 11000 | 0.1835 | - | - |
| 0.4903 | 11500 | 0.1825 | - | - |
| 0.5116 | 12000 | 0.1804 | 0.1484 | 0.6638 |
| 0.5329 | 12500 | 0.176 | - | - |
| 0.5542 | 13000 | 0.1825 | - | - |
| 0.5755 | 13500 | 0.1785 | - | - |
| 0.5968 | 14000 | 0.1766 | 0.1436 | 0.6672 |
| 0.6182 | 14500 | 0.1718 | - | - |
| 0.6395 | 15000 | 0.1717 | - | - |
| 0.6608 | 15500 | 0.1674 | - | - |
| 0.6821 | 16000 | 0.1691 | 0.1406 | 0.6704 |
| 0.7034 | 16500 | 0.1705 | - | - |
| 0.7247 | 17000 | 0.1693 | - | - |
| 0.7460 | 17500 | 0.166 | - | - |
| 0.7674 | 18000 | 0.1676 | 0.1385 | 0.6721 |
| 0.7887 | 18500 | 0.1666 | - | - |
| 0.8100 | 19000 | 0.1658 | - | - |
| 0.8313 | 19500 | 0.1682 | - | - |
| 0.8526 | 20000 | 0.1639 | 0.1370 | 0.6705 |
| 0.8739 | 20500 | 0.1711 | - | - |
| 0.8953 | 21000 | 0.1667 | - | - |
| 0.9166 | 21500 | 0.165 | - | - |
| 0.9379 | 22000 | 0.1658 | 0.1356 | 0.6711 |
| 0.9592 | 22500 | 0.1665 | - | - |
| 0.9805 | 23000 | 0.1636 | - | - |
| 1.0 | 23457 | - | - | 0.6709 |
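The hyperparameters list `learning_rate: 2e-05`, `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, and the log above ends at step 23457. The sketch below shows what a linear warmup followed by linear decay would look like over that many steps. It assumes the number of warmup steps is the total step count times the warmup ratio, rounded up. The function name is hypothetical and this is not the trainer's own scheduler code.

```python
import math


def linear_schedule_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Learning rate at a given step for linear warmup then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 23457  # final step in the training log
BASE_LR = 2e-05      # learning_rate
WARMUP_RATIO = 0.1   # warmup_ratio

warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
for step in (0, 500, warmup_steps, 12000, TOTAL_STEPS):
    lr = linear_schedule_lr(step, TOTAL_STEPS, BASE_LR, WARMUP_RATIO)
    print(f"step {step:>5}: lr = {lr:.3e}")
```

Under these assumptions, warmup would cover roughly the first tenth of the run.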
Carbon emissions were measured using CodeCarbon.
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
google-bert/bert-base-uncased