This is a sentence-transformers model finetuned from intfloat/e5-small-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
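The Pooling module has pooling_mode_mean_tokens: True and every other mode disabled, so a sentence embedding is the average of BertModel's 384-dimensional token embeddings, with padding positions excluded through the attention mask. The following is a minimal NumPy sketch of that masked mean. It uses toy 2-dimensional vectors instead of 384 dimensions and a hypothetical mean_pool function, and it illustrates the computation rather than reproducing the library's own code.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, skipping padding positions where the mask is 0."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (batch, 1); guards against empty sequences
    return summed / counts

# One sequence of three tokens with 2-d toy embeddings; the last position is padding.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]]
```

Because the architecture has no normalization module after pooling, the resulting vectors are not unit length, which is why comparisons go through cosine similarity rather than a raw dot product.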
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'query: What States Will Have the Highest Diabetes Rates in 2025?',
'passage: The incidence and prevalence of diabetes (primarily type 2 diabetes) has risen sharply since 1990. It is projected to increase another 64% between 2010 and 2025, affecting 53.1 million people and resulting in medical and societal costs of a half trillion dollars a year. We know how to prevent many cases of diabetes and how to treat it effectively. Early appropriate treatment makes a significant difference in preventing major complications and reducing premature death, but it does not cure the disease. Early detection of prediabetes, in conjunction with lifestyle changes, can reduce the number of people with diabetes. A dramatic reduction in diabetes prevalence over time will require significant lifestyle changes on the part of society as a whole. The purpose of this study is to increase public awareness of the severity of regional diabetes trends by providing detailed forecasts for all states and several metropolitan areas for 2010, 2015, and 2025. A model was created to utilize the latest national diabetes and population data and projections, and to transform these into state and metropolitan area forecasts for the whole population and major subgroups. These forecasts were then summarized in easy-to-understand briefing papers for each state and selected metro areas, which are provided online for easy public access. This research is important because little data exist that project the future prevalence and potential costs of diabetes at the state and metro area level. With this data, key stakeholders can make informed decisions concerning diabetes, its impact on their communities, and resource allocation.',
"passage: Casomorphins are the most important during the first year of life, when postnatal formation is most active and milk is the main source of both nutritive and biologically active material for infants. This study was conducted on a total of 90 infants, of which 37 were fed with breast milk and 53 were fed with formula containing cow milk. The study has firstly indicated substances with immunoreactivity of human (irHCM) and bovine (irBCM) beta-casomorphins-7 in blood plasma of naturally and artificially fed infants, respectively. irHCM and irBCM were detected both in the morning before feeding (basal level), and 3h after feeding. Elevation of irHCM and irBCM levels after feeding was detected mainly in infants in the first 3 months of life. Chromatographic characterization of the material with irBCM has demonstrated that it has the same molecular mass and polarity as synthetic bovine beta-casomorphin-7. The highest basal irHCM was observed in breast-fed infants with normal psychomotor development and muscle tone. In contrast, elevated basal irBCM was found in formula-fed infants showing delay in psychomotor development and heightened muscle tone. Among formula-fed infants with normal development, the rate of this parameter directly correlated to basal irBCM. The data indicate that breast feeding has an advantage over artificial feeding for infants' development during the first year of life and support the hypothesis for deterioration of bovine casomorphin elimination as a risk factor for delay in psychomotor development and other diseases such as autism.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.8179, -0.0767],
# [ 0.8179, 1.0000, -0.0596],
# [-0.0767, -0.0596, 1.0000]])
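Every input above carries an E5-style "query: " or "passage: " prefix, matching the training samples, and the printed matrix has 1.0 on its diagonal, consistent with cosine similarity. The following sketch shows how such scores turn into a semantic-search ranking, assuming the model's default cosine similarity. Both helpers, add_prefixes and rank_by_cosine, are hypothetical and not part of the Sentence Transformers API. The 3-dimensional vectors are stand-ins for what model.encode would return.

```python
import numpy as np

def add_prefixes(queries, passages):
    """Hypothetical helper: prepend the "query: " / "passage: " markers used throughout this card."""
    return [f"query: {q}" for q in queries], [f"passage: {p}" for p in passages]

def rank_by_cosine(query_emb, passage_embs):
    """Return passage indices ordered from most to least similar, plus the cosine scores."""
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q  # one cosine score per passage
    return np.argsort(-scores), scores

queries, passages = add_prefixes(
    ["What States Will Have the Highest Diabetes Rates in 2025?"],
    ["The incidence and prevalence of diabetes ..."],
)

# Toy embeddings standing in for model.encode(queries) and model.encode(passages).
query_emb = np.array([1.0, 0.0, 0.0])
passage_embs = np.array([[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]])
order, scores = rank_by_cosine(query_emb, passage_embs)
print(queries[0])  # query: What States Will Have the Highest Diabetes Rates in 2025?
print(order)       # [1 0 2]
```

With the real model, query_emb and passage_embs would be produced by model.encode on the prefixed strings. Omitting the prefixes would feed the encoder inputs that look unlike its training pairs.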
Training dataset:

Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| query: What Causes Liver Fat and How Can It Be Reversed? | passage: It has become widely accepted that Type 2 diabetes is inevitably life-long, with irreversible and progressive beta cell damage. However, the restoration of normal glucose metabolism within days after bariatric surgery in the majority of people with Type 2 diabetes disproves this concept. There is now no doubt that this reversal of diabetes depends upon the sudden and profound decrease in food intake, and does not relate to any direct surgical effect. The Counterpoint study demonstrated that normal glucose levels and normal beta cell function could be restored by a very low calorie diet alone. Novel magnetic resonance methods were applied to measure intra-organ fat. The results showed two different time courses: a) resolution of hepatic insulin sensitivity within days along with a rapid fall in liver fat and normalisation of fasting glucose levels; and b) return of normal beta cell insulin secretion over weeks in step with a fall in pancreas fat. Now that it has been possible t... |
| query: Does a High Fat Diet Increase Colon Cancer Risk? | passage: Colon cancer, rare in the past, and in developing populations, currently accounts for 2 to 4% of all deaths in Western populations. Evidence suggests the primary cause to be changes in diet, which affect the bowel milieu intérieur. It is possible that in sophisticated populations, the higher concentrations of fecal bile acids and sterols, and longer transit time, favor the production of potentially carcinogenic metabolites. Of secular changes in diet, evidence suggests that the following may have etiological importance: 1) the fall in intake of fiber-containing foods with its effects on bowel physiology, and 2) the decreased fiber but increased fat intakes, in their respective capacities to raise concentrations of fecal bile acids, sterols, and other noxious substances. For possible prophylaxis against colon cancer, recommendations for a lower fat intake, or a higher intake of fiber-containing foods (apart from fiber ingestion from bran) are extremely unlikely to be adopted. F... |
| query: Can Fish Catch Mad Cow Disease? | passage: In transmissible spongiform encephalopathies (TSEs), a group of fatal neurodegenerative disorders affecting many species, the key event in disease pathogenesis is the accumulation of an abnormal conformational isoform (PrPSc) of the host-encoded cellular prion protein (PrPC). While the precise mechanism of the PrPC to PrPSc conversion is not understood, it is clear that host PrPC expression is a prerequisite for effective infectious prion propagation. Although there have been many studies on TSEs in mammalian species, little is known about TSE pathogenesis in fish. Here we show that while gilthead sea bream (Sparus aurata) orally challenged with brain homogenates prepared either from a BSE infected cow or from scrapie infected sheep developed no clinical prion disease, the brains of TSE-fed fish sampled two years after challenge did show signs of neurodegeneration and accumulation of deposits that reacted positively with antibodies raised against sea bream PrP. The control gro... |
Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}
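CachedMultipleNegativesRankingLoss optimizes the same objective as MultipleNegativesRankingLoss. For each anchor, its own positive is the correct class and every other positive in the batch acts as a negative. Cosine similarities are multiplied by scale (20.0) before a softmax cross-entropy. This also explains the batch_sampler: no_duplicates setting: a text repeated inside one batch would be scored as a false negative. The following NumPy sketch computes that loss value for a single batch. The function name is hypothetical, and the sketch covers only the forward computation, not the gradient caching.

```python
import numpy as np

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where row i of the score matrix has target column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)  # unit vectors, so a @ p.T is cosine similarity
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                                    # (batch, batch); correct pairs on the diagonal
    logits -= logits.max(axis=1, keepdims=True)                   # numerical stability; softmax is unchanged
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch of four pairs: each anchor points in the same direction as its own positive.
anchors = np.eye(4)
positives = 2.0 * np.eye(4)
print(in_batch_negatives_loss(anchors, positives))                      # close to 0: every pair already wins
print(in_batch_negatives_loss(anchors, np.roll(positives, 1, axis=0)))  # about 20: every pair loses
```

With the scale at 20, cosine similarities in [-1, 1] become logits in [-20, 20]. The softmax can then become sharply peaked even when the raw similarities are close together.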
Evaluation dataset:

Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| query: How Long Does Acetaldehyde Stay in Your Mouth? | passage: The aim of this study was to explore oral exposure to carcinogenic (group 1) acetaldehyde after single sips of strong alcoholic beverages containing no or high concentrations of acetaldehyde. Eight volunteers tasted 5 ml of ethanol diluted to 40 vol.% with no acetaldehyde and 40 vol.% calvados containing 2400 μM acetaldehyde. Salivary acetaldehyde and ethanol concentrations were measured by gas chromatography. The protocol was repeated after ingestion of ethanol (0.5 g/kg body weight). Salivary acetaldehyde concentration was significantly higher after sipping calvados than after sipping ethanol at 30s both with (215 vs. 128 μmol/l, p<0.05) and without (258 vs. 89 μmol/l, p<0.05) alcohol ingestion. From 2 min onwards there were no significant differences in the decreasing salivary acetaldehyde concentration, which remained above the level of carcinogenicity still at 10 min. The systemic alcohol distribution from blood to saliva had no additional effect on salivary acetaldehyde ... |
| query: Does Vitamin C Damage Muscles? | passage: There has been no investigation to determine if the widely used over-the-counter, water-soluble antioxidants vitamin C and N-acetyl-cysteine (NAC) could act as pro-oxidants in humans during inflammatory conditions. We induced an acute-phase inflammatory response by an eccentric arm muscle injury. The inflammation was characterized by edema, swelling, pain, and increases in plasma inflammatory indicators, myeloperoxidase and interleukin-6. Immediately following the injury, subjects consumed a placebo or vitamin C (12.5 mg/kg body weight) and NAC (10 mg/kg body weight) for 7 d. The resulting muscle injury caused increased levels of serum bleomycin-detectable iron and the amount of iron was higher in the vitamin C and NAC group. The concentrations of lactate dehydrogenase (LDH), creatine kinase (CK), and myoglobin were significantly elevated 2, 3, and 4 d postinjury and returned to baseline levels by day 7. In addition, LDH and CK activities were elevated to a greater extent in t... |
| query: Does Drinking Cola Interfere with Iron Absorption? | passage: Preliminary data in the literature indicate that iron absorption from a meal may be increased when consumed with low-pH beverages such as cola, and it is also possible that sugar iron complexes may alter iron availability. A randomized, crossover trial was conducted to compare the bioavailability of nonheme iron from a vegetarian pizza meal when consumed with 3 different beverages (cola, diet cola, and mineral water). Sixteen women with serum ferritin concentrations of 11-54 µg/L were recruited and completed the study. The pizza meal contained native iron and added ferric chloride solution as a stable isotope extrinsic label; the total iron content of the meal was ~5.3 mg. Incorporation of iron from the meal into RBC was not affected by the type of drink (9.9% with cola, 9.4% with diet cola, and 9.6% with water). Serum ferritin and plasma hepcidin were correlated (r = 0.66; P<0.001) and both were significant predictors of iron bioavailability, but their combined effect explain... |
Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}
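The "cached" part of the loss concerns memory. With mini_batch_size: 16, the 128 examples of a training batch never pass through the encoder with gradients all at once. The GradCache procedure of Gao et al. (2021, cited below) has three steps. First, it embeds every mini-batch without storing activations. Second, it computes the loss once over the full batch and caches its gradient with respect to each embedding. Third, it backpropagates those cached gradients through the encoder one mini-batch at a time. The following NumPy sketch illustrates those three steps under simplifications. It is not the Sentence Transformers implementation: the encoder is a hypothetical linear map x @ w, the scores are plain scaled dot products rather than cos_sim, and all function names are invented for the example.

```python
import numpy as np

def contrastive_loss_and_emb_grads(q, p, scale=20.0):
    """In-batch-negatives loss on scaled dot products, plus d(loss)/dq and d(loss)/dp."""
    b = q.shape[0]
    logits = scale * q @ p.T                     # (b, b); positives on the diagonal
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # row-wise softmax
    loss = -np.mean(np.log(np.diag(probs)))
    g = (probs - np.eye(b)) / b                  # d(loss)/d(logits), before the scale factor
    return loss, scale * g @ p, scale * g.T @ q

def cached_step(xq, xp, w, mini_batch_size=16):
    """GradCache-style step for a toy linear encoder emb = x @ w; returns loss and d(loss)/dw."""
    chunks = range(0, len(xq), mini_batch_size)
    # 1) Embed each mini-batch separately (a real encoder would run this pass without autograd).
    q = np.concatenate([xq[i:i + mini_batch_size] @ w for i in chunks])
    p = np.concatenate([xp[i:i + mini_batch_size] @ w for i in chunks])
    # 2) Compute the full-batch loss once and cache its gradient w.r.t. every embedding.
    loss, dq, dp = contrastive_loss_and_emb_grads(q, p)
    # 3) Backpropagate each mini-batch's cached gradients; for emb = x @ w that is x.T @ grad.
    dw = np.zeros_like(w)
    for i in chunks:
        s = slice(i, i + mini_batch_size)
        dw += xq[s].T @ dq[s] + xp[s].T @ dp[s]
    return loss, dw

rng = np.random.default_rng(0)
xq, xp = rng.normal(size=(128, 8)), rng.normal(size=(128, 8))
w = 0.1 * rng.normal(size=(8, 4))
loss, dw = cached_step(xq, xp, w, mini_batch_size=16)
```

The weight gradient assembled chunk by chunk equals the one a single full-batch backward pass would give. Peak activation memory, however, scales with the 16-example mini-batch instead of the 128-example batch.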
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 128
- learning_rate: 2e-05
- num_train_epochs: 30
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 30
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 1.0 | 151 | 1.5321 | 0.0049 |
| 2.0 | 302 | 0.0982 | 0.0032 |
| 3.0 | 453 | 0.0662 | 0.0027 |
| 4.0 | 604 | 0.0524 | 0.0023 |
| 5.0 | 755 | 0.0443 | 0.0019 |
| 6.0 | 906 | 0.0367 | 0.0018 |
| 7.0 | 1057 | 0.0359 | 0.0018 |
| 8.0 | 1208 | 0.0308 | 0.0014 |
| 9.0 | 1359 | 0.0312 | 0.0014 |
| 10.0 | 1510 | 0.0261 | 0.0013 |
| 11.0 | 1661 | 0.0248 | 0.0014 |
| 12.0 | 1812 | 0.0235 | 0.0014 |
| 13.0 | 1963 | 0.0233 | 0.0013 |
| 14.0 | 2114 | 0.0241 | 0.0012 |
| 15.0 | 2265 | 0.0208 | 0.0013 |
| 16.0 | 2416 | 0.022 | 0.0016 |
| 17.0 | 2567 | 0.0192 | 0.0012 |
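The Step column advances by 151 optimizer steps per epoch, with gradient_accumulation_steps: 1. The following sketch assumes the Hugging Face linear schedule: warmup lasts ceil(warmup_ratio × total steps), and the rate then decays linearly to zero. Under that assumption, the 30-epoch budget gives 4530 total steps and 453 warmup steps, so the peak learning_rate of 2e-05 is reached exactly at the end of epoch 3. The function name and its defaults are hypothetical. The sketch only reproduces that formula so the learning rate at any logged step can be read off.

```python
import math

def linear_warmup_lr(step, base_lr=2e-5, steps_per_epoch=151, epochs=30, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup followed by linear decay."""
    total = steps_per_epoch * epochs          # 4530 optimizer steps planned for 30 epochs
    warmup = math.ceil(total * warmup_ratio)  # 453 steps, i.e. exactly 3 epochs
    if step < warmup:
        return base_lr * step / max(1, warmup)  # linear ramp from 0 up to base_lr
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))  # linear decay to 0

for step in (0, 151, 453, 1510, 2567, 4530):
    print(step, f"{linear_warmup_lr(step):.3e}")
```

Under these assumptions, the learning rate at the last logged step (2567, epoch 17) is 13/27 of the peak, roughly 9.63e-06, because the schedule is computed for all 30 epochs.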
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
    title = "Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup",
    author = "Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan",
    year = "2021",
    eprint = "2101.06983",
    archivePrefix = "arXiv",
    primaryClass = "cs.LG",
}
Base model
intfloat/e5-small-v2