made a few improvements on custom grpo trainer:
- added sequence similarity reward (seems to work)
- improved vllm support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to hf hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)
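for anyone wondering what a sequence similarity reward could look like: the post doesn't say which metric is used, so this is only a rough sketch that assumes a character-level ratio from Python's `difflib`. the function name `sequence_similarity_reward` is made up for illustration and is not taken from the trainer.

```python
from difflib import SequenceMatcher


def sequence_similarity_reward(completion: str, reference: str) -> float:
    """Return a similarity score in [0, 1] between a completion and a reference.

    Assumption: similarity is difflib's ratio over characters; the actual
    trainer may use a different metric (token overlap, edit distance, ...).
    """
    return SequenceMatcher(None, completion.strip(), reference.strip()).ratio()


# Identical strings score 1.0, strings with no characters in common score 0.0.
same = sequence_similarity_reward("the answer is 42", "the answer is 42")
different = sequence_similarity_reward("abc", "xyz")
```

a score like this is dense (partial matches still earn something), which is probably why it pairs well with stricter format/accuracy rewards.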
🔑 This project contains a text-to-text model designed to decrypt English and Turkish text encoded using a substitution cipher. In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext. The model leverages statistical and linguistic properties of the source language (English or Turkish) to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.
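To make the cipher itself concrete, here is a minimal Python sketch of a monoalphabetic substitution cipher. It is not the project's code: the helper names `make_key`, `apply_key`, and `invert_key` are hypothetical, and the sketch assumes only the 26 ASCII letters, so Turkish text would need an extended alphabet (ç, ğ, ı, ö, ş, ü).

```python
import random
import string


def make_key(seed: int) -> dict[str, str]:
    """Build a random one-to-one mapping from plaintext to ciphertext letters."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def invert_key(key: dict[str, str]) -> dict[str, str]:
    """Swap the mapping direction (ciphertext letter -> plaintext letter)."""
    return {cipher: plain for plain, cipher in key.items()}


def apply_key(text: str, key: dict[str, str]) -> str:
    """Substitute each letter via key, keeping case; other characters pass through."""
    out = []
    for ch in text:
        mapped = key.get(ch.lower())
        if mapped is None:
            out.append(ch)  # spaces, punctuation, non-ASCII letters
        else:
            out.append(mapped.upper() if ch.isupper() else mapped)
    return "".join(out)


key = make_key(seed=0)
plaintext = "A family member may stay."
ciphertext = apply_key(plaintext, key)
recovered = apply_key(ciphertext, invert_key(key))
```

Decryption with a known key is trivial, as `recovered` shows. The model's task is the harder one: recovering the plaintext when the key is unknown.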
These models were fine-tuned from T5-base. They target monoalphabetic English and Turkish substitution ciphers, and they output both the decoded text and the recovered substitution alphabet, with accuracy that, to our knowledge, has not been achieved before!
Example:
Encoded text: Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx.
Decoded text: A family member or a support person may stay with a patient during recovery.
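The letter mapping implied by this example can be read off by aligning the two sentences character by character. The Python sketch below hard-codes those pairs and checks that applying them to the encoded text reproduces the decoded text. It only illustrates what a recovered alphabet means and does not call the model. The variable names are chosen for illustration, and letters that do not occur in the example are left unmapped.

```python
# Ciphertext -> plaintext pairs read off the example (21 letters occur in it).
cipher_letters = "zhtwgxscfquloebajvymi"
plain_letters = "afmilyebrosuptnwhdgcv"

# Build one translation table covering lowercase and uppercase letters.
table = str.maketrans(
    cipher_letters + cipher_letters.upper(),
    plain_letters + plain_letters.upper(),
)

encoded = (
    "Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej "
    "z ozewsbe vlfwby fsmqisfx."
)
decoded = encoded.translate(table)
print(decoded)
```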