EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
This is a byte-level language model built on a looped (Universal Transformer-style) architecture and trained to translate text descriptions into emojis.
Model Description
- Model Type: Causal Language Model with Looped Transformer Architecture
- Task: Text-to-Emoji Translation
- Training Data: KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- Tokenizer: Byte-level (vocab size: 258)
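As a rough illustration of byte-level tokenization: a vocabulary of 258 suggests the 256 raw byte values plus two special tokens. The sketch below assumes BOS/EOS IDs of 256 and 257; the actual mapping is defined by the BFlowNet/loopedLM codebase and params.json.

```python
# Minimal byte-level tokenizer sketch (assumed special-token layout).
BOS_ID, EOS_ID = 256, 257  # assumption: the two IDs beyond the 256 byte values

def encode(text: str) -> list[int]:
    # Raw UTF-8 bytes wrapped in BOS/EOS.
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    # Drop special tokens and decode the remaining bytes.
    data = bytes(i for i in ids if i < 256)
    return data.decode("utf-8", errors="replace")

print(encode("I love pizza"))   # byte IDs wrapped in BOS/EOS
print(decode(encode("🍕❤️")))   # emojis round-trip through their UTF-8 bytes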
Architecture Details
Looped Transformer Architecture:
- Base Layers: 24
- Number of Loops: 8 (layers are applied iteratively)
- Shared Layers: True (parameter efficient)
- Loop Residual: True (residual connections across loops)
Model Dimensions:
- Hidden Dimension: 1024
- Number of Attention Heads: 16
- KV Heads: 16
- Max Sequence Length: 512
- RoPE Theta: 10000.0
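Put together, these settings correspond to a configuration roughly like the sketch below. The key names are illustrative, not necessarily the exact schema used in params.json.

```python
# Illustrative model configuration mirroring the numbers above.
# Key names are hypothetical; consult params.json for the real schema.
model_config = {
    "n_layers": 24,          # base transformer layers
    "n_loops": 8,            # how many times the layer stack is re-applied
    "share_layers": True,    # one set of weights reused across loops
    "loop_residual": True,   # residual connections between loop iterations
    "dim": 1024,             # hidden dimension
    "n_heads": 16,           # attention heads
    "n_kv_heads": 16,        # KV heads == heads (no grouped-query attention)
    "max_seq_len": 512,
    "rope_theta": 10000.0,
    "vocab_size": 258,       # byte-level vocabulary
}
```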
Training Configuration
- Training Steps: 5100
- Batch Size: 12
- Sequence Length: 512
- Learning Rate: 0.0003
- Warmup Steps: 1000
- Optimizer: AdamW (β1=0.9, β2=0.95)
- LR Scheduler: Cosine with min ratio 0.1
- Gradient Clipping: 1.0
- Weight Decay: 0.1
- Precision: BF16
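A minimal PyTorch sketch of an equivalent optimizer and schedule setup is shown below. The real training loop lives in the BFlowNet framework; the placeholder `model` and the exact shape of the cosine-with-floor schedule are assumptions.

```python
import math
import torch

# Hypothetical stand-in for the looped transformer; the real model
# comes from the BFlowNet/loopedLM codebase.
model = torch.nn.Linear(1024, 258)

steps, warmup, min_ratio = 5100, 1000, 0.1

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    # Linear warmup, then cosine decay down to 10% of the peak LR.
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside each training step (BF16 autocast, gradient clipping at 1.0):
# with torch.autocast("cuda", dtype=torch.bfloat16):
#     loss = compute_loss(model, batch)     # placeholder loss function
# loss.backward()
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```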
What is a Looped Transformer?
A looped transformer applies the same transformer layers multiple times in an iterative refinement process. This is particularly effective for translation tasks as it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters
In this model, 24 layers are applied 8 times with residual connections between loops.
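The sketch below illustrates the looping idea: one shared stack of layers applied repeatedly, with a residual connection between iterations. It is not the BFlowNet implementation; the layer internals, the causal masking (omitted here for brevity), and the exact placement of the loop residual are assumptions.

```python
import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    """Applies one shared stack of layers `n_loops` times with a residual
    connection across iterations (illustrative sketch, not the loopedLM code)."""

    def __init__(self, dim=1024, n_heads=16, n_layers=24, n_loops=8):
        super().__init__()
        # One shared set of layers, reused on every loop (parameter sharing).
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=n_heads, batch_first=True, norm_first=True
            )
            for _ in range(n_layers)
        )
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):
            h = x
            for layer in self.layers:
                h = layer(h)
            x = x + h  # loop residual: each pass refines the previous iterate
        return x

# Tiny configuration just to show how activations flow through the loops.
model = LoopedEncoder(dim=64, n_heads=4, n_layers=2, n_loops=3)
print(model(torch.randn(1, 10, 64)).shape)  # torch.Size([1, 10, 64])
```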
Intended Use
This model is designed to translate text descriptions into appropriate emojis.
Example Usage:
Input: "I love pizza"
Output: "🍕❤️"
Training Data
The model was trained on the KomeijiForce/Text2Emoji dataset, which contains over 500,000 text-emoji pairs.
Model Files
This repository contains:
- consolidated.pth: PyTorch model weights
- params.json: Complete model and training configuration
- train_state_*.json: Training state information from checkpoint
Usage
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
```python
import json
import torch

# Load the model and training configuration.
with open("params.json", "r") as f:
    params = json.load(f)

# Load the model weights.
checkpoint = torch.load("consolidated.pth", map_location="cpu")

# Initialize the model with the BFlowNet loopedLM architecture.
# The import path and constructor below are indicative; check the
# loopedLM repository for the exact class name and config keys.
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params["model"])
# model.load_state_dict(checkpoint)
```
Generation Parameters
For best results, use:
- Max Tokens: 128 (outputs are typically short)
- Temperature: 0.7 (for diverse emoji selection)
- Top-p: 0.9
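Once the model is loaded, sampling with these settings might look roughly like the helper below. The `model` call and the byte-level `encode`/`decode` helpers (see the tokenizer sketch earlier) are assumptions; the loopedLM codebase ships its own generation utilities.

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7,
                 top_p: float = 0.9) -> int:
    """Nucleus sampling over a single next-token logits vector
    (illustrative helper, not the loopedLM generation code)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    mask = cumulative - sorted_probs > top_p
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()
    next_id = sorted_ids[torch.multinomial(sorted_probs, 1)]
    return int(next_id)

# Autoregressive loop (assumes `model` returns next-token logits and the
# byte-level encode/decode helpers shown earlier):
# ids = encode("I love pizza")
# for _ in range(128):                      # max tokens
#     logits = model(torch.tensor([ids]))[0, -1]
#     ids.append(sample_top_p(logits))
#     if ids[-1] == EOS_ID:
#         break
# print(decode(ids))
```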
Limitations
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use
Training Framework
This model was trained using the BFlowNet framework with looped transformer architecture.
Dataset: KomeijiForce/Text2Emoji
License
Apache 2.0