EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation

This is a byte-level causal language model built on a looped (Universal Transformer-style) architecture, trained to translate text descriptions into emojis.

Model Description

  • Model Type: Causal Language Model with Looped Transformer Architecture
  • Task: Text-to-Emoji Translation
  • Training Data: KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
  • Tokenizer: Byte-level (vocab size: 258; see the encoding sketch below)
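
The byte-level vocabulary means no learned tokenizer is needed: text is encoded as raw UTF-8 bytes, and each emoji becomes a short multi-byte sequence. A minimal sketch, assuming the 258-entry vocabulary is 256 byte values plus two special tokens (the exact special-token layout is a guess, not confirmed by the source):

# Assumed layout: ids 0-255 are raw bytes, plus two special tokens (BOS, EOS) = 258.
BOS_ID, EOS_ID = 256, 257  # hypothetical special-token ids

def encode(text: str) -> list[int]:
    """Map text to byte ids, wrapped in the assumed BOS/EOS markers."""
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes back to text."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("🍕"))                    # one emoji expands to four UTF-8 bytes plus markers
print(decode(encode("I love pizza")))  # round-trips to the original string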

Architecture Details

Looped Transformer Architecture:

  • Base Layers: 24
  • Number of Loops: 8 (layers are applied iteratively)
  • Shared Layers: True (parameter efficient)
  • Loop Residual: True (residual connections across loops)

Model Dimensions:

  • Hidden Dimension: 1024
  • Number of Attention Heads: 16
  • KV Heads: 16
  • Max Sequence Length: 512
  • RoPE Theta: 10000.0
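
For reference, these dimensions map onto a configuration object roughly like the sketch below. Field names are illustrative assumptions; the authoritative values live in params.json.

from dataclasses import dataclass

@dataclass
class LoopedTransformerConfig:
    """Illustrative config; field names are assumptions, values mirror this card."""
    dim: int = 1024             # hidden dimension
    n_heads: int = 16           # attention heads
    n_kv_heads: int = 16        # equal to n_heads, so no grouped-query sharing
    n_layers: int = 24          # base layers in the shared stack
    n_loops: int = 8            # times the shared stack is applied
    loop_residual: bool = True  # residual connection between loop iterations
    max_seq_len: int = 512
    rope_theta: float = 10000.0
    vocab_size: int = 258       # byte-level vocabulary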

Training Configuration

  • Training Steps: 5100
  • Batch Size: 12
  • Sequence Length: 512
  • Learning Rate: 0.0003
  • Warmup Steps: 1000
  • Optimizer: AdamW (β1=0.9, β2=0.95)
  • LR Scheduler: Cosine with min ratio 0.1
  • Gradient Clipping: 1.0
  • Weight Decay: 0.1
  • Precision: BF16
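
A hedged sketch of how this recipe is commonly wired up in PyTorch. The actual BFlowNet training loop may differ; the Linear layer is only a stand-in for the real model, and the loss is a dummy.

import math
import torch

# Stand-in model; the real one is the looped transformer from the BFlowNet codebase.
model = torch.nn.Linear(1024, 258)

total_steps, warmup_steps, min_ratio = 5100, 1000, 0.1

# AdamW with the card's betas and weight decay.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    """Linear warmup for 1000 steps, then cosine decay to 10% of the peak LR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# One illustrative step: BF16 autocast, gradient clipping at 1.0.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type, dtype=torch.bfloat16):
    loss = model(torch.randn(12, 1024)).float().mean()  # stand-in for the LM loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()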

What is a Looped Transformer?

A looped transformer applies the same transformer layers multiple times in an iterative refinement process. This is particularly effective for translation tasks as it allows the model to:

  • Refine predictions through multiple iterations
  • Use parameters more efficiently (shared weights across loops)
  • Model complex input-output mappings with fewer total parameters

In this model, 24 layers are applied 8 times with residual connections between loops.
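
A minimal sketch of the looping mechanism, not the BFlowNet implementation: standard encoder layers stand in for the real causal blocks, and the exact placement of the cross-loop residual is an assumption.

import torch
from torch import nn

class LoopedStack(nn.Module):
    """Apply one shared stack of layers n_loops times, with a residual across loops."""

    def __init__(self, dim: int = 1024, n_layers: int = 24, n_loops: int = 8, n_heads: int = 16):
        super().__init__()
        # A single set of layer weights, reused on every loop iteration.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.n_loops = n_loops

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            loop_input = h
            for layer in self.layers:
                h = layer(h)
            h = h + loop_input  # assumed form of the cross-loop residual
        return h

# Tiny configuration just to show the shapes; the real model uses dim=1024, 24 layers, 8 loops.
demo = LoopedStack(dim=64, n_layers=2, n_loops=2, n_heads=16)
print(demo(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])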

Intended Use

This model is designed to translate text descriptions into appropriate emojis.

Example Usage:

Input: "I love pizza"
Output: "🍕❤️"

Training Data

The model was trained on the KomeijiForce/Text2Emoji dataset, which contains over 500,000 text-emoji pairs.
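
The dataset can be inspected directly with the Hugging Face datasets library (assuming a standard train split; check the dataset card for the exact column names):

from datasets import load_dataset

# Download the text-emoji pairs and inspect their structure.
ds = load_dataset("KomeijiForce/Text2Emoji", split="train")
print(ds)     # row count and column names
print(ds[0])  # one text-emoji pair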

Model Files

This repository contains:

  • consolidated.pth: PyTorch model weights
  • params.json: Complete model and training configuration
  • train_state_*.json: Training state information from checkpoint

Usage

To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:

import torch
import json

# Load model parameters
with open('params.json', 'r') as f:
    params = json.load(f)

# Load model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize model with your BFlowNet loopedLM architecture
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)

Generation Parameters

For best results, use:

  • Max Tokens: 128 (outputs are typically short)
  • Temperature: 0.7 (for diverse emoji selection)
  • Top-p: 0.9
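
Put together, byte-level generation with these settings is ordinary autoregressive sampling with temperature scaling and nucleus (top-p) filtering. A sketch, assuming a model that returns next-byte logits and the hypothetical EOS id from the tokenizer sketch above:

import torch

@torch.no_grad()
def generate(model, prompt_ids, max_tokens=128, temperature=0.7, top_p=0.9, eos_id=257):
    """Sample up to max_tokens byte ids from the model, one at a time."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        logits = model(torch.tensor([ids]))[0, -1]   # logits for the next byte
        probs = torch.softmax(logits / temperature, dim=-1)
        # Nucleus filtering: keep the smallest set of bytes whose mass reaches top_p.
        sorted_probs, sorted_ids = torch.sort(probs, descending=True)
        cutoff = torch.cumsum(sorted_probs, dim=-1) > top_p
        cutoff[1:] = cutoff[:-1].clone()             # always keep the most likely byte
        cutoff[0] = False
        sorted_probs[cutoff] = 0.0
        choice = torch.multinomial(sorted_probs / sorted_probs.sum(), 1)
        next_id = sorted_ids[choice].item()
        if next_id == eos_id:                        # assumed EOS id
            break
        ids.append(next_id)
    return ids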

Limitations

  • The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
  • Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
  • The model requires the specific looped transformer architecture implementation to load and use

Training Framework

This model was trained using the BFlowNet framework with its looped transformer architecture.

Dataset: KomeijiForce/Text2Emoji

License

Apache 2.0
