EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation

This is a byte-level causal language model built on a looped (Universal Transformer-style) architecture, trained to translate text descriptions into emojis.

Model Description

  • Model Type: Causal Language Model with Looped Transformer Architecture
  • Task: Text-to-Emoji Translation
  • Training Data: KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
  • Tokenizer: Byte-level (vocab size: 258; see the encoding sketch below)
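
The byte-level vocabulary means no learned tokenizer is needed: text is encoded as raw UTF-8 bytes, and each emoji becomes a short multi-byte sequence. A minimal sketch, assuming the 258-entry vocabulary is 256 byte values plus two special tokens (the exact special-token layout is a guess, not confirmed by the source):

# Assumed layout: ids 0-255 are raw bytes, plus two special tokens (BOS, EOS) = 258.
BOS_ID, EOS_ID = 256, 257  # hypothetical special-token ids

def encode(text: str) -> list[int]:
    """Map text to byte ids, wrapped in the assumed BOS/EOS markers."""
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes back to text."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("🍕"))                    # one emoji expands to four UTF-8 bytes plus markers
print(decode(encode("I love pizza")))  # round-trips to the original string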

Architecture Details

Looped Transformer Architecture:

  • Base Layers: 24
  • Number of Loops: 8 (layers are applied iteratively)
  • Shared Layers: True (parameter efficient)
  • Loop Residual: True (residual connections across loops)

Model Dimensions:

  • Hidden Dimension: 1024
  • Number of Attention Heads: 16
  • KV Heads: 16
  • Max Sequence Length: 512
  • RoPE Theta: 10000.0
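
For reference, these dimensions map onto a configuration object roughly like the sketch below. Field names are illustrative assumptions; the authoritative values live in params.json.

from dataclasses import dataclass

@dataclass
class LoopedTransformerConfig:
    """Illustrative config; field names are assumptions, values mirror this card."""
    dim: int = 1024             # hidden dimension
    n_heads: int = 16           # attention heads
    n_kv_heads: int = 16        # equal to n_heads, so no grouped-query sharing
    n_layers: int = 24          # base layers in the shared stack
    n_loops: int = 8            # times the shared stack is applied
    loop_residual: bool = True  # residual connection between loop iterations
    max_seq_len: int = 512
    rope_theta: float = 10000.0
    vocab_size: int = 258       # byte-level vocabulary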

Training Configuration

  • Training Steps: 5100
  • Batch Size: 12
  • Sequence Length: 512
  • Learning Rate: 0.0003
  • Warmup Steps: 1000
  • Optimizer: AdamW (β1=0.9, β2=0.95)
  • LR Scheduler: Cosine with min ratio 0.1
  • Gradient Clipping: 1.0
  • Weight Decay: 0.1
  • Precision: BF16
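
A hedged sketch of how this recipe is commonly wired up in PyTorch. The actual BFlowNet training loop may differ; the Linear layer is only a stand-in for the real model, and the loss is a dummy.

import math
import torch

# Stand-in model; the real one is the looped transformer from the BFlowNet codebase.
model = torch.nn.Linear(1024, 258)

total_steps, warmup_steps, min_ratio = 5100, 1000, 0.1

# AdamW with the card's betas and weight decay.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    """Linear warmup for 1000 steps, then cosine decay to 10% of the peak LR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# One illustrative step: BF16 autocast, gradient clipping at 1.0.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type, dtype=torch.bfloat16):
    loss = model(torch.randn(12, 1024)).float().mean()  # stand-in for the LM loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()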

What is a Looped Transformer?

A looped transformer applies the same transformer layers multiple times in an iterative refinement process. This is particularly effective for translation tasks as it allows the model to:

  • Refine predictions through multiple iterations
  • Use parameters more efficiently (shared weights across loops)
  • Model complex input-output mappings with fewer total parameters

In this model, 24 layers are applied 8 times with residual connections between loops.
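
A minimal sketch of the looping mechanism, not the BFlowNet implementation: standard encoder layers stand in for the real causal blocks, and the exact placement of the cross-loop residual is an assumption.

import torch
from torch import nn

class LoopedStack(nn.Module):
    """Apply one shared stack of layers n_loops times, with a residual across loops."""

    def __init__(self, dim: int = 1024, n_layers: int = 24, n_loops: int = 8, n_heads: int = 16):
        super().__init__()
        # A single set of layer weights, reused on every loop iteration.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.n_loops = n_loops

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            loop_input = h
            for layer in self.layers:
                h = layer(h)
            h = h + loop_input  # assumed form of the cross-loop residual
        return h

# Tiny configuration just to show the shapes; the real model uses dim=1024, 24 layers, 8 loops.
demo = LoopedStack(dim=64, n_layers=2, n_loops=2, n_heads=16)
print(demo(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])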

Intended Use

This model is designed to translate text descriptions into appropriate emojis.

Example Usage:

Input: "I love pizza"
Output: "🍕❤️"

Training Data

The model was trained on the KomeijiForce/Text2Emoji dataset, which contains over 500,000 text-emoji pairs.
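
The dataset can be inspected directly with the Hugging Face datasets library (assuming a standard train split; check the dataset card for the exact column names):

from datasets import load_dataset

# Download the text-emoji pairs and inspect their structure.
ds = load_dataset("KomeijiForce/Text2Emoji", split="train")
print(ds)     # row count and column names
print(ds[0])  # one text-emoji pair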

Model Files

This repository contains:

  • consolidated.pth: PyTorch model weights
  • params.json: Complete model and training configuration
  • train_state_*.json: Training state information from checkpoint

Usage

To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:

import torch
import json

# Load model parameters
with open('params.json', 'r') as f:
    params = json.load(f)

# Load model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize model with your BFlowNet loopedLM architecture
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)

Generation Parameters

For best results, use:

  • Max Tokens: 128 (outputs are typically short)
  • Temperature: 0.7 (for diverse emoji selection)
  • Top-p: 0.9
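
Put together, byte-level generation with these settings is ordinary autoregressive sampling with temperature scaling and nucleus (top-p) filtering. A sketch, assuming a model that returns next-byte logits and the hypothetical EOS id from the tokenizer sketch above:

import torch

@torch.no_grad()
def generate(model, prompt_ids, max_tokens=128, temperature=0.7, top_p=0.9, eos_id=257):
    """Sample up to max_tokens byte ids from the model, one at a time."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        logits = model(torch.tensor([ids]))[0, -1]   # logits for the next byte
        probs = torch.softmax(logits / temperature, dim=-1)
        # Nucleus filtering: keep the smallest set of bytes whose mass reaches top_p.
        sorted_probs, sorted_ids = torch.sort(probs, descending=True)
        cutoff = torch.cumsum(sorted_probs, dim=-1) > top_p
        cutoff[1:] = cutoff[:-1].clone()             # always keep the most likely byte
        cutoff[0] = False
        sorted_probs[cutoff] = 0.0
        choice = torch.multinomial(sorted_probs / sorted_probs.sum(), 1)
        next_id = sorted_ids[choice].item()
        if next_id == eos_id:                        # assumed EOS id
            break
        ids.append(next_id)
    return ids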

Limitations

  • The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
  • Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
  • The model requires the specific looped transformer architecture implementation to load and use

Training Framework

This model was trained using the BFlowNet framework with its looped transformer architecture.

Dataset: KomeijiForce/Text2Emoji

License

Apache 2.0
