---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---

# Calligrapher: Freestyle Text Image Customization

<div align="center">
  <img src="./assets/teaser.jpg" width="850px" alt="Calligrapher Teaser">
</div>

<div align="center">
  <h3>📄 <a href="https://ezioby.github.io/Calligrapher/">Project Page</a> | 📦 <a href="https://github.com/Calligrapher2025/Calligrapher">Code</a> | 🎥 <a href="https://youtu.be/FLSPphkylQE">Video</a> | 🤗 <a href="https://huggingface.co/spaces/Calligrapher2025/Calligrapher">HF_Demo</a> </h3>
</div>

## 🎯 Overview

**Calligrapher** is a diffusion-based framework that combines text customization with artistic typography for digital calligraphy and design applications. It supports three customization settings: self-reference, cross-reference, and non-text reference.

## ✨ Key Features

- **🎨 Freestyle Text Customization**: Generate stylized text from diverse reference images and text prompts
- **🔄 Various Reference Modes**: Support for self-reference, cross-reference, and non-text reference customization
- **🚀 High-Quality Results**: Photorealistic text image customization with consistent typography
- **🌐 Multi-Language Support**: Style-centric text customization across diverse languages (see <a href="https://github.com/Calligrapher2025/Calligrapher/issues/1">this issue</a>)

<div align="center">
  <img src="./assets/multilingual_samples.png" width="900px" alt="Multilingual Samples">
</div>

## 📦 Repository Contents

This Hugging Face repository contains:

- **`calligrapher.bin`**: Pre-trained Calligrapher model weights.
- **`Calligrapher_bench_testing.zip`**: Test dataset covering both self-reference and cross-reference customization scenarios, with additional reference images for testing. A small portion of samples is omitted due to IP concerns.



## 🛠️ Quick Start

### Installation

We provide two ways to set up the environment (requiring Python 3.10 + PyTorch 2.5.0 + CUDA):

#### Using pip
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Install dependencies
pip install -r requirements.txt
```

#### Using Conda
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Create and activate conda environment
conda env create -f env.yml
conda activate calligrapher
```
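
After either setup path, a quick sanity check can confirm the interpreter and PyTorch build against the stated requirements (Python 3.10, PyTorch 2.5.0, CUDA). The sketch below is not part of the repository: `check_environment` is a hypothetical helper that only reports what it finds, and it assumes `torch` is importable once the dependencies are installed.

```python
import sys


def check_environment():
    """Report the Python version, PyTorch version, and CUDA availability."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch  # expected to be present after the installation steps
    except ImportError:
        # PyTorch is missing, so CUDA cannot be used either.
        report["torch"] = None
        report["cuda"] = False
    else:
        report["torch"] = torch.__version__
        report["cuda"] = torch.cuda.is_available()
    return report
```

Calling `print(check_environment())` inside the activated environment shows whether the reported versions match the ones listed above.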

### Download Models & Testing Data

```python
from huggingface_hub import snapshot_download

# Download Calligrapher model and test data
snapshot_download("Calligrapher2025/Calligrapher")
# Download the required base models (FLUX.1-Fill-dev requires granted access, so pass your own token)
snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token")
snapshot_download("google/siglip-so400m-patch14-384")
```
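
`snapshot_download` returns the local directory of the downloaded snapshot, so the return values can be turned directly into the entries of `path_dict.json` used in the Configuration section. The following is a minimal sketch under assumptions: `build_path_dict` and `download_and_configure` are hypothetical helpers, the output folder names are placeholders, and the benchmark archive is assumed to have been extracted next to the weights.

```python
from pathlib import Path


def build_path_dict(calligrapher_dir, flux_dir, siglip_dir, work_dir):
    """Map local snapshot directories to the keys used in path_dict.json."""
    calligrapher_dir = Path(calligrapher_dir)
    work_dir = Path(work_dir)
    return {
        # Assumption: Calligrapher_bench_testing.zip was extracted in place.
        "data_dir": str(calligrapher_dir / "Calligrapher_bench_testing"),
        # Placeholder output locations; any writable folders would do.
        "cli_save_dir": str(work_dir / "cli_results"),
        "gradio_save_dir": str(work_dir / "gradio_results"),
        "gradio_temp_dir": str(work_dir / "gradio_tmp"),
        "base_model_path": str(Path(flux_dir)),
        "image_encoder_path": str(Path(siglip_dir)),
        "calligrapher_path": str(calligrapher_dir / "calligrapher.bin"),
    }


def download_and_configure(token, work_dir="outputs"):
    """Download the three snapshots and return a path_dict-style mapping."""
    # Imported lazily so build_path_dict can be used without huggingface_hub.
    from huggingface_hub import snapshot_download

    calligrapher_dir = snapshot_download("Calligrapher2025/Calligrapher")
    flux_dir = snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token=token)
    siglip_dir = snapshot_download("google/siglip-so400m-patch14-384")
    return build_path_dict(calligrapher_dir, flux_dir, siglip_dir, work_dir)
```

The returned dictionary can be written out with `json.dump` to produce `path_dict.json`; adjust the entries by hand if your files live elsewhere.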

### Configuration

Before running the model, configure the paths in `path_dict.json`:

```json
{
  "data_dir": "path/to/Calligrapher_bench_testing",
  "cli_save_dir": "path/to/cli_results",
  "gradio_save_dir": "path/to/gradio_results",
  "gradio_temp_dir": "path/to/gradio_tmp",
  "base_model_path": "path/to/FLUX.1-Fill-dev",
  "image_encoder_path": "path/to/siglip-so400m-patch14-384",
  "calligrapher_path": "path/to/calligrapher.bin"
}
```

Configuration parameters:
- `data_dir`: Path to store the test dataset
- `cli_save_dir`: Path to save results from command-line interface experiments
- `gradio_save_dir`: Path to save results from Gradio interface experiments
- `gradio_temp_dir`: Path to save Gradio temporary files
- `base_model_path`: Path to the base model FLUX.1-Fill-dev
- `image_encoder_path`: Path to the SigLIP image encoder model
- `calligrapher_path`: Path to the Calligrapher model weights
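
A misconfigured path is easy to overlook until a script fails, so it can help to check the file up front. The sketch below is illustrative rather than part of the repository: `check_path_dict` and `load_and_check` are hypothetical helpers, and the split between input paths (which must already exist) and output paths (which may be created later) is an assumption.

```python
import json
from pathlib import Path

# The seven keys listed in the configuration above.
REQUIRED_KEYS = (
    "data_dir",
    "cli_save_dir",
    "gradio_save_dir",
    "gradio_temp_dir",
    "base_model_path",
    "image_encoder_path",
    "calligrapher_path",
)
# Assumption: only these inputs need to exist before running anything.
INPUT_KEYS = ("data_dir", "base_model_path", "image_encoder_path", "calligrapher_path")


def check_path_dict(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    for key in INPUT_KEYS:
        if key in config and not Path(config[key]).exists():
            problems.append(f"path does not exist: {key} -> {config[key]}")
    return problems


def load_and_check(json_path="path_dict.json"):
    """Load the JSON config from disk and validate it."""
    with open(json_path, encoding="utf-8") as f:
        return check_path_dict(json.load(f))
```

Running `load_and_check()` from the repository root returns an empty list when every key is present and every input path resolves.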

### Run Gradio Demo

```bash
# Basic Gradio demo
python gradio_demo.py

# Demo with custom mask upload. It includes pre-configured examples and is
# recommended for first-time users to learn how to use the model.
python gradio_demo_upload_mask.py
```

Below is a preview of the Gradio demo interfaces:

<div align="center">
  <img src="./assets/gradio_preview.png" width="900px" alt="Gradio Demo Preview">
</div>

We also provide a Gradio demo for multilingual freestyle text customization (for example, Chinese), which is supported by [TextFLUX](https://github.com/yyyyyxie/textflux). To use it, first download the [TextFLUX weights](https://huggingface.co/yyyyyxie/textflux-lora/blob/main/pytorch_lora_weights.safetensors) and set the `textflux_path` entry in `path_dict.json`. Then download [the font resource](https://github.com/yyyyyxie/textflux/blob/main/resource/font/Arial-Unicode-Regular.ttf) to `./resources/` and run:
```bash
python gradio_demo_multilingual.py
```

**✨User Tips:**

1. **Speed vs Quality Trade-off.** Use fewer steps (e.g., 10 steps, which takes ~4s/image on a single A6000 GPU) for faster generation, at some cost in quality.

2. **Inpaint Position Freedom.** Inpainting positions are flexible - they don't necessarily need to match the original text locations in the input image.

3. **Iterative Editing.** Drag outputs from the gallery to the Image Editing Panel (clear the Editing Panel first) for quick refinements.

4. **Mask Optimization.** Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the mask and harmonizes the generated content with the background in terms of color and lighting.

5. **Reference Image Tip.** White-background references improve style consistency, because the encoder also considers the background context of the reference image.

6. **Resolution Balance.** Very high-resolution generation sometimes triggers spelling errors. 512px or 768px is recommended, since the model was trained at a resolution of 512.
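
To apply the resolution tip to an arbitrary input image, the target size can be computed before resizing. This is a rough sketch, not the demo's own preprocessing: `fit_resolution` is a hypothetical helper, and snapping each side down to a multiple of 16 is an assumption commonly made for latent diffusion models rather than something the repository states.

```python
def fit_resolution(width, height, target=512, multiple=16):
    """Shrink (width, height) so the longer side is at most `target`.

    Smaller images are not upscaled. Each side is then snapped down to a
    multiple of `multiple` (assumed requirement, see the note above).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest > target:
        # Integer arithmetic keeps the aspect ratio without float rounding.
        width = width * target // longest
        height = height * target // longest
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height
```

For example, `fit_resolution(1920, 1080, target=768)` gives `(768, 432)`, which can then be passed to any image resizing routine.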

## 🎨 Command Line Usage Examples

### Self-reference Customization
```bash
python infer_calligrapher_self_custom.py
```

### Cross-reference Customization
```bash
python infer_calligrapher_cross_custom.py
```

**Note:** Image result files starting with "result" are the customization outputs, while files starting with "vis_result" are concatenated results showing the source image, reference image, and model output together.
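
Because the two kinds of output differ only in their filename prefix, they can be separated with a few lines of Python. The sketch below assumes all outputs sit directly in one folder (for instance the configured `cli_save_dir`); `split_outputs` is a hypothetical helper and no particular file extension or naming suffix is assumed.

```python
from pathlib import Path


def split_outputs(save_dir):
    """Split files in save_dir into (results, visualizations) by prefix."""
    results, visualizations = [], []
    for path in sorted(Path(save_dir).iterdir()):
        if not path.is_file():
            continue
        # Check "vis_result" first; it is the concatenated comparison image.
        if path.name.startswith("vis_result"):
            visualizations.append(path)
        elif path.name.startswith("result"):
            results.append(path)
    return results, visualizations
```

Files that match neither prefix are ignored, so the folder can hold other artifacts without affecting the split.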

## 📊 Framework

<div align="center">
  <img src="./assets/framework.jpg" width="900px" alt="Calligrapher Framework">
</div>

Our framework integrates localized style injection and diffusion-based learning, featuring:
- **Self-distillation mechanism** for automatic typography benchmark construction.
- **Localized style injection** via a trainable style encoder.
- **In-context generation** for enhanced style alignment.

## 🎭 Results Gallery

<div align="center">
  <img src="./assets/application.jpg" width="900px" alt="Calligrapher Applications">
</div>