---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: text-to-image
library_name: transformers
tags:
- art
---
# Calligrapher: Freestyle Text Image Customization
<div align="center">
<img src="./assets/teaser.jpg" width="850px" alt="Calligrapher Teaser">
</div>
<div align="center">
<h3><a href="https://ezioby.github.io/Calligrapher/">Project Page</a> | <a href="https://github.com/Calligrapher2025/Calligrapher">Code</a> | <a href="https://youtu.be/FLSPphkylQE">Video</a> | <a href="https://huggingface.co/spaces/Calligrapher2025/Calligrapher">HF Demo</a></h3>
</div>
## Overview
**Calligrapher** is a diffusion-based framework that integrates advanced text customization with artistic typography for digital calligraphy and design applications. It supports text customization in several settings: self-reference, cross-reference, and non-text reference customization.
## Key Features
- **Freestyle Text Customization**: Generate text in diverse styles, guided by reference images and text prompts
- **Various Reference Modes**: Support for self-reference, cross-reference, and non-text reference customization
- **High-Quality Results**: Photorealistic text image customization with consistent typography
- **Multi-Language Support**: Style-centric text customization across diverse languages (see <a href="https://github.com/Calligrapher2025/Calligrapher/issues/1">this issue</a>)
<div align="center">
<img src="./assets/multilingual_samples.png" width="900px" alt="Multilingual Samples">
</div>
## Repository Contents
This Hugging Face repository contains:
- **`calligrapher.bin`**: Pre-trained Calligrapher model weights.
- **`Calligrapher_bench_testing.zip`**: Test dataset covering both self-reference and cross-reference customization scenarios, with additional reference images for testing. A small portion of samples is omitted due to IP concerns.
## Quick Start
### Installation
We provide two ways to set up the environment (both require Python 3.10, PyTorch 2.5.0, and CUDA):
#### Using pip
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher
# Install dependencies
pip install -r requirements.txt
```
#### Using Conda
```bash
# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher
# Create and activate conda environment
conda env create -f env.yml
conda activate calligrapher
```
### Download Models & Testing Data
```python
from huggingface_hub import snapshot_download
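# Download the Calligrapher weights (calligrapher.bin) and the test data archive
calligrapher_dir = snapshot_download("Calligrapher2025/Calligrapher")
# Download the required base models. FLUX.1-Fill-dev is gated: request access
# on its Hugging Face page first, then pass your access token.
flux_dir = snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token")
siglip_dir = snapshot_download("google/siglip-so400m-patch14-384")
# Each call returns the local download folder; use these paths in path_dict.json
print(calligrapher_dir, flux_dir, siglip_dir)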
```
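The test archive `Calligrapher_bench_testing.zip` still has to be unpacked so that the `data_dir` entry of `path_dict.json` can point at the extracted folder. Below is a minimal sketch using only Python's standard `zipfile` module. `extract_bench` is a hypothetical helper name, and the sketch assumes the archive either contains a top-level `Calligrapher_bench_testing/` folder or extracts its files directly into the target directory.

```python
import zipfile
from pathlib import Path


def extract_bench(zip_path: str, target_root: str) -> Path:
    """Hypothetical helper: unpack the test archive and return a candidate data_dir.

    Assumes the archive holds either a top-level folder named after the zip
    (e.g. Calligrapher_bench_testing/) or loose files.
    """
    zip_file = Path(zip_path)
    root = Path(target_root)
    root.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(root)
    # Prefer the top-level folder named after the archive if it was created;
    # otherwise the files were extracted straight into target_root.
    candidate = root / zip_file.stem
    return candidate if candidate.is_dir() else root
```

For example, pass the path of `Calligrapher_bench_testing.zip` inside the folder returned by `snapshot_download("Calligrapher2025/Calligrapher")` as `zip_path`, and use the returned directory as `data_dir`.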
### Configuration
Before running the models, you need to configure the paths in `path_dict.json`:
```json
{
"data_dir": "path/to/Calligrapher_bench_testing",
"cli_save_dir": "path/to/cli_results",
"gradio_save_dir": "path/to/gradio_results",
"gradio_temp_dir": "path/to/gradio_tmp",
"base_model_path": "path/to/FLUX.1-Fill-dev",
"image_encoder_path": "path/to/siglip-so400m-patch14-384",
"calligrapher_path": "path/to/calligrapher.bin"
}
```
Configuration parameters:
- `data_dir`: Path to the directory containing the test dataset
- `cli_save_dir`: Path to save results from command-line interface experiments
- `gradio_save_dir`: Path to save results from Gradio interface experiments
- `gradio_temp_dir`: Path to save Gradio temporary files
- `base_model_path`: Path to the base model FLUX.1-Fill-dev
- `image_encoder_path`: Path to the SigLIP image encoder model
- `calligrapher_path`: Path to the Calligrapher model weights
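Because a wrong entry in `path_dict.json` usually only surfaces once a script starts loading models, it can help to check the file first. The sketch below is a hypothetical pre-flight check, not part of the repository: it assumes the four model/data entries must already exist on disk, while the three save/temp directories may be created on demand.

```python
import json
from pathlib import Path

# Hypothetical split of the keys listed above:
# inputs that must already exist vs. output folders that can be created.
INPUT_KEYS = ("data_dir", "base_model_path", "image_encoder_path", "calligrapher_path")
OUTPUT_KEYS = ("cli_save_dir", "gradio_save_dir", "gradio_temp_dir")
REQUIRED_KEYS = INPUT_KEYS + OUTPUT_KEYS


def check_path_dict(config_file: str = "path_dict.json") -> dict:
    """Load path_dict.json, verify required keys and input paths, create output dirs."""
    with open(config_file, encoding="utf-8") as f:
        paths = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in paths]
    if missing:
        raise KeyError(f"path_dict.json is missing keys: {missing}")
    absent = [k for k in INPUT_KEYS if not Path(paths[k]).exists()]
    if absent:
        raise FileNotFoundError(f"these configured paths do not exist: {absent}")
    # Creating the result/temp folders up front is harmless even if the
    # demo scripts would create them themselves.
    for k in OUTPUT_KEYS:
        Path(paths[k]).mkdir(parents=True, exist_ok=True)
    return paths
```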
### Run Gradio Demo
```bash
# Basic Gradio demo
python gradio_demo.py
# Demo with custom mask upload. It includes pre-configured examples and is
# RECOMMENDED for first-time users to learn how to use the model.
python gradio_demo_upload_mask.py
```
Below is a preview of the Gradio demo interfaces:
<div align="center">
<img src="./assets/gradio_preview.png" width="900px" alt="Gradio Demo Preview">
</div>
We also provide a Gradio demo for multilingual freestyle text customization (e.g., Chinese), powered by [TextFLUX](https://github.com/yyyyyxie/textflux). To use it, first download the [TextFLUX weights](https://huggingface.co/yyyyyxie/textflux-lora/blob/main/pytorch_lora_weights.safetensors) and set the `"textflux_path"` entry in `path_dict.json`. Then download [the font resource](https://github.com/yyyyyxie/textflux/blob/main/resource/font/Arial-Unicode-Regular.ttf) to `./resources/` and run:
```bash
python gradio_demo_multilingual.py
```
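The setup steps for the multilingual demo can be scripted as follows. This is a minimal sketch with a hypothetical helper, `add_textflux_path`. It assumes `"textflux_path"` should hold the path to the downloaded `pytorch_lora_weights.safetensors`. If the demo expects the containing folder instead, pass that folder. The font is assumed to live at `./resources/Arial-Unicode-Regular.ttf`, as described above.

```python
import json
from pathlib import Path


def add_textflux_path(lora_path: str, config_file: str = "path_dict.json",
                      font_file: str = "resources/Arial-Unicode-Regular.ttf") -> dict:
    """Hypothetical helper: add "textflux_path" to an existing path_dict.json."""
    config = Path(config_file)
    paths = json.loads(config.read_text(encoding="utf-8"))
    # Check both downloads before touching the config file.
    if not Path(lora_path).exists():
        raise FileNotFoundError(f"TextFLUX weights not found: {lora_path}")
    if not Path(font_file).is_file():
        raise FileNotFoundError(f"font resource not found: {font_file}")
    paths["textflux_path"] = str(lora_path)
    # Rewrite the whole file, keeping all existing entries.
    config.write_text(json.dumps(paths, indent=4), encoding="utf-8")
    return paths
```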
**User Tips:**
1. **Speed vs. Quality Trade-off.** Use fewer sampling steps for faster generation, at some cost in quality (e.g., 10 steps take about 4 s per image on a single A6000 GPU).
2. **Inpaint Position Freedom.** Inpainting positions are flexible: they do not need to match the original text locations in the input image.
3. **Iterative Editing.** Drag outputs from the gallery to the Image Editing Panel (clear the Editing Panel first) for quick refinements.
4. **Mask Optimization.** Adjust the mask size and aspect ratio to match the desired content. The model tends to fill the entire mask, and it harmonizes the generated content with the background in color and lighting.
5. **Reference Image Tip.** References with white backgrounds improve style consistency, because the encoder also takes the background of the reference image into account.
6. **Resolution Balance.** Very high-resolution generation sometimes introduces spelling errors. Resolutions of 512 or 768 px are recommended, since the model was trained at a resolution of 512.
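Tip 6 recommends working at 512 or 768 px. If you prepare images yourself before inpainting, a small helper can pick a target size that preserves the aspect ratio. This is a hypothetical sketch: `fit_resolution` is not part of the repository, and rounding both sides to a multiple of 16 is an assumption (a common constraint for FLUX-based pipelines), not something this README states. Resize the mask to the same size as the image.

```python
def fit_resolution(width: int, height: int, target: int = 512,
                   multiple: int = 16) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) so the longer side is about
    `target`, keep the aspect ratio, and round both sides to `multiple`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / max(width, height)

    def snap(side: int) -> int:
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, int(round(side * scale / multiple)) * multiple)

    return snap(width), snap(height)
```

For example, a 1920x1080 input maps to 512x288 at the default target, and a 3000x2000 input maps to 768x512 with `target=768`.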
## Command Line Usage Examples
### Self-reference Customization
```bash
python infer_calligrapher_self_custom.py
```
### Cross-reference Customization
```bash
python infer_calligrapher_cross_custom.py
```
**Note:** Image files whose names start with "result" are the customization outputs, while files starting with "vis_result" concatenate the source image, reference image, and model output into a single image for side-by-side comparison.
## Framework
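Following that naming convention, the two kinds of output can be separated in `cli_save_dir` with a few lines of standard-library Python. This is a hypothetical sketch that relies only on the "result" and "vis_result" prefixes. It searches subfolders too, in case the scripts organize results that way (an assumption).

```python
from pathlib import Path


def split_results(save_dir: str) -> tuple[list[Path], list[Path]]:
    """Hypothetical helper: return (outputs, visualizations) under save_dir.

    "result*" files are customization outputs; "vis_result*" files are the
    concatenated source/reference/output visualizations.
    """
    outputs, visuals = [], []
    for p in sorted(Path(save_dir).rglob("*")):
        if not p.is_file():
            continue
        if p.name.startswith("vis_result"):
            visuals.append(p)
        elif p.name.startswith("result"):
            outputs.append(p)
    return outputs, visuals
```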
<div align="center">
<img src="./assets/framework.jpg" width="900px" alt="Calligrapher Framework">
</div>
Our framework integrates localized style injection and diffusion-based learning, featuring:
- **Self-distillation mechanism** for automatic typography benchmark construction.
- **Localized style injection** via trainable style encoder.
- **In-context generation** for enhanced style alignment.
## Results Gallery
<div align="center">
<img src="./assets/application.jpg" width="900px" alt="Calligrapher Applications">
</div>