FaceFusion UI - Complete Feature Guide & Tips
This comprehensive guide explains every section and option in the FaceFusion UI to help you achieve the best results.
Table of Contents
- Main Workflow
- Input Section
- Processors
- Face Detection & Selection
- Face Masking
- Output Settings
- Execution Settings
- Memory Management
- Tips for Best Results
Main Workflow
Basic Steps for Face Swapping
- Upload Source → The face you want to apply
- Upload Target → The image/video to modify
- Select Processors → face_swapper + face_enhancer for best quality
- Configure Settings → Adjust quality and options
- Preview → Check a frame before processing
- Start Processing → Generate final output
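The workflow chains processors together. This is a minimal sketch that assumes the selected processors run one after another in the listed order, so face_enhancer works on face_swapper's output. The processor functions and the dict "frame" are hypothetical stand-ins, not FaceFusion's API:

```python
from typing import Callable, List

# A "frame" is modeled as a plain dict here; real frames are image arrays.
Frame = dict
Processor = Callable[[Frame], Frame]


def face_swapper(frame: Frame) -> Frame:
    # Hypothetical stand-in: mark the frame as swapped.
    return {**frame, "swapped": True}


def face_enhancer(frame: Frame) -> Frame:
    # Hypothetical stand-in: records whether it ran on an already-swapped frame.
    return {**frame, "enhanced_after_swap": frame.get("swapped", False)}


def run_pipeline(frame: Frame, processors: List[Processor]) -> Frame:
    # Each selected processor receives the output of the previous one.
    for process in processors:
        frame = process(frame)
    return frame


result = run_pipeline({"id": 0}, [face_swapper, face_enhancer])
print(result)  # {'id': 0, 'swapped': True, 'enhanced_after_swap': True}
```

Reversing the order would enhance the original face and then overwrite it with the swap, which is why the enhancer belongs after the swapper.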
Input Section
SOURCE
Purpose: Upload the face image or audio file you want to apply to the target.
Supported Files:
- Images: For face swapping (JPG, PNG, etc.)
- Audio: For lip syncing (MP3, WAV, etc.)
Tips:
- Use high-quality, well-lit images for best face swap results
- The source face should be frontal or at a similar angle to the target face
- Clear facial features produce better swaps
TARGET
Purpose: Upload the base image or video that will be modified.
Supported Files:
- Images: Single image face swap
- Videos: Video face swap/lip sync
Tips:
- Higher resolution = better quality but slower processing
- Good lighting on faces improves detection and swap quality
- Videos with stable faces work better than highly dynamic scenes
OUTPUT PATH
Purpose: Specify where the processed result will be saved.
Tips:
- Use descriptive filenames to organize your outputs
- By default, output is saved to a temporary directory; specify a custom path for permanent storage
Processors
PROCESSORS SELECTION
Select one or more AI processors to apply to your content:
face_swapper (Recommended)
- Swaps faces from source to target
- Best Models: inswapper_128, blendswap_256
- Pixel Boost: Use 1024x1024 for maximum quality
- Higher resolution = better detail but slower processing
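Why does a higher pixel boost slow things down? Below is a rough cost estimate. It assumes that pixel boost upscales the face crop and splits it into tiles of the model's native size (128x128 for inswapper_128), with each tile run through the model separately. This reflects our understanding of the approach, not a documented guarantee, and `pixel_boost_passes` is a hypothetical helper:

```python
def pixel_boost_passes(model_size: int, pixel_boost: int) -> int:
    """Model inferences per face, assuming the boosted crop is tiled into
    non-overlapping patches of the model's native size (hypothetical)."""
    if pixel_boost % model_size != 0:
        raise ValueError("pixel boost must be a multiple of the model size")
    tiles_per_side = pixel_boost // model_size
    return tiles_per_side * tiles_per_side


for boost in (128, 512, 768, 1024):
    print(boost, pixel_boost_passes(128, boost))
# 128 1
# 512 16
# 768 36
# 1024 64
```

Under that assumption, going from 512x512 to 1024x1024 quadruples the swapper work per face.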
face_enhancer (Recommended)
- Improves face quality and details after swapping
- Best Models: gfpgan_1.4, restoreformer_plus_plus
- Blend: 80-100 for strong enhancement
- Weight: Only affects enhancer models that accept a weight input
- Use together with face_swapper for professional results
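Blend is easiest to read as a percentage mix of the original and enhanced face. This is a minimal per-pixel sketch that assumes a simple linear mix. The exact weighting inside FaceFusion is an assumption here, and `blend_pixel` is a hypothetical helper:

```python
def blend_pixel(original: float, enhanced: float, blend: int) -> float:
    """Linear mix: blend 0 keeps the original pixel, 100 uses the enhanced one."""
    if not 0 <= blend <= 100:
        raise ValueError("blend must be in the range 0-100")
    # Integer percentages keep the arithmetic exact for whole-number inputs.
    return (original * (100 - blend) + enhanced * blend) / 100


print(blend_pixel(100.0, 200.0, 80))  # 180.0
```

At 80, each output pixel is 80% enhanced and 20% original. This is why lowering the blend softens an over-processed look.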
lip_syncer
- Synchronizes lips to audio file
- Requirements: Source audio file must be uploaded
- Best Model: wav2lip_gan_96 for quality
- Weight: 1.0 for full sync, lower to blend with original
age_modifier
- Makes faces younger or older
- Direction: Negative = younger, Positive = older
- Range: -100 (very young) to +100 (very old)
expression_restorer
- Restores target's original facial expressions
- Factor: 100 = full target expression, 0 = source expression
- Useful to maintain natural emotions after face swap
frame_enhancer
- Upscales entire frame (not just face)
- Models: real_esrgan_x4 (4x upscale), ultra_sharp_x4 (sharper)
- Use for low-resolution videos
- Very slow - use only when needed
frame_colorizer
- Colorizes black & white videos/images
- Multiple artistic styles available
face_editor
- Manually adjust facial features
- Control eyes, mouth, head rotation, expressions
- Advanced feature for fine-tuning
face_debugger
- Shows detection boxes, landmarks, scores
- Useful for troubleshooting detection issues
Face Detection & Selection
FACE DETECTOR
Purpose: Detects faces in images/videos for processing.
Face Detector Model
- yolo_face: Recommended - best accuracy and speed
- retinaface: Good alternative
Face Detector Size
- 640x640: Balanced speed and accuracy (recommended)
- 320x320: Faster but may miss faces
- 1280x1280: Best accuracy but slower
Face Detector Angles
- Enable to detect rotated/tilted faces
- More angles = better detection but slower
- Use when faces aren't upright
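A common way to detect tilted faces works in three steps: rotate the frame, run the detector on the rotated copy, then map the results back. Each enabled angle then costs one more detector pass, which would explain why more angles are slower. This sketch shows the coordinate mapping for a 90° clockwise rotation and assumes that rotate-and-map-back approach. The helper names are hypothetical:

```python
from typing import Tuple

Point = Tuple[int, int]


def rotate_cw(point: Point, height: int) -> Point:
    # Where pixel (x, y) of a frame `height` pixels tall lands after a 90° clockwise turn.
    x, y = point
    return (height - 1 - y, x)


def unrotate_cw(point: Point, height: int) -> Point:
    # Inverse mapping: bring a point detected in the rotated frame back to the original.
    x, y = point
    return (y, height - 1 - x)


# A landmark detected at (5, 2) in the rotated copy of a 640x480 frame:
print(unrotate_cw((5, 2), height=480))  # (2, 474)
```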
Face Detector Score
- Confidence threshold (0-1)
- 0.5: Standard - good balance
- Higher = stricter detection, fewer false positives
- Lower = detect more faces but more false positives
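The score threshold acts as a simple filter on detector confidence. This sketch uses a hypothetical `Detection` type and made-up scores. It shows how lowering the threshold from 0.5 to 0.3 admits an extra, less certain face:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels
    score: float                     # detector confidence, 0-1


def filter_detections(detections: List[Detection], min_score: float) -> List[Detection]:
    # Keep only detections whose confidence reaches the threshold.
    return [d for d in detections if d.score >= min_score]


faces = [
    Detection((10, 10, 90, 90), 0.92),
    Detection((200, 40, 260, 100), 0.41),
    Detection((300, 300, 310, 310), 0.12),
]
print(len(filter_detections(faces, 0.5)))  # 1
print(len(filter_detections(faces, 0.3)))  # 2
```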
FACE LANDMARKER
Purpose: Detects facial landmarks (eyes, nose, mouth) for accurate alignment.
Face Landmarker Model
- Detects 5 or 68 facial points
- Essential for proper face alignment and swapping
Face Landmarker Score
- Confidence threshold (0-1)
- 0.5: Generally works well
- Higher = only more confident landmark detections are accepted
FACE SELECTOR MODE
Purpose: Choose which faces to process in the target.
Modes:
- One: Process first detected face only
- Many: Process all detected faces
- Reference: Track specific face across video frames (best for videos)
- Age/Gender/Race filters: Not a separate mode; these filters restrict which detected faces are considered, to target specific demographics
Reference Face Distance
- Similarity threshold for reference tracking
- Lower = stricter matching (same person)
- Higher = more lenient matching
Tips:
- Use Reference mode for videos with multiple people
- Use One for single-person content
- Use filters to target specific faces in multi-person scenes
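Reference mode compares each detected face with the chosen reference face and keeps those within the distance threshold. This is a minimal sketch that assumes faces are compared via 1 minus the cosine similarity of their embeddings. The tiny 2-D vectors stand in for real face embeddings. `select_faces` and its 0.3 default are hypothetical:

```python
import math
from typing import List, Optional, Sequence

Embedding = Sequence[float]


def face_distance(a: Embedding, b: Embedding) -> float:
    # 1 - cosine similarity: 0.0 for the same direction, larger for less similar faces.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def select_faces(faces: List[Embedding], mode: str,
                 reference: Optional[Embedding] = None,
                 max_distance: float = 0.3) -> List[Embedding]:
    # `faces` is assumed to be sorted already; 0.3 is an illustrative default.
    if mode == "one":
        return faces[:1]
    if mode == "many":
        return list(faces)
    if mode == "reference":
        if reference is None:
            raise ValueError("reference mode needs a reference face")
        return [f for f in faces if face_distance(f, reference) < max_distance]
    raise ValueError(f"unknown mode: {mode}")


faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-D "embeddings"
reference = [1.0, 0.0]
print(len(select_faces(faces, "reference", reference, max_distance=0.3)))    # 2
print(len(select_faces(faces, "reference", reference, max_distance=0.001)))  # 1
```

The stricter threshold keeps only the exact match. This mirrors the guide's advice that a lower distance means stricter same-person matching.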
Face Masking
PURPOSE
Control which parts of the face are swapped and how they blend.
Face Mask Types
Box
- Simple rectangular mask around face
- Blur: Controls edge softness (0.3-0.5 recommended)
- Padding: Adjusts each mask edge separately (top, right, bottom, left), as a percentage of the face crop
- Fast and simple
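This sketch shows what padding and blur do to a box mask. It assumes padding is a percentage cut away from each edge of the face crop, and that blur softens the resulting hard edge. Both helpers are hypothetical. The moving-average `soften` stands in for the Gaussian blur a real implementation would more likely use:

```python
from typing import List, Tuple


def box_mask(size: int, padding: Tuple[int, int, int, int]) -> List[List[int]]:
    """Box mask for a size x size face crop (1 = swap, 0 = keep target).

    Hypothetical sketch: padding (top, right, bottom, left) is read as the
    percentage of the crop cut away from each edge.
    """
    top, right, bottom, left = (size * p // 100 for p in padding)
    return [
        [1 if top <= y < size - bottom and left <= x < size - right else 0
         for x in range(size)]
        for y in range(size)
    ]


def soften(row: List[float], radius: int) -> List[float]:
    # Moving-average blur along one row: hard 0/1 edges become gradual ramps.
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


mask = box_mask(10, (20, 0, 0, 10))
print(mask[1])                 # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  (inside the top padding)
print(mask[5])                 # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  (left padding cuts column 0)
print(soften(mask[5], 1)[:3])  # [0.5, 0.6666666666666666, 1.0]
```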
Occlusion
- Excludes objects covering the face (glasses, hands, hair) from the swap
- Uses face occluder model
- More natural when face is partially covered
Region
- Masks specific facial regions
- Uses face parser model
- Select regions: eyes, nose, mouth, skin, etc.
Area
- Masks broad facial areas (such as the upper face, lower face, or mouth)
- Combine multiple for custom masking
Tips:
- Combine mask types for best results
- Increase blur for smoother blending
- Adjust padding if face edges are visible
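When several mask types are enabled, one common way to combine them is an element-wise minimum. A pixel is then swapped only as much as every mask allows. This is a one-row sketch that assumes that rule. Values run from 0.0 (keep the target) to 1.0 (use the swapped face), and `combine_masks` is hypothetical:

```python
from typing import List

Mask = List[float]  # one row of mask values


def combine_masks(*masks: Mask) -> Mask:
    # A pixel is swapped only as much as the most restrictive mask allows.
    return [min(values) for values in zip(*masks)]


box = [0.5, 1.0, 1.0, 1.0, 0.5]        # blurred box edges at both ends
occlusion = [1.0, 1.0, 0.0, 1.0, 1.0]  # 0.0 where a hand covers the face
print(combine_masks(box, occlusion))   # [0.5, 1.0, 0.0, 1.0, 0.5]
```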
Output Settings
IMAGE OUTPUT
Output Image Quality (0-100)
- Compression quality for lossy formats such as JPEG
- 90-95: Recommended for high quality
- 100: Maximum quality (larger file)
- 70-80: Good quality, smaller file
Output Image Resolution
- Can upscale or downscale from original
- Match the target's original resolution for best quality
- Upscaling beyond 2x may look artificial
VIDEO OUTPUT
Output Video Encoder
- libx264: Widely compatible, good quality
- libx265: HEVC encoding, better compression and smaller files
- h264_nvenc: GPU-accelerated (NVIDIA only)
- copy: Preserve original encoding
Output Video Preset
- ultrafast: Quick but large file
- fast/medium: Balanced
- slow/slower: Smaller files at the same quality setting, longer encoding time (recommended)
- veryslow: Maximum compression efficiency, very slow encoding
Output Video Quality (0-100)
- 90-95: Recommended for professional results
- 80-85: Good quality, reasonable file size
- Higher = better visual quality, larger files
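For x264/x265, quality is ultimately controlled by a CRF value from 0 (best) to 51 (worst), so a 0-100 slider has to be mapped onto that scale. This sketch assumes a simple linear mapping. The formula is illustrative and may not match the UI exactly:

```python
def quality_to_crf(quality: int) -> int:
    """Map a 0-100 quality slider to a CRF value (0 best, 51 worst).

    Hypothetical linear mapping for illustration only.
    """
    if not 0 <= quality <= 100:
        raise ValueError("quality must be in the range 0-100")
    return round(51 - quality * 0.51)


for q in (80, 90, 95, 100):
    print(q, quality_to_crf(q))
# 80 10
# 90 5
# 95 3
# 100 0
```

Under this mapping, the recommended 90-95 range corresponds to roughly CRF 3-5. That would explain the large file sizes at those settings.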
Output Video Resolution
- Can upscale or downscale
- Higher resolution requires more processing time
- Match original for best quality/performance ratio
Output Video FPS
- 24: Cinematic look
- 30: Standard video
- 60: Smooth motion
- Match original video FPS for best results
AUDIO OUTPUT (for videos)
Output Audio Encoder
- aac: Widely compatible, good quality (recommended)
- libmp3lame: MP3 format
- copy: Preserve original audio
Output Audio Quality (0-100)
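- 80-90: High quality, suitable for most uses
- 100: Highest setting; lossy encoders such as aac and libmp3lame are never truly lossless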
- Higher = better sound, larger file
Output Audio Volume (0-200%)
- 100: Original volume
- <100: Quieter
- >100: Louder (may cause distortion)
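Volume is an amplitude multiplier, which is why values above 100 can distort: samples pushed past full scale get clipped. This sketch works on float samples in the range -1.0 to 1.0. The helper names are hypothetical:

```python
import math
from typing import List


def apply_volume(samples: List[float], volume_percent: float) -> List[float]:
    # Scale each sample, then clip to the valid range [-1.0, 1.0].
    gain = volume_percent / 100
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def gain_db(volume_percent: float) -> float:
    # Amplitude ratio in decibels; only defined for volumes above 0.
    if volume_percent <= 0:
        raise ValueError("volume must be above 0 to express in dB")
    return 20 * math.log10(volume_percent / 100)


print(apply_volume([0.2, 0.6, -0.7], 200))  # [0.4, 1.0, -1.0]
print(round(gain_db(200), 2))               # 6.02
```

At 200%, both 0.6 and -0.7 hit the limits and are flattened, even though the change is only about +6 dB.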
Execution Settings
EXECUTION PROVIDERS
Purpose: Choose hardware acceleration for processing.
Options:
- CUDAExecutionProvider: NVIDIA GPU acceleration (fastest)
- CoreMLExecutionProvider: Apple Silicon acceleration
- CPUExecutionProvider: CPU only (slowest but always available)
Tips:
- Use a GPU provider when available; it is typically many times faster than CPU, depending on hardware and model
- CPU is very slow but works on any system
- Some models require specific providers
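FaceFusion runs its models through ONNX Runtime, which is where provider names like CUDAExecutionProvider come from. This sketch shows a fallback rule: keep the preferred providers that are actually installed, and always end with CPU. `choose_providers` is hypothetical. `onnxruntime.get_available_providers()` is the real ONNX Runtime call that lists installed providers:

```python
from typing import List, Sequence


def choose_providers(preferred: Sequence[str], available: Sequence[str]) -> List[str]:
    # Keep preferred providers that are installed, in order, then fall back to CPU.
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# With ONNX Runtime installed, `available` would come from
# onnxruntime.get_available_providers(); a fixed list is used here instead.
available = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
print(choose_providers(["CUDAExecutionProvider", "CoreMLExecutionProvider"], available))
# ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```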
EXECUTION THREAD COUNT
Purpose: Number of parallel processing threads.
Recommendations:
- Set to your CPU core count for optimal performance
- Higher = faster but uses more CPU/GPU
- Lower if system becomes unresponsive
EXECUTION QUEUE COUNT
Purpose: Number of frames each thread processes per batch.
Recommendations:
- 1-2: Recommended for most cases
- Higher = better GPU utilization but more VRAM needed
- Lower = less memory usage
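Thread count and queue count interact: threads work in parallel, and each one takes a batch of frames at a time. This is a generic sketch using Python's `ThreadPoolExecutor`, not FaceFusion's actual scheduler. `process_batch` is a hypothetical stand-in for running the processors:

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunk(frames: List[int], queue_count: int) -> List[List[int]]:
    # Split the frame list into batches of `queue_count` frames each.
    return [frames[i:i + queue_count] for i in range(0, len(frames), queue_count)]


def process_batch(batch: List[int]) -> List[int]:
    # Hypothetical stand-in for running the selected processors on each frame.
    return [frame * 10 for frame in batch]


def process_video(frames: List[int], thread_count: int, queue_count: int) -> List[int]:
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # map() returns results in input order, so frames stay in sequence.
        results = list(pool.map(process_batch, chunk(frames, queue_count)))
    return [frame for batch in results for frame in batch]


thread_count = os.cpu_count() or 1  # matches "set to your CPU core count"
print(process_video(list(range(5)), thread_count, queue_count=2))  # [0, 10, 20, 30, 40]
```

A larger queue count means fewer, bigger batches per thread. In a real GPU pipeline, that is where the extra VRAM goes.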
Memory Management
VIDEO MEMORY STRATEGY
Purpose: Balance processing speed vs VRAM usage.
Options:
- Strict: Low memory usage, slower processing
- Moderate: Balanced (recommended)
- Tolerant: Faster but uses more VRAM
Tips:
- Use Strict if you get out-of-memory errors
- Use Tolerant if you have a high-end GPU (12GB+ VRAM)
SYSTEM MEMORY LIMIT
Purpose: Limit RAM usage during processing.
- 0: No limit
- Set value (in GB) to prevent system crashes
- Useful for systems with limited RAM
Tips for Best Results
Quality Settings (Best Quality)
Processors: face_swapper + face_enhancer
Face Swapper Model: inswapper_128
Pixel Boost: 1024x1024
Face Enhancer Model: gfpgan_1.4
Face Enhancer Blend: 80-100
Output Image/Video Quality: 90-95
Video Preset: slow or slower
Speed Settings (Faster Processing)
Processors: face_swapper only
Face Swapper Model: inswapper_128
Pixel Boost: 512x512 or 768x768
Skip face_enhancer
Output Quality: 80-85
Video Preset: medium or fast
Execution Threads: Max CPU cores
Troubleshooting
Face Not Detected
- Lower the face detector score (try 0.3)
- Enable more detector angles
- Increase detector size to 1280x1280
- Ensure face is visible and well-lit
Poor Swap Quality
- Increase pixel boost to 1024x1024
- Add face_enhancer processor
- Use higher output quality (90-95)
- Ensure source and target faces are similar angles
Out of Memory Error
- Lower pixel boost to 512x512 or 768x768
- Set video memory strategy to "strict"
- Reduce execution queue count to 1
- Lower output resolution
- Process shorter video segments using trim frame
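Processing a long video in pieces keeps memory bounded. This sketch splits a video into frame ranges that you could enter as trim frame start/end, one segment at a time. The ranges here are end-exclusive, and `segments` is a hypothetical helper:

```python
from typing import List, Tuple


def segments(total_frames: int, segment_length: int) -> List[Tuple[int, int]]:
    """(start, end) frame ranges, end exclusive, covering the whole video."""
    return [
        (start, min(start + segment_length, total_frames))
        for start in range(0, total_frames, segment_length)
    ]


# A 60-second clip at 30 FPS, processed 500 frames at a time.
print(segments(60 * 30, 500))
# [(0, 500), (500, 1000), (1000, 1500), (1500, 1800)]
```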
Slow Processing
- Use GPU execution provider (CUDA/CoreML)
- Reduce pixel boost
- Skip face_enhancer for faster processing
- Increase execution thread count (up to your CPU core count)
- Use faster video preset (medium/fast)
Unnatural Blending
- Increase face mask blur (0.4-0.6)
- Adjust face mask padding
- Enable occlusion mask type
- Lower face enhancer blend
Workflow Examples
Example 1: High-Quality Photo Face Swap
- Upload high-resolution source face image
- Upload target photo
- Select: face_swapper + face_enhancer
- Settings:
- Face Swapper: inswapper_128, 1024x1024
- Face Enhancer: gfpgan_1.4, blend 90
- Output Quality: 95
- Preview result
- Process
Example 2: Video Face Swap (Multiple People)
- Upload source face
- Upload target video
- Select: face_swapper + face_enhancer
- Face Selector: Reference mode
- Click reference face in gallery
- Settings:
- Pixel boost: 1024x1024
- Video quality: 90
- Preset: slow
- Use trim frame to process test segment first
- Process full video
Example 3: Lip Sync Video
- Upload source audio (speech/song)
- Upload target video
- Select: lip_syncer + face_swapper (optional; face swapping also needs a source face image)
- Settings:
- Lip Syncer: wav2lip_gan_96
- Weight: 1.0
- Process
Summary Table
| Feature | Recommended Setting | Purpose |
|---|---|---|
| Face Swapper Model | inswapper_128 | Best quality swapping |
| Pixel Boost | 1024x1024 | Maximum detail |
| Face Enhancer | gfpgan_1.4, blend 80 | Improve quality |
| Output Quality | 90-95 | Near-lossless |
| Video Preset | slow/slower | Best compression |
| Execution Provider | CUDA/CoreML | GPU acceleration |
| Face Selector | Reference (videos) | Track specific person |
| Face Mask Blur | 0.3-0.5 | Natural blending |
Last Updated: October 6, 2025
For more information, visit the official FaceFusion documentation.