
FaceFusion UI - Complete Feature Guide & Tips

This comprehensive guide explains every section and option in the FaceFusion UI to help you achieve the best results.


📋 Table of Contents

  1. Main Workflow
  2. Input Section
  3. Processors
  4. Face Detection & Selection
  5. Face Masking
  6. Output Settings
  7. Execution Settings
  8. Memory Management
  9. Tips for Best Results

Main Workflow

Basic Steps for Face Swapping

  1. Upload Source → The face you want to apply
  2. Upload Target → The image/video to modify
  3. Select Processors → face_swapper + face_enhancer for best quality
  4. Configure Settings → Adjust quality and options
  5. Preview → Check a frame before processing
  6. Start Processing → Generate final output
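
The same six steps can also be run without the UI. Below is a minimal headless sketch; the sub-command and flag names are assumptions that can vary between FaceFusion versions, so confirm them with `python facefusion.py --help`.

```python
# Minimal sketch of the workflow above run headlessly. The sub-command and flag
# names are assumptions; verify them for your FaceFusion version before relying on them.
import subprocess

subprocess.run(
    [
        "python", "facefusion.py", "headless-run",         # assumed sub-command
        "--source", "source_face.jpg",                     # the face you want to apply
        "--target", "target_video.mp4",                    # the image/video to modify
        "--output-path", "outputs/result.mp4",             # where the result is saved
        "--processors", "face_swapper", "face_enhancer",   # recommended combination
    ],
    check=True,
)
```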

Input Section

SOURCE

Purpose: Upload the face image or audio file you want to apply to the target.

Supported Files:

  • Images: For face swapping (JPG, PNG, etc.)
  • Audio: For lip syncing (MP3, WAV, etc.)

Tips:

  • Use a high-quality, well-lit image for the best face swap results
  • The source face should be frontal, or at a similar angle to the target face
  • Clear, unobstructed facial features produce better swaps

TARGET

Purpose: Upload the base image or video that will be modified.

Supported Files:

  • Images: Single image face swap
  • Videos: Video face swap/lip sync

Tips:

  • Higher resolution = better quality but slower processing
  • Good lighting on faces improves detection and swap quality
  • Videos with stable faces work better than highly dynamic scenes

OUTPUT PATH

Purpose: Specify where the processed result will be saved.

Tips:

  • Use descriptive filenames to organize your outputs
  • By default, results are saved to a temporary directory; specify a custom path for permanent storage
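
For example, a small helper like this (purely illustrative, not part of FaceFusion) can build a descriptive, timestamped output path:

```python
# Illustrative helper for building a descriptive, timestamped output path.
from datetime import datetime
from pathlib import Path

def make_output_path(target: str, suffix: str = "swap", folder: str = "outputs") -> str:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target_path = Path(target)
    Path(folder).mkdir(parents=True, exist_ok=True)
    return str(Path(folder) / f"{target_path.stem}_{suffix}_{stamp}{target_path.suffix}")

print(make_output_path("interview.mp4"))  # e.g. outputs/interview_swap_20251006_153000.mp4
```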

Processors

PROCESSORS SELECTION

Select one or more AI processors to apply to your content:

face_swapper ⭐ (Recommended)

  • Swaps faces from source to target
  • Best Models: inswapper_128, blendswap_256
  • Pixel Boost: Use 1024x1024 for maximum quality
  • Higher resolution = better detail but slower processing

face_enhancer ⭐ (Recommended)

  • Improves face quality and details after swapping
  • Best Models: gfpgan_1.4, restoreformer_plus_plus
  • Blend: 80-100 for strong enhancement
  • Weight: Adjust for different model variants
  • Use together with face_swapper for professional results
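
Conceptually, the Blend value behaves like an alpha blend between the enhanced face and the original. The sketch below illustrates the idea with NumPy; it is not FaceFusion's internal code.

```python
# Conceptual illustration of the Blend slider (not FaceFusion's internal code):
# mix the enhanced face back into the original by a percentage.
import numpy as np

def blend_faces(original: np.ndarray, enhanced: np.ndarray, blend: int) -> np.ndarray:
    alpha = np.clip(blend, 0, 100) / 100.0   # 100 = fully enhanced, 0 = original face
    mixed = (1.0 - alpha) * original.astype(np.float32) + alpha * enhanced.astype(np.float32)
    return mixed.clip(0, 255).astype(np.uint8)
```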

lip_syncer

  • Synchronizes lips to audio file
  • Requirements: Source audio file must be uploaded
  • Best Model: wav2lip_gan_96 for quality
  • Weight: 1.0 for full sync, lower to blend with original

age_modifier

  • Makes faces younger or older
  • Direction: Negative = younger, Positive = older
  • Range: -100 (very young) to +100 (very old)

expression_restorer

  • Restores target's original facial expressions
  • Factor: 100 = full target expression, 0 = source expression
  • Useful to maintain natural emotions after face swap

frame_enhancer

  • Upscales entire frame (not just face)
  • Models: real_esrgan_x4 (4x upscale), ultra_sharp_x4 (sharper)
  • Use for low-resolution videos
  • Very slow - use only when needed

frame_colorizer

  • Colorizes black & white videos/images
  • Multiple artistic styles available

face_editor

  • Manually adjust facial features
  • Control eyes, mouth, head rotation, expressions
  • Advanced feature for fine-tuning

face_debugger

  • Shows detection boxes, landmarks, scores
  • Useful for troubleshooting detection issues

Face Detection & Selection

FACE DETECTOR

Purpose: Detects faces in images/videos for processing.

Face Detector Model

  • yolo_face: Recommended - best accuracy and speed
  • retinaface: Good alternative

Face Detector Size

  • 640x640: Balanced speed and accuracy (recommended)
  • 320x320: Faster but may miss faces
  • 1280x1280: Best accuracy but slower

Face Detector Angles

  • Enable to detect rotated/tilted faces
  • More angles = better detection but slower
  • Use when faces aren't upright

Face Detector Score

  • Confidence threshold (0-1)
  • 0.5: Standard - good balance
  • Higher = stricter detection, fewer false positives
  • Lower = detect more faces but more false positives
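
In practice, the score acts as a simple cutoff over candidate detections, as the sketch below shows (illustrative only; `detections` is a hypothetical list of box/confidence pairs).

```python
# Illustrative sketch of how a detector score threshold filters candidate faces.
# `detections` is a hypothetical list of (bounding_box, confidence) pairs.
def filter_faces(detections, score_threshold: float = 0.5):
    return [(box, conf) for box, conf in detections if conf >= score_threshold]

faces = [((10, 20, 120, 140), 0.92), ((300, 40, 60, 70), 0.31)]
print(filter_faces(faces, 0.5))  # only the 0.92 detection survives the default threshold
```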

FACE LANDMARKER

Purpose: Detects facial landmarks (eyes, nose, mouth) for accurate alignment.

Face Landmarker Model

  • Detects 5 or 68 facial points
  • Essential for proper face alignment and swapping

Face Landmarker Score

  • Confidence threshold (0-1)
  • 0.5: Generally works well
  • Higher = more accurate landmark detection required

FACE SELECTOR MODE

Purpose: Choose which faces to process in the target.

Modes:

  • One: Process first detected face only
  • Many: Process all detected faces
  • Reference: Track specific face across video frames (best for videos)
  • Age/Gender/Race filters: Target specific demographics

Reference Face Distance

  • Similarity threshold for reference tracking
  • Lower = stricter matching (same person)
  • Higher = more lenient matching
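
Reference tracking compares face embeddings frame by frame. The sketch below uses cosine distance as a stand-in for the actual metric, which may differ in scale from the slider in the UI.

```python
# Sketch of reference-face matching via embedding distance. Cosine distance is a
# common choice; the exact metric and scale FaceFusion uses may differ.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_reference(candidate: np.ndarray, reference: np.ndarray, max_distance: float = 0.6) -> bool:
    # Lower max_distance = stricter matching (more likely the same person).
    return cosine_distance(candidate, reference) <= max_distance
```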

Tips:

  • Use Reference mode for videos with multiple people
  • Use One for single-person content
  • Use filters to target specific faces in multi-person scenes

Face Masking

PURPOSE

Control which parts of the face are swapped and how they blend.

Face Mask Types

Box

  • Simple rectangular mask around face
  • Blur: Controls edge softness (0.3-0.5 recommended)
  • Padding: Expand mask in each direction (top, right, bottom, left)
  • Fast and simple
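
Conceptually, a box mask with blur is just a rectangle whose edges are feathered. The sketch below shows the idea with OpenCV and NumPy; it is illustrative, not FaceFusion's actual masking code (real padding is applied per side: top, right, bottom, left).

```python
# Conceptual box mask with feathered edges (illustrative, not FaceFusion's code).
import cv2
import numpy as np

def box_mask(height: int, width: int, padding: float = 0.1, blur: float = 0.3) -> np.ndarray:
    mask = np.zeros((height, width), dtype=np.float32)
    top, left = int(height * padding), int(width * padding)
    mask[top:height - top, left:width - left] = 1.0     # solid rectangle
    kernel = int(blur * min(height, width)) | 1         # odd kernel size derived from the blur factor
    return cv2.GaussianBlur(mask, (kernel, kernel), 0)  # soften the rectangle edges

mask = box_mask(256, 256, padding=0.1, blur=0.3)
```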

Occlusion

  • Avoids occluded areas (glasses, hands, hair)
  • Uses face occluder model
  • More natural when face is partially covered

Region

  • Masks specific facial regions
  • Uses face parser model
  • Select regions: eyes, nose, mouth, skin, etc.

Area

  • Masks by facial areas
  • Combine multiple for custom masking

Tips:

  • Combine mask types for best results
  • Increase blur for smoother blending
  • Adjust padding if face edges are visible

Output Settings

IMAGE OUTPUT

Output Image Quality (0-100)

  • JPEG compression quality
  • 90-95: Recommended for high quality
  • 100: Maximum quality (larger file)
  • 70-80: Good quality, smaller file

Output Image Resolution

  • Can upscale or downscale from original
  • Match source resolution for best quality
  • Upscaling beyond 2x may look artificial

VIDEO OUTPUT

Output Video Encoder

  • libx264: Widely compatible, good quality
  • libx265/hevc: Better compression, smaller files
  • h264_nvenc: GPU-accelerated (NVIDIA only)
  • copy: Preserve original encoding

Output Video Preset

  • ultrafast: Quick but large file
  • fast/medium: Balanced
  • slow/slower: Best quality and compression (recommended)
  • veryslow: Maximum quality, very slow encoding

Output Video Quality (0-100)

  • 90-95: Recommended for professional results
  • 80-85: Good quality, reasonable file size
  • Higher = better visual quality, larger files

Output Video Resolution

  • Can upscale or downscale
  • Higher resolution requires more processing time
  • Match original for best quality/performance ratio

Output Video FPS

  • 24: Cinematic look
  • 30: Standard video
  • 60: Smooth motion
  • Match original video FPS for best results
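
FaceFusion encodes video through FFmpeg. The standalone FFmpeg call below shows how encoder, preset, quality and FPS interact; the mapping from the 0-100 quality slider to CRF is an assumption, not the exact formula FaceFusion uses.

```python
# Standalone FFmpeg example showing how encoder, preset, quality and FPS interact.
# The quality-to-CRF mapping here is an assumption, not FaceFusion's exact formula.
import subprocess

quality = 90                          # UI-style 0-100 quality value
crf = round(51 - quality * 0.51)      # assumed mapping: higher quality -> lower CRF

subprocess.run(
    [
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libx264", "-preset", "slow", "-crf", str(crf),
        "-r", "30",                   # output frame rate
        "-c:a", "aac",
        "output.mp4",
    ],
    check=True,
)
```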

AUDIO OUTPUT (for videos)

Output Audio Encoder

  • aac: Widely compatible, good quality (recommended)
  • libmp3lame: MP3 format
  • copy: Preserve original audio

Output Audio Quality (0-100)

  • 80-90: Good quality for most uses
  • 100: Maximum quality, largest file (aac and mp3 remain lossy even at 100)
  • Higher = better sound, larger file

Output Audio Volume (0-200%)

  • 100: Original volume
  • <100: Quieter
  • >100: Louder (may cause distortion)

Execution Settings

EXECUTION PROVIDERS

Purpose: Choose hardware acceleration for processing.

Options:

  • CUDAExecutionProvider: NVIDIA GPU acceleration (fastest)
  • CoreMLExecutionProvider: Apple Silicon acceleration
  • CPUExecutionProvider: CPU only (slowest but always available)
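
These names are ONNX Runtime execution providers, so you can check what your environment supports directly:

```python
# Check which ONNX Runtime execution providers are available in your environment.
# The model path below is only a placeholder.
import onnxruntime as ort

print(ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']

session = ort.InferenceSession(
    "some_model.onnx",                                             # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],   # falls back to CPU if CUDA is unavailable
)
```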

Tips:

  • Use GPU providers when available for 10-50x speedup
  • CPU is very slow but works on any system
  • Some models require specific providers

EXECUTION THREAD COUNT

Purpose: Number of parallel processing threads.

Recommendations:

  • Set to your CPU core count for optimal performance
  • Higher = faster but uses more CPU/GPU
  • Lower if system becomes unresponsive
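
A quick way to find your core count as a starting point:

```python
# Query the number of logical CPU cores as a starting point for the thread count.
import os

print(os.cpu_count())  # e.g. 16 -> try Execution Thread Count = 16 and adjust from there
```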

EXECUTION QUEUE COUNT

Purpose: Number of frames each thread processes per batch before picking up new work.

Recommendations:

  • 1-2: Recommended for most cases
  • Higher = better GPU utilization but more VRAM needed
  • Lower = less memory usage

Memory Management

VIDEO MEMORY STRATEGY

Purpose: Balance processing speed vs VRAM usage.

Options:

  • Strict: Low memory usage, slower processing
  • Moderate: Balanced (recommended)
  • Tolerant: Faster but uses more VRAM

Tips:

  • Use Strict if you get out-of-memory errors
  • Use Tolerant if you have high-end GPU (12GB+ VRAM)

SYSTEM MEMORY LIMIT

Purpose: Limit RAM usage during processing.

  • 0: No limit
  • Set value (in GB) to prevent system crashes
  • Useful for systems with limited RAM

Tips for Best Results

🌟 Quality Settings (Best Quality)

Processors: face_swapper + face_enhancer
Face Swapper Model: inswapper_128
Pixel Boost: 1024x1024
Face Enhancer Model: gfpgan_1.4
Face Enhancer Blend: 80-100
Output Image/Video Quality: 90-95
Video Preset: slow or slower
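
If you run headlessly, the same recipe might look roughly like the command below; the flag names are assumptions and should be confirmed against `python facefusion.py --help` for your version.

```python
# The quality recipe expressed as a hypothetical headless command (flag names are
# assumptions; confirm them for your FaceFusion version).
import subprocess

subprocess.run(
    [
        "python", "facefusion.py", "headless-run",
        "--source", "face.jpg", "--target", "clip.mp4", "--output-path", "outputs/clip_hq.mp4",
        "--processors", "face_swapper", "face_enhancer",
        "--face-swapper-model", "inswapper_128",
        "--face-swapper-pixel-boost", "1024x1024",
        "--face-enhancer-model", "gfpgan_1.4",
        "--face-enhancer-blend", "90",
        "--output-video-preset", "slow",
        "--output-video-quality", "95",
    ],
    check=True,
)
```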

⚡ Speed Settings (Faster Processing)

Processors: face_swapper only
Face Swapper Model: inswapper_128
Pixel Boost: 512x512 or 768x768
Skip face_enhancer
Output Quality: 80-85
Video Preset: medium or fast
Execution Threads: Max CPU cores

🎯 Troubleshooting

Face Not Detected

  • Check face detector score (try lowering to 0.3)
  • Enable more detector angles
  • Increase detector size to 1280x1280
  • Ensure face is visible and well-lit

Poor Swap Quality

  • Increase pixel boost to 1024x1024
  • Add face_enhancer processor
  • Use higher output quality (90-95)
  • Ensure source and target faces are similar angles

Out of Memory Error

  • Lower pixel boost to 512x512 or 768x768
  • Set video memory strategy to "strict"
  • Reduce execution queue count to 1
  • Lower output resolution
  • Process shorter video segments using trim frame

Slow Processing

  • Use GPU execution provider (CUDA/CoreML)
  • Reduce pixel boost
  • Skip face_enhancer for faster processing
  • Lower execution thread count
  • Use faster video preset (medium/fast)

Unnatural Blending

  • Increase face mask blur (0.4-0.6)
  • Adjust face mask padding
  • Enable occlusion mask type
  • Lower face enhancer blend

Workflow Examples

Example 1: High-Quality Photo Face Swap

  1. Upload high-resolution source face image
  2. Upload target photo
  3. Select: face_swapper + face_enhancer
  4. Settings:
    • Face Swapper: inswapper_128, 1024x1024
    • Face Enhancer: gfpgan_1.4, blend 90
    • Output Quality: 95
  5. Preview result
  6. Process

Example 2: Video Face Swap (Multiple People)

  1. Upload source face
  2. Upload target video
  3. Select: face_swapper + face_enhancer
  4. Face Selector: Reference mode
  5. Click reference face in gallery
  6. Settings:
    • Pixel boost: 1024x1024
    • Video quality: 90
    • Preset: slow
  7. Use trim frame to process test segment first
  8. Process full video

Example 3: Lip Sync Video

  1. Upload source audio (speech/song)
  2. Upload target video
  3. Select: lip_syncer + face_swapper (optional)
  4. Settings:
    • Lip Syncer: wav2lip_gan_96
    • Weight: 1.0
  5. Process

Summary Table

| Feature | Recommended Setting | Purpose |
| --- | --- | --- |
| Face Swapper Model | inswapper_128 | Best quality swapping |
| Pixel Boost | 1024x1024 | Maximum detail |
| Face Enhancer | gfpgan_1.4, blend 80 | Improve quality |
| Output Quality | 90-95 | Near-lossless |
| Video Preset | slow/slower | Best compression |
| Execution Provider | CUDA/CoreML | GPU acceleration |
| Face Selector | Reference (videos) | Track specific person |
| Face Mask Blur | 0.3-0.5 | Natural blending |

Last Updated: October 6, 2025

For more information, visit the official FaceFusion documentation.