
FaceFusion UI - Complete Feature Guide & Tips

This comprehensive guide explains every section and option in the FaceFusion UI to help you achieve the best results.


📋 Table of Contents

  1. Main Workflow
  2. Input Section
  3. Processors
  4. Face Detection & Selection
  5. Face Masking
  6. Output Settings
  7. Execution Settings
  8. Memory Management
  9. Tips for Best Results

Main Workflow

Basic Steps for Face Swapping

  1. Upload Source → The face you want to apply
  2. Upload Target → The image/video to modify
  3. Select Processors → face_swapper + face_enhancer for best quality
  4. Configure Settings → Adjust quality and options
  5. Preview → Check a frame before processing
  6. Start Processing → Generate final output
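
The same six steps can also be run without the UI. Below is a minimal headless sketch; the sub-command and flag names are assumptions that can vary between FaceFusion versions, so confirm them with `python facefusion.py --help`.

```python
# Minimal sketch of the workflow above run headlessly. The sub-command and flag
# names are assumptions; verify them for your FaceFusion version before relying on them.
import subprocess

subprocess.run(
    [
        "python", "facefusion.py", "headless-run",         # assumed sub-command
        "--source", "source_face.jpg",                     # the face you want to apply
        "--target", "target_video.mp4",                    # the image/video to modify
        "--output-path", "outputs/result.mp4",             # where the result is saved
        "--processors", "face_swapper", "face_enhancer",   # recommended combination
    ],
    check=True,
)
```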

Input Section

SOURCE

Purpose: Upload the face image or audio file you want to apply to the target.

Supported Files:

  • Images: For face swapping (JPG, PNG, etc.)
  • Audio: For lip syncing (MP3, WAV, etc.)

Tips:

  • Use a high-quality, well-lit image for the best face swap results
  • The source face should be frontal, or at a similar angle to the target face
  • Clear, unobstructed facial features produce better swaps

TARGET

Purpose: Upload the base image or video that will be modified.

Supported Files:

  • Images: Single image face swap
  • Videos: Video face swap/lip sync

Tips:

  • Higher resolution = better quality but slower processing
  • Good lighting on faces improves detection and swap quality
  • Videos with stable faces work better than highly dynamic scenes

OUTPUT PATH

Purpose: Specify where the processed result will be saved.

Tips:

  • Use descriptive filenames to organize your outputs
  • By default, results are saved to a temporary directory; specify a custom path for permanent storage
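
For example, a small helper like this (purely illustrative, not part of FaceFusion) can build a descriptive, timestamped output path:

```python
# Illustrative helper for building a descriptive, timestamped output path.
from datetime import datetime
from pathlib import Path

def make_output_path(target: str, suffix: str = "swap", folder: str = "outputs") -> str:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target_path = Path(target)
    Path(folder).mkdir(parents=True, exist_ok=True)
    return str(Path(folder) / f"{target_path.stem}_{suffix}_{stamp}{target_path.suffix}")

print(make_output_path("interview.mp4"))  # e.g. outputs/interview_swap_20251006_153000.mp4
```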

Processors

PROCESSORS SELECTION

Select one or more AI processors to apply to your content:

face_swapper ⭐ (Recommended)

  • Swaps faces from source to target
  • Best Models: inswapper_128, blendswap_256
  • Pixel Boost: Use 1024x1024 for maximum quality
  • Higher resolution = better detail but slower processing

face_enhancer ⭐ (Recommended)

  • Improves face quality and details after swapping
  • Best Models: gfpgan_1.4, restoreformer_plus_plus
  • Blend: 80-100 for strong enhancement
  • Weight: Adjust for different model variants
  • Use together with face_swapper for professional results
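
Conceptually, the Blend value behaves like an alpha blend between the enhanced face and the original. The sketch below illustrates the idea with NumPy; it is not FaceFusion's internal code.

```python
# Conceptual illustration of the Blend slider (not FaceFusion's internal code):
# mix the enhanced face back into the original by a percentage.
import numpy as np

def blend_faces(original: np.ndarray, enhanced: np.ndarray, blend: int) -> np.ndarray:
    alpha = np.clip(blend, 0, 100) / 100.0   # 100 = fully enhanced, 0 = original face
    mixed = (1.0 - alpha) * original.astype(np.float32) + alpha * enhanced.astype(np.float32)
    return mixed.clip(0, 255).astype(np.uint8)
```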

lip_syncer

  • Synchronizes lips to audio file
  • Requirements: Source audio file must be uploaded
  • Best Model: wav2lip_gan_96 for quality
  • Weight: 1.0 for full sync, lower to blend with original

age_modifier

  • Makes faces younger or older
  • Direction: Negative = younger, Positive = older
  • Range: -100 (very young) to +100 (very old)

expression_restorer

  • Restores target's original facial expressions
  • Factor: 100 = full target expression, 0 = source expression
  • Useful to maintain natural emotions after face swap

frame_enhancer

  • Upscales entire frame (not just face)
  • Models: real_esrgan_x4 (4x upscale), ultra_sharp_x4 (sharper)
  • Use for low-resolution videos
  • Very slow - use only when needed

frame_colorizer

  • Colorizes black & white videos/images
  • Multiple artistic styles available

face_editor

  • Manually adjust facial features
  • Control eyes, mouth, head rotation, expressions
  • Advanced feature for fine-tuning

face_debugger

  • Shows detection boxes, landmarks, scores
  • Useful for troubleshooting detection issues

Face Detection & Selection

FACE DETECTOR

Purpose: Detects faces in images/videos for processing.

Face Detector Model

  • yolo_face: Recommended - best accuracy and speed
  • retinaface: Good alternative

Face Detector Size

  • 640x640: Balanced speed and accuracy (recommended)
  • 320x320: Faster but may miss faces
  • 1280x1280: Best accuracy but slower

Face Detector Angles

  • Enable to detect rotated/tilted faces
  • More angles = better detection but slower
  • Use when faces aren't upright

Face Detector Score

  • Confidence threshold (0-1)
  • 0.5: Standard - good balance
  • Higher = stricter detection, fewer false positives
  • Lower = detect more faces but more false positives
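
In practice, the score acts as a simple cutoff over candidate detections, as the sketch below shows (illustrative only; `detections` is a hypothetical list of box/confidence pairs).

```python
# Illustrative sketch of how a detector score threshold filters candidate faces.
# `detections` is a hypothetical list of (bounding_box, confidence) pairs.
def filter_faces(detections, score_threshold: float = 0.5):
    return [(box, conf) for box, conf in detections if conf >= score_threshold]

faces = [((10, 20, 120, 140), 0.92), ((300, 40, 60, 70), 0.31)]
print(filter_faces(faces, 0.5))  # only the 0.92 detection survives the default threshold
```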

FACE LANDMARKER

Purpose: Detects facial landmarks (eyes, nose, mouth) for accurate alignment.

Face Landmarker Model

  • Detects 5 or 68 facial points
  • Essential for proper face alignment and swapping

Face Landmarker Score

  • Confidence threshold (0-1)
  • 0.5: Generally works well
  • Higher = more accurate landmark detection required

FACE SELECTOR MODE

Purpose: Choose which faces to process in the target.

Modes:

  • One: Process first detected face only
  • Many: Process all detected faces
  • Reference: Track specific face across video frames (best for videos)
  • Age/Gender/Race filters: Target specific demographics

Reference Face Distance

  • Similarity threshold for reference tracking
  • Lower = stricter matching (same person)
  • Higher = more lenient matching
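
Reference tracking compares face embeddings frame by frame. The sketch below uses cosine distance as a stand-in for the actual metric, which may differ in scale from the slider in the UI.

```python
# Sketch of reference-face matching via embedding distance. Cosine distance is a
# common choice; the exact metric and scale FaceFusion uses may differ.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_reference(candidate: np.ndarray, reference: np.ndarray, max_distance: float = 0.6) -> bool:
    # Lower max_distance = stricter matching (more likely the same person).
    return cosine_distance(candidate, reference) <= max_distance
```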

Tips:

  • Use Reference mode for videos with multiple people
  • Use One for single-person content
  • Use filters to target specific faces in multi-person scenes

Face Masking

PURPOSE

Control which parts of the face are swapped and how they blend.

Face Mask Types

Box

  • Simple rectangular mask around face
  • Blur: Controls edge softness (0.3-0.5 recommended)
  • Padding: Expand mask in each direction (top, right, bottom, left)
  • Fast and simple
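
Conceptually, a box mask with blur is just a rectangle whose edges are feathered. The sketch below shows the idea with OpenCV and NumPy; it is illustrative, not FaceFusion's actual masking code (real padding is applied per side: top, right, bottom, left).

```python
# Conceptual box mask with feathered edges (illustrative, not FaceFusion's code).
import cv2
import numpy as np

def box_mask(height: int, width: int, padding: float = 0.1, blur: float = 0.3) -> np.ndarray:
    mask = np.zeros((height, width), dtype=np.float32)
    top, left = int(height * padding), int(width * padding)
    mask[top:height - top, left:width - left] = 1.0     # solid rectangle
    kernel = int(blur * min(height, width)) | 1         # odd kernel size derived from the blur factor
    return cv2.GaussianBlur(mask, (kernel, kernel), 0)  # soften the rectangle edges

mask = box_mask(256, 256, padding=0.1, blur=0.3)
```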

Occlusion

  • Avoids occluded areas (glasses, hands, hair)
  • Uses face occluder model
  • More natural when face is partially covered

Region

  • Masks specific facial regions
  • Uses face parser model
  • Select regions: eyes, nose, mouth, skin, etc.

Area

  • Masks by facial areas
  • Combine multiple for custom masking

Tips:

  • Combine mask types for best results
  • Increase blur for smoother blending
  • Adjust padding if face edges are visible

Output Settings

IMAGE OUTPUT

Output Image Quality (0-100)

  • JPEG compression quality
  • 90-95: Recommended for high quality
  • 100: Maximum quality (larger file)
  • 70-80: Good quality, smaller file

Output Image Resolution

  • Can upscale or downscale from original
  • Match source resolution for best quality
  • Upscaling beyond 2x may look artificial

VIDEO OUTPUT

Output Video Encoder

  • libx264: Widely compatible, good quality
  • libx265/hevc: Better compression, smaller files
  • h264_nvenc: GPU-accelerated (NVIDIA only)
  • copy: Preserve original encoding

Output Video Preset

  • ultrafast: Quick but large file
  • fast/medium: Balanced
  • slow/slower: Best quality and compression (recommended)
  • veryslow: Maximum quality, very slow encoding

Output Video Quality (0-100)

  • 90-95: Recommended for professional results
  • 80-85: Good quality, reasonable file size
  • Higher = better visual quality, larger files

Output Video Resolution

  • Can upscale or downscale
  • Higher resolution requires more processing time
  • Match original for best quality/performance ratio

Output Video FPS

  • 24: Cinematic look
  • 30: Standard video
  • 60: Smooth motion
  • Match original video FPS for best results
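
FaceFusion encodes video through FFmpeg. The standalone FFmpeg call below shows how encoder, preset, quality and FPS interact; the mapping from the 0-100 quality slider to CRF is an assumption, not the exact formula FaceFusion uses.

```python
# Standalone FFmpeg example showing how encoder, preset, quality and FPS interact.
# The quality-to-CRF mapping here is an assumption, not FaceFusion's exact formula.
import subprocess

quality = 90                          # UI-style 0-100 quality value
crf = round(51 - quality * 0.51)      # assumed mapping: higher quality -> lower CRF

subprocess.run(
    [
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libx264", "-preset", "slow", "-crf", str(crf),
        "-r", "30",                   # output frame rate
        "-c:a", "aac",
        "output.mp4",
    ],
    check=True,
)
```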

AUDIO OUTPUT (for videos)

Output Audio Encoder

  • aac: Widely compatible, good quality (recommended)
  • libmp3lame: MP3 format
  • copy: Preserve original audio

Output Audio Quality (0-100)

  • 80-90: Good quality for most uses
  • 100: Maximum quality, largest file (aac and mp3 remain lossy even at 100)
  • Higher = better sound, larger file

Output Audio Volume (0-200%)

  • 100: Original volume
  • <100: Quieter
  • >100: Louder (may cause distortion)

Execution Settings

EXECUTION PROVIDERS

Purpose: Choose hardware acceleration for processing.

Options:

  • CUDAExecutionProvider: NVIDIA GPU acceleration (fastest)
  • CoreMLExecutionProvider: Apple Silicon acceleration
  • CPUExecutionProvider: CPU only (slowest but always available)
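
These names are ONNX Runtime execution providers, so you can check what your environment supports directly:

```python
# Check which ONNX Runtime execution providers are available in your environment.
# The model path below is only a placeholder.
import onnxruntime as ort

print(ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']

session = ort.InferenceSession(
    "some_model.onnx",                                             # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],   # falls back to CPU if CUDA is unavailable
)
```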

Tips:

  • Use GPU providers when available for 10-50x speedup
  • CPU is very slow but works on any system
  • Some models require specific providers

EXECUTION THREAD COUNT

Purpose: Number of parallel processing threads.

Recommendations:

  • Set to your CPU core count for optimal performance
  • Higher = faster but uses more CPU/GPU
  • Lower if system becomes unresponsive
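
A quick way to find your core count as a starting point:

```python
# Query the number of logical CPU cores as a starting point for the thread count.
import os

print(os.cpu_count())  # e.g. 16 -> try Execution Thread Count = 16 and adjust from there
```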

EXECUTION QUEUE COUNT

Purpose: Number of frames each thread processes per batch before picking up new work.

Recommendations:

  • 1-2: Recommended for most cases
  • Higher = better GPU utilization but more VRAM needed
  • Lower = less memory usage

Memory Management

VIDEO MEMORY STRATEGY

Purpose: Balance processing speed vs VRAM usage.

Options:

  • Strict: Low memory usage, slower processing
  • Moderate: Balanced (recommended)
  • Tolerant: Faster but uses more VRAM

Tips:

  • Use Strict if you get out-of-memory errors
  • Use Tolerant if you have high-end GPU (12GB+ VRAM)

SYSTEM MEMORY LIMIT

Purpose: Limit RAM usage during processing.

  • 0: No limit
  • Set value (in GB) to prevent system crashes
  • Useful for systems with limited RAM

Tips for Best Results

🌟 Quality Settings (Best Quality)

Processors: face_swapper + face_enhancer
Face Swapper Model: inswapper_128
Pixel Boost: 1024x1024
Face Enhancer Model: gfpgan_1.4
Face Enhancer Blend: 80-100
Output Image/Video Quality: 90-95
Video Preset: slow or slower
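
If you run headlessly, the same recipe might look roughly like the command below; the flag names are assumptions and should be confirmed against `python facefusion.py --help` for your version.

```python
# The quality recipe expressed as a hypothetical headless command (flag names are
# assumptions; confirm them for your FaceFusion version).
import subprocess

subprocess.run(
    [
        "python", "facefusion.py", "headless-run",
        "--source", "face.jpg", "--target", "clip.mp4", "--output-path", "outputs/clip_hq.mp4",
        "--processors", "face_swapper", "face_enhancer",
        "--face-swapper-model", "inswapper_128",
        "--face-swapper-pixel-boost", "1024x1024",
        "--face-enhancer-model", "gfpgan_1.4",
        "--face-enhancer-blend", "90",
        "--output-video-preset", "slow",
        "--output-video-quality", "95",
    ],
    check=True,
)
```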

⚡ Speed Settings (Faster Processing)

Processors: face_swapper only
Face Swapper Model: inswapper_128
Pixel Boost: 512x512 or 768x768
Skip face_enhancer
Output Quality: 80-85
Video Preset: medium or fast
Execution Threads: Max CPU cores

🎯 Troubleshooting

Face Not Detected

  • Check face detector score (try lowering to 0.3)
  • Enable more detector angles
  • Increase detector size to 1280x1280
  • Ensure face is visible and well-lit

Poor Swap Quality

  • Increase pixel boost to 1024x1024
  • Add face_enhancer processor
  • Use higher output quality (90-95)
  • Ensure source and target faces are similar angles

Out of Memory Error

  • Lower pixel boost to 512x512 or 768x768
  • Set video memory strategy to "strict"
  • Reduce execution queue count to 1
  • Lower output resolution
  • Process shorter video segments using trim frame

Slow Processing

  • Use GPU execution provider (CUDA/CoreML)
  • Reduce pixel boost
  • Skip face_enhancer for faster processing
  • Lower execution thread count
  • Use faster video preset (medium/fast)

Unnatural Blending

  • Increase face mask blur (0.4-0.6)
  • Adjust face mask padding
  • Enable occlusion mask type
  • Lower face enhancer blend

Workflow Examples

Example 1: High-Quality Photo Face Swap

  1. Upload high-resolution source face image
  2. Upload target photo
  3. Select: face_swapper + face_enhancer
  4. Settings:
    • Face Swapper: inswapper_128, 1024x1024
    • Face Enhancer: gfpgan_1.4, blend 90
    • Output Quality: 95
  5. Preview result
  6. Process

Example 2: Video Face Swap (Multiple People)

  1. Upload source face
  2. Upload target video
  3. Select: face_swapper + face_enhancer
  4. Face Selector: Reference mode
  5. Click reference face in gallery
  6. Settings:
    • Pixel boost: 1024x1024
    • Video quality: 90
    • Preset: slow
  7. Use trim frame to process test segment first
  8. Process full video

Example 3: Lip Sync Video

  1. Upload source audio (speech/song)
  2. Upload target video
  3. Select: lip_syncer + face_swapper (optional)
  4. Settings:
    • Lip Syncer: wav2lip_gan_96
    • Weight: 1.0
  5. Process

Summary Table

| Feature | Recommended Setting | Purpose |
| --- | --- | --- |
| Face Swapper Model | inswapper_128 | Best quality swapping |
| Pixel Boost | 1024x1024 | Maximum detail |
| Face Enhancer | gfpgan_1.4, blend 80 | Improve quality |
| Output Quality | 90-95 | Near-lossless |
| Video Preset | slow/slower | Best compression |
| Execution Provider | CUDA/CoreML | GPU acceleration |
| Face Selector | Reference (videos) | Track specific person |
| Face Mask Blur | 0.3-0.5 | Natural blending |

Last Updated: October 6, 2025

For more information, visit the official FaceFusion documentation.