	# FaceFusion UI - Complete Feature Guide & Tips

	This comprehensive guide explains every section and option in the FaceFusion UI to help you achieve the best results.

	---

	## 📋 Table of Contents
	1. [Main Workflow](#main-workflow)
	2. [Input Section](#input-section)
	3. [Processors](#processors)
	4. [Face Detection & Selection](#face-detection--selection)
	5. [Face Masking](#face-masking)
	6. [Output Settings](#output-settings)
	7. [Execution Settings](#execution-settings)
	8. [Memory Management](#memory-management)
	9. [Tips for Best Results](#tips-for-best-results)
	10. [Workflow Examples](#workflow-examples)
	11. [Summary Table](#summary-table)

	---

	## Main Workflow

	### Basic Steps for Face Swapping
	1. Upload Source → The face you want to apply
	2. Upload Target → The image/video to modify
	3. Select Processors → face_swapper + face_enhancer for best quality
	4. Configure Settings → Adjust quality and options
	5. Preview → Check a frame before processing
	6. Start Processing → Generate final output

	---

	## Input Section

	### SOURCE
	Purpose: Upload the face image or audio file you want to apply to the target.

	Supported Files:
	- Images: For face swapping (JPG, PNG, etc.)
	- Audio: For lip syncing (MP3, WAV, etc.)

	Tips:
	- Use high-quality, well-lit images for best face swap results
	- Source face should be frontal or similar angle to target
	- Clear facial features produce better swaps

	### TARGET
	Purpose: Upload the base image or video that will be modified.

	Supported Files:
	- Images: Single image face swap
	- Videos: Video face swap/lip sync

	Tips:
	- Higher resolution = better quality but slower processing
	- Good lighting on faces improves detection and swap quality
	- Videos with stable faces work better than highly dynamic scenes

	### OUTPUT PATH
	Purpose: Specify where the processed result will be saved.

	Tips:
	- Use descriptive filenames to organize your outputs
	- By default, output is saved to a temporary directory; specify a custom path for permanent storage

	---

	## Processors

	### PROCESSORS SELECTION
	Select one or more AI processors to apply to your content:

	#### face_swapper ⭐ (Recommended)
	- Swaps faces from source to target
	- Best Models: `inswapper_128`, `blendswap_256`
	- Pixel Boost: Use `1024x1024` for maximum quality
	- Higher resolution = better detail but slower processing
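
A rough sketch of why a larger pixel boost slows processing. It assumes (this guide does not state it) that the swapper model runs at a fixed native size, taken here as 128 pixels from the `inswapper_128` name, and that a boosted face crop is processed as a grid of model-sized tiles. The function name `pixel_boost_tiles` is hypothetical.

```python
def pixel_boost_tiles(pixel_boost: str, model_size: int = 128) -> int:
    """Estimate model passes per face for a pixel boost such as '1024x1024'.

    Assumption: the boosted crop is split into model_size x model_size tiles,
    and each tile needs one pass through the swapper model.
    """
    width, height = (int(part) for part in pixel_boost.lower().split("x"))
    if width % model_size or height % model_size:
        raise ValueError("pixel boost should be a multiple of the model size")
    return (width // model_size) * (height // model_size)


for boost in ("128x128", "512x512", "768x768", "1024x1024"):
    print(boost, "->", pixel_boost_tiles(boost), "model passes per face")
```

Under these assumptions, `1024x1024` costs 64 passes per face versus 16 for `512x512`, which is why lowering pixel boost is usually the first thing to try when speed or memory is a problem.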

	#### face_enhancer ⭐ (Recommended)
	- Improves face quality and details after swapping
	- Best Models: `gfpgan_1.4`, `restoreformer_plus_plus`
	- Blend: 80-100 for strong enhancement
	- Weight: Adjust for different model variants
	- Use together with face_swapper for professional results
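
To make the Blend value concrete, here is a minimal sketch that treats blend as a linear mix between the original and the enhanced pixel value. This is an assumption for illustration; the enhancer in this build may combine the images differently. `blend_pixel` is a hypothetical helper.

```python
def blend_pixel(original: float, enhanced: float, blend: int) -> float:
    """Linear mix: blend=0 keeps the original value, blend=100 keeps the enhanced one."""
    if not 0 <= blend <= 100:
        raise ValueError("blend must be between 0 and 100")
    alpha = blend / 100
    return (1 - alpha) * original + alpha * enhanced


# A pixel that the enhancer brightened from 100 to 200, at several blend values.
for blend in (0, 50, 80, 100):
    print(blend, "->", blend_pixel(100.0, 200.0, blend))
```

Seen this way, lowering blend pulls the result back toward the unenhanced swap, which is the lever suggested later for unnatural-looking output.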

	#### lip_syncer
	- Synchronizes lips to audio file
	- Requirements: Source audio file must be uploaded
	- Best Model: `wav2lip_gan_96` for quality
	- Weight: 1.0 for full sync, lower to blend with original

	#### age_modifier
	- Makes faces younger or older
	- Direction: Negative = younger, Positive = older
	- Range: -100 (very young) to +100 (very old)

	#### expression_restorer
	- Restores target's original facial expressions
	- Factor: 100 = full target expression, 0 = source expression
	- Useful to maintain natural emotions after face swap

	#### frame_enhancer
	- Upscales entire frame (not just face)
	- Models: `real_esrgan_x4` (4x upscale), `ultra_sharp_x4` (sharper)
	- Use for low-resolution videos
	- Very slow - use only when needed

	#### frame_colorizer
	- Colorizes black & white videos/images
	- Multiple artistic styles available

	#### face_editor
	- Manually adjust facial features
	- Control eyes, mouth, head rotation, expressions
	- Advanced feature for fine-tuning

	#### face_debugger
	- Shows detection boxes, landmarks, scores
	- Useful for troubleshooting detection issues

	---

	## Face Detection & Selection

	### FACE DETECTOR
	Purpose: Detects faces in images/videos for processing.

	#### Face Detector Model
	- yolo_face: Recommended - best accuracy and speed
	- retinaface: Good alternative

	#### Face Detector Size
	- 640x640: Balanced speed and accuracy (recommended)
	- 320x320: Faster but may miss faces
	- 1280x1280: Best accuracy but slower

	#### Face Detector Angles
	- Enable to detect rotated/tilted faces
	- More angles = better detection but slower
	- Use when faces aren't upright

	#### Face Detector Score
	- Confidence threshold (0-1)
	- 0.5: Standard - good balance
	- Higher = stricter detection, fewer false positives
	- Lower = detect more faces but more false positives
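
The effect of the score threshold can be sketched with a simple filter. The detections and scores below are invented for illustration, and `Detection` and `filter_detections` are hypothetical names, not part of FaceFusion.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    score: float  # detector confidence between 0 and 1


def filter_detections(detections, min_score=0.5):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.score >= min_score]


# Made-up candidates: one clear face, one hard face, two false positives.
candidates = [
    Detection("clear frontal face", 0.92),
    Detection("dim profile face", 0.41),
    Detection("poster in background", 0.33),
    Detection("wall texture", 0.12),
]

for threshold in (0.5, 0.3):
    kept = [d.label for d in filter_detections(candidates, threshold)]
    print(threshold, "->", kept)
```

With this sample data, lowering the threshold from 0.5 to 0.3 recovers the dim profile face but also lets the background poster through, which is the trade-off described above.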

	### FACE LANDMARKER
	Purpose: Detects facial landmarks (eyes, nose, mouth) for accurate alignment.

	#### Face Landmarker Model
	- Detects 5 or 68 facial points
	- Essential for proper face alignment and swapping

	#### Face Landmarker Score
	- Confidence threshold (0-1)
	- 0.5: Generally works well
	- Higher = more accurate landmark detection required

	### FACE SELECTOR MODE
	Purpose: Choose which faces to process in the target.

	#### Modes:
	- One: Process first detected face only
	- Many: Process all detected faces
	- Reference: Track specific face across video frames (best for videos)
	- Age/Gender/Race filters: Target specific demographics

	#### Reference Face Distance
	- Similarity threshold for reference tracking
	- Lower = stricter matching (same person)
	- Higher = more lenient matching

	Tips:
	- Use Reference mode for videos with multiple people
	- Use One for single-person content
	- Use filters to target specific faces in multi-person scenes
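
The selector modes and the reference distance can be sketched as follows. This assumes faces are compared by cosine distance between embedding vectors, which is a common approach but not confirmed by this guide; the tiny 2-D vectors, the `0.3` threshold, and the function names are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 means same direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def select_faces(embeddings, mode, reference=None, max_distance=0.3):
    """Pick which detected faces to process, mirroring the One/Many/Reference modes."""
    if mode == "one":
        return embeddings[:1]
    if mode == "many":
        return list(embeddings)
    if mode == "reference":
        if reference is None:
            raise ValueError("reference mode needs a reference embedding")
        return [e for e in embeddings if cosine_distance(e, reference) <= max_distance]
    raise ValueError(f"unknown mode: {mode}")


# Three detected faces: the reference person twice (slightly different), plus a stranger.
faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
reference = [1.0, 0.0]
print(len(select_faces(faces, "one")))
print(len(select_faces(faces, "many")))
print(len(select_faces(faces, "reference", reference, max_distance=0.3)))
```

A lower `max_distance` accepts only near-identical embeddings, matching the "lower = stricter" rule above.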

	---

	## Face Masking

	### PURPOSE
	Control which parts of the face are swapped and how they blend.

	### Face Mask Types

	#### Box
	- Simple rectangular mask around face
	- Blur: Controls edge softness (0.3-0.5 recommended)
	- Padding: Expand mask in each direction (top, right, bottom, left)
	- Fast and simple
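
A simplified picture of what Blur does to the box mask edge. The sketch below builds mask weights along one axis and uses a linear ramp as a stand-in for a real blur; the assumption that blur is a fraction of half the crop size is illustrative only, and `feathered_profile` is a hypothetical helper. Padding is left out on purpose.

```python
def feathered_profile(size: int, blur: float) -> list[float]:
    """Mask weights along one axis: 0 at the border, ramping linearly up to 1.

    Assumption: the ramp width is blur * half the crop size (at least 1 pixel).
    """
    ramp = max(int(size * 0.5 * blur), 1)
    profile = []
    for i in range(size):
        edge_distance = min(i, size - 1 - i)  # distance to the nearest border
        profile.append(min(edge_distance / ramp, 1.0))
    return profile


print(feathered_profile(10, 0.0))
print(feathered_profile(10, 0.4))
```

A weight of 1 means the swapped face fully replaces the target pixel and 0 means the target is kept, so a wider ramp gives a softer transition at the mask edge.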

	#### Occlusion
	- Avoids occluded areas (glasses, hands, hair)
	- Uses face occluder model
	- More natural when face is partially covered

	#### Region
	- Masks specific facial regions
	- Uses face parser model
	- Select regions: eyes, nose, mouth, skin, etc.

	#### Area
	- Masks by facial areas
	- Combine multiple for custom masking

	Tips:
	- Combine mask types for best results
	- Increase blur for smoother blending
	- Adjust padding if face edges are visible

	---

	## Output Settings

	### IMAGE OUTPUT

	#### Output Image Quality (0-100)
	- JPEG compression quality
	- 90-95: Recommended for high quality
	- 100: Maximum quality (larger file)
	- 70-80: Good quality, smaller file

	#### Output Image Resolution
	- Can upscale or downscale from original
	- Match the target's original resolution for best quality
	- Upscaling beyond 2x may look artificial

	### VIDEO OUTPUT

	#### Output Video Encoder
	- libx264: Widely compatible, good quality
	- libx265/hevc: Better compression, smaller files
	- h264_nvenc: GPU-accelerated (NVIDIA only)
	- copy: Preserve original encoding

	#### Output Video Preset
	- ultrafast: Fastest encoding, largest files
	- fast/medium: Balanced
	- slow/slower: Better compression at a given quality setting (recommended)
	- veryslow: Best compression, very slow encoding

	#### Output Video Quality (0-100)
	- 90-95: Recommended for professional results
	- 80-85: Good quality, reasonable file size
	- Higher = better visual quality, larger files
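
For encoders such as libx264 and libx265, quality is normally controlled by a CRF value from 0 to 51, where lower means higher quality. The sketch below assumes the 0-100 slider maps linearly onto that range; the mapping actually used by this build may differ, and `quality_to_crf` is a hypothetical name.

```python
def quality_to_crf(quality: int) -> int:
    """Map a 0-100 quality value to a 0-51 CRF value (assumed linear mapping)."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    return round(51 - quality * 0.51)


for quality in (80, 90, 100):
    print("quality", quality, "-> crf", quality_to_crf(quality))
```

Under this assumption, the recommended 90-95 range lands on very low CRF values, which explains the large files.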

	#### Output Video Resolution
	- Can upscale or downscale
	- Higher resolution requires more processing time
	- Match original for best quality/performance ratio

	#### Output Video FPS
	- 24: Cinematic look
	- 30: Standard video
	- 60: Smooth motion
	- Match original video FPS for best results

	### AUDIO OUTPUT (for videos)

	#### Output Audio Encoder
	- aac: Widely compatible, good quality (recommended)
	- libmp3lame: MP3 format
	- copy: Preserve original audio

	#### Output Audio Quality (0-100)
	- 80-90: CD quality
	- 100: Maximum encoder quality (aac and libmp3lame are lossy codecs, so this is not truly lossless)
	- Higher = better sound, larger file

	#### Output Audio Volume (0-200%)
	- 100: Original volume
	- <100: Quieter
	- >100: Louder (may cause distortion)
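
Why values above 100 can distort: a minimal sketch that scales 16-bit PCM samples and clips them to the valid range. The sample values are invented, and `apply_volume` is a hypothetical helper rather than the code path this tool uses.

```python
def apply_volume(samples: list[int], volume_percent: int) -> list[int]:
    """Scale 16-bit PCM samples by a percentage and clip to [-32768, 32767]."""
    if volume_percent < 0:
        raise ValueError("volume must not be negative")
    gain = volume_percent / 100
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


samples = [1000, -20000, 30000]
for volume in (50, 100, 150):
    print(volume, "->", apply_volume(samples, volume))
```

At 150 the loudest sample no longer fits in 16 bits and is clipped to 32767, which is heard as distortion.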

	---

	## Execution Settings

	### EXECUTION PROVIDERS
	Purpose: Choose hardware acceleration for processing.

	#### Options:
	- CUDAExecutionProvider: NVIDIA GPU acceleration (fastest)
	- CoreMLExecutionProvider: Apple Silicon acceleration
	- CPUExecutionProvider: CPU only (slowest but always available)

	Tips:
	- Use GPU providers when available for a large speedup (roughly 10-50x, depending on hardware and model)
	- CPU is very slow but works on any system
	- Some models require specific providers

	### EXECUTION THREAD COUNT
	Purpose: Number of parallel processing threads.

	Recommendations:
	- Set to your CPU core count for optimal performance
	- Higher = faster but uses more CPU/GPU
	- Lower if system becomes unresponsive
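
One way to pick a starting value, sketched with the standard library. The `reserve` parameter is a hypothetical knob for leaving some cores free so the system stays responsive.

```python
import os


def suggested_thread_count(reserve: int = 0) -> int:
    """Suggest a thread count: logical CPU cores minus any reserved cores, at least 1."""
    cores = os.cpu_count() or 1  # cpu_count() can return None on some systems
    return max(1, cores - reserve)


print("all cores:", suggested_thread_count())
print("leave two free:", suggested_thread_count(reserve=2))
```

Treat the result as a starting point and adjust after watching CPU and GPU load during a short test run.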

	### EXECUTION QUEUE COUNT
	Purpose: Number of frames each thread processes per batch before returning.

	Recommendations:
	- 1-2: Recommended for most cases
	- Higher = better GPU utilization but more VRAM needed
	- Lower = less memory usage

	---

	## Memory Management

	### VIDEO MEMORY STRATEGY
	Purpose: Balance processing speed vs VRAM usage.

	#### Options:
	- Strict: Low memory usage, slower processing
	- Moderate: Balanced (recommended)
	- Tolerant: Faster but uses more VRAM

	Tips:
	- Use Strict if you get out-of-memory errors
	- Use Tolerant if you have high-end GPU (12GB+ VRAM)

	### SYSTEM MEMORY LIMIT
	Purpose: Limit RAM usage during processing.

	- 0: No limit
	- Set value (in GB) to prevent system crashes
	- Useful for systems with limited RAM
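
A small sketch of how the setting can be read, assuming "GB" means binary gigabytes (1024^3 bytes) and that 0 disables the limit as listed above. How the limit is enforced is platform-specific and not shown; `memory_limit_bytes` is a hypothetical helper.

```python
from typing import Optional


def memory_limit_bytes(limit_gb: int) -> Optional[int]:
    """Convert the UI value to bytes; 0 means no limit and is returned as None."""
    if limit_gb < 0:
        raise ValueError("limit must not be negative")
    if limit_gb == 0:
        return None
    return limit_gb * 1024 ** 3  # assumption: binary gigabytes


for value in (0, 8, 16):
    print(value, "GB ->", memory_limit_bytes(value))
```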

	---

	## Tips for Best Results

	### 🌟 Quality Settings (Best Quality)
	```
	Processors: face_swapper + face_enhancer
	Face Swapper Model: inswapper_128
	Pixel Boost: 1024x1024
	Face Enhancer Model: gfpgan_1.4
	Face Enhancer Blend: 80-100
	Output Image/Video Quality: 90-95
	Video Preset: slow or slower
	```

	### ⚡ Speed Settings (Faster Processing)
	```
	Processors: face_swapper only
	Face Swapper Model: inswapper_128
	Pixel Boost: 512x512 or 768x768
	Skip face_enhancer
	Output Quality: 80-85
	Video Preset: medium or fast
	Execution Threads: Max CPU cores
	```
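
The two presets above can be kept side by side as plain data, which makes it easy to see what changes between them. The key names are hypothetical and do not correspond to real configuration keys; the values are copied from the presets above.

```python
# Tuples hold either a numeric range (low, high) or the acceptable choices.
PRESETS = {
    "quality": {
        "processors": ["face_swapper", "face_enhancer"],
        "face_swapper_model": "inswapper_128",
        "pixel_boost": ("1024x1024",),
        "face_enhancer_model": "gfpgan_1.4",
        "face_enhancer_blend": (80, 100),
        "output_quality": (90, 95),
        "video_preset": ("slow", "slower"),
    },
    "speed": {
        "processors": ["face_swapper"],
        "face_swapper_model": "inswapper_128",
        "pixel_boost": ("512x512", "768x768"),
        "output_quality": (80, 85),
        "video_preset": ("medium", "fast"),
    },
}


def preset_differences(first: str, second: str) -> dict:
    """Return {key: (value_in_first, value_in_second)} for keys that differ."""
    a, b = PRESETS[first], PRESETS[second]
    keys = sorted(set(a) | set(b))
    return {key: (a.get(key), b.get(key)) for key in keys if a.get(key) != b.get(key)}


for key, (quality_value, speed_value) in preset_differences("quality", "speed").items():
    print(f"{key}: {quality_value} -> {speed_value}")
```

The swapper model is the same in both presets; the speed preset gains its time from a smaller pixel boost, no enhancer, and a faster encoder preset.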

	### 🎯 Troubleshooting

	#### Face Not Detected
	- Check face detector score (try lowering to 0.3)
	- Enable more detector angles
	- Increase detector size to 1280x1280
	- Ensure face is visible and well-lit

	#### Poor Swap Quality
	- Increase pixel boost to 1024x1024
	- Add face_enhancer processor
	- Use higher output quality (90-95)
	- Ensure source and target faces are similar angles

	#### Out of Memory Error
	- Lower pixel boost to 512x512 or 768x768
	- Set video memory strategy to "strict"
	- Reduce execution queue count to 1
	- Lower output resolution
	- Process shorter video segments using trim frame
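
The first three steps above can be applied one at a time, retrying after each change. The sketch below is one possible ordering under assumed setting names (`pixel_boost`, `video_memory_strategy`, `execution_queue_count` are hypothetical keys); lowering the output resolution and trimming the video are left to the user.

```python
from typing import Optional

PIXEL_BOOST_STEPS = ["1024x1024", "768x768", "512x512"]


def next_fallback(settings: dict) -> Optional[dict]:
    """Return a copy of settings with one more memory-saving step applied.

    Returns None when none of the automatic steps are left.
    """
    updated = dict(settings)
    boost = settings.get("pixel_boost")
    if boost in PIXEL_BOOST_STEPS[:-1]:
        # Step down to the next smaller pixel boost.
        updated["pixel_boost"] = PIXEL_BOOST_STEPS[PIXEL_BOOST_STEPS.index(boost) + 1]
        return updated
    if settings.get("video_memory_strategy") != "strict":
        updated["video_memory_strategy"] = "strict"
        return updated
    if settings.get("execution_queue_count", 1) > 1:
        updated["execution_queue_count"] = 1
        return updated
    return None


settings = {
    "pixel_boost": "1024x1024",
    "video_memory_strategy": "moderate",
    "execution_queue_count": 2,
}
while settings is not None:
    print(settings)  # in practice: try a short run with these settings here
    settings = next_fallback(settings)
```

Changing one setting per attempt keeps as much quality as possible, because the process stops at the first configuration that fits in memory.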

	#### Slow Processing
	- Use GPU execution provider (CUDA/CoreML)
	- Reduce pixel boost
	- Skip face_enhancer for faster processing
	- Raise execution thread count toward your CPU core count (lower it only if the system becomes unresponsive)
	- Use faster video preset (medium/fast)

	#### Unnatural Blending
	- Increase face mask blur (0.4-0.6)
	- Adjust face mask padding
	- Enable occlusion mask type
	- Lower face enhancer blend

	---

	## Workflow Examples

	### Example 1: High-Quality Photo Face Swap
	1. Upload high-resolution source face image
	2. Upload target photo
	3. Select: face_swapper + face_enhancer
	4. Settings:
	   - Face Swapper: inswapper_128, 1024x1024
	   - Face Enhancer: gfpgan_1.4, blend 90
	   - Output Quality: 95
	5. Preview result
	6. Process

	### Example 2: Video Face Swap (Multiple People)
	1. Upload source face
	2. Upload target video
	3. Select: face_swapper + face_enhancer
	4. Face Selector: Reference mode
	5. Click reference face in gallery
	6. Settings:
	   - Pixel boost: 1024x1024
	   - Video quality: 90
	   - Preset: slow
	7. Use trim frame to process test segment first
	8. Process full video
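
For the test segment in step 7, it helps to convert a time range into frame numbers. This sketch assumes trim frame takes start and end frame indices counted from 0; `trim_range` is a hypothetical helper.

```python
def trim_range(start_seconds: float, duration_seconds: float, fps: float) -> tuple[int, int]:
    """Convert a start time and duration into (start_frame, end_frame)."""
    if fps <= 0 or start_seconds < 0 or duration_seconds <= 0:
        raise ValueError("fps and duration must be positive, start must not be negative")
    start_frame = round(start_seconds * fps)
    end_frame = round((start_seconds + duration_seconds) * fps)
    return start_frame, end_frame


# A 10-second test segment starting 5 seconds into a 30 fps video.
print(trim_range(5, 10, 30))
```

A short segment that contains every person in the scene is usually enough to check the reference face and quality settings before committing to the full video.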

	### Example 3: Lip Sync Video
	1. Upload source audio (speech/song)
	2. Upload target video
	3. Select: lip_syncer + face_swapper (optional)
	4. Settings:
	   - Lip Syncer: wav2lip_gan_96
	   - Weight: 1.0
	5. Process

	---

	## Summary Table

	| Feature | Recommended Setting | Purpose |
	|---------|---------------------|---------|
	| Face Swapper Model | inswapper_128 | Best quality swapping |
	| Pixel Boost | 1024x1024 | Maximum detail |
	| Face Enhancer | gfpgan_1.4, blend 80 | Improve quality |
	| Output Quality | 90-95 | Near-lossless |
	| Video Preset | slow/slower | Best compression |
	| Execution Provider | CUDA/CoreML | GPU acceleration |
	| Face Selector | Reference (videos) | Track specific person |
	| Face Mask Blur | 0.3-0.5 | Natural blending |

	---

	Last Updated: October 6, 2025

	For more information, visit the official FaceFusion documentation.