V20's prompts are followed better.

#244
by lisi31415926 - opened

V20's prompts are followed better. Thank you for your great work.

I feel like if you try to output an image resolution higher than the source image resolution, the facial features are completely overridden in v20.

Okay, so I think that since V20 added BestFaceSwap, it starts messing with the facial features if you only have one source image. BestFaceSwap might be good when used with two images, but with a single image it does random things: eyes are wider, the mouth opens, the eyes look in a different direction, etc. It just alters the facial features when you want a 1:1 copy.

I tried v20 but I've gone back to v19. It seems to be changing the hair style and face of subjects for me. V19, at least for me, still remains the best one.

EDIT: After experimenting with v20 a bit more, I've found that reducing steps to 4 (from 6), still with euler_a/beta as before, has helped improve consistency. To improve it further I've also had to change my prompting style: previously I used a prompt enhancer to make the prompt more detailed, but on v20 I now seem to get better results with shorter, less detailed prompts. More experimentation is probably needed to adapt to this version, but I'm no longer sticking solidly with v19 and am continuing to experiment with v20 to understand what works more effectively with it.

For me, v20 is way better than the rest!!!! It's a 100% keeper for me. It's almost unreal how consistent some of my generations are, even if I make the subjects do totally different things than in the original reference image!

Can we see some comparison pictures from your usage?

As folks have mentioned, there are major face issues, with the eyes especially. At any resolution other than that of the first image, it distorts the face. I thought I had done some update wrong until I saw these comments.

Yeah, it seems V20 changes the facial features...

Can you share which prompts you used for testing?

I’m not sure if it’s the BestFaceSwap LORA causing the issue, but the facial changes seem quite significant.
However, if you exclude this LORA, V20 is perfect.
I wonder if it’s possible to release an AIO version without BestFaceSwap.

My first impression after using version v20 was that the consistency of the characters' facial features had deteriorated, and the characters' skin looked like wax - this is truly bad news!

Yeah, after further experimenting with v20 I've had to revert to v19. I still get the best results overall from that. Unfortunately, there are some noticeable inconsistencies when I experiment with v20. My guess is that it's the FaceSwap LoRA.

Would like to have an AIO version without BFS as well.

Owner

The BFS LORA is only added at 0.2 strength in the latest v21 model.

There were many changes between v19 and v20, the biggest being not mixing in 2509 at all anymore.

Please share workflows showing issues, along with the original images (as long as they don't violate any community standards), so I can add the inconsistencies to my test cases.

source1
output1

source2
output2

The workflow is embedded in the output images. This is v21. I just asked it to change the color of the blue top to red.

source3
output3

additional sample.

Source:
IMG_2392

v21 - 4 steps:
ComfyUI_01150_
v21 - 6 steps:
ComfyUI_01151_

source:
434891742-1139189243749271-535-9880-5143-1735032080

v21 - 4 steps:
ComfyUI_01157_

v21 - 6 steps:
ComfyUI_01158_

Version 21 ends up with a texture that feels like the AI's characteristic wax texture.

Yep, all the details are smoothed out. Look at the roots and hay on the floor in the last example; it is pretty bad.

Owner

Thank you for these examples, I'll see what I can do or figure out what's going on.

Thank you for all your hard work. Here is another one for you where the face consistency changes when the prompt was just to change the color of the clothes.

Source:
30001003928_1280

v21 - 4 steps:
ComfyUI_01162_

v21 - 6 steps:
ComfyUI_01163_

Owner

I'm out at the moment so I haven't been able to deep dive yet, but it appears to be mostly just removing noise.

Owner

OK, let's try and break this down. This is the original:

image

image

This is what I get when I use raw Qwen Image Edit 2511 with just accelerators (prompt is "Change the woman's dress to pink.") :

image

image

This is the SFW v21 AIO:

image

image

This is the NSFW v21 AIO:

image

image

Owner

My conclusions:

  1. I don't think you were using my fixed "Text Encode Node", as you did not have the target_latent set. If you were using the default ComfyUI text encoding node, it would be reducing the resolution of your original images where detail would be lost (because these images are all >1MP).

  2. Qwen Image Edit "raw" already loses quite a bit of detail due to VAE encoding and decoding.

  3. My "SFW" merge very closely matches the detail of raw Qwen Image Edit. Considering this is "SFW" use, this is the model you should be using.

  4. The "NSFW" merge does do more smoothing as a result of the NSFW LORAs being added. However, I don't prioritize the detail of hay when I'm making NSFW merges, I prioritize NSFW capabilities (like genitalia, sexual positions etc.). It is a delicate balance because many NSFW LORAs are not designed for Qwen Image Edit 2511, so compromises always have to be made. With that said, I was not noticing any degradation or shifts in facial features, and at most, noise was being reduced.

Some of the examples showed different mouth positions, but it looks like using 6 steps fixed that. The very small missing tongue in another example may be fixed by specifying "keeping her tongue slightly visible in her mouth".
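To make point 1 concrete, here is a minimal sketch of how much resolution an area-normalizing resize would discard. It assumes the default node rescales the input to roughly one megapixel of total area and snaps each side to a multiple of 8; the function name `default_encode_size`, the 1024*1024 target, and the rounding rule are illustrative assumptions, not values confirmed in this thread.

```python
import math


def default_encode_size(width: int, height: int,
                        target_pixels: int = 1024 * 1024,
                        multiple: int = 8) -> tuple[int, int]:
    """Estimate the size an image ends up at when its area is normalized.

    Assumption: the resize keeps the aspect ratio, scales the total area to
    about `target_pixels`, and rounds each side to a multiple of `multiple`.
    """
    scale = math.sqrt(target_pixels / (width * height))
    new_w = round(width * scale / multiple) * multiple
    new_h = round(height * scale / multiple) * multiple
    return new_w, new_h


# A 2048x2048 source (about 4MP) would be halved on each side.
print(default_encode_size(2048, 2048))
# A 1920x1080 source (about 2MP) would also lose resolution.
print(default_encode_size(1920, 1080))
```

Under these assumptions, any source above about 1MP is shrunk before editing, which would be consistent with fine detail (hay, skin texture) being lost before the model ever sees it.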

I am using your Text Encode Node v2. Can you provide a sample output so I can try to generate this myself? How do you use target_latent? The sample JSON you provided on Hugging Face doesn't use it.

I used SFW prompts because I can't paste NSFW stuff on Hugging Face. I want to use NSFW, but it keeps altering the face.

1768902135-96296cb1-d366-4376-804f-2ecb063325b5

Owner

Connect your "final image size" latent output to the target_latent input.

I ran some tests above with her face, comparing different "AIOs" including the NSFW one. I wasn't able to reproduce an "altering the face" issue, as demonstrated.

You can post NSFW stuff on Hugging Face. If you want, you can crop out just the face since that is all we are discussing.

It helps a little after linking the target_latent, but there are still issues with these two. The eyes and mouth are just doing random things.

https://cdn-uploads.huggingface.co/production/uploads/64985bcfc03129f5a4ea05a0/iJ7s327V2Xrb4oB4-hNWJ.jpeg

https://cdn-uploads.huggingface.co/production/uploads/64985bcfc03129f5a4ea05a0/drJy7Y0f_jQyJhOSVuXpX.jpeg

I am not able to reproduce your issue:

image

Left is original. Right is using the NSFW v21 merge with only the prompt "Change her shirt to pink.". The faces look virtually the same to me, perhaps with just a bit less noise, as I've already concluded. I will not try to prioritize pixel-perfect copies (which is particularly hard when you have a really big picture and a small face inside it).

Sampler settings used:

image

left original vs right v21.
image

It just seems like something is modifying the face and giving it an AI smooth-over. All the details are lost and it just looks like another person. This is the same as in the example you just gave above. Something on the face was "processed".

That is fine since this is your AIO model. But I think the original intent of Qwen Image Edit was to change only what was asked, without anything else being modified unintentionally. But yeah, if that is the direction you want to take it, then I have nothing more to add. Thanks.

image

2B50A0F2-0324-4B67-BD6F-DEDED31540C9_1_102_o

For science, here is an HQ photo. Try the prompt: "Remove her bra". The face always changes. 4 steps, 6 steps, same resolution, higher resolution: it doesn't matter.

Owner

OK, in this example, I'm finally seeing a significant change in her facial expression. It also has "enough pixels" of her face that it should be preserved. I'm looking into this more.

Just for reference: v18.1 was the last version that tried to keep the face consistent. Starting from v19, the face began to change gradually, but only in very minor ways (like slight lip movements). With v20 and v21, face consistency was the worst and the face couldn't be kept at all (it was a completely different person, in addition to the movements).

Owner

Well, as we have been demonstrating, keeping the "face consistent" is not black and white. Even with v21, finding concrete examples of facial inconsistency was quite hard. I narrowed down this particular facial expression issue to the anime2real LORAs, so I'm looking into another JibMix Skin LORA that should hopefully work better.

Owner

I don't consider the smoothing in that left-vs-right v21 comparison a problem. As I've stated before, this is just noise being removed.

This phenomenon is similar to how common denoising processes in AI upscalers often replace textures with an AI-like plastic appearance.

Owner

Yeah, which is why I'm trying to combat that plastic effect with a compatible skin or realistic LORA. However, finding one that doesn't hurt anime or cause other inconsistencies is very difficult and I don't think there is a perfect solution. I'm hoping to get out something a little better, though.

kakeimiwakoerogazou-279

Another photo for science. It will always want to open her eyes and change her face.

sample1
This one is for the tongue. No matter what you edit (remove her bra, remove some text), the red food coloring is always cleaned off her tongue.

v22 is uploading. I've tested some of these original images with it (in particular, the red food color now remains in edits with v22).

v22 just popped up. Downloading now and will test. Thanks!

We are so back with v22. Thank you!

Can you share your current workflow, please? It would also be cool if the workflow included an upscaler to add more realism and improve the quality of the photos. Do you have any information about such tools? A workflow using 2 images would be useful too.
By the way, I'm testing version 22. I think it's the best in terms of detail.
But I still have to use the F2P LoRA to preserve the face (if I change the pose or scene).

The output resolution should not exceed that of the original image; set it to be about 30% smaller than the original. You can enlarge the result again at a later stage.
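As a rough illustration of that suggestion, the sketch below computes an output size about 30% smaller than the source. It assumes "30% smaller" means each side is reduced by 30% and that sides should be rounded down to a multiple of 8; both are assumptions, and the helper name `reduced_output_size` is hypothetical.

```python
def reduced_output_size(src_w: int, src_h: int,
                        reduction_percent: int = 30,
                        multiple: int = 8) -> tuple[int, int]:
    """Return an output size smaller than the source by `reduction_percent` per side.

    Integer arithmetic avoids float rounding surprises; rounding down to
    `multiple` guarantees the result never exceeds the scaled source size.
    """
    keep = 100 - reduction_percent
    out_w = src_w * keep // 100 // multiple * multiple
    out_h = src_h * keep // 100 // multiple * multiple
    return out_w, out_h


# Example: a 1280x1920 portrait source.
print(reduced_output_size(1280, 1920))
```

The edited image can then be upscaled in a separate later step, as the comment suggests.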
