Phr00t/Qwen-Image-Edit-Rapid-AIO · V20's prompts are followed better.

Jan 16

Prompts are followed better in V20. Thank you for your great work.

Jan 16

I feel like if you try to output an image at a higher resolution than the source image, the facial features are completely overridden in v20.

kinopu

Jan 16

Okay, so I think that since V20 added BestFaceSwap, it starts messing with the facial features when you only have one source image. BestFaceSwap might be good with two images, but with a single image it does random things: eyes get wider, the mouth opens, the eyes look in a different direction, etc. It just messes with the facial features when you only want a 1:1 copy.

Ixel1

Jan 16

I tried v20 but I've gone back to v19. It seems to be changing the hairstyle and face of subjects for me. For me, at least, v19 still remains the best one.

EDIT: After experimenting with v20 a bit more, I've found that reducing steps from 6 to 4 (still euler_a/beta as before) has helped improve consistency. To improve consistency further I've also had to change my prompting style: I used to run a prompt enhancer to make prompts more detailed, but on v20 I seem to get better results with much shorter, less detailed prompts. More experimentation may be needed to adapt to this version, but I'm no longer sticking solidly with v19 and am continuing to experiment with v20 to understand what works best with it.
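For reference, the settings described here (4 steps instead of 6, the euler_a sampler, the beta scheduler) could be written as a ComfyUI API-format KSampler node. This is a minimal sketch: the `ksampler_node` helper, the placeholder node IDs in the links and the cfg value of 1.0 are assumptions, not settings taken from this thread or from Phr00t's published workflow.

```python
import json


def ksampler_node(steps=4, seed=0, cfg=1.0,
                  model=("1", 0), positive=("2", 0),
                  negative=("3", 0), latent=("4", 0)):
    """Hypothetical helper: a ComfyUI API-format KSampler entry using euler_a + beta.

    The (node_id, output_index) link tuples are placeholders for whatever
    nodes feed the sampler in a real workflow.
    """
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": steps,                     # 4 steps reportedly kept v20 more consistent than 6
            "cfg": cfg,                         # assumption: low CFG for an accelerated merge
            "sampler_name": "euler_ancestral",  # ComfyUI's name for euler_a
            "scheduler": "beta",
            "denoise": 1.0,
            "model": list(model),
            "positive": list(positive),
            "negative": list(negative),
            "latent_image": list(latent),
        },
    }


if __name__ == "__main__":
    # Print a one-node API-format prompt fragment keyed by a placeholder ID.
    print(json.dumps({"5": ksampler_node(steps=4)}, indent=2))
```

Switching between the 4-step and 6-step variants is then just `ksampler_node(steps=6)`, which makes side-by-side comparisons with a fixed seed easy.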

narrn001

Jan 16

For me, v20 is way better than the rest!!!! It's a 100% keeper. It's almost unreal how consistent some of my generations are, even when I make the subjects do totally different things than in the original reference image!

pymo

Jan 16

Can we see comparison images from your usage?

deleted

Jan 16

As folks have mentioned, there are major face issues, especially with the eyes. Any output resolution other than that of the first image distorts the face. I thought I had botched an update until I saw these comments.

kknd3000

Jan 17

Yeah, it seems V20 changes facial features...

condeduke

Jan 17

Can you share which prompts you used for testing?

wally0308

Jan 17

I’m not sure if it’s the BestFaceSwap LORA causing the issue, but the facial changes seem quite significant.
However, if you exclude this LORA, V20 is perfect.
I wonder if it’s possible to release an AIO version without BestFaceSwap.

wange1002

Jan 18

My first impression after using v20 was that the consistency of the characters' facial features had deteriorated, and the characters' skin looked like wax. This is truly bad news!

Ixel1

Jan 18

Yeah, after further experimenting with v20 I've had to revert to v19. I still get the best results overall from it. Unfortunately, there are some noticeable inconsistencies when I experiment with v20. My guess is that it's the FaceSwap LoRA.

kinopu

Jan 18

Would like to have an AIO version without BFS as well.

Phr00t

Owner Jan 18

The BFS LORA is only added at 0.2 strength in the latest v21 model.

There were many changes between v19 and v20, the biggest being that 2509 is no longer mixed in at all.

Please share workflows with the original images (as long as they don't violate any community standards) that show inconsistencies, which I can add to my test cases.
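To illustrate what "added at 0.2 strength" usually means: when a LoRA is merged into a base model, its low-rank update is scaled before being added to the weights, commonly as W' = W + strength · (alpha / rank) · B·A. The sketch below assumes that common convention and uses plain Python lists; it is not necessarily how Phr00t builds the AIO merges, and `merge_lora` is a hypothetical helper.

```python
def matmul(a, b):
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, down, up, strength=0.2, alpha=None):
    """Hypothetical helper: return weight + strength * (alpha / rank) * (up @ down).

    weight is (out x in), down (LoRA "A") is (rank x in), up (LoRA "B") is (out x rank).
    alpha defaults to rank, so the only scaling left is `strength`.
    """
    rank = len(down)
    scale = strength * ((rank if alpha is None else alpha) / rank)
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    down = [[1.0, 1.0]]   # rank-1 LoRA
    up = [[1.0], [2.0]]
    print(merge_lora(base, down, up, strength=0.2))  # only 20% of the LoRA update
    print(merge_lora(base, down, up, strength=1.0))  # full-strength update
```

At 0.2 the LoRA still nudges every weight it touches, just five times less than at full strength, which is why a low-strength merge can reduce a LoRA's side effects without removing them entirely.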

kinopu

Jan 19

The workflow is in the output images. v21. I just asked it to change the color of her blue top to red.

kinopu

Jan 19

additional sample.

kinopu

Jan 19

[Images: source, v21 at 4 steps, v21 at 6 steps]

kinopu

Jan 19

[Images: source, v21 at 4 steps, v21 at 6 steps]

slash726

Jan 19

Version 21 ends up with the waxy texture that is characteristic of AI images.

kinopu

Jan 19

Yep, all the details are smoothed out. Look at the roots and hay on the floor in the last example; it's pretty bad.
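One way to put a number on "all the details are smoothed out" is to compare the variance of a Laplacian filter response on matching crops of the source and the output: smoothing removes high-frequency detail, so the score drops. This is a minimal sketch in plain Python, assuming both crops are 2-D grayscale pixel lists of the same region; the metric and the helper names are assumptions, not something used in this thread.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior of a 2-D grayscale image.

    Fine texture (pores, hair, hay) produces large Laplacian responses, so a
    smoothed, waxy crop scores lower than the original.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def detail_ratio(original, edited):
    """Detail score of the edited crop relative to the original (1.0 = unchanged)."""
    base = laplacian_variance(original)
    if base == 0:
        raise ValueError("original crop has no measurable detail")
    return laplacian_variance(edited) / base


if __name__ == "__main__":
    # Toy 5x5 checkerboard vs. a half-contrast copy standing in for a smoothed output.
    checker = [[100 * ((x + y) % 2) for x in range(5)] for y in range(5)]
    softer = [[v / 2 for v in row] for row in checker]
    print(detail_ratio(checker, softer))  # 0.25: halving contrast quarters the variance
```

Note that removing sensor noise also lowers this score, so on its own it cannot distinguish "noise being removed" from real texture being lost; it only makes the amount of smoothing comparable across versions.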

Phr00t

Owner Jan 19

Thank you for these examples. I'll see what I can do and figure out what's going on.

kinopu

Jan 19

Thank you for all your hard work. Here is another one for you, where the face changes even though the prompt was just to change the color of the clothes.

[Images: source, v21 at 4 steps, v21 at 6 steps]

Phr00t

Owner Jan 19

I'm out at the moment, so I haven't been able to dig deep yet, but it appears to mostly just be removing noise.

Phr00t

Owner Jan 19

OK, let's try and break this down. This is the original:

This is what I get when I use raw Qwen Image Edit 2511 with just the accelerators (prompt: "Change the woman's dress to pink."):

This is the SFW v21 AIO:

This is the NSFW v21 AIO:

Phr00t

Owner Jan 19

My conclusions:

I don't think you were using my fixed "Text Encode Node", since you did not have target_latent set. If you were using the default ComfyUI text encoding node, it would reduce the resolution of your original images and detail would be lost (these images are all >1MP).
Qwen Image Edit "raw" already loses quite a bit of detail due to VAE encoding and decoding.
My "SFW" merge very closely matches the detail of raw Qwen Image Edit. Considering this is "SFW" use, this is the model you should be using.
The "NSFW" merge does do more smoothing as a result of the NSFW LORAs being added. However, I don't prioritize the detail of hay when I'm making NSFW merges, I prioritize NSFW capabilities (like genitalia, sexual positions etc.). It is a delicate balance because many NSFW LORAs are not designed for Qwen Image Edit 2511, so compromises always have to be made. With that said, I was not noticing any degradation or shifts in facial features, and at most, noise was being reduced.

Some of the examples did show different mouth positions, but it looks like using 6 steps fixed that. The small missing tongue in another example can likely be fixed by specifying "keeping her tongue slightly visible in her mouth".
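To see how much resolution the default encoding path can discard on a >1MP image, here is a rough sketch of the rescale. It assumes the stock node scales each reference image to about 1024×1024 total pixels (up or down) and rounds each side to a multiple of 8; that is an assumption about ComfyUI's implementation, not something confirmed in this thread, and `default_encoder_size` is a hypothetical helper.

```python
import math


def default_encoder_size(width, height, total_pixels=1024 * 1024, multiple=8):
    """Approximate size a reference image is rescaled to before VAE encoding.

    Assumption: the stock encode node scales the image to roughly `total_pixels`
    in area (up or down) and rounds each side to a multiple of 8.
    """
    scale = math.sqrt(total_pixels / (width * height))
    new_w = round(width * scale / multiple) * multiple
    new_h = round(height * scale / multiple) * multiple
    return new_w, new_h


def width_kept(width, height):
    """Fraction of the original width (linear resolution) that survives the rescale."""
    new_w, _ = default_encoder_size(width, height)
    return new_w / width


if __name__ == "__main__":
    for w, h in [(1024, 1024), (2048, 1536), (4000, 3000)]:
        print((w, h), "->", default_encoder_size(w, h), f"{width_kept(w, h):.0%} of width kept")
```

Under these assumptions a 4000×3000 photo is encoded at 1184×888, keeping under 30% of its original width, which would explain fine facial detail disappearing before sampling even starts.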

kinopu

Jan 20

I am using your Text Encode Node v2. Can you provide a sample output so I can try to generate this myself? How do you use target_latent? The sample JSON you provided on Hugging Face doesn't use it.

I used SFW prompts because I can't paste NSFW content on Hugging Face. I want to use NSFW, but it keeps altering the face.

Phr00t

Owner Jan 20

Connect your "final image size" latent output to the target_size latent input.

I ran some tests above with her face, comparing different "AIOs" including the NSFW one. I wasn't able to reproduce an "altering the face" issue, as demonstrated.

You can post NSFW content on Hugging Face. If you want, you can crop out just the face, since that is all we are discussing.

kinopu

Jan 20

Linking target_latent helps a little, but there are still issues with these two. The eyes and mouth are just doing random things.

https://cdn-uploads.huggingface.co/production/uploads/64985bcfc03129f5a4ea05a0/iJ7s327V2Xrb4oB4-hNWJ.jpeg

https://cdn-uploads.huggingface.co/production/uploads/64985bcfc03129f5a4ea05a0/drJy7Y0f_jQyJhOSVuXpX.jpeg

Phr00t

Owner Jan 20

I am not able to reproduce your issue:

Left is the original. Right is the NSFW v21 merge with only the prompt "Change her shirt to pink.". The faces look virtually the same to me, perhaps with just a bit less noise, as I've already concluded. I'm not going to prioritize pixel-perfect copies (which are particularly hard when you have a really big picture with a small face inside it).

Sampler settings used:

kinopu

Jan 20

Left: original; right: v21.

It just seems like something is modifying the face and giving it an AI smooth-over. All the details are lost and it looks like another person. It's the same in the example you just gave above. Something on the face was "processed".

kinopu

Jan 20

Replying to Phr00t's "Change her shirt to pink." comparison and the point about not prioritizing pixel-perfect copies:

That is fine, since this is your AIO model. But I think the original intent of Qwen Image Edit was to change only what was asked, without touching anything else or having other things modified unintentionally. But yeah, if that is the direction you want to take it, then I have nothing more to add. Thanks.

kinopu

Jan 20

For science, here is an HQ photo; try the prompt "Remove her bra". The face always changes: 4 steps, 6 steps, same resolution, higher resolution, it doesn't matter.

Phr00t

Owner Jan 20

OK, in this example, I'm finally seeing a significant change in her facial expression. It also has "enough pixels" of her face that it should be preserved. I'm looking into this more.

kinopu

Jan 20

Just for reference: v18.1 was the last version that tried to keep the face consistent. Starting with v19, the face began to change gradually, but only very slightly (like slight lip movements). In v20 and v21, face consistency was the worst and couldn't be kept at all (it was a completely different person, in addition to the movements).

Phr00t

Owner Jan 20

Well, as we have been demonstrating, keeping the "face consistent" is not black and white. Even with v21, finding concrete examples of facial inconsistency was quite hard. I narrowed down this particular facial expression issue to the anime2real LORAs, so I'm looking into another JibMix Skin LORA that should hopefully work better.

Phr00t

Owner Jan 20

Replying to kinopu's original-vs-v21 comparison, where the face looked smoothed over and "processed":

I don't consider this a problem. As I've stated before, this is just noise being removed.

slash726

Jan 20

This phenomenon is similar to how the denoising common in AI upscalers often replaces textures with an AI-like plastic appearance.

Phr00t

Owner Jan 20

Yeah, which is why I'm trying to combat that plastic effect with a compatible skin or realistic LORA. However, finding one that doesn't hurt anime or cause other inconsistencies is very difficult and I don't think there is a perfect solution. I'm hoping to get out something a little better, though.

kinopu

Jan 20

Another photo for science. It always wants to open her eyes and change her face.

kinopu

Jan 20

This one is for the tongue. No matter what you edit (remove her bra, remove some text), her tongue always gets cleaned of the red food coloring.

Phr00t

Owner Jan 20

v22 is uploading. I've tested some of these original images with it (in particular, the red food color now remains in edits with v22).

kinopu

Jan 20

v22 just popped up. downloading and will test. Thanks!

kinopu

Jan 21

We are so back with v22. Thank you!

dGuarava

Jan 21

Replying to Phr00t's comparison using the NSFW v21 merge (prompt "Change her shirt to pink.") and the sampler settings shared with it:

Can you share your current workflow, please? It would also be cool if the workflow included an upscaler to add more realism and improve the quality of the photos; do you have any information about such tools? A workflow using 2 images would also be useful.
By the way, I'm testing version 22. I think it's the best in terms of detail.
But I still have to use the F2P LoRA to preserve the face (if I change the pose or scene).

ming256

Jan 22

Replying to the earlier comment that v20 completely overrides facial features when the output resolution is higher than the source:

The output should not exceed the original image's resolution; set it about 30% smaller than the original. You can enlarge it again at a later stage.
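This suggestion can be expressed as a small size calculation. A minimal sketch, assuming "30% smaller" means each side is scaled by 0.7 and that sides should be rounded down to a multiple of 8 (both assumptions); `reduced_target` and `upscale_factor` are hypothetical helpers, not part of any node in this thread.

```python
def reduced_target(src_w, src_h, scale=0.7, multiple=8):
    """Output size about 30% smaller per side than the source.

    Assumptions: "30% smaller" means each side x 0.7, and sides are rounded
    down to a multiple of 8 so they stay within the scaled size.
    """
    scale = min(scale, 1.0)  # never ask for more pixels than the source has
    w = max(multiple, round(src_w * scale) // multiple * multiple)
    h = max(multiple, round(src_h * scale) // multiple * multiple)
    return w, h


def upscale_factor(src_w, src_h, out_w, out_h):
    """Factor a later upscale pass needs to get back to at least the source size."""
    return max(src_w / out_w, src_h / out_h)


if __name__ == "__main__":
    out_w, out_h = reduced_target(1000, 1500)
    print(out_w, out_h, round(upscale_factor(1000, 1500, out_w, out_h), 2))
```

For a 1000×1500 source this gives a 696×1048 edit, which an upscaler would then enlarge by roughly 1.44x to return to the original size.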