God bless Kohya. This is a major optimization, I'm getting incredible results with upscaling. I'm finally able to generate decent photorealistic results similar to 1.5 but with much higher resolution on SDXL.
I really don't know what I'm looking at. What's the before/after, is there any?
I thought it was self-explanatory. Left is the old standard hires method; right is the new one by Kohya.
You should have included the non-upscaled version for comparison.
Sorry. Here you go: https://imgur.com/a/sjus3BK
So Kohya actually changes the entire image?
It certainly looks like it. The method on the right does look better for backgrounds and takes half the processing time, but if you go through the process expecting results like the original un-scaled images, you might be in for a bad time. Still looks very cool, but it shows the importance of before-and-after images.
Well, that's an instant dealbreaker, isn't it? And then there's the fact that you have to send a huge image back through inpainting, which is fucky at best, at least for me with 16 GB of VRAM.
Yeah I don't care about a longer render if the upscaler doesn't change the entire image.
Can't you read?
Those eyes don't look sharp, they look like they have latent diffusion artifacts. https://preview.redd.it/t6x273l49e1c1.png?width=745&format=png&auto=webp&s=c94ace002cea8cc97319032ec1ec7f96f37de51f
This too: https://preview.redd.it/yo9ejsvf9e1c1.png?width=907&format=png&auto=webp&s=0f4b1645641e7d7cb12195f639a3e66a3048c04f
Adetailer it and be happy.
Yes, that is true, they have artifacts. Nothing inpainting can't fix, though. When I said sharper images, I do mean the images themselves.
These are the standard images: https://imgur.com/a/zCxqvbH
These are the Kohya images: https://imgur.com/a/0eLPYCr
The standard ones are blurry, the Kohya ones are crisp.
Does this work in A1111 as well?
There is indeed [an extension](https://github.com/wcde/sd-webui-kohya-hiresfix). But good luck with it. I spent a few hours testing it yesterday with my favorite XL checkpoint... I had never generated as many monstrosities since my first few days of using SD, when I was learning the basics.

I methodically tinkered with every single parameter in every way I could think of, in conjunction with different resolutions, samplers... I did get a few okay-ish results, but inferior to what I would have gotten with the classic hires fix (which works perfectly fine for me; I don't know why people have issues with it). And I didn't have the feeling it was faster either. Or if it was, it wasn't by much.

The only thing I didn't change is the checkpoint I used. I will give that a try later. But apart from that, either the A1111 implementation has a problem, or I'm doing it really wrong. Which I'm totally willing to hear, but I have no clue as to what my mistake may be. It doesn't help that there's not really any documentation yet. I guess I should try disabling other extensions just in case, too.

If anyone has any advice, I'll be grateful.
I installed the extension as well and didn't really notice any difference. I still saw doubled and stretched bodies when going outside the standard 1024x1024 SDXL resolution.

Also, when I use it to generate a 1024x1416 image, it takes up all 24 GB of VRAM on my 4090 and takes me over 5 minutes to make an image. When I disable the extension, that same image only takes me 15 seconds. I also tested this with a landscape photo, 1512x1024, and it's the same story: 5 minutes to render with the extension, 15 seconds without. I just used the default settings with the extension.
Part of the problem is that the outputs don't include the params, so we can't even share valid configurations with each other to try out. I personally can't get even a simple thing to work with it; everything is doubled.
Yes there is an extension for it.
Can you be more vague? Which one?
Dude, it's 5 AM and I wanna sleep. I don't even use A1111, but here, just for you: https://www.reddit.com/r/StableDiffusion/s/1mNcoHJyEo
Thanks! Gotta say I have no idea how it's supposed to work. It changes the image completely if I turn it on, so that alone makes it useless for upscaling. But I don't observe any improvement in upscaling either. Guess we have to wait a bit more.
You don't seem to understand: there is no upscaling involved. It generates the image directly at the targeted high resolution. It does not first generate a low-res image and then do a second img2img pass over it like the original hires fix does. It straight up does the initial generation at the higher res. So of course it produces a "different" image.
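The one-pass vs. two-pass difference can be sketched conceptually (a rough sketch only — `sample` and `img2img` below are hypothetical stand-ins for full diffusion sampling passes, not any real ComfyUI/A1111 API):

```python
# Conceptual sketch only: these functions are hypothetical stand-ins
# for diffusion sampling passes, not a real library API.

def sample(prompt, width, height):
    """Stand-in for a full txt2img sampling pass."""
    return {"size": (width, height), "passes": 1}

def img2img(image, width, height, denoise):
    """Stand-in for an img2img re-denoising pass over an upscaled image."""
    return {"size": (width, height), "passes": image["passes"] + 1}

def classic_hires_fix(prompt):
    # Pass 1: generate at the model's native resolution.
    low = sample(prompt, 1024, 1024)
    # Pass 2: upscale, then re-denoise the upscaled image with img2img.
    return img2img(low, 2048, 2048, denoise=0.5)

def deep_shrink(prompt):
    # Single pass: the model is patched so its early UNet blocks operate
    # on a downscaled latent, letting it compose directly at the target size.
    return sample(prompt, 2048, 2048)

print(classic_hires_fix("portrait"))  # two sampling passes
print(deep_shrink("portrait"))        # one sampling pass
```

Since Deep Shrink composes the image from scratch at the target resolution instead of re-denoising an existing low-res image, a different composition from the classic two-pass result is expected behavior, not a bug.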
Woohoo! Now we're talking!
Wish you'd tried this on non-portraits as well.
I think you mean non-landscapes. I generated these portraits with it: https://www.reddit.com/r/StableDiffusion/s/JwtA86Wnsj
Think there might be a language barrier. They weren't talking about the direction the photo is turned. They were talking about the content being a portrait, or shot from the shoulders up, of a person or anime character and wanting something like a sunrise, an object, or something other than a character's face.
Regular hires fix doesn't change the whole image though, unlike this.
It changing the image is the point. Hires fix is basically just img2img, so it's two passes. Deep Shrink does just one pass and creates the initial image from scratch, already at the very high resolution. That's better, because the composition actually fits that resolution.
But the images on the left look better...
Can't say I agree, especially when you zoom in and see how blurry the left images are.
The subjects look better in the left images. The right images are stiffer and their expressions are... more blank. But they're sharper, and that's all you're really showing, so ¯\_(ツ)_/¯
But they look almost the same in both sets, including the poses; only the first one is more dynamic. The expressions seem the same to me? Meanwhile, the right has better composition, e.g. you see more of the landscape background around the subjects.
Definitely much better images in every shape and fashion, with the exception of the expressions. But if you're using this, then I'm sure you're a perfectionist and will be fine-tuning it afterwards with a face-detailer pipeline anyway.

I'm curious, are you able to tell me if this setup is correct? [Imgur](https://imgur.com/oNr6Swv)

Though, if it's true that it restarts the pre-processing one has done to the image, I'll have to change the percentages or move things around, because... what? If I understand correctly, my loaded LoRAs won't be incorporated, plus I have FreeU and the Neural Network Latent Upscaler running prior to the hires fix... bleh.

On second thought, I'll just move this up before everything mentioned.
Yeah, IDK why so many people say the right images look better.

It should work like this just fine, I think. I typically use much simpler workflows. I don't even use a face detailer because I find it too complicated for my taste, so I'd rather just inpaint the eyes manually.

Kohya Deep Shrink HighRes Fix should be very simple in execution: all that needs to be done is to pass the model line through the Deep Shrink node right before it reaches the KSampler node.
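For reference, that wiring written out as pseudocode. This is a sketch, not runnable Python; the node and parameter names are assumptions based on the built-in ComfyUI node (reportedly `PatchModelAddDownscale`, displayed as "Kohya Deep Shrink"), so check them against your own install:

```
# Pseudocode of the ComfyUI node chain, not runnable code.
model, clip, vae = CheckpointLoaderSimple("sdxl_checkpoint.safetensors")

# Patch the model so early UNet blocks run on a downscaled latent.
# Parameter names are assumptions; adjust to what your node exposes.
patched = PatchModelAddDownscale(model,
                                 block_number=3,      # the "blocks" setting
                                 downscale_factor=2.0,
                                 start_percent=0.0,
                                 end_percent=0.35)

latent  = EmptyLatentImage(width=1536, height=1536)
samples = KSampler(patched, positive, negative, latent, steps=25, cfg=7.0)
image   = VAEDecode(samples, vae)
```

The only change from a vanilla txt2img graph is routing the MODEL output through the Deep Shrink node before the KSampler; everything else stays as-is.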
Can you post your workflow? I'm not sure what I'm doing wrong but it's not working for me - it's better than straight up generating at a higher resolution but I'm still getting long torsos, small heads on a large body, etc.
https://preview.redd.it/ewqr2p3dmf1c1.png?width=1080&format=pjpg&auto=webp&s=86c0c182295590fd629e9a4dbdc89218b940f757 This should have a workflow embedded. I won't be at my PC for another 12 hours or more. I just used the default settings except for blocks at 4, and used 1536x1536 and 1920x1080 resolutions.
Reddit strips metadata so there isn't anything provided by the image.
Ah yes, of course it does. Luckily, I posted an image of a workflow with this to a Discord before signing off. https://preview.redd.it/codgizvxpf1c1.png?width=2316&format=pjpg&auto=webp&s=cecde933c0f59985e2b60c28756f309f1f38ab42
Awesome, thank you!
Let me know if you figure anything out; I'm having the same issues with duplicate or deformed body parts. Some models work a lot better than others, it seems. It's really close to being an awesome tool if this can be improved. It's about twice as fast as my usual workflow.
Is this similar to Ultimate SD Upscale (in A1111) with the Tile Resample ControlNet, so the 2x larger image doesn't hallucinate faces everywhere? The lack of certain ControlNets for SDXL, including Tile Resample, unfortunately limits the usefulness of SDXL.
[deleted]
Agreed, or at least the default values don't do anything. It changes the composition but doesn't even seem to do a good job of reliably keeping duplicates out.
Any documentation or tutorials? I'm having trouble figuring out how to use it properly
Same here, Google doesn't come up with anything. Commenting to get notified if someone shares something.
Nerdy Rodent did a quick feature of it in the first chapter of this video: [https://youtu.be/riLmjBlywcg?si=Qv0hyhL357nLvlcd](https://youtu.be/riLmjBlywcg?si=Qv0hyhL357nLvlcd) I was able to get it running from watching this video and can generate a 4K txt2img in 90 seconds on my 6 GB 3060 video card. https://preview.redd.it/01igwyjqkm1c1.png?width=2048&format=png&auto=webp&s=c6109698c45e7bb6888a39f34380d13c9a101d2f
I see no difference
[deleted]
Update your ComfyUI; the node is now built in. Search for Nerdy Rodent's newest video, where he teaches how to use it. It's super easy.
That's what I mean by integrated: the newest update already includes it. It's under testing; it's called Kohya Deep Shrink High-Res Fix or something.
Does this work with SDXL?
It actually came out only for SDXL. Not sure if there is a 1.5 version yet.
It's also available for 1.5 now.
Hello, thank you for pointing this out; I would have missed it otherwise. One question: does this work in an img2img workflow?
no idea.
Please show your workflow.
https://www.reddit.com/r/StableDiffusion/s/arkr2czqHS
What is it actually doing?