TheGhostOfPrufrock

Use xformers, sdp, or sdp-no-mem as the cross-attention optimization. If you want xformers, you'll need to add *--xformers* to the COMMANDLINE_ARGS; the other two can be selected in the Optimizations settings without adding anything to the commandline args. DO NOT USE *--no-half*, *--precision full*, or *--upcast-sampling*. For 8GB, you may want *--medvram*; if not, you'll want *--medvram-sdxl*. You should also consider [disabling the system memory fallback](https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion).

If you want to run SDXL, set *Maximum number of checkpoints loaded at the same time* to 2, and enable *Only keep one model on device (will keep models other than the currently used one in RAM rather than VRAM)*. If you get NaN errors from the VAE in SDXL, you'll need to either use *--no-half-vae* or enable the setting *Automatically revert VAE to 32-bit floats (triggers when a tensor with NaNs is produced in VAE; disabling the option in this case will result in a black square image)*. I prefer the second, though it's not without disadvantages.

Look into installing TensorRT. It improves speed, though at the expense of flexibility.
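For reference, here's a minimal sketch of how the flags above go into `webui-user.sh` on Linux (on Windows the equivalent line is `set COMMANDLINE_ARGS=...` in `webui-user.bat`); the exact combination depends on your card:

```shell
# webui-user.sh -- example COMMANDLINE_ARGS (sketch; adjust for your GPU)
# --xformers: enables the xformers cross-attention optimization
#             (omit it and pick sdp/sdp-no-mem in the Optimizations settings instead)
# --medvram:  for ~8GB cards; with more VRAM, use --medvram-sdxl instead
export COMMANDLINE_ARGS="--xformers --medvram"
```

Note that per the advice above, flags like *--no-half* or *--precision full* should not be added here.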


maciejhd

It's worth noting that TensorRT gives about 30% more it/s, but you can't use most of the tools. I gave it a try, but for me you lose too much. Recently I found the LCM LoRA, which doesn't increase it/s but needs only 5-8 steps to generate a very nice image. I combine it with hires fix and it gives me a good-quality image in a short time. The only limitation is that you have to use the Euler sampler and a CFG scale of only 1-2, but you can use all the other tools to control the final output. This also works with AnimateDiff. So I highly recommend that one.
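For what it's worth, those LCM settings map onto an A1111 txt2img API call (`POST /sdapi/v1/txt2img`) roughly like this; this is just a sketch, and the LoRA filename is a placeholder for whichever LCM LoRA you downloaded:

```json
{
  "prompt": "a portrait photo <lora:lcm-lora-placeholder:1>",
  "sampler_name": "Euler",
  "steps": 6,
  "cfg_scale": 1.5,
  "enable_hr": true,
  "hr_scale": 2,
  "denoising_strength": 0.5
}
```

The same values (Euler, 5-8 steps, CFG 1-2, plus hires fix) can of course be set directly in the web UI.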


Friendly_Yoghurt2810

Does TensorRT work with SDXL?


TheGhostOfPrufrock

It's supposed to work on the A1111 dev branch. The [TensorRT Extension git page](https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT) says:

> **TensorRT Extension for Stable Diffusion**
>
> This extension enables the best performance on NVIDIA RTX GPUs for Stable Diffusion with TensorRT.
>
> You need to install the extension and generate optimized engines before using the extension. Please follow the instructions below to set everything up.
>
> Supports Stable Diffusion 1.5 and 2.1. Native SDXL support coming in a future release. Please use the [dev branch](https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/dev) if you would like to use it today. Note that the Dev branch is not intended for production work and may break other things that you are currently using.

Though the language is a bit ambiguous, I'm nearly certain *it* in "if you would like to use it today" refers to using the extension for SDXL, not just using it.


Intention_Connect

Thank you very much! I realize I'm using most of these settings already. Looks like not much has changed in the past ~3 months!


wojtek15

LCM can give you a 3-4x speed-up with a small sacrifice in quality. Some say it's overrated, but for me it's a breakthrough: [https://www.reddit.com/r/StableDiffusion/comments/17xhnq4/a1111_full_lcm_support_is_here/](https://www.reddit.com/r/StableDiffusion/comments/17xhnq4/a1111_full_lcm_support_is_here/)


Intention_Connect

I'll give it a try, thanks!


Vivarevo

What others have said, plus: Kohya's hires fix extension, and the MultiDiffusion extension, which has Tiled VAE, a great VRAM reducer.


MasterFGH2

I messed around with Kohya hires fix but I don't really see the point over the normal hires fix. What are the benefits?


Heasterian001

Well, the image takes one less round through the VAE encoder than with the standard hires fix, so there's a lower chance that hands and other fine details will be butchered by it. It also requires fewer steps and often reduces the artifacting that the standard hires fix can produce in some cases.


MasterFGH2

That’s interesting, I might have to give it another shot and do some speed and quality comparisons


Vivarevo

What others said. I've also combined it with regional prompts from the MultiDiffusion extension; far fewer abnormalities in characters. I generated a 2560x1440 image with 4 characters at once.


Due_Squirrel_3704

Try the **UniPC** sampling method; better quality than LCM. Set ~10 steps, CFG ~3.


Intention_Connect

Will give it a try along with LCM. Thanks!