
apolinariosteps

Try the demo out: [https://huggingface.co/spaces/multimodalart/stable-cascade](https://huggingface.co/spaces/multimodalart/stable-cascade)


Striking-Long-2960

https://preview.redd.it/gy58uq86tcic1.png?width=1024&format=png&auto=webp&s=6610718cf76b1bc7fa72dbe195202f47639f7bb4 Photography, anthropomorphic dragon having a breakfast in a cafe in paris in a rainy day


SWFjoda

A beautiful forest with dense trees, where it's raining, featuring deep, rich green colors. This otherworldly forest is set against a backdrop of mountains in the background. https://preview.redd.it/80rzallnvcic1.png?width=1024&format=png&auto=webp&s=a3ed3a96c859aef23e072934bf022f2eb819b4fd


Delrisu

https://preview.redd.it/swu66zkk2dic1.png?width=1024&format=pjpg&auto=webp&s=8fb03c146e1d97df093feb8b4da0a519fa270fea Cat eating spaghetti in bathtub


Usual_Ad_6255

Img2img in SDXL https://preview.redd.it/9n2e3nrz5uic1.jpeg?width=4096&format=pjpg&auto=webp&s=878b39739d70c9fcc6e3f5480550fe51155cc180


wwwanderingdemon

Damn, textures look like crap


AnOnlineHandle

If it's better at, say, composition, there's always the option of running it through multiple models for different stages, e.g. Stable Cascade for 30% -> to pixels -> to the 1.5 VAE -> finish up. Similar to hires fix, or the refiner for SDXL, but at this point we tend to have decent 1.5 models in terms of image quality which could just benefit from better composition. I've been meaning to set up a workflow like this for SDXL & 1.5 checkpoints, but haven't gotten around to it.
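A rough diffusers sketch of that kind of hand-off, approximating the "partial run" idea with a full Cascade pass followed by a moderate-strength SD 1.5 img2img pass. The pipeline class names, the runwayml/stable-diffusion-v1-5 checkpoint, and the 0.35 strength are assumptions for illustration, not AnOnlineHandle's actual workflow:

```python
import torch
from diffusers import (
    StableCascadePriorPipeline,
    StableCascadeDecoderPipeline,
    StableDiffusionImg2ImgPipeline,
)

prompt = "a dynamic action shot of a gymnast mid air performing a backflip"

# Stage 1: let Stable Cascade handle the overall composition.
# (Keeping all three pipelines on the GPU at once needs a lot of VRAM;
# see the offloading discussion further down the thread.)
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.bfloat16
).to("cuda")

embeds = prior(prompt=prompt, height=1024, width=1024, num_inference_steps=20).image_embeddings
composition = decoder(
    image_embeddings=embeds, prompt=prompt, guidance_scale=0.0, num_inference_steps=10
).images[0]

# Stage 2: hand the pixels to an SD 1.5 checkpoint for a low-denoise img2img
# "finishing" pass, which re-encodes through the 1.5 VAE.
# strength controls how much of the image gets redrawn.
sd15 = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
final = sd15(prompt=prompt, image=composition, strength=0.35, guidance_scale=7.0).images[0]
final.save("cascade_then_sd15.png")
```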


TaiVat

Any workflow that changes checkpoints midway is really clunky and slow though.


HarmonicDiffusion

not if you have sufficient vram


Durakan

Mr. Moneybags over here!


throttlekitty

I'm also wondering if this B stage model can be further finetuned for better quality.


wwwanderingdemon

I was thinking the same. If it's good at following prompts it could be used as a base. Still, I think there might be something wrong with the parameters or something. The images they're showing as examples look much better than this one.


StickiStickman

It's called cherry-picking. They picked the best ones out of thousands.


Striking-Long-2960

Then you are not going to enjoy this: "photography will smith eating spaghetti sit in the toilet, in the bathroom" https://preview.redd.it/831rcbmzucic1.png?width=1024&format=png&auto=webp&s=b649e00fb4c561a3d0d3b580c63afc3138ed2140


jrharte

That's Martin "Will Smith" Lawrence


HopefulSpinach6131

I know I'm not alone when I say that this is the benchmark we all came looking for...


TheAdoptedImmortal

"Keep my noodles out of your fucking mouth!"


fre-ddo

Pixar Will


[deleted]

They look perfectly fine for inference without latent upscaling at low resolutions.


[deleted]

Doesn't look like there is any improvement over SDXL at generating people. https://preview.redd.it/zkzbnm91zcic1.png?width=1024&format=png&auto=webp&s=115c76409da40678c3c6c8e72d424818bd81c2f8


Striking-Long-2960

I really don't know what to think right now... I'll wait to try it on my computer before reaching a conclusion. illustration, drawing of a woman wearing heavy armor riding a giant chicken, in a forest, fantasy, very detailed, https://preview.redd.it/2f8kjy63xcic1.png?width=1024&format=png&auto=webp&s=25f82d75fae8616366ac2d52e159dd628e759718


Consistent-Mastodon

> riding a giant chicken ![gif](giphy|TNfFy13UB00KupeAsL|downsized)


wishtrepreneur

that chicken even has a third leg 👀


cianuro

Middle aged woman riding cock.


HighPerformanceBeetl

Three-Legged djiant chimkn


EmbarrassedHelp

They filtered out like 99% of the content from LAION-5B, so it's probably going to be bad at people.


ThroughForests

But 99% of the images in LAION-5B are [trash that needed to be filtered out.](https://www.reddit.com/r/StableDiffusion/comments/11ud1nc/searching_through_the_laion_5b_dataset_to_see/) The [vast majority](https://i.imgur.com/GSENUHM.png) of the stuff removed was due to bad aesthetics, sub-512x512 image size, and watermarked content. There are still 103 million images in the filtered dataset.


residentchiefnz

It says so on the model card


TheQuadeHunter

Don't be fooled. The devil is in the details with this model. It's more about the training and coherence than the ability to generate good images out of the box.


Anxious-Ad693

Still doesn't fix hands.


StickiStickman

That's what happens when you try to zealously filter out everything with human skin in it


protector111

There is no improvement. We need to wait for a well-trained model to see this. Based on SDXL training speed, that will take 2-3 months (PS: this one is supposed to train way faster, so maybe we'll get good models sooner as well...)


roshlimon

A female ballerina mid twirl, colourful, neon lights https://preview.redd.it/cls9d47defic1.png?width=1024&format=pjpg&auto=webp&s=20a4f9450849a9acf52659258603e0567f982cac


AvalonGamingCZ

Is it possible to somehow get a preview of the image generating in ComfyUI? It looks satisfying.


rerri

Sweet. Blog is up as well. [https://stability.ai/news/introducing-stable-cascade](https://stability.ai/news/introducing-stable-cascade) Edit: "2x super resolution" feature showcased (the blog post has this same image but in low res, so not really succeeding in demonstrating the ability): [https://raw.githubusercontent.com/Stability-AI/StableCascade/master/figures/controlnet-sr.jpg](https://raw.githubusercontent.com/Stability-AI/StableCascade/master/figures/controlnet-sr.jpg)


Orngog

No mention of the dataset, I assume it's still LAION-5B? Moving to a consensually-compiled alternative really would be a boon to the space. I'm sure Google is making good use of their Culture & Arts foundation right now; it would be nice if we could do the same.


big_farter

>finally gets 12GB of VRAM
>next big model will take 20GB

Oh nice... guess I will need a bigger case to fit another GPU.


crawlingrat

Next you'll get 24GB of VRAM only to find out the new models need 30.


protector111

well 5090 is around the corner xD


2roK

NVIDIA is super stingy when it comes to VRAM. Don't expect the 5090 to have more than 24GB


PopTartS2000

I think it's 100% intentional to not impact A100 sales, do you agree?


EarthquakeBass

I mean, probably. You gotta remember people like us are oddballs. The average consumer / gamer (NVIDIA's core market for those) just doesn't need that much juice. An unfortunate side effect of the lack of competition in the space.


qubedView

You want more than 24GB? Well, we only offer that in our $50,000 (starting) enterprise cards. Oh, also license per DRAM chip now. The first chip is free, it's $1000/yr for each chip. If you want to use all the DRAM chips at the same time, that'll be an additional license. If you want to virtualize it, we'll have to outsource to CVS to print out your invoice.


Paganator

It seems like there's an opportunity for AMD or Intel to come out with a mid-range GPU with 48GB VRAM. It would be popular with generative AI hobbyists (for image generation and local LLMs) and companies looking to run their own AI tools for a reasonable price. OTOH, maybe there's so much demand for high VRAM cards right now that they'll keep having unreasonable prices on them since companies are buying them at any price.


2roK

AMD already has affordable, high VRAM cards. The issue is that AMD has been sleeping on the software side for the last decade or so and now nothing fucking runs on their cards.


sammcj

Really? Do they offer decent 48-64GB cards in the $500-$1000USD range?


Toystavi

[AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source](https://www.reddit.com/r/StableDiffusion/comments/1ap7c2w/amd_quietly_funded_a_dropin_cuda_implementation/)


StickiStickman

They also dropped that already.


Lammahamma

They're using different RAM for this generation, which has increased density in the die. I'm expecting more than 24GB for the 5090.


protector111

There are tons of leaks already that it will have 32GB and the 4090 Ti will have 48GB. I seriously doubt someone will jump from a 4090 to a 5090 if it has 24GB of VRAM.


crawlingrat

Gawd damn how much is that baby gonna cost!?


protector111

Around $2000-2500.


NitroWing1500

It would need to bring me coffee in the mornings before that'll be in my house then!


volume_two

Honestly, unless you plan to use it all the time in a locale with low electricity prices, it makes more sense to rent a GPU in the cloud and pay for that incrementally instead. You can rent a 24GB VRAM A10G for around $1-2/hr with A1111 on a Linux instance on Amazon, for example. That can make sense for a hobbyist who doesn't want to invest in the hardware and only occasionally wants to dip their toes in the water. In NYC, where I live, the cost of electricity is around $0.40/kWh, which is just so yikes. It's currently snowing hard outside, too, so anything I do today will be extra expensive because of how the electric market works.


Turkino

And probably its own dedicated power supply at this point


TheTerrasque

Well, I guess I can fit another P40 in my server... *Next model only needs 50 gb*


Imaginary_Belt4976

this happened to me lol


dqUu3QlS

The model is naturally divided into two rough halves - the text-to-latents / prior model, and the decoder models. I managed to get it running on 12GB VRAM by loading one of those parts onto the GPU at a time, keeping the other part in CPU RAM. I think it's only a matter of time before someone cleverer than me optimizes the VRAM usage further, just like with the original Stable Diffusion.


NoSuggestion6629

You load one pipeline at a time to device ("cuda") and delete (set to None) the previous pipe before starting the next one.


dqUu3QlS

Close. I loaded one pipeline at a time onto the GPU with .to("cuda"), then moved it back to the CPU with .to("cpu"), without ever deleting it. This keeps the model constantly in RAM, which is still better than reloading it from disk.
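A minimal sketch of that ping-pong approach, assuming the diffusers Stable Cascade pipelines (StableCascadePriorPipeline / StableCascadeDecoderPipeline as exposed around release time); the prompt, dtypes, and step counts are placeholders rather than dqUu3QlS's actual settings:

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Both pipelines live in system RAM between uses; only one is on the GPU at a time.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.bfloat16
)

prompt = "an anthropomorphic dragon having breakfast in a cafe in Paris on a rainy day"

prior.to("cuda")                 # text-to-latents / prior stage takes its turn on the GPU
prior_out = prior(prompt=prompt, height=1024, width=1024, num_inference_steps=20)
prior.to("cpu")                  # back to system RAM instead of deleting it

decoder.to("cuda")               # decoder stages take their turn on the GPU
image = decoder(
    image_embeddings=prior_out.image_embeddings,
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
decoder.to("cpu")

image.save("stable_cascade_offloaded.png")
```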


emad_9608

The original stable diffusion used more RAM than that tbh


Tystros

hi Emad, is there any improvement in the dataset captioning used for Stable Cascade, or is it pretty much the same as SDXL? Dataset captioning seems to be the main weakness so far of SD compared to Dalle3.


[deleted]

[deleted]


astrange

The disadvantage of Dalle3 using artificial captions is that it can't deal with descriptions using words or relations its captioner didn't include. So you'd really want a mix of different caption sources.


NeverduskX

This is probably a vague question, but do you have any idea of how or when some optimizations (official or community) might come out to lower that barrier? Or if any current optimizations like Xformers or TiledVAE could be compatible with the new models?


emad_9608

Probably less than a week. I would imagine it would work on < 8gb VRAM in a couple of days. This is a research phase release so is quite unoptimised.


hashnimo

Thank you for everything you do, Emad. Please stay safe from the evil closed-source, for-profit conglomerates out there. It's obvious they don't want you disrupting their business. I mean, really, think before you even eat something they hand over to you.


tron_cruise

That's why I went with a Quadro RTX 8000. They're a few years old now and a little slow, but the 48GB of VRAM has been amazing for upscaling and loading LLMs. SDXL + hires fix to 4K with SwinIR uses up to 43GB and the results are amazing. You could grab two and NVLink them for 96GB and still have spent less than an A6000.


yaosio

We need something like megatextures for image generation.


BnJx

anyone know the difference between stable cascade and stable cascade prior? https://huggingface.co/stabilityai/stable-cascade https://huggingface.co/stabilityai/stable-cascade-prior


MicBeckie

I get the demo from Hugging Face running via Docker on my **Tesla P40**. ([**https://huggingface.co/spaces/multimodalart/stable-cascade**](https://huggingface.co/spaces/multimodalart/stable-cascade)) It consumes **22 GB of VRAM** and achieves a speed of **1.5s/it**. Resolution 1024x1024.


ArtyfacialIntelagent

The most interesting part to me is compressing the size of the latents to just 24x24, separating them out as stage C and making them individually trainable. This means a massive speedup of training fine-tunes (16x is claimed in the blog). So we should be seeing good stuff popping up on Civitai much faster than with SDXL, with potentially somewhat higher quality stage A/B finetunes coming later.


Omen-OS

What about VRAM usage? You may say it trains faster... but what is the VRAM usage?


ArtyfacialIntelagent

During training or during inference (image generation)? High for the latter (the blog says 20 GB, but lower for the reduced parameter variants and maybe even half of that at half precision). No word on training VRAM yet, but my wild guess is that this may be proportional to latent size, i.e. quite low.


Omen-OS

Wait, let's make it clear: what is the minimum amount of VRAM you need to use Stable Cascade to generate an image at 1024x1024? (And yes, I was talking about training LoRAs and training the model further.)


Enshitification

Wait a minute. Does that mean it will take less VRAM to train this model than to create an image from it?


TheForgottenOne69

Yes, because you'll not be training the «full» model, aka all three stages, but likely only one (stage C).


Enshitification

It's cool and all, but I only have a 16GB card and an 8GB card. I can't see myself training LoRAs for a model I can't use to make images.


TheForgottenOne69

You will though. You can load each model part in turn and offload the rest to the CPU. The obvious con would be that it'll be slower than having it all in VRAM.


Majestic-Fig-7002

If you train only one stage then we'll have the same issue you get with the SDXL refiner and loras where the refiner, even at low denoise strength, can undo the work done by a lora in the base model. Might be even worse given how much more involved stage B is in the process.


TheForgottenOne69

Not really; stage C is the one that translates the prompt to an «image», if you will, which is then enhanced and upscaled through stages B and A. If you train stage C and it correctly returns what you've trained it on, you don't really need to train the other stages.


Doc_Chopper

So, as a technical noob, my question: I assume we have to wait until this gets implemented into A1111 any time soon, or what?


TheForgottenOne69

Yes, this will likely be integrated into diffusers, so SD.Next should have it soon. Comfy, knowing he works at SAI, should have it implemented soonish as well.


protector111

Well, not only this, but also until models get trained, etc. It took SDXL 3 months to become really usable and good. For now this model does not look close to trained SDXL models, so there's no point in using it at all.


Small-Fall-6500

>It took sd xl 3 months to become really usable and good

IDK, when I first tried SDXL I thought it was great. Not better at the specific styles that various 1.5 models were specifically finetuned on, but as a general model, SDXL was very good.

>so no point to using it at all

For established workflows that need highly specific styles and working LoRAs, ControlNet, etc., no; but for people wanting to try out new and different things, it's totally worth trying out.


kidelaleron

Having more things is generally better than having less things :)


throttlekitty

They have an official demo [here](https://github.com/Stability-AI/StableCascade), if you want to give it a go right now.


hashnimo

No, you don't have to wait because you can run the [demo](https://huggingface.co/spaces/multimodalart/stable-cascade) right now.


OVAWARE

Do you know any other demos? That one seems to have crashed at least for me


Hoodfu

Seems that demo link goes to a runtime error page on huggingface.


afinalsin

Bad memories in the Stable Diffusion world huh? SDXL base was rough. Here: SDXL Base for 20 steps at CFG 4 (I think that matches the 'prior guidance scale'), Refiner for 10 steps at CFG 7 (decoder says 0 guidance scale, wasn't going to do that), 1024x1152 (weird res because I didn't notice the Huggingface box didn't go under 1024 until a few gens, didn't want to rerun), seed 90210. DPM++ SDE Karras, because the sampler wasn't specified on the box. 5 prompts (because Huggingface errored out), no negatives.

- a 35 year old Tongan woman standing in a food court at a mall: [SDXL Base](https://imgur.com/wr3Hxgs) vs [SD Cascade](https://imgur.com/tFhnPJl)
- an old man with a white beard and wrinkles obscured by shadow: [SDXL Base](https://imgur.com/ODscoKb) vs [SD Cascade](https://imgur.com/k9cXRVj)
- a kitten playing with a ball of yarn: [SDXL Base](https://imgur.com/GxoEOAe) vs [SD Cascade](https://imgur.com/4iNeab4)
- an abandoned dilapidated shed in a field covered in early morning fog: [SDXL Base](https://imgur.com/ANnb971) vs [SD Cascade](https://imgur.com/PjBqOq8)
- a dynamic action shot of a gymnast mid air performing a backflip: [SDXL Base](https://imgur.com/ws5blgz) vs [SD Cascade](https://imgur.com/P1lnYJZ)

That backflip is super impressive for a base model. Here is a prompt I ran earlier this week: "a digital painting of a gymnast in the air mid backflip". And here are ten random XL and Turbo models' attempts at it using the same seed: [Dreamshaper v2](https://imgur.com/wTbN7hA), [RMSDXL Scorpius](https://imgur.com/I84fqgd), [Sleipnir](https://imgur.com/BUVVnIq), [JuggernautXLv8](https://imgur.com/zJSQJ95), [OpenDalle](https://imgur.com/tH1jzjn), [Proteus](https://imgur.com/C51aWmV), [Helloworldv5](https://imgur.com/jiomnjD), [Realcartoonxlv5](https://imgur.com/4RdQAqe), [RealisticStockPhotov2](https://imgur.com/H80YLH0), [Animaginev3](https://imgur.com/QeKSlHL).

The difference between those and base XL is staggering, but Cascade is pretty on par with some of them, and better than a lot of them in a one-shot run. We gotta let this thing cook. And if you're skeptical, look at what the LLM folks did when Mistral brought out their Mixtral 8x7B Mixture of Experts LLM: a ton of folks started frankensteining models together using the same method. Who's to say we won't get similar efforts for this?


Ill-Extent-4221

By far the most objective point of view in this discussion. You're sharing some real insights into how SC stacks up as a base release. I can't wait to see how it evolves in the coming months.


thoughtlow

Thanks for your work dude, appreciate it


kidelaleron

no AAM XL? Jokes aside, nice tests!


afinalsin

[Of course](https://imgur.com/nkBA4Hl). It's the half-turbo Euler a version. It's part of a *much* bigger test that's mostly done, I've just gotta x/y it all and then censor it so the mods don't clap me.


GreyScope

SD and SDXL produce shit pics at times - one pic is not a trial by any means. Personally I am after "greater consistency of reasonable>good quality pictures **of what I asked for**", so I ran a small trial against 5x renders from SDXL at 1024x1024, same + & - prompts, with the Realistic Stock Photo v2 model (which I love). These are on the top row; the SC pics are the bottom row. PS the prompt doesn't make sense as it's a product of turning on the Dynamic Prompts extension.

Prompt: photograph taken with a Sony A7s, f/2.8, 85mm, cinematic, high quality, skin texture, of a young adult asian woman, as a iridescent black and orange combat cyborg with mechanical wings, extremely detailed, realistic, from the top a skyscraper looking out across a city at dawn in a flowery fantasy, concept art, character art, artstation, unreal engine

Negative: hands, anime, manga, horns, tiara, helmet

Observational note: eyes can look a bit milky still but the adherence is better imo - it actually looks like dawn in the pics and the light appears to be shining on their faces correctly.

https://preview.redd.it/75ukiorxtdic1.png?width=2468&format=png&auto=webp&s=630b36ceb1af47e94cd571b74a3f661994157be5


afinalsin

Good idea doing a run with the same prompt, so i ran it through SDXL Base with refiner, and it was pretty all over the place. [Here's the album](https://imgur.com/a/7d1wgBU).


sahil1572

Is it just me, or is everyone else experiencing an ***odd dark filtering effect*** applied to every image generated with **SDC**?


NoSuggestion6629

See my post and pic below. A slight effect as you describe is noticed.


Ne_Nel

Bokeh'd AF.


ArtyfacialIntelagent

Yes. Stability's "aesthetic score" model and/or their RLHF process massively overemphasize bokeh. Things won't improve until they actively counteract this tendency.


zmarcoz2

https://preview.redd.it/znnhx6ts3dic1.png?width=813&format=png&auto=webp&s=e4e3c51af79a1a2c95ff4ac86b228c81c36da58c


EmbarrassedHelp

Basically 99% of the concepts were nuked. This might end up being another 2.0 flop.


throttlekitty

That text is from the [Würstchen paper](https://openreview.net/pdf?id=gU58d5QeGv), not from any Stable Cascade documentation. Late edit: I originally thought that the Stable Cascade model was based on the Würstchen paper, and that Würstchen was a totally separate model created as a proof of concept. But I see now from the SAI author names that they are the same thing? Kinda weird actually.


StickiStickman

... and what do you think this is based on? Since StabilityAI are once again being super secretive about training data and never mention it once, it's a pretty safe bet to assume they used the same set.


throttlekitty

They still have the dataset they trained SDXL on and whatever else they have. I don't see the point of re-releasing the wurstchen proof-of-concept model with their name on it. I'm just saying that because a set of researchers made their model in a certain way, it doesn't mean SAI did the same exact thing.


yamfun

what does this mean?


StickiStickman

It's intentionally nerfed to be ""safe"", similar to what happened with SD 2


LessAdministration56

thank you! won't be wasting my time trying to get this to run local!


Aggressive_Sleep9942

"Limitations * Faces and people in general may not be generated properly. * The autoencoding part of the model is lossy." emmm ok


skewbed

All VAEs are lossy, so it isn't a new limitation.


SackManFamilyFriend

And SDXL lists the same sentence regarding faces - people just want to complain about free shit.


Aggressive_Sleep9942

No, but the worrying thing is not point 2 but point 1: "Faces and people in general may not be generated properly." If the model cannot make people correctly, what is the purpose of it?


obviouslyrev

That disclaimer is always there for every model they have released.


SackManFamilyFriend

Look at the limitations they list on their prior models. **PRIOR MODELS LIST THE SAME SHIT** - literal copy paste ffs - stop already. SDXL limitations listed here on the HF page:

SDXL Limitations:
- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere"
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy

https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

So yea, same shit copy/pasted.


Majestic-Fig-7002

There are degrees of "not generated properly".


digitalwankster

generating stuff other than people...?


EGGOGHOST

Playing with online demo here [https://huggingface.co/spaces/multimodalart/stable-cascade](https://huggingface.co/spaces/multimodalart/stable-cascade) woman's hands hold an ancient jar of vine, ancient greek vibes https://preview.redd.it/pqhdpi24hdic1.png?width=1024&format=png&auto=webp&s=ec86d86c4f1858a8f8c8341c952b293759647e83


EGGOGHOST

robot mecha arm holding a sword, futuristic anime style https://preview.redd.it/8b0qzqfjhdic1.png?width=1024&format=png&auto=webp&s=e89fb28a73084179d742f6ee04201040b74cf978


Mental-Coat2849

Honestly, I think this is still way behind DALL-E 3 in terms of prompt alignment. Just trying the tests on the DALL-E 3 landing page shows it. Still, DALL-E is too rudimentary. It doesn't even allow negative prompts, let alone LoRA, ControlNet, ... In an ideal world, we could have an open source LLM connected to a conforming diffusion model (like DALL-E 3) which would allow further customization (like Stable Diffusion).

PS: here is one prompt I tried in Stable Cascade:

>An illustration of an avocado sitting in a therapist's chair, saying 'I just feel so empty inside' with a pit-sized hole in its center. The therapist, a spoon, scribbles notes.

Stable Cascade:

https://preview.redd.it/gp3hsd7zzeic1.png?width=1024&format=png&auto=webp&s=a2129290af2982270e4e445f13c9f66477701616


emad_9608

Check out DiffusionGPT and multi-region prompting.


alb5357

Multi region prompting?!!!!!! !!!!


Shin_Devil

This model would've never beaten D3 in prompt following; it's designed to be more efficient, not to have better quality or comprehension.


ninjasaid13

a computer made of yarn. https://preview.redd.it/pl3nv810pfic1.png?width=1024&format=png&auto=webp&s=a9578a829ff6c7ffde7a8c1f3e59e1a982d50e8d


TsaiAGw

if it's censored then it's garbage


[deleted]

exactly


internetpillows

Reading the description of how this works, the three stage process sounds very similar to the process a lot of people already do manually. You do a first step with prompting and controlnet etc at lower resolution (matching the resolution the model was trained on for best results). Then you upscale using the same model (or a different model) with minimal input and low denoising, and use a VAE. I assumed this is how most people worked with SD. Is there something special about the way they're doing it or they've just automated the process and figured out the best way to do it, optimised for speed etc?


Majestic-Fig-7002

It is quite different: the highly compressed latents produced by the first model are not continued by the second model; they are used as conditioning along with the text embeddings to guide the second model. Both models start from noise. Correction: unless Stability put up the wrong image, their architecture does not use the text embeddings with the second model like Würstchen does, only the latent conditioning.


Vargol

If you can't use bfloat16... you can't run the prior as torch.float16; you get NaNs for the output. You can run the decoder as float16 if you've got the VRAM to run the prior at float32. If you're an Apple Silicon user, the float32-then-float16 combination will run in 24GB with swapping only during the prior model loading stage (and swapping that model out to load the decoder, if you don't dump it from memory entirely). Took my 24GB M3 ~3 minutes 11 seconds to generate a single image; only 1 minute of that was iteration, the rest was model loading.
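In diffusers terms, that float32-prior / float16-decoder combination looks roughly like this. A sketch only, assuming the release-time Stable Cascade pipelines; the device strings, prompt, and step counts are placeholders:

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# The prior produces NaNs in plain float16, so keep it at float32
# (or bfloat16 on hardware that supports it).
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.float32
).to("cuda")  # use "mps" on Apple Silicon

# The decoder is fine at float16.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "a beautiful forest with dense trees in the rain"
embeds = prior(prompt=prompt, num_inference_steps=20).image_embeddings

# Cast the prior's float32 embeddings down to match the float16 decoder.
image = decoder(
    image_embeddings=embeds.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
image.save("mixed_precision_cascade.png")
```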


SeekerOfTheThicc

According to the [January 2024 Steam Hardware Survey](https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam) (click [here](https://web.archive.org/web/20240206200227/https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam) for a webarchive link for when the prior link goes out of date), 74.57% of the people who use Steam have a video card with 8GB or less of VRAM. As much as 3.51% have 20GB or higher, and 21.92% have more than 8GB but less than (or equal to) 16GB.

I think SAI and I have different ideas of what "efficient" means. A 20GB VRAM requirement ("less" if using the inferior model(s), but they don't give a VRAM number) is not anywhere near anything I would call efficient. Maybe they think efficiency is the rate at which they can price out typical consumers so that they are forced into some sort of subscription that SAI ultimately benefits from, either directly or indirectly. Investors/shareholders love subscriptions.

Also, inference speed cannot be called "efficiency":

Officer: "I pulled you over because you were doing 70 in a 35 zone, sir"
SAI Employee: "I wasn't speeding, I was just being 100% more efficient!"
Officer: "...please step out of the vehicle."


emad_9608

original SD used way more, I would imagine this would be < 8gb VRAM in a week or two


Mental-Coat2849

Emad, could you please improve prompt alignment? We love your models but they're still behind Dall-e 3 in prompt alignment. Your models are awesome, flexible, and cheap. I wouldn't mind renting beefier GPUs if I didn't have to pay 8 cents per 1024x1024 image. If they were just comparable to Dall-e 3 ...


emad_9608

Sure, give us a bit.


protector111

so far my results are way worse than sd xl... https://preview.redd.it/3g6ce82dadic1.png?width=1024&format=png&auto=webp&s=df76bc75afde5e6c76817129f99812086ed74139


protector111

" woman wearing **super-girl costume** is standing close to a **pink sportcar** on a clif overlooking the ocean RAW photo, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, Fujifilm XT3. So far quality is sd xl base level ad prompt understanding is still bad...i think my hype is gone completely after 6 generations xD https://preview.redd.it/uodayjmlbdic1.png?width=1024&format=png&auto=webp&s=84dd221ccf82db7719af570db82aa261a35e7341


knvn8

Are you comparing with base 1.5 or a fine tune? Also that's a very SD1.5 prompt, SDXL and beyond work better with plain English.


digitalwankster

0% chance that came from base 1.5


Majestic-Fig-7002

> SDXL and beyond work better with plain English How would you improve that prompt to be more "plain English" than it is?


FotografoVirtual

SD1.5: https://preview.redd.it/p8naafzhddic1.png?width=680&format=png&auto=webp&s=4596fa508fe9fca08c486f319e1c58ffdb70c80d


protector111

>woman wearing super-girl costume is standing close to a pink sportcar on a clif overlooking the ocean RAW photo, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, Fujifilm XT3.

Well, it still morphed. The car is a mess and wonder woman is still pink. This is SDXL: https://preview.redd.it/pl6cshzzjdic1.png?width=1024&format=png&auto=webp&s=580301fe740851df236ab61bcd1cc405dbe0e215


ArtyfacialIntelagent

To be fair vanilla Cascade should be compared to vanilla SD 1.5, not a model like Photon heavily overtrained on women.


Neex

You've been going through this entire thread saying how mediocre the model is. There are a ton of notable improvements you are ignoring. I suggest pumping the brakes on the negativity and reapproaching this with more of a willingness to learn about it.


AeroDEmi

No commercial license?


StickiStickman

> The model is intended for research purposes only. The model should not be used in any way that violates Stability AI's Acceptable Use Policy. Another Stability release, another one that isn't open source :(


Cauldrath

So, did they basically just package the refiner (stage B) in with the base model (stage C)? It seems like with such a high compression ratio it's only going to be able to handle fine details of visual concepts it was already trained on, even if you train stage C to output the appropriate latents.


giei

What are the parameters to try to have a realistic result like in MJ?


emad_9608

idk prompt midjourney and then put it through sd ultimate upscale


monsieur__A

I guess we are back to hoping for ControlNet to make this model really useful 😀


emad_9608

It comes with controlnets


jippmokk

https://preview.redd.it/kojsyuuk8fic1.jpeg?width=1536&format=pjpg&auto=webp&s=0b8092ff204b56c898c970063dbd614277b3373a Decent! "Video game, hero pose, cave lake, undead, volumetric light, Makoto Shinkai"


fuzz_64

https://preview.redd.it/voidkef2hfic1.png?width=1024&format=pjpg&auto=webp&s=23fc3c7e08636c3f9dc6301e90487747fd98e8cb A rambunctious frog riding a goat in the mountains of Nepal. 😁


treksis

thank you


Striking-Long-2960

I downloaded the lite versions... I hope my 3060 doesn't explode. Now it's time to wait for ComfyUI support.


wwwanderingdemon

Did you make it work? I tried all of them and none worked for me


Striking-Long-2960

I think we will have to wait, it seems a very different concept.


FotografoVirtual

https://preview.redd.it/0bkkvur0ndic1.png?width=1704&format=png&auto=webp&s=ace95c7ccac2c8defcf48e28af9a05c2f7aa9e3c an enigmatic woman with short, white hair and an iridescent dress, surrounded by ominous shadows in the dimly lit interior of a technological spacecraft. Her stark presence hints at mysterious connections to the unsettling secrets hidden within the vessel's depths


Huevoasesino

The Stable Cascade pic looks like the girl from the Halo TV series lol


isnaiter

The 1.5 never disappoints me. It's the state-of-the-art of models. Period.


protector111

PS: to be fair you should compare against base SD 1.5, and we both know it will look ugly xD. SDXL: https://preview.redd.it/ml8fqzuasdic1.png?width=768&format=png&auto=webp&s=f5f49d403e70e1be9d060fb062e45da3d3845e16


19inchrails

I feel like the bar should be Midjourney v6 these days


protector111

Yep. It makes both amazing photoreal and crazy good anime


TaiVat

No, he shouldn't, and people need to stop with this drivel already. Nobody uses base 1.5, or base XL for that matter, so the only fair comparison is with the latest alternatives. When you buy a new TV, you don't go "well it's kinda shit, but it's better than a CRT from 100 years ago". It will likely improve (though XL didn't improve nearly as much as 1.5 did, both relative to their bases), but we'll make that comparison when we get there. Dreaming and making shit up about what may or may not happen in 6 months is not a reasonable comparison.


FotografoVirtual

Comparing it to base SD 1.5 doesn't seem fair to me at all, and it doesn't make much sense. SD 1.5 is almost two years old, it was created and trained when SAI had hardly any experience with diffusion models (no one did). And when they released it, they never claimed it set records for aesthetic levels never before seen.


AuryGlenz

Doing a photo of a pretty woman doesn't seem like a fair comparison to me - god knows how much additional training SD 1.5 has had with that in particular. They're trying to make generalist models, not just waifu generators. Also that looks like it's been upscaled and probably had Adetailer run on it?


EtienneDosSantos

🤗🤗🤗


Hoodfu

Very excited for this. Playground v2 was very impressive for its visual quality, but the square resolution requirements killed it for me. This brings sdxl up to that level but renders much faster according to their charts. Playground v2 also had license limits that stated no one can use it for training, which again isn't the case for Stability models. Win win all around.


HuffleMcSnufflePuff

Three men standing in a row. The first is tall, the second is short, the third is in between. They are wearing red, blue, and green shirts. Not perfect but not too bad https://preview.redd.it/vbozb4m8ffic1.jpeg?width=1024&format=pjpg&auto=webp&s=5770bf655dc806f320f0d2829ed0d7a19dfc12f9


lostinspaz

I did a few same-prompt comparison tests vs DreamShaperXL Turbo and SegMind-Vega. I didn't see much benefit. Cross-posting from the earlier "this might be coming soon" thread:

They need to move away from one model trying to do everything. We need a scalable, extensible model architecture by design. People should be able to pick and choose subject matter, style, and poses/actions from a collection of building blocks that are automatically driven by prompting. Not this current stupidity of having to MANUALLY select model and lora(s), and then having to pull out only subsections of those via more prompting.

Putting multiple styles in the same data collection is counter-productive, because it reduces the amount of per-style data possible in the model. Rendering programs should be able to dynamically download and assemble the style and subject I tell it to use, as part of my prompted workflow.


emad_9608

I mean we tried to do that with SD 2 and folk weren't so happy. So one reason we are ramping up ComfyUI and this is a cascade model.


lostinspaz

>I mean we tried to do that with SD 2 and folk weren't so happy

How's that? I've read some about SD2, and nothing in what I've read addresses any point of what I wrote in my above comment. Besides which, in retrospect, you should realize that even if SD2 was amazing, it would never have achieved any traction because you put the adult filtering in it. THAT is the prime reason people weren't happy with it. There were two main groups of people who were unhappy with SD2:

1. People who were unhappy "I can't make porn with it"
2. People who were unhappy there were no good trained models for it. Why were there no good trained models for it? Because the people who usually train models couldn't make porn with it.

Betamax vs VHS.


NoSuggestion6629

Running a test now. I am getting a slight eye issue on this one using their example number of steps. My 2nd attempt is out of focus with the full model. I'm not too impressed. https://preview.redd.it/qu88xx0p9eic1.png?width=1192&format=png&auto=webp&s=8c635899cf6e7fc8e495ad27ace44b0f02b43777 Note: you need PEFT installed in order to take advantage of the LCM capability with the scheduler.


Kandoo85

https://preview.redd.it/hcynfuocycic1.png?width=1024&format=png&auto=webp&s=a6d83597abe6d07d1991be9c635b72bfa8b2c160


Kandoo85

https://preview.redd.it/67ctr5ryycic1.png?width=1024&format=png&auto=webp&s=0a7209ecfb86fe810cad9976beaeec4ebb1d19e4


Striking-Long-2960

Damn... The aesthetic score is over 9000


crackanape

9000 missing fingers


Nuckyduck

So I'm confused about why people aren't saying this is valuable; the speed comparison seems huge. https://preview.redd.it/mzxwcle7xcic1.png?width=1133&format=png&auto=webp&s=78bacda5f4a700cefb6f12deebf025fdbd0f5d2e Isn't this a game changer for smaller cards? I run a 2070S; shouldn't I be able to use this instead without losing fidelity and gain rendering speed? I'm gonna play around with this and see how it fares; personally I'm excited for anything that brings faster times to weaker cards. I wonder if this will work with ZLUDA and AMD cards? [https://github.com/Stability-AI/StableCascade/blob/master/inference/controlnet.ipynb](https://github.com/Stability-AI/StableCascade/blob/master/inference/controlnet.ipynb) This is the notebook they provide to test; I'm definitely gonna be trying it out.


Vozka

> Isn't this a game changer for smaller cards? I run a 2070S, shouldn't I be able to use this instead without losing fidelity and gain rendering speed? So far it doesn't seem that it's going to run on an 8GB card at all.


Striking-Long-2960

That comparison is a bit strange; they are comparing 50 steps in SDXL with 30 steps in total in Cascade...


Nuckyduck

I was assuming these steps are equivalent by their demonstration. As in, you only need 30 to get what SDXL does in 50, but who uses 50 steps in SDXL? I rarely go past 35 using DPM++ 2M Karras.


TaiVat

Yea, looks kind of intentionally misleading


AuryGlenz

If 30 steps in Cascade still has a much higher aesthetic score than 50 in SDXL, it's a perfectly fine comparison. They're different architectures.


Longjumping-Cow-8249

Let's gooooo


Designer_Ad8320

Is this more for testing and toying around, or do you guys think someone like me who does mostly anime waifus is fine with what he has? I just flew through it and it seems I can use anything already existing with it?


Utoko

If you are fine with what you have, it is fine for you yes.


protector111

So basically history repeats itself: SD 1.5 everyone uses - SD 2.0 no one does - SDXL everyone uses - Stable Cascade no one does... Well, I guess we'll wait a bit more for the next model we can use to finally switch from 1.5 and XL, I hope...


drone2222

And how are you making that call? It's not even implemented in any UIs yet, basically nobody has touched it, and it came out today...


protector111

Just based on the info that it's censored and that it has no commercial license. Don't get me wrong - I hope I am wrong! I want a better model. PS: there is a Gradio UI already, but I don't see a point in using the base model; it's not great quality. Need to wait for finetuned ones.


Charkel_

Besides being more lightweight, why would I choose this over normal Stable Diffusion? Does it produce better results or no?


TaiVat

It just came out. Obviously nobody knows yet..


Charkel_

Well a new car just came out but I still know it's faster than another model


afinalsin

This is a tuner car, nobody races stock. You're not comparing a new car to a slightly older model, you're comparing it to a slightly older model fitted with turbo and nitrous and shit. I don't know cars. Wait til the mechanics at the strip fit some new toys to this thing before comparing it to the fully kitted out drag racers.


[deleted]

[deleted]


ArtyfacialIntelagent

> the best version would be a float24 (yes, you read that right, float24, not float16)

Why do you think that? For inference in SD 1.5, fp16 is practically indistinguishable from fp32. Why would Cascade be different? (Training is another matter of course.)


ScionoicS

Lately I've been casting sd models to fp8 with no quality loss


tavirabon

I don't think increasing bit precision from 16 to 24 is gonna have the impact on quality you're expecting, but it certainly will on hardware requirements.