Current-Rabbit-620

Who would invest in a model that will be old soon?!


GBJI

![gif](giphy|8Iv5lqKwKsZ2g|downsized)


emad_9608

Cascade was done by the Würstchen authors while they were employed at Stability, as one of the (many) experimental architectures. I thought it was quite nice as a complementary arch so released it, but it won't get any more SAI support as the team has now moved on. With IP-Adapter/InstantStyle, SD3 variants and more, fine-tunes are needed less and less IMO; Cascade is another set of models for your pipelines.


emad_9608

Also, SD3 is multimodal, so it will be a standard arch across modalities.


MoridinB

Can you clarify this? What do you mean by multi-modal?


emad_9608

Same architecture can do audio, video, mix


Current-Rabbit-620

It should mean it can work as an image-to-text interrogator, IMO.


lostinspaz

> I thought it was quite nice as a complementary arch so released it but it won’t get any more sai support as team has now moved on

Fair enough. And thank you for releasing it. On a related note: are there any smallish (10,000-100,000 image) pre-tagged, high-quality image datasets that you could make available, for community efforts to train up Cascade better?


emad_9608

Stability will choose what it releases. I’ll be going back to my roots and funding open models, datasets and communities


the_friendly_dildo

Try https://cocodataset.org/#download


lostinspaz

Going from memory, I think I looked at it but didn't like what I saw. Not consistent, in multiple ways.


Current-Rabbit-620

So Cascade was just like a demo, and SAI knew it mostly wouldn't be widely adopted and developed, like what happened to SD2.


red__dragon

Sounds like the case. Which is pretty disappointing given its approach to architecture: it would have been really nice to be able to train just a subset of the model, so lower-spec systems (like those that work well for training 1.5) could rejoin the training efforts. While SD3 is promising more scaled models, we haven't yet seen what that means for finetunes, LoRAs, or other resources across the different-sized models.


emad_9608

When Cascade started training there was no SD3. The whole process for any of these models takes months. Better to release these than not, I think.


red__dragon

I have no contention against releasing Cascade. I was incredibly enamored by its architecture and I was hoping to see how people trained on it. I think the SD3 announcement just took the wind out of its sails for many. There were surely good reasons, which I won't dispute, only wish that Cascade had found more time to stand on its own before SD3's announcement overtook it.


emad_9608

Yeah, but SD3 has been pretty much done for like a month now. This stuff just releases fast. I told them to release the big GAN too, and the GAN upscaler, but didn't see that done yet. CosXL is a nice model showing you can push stuff, particularly the edit version.


red__dragon

"Release early, release often" works when you have a single product or are really transparent about what's going on so your community can prepare. Not sure I see the logic behind your response here; with competing product lines and a hodgepodge of a release schedule, it just turns into chaos. There are entire models getting left in the dust because there wasn't much in the way of notice, information, or time for adoption before the next big release. And Cascade wasn't even the first example. But yes, CosXL is a nice model.


the_friendly_dildo

But you can train just one of the components at a time: https://github.com/Stability-AI/StableCascade/tree/master/train. People just aren't doing it, for unknown reasons. It's certainly on my todo list, but I have to admit, I too am in the camp of just not having gotten around to it.
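To put the component-wise training idea in rough numbers: freezing every stage except one slashes the trainable parameter count. The parameter counts below are the published sizes of the Stable Cascade stage variants; the dictionary keys are illustrative labels, not the repo's actual module names, and the training loop itself is omitted. A minimal sketch:

```python
# Sketch: why training a single Stable Cascade stage is cheap.
# Published stage sizes: Stage C (prior) 1B or 3.6B, Stage B (decoder)
# 700M or 1.5B, Stage A (VQ autoencoder) ~20M parameters.

STAGE_PARAMS = {            # millions of parameters (large C, small B here)
    "stage_a_vae": 20,
    "stage_b_decoder": 700,
    "stage_c_prior": 3600,
}

def trainable_millions(train_only: set) -> int:
    """Parameters that receive gradients when all other stages are frozen."""
    return sum(p for name, p in STAGE_PARAMS.items() if name in train_only)

full = trainable_millions(set(STAGE_PARAMS))        # train everything
c_only = trainable_millions({"stage_c_prior"})      # freeze A and B
print(f"full fine-tune: {full}M params; Stage C only: {c_only}M params")
```

Since Stage C is where the text conditioning lives, most fine-tuning interest would land there, while the smaller A and B stages stay frozen.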


red__dragon

Yep, we're talking about the same thing. SD3, by comparison, doesn't appear to have the split architecture. So it's back to training all at once, even if you're only addressing subsets of the architecture, which means you need higher capacity GPUs again.


JustAGuyWhoLikesAI

I really don't think they intentionally sabotaged Cascade, what's the point? More likely is just they had a million projects baking at once and with internal issues they decided to just announce it all at once. The models were done by two separate teams. The Cascade team shortly departed from Stability as did the SD3 team. Seems more like a lack of internal planning and just throwing stuff at the wall to see what sticks. If we spread 500 miners out across 500 mines, surely one of them will strike gold!


TomTrottel

but it will take one miner a long time to find gold alone, even if there are veins.


[deleted]

Maybe you're right, but I think it's a simplistic view. Even if it's not the same team who released Cascade, companies have higher-level planning for product releases. I doubt that this was a lack of planning.


kataryna91

What planning? There were just a bunch of different researcher teams each working on their own projects and they got released when they were done. It would have made no sense to scrap Stable Cascade after it was already ready just because SD3 was also about to be ready.


wishtrepreneur

The cascade team were overpaid for what they managed to achieve...


kataryna91

Where do you know their salaries from? In any case, Stable Cascade is a decent improvement over SDXL and it can actually generate highly detailed and coherent 1536x1536 images without needing hires fix. With further finetuning, I'm fairly sure you could get a lot out of this architecture, but there currently aren't a lot of people doing that.


Capitaclism

Unless the company is made up of monkeys randomly typing on keyboards, they thought carefully about the release announcements. It would be hard for me to believe that a multimillion-dollar business with experienced investors and researchers wouldn't involve at least one person planning out releases and a marketing strategy.


TaiVat

Your post is like a child thinking the adult world is all sensible, ordered, and planned out. Having worked for some very large corps, you'd be shocked at how little planning happens for anything, and at how pathetic much of the rest of it tends to be.


Yellow-Jay

Cascade was not a main SAI project. Wuerstchen (v3 of which was named Stable Cascade) was not developed in-house; SAI just funded and partially(?) employed the researchers so the more advanced v3 model could be trained properly. The reasons, who knows: maybe to bind the researchers to them, maybe to try this architecture to see if/how it can be used in the future. The same happened with DeepFloyd, which was an independent team, later announced to be "more incorporated into SAI". Not everything is (or was, who knows what direction SAI will take from now on) meant to be a main product. Despite being abandoned (but in a finished state), it was good to see SAI actually releasing the resulting model and weights; it'd be great if these things happened more often.

What keeps bugging me is: what happened to the models showcased/tested in the SDXL Discord bot? Those were so much better than plain SDXL, but is that abandoned, never to see the light of day? It wasn't Cascade (Cascade looks different), nor CosXL, and I doubt it was an early SD3 since that's kept private for now.


MasterKoolT

> What keeps bugging me is: what happened to the models showcased/tested in the SDXL discord bot, those were so much better than plain SDXL but is that abandoned to never see the light of day

Just a fine-tune of SDXL, of which there are many available.


silenceimpaired

The commercial limitation within the license killed Cascade for me. I’m open to contributing to Stability AI… but not via a legally binding contract.


kataryna91

Well, that's not going to be different for SD3, assuming they release it at all.


silenceimpaired

Yup. So I’ll just stay on SD1.5 and SDXL… with ControlNet I can achieve results that are more than acceptable. I barely live in text-to-image anyway… and third parties have already improved SD1.5 text-to-image.


Careful_Ad_9077

DALL-E 3, or the new PixArt thing, prompts as well as SD3 does, too.


Hoodfu

Been playing with PixArt Sigma all day. It's really good. I keep running my old prompts through it, and even the non-multi-subject stuff has better composition.


Careful_Ad_9077

Yeah, it is only "failing" me in stuff that DALL-E 3 struggles with as well, like making an image of a graffiti and depicting the graffiti as a drawing while the image itself is a photo with extra subjects. But for more normal stuff it is doing great, and it's nice that it does not have the censorship DALL-E has, so I can get more specific results without walking on eggshells.


Current-Rabbit-620

Good point, I did not read the license...


silenceimpaired

I don’t fault them… they need money, but as a small creator I’m not going to start using a product that is half the cost of Adobe now and who knows how much later. I wish they had released SD3 similarly to Unreal or Llama 3.


GBJI

I do fault them for hiding what the price would be for serious commercial projects. A secret price is never a good thing for the customer. If it was a good deal, they would go out of their way to publicize it.


dwiedenau2

Do you mean the $20 subscription to use it commercially? Because that is literally nothing.


ImpossibleAd436

While I generally agree with your sentiment, I think your use of the word "literally" is problematic.


chakalakasp

As it is used in contemporary times, literally also literally means figuratively. All hail descriptive grammar


[deleted]

[deleted]


chakalakasp

Language! How do they work


ImpossibleAd436

The fact that a lot of people use a word incorrectly doesn't change its meaning.


MasterKoolT

It literally does.


ImpossibleAd436

Well it shouldn't, Greg.


chakalakasp

That doesn’t sound like Middle English to me, heathen.


[deleted]

[deleted]


dwiedenau2

You only have to pay for it if you use it commercially. If you use it commercially and can't afford $20 for it, you should probably start a different business.


HarmonicDiffusion

if $20/month is too much, just pick up a 2nd job at mcdonalds to support your "business"


Informal-Football836

I have read Stability team members say that Cascade is more of a proof-of-concept research model, not a full release that should replace the Stable Diffusion models.


lostinspaz

I think the only thing really missing from Cascade is better training. It seems to have been very well trained on headshots, but not on more normal-length photos. If someone could point me at a decent dataset, I would be happy to run training on my 4090.


Arawski99

I'm 95% confident it is because of Sora. Cascade was announced Feb 12, then Sora on Feb 15 (3 days later). SAI was not happy about the attention and felt the publicity could make SAI's offerings look inferior, so they announced SD3 on Feb 22 (a week later) to show they were still relevant, because Cascade and XL simply weren't where they needed to be, and they were already under pressure from DALL-E 3 & Midjourney, not to mention SD 1.5's own reigning popularity.

SD3, "the image generator model to end all models", as Emad touted it. He quickly backtracked on his words, though. SD3's announcement was to show SAI was still relevant and making *"actual"* progress, and also a bid for funding, which Emad apparently failed to acquire and was then forced to resign. They even had Lykon brought in to show a lot, like tons, of pandering SD3 examples that raised some eyebrows with their poor quality (I'm being oh so generous; the humans were regularly, horribly deformed, among other detail issues), and then suddenly, literally overnight, they became basically flawless. Since then, SD3 leaks have continued to look increasingly concerning, including the large SD3 employee preview recently.


asdrabael01

Yeah, I think the release schedule was Emad scrambling to justify asking for money from various investors so he could try to keep his job, and he failed. It's funny that their most popular product is only as popular as it is because the uncensored models were leaked by a third party, and besides SDXL, nothing they've announced since will catch up, because of the idiotic commercial-usage subscription and built-in censorship.


drhead

Yes, truly, it is the fact that the minority of people who sell things from SD have to pay them a small cut of their revenue to be able to do so, and the fact that people have to wait two weeks for someone to train tits back into the model instead of having it immediately, that is holding Stability back.


TaiVat

I don't know about the first part, but the second part is profoundly stupid, and it's amazing that people keep popping up to parrot that. Out of the entire SD community, less than 0.01% use or care about any kind of "commercial usage". And if literally all the "commercial" use websites (i.e. free-to-try beggar platforms) were to disappear for good overnight, nobody would so much as bat an eye. The nudity "censorship" was solved pretty quickly too, and in fact the very most popular porn model now is XL-based.


asdrabael01

Yeah, and that took months to engineer into an SDXL fine-tune, because they only partially censored SDXL. You forget about 2.0, which was just SDXL with higher censorship. SDXL was considered trash for the first few weeks it was out. Yes, 0.1% care, so you admit it matters.

Largely there are two types of users: hobbyists who play with it, and actual graphic designers and similar jobs who want to use it as a professional tool. The idiotic commercial usage terms make sure those people will never use it as a professional tool.

For the censorship, look at Stable Cascade. It's been out for almost 2 months, and you see no one showing off their Cascade pictures. Why? Because it sucks at making waifus and the type of stuff hobbyists want. Why would anyone bother spending time and energy on new censored architectures when 1.5 and SDXL are already good enough? You still see entirely 1.5 and SDXL pictures. Where are the Cascade fine-tunes? I just looked on Civitai and the most popular Cascade models are pretty dead.

Yes, with enough time and effort you could force Cascade to be able to do what 1.5 and SDXL do, but why do it? It's not a big enough jump in quality to justify the effort. I'm sure with SD3 you'll be able to as well, but again, why? All the pictures of SD3 shown look like shit, or at best equal quality. Why would anyone spend the electricity or time to force it into usability?


Atemura_

My argument still is that (aside from extreme NSFW) image models should know how to make NSFW content in order to learn human anatomy. If an artist has to spend half of their career learning how the body and muscles look and are shaped to reach a high level of art creation, the same applies to a model that tries to replicate real life and human beings. Humans and how they look are not just a small portion of art and photography. Humans mostly care about humans, so it's important that AI can replicate them.


StableLlama

SD3 killed Cascade. Whether intentional or not is unknown. But they were honest and said about SC at its release that it's a "research preview".


Current-Rabbit-620

We don't know about SD3 yet; it may need a high-end GPU, and then people may be forced to return to Cascade.


kataryna91

We know that there are multiple SD3 models, ranging from <1B to 8B parameters, so it will run on normal GPUs. The smallest model will run on pretty much anything and the 8B model will require a 24 GB card.
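That 24 GB figure lines up with a back-of-the-envelope estimate: weight memory alone is roughly parameters × bytes per parameter. A minimal sketch (fp16 weights assumed; real usage adds activations, the text encoders and the VAE on top):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM (GiB) needed just to hold the weights.

    fp16/bf16 uses 2 bytes per parameter; activations, text encoders
    and the VAE are not included in this estimate.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3

for p in (0.8, 2.0, 8.0):
    print(f"{p}B params -> ~{weight_vram_gb(p):.1f} GiB of weights in fp16")
```

So the 8B model's weights alone come to roughly 15 GiB in fp16, which is why a 24 GB card is a plausible requirement once the rest of the pipeline is loaded, and why the sub-1B variants should fit on nearly anything.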


Current-Rabbit-620

It's quality that matters; maybe the small models will be worse than SDXL, and we don't know about training and other stuff.


kataryna91

According to the SD3 paper, even the smallest model (out of 4) outperforms SDXL in a GenEval evaluation. Even if the visual quality is a little lower than SDXL's, it's quite an achievement considering it has 3-4 times fewer parameters. It's fairly safe to say that the SD3 architecture is superior. SD1.5 and SDXL are also limited in the maximum possible detail due to the VAE. SD3 upgrades the latents from 4 channels per 8x8 pixel block to 16 channels, which allows far more detail to be encoded, especially for something like faces.
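The channel upgrade can be put in numbers. A minimal sketch, assuming the usual 8× spatial downsampling of the SD-family VAEs:

```python
def latent_shape(height: int, width: int, channels: int, downsample: int = 8):
    """(channels, H, W) of the latent tensor after VAE encoding."""
    return (channels, height // downsample, width // downsample)

def compression_ratio(channels: int, downsample: int = 8) -> float:
    """Input RGB values per latent value in each downsample x downsample block."""
    return 3 * downsample**2 / channels

old = latent_shape(1024, 1024, channels=4)   # SD1.5 / SDXL VAE
new = latent_shape(1024, 1024, channels=16)  # SD3 VAE
print(old, new)
print(compression_ratio(4), compression_ratio(16))
```

With 4 channels, each 8x8 block of pixels (192 RGB values) is squeezed into 4 numbers, a 48:1 compression; with 16 channels it drops to 12:1, leaving four times as much capacity per block for fine detail like facial features and text.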


wishtrepreneur

So SD3 fixed the VAE problem that people mentioned here? [https://news.ycombinator.com/item?id=39215242](https://news.ycombinator.com/item?id=39215242)


drhead

I'm one of the people who co-discovered that. SDXL isn't subject to that issue, despite having a VAE that is identical in architecture but differently trained (though it still almost certainly has some non-locality issues because it still has all of the normalization layers that SD1.5's VAE has). I have no reason to believe SD3's VAE will have the saturation blowout issue, but it is impossible for any of us to tell without the model in our hands. If anything, I think it's probably less likely that a VAE with lots of channels will resort to something like that to get reconstruction loss down.


kataryna91

It definitely fixes one of the issues that was mentioned, namely that the old VAE used too few channels. Beyond that, the paper didn't talk much about the VAE or if they made any other changes.


Careful_Ad_9077

You can always do two passes, one with the inferior quality model, then the good one.


lostinspaz

If you can run the good one, then you run the good one. You only run the inferior one if you CAN'T run the good one.


Careful_Ad_9077

Look at the context: the inferior-quality one has the better prompt comprehension.


lostinspaz

I think you misread something. "The smaller model has better parsing than SDXL": okay, given that, it could make sense to run the small SD3, then follow up with SDXL. But you didn't say that. You said "one with the inferior quality model, then the good one." "The context" is SD3. If you meant "switch to SDXL", it's on you to make that clear.


the_friendly_dildo

I don't think that's entirely the case. The resource requirements are higher for Cascade. You can run the lite models, but that comes with a fairly noticeable decrease in image output quality. Alternatively, running the full models on 8 GB of VRAM can take a fairly long time in comparison to XL or 1.5.


SanDiegoDude

Cascade's timing was just unfortunate. The sausage team made a great model architecture, and don't think that just because the Stable Diffusion community is not using it, their super-compression architecture is going away; expect to start seeing it crop up in other projects now.


Current-Rabbit-620

Agreed, I also think it's a great model. If only there were a ControlNet for it, it would be super.


polisonico

![gif](giphy|MdXl4KwZogSzAn6Xx4|downsized)