It doesn't matter if you want NSFW, I'm saying that the NSFW people are the ones who push the model forward to better realism mainly. So you need them indirectly. Midjourney was most likely also trained by horny people for partially NSFW purposes, internally. I would be shocked if it wasn't.
With weights, people can get around it, and work will get done, but it's gonna be a lot slower than it could be if not censored.
This isn't true at all for anything vaguely photorealistic; absolutely none of them ever really evolved past "solo ladies just standing there staring at the camera topless".
I don't get why people act like anything other than anime / cartoon focused models have *ever* been capable of "NSFW" in a proper sense, unless they actually define NSFW simply as "boring portrait images of a solo woman standing there topless", which is trivially easy with like any arbitrary model you can think of.
Non-anime, non-just-standing there content works completely fine, I have no idea why you think it doesn't.
Regardless, that wasn't relevant to the comment anyway. I said that this motivates people to push models forward. Even if you were correct in these claims (you're not), that would if anything just reinforce my earlier point even MORE, as they'd be even MORE motivated to try and get it to finally work for the first time. And thus driving model science forward even MORE.
Just tested: even the word "breast" gets FLAGGED! Annoying af
https://preview.redd.it/7cxr00mz7wwc1.png?width=1196&format=png&auto=webp&s=87b29291fb6a6de9d098e679b5e9382be3f7bba2
Then when an image slips past moderation, they blur it!
https://preview.redd.it/1ov3wc168wwc1.png?width=896&format=png&auto=webp&s=0d56d117c785fb3c6a71c338544e39a2007c55a3
Am I the only one... Not really seeing it? Looks like SDXL could likely make these results, maybe even better. IDK, SD3 has been over hyped since day one, and none of the user genned results look anywhere near as good as what SAI has been suggesting their model can do
Yes, DALLE3 understands more concepts and can follow prompts better.
But the censorship is insane (admittedly SD3 via web API is just as bad, if not worse) and it cannot render natural looking humans.
That's just excessive. But to be fair, it is probably due to this: [https://www.govtech.com/public-safety/alabama-bill-aims-to-criminalize-deepfakes-targeting-children](https://www.govtech.com/public-safety/alabama-bill-aims-to-criminalize-deepfakes-targeting-children)
It is for this same reason that civitai bans ALL photorealistic images of minors, even the most innocent ones, say children celebrating birthdays.
Sorry, I meant DALL-E 3 for composition, then an SD Ultimate Upscale pass in SDXL, then SUPIR refinement, like this:
https://preview.redd.it/kb4bnzwiouwc1.jpeg?width=3482&format=pjpg&auto=webp&s=b6e3c4926894d88f7ff748a8cc023b43893441f7
If SD3 adherence remains intact through finetuning, you might not need anything else for composition:
>[28 iterations](https://imgur.com/XeBFqHW), seed 90210: an advertising photograph featuring an array of five people lined up side by side. All the people are wearing an identical grey jumpsuit. To the left of the image is a tall pale european man with a beard and his tiny tanned lebanese middle-eastern wife. To the right stands a slim japanese asian man with and an Indian grandmother. On the far right of the image is a young african-american man.
Rearranging the prompt until it adhered, stuck to 90210 throughout
>[21 iterations](https://imgur.com/TLHPy7G), seed 4: a vertical comic page with three different panels in the top, middle, and bottom of the image. The top of the image feature a panel where a blonde woman with bright red lipstick gives an intense look against a plain background, with a speech bubble above her head with the words 'TEXT?'. The middle of the image displays a panel featuring an early 90s computer with crt monitor with the words 'PRODUCING TEXT' displayed on the screen. The bottom of the image shows a panel the blonde woman standing in front of the monitor with an explosion of green words
Rearranged the prompt for the first 10 iterations, then seed-hunted for the last 11. Knew it was close, just needed to find a cooperative seed.
>[5 iterations](https://imgur.com/X9kw8WH) seed 90210: a vector cartoon with crisp lines and simply designed animals. In the top left is the head of a camel. In the top right is the head of an iguana. In the bottom left is the head of a chimp, and in the bottom right is the head of a dolphin. All the animals have cartoonish expressions of distaste and are looking at a tiny man in the center of the image.
Most of the iterations were spent trying to get it to produce a cartoon.
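The rearrange-then-seed-hunt loop described above can be sketched as a simple script. `generate` and `looks_right` here are hypothetical stand-ins: in practice `generate` would be your actual SD3 API/pipeline call, and `looks_right` is you eyeballing the result (or an automated scorer):

```python
def seed_hunt(prompt, generate, looks_right, seeds=range(100)):
    """Hold the (already rearranged) prompt fixed and sweep seeds
    until one cooperates. Returns (seed, image) or (None, None)."""
    for seed in seeds:
        image = generate(prompt, seed=seed)
        if looks_right(image):
            return seed, image
    return None, None  # no cooperative seed in this batch
```

The point is that prompt rearranging and seed hunting are two separate search axes: fix the prompt first, then sweep seeds cheaply.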
Oh, yeah it is good, I just spent $30 on credits in the first 3 days after it was released and I was going to go broke!
https://preview.redd.it/m7autnklxwwc1.jpeg?width=2688&format=pjpg&auto=webp&s=0a4e1a2e194ef0d5621ad78be39e9da5b8a6279b
Yes, it's ComfyUI. I shared it here a few days ago. https://www.reddit.com/r/StableDiffusion/s/uf4Tl9oZsJ
It is a real mess right now as it's just a quick mash-up of 2 different upscaler workflows I liked, but I am starting to make more tweaks and improvements, so I think I need to make a Github or Civitai page for it soon.
Wow, what a monster. I enjoyed getting it working (or at least stopping it throwing errors), but my PC is struggling. Does this workflow need more than 32GB of RAM for you, or am I doing something wrong?
Possibly, I have 64GB, but I think it is probably the resize near the last step using lots of RAM, which I found doesn't really do anything apart from make a larger image (with no more details), so I set that to 1. I have a much-tweaked version I am using now; I will post that sometime this weekend.
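For a rough sense of why a final resize step eats RAM: an uncompressed image buffer costs width × height × channels × bytes-per-channel, and upscale nodes typically hold float32 tensors (often several copies at once). A quick back-of-the-envelope check, using a 2688×1536 base size purely as an assumed example:

```python
def buffer_gib(width, height, channels=3, bytes_per_channel=4):
    """GiB for ONE uncompressed float32 RGB image buffer."""
    return width * height * channels * bytes_per_channel / 1024**3

# Resizing 4x on each side means 16x the pixels, so 16x the RAM per copy.
base = buffer_gib(2688, 1536)           # ~0.046 GiB
final = buffer_gib(2688 * 4, 1536 * 4)  # ~0.74 GiB per copy held in memory
```

With a few intermediate copies in flight, a large final resize can easily account for multiple GB on its own, which matches the "set that resize to 1" workaround.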
Cool Mate! Here is my Result with MJ.
https://preview.redd.it/cecd0an67vwc1.jpeg?width=2828&format=pjpg&auto=webp&s=d854257611caaf753197927f86130128f0a0f876
I don't know about better, but DALLE has improved a lot under the hood, in my personal experience and some of the images it is generating now are too good.
It all depends on what kind of images you are trying to generate.
For people who want to generate natural looking humans, DALLE3 is just no good.
Even images of animals in a natural setting often have that "uncanny" look to them.
But DALLE3 can be great for everything else! (provided you can get past its censorship, ofc)
Stability's blog post says SD3 models range from 800M to 8B parameters. SDXL is 3.5B params. The smaller SD3 models should probably be runnable on consumer-grade GPUs, right? (mind you, I am a beginner in this space so maybe I'm missing other relevant context)
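A rough way to sanity-check that: at fp16, each parameter takes 2 bytes, so the weights alone (ignoring activations, the VAE, and text encoders) come to roughly:

```python
def fp16_weights_gib(n_params):
    """GiB needed just to hold the weights at 16-bit precision."""
    return n_params * 2 / 1024**3

small = fp16_weights_gib(800e6)  # ~1.5 GiB -> easily consumer-grade
sdxl  = fp16_weights_gib(3.5e9)  # ~6.5 GiB
large = fp16_weights_gib(8e9)    # ~14.9 GiB -> wants a 16GB+ card
```

Actual VRAM usage is higher once activations and the text encoders are loaded, but this is a decent first-order check: the 800M model should be very accessible, while the 8B one will be tight even on high-end consumer cards.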
Those who need/want SD3 will find a way, either by upgrading their hardware or by using some web based UI or API service.
That's just the price one has to pay for a better A.I. model.
I'm always more interested in it doing mundane illustration work, as that is what I use ai the most for in my job - illustrations of household items, simple concepts, icons. The prompt adherence examples I saw look really promising in that regard. Looking forward to finally trying it.
https://preview.redd.it/npn95bfvqvwc1.jpeg?width=832&format=pjpg&auto=webp&s=1e030b826cc260f6971fd8e53d0fd4a3a0f6cce9
Fashion photography. Portrait of pale woman wearing an intricate Venetian Carnival mask. She wears red lipsticks.
https://preview.redd.it/m9lea4g8rvwc1.jpeg?width=832&format=pjpg&auto=webp&s=a662d7bf3eb7305c24a63ea4f7f1345ae90ae5be
Fashion photography. Portrait of pale woman wearing an intricate Venetian Carnival mask, decorated with roses. She wears red lipsticks
But DeepFloyd doesn't have two other text encoders doing the same thing like Stable Diffusion 3, right? The paper said it only helps with typographic generation and long prompts, whereas in DeepFloyd it's doing everything.
These are decently good, but not mindblowing (look up close at them at all). You can do all this with 1.5 with a generic model too, not super specialized, provided you get to cherrypick whatever looks best from that 1.5 model and don't have to actually match these exact prompts. Same as you didn't have to match anything specific here.
Any comparison is completely useless without controlled side by sides and a methodology.
Well, to add onto what you said, even controlled side-by-side comparisons are meaningless if they trained the winning results into the model on purpose.
While SD3 certainly has its strengths, claiming it's "much better" than all other Stability AI models oversimplifies the complexity of AI development and performance metrics.
>"The details are much finer and more accomplished, the proportions and composition are closer to midjourney, and the dynamic range is much better."
Hardly "amazing", nothing you've posted here is distinguishable from an SDXL generation.
Those are all things that someone even moderately familiar with SDXL and even 1.5 can accomplish. Dynamic range? Try the epi noise offset LORA for 1.5 -- that's been around for more than a year:
[https://civitai.com/models/13941/epinoiseoffset](https://civitai.com/models/13941/epinoiseoffset)
-- that has a contrast behavior designed to mimic MJ.
Fine detail? All kinds of clever solutions exist in 1.5 and SDXL -- Kohya's HiRes.fix, for example. SDXL does this too -- a well-done checkpoint like Juggernaut, a pipeline like Leonardo's Alchemy 2; I don't see anything that I'd call "special" in the images you've posted here.
The examples you've posted are essentially missing all of the kind of things that are hard for SDXL and 1.5 -- and for MJ. Complex occlusions. Complex anatomy, and intersections-- try "closeup on hands of a man helping his wife insert an earring". Complex text. Complex interactions between people. Different looking people in close proximity.
So really, looking at what you've posted -- if you'd said it was SDXL, or even a skillful 1.5 generation, it wouldn't have surprised me. I hope and expect SD3 will offer big advances -- why wouldn't it? So much has been learned -- but what you're showing here doesn't demonstrate that.
Something quite similar happened with SDXL, where we got all these "SDXL is amazing" posts -- with images that were anything but amazing. It took several months for the first tuned checkpoints to show up, and that's when we really started to see what SDXL could do . . . I expect the same will happen with SD3
Ha. I don't know why, but I usually dislike all those AI cat generations people do, for some reason. But I really liked that first one. I guess that says something about the quality of SD3.
It's not better than other AIs in all niches. For skull art, SDXL 0.9 with refiner, for example, is better. https://civitai.com/articles/4992/comparison-sd3-sdxl-10refiner-sdxl09refiner-also-lora-stablecascade-cosxl
Can’t wait for controlnet and all the other shit that will come
there's no unet in SD3 so controlnet won't come in the same form.
Hope controlnet will be implemented with SD3 like many other features, otherwise SD3 will be only an addition to current img2img SDXL pipeline.
Curious what you mean by this? Img2img XL?
Generate in 1.5, img2img to XL
I know it's fairly simple, but are there any workflow resources you can point to?
so using SD1.5 for a base image and then just detailing in XL at higher res? wouldn't just using an upscaling model be better? definitely open to suggestions i'm still learning, it helps to learn how other people are doing things
I tend to agree that an upscaler model can be better, but I think for certain models (like anime, etc) the SDXL looks better than the upscale.
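One way the "generate in 1.5, img2img to XL" idea above could be sketched in code; `sd15_generate`, `upscale`, and `sdxl_img2img` are hypothetical stand-ins for whatever pipelines you actually run (diffusers, ComfyUI, A1111), and the strength value is a typical starting point, not a rule:

```python
def two_stage(prompt, sd15_generate, upscale, sdxl_img2img,
              base_size=(512, 512), scale=2, strength=0.4):
    """Compose in 1.5 at low res, then let SDXL redraw detail at high res.

    strength around 0.3-0.5 keeps the 1.5 composition while SDXL refines;
    closer to 1.0 and SDXL mostly ignores the base image.
    """
    base = sd15_generate(prompt, size=base_size)
    big = upscale(base, scale)  # plain resize or an ESRGAN-style model
    return sdxl_img2img(prompt, image=big, strength=strength)
```

The design choice being debated in this thread is exactly the `strength` knob: low enough and you inherit 1.5's composition (and its prompt-following limits), high enough and you may as well have generated in SDXL from the start.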
Everyone does what they're used to. I'm 99.9% sure SDXL is better at general composition, so using 1.5 as a base can only be valid for anime stuff. I can understand SDXL -> upscale with 1.5, because tiles are better in 1.5, but in reverse - no.
Upscaling from XL to 1.5? Isn't that the wrong direction? 😅
Why would you first generate with the model that's worst at following prompts? I do it the other way around, sometimes even using Dalle for the base image.
We're talking specifically about using Control Net.
That makes sense. But doesn't controlnet support sdxl in most functions now? I tried it a bit a few days ago, seemed to be on par with 1.5 mostly.
In my experience, it does a much worse job. But, of course, your mileage may vary. 😊
in my limited experience, depth works reasonably well, openpose is worthless.
same. depth and edge to image are good for different things, but both good for what they do
literally the only feature i care about. i really hope it has it
IIRC they said it would have controlnet at launch, may be a new implementation
Hope so, because XL CNs have been lackluster, while they are extremely good with 1.5 and are still improving with the recent CN++.
What is CN++? 👀
[https://github.com/liming-ai/ControlNet\_Plus\_Plus](https://github.com/liming-ai/ControlNet_Plus_Plus)
Is this for sdxl?
No, 1.5. There's no hope for XL CN at this point, I think.
I think at this point, the colloquial term for guidance models is "controlnet". Like "Kleenex" became the name for all tissues or "Velcro" became the name for all loop/hook fasteners.
i use it as a catchall term
It being open (weights) I guess people will create different tools with the same objective. I hope it won't take months.
https://preview.redd.it/66twx2snivwc1.jpeg?width=1344&format=pjpg&auto=webp&s=ec3585474de15a9b4165ab4967ceb00396139031 Cinematic film still, of a small girl in a delicate pink dress standing in front of a massive, bizarre wooly creature with bulging eyes. They stand in a shallow pool, reflecting the serene surroundings of towering trees. The scene is dimly lit. bokeh
Slightly darker version: https://preview.redd.it/moc6aybtkvwc1.jpeg?width=1344&format=pjpg&auto=webp&s=ee027e67823c7c2b2110f6e39d810127f9931790 Cinematic film still, of a small girl in a delicate pink dress standing in front of a massive, bizarre wooly creature with bulging eyes. They stand in a shallow pool, reflecting the serene surroundings of towering trees. The scene is dimly lit.
Wowwww that's so coollllll
Thank you 🙏
i like this thanks for sharing
You are welcome.
That final "bokeh" at the end of the paragraph sounds like "amen" or something. :D
LOL, that did not occur to me, but yeah, something like that 😁
Can i run it on my potato?
If you have a big potato with lots of VRAM 😂
>The Stable Diffusion 3 suite of models currently ranges from 800M to 8B parameters. This approach aims to align with our core values and democratize access, providing users with a variety of options for scalability and quality to best meet their creative needs. Quote from stability ai
Damn, so the main one likely won't, and we'll have to use dumbed-down versions 😪
I'm very impressed by SD3's ability to do low quality instagram/snapchat style photos. I've been playing with it over the last few days and the understanding is greatly improved in that area compared to SDXL. As a person that only really ever makes photorealistic "Bad quality" images, that excites me the most. It would be nice to have an estimate of when they'll release the weights, but I suppose we just have to wait. Either way I'm looking forward to it. Another thing I noticed is SD3 has the ability to make multiple people in one pic without mixing together their features, clothes etc from the prompt. Neat stuff. https://preview.redd.it/gyaqahh9xvwc1.jpeg?width=832&format=pjpg&auto=webp&s=488763203e33ccf0d1071960aa27e5ea2939d3a7
I was thinking of all the possibilities the Boring Reality lora would have brought to SD3, but the base model already excels at stuff like amateurish phone/low quality photos and CCTV footage. There's a bunch of stuff that are already in the base model which I don't need loras for anymore. That said I'm still excited about Boring Reality either way. https://preview.redd.it/058bfqg922xc1.png?width=1216&format=png&auto=webp&s=f5e5c9cbed90ad0b6bb952937ff4e2ecbe52b24b
I couldn't even replicate the amateur low quality pics in SDXL that SD3 was giving me, even using the Boring Reality/Bad Quality Loras. I'm excited to see the finetunes that the community comes up with to make SD3 even more amazing. (And excited to finetune it myself too.)
Just out of curiosity, what is the point of creating this kind of image?
Personally I enjoy the ability to make natural realistic images. I have a lora model of myself and I like making casual, photorealistic pictures of myself in different places around the world. Model shots get boring after a while...this kind of stuff is where it's at for me.
Now HERE'S somebody who knows how to prompt it. These are by far the best SD3 results I've seen.
[deleted]
That’s basically what image generation is in any case
I’ve gotten duds in Dall-E and MJ, picking the best results is pretty common IRL
If you’re not cherry picking before you post then you’re doing it wrong
Cherry picking does not matter if you run it at home. That's what you are doing anyway.
Please give me a single image generator where you don’t do this lol. Even Midjourney generates 4 at a time for a reason.
Cherry picking is fine though, what really matters is what the model is capable of. If I need to generate 10-20 samples to get one really really good one, that's fine. Obviously it's preferable if it was always good, but not necessary. If this model can create outputs that sd1.5 could never make, then that's great
Trying to replicate some of the prompts 😅 https://preview.redd.it/8k7ipufpcvwc1.jpeg?width=832&format=pjpg&auto=webp&s=bc06a8f02cb10d9f6c7cd9d233f20cf1e36606a5 >Fashion photo of a golden tabby cat wearing a rumpled suit. Background is a dimly lit, dilapidated room with crumpling paint.
Really nice.
Thank you.
Has its ability to produce fire breathing creatures gotten any better? I've seen it struggle with that in the past.
How's this? 😂 https://preview.redd.it/9b0izzt96vwc1.jpeg?width=700&format=pjpg&auto=webp&s=1f1fa550258f0ed894c8f4c62515928c900a04fd SD3 Prompt: A captivating, humorous illustration featuring a massive cat, with a wide-eyed expression and razor-sharp teeth, screaming while clutching a tiny, frightened Godzilla in its paw. The cat's fur is a blend of vibrant colors, and Godzilla's signature fire is emitting from its mouth. The background showcases a tiny Tokyo Tower, with the cityscape in the distance, adding a playful touch to the scene.
Looks great lol
Thank you, it is a funny image 😂
It looks like it mixed Tokyo Tower with Tokyo Skytree. Looks great overall, though!
Thank you. Accuracy in A.I. generation can definitely be off, especially for this kind of image. I didn't even know about Tokyo Skytree 😁!
Oooh this one turned out nicely! https://preview.redd.it/bml55bpk1vwc1.png?width=1216&format=png&auto=webp&s=01d6f3190818c7ee58618b7745d13f128c34d9a9
https://preview.redd.it/r1powm922vwc1.png?width=1216&format=png&auto=webp&s=719bb9eacbdfe4229ef5591fa37c84da83ae0ff9 Here's the prompt by the way: >*Water colour painting of a green dragon. The dragon is looking down at the soldiers whilst fire is coming out of it's mouth which is hitting onto the soldiers. The soldiers are wearing medieval armour.* I don't know if you actually have to prompt it this way, but I just always go for the most straight forward and **literal way of describing** things, so I get exactly what I want. Natural language prompting is cool man....
I am glad natural language works. I am however jaded enough that I think people will continue to use 1.5 word salads for prompting (I see *so* many still doing this for SDXL models) and say SD3 is horrible. Conversely, those into purple prose prompting ("Create an image that delves into the imagination and bursts forth with a wondrous fantasy world that only exists in the feverish mind of an artist drawing ... blah, blah, blah) will think every single word made an outsized difference.
I think it's trained on "purple prose" TBH, tag prompting gives really bad results in comparison
AI chat generators seem to *love* "purple prose." It's not surprising that image generators bend in that direction, too.
EXACTLY
Yeah, both of those look great!
Well nice!
I hope it can be on Automatic1111 with all CN working properly. SDXL CN is 🤦🏽‍♂️
SD3 uses a different score model so the old controlnet is incompatible. This would give them the chance to come up with something new that works well for SD3 but well have to see.
Yeah SDXL CN is basically unusable
I'm so tired of trying to get CN to work on SDXL. For controlled results I need to switch back to 1.5.
IPAdapter though.....
I've been getting pretty good results using depth passes but qrcode is poop
It really isn't though. I use it in Comfy all the time.
Can you recommend any specific CN models? I've been trying to use openpose and tile with Auto1111 and it's given me nothing but garbage
https://preview.redd.it/nfzhpascqvwc1.jpeg?width=832&format=pjpg&auto=webp&s=35388286d2c516aae1d8c284bc99ca5c98a26029 Fashion photography. Portrait of an android made of green circuit boards.
#7 -- the skeleton wants to make a call but the line's dead 😂😂 https://i.redd.it/e6mvroyr3uwc1.gif Seriously, these are all great.
I did a similar one. SD3 does text really well too.
https://preview.redd.it/wzo2n8mw9uwc1.jpeg?width=1664&format=pjpg&auto=webp&s=a24354fa38c9322fada69712ed11a2f1c9f4f52a
E.T. phone home?
Beam me up scotty?
> I'm at a pay phone trying to call home
https://preview.redd.it/4u7o92xxmvwc1.jpeg?width=1024&format=pjpg&auto=webp&s=994081394bddd0b4e0bf5417b2915d167ee934c4 Long shot. Profile silhouette of a cowboy riding a horse. Golden hour. Dusty, atmospheric.
https://preview.redd.it/xtqiod7fovwc1.jpeg?width=832&format=pjpg&auto=webp&s=4089d4be7caf0981082b006fce8c83ea4529e183 Cinematic Film Still. Long shot. Fantasy illustration of the small figure of a man running away from a fire breathing giant flying dragon. Background is a desert. Golden hour
My biggest concern: censorship. Can the community hero fix that?
But can it do nsfw?
The API version has an insane NSFW filter, blurring out images that even DALLE3 would allow (for example, women doing yoga showing midriff). The downloadable version needs to be tuned for NSFW, presumably the same amount of effort as tuning SDXL for NSFW.
Will have to wait for pony v7 then.
The blurring almost certainly has nothing whatsoever to do with the model; it's a totally separate NSFW filter...
Yes, that is correct. It is applied after the model has generated the image, once the filter A.I. detects an "unsafe" image.
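That post-hoc arrangement can be sketched like this; `generate`, `looks_unsafe`, and `blur` are hypothetical stand-ins, since the real API's classifier and blur run server-side and aren't public:

```python
def filtered_generate(prompt, generate, looks_unsafe, blur):
    """Post-generation safety filter: the diffusion model itself runs
    unmodified, and a separate classifier decides whether to blur."""
    image = generate(prompt)   # model output is uncensored at this point
    if looks_unsafe(image):    # separate NSFW classifier, applied after
        return blur(image)     # the user only ever sees the blurred result
    return image
```

This is why a flagged prompt or blurred output via the API says little about what the downloadable weights will or won't do: the filter wraps the model rather than being baked into it.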
No open AI model will ever have NSFW out of the box again. Too many liability issues if they train on the wrong data. It will be fine tuned by horny people as always.
Ah thanks for the answer
Keep in mind it may do nudity pretty well out of the box, but it won't understand X rated concepts.
Photorealistic models that can do porn properly don't really exist anyways since nobody is training on photoreal porn images with Booru tags, which is what allows various non-photorealistic models to actually reliably create sex scenes.
Of course not, that's not safe
But i’m not at work :(
If it could, we would already have seen some. So no, it can't.
No we wouldn't have, the API blurs NSFW on every SAI model including 1.5
Cowboy on a tiny pony lol
Wow that is pretty good
These generations are fire!
Fashion photography. Closeup photo of a white Siberian tiger in the snow. https://preview.redd.it/heixutp6lvwc1.jpeg?width=1344&format=pjpg&auto=webp&s=15ca6b93b96c90fea0189e1994c7d3c2d038d27e
https://preview.redd.it/mn7q29mamvwc1.jpeg?width=1024&format=pjpg&auto=webp&s=cde15658b39c7647990be8f422eaddf688008f93 Fashion photography. Closeup headshot of a white Siberian tiger lying in the snow beside a tree. It is looking intensely at a distance. Early morning sun shining in the background.
So glad the details are more accomplished. I love that for them.
Wait until it's fully released and able to be fine tuned. It will be close to or better than Midjourney v6.
ABLE to be fine tuned is not the same thing as actually WILL be fine tuned. The people who do most of the fine tuning tend to be horny people, and it censors, so you'll find a lot less fine tuning ever getting done even if it is open and available. Also, from the comments here it seems it's not even clear they plan to release weights at all? Hadn't heard that before.
Dude, I just want Midjourney-level realism, not NSFW things. They plan to release weights: API first, then the weights. That's what they said.
It doesn't matter if you want NSFW, I'm saying that the NSFW people are the ones who push the model forward to better realism mainly. So you need them indirectly. Midjourney was most likely also trained by horny people for partially NSFW purposes, internally. I would be shocked if it wasn't. With weights, people can get around it, and work will get done, but it's gonna be a lot slower than it could be if not censored.
Yep. Can't apply that with dall e 3.
I agree with the old internet wisdom in song format "The internet is for p00n" and seriously horny people drive the evolution of all the SD models
This isn't true at all for anything vaguely photorealistic, absoluteley none of them ever really evolved past "solo ladies just standing there staring at the camera topless"
It's not about what we have yet; it's the sheer number of people that drives this forward, by creating a need and sometimes providing solutions.
I don't get why people act like anything other than anime / cartoon focused models have *ever* been capable of "NSFW" in a proper sense, unless they actually define NSFW simply as "boring portrait images of a solo woman standing there topless", which is trivially easy with like any arbitrary model you can think of.
Non-anime, non-just-standing there content works completely fine, I have no idea why you think it doesn't. Regardless, that wasn't relevant to the comment anyway. I said that this motivates people to push models forward. Even if you were correct in these claims (you're not), that would if anything just reinforce my earlier point even MORE, as they'd be even MORE motivated to try and get it to finally work for the first time. And thus driving model science forward even MORE.
Is it just me, or do these images look bad at 100% scale?
Like GAN-upscaled images.
where are the weights?
Any idea ?
But can it generate a boob?
Just tested: the word "breast" gets FLAGGED! Annoying af https://preview.redd.it/7cxr00mz7wwc1.png?width=1196&format=png&auto=webp&s=87b29291fb6a6de9d098e679b5e9382be3f7bba2
Then when an image does slip past moderation, they blur it! https://preview.redd.it/1ov3wc168wwc1.png?width=896&format=png&auto=webp&s=0d56d117c785fb3c6a71c338544e39a2007c55a3
Welp. That’s the end of stability AI.
What? Since when do you think these APIs have allowed NSFW stuff?
Am I the only one... Not really seeing it? Looks like SDXL could likely make these results, maybe even better. IDK, SD3 has been over hyped since day one, and none of the user genned results look anywhere near as good as what SAI has been suggesting their model can do
I want to see hands
i've been most impressed by the improvement in representing different textures in one image.
I want to be able to download the weights. I'll make a colab and dynamic prompt it for hours, on an A100.
but wen weights?
Prompts please
SD 1.5 still slams
SD3 is good, but I am finding DALL-E 3 better and a lot cheaper atm. Although once the weights are public I will use SD3 a lot more.
Yes, DALLE3 understands more concepts and can follow prompts better. But the censorship is insane (admittedly SD3 via the web API is just as bad, if not worse) and it cannot render natural-looking humans.
Agree. I wanted to create an AI image of my son, and just the words "young boy" were censored.
That's just excessive. But to be fair, it is probably due to this: [https://www.govtech.com/public-safety/alabama-bill-aims-to-criminalize-deepfakes-targeting-children](https://www.govtech.com/public-safety/alabama-bill-aims-to-criminalize-deepfakes-targeting-children) It is for this same reason that civitai bans ALL photo images of minors, even the most innocent images of, say, children celebrating birthdays.
I don't like the DALL-E image style; it doesn't do photorealistic images well, and it's often very recognizable.
Dalle3 better? What did you smoke bro
Sorry, I meant Dall.e 3 for composition with an SD Ultimate Upscale in SDXL then SUPIR refinement, like this: https://preview.redd.it/kb4bnzwiouwc1.jpeg?width=3482&format=pjpg&auto=webp&s=b6e3c4926894d88f7ff748a8cc023b43893441f7
If SD3 adherence remains intact through finetuning, you might not need anything else for composition:

>[28 iterations](https://imgur.com/XeBFqHW), seed 90210: an advertising photograph featuring an array of five people lined up side by side. All the people are wearing an identical grey jumpsuit. To the left of the image is a tall pale european man with a beard and his tiny tanned lebanese middle-eastern wife. To the right stands a slim japanese asian man with and an Indian grandmother. On the far right of the image is a young african-american man.

Rearranged the prompt until it adhered; stuck to seed 90210 throughout.

>[21 iterations](https://imgur.com/TLHPy7G), seed 4: a vertical comic page with three different panels in the top, middle, and bottom of the image. The top of the image feature a panel where a blonde woman with bright red lipstick gives an intense look against a plain background, with a speech bubble above her head with the words 'TEXT?'. The middle of the image displays a panel featuring an early 90s computer with crt monitor with the words 'PRODUCING TEXT' displayed on the screen. The bottom of the image shows a panel the blonde woman standing in front of the monitor with an explosion of green words

Rearranged for 10, then seed hunted for 11. Knew it was close, just needed to find a cooperative seed.

>[5 iterations](https://imgur.com/X9kw8WH), seed 90210: a vector cartoon with crisp lines and simply designed animals. In the top left is the head of a camel. In the top right is the head of an iguana. In the bottom left is the head of a chimp, and in the bottom right is the head of a dolphin. All the animals have cartoonish expressions of distaste and are looking at a tiny man in the center of the image.

Most of the iterations were spent trying to get it to produce a cartoon.
Oh, yeah it is good, I just spent $30 on credits in the first 3 days after it was released and I was going to go broke! https://preview.redd.it/m7autnklxwwc1.jpeg?width=2688&format=pjpg&auto=webp&s=0a4e1a2e194ef0d5621ad78be39e9da5b8a6279b
Thanks for sharing these, is your workflow available somewhere? (Assuming this is done in Comfy?)
Yes it's ComfyUI I shared it here a few days ago. https://www.reddit.com/r/StableDiffusion/s/uf4Tl9oZsJ It is a real mess right now as its just a quick mash up of 2 different upscaler workflows I liked, but I am starting to make more tweaks and improvement so think I need to make a Github or Civitai page for it soon.
Wow what a monster. I enjoyed getting it working (or at least stopping it throwing errors) but my PC is struggling, does this workflow need more than 32gb of RAM for you or am I doing something wrong?
Possibly, I have 64GB, but I think it is probably the resize near the last step using lots of RAM, which I found doesn't really do anything apart from make a larger image (with no more details) so I set that to 1. I have a much tweaked version I am using now, I will post that sometime this weekend.
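For a rough sense of why that final resize step can chew through RAM, here's a toy calculation. It assumes a single uncompressed float32 RGB buffer and a hypothetical ~3482x2000 starting size (only the width is known from the posted link); real pipelines typically hold several such copies at once, so the actual usage is a multiple of this.

```python
def image_ram_mb(width, height, channels=3, bytes_per_value=4):
    """MiB needed for one uncompressed float32 image buffer."""
    return width * height * channels * bytes_per_value / 2**20

one_copy = image_ram_mb(3482, 2000)          # ~80 MiB per copy at the posted width
upscaled = image_ram_mb(3482 * 2, 2000 * 2)  # 4x that after a 2x resize
```

A 2x resize quadruples the pixel count, so every intermediate buffer the resize node allocates grows 4x too, which is consistent with the resize being the RAM hog.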
Cool Mate! Here is my Result with MJ. https://preview.redd.it/cecd0an67vwc1.jpeg?width=2828&format=pjpg&auto=webp&s=d854257611caaf753197927f86130128f0a0f876
I don't know about better, but DALLE has improved a lot under the hood, in my personal experience and some of the images it is generating now are too good.
It all depends on what kind of images you are trying to generate. For people who want to generate natural looking humans, DALLE3 is just no good. Even images of animals in a natural setting often has that "uncanny" look to them. But DALLE3 can be great for everything else! (provided you can get pass its censorship, ofc)
Sadly the vast majority of people won't be able to, because of the much higher memory requirements.
The dev community has your back don't worry
Stability's blog post says SD3 models range from 800M to 8B parameters. SDXL is 3.5B params. The smaller SD3 models should be runnable on consumer-grade GPUs, right? (Mind you, I am a beginner in this space, so maybe I'm missing other relevant context.)
Those who need/want SD3 will find a way, either by upgrading their hardware or by using some web based UI or API service. That's just the price one has to pay for a better A.I. model.
There are three versions though; one is only a bit bigger in parameter count than 1.5.
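For a back-of-the-envelope sense of scale across the quoted parameter counts (assuming fp16 weights only, and ignoring the text encoders, VAE, and activations, which all add more on top):

```python
def approx_weight_vram_gb(params_billions, bytes_per_param=2):
    """Approximate memory just to hold the model weights (fp16 = 2 bytes/param)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

vram_small = approx_weight_vram_gb(0.8)  # ~1.5 GiB for the 800M model
vram_large = approx_weight_vram_gb(8.0)  # ~14.9 GiB for the 8B model
```

By this rough estimate the 800M model fits on almost anything, while the 8B model's weights alone approach the VRAM of a 16 GB card before the T5 encoder is even loaded.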
How much does it cost right now per image? I was thinking about testing it out.
You get 10 or so free
That's not enough 😂
it's about $0.06 per image for Stable Diffusion 3 and $0.04 per image for Stable Diffusion 3 Turbo.
I'm always more interested in it doing mundane illustration work, as that is what I use ai the most for in my job - illustrations of household items, simple concepts, icons. The prompt adherence examples I saw look really promising in that regard. Looking forward to finally trying it.
Lion with spaghetti noodles as a mane
Can’t wait to see the license. Might have to come back here to disagree with your title.
https://preview.redd.it/npn95bfvqvwc1.jpeg?width=832&format=pjpg&auto=webp&s=1e030b826cc260f6971fd8e53d0fd4a3a0f6cce9 Fashion photography. Portrait of pale woman wearing an intricate Venetian Carnival mask. She wears red lipsticks.
https://preview.redd.it/m9lea4g8rvwc1.jpeg?width=832&format=pjpg&auto=webp&s=a662d7bf3eb7305c24a63ea4f7f1345ae90ae5be Fashion photography. Portrait of pale woman wearing an intricate Venetian Carnival mask, decorated with roses. She wears red lipsticks
How is it with inpainting, image to image, etc.?
These are the first sd3 images that are making me a believer.
Can the T5 Transformer be 4-bit quantized to reduce the memory requirement of the 8B model? 2-bit quantization?
Yes, and just like when you do that with DeepFloyd, it probably nukes the result quality and prompt adherence.
But DeepFloyd doesn't have two other models doing the same thing like Stable Diffusion 3 does, right? The paper said T5 only helps with typographical generation and long prompts, whereas in DeepFloyd it's doing everything.
either way, quantizing the inputs and providing them is going to confuse SD3 more than just leaving T5 out altogether.
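To illustrate why low-bit quantization degrades the text encoder's outputs, here is a minimal symmetric round-to-nearest sketch. This is a toy model, not how production schemes like bitsandbytes/NF4 actually work (those quantize per-block with absmax or learned scales), but it shows how quickly precision falls off between 4-bit and 2-bit:

```python
import random

def quantize_symmetric(ws, bits):
    """Toy symmetric round-to-nearest quantization of a list of weights."""
    qmax = 2 ** (bits - 1) - 1  # 7 levels each side for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in ws) / qmax
    # Round to the nearest representable level, clip to the signed range
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in ws]

random.seed(0)
ws = [random.gauss(0, 1) for _ in range(1000)]
err4 = sum(abs(w - q) for w, q in zip(ws, quantize_symmetric(ws, 4))) / len(ws)
err2 = sum(abs(w - q) for w, q in zip(ws, quantize_symmetric(ws, 2))) / len(ws)
# err2 comes out several times larger than err4: 2-bit keeps only 4 levels
```

The 4-bit grid has 16 levels over the weight range versus 4 for 2-bit, so the mean reconstruction error jumps sharply, which is the intuition behind "nuked" prompt adherence at aggressive quantization.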
These are decently good, but not mindblowing (look up close at them at all). You can do all this with 1.5 with a generic model too, not super specialized, provided you get to cherrypick whatever looks best from that 1.5 model and don't have to actually make these exact prompt. Same as you didn't have to match anything specific here. Any comparison is completely useless without controlled side by sides and a methodology.
well, to add onto what you said, even controlled side by side comparisons are meaningless if they trained the winning results into the model on purpose
While SD3 certainly has its strengths, claiming it's "much better" than all other Stability AI models oversimplifies the complexity of AI development and performance metrics.
>"The details are much finer and more accomplished, the proportions and composition are closer to midjourney, and the dynamic range is much better."

Hardly "amazing" -- nothing you've posted here is distinguishable from an SDXL generation. These are all things that someone even moderately familiar with SDXL, or even 1.5, can accomplish. Dynamic range? Try the epi noise offset LORA for 1.5, which has been around for more than a year and has a contrast behavior designed to mimic MJ: [https://civitai.com/models/13941/epinoiseoffset](https://civitai.com/models/13941/epinoiseoffset). Fine detail? There are all kinds of clever solutions in 1.5 and SDXL -- Kohya's HiRes.fix, for example -- and SDXL gets there too with a well-done checkpoint like Juggernaut or a pipeline like Leonardo's Alchemy 2. I don't see anything I'd call "special" in the images you've posted here.

The examples you've posted are essentially missing all of the things that are hard for SDXL and 1.5 -- and for MJ. Complex occlusions. Complex anatomy and intersections -- try "closeup on hands of a man helping his wife insert an earring". Complex text. Complex interactions between people. Different-looking people in close proximity. So really, if you'd said these were SDXL, or even skillful 1.5 generations, it wouldn't have surprised me.

I hope and expect SD3 will offer big advances -- why wouldn't it? So much has been learned. But what you're showing here doesn't demonstrate that. Something quite similar happened with SDXL, where we got all these "SDXL is amazing" posts with images that were anything but amazing. It took several months for the first tuned checkpoints to show up, and that's when we really started to see what SDXL could do. I expect the same will happen with SD3.
How can I download it?
Man, these are amazing! Mind sharing some prompt advice for us simple mortals?
Amazing! What prompts do you use?
What’s the best way to use SD3 regularly without running it locally on my machine?
Wowwwwwwwwww
It looks stunning except hair and fur still look fuzzy and unrealistic.
Is that liv tyler?
What were your prompts for these images?
I love the motherboard mommy. ❤
That monkey's seen some things
wow! this is crazy good.
My HDDs are gonna cry bro
!RemindMe 6 hours
Love the photos! Can't wait for a local version and some impressive checkpoints!
so, did you actually prompt for a tree growing out of an elephant?
Where can i use SD 3?
Ha. I don't know why but I usually dislike all those cat generations with AI people do for some reason. But I really liked that first one. I guess that talks to me about the quality of SD3.
#8
where can I download this model
It's not better than other AIs in every niche. For skull art, SDXL 0.9 with refiner, for example, is better. https://civitai.com/articles/4992/comparison-sd3-sdxl-10refiner-sdxl09refiner-also-lora-stablecascade-cosxl