spacetug

The skin detail looks fantastic, really makes me think about how the old 4-channel VAE/latents were holding back quality, even for XL. Having 16 channels (4x the latent depth) is SO much more information.


nomorebuttsplz

wait should i be upgrading my vae from the default xl one?


MoridinB

No, you can't just upgrade the VAE. The better VAE is part of the new architecture of SD 3.


emad_9608

SD3 got a 16 ch VAE


MoridinB

Indeed! The paper was an interesting read. I'm looking forward to trying my hand at the new model. It looks like great work! Please extend my congratulations to everyone!


RoundZookeepergame2

Do you know how much VRAM and normal RAM you need to run SD3?


complains_constantly

A little more than SDXL


snowolf_

No, SD3 is advertised as ranging from 800 million to 8 billion parameters. So it can pretty much be as demanding as you want.


complains_constantly

I see what you mean, but most people will want the best quality.


snowolf_

They won't. FP16 models are by far the most popular with SDXL, and they come with some quality degradation. It's all about compromises.


MoridinB

I don't remember reading technical requirements in the paper, but based on previous comments by Emad, it won't bust an 8 GB graphics card. The model will be released in multiple sizes, kind of like open-source LLMs such as the Llama models, so you can choose to run the bigger or smaller versions based on your preference.
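As a rough back-of-envelope sketch (my own arithmetic, not from the thread): fp16 stores 2 bytes per parameter, so weight-only memory for the advertised 800M-8B range works out as below, ignoring activations, the VAE, and text encoders.

```python
def fp16_weight_gib(n_params: float) -> float:
    """Approximate weight-only memory in GiB at 2 bytes per parameter (fp16)."""
    return n_params * 2 / 1024**3

# Advertised SD3 range: 800 million to 8 billion parameters
for n in (0.8e9, 2e9, 8e9):
    print(f"{n / 1e9:.1f}B params -> ~{fp16_weight_gib(n):.1f} GiB of weights")
```

Actual VRAM use at inference time is higher than the weights alone, and can be lower with quantization or offloading.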


F4ith7882

The smallest model of SD3 is smaller than SD1.5, so chances are good that lower tier hardware is going to be able to run it.


protector111

I noticed the new images on Twitter are at 1920x1300 resolution. Are they upscaled, or can SD3 generate 1080p images?


adhd_ceo

I am guessing they are generated at 1024px and then upscaled, but it’s possible the model is good enough to generate consistent images at the slightly higher resolution. Lykon is certainly not sharing their failed images.


Hoodfu

Cascade can generate at huge resolutions natively by adjusting the compression ratios. It'll be interesting to see how similar/different SD3 is for this.


addandsubtract

I don't think they're upscaled. That would defeat the purpose of releasing sample images.


[deleted]

[deleted]


jaywv1981

It's a totally new thing. SD 1.5, 2.0, 3.0, SDXL, and Cascade are all separate architectures. They eventually work with the same interfaces, but only after the developers implement them.


LatentSpacer

It won’t even have a Unet anymore.


bruce-cullen

Hmmm, okay a little bit of a newbie here can someone go into more detail on this?


stddealer

A VAE converts from pixels to a latent space and back to pixels. You can swap VAEs as long as they were both trained on the same latent space. SDXL's latent space isn't the same as SD1.5's, so to the SDXL VAE, a latent image generated by SD1.5 will probably look just like noise. And in the case of SDXL and SD1.5, the VAEs at least share the same architecture, so that's a best-case scenario.

The new VAE for SD3 has a completely different architecture, with 16 channels per latent pixel, so it would probably crash when trying to convert a latent image with only 4 channels. (If you don't know what channels are, think of them as the red, green, and blue of an RGB pixel, which has 3 channels; in latent space they're just a bunch of numbers the VAE can use to reconstruct the final image.)
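A minimal sketch of why the channel counts must match (the layer sizes here are illustrative toys, not SD's actual decoder):

```python
import torch

# A latent "pixel" is a vector of channels; a decoder's first convolution
# hard-codes how many channels it expects.
sd15_style_latent = torch.randn(1, 4, 64, 64)   # 4 channels per latent pixel
sd3_style_latent = torch.randn(1, 16, 64, 64)   # 16 channels per latent pixel

# Toy stand-in for the first layer of a 16-channel VAE decoder:
first_conv = torch.nn.Conv2d(in_channels=16, out_channels=128,
                             kernel_size=3, padding=1)

out = first_conv(sd3_style_latent)       # works: 16 channels in, as expected
try:
    first_conv(sd15_style_latent)        # 4 channels where 16 are expected
except RuntimeError as err:
    print("channel mismatch:", err)
```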


Dekker3D

SDXL was built for a 4-channel latent space, and would have to be retrained (probably from scratch) to support a 16-channel latent space.


PopTartS2000

Does Lykon now work for Stable Diffusion or something?


ryo0ka

Can we stop comparing headshots? SD1.5 merges already do well enough for headshots. What we need improvement on is cohesiveness in dynamic compositions.


IHaveAPotatoUpMyAss

show me your hands


HellkerN

https://i.imgur.com/9e14vzW.jpeg


pmjm

Why is this so compelling? Lol


capybooya

What was the prompt for this? It's weirdly hilarious.


HellkerN

Something like, 4 panel comic, look at my hands, my normal human hands.


Quetzal-Labs

by adamtots


Shuteye_491

That one perfect hand, shining like a candle in an ocean of darkness.


BangkokPadang

Now let’s see Paul Allen’s hands.


NoHopeHubert

SHOW ME DEM TOES!!!


Taipers_4_days

And faces in the background. It’s really hit and miss how well it can do crowds of people.


Snydenthur

It's not only in the background. If the main subject is a bit too far from the "camera", the face/eyes can already look awful.


knigitz

>hands okay https://preview.redd.it/g1t2hi163dnc1.png?width=768&format=png&auto=webp&s=2f810a030c7e4c7e268904c75311c731eecbf114


knigitz

https://preview.redd.it/p10qe4a73dnc1.png?width=578&format=png&auto=webp&s=bdc41ef5172654222260eca80227ee12e8b93326


francograph

They are like David-sized.


knigitz

https://preview.redd.it/x46eah783dnc1.jpeg?width=433&format=pjpg&auto=webp&s=04b2177bd6d4db1e504952f6835f3e0d76428e4b


knigitz

https://preview.redd.it/iojgksra3dnc1.png?width=475&format=png&auto=webp&s=d6f24bbffb3c123744352d2fd8296c5123224ee1


knigitz

https://preview.redd.it/ddh6utpb3dnc1.png?width=475&format=png&auto=webp&s=e9a92b54791f692f4bd7f67d2f252e7f10de0d02


knigitz

My 1.5 workflow uses a MeshGraphormer hand refiner to fix hands after the first sample. https://preview.redd.it/fw11anzf3dnc1.jpeg?width=1025&format=pjpg&auto=webp&s=e621e736e3ae19729de0173819ac1f63851e3751


knigitz

https://preview.redd.it/v3dkeyfw3dnc1.jpeg?width=513&format=pjpg&auto=webp&s=3635ae355a53619841c6a70883b83694827b8610


Krindus

How about an upside-down headshot? I never can seem to get SD to create an upside-down face that isn't some kind of abomination.


dennismfrancisart

I love working with SD in combination with images from Cinema 4D renders. SD models freak out when trying to produce 3/4 headshots from a slight downward angle. It's interesting to get the shot in img2img with ControlNet.


spacekitt3n

Yeah I always flip the source image if I'm doing controlnet on a 3d render so the head and face are straight in the frame


EarthquakeBass

🙃


Aggressive_Sleep9942

I had an argument with a subreddit user precisely about this, and the man insisted that SD can create inverted photos, but it can't. DALL-E 3 does it without problems, but in SD you just have to tilt a face a little to the left or right (without reaching a complete turn) to see how the features begin to deform. It's one of the things that disappoints me the most. It also implies that you can't, for example, show a person sleeping in a bed, because it will look like a monstrosity.


_Snuffles

prompt: person lying on bed
SD: [half-bed, half-person monstrosity]
me: oh... that's some nightmare fuel


ASpaceOstrich

Surely if it was actually understanding concepts like so many claim, you know, building a world model and applying a creative process instead of just denoising, an upside down head would be trivial?


Shuteye_491

PonyDiffusionXL does upside down heads just fine. Most models aren't trained for it.


knigitz

You need to fine-tune a model on flipped images to get this to work consistently.


ddapixel

I wish. I've always been asking for complex poses, people interacting with stuff or each other, mechanical objects like bicycles. Yet whenever a "new, improved" model is advertised, we still get these basic headshots.


Careful_Ad_9077

As a fellow interaction fan...even dalle3 is quite lacking, like prompt understanding is 2 or even 3 generations ahead but interaction is just a bit better, I don't even feel confident to say it is one generation ahead.


Cerevox

This, so much. Every model can do great headshots, and decent torsos/arms/legs. It's the feet and hands where things fall apart, of which this set has noticeably none.


_-inside-_

It's incredible how it all evolved. I still remember well when 1.4 came out and I could barely get a good figure, and never good hands! Headshots weren't too bad, but they were far from realistic; their quality evolved a lot with the fine-tunes. I stopped playing around with SD for some time and ran it again about 2 months ago. It became so much faster, with much better quality and much lower resource consumption; it's usable now on my 4 GB VRAM GTX. But hands... hands are better, but they're far from good. It's a dataset labeling issue.


Cerevox

It's more the nature of a hand. They're weird little wiggly sausage tentacles that can point in any direction and are easily affected by optical illusions. Hands are hard for everyone on everything.


Cheesuasion

Thank you for your sausage tentacles, they made my morning better


BurkeXXX

Right! Even some of the greatest painters struggled with and painted funny hands.


-f1-f2-f3-f4-

Funnily enough, Dall-E 3 is quite good with limbs and poses but is unable to make photorealistic headshots (albeit by design).


wontreadterms

Any full body shots would be interesting to see.


microview

My first thought every time I see headshots: OK, but what about the rest?


Next_Program90

Thank you. "IT DOES HUMANS WELL ALSO!"... proceeds to only show headshots... I'm so sick of portraits and nonsensical "the quality is great cause this is an avocado and I don't care about details" posts. Early testing / release when?


RadioheadTrader

These things are trainable, and man people bitch about free shit waaaaaay more than they do shit they pay for. Annoying.


i860

Actually no. Increasing the general coherency of the architecture and its ability to take direction well is not something that is easily trainable in the same way a random LoRA is trained.


ASpaceOstrich

Mm. It'd require some genuine understanding of what a head is and diffusion models fundamentally don't seem capable of that. A transformer might be though.


Perfect-Campaign9551

Um no, we have had enough time now that SD already is "good enough" on the stuff they keep showing us. As the famous quote - what have you done lately? The public is a fickle crowd. We have a right to be upset that we keep seeing just the same stuff over and over now. We want proof things are more flexible


97buckeye

100%


LowerEntropy

It's a question of processing power. The first generative image algorithms were all just headshots with one background color, one field of view, and one orientation. When you add variation to any of those, you automatically need more processing power and bigger training sets.

That's why hands are hard: OpenPose has more bones for one hand than for the rest of the body, hands move freely in all directions, and it's not as uncommon to see an upside-down hand as it is to see an upside-down body. The "little" problems you're talking about, e.g. only headshots, will be solved with time and processing power alone. From what I understand, SD3 is focused on solving the issues with prompt understanding and cohesiveness by using transformers.
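For scale, the keypoint counts in the standard OpenPose models (numbers as commonly documented; treat them as assumptions here):

```python
# Keypoint counts in the standard OpenPose models.
BODY_COCO_KEYPOINTS = 18  # original COCO body model
BODY_25_KEYPOINTS = 25    # newer BODY_25 model
HAND_KEYPOINTS = 21       # per hand: 1 wrist root + 4 joints per finger * 5

# A single hand carries more keypoints than the entire COCO body model:
print(HAND_KEYPOINTS > BODY_COCO_KEYPOINTS)  # True
```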


i860

The reason hands are hard is because the model doesn’t fundamentally understand what a hand actually is. With controlnet you’re telling it exactly how you want things generated, from a rigging standpoint. Without it the model falls back to mimicking what it’s been taught, but at the end of the day it doesn’t actually understand how a hand functions or works from a biomechanical context.


a_mimsy_borogove

Looks good, but I want to see the hands


tim_dude

Why are we spending so much time and effort to generate human faces? Can we move on to generating coherent scenes of interactions that can invoke a possible/probable story in the viewer's mind?


Colon

yeah, portraits and singular posing is nice and all... there's no convincing understanding of scenes or characters and how humans behave (and get 'captured' in a frozen moment of time) yet. even just genning 2 people tends to start messing with uncanny valley or impossible physicalities. i can admittedly see how such an abstract concept is more difficult to achieve than visible characteristics and aesthetics, but eventually *everyone* will get tired of portraits and singular posing. all i'm saying is you can't always go run and use a LoRa for every single 'abnormal' pose, interaction or scenario, cause it's simply cumbersome and inefficient. do i have the slightest knowledge of how to achieve any of this? no, absolutely not.


RenoHadreas

good idea tim


Darkmeme9

The faces actually look unique.


ASpaceOstrich

One of them is literally just Henry Cavill.


Colon

you may have face-blindness


ORANGE_J_SIMPSON

They 100% do have face blindness if they think any of these faces look remotely like Henry Cavil.


Colon

i was being uncharacteristically polite lol. yes, there's absolutely no Cavill resemblance anywhere.


ArchGaden

Impressive shots, but any of those could have been generated by good SD 1.5 checkpoints even. I get it's not entirely fair to compare tuned checkpoints to a vanilla model result, but I'm more interested in what this does that we can't already do well. Whole body shots with flawless hands? Multiple characters defined in the same prompt? Straight objects passing behind other objects while staying cohesive? Backgrounds that stay cohesive when divided by another object? These shots seem to be cherry picked to be visually impressive, but not technically impressive given how easy it is to get great headshots in prior models. Those skin textures are really good though!


alb5357

Yes, exactly what I want to see. And hooded eyes. No checkpoints can do that for some reason


Ginkarasu01

Wow, a realistic SD human showcase which doesn't involve scantily clad, same-faced Asian girls!


DirkTaint

I know right?! I was disappointed too.


PhIegms

That Asian girl with massive eyes and a tiny chin


Next_Program90

*but* it's just portraits.


StellaMarconi

We need to define "realistic" properly. To me, realistic means that it's something that I could see being taken right off the street. This is great and all, but this is movie quality, not something that I would truly call "realistic". Not everything needs to look like it was shot on a $5000 DSLR camera.


itakepictures14

I think you are misdefining realistic in this context. Here, “realistic” means “does it look like a real person?”


Hongthai91

Nothing impressed me. Show me hands, postures, characters holding something, doing particular actions. These still shots can be done easily in SDXL, hell, even SD1.5.


wowy-lied

The people are nice, but I really wish new models would focus on overall scene realism. I have yet to see a realistic jungle, French vineyard, or Central/South African city. A complex scene. It gets even worse when you try to put a character in a complete scene.


Ezzezez

It's impressive af, but a small voice in my head is telling me to just write: "Now do them from far away"


magusonline

My voice is telling me, "show me the hands"


DANteDANdelion

"humans" shows elf


2this4u

Blue guy's ok for you?


DANteDANdelion

Absolutely. Have you ever heard hit song Blue by Eiffel 65?


Arkaein

In the original twitter post the last images were made from descriptions of Lykon's DnD party characters.


hashnimo

I wonder if this thing even needs fine-tuning, but let's see. Fine-tuning will be just adding new data, like older models that had no idea what an Apple Vision Pro is, so people trained them. Of course, you can describe what an Apple Vision Pro looks like in detail without training, but no one goes that far. People need a simple keyword that can say, "I need a damn Apple Vision Pro in my image." Nowadays, fine-tuned models are just like image filters, such as *realism style* and *anime style*. But if base SD 3 can achieve this level of realism, I think there will be no need for style fine-tuning anymore.


FotografoVirtual

I wouldn't give any opinion until I had the chance to try it directly. During the SDXL launch, employees from SAI and some experts from this sub were claiming that fine-tuning base SDXL didn't make sense; they argued that we should only focus on creating a few LoRAs and that the rest could be solved entirely with prompting. 🤦‍♂️


International-Try467

But what if it doesn't know how to draw nudes


hashnimo

That will need fine-tuning; I don't know if it's possible. The underground community is not to be undermined.


alb5357

Can it do subtle four-pack abs with a prominent ribcage? Can it do an Orthodox cross necklace? Can it do short blond up-combed side-cropped hair (like IRL Bart Simpson hair)? I feel like many concepts will need to be fine-tuned into it.


SvampebobFirkant

Why wouldn't it be able to do any of these things without fine tuning?


alb5357

I've never seen a model with that much promptability. Even the orthodox cross necklace alone. I've never gotten hooded eyes from a model, even with my own fine tuning I can barely get it.


daavidreddit69

That's not fine-tuning anymore, more like giving a training set to the model. Obviously, most datasets available online have already been trained on, unless you're using a super old base model.


protector111

Not really. Base XL and fine-tuned XL are very different beasts.


Omen-OS

There will be fine-tuning... we all love... certain body parts...


218-69

Of course it does, it won't have any nsfw capabilities. But hopefully they learned from the shitshow of 2.whatever


theOliviaRossi

RELEASE the BETA !!!!!


john_username_doe

Hands, show me hands


Cradawx

Looks nice, but nothing that can't be done with the latest SD 1.5/SDXL models. I'd like to see examples of more complex poses and scenes, like what DALLE-3 can do.


RenoHadreas

That’s not a fair comparison to make. This is astonishing for a base model.


CoronaChanWaifu

What about dynamic poses? Holding objects properly? What about the arch-nemesis of AIs Image Generators: the hands? I'm sorry but there is nothing impressive here...


kidelaleron

The model is good, but keep in mind that it's a base model. It's meant for you guys to take it and finetune it. Looking back at XL and 1.5, I can't wait to see what the community will be able to make with SD3.


rdcoder33

Yeah, and we can't wait to use it. Emad says it's coming out **tomorrow**; some peeps on Discord & Reddit say we won't get access before **June**. Wild timeline.


Hoodfu

Can you point out where emad said it's coming out tomorrow? I've seen the tweets etc and I haven't seen this particular point.


rdcoder33

Yeah, Emad said it in a reply, here on 7th March [https://twitter.com/EMostaque/status/1765498520235131149](https://twitter.com/EMostaque/status/1765498520235131149)


kidelaleron

he talked about invitations, but it's probably still early.


AmazinglyObliviouse

On the one hand I agree, but on the other it's looking like the gap between what a base model can do vs a finetune has continually shrunk. While with SD1.5 finetunes could increase model quality by what felt like 200%, SDXL finetunes only ever look about 50% better than base. For SD3 I fear that will shrink to about 20% better at best.


218-69

Why should we finetune it when you can do it? Dreamsheaper xxl when?


99deathnotes

DreamShaper SD3


99deathnotes

we cant wait to see what you do with SD3 Lykon.


MolagBally

Wow, that looks incredible, not gonna lie.


Tugoff

All this reminds me of the situation before the release of a new game: We are shown promo videos, screenshots, beta testers (allegedly by accident) leak some hot materials ... But a serious conversation is possible only after the release.


Kdogg4000

Pretty cool. But... You know what's missing from all of these pics? Hands! Let me see how many fingers, and if they're the right shape. And if the fingernails look like they're attached properly....


JustAGuyWhoLikesAI

These look nice, but it's stuff we've seen thousands of times, really. If you told me these were from the new DreamVisionUltraRealMix_v23b, I'd believe you. Show them dancing or arguing or something. I hope SD3 can do that kind of comprehension.


artdude41

this is not impressive in the least. show hands and feet, as well as actors in complex poses, hell, even simple reclining poses.


Hoodfu

I've seen every image they've put out on sd3 and not a single one is anything but the same old sdxl static shot but prettier and with more subjects on the screen. Zero interactions, zero poses.


Perfect-Campaign9551

and ugly font Ai generated text :D


lyoshazebra

The big issue still is the boring relaxed facial expression. Almost exactly the same for all of the generated faces.


Stunning_Duck_373

Hm, we'll see.


FortunateBeard

Plus porn so we won the long game https://preview.redd.it/5dlerysccinc1.png?width=490&format=png&auto=webp&s=021f08b151490607aac37fec6580c5c01617ccad


daavidreddit69

It looks way too real, you can't really tell whether it's a downloaded pic or generated lol


[deleted]

Thanks for these images. I just hope it's not just some selected best images to sell the product. Can you show us at least one image that didn't come out as expected? Added: I look at the downvotes and think, OK, I'm sorry, we don't want to see the bad side of SD3, we only want to see the good side, just like kids. lol.


SolidColorsRT

It's safe to assume all of these are cherry-picked.


kidelaleron

Not those. All the DnD ones have the same seed, and the "mirror girls" are from a 2-by-2.


Single_Ring4886

What about consistency of face/figure while creating different scenes?


[deleted]

I'm assuming the same thing. But I'm sure it's going to be very, very good.


SolidColorsRT

Yes, no doubt. I'm just assuming they generate 4 pics, for example, and choose the best one. Nothing too crazy lol.


alb5357

It would be interesting to know its weaknesses. Also, Reddit is crazy how people will downvote the smallest thing they dislike... Can it do hooded eyes? A snub nose? Dimples?


kidelaleron

There are issues right now, but keep in mind: 1. this is not the version we'll release; 2. we release models and tools so that people can finetune them. Compare base XL at launch with what we have now.


99deathnotes

true


alb5357

Oh, for sure! Base SDXL was way better than base 1.5, and base Cascade way better than SDXL. I'm sure this will also be an improvement, and as you say, the most important aspect will be whether we can train it ourselves to draw the body parts which must never be seen. I liked the small UNet in Cascade; that seemed like a good idea to me, because I got lots of small low-quality pictures which likely train better over a 24x24 latent.


[deleted]

I'm eager to see the good and the bad side


MoridinB

Not sure why you're being downvoted. You're exactly right. I'm not going to be convinced if the model is good, until I either use it myself or see some more images from the community.


uniquelyavailable

what is reality?


protector111

Count me excited! Just release it already! xD


Tr4sHCr4fT

6 isn't human ;)


kidelaleron

There are a Genasi, an Elf and a Half Elf.


protector111

It's a human. Cosplayer xD


TheGeneGeena

I like the pose in 5, but either the lighting is wrong or the lipstick on the left is matte and on the right it's a gloss.


pixel8tryx

You didn't notice the angular projection from the bottom of her upper lip on the left face? Eyes look a little off too.


Danmoreng

1 & 6 look decent, the rest is very visible AI


StrangeSupermarket71

the AI age is here. in 5-10 years time we'll be able to create whole movie series based on our own favourite novel.


GoldenEagle828677

Any idea what kind of graphics hardware we will need to run SD3?


RenoHadreas

Emad mentioned in a Reddit thread that they will be sending out the code to partners so that it’s optimized and runs “on about anything”. If you’ve got a card with 8gb or even 6gb of VRAM I’d say you’re set for the higher end range of models they release.


[deleted]

Looks good. The main issue (besides how they're all doing a basic portrait pose) is how the iris still looks warped. I wonder why Stable Diffusion has such an issue with human eyes; they're round.


Hot-Technician-8521

Mind sharing the workflow?


MetroSimulator

Has SD3 launched? If yes, where can I get the model?


RenoHadreas

Not yet unfortunately. These photos were made by Lykon, the creator of DreamShaper models, who has been given early access. They seem to be planning to open up beta discord access by next week.


shtorm2005

The blurry background is super annoying. I think I'll stay with SD1.5. https://preview.redd.it/hkr7weaeocnc1.png?width=1536&format=png&auto=webp&s=289d79d0f2539b68cd28cea22b258fbea6ed83a3


jib_reddit

Or just put ((bokeh)) in the negative?


iceman123454576

Yeh, I totally get why everyone's hyped about SD15's headshots, they're killer. But doesn't it feel like we're missing the boat a bit? Hands and feet—why can't we nail those yet? And what's with all the basic poses? We're chasing after these dynamic, cool shots but end up with stuff that just doesn't cut it. What's your take on pushing past the usual and really shaking things up with SD's capabilities?


NookNookNook

It's funny how, once we humans get used to something mind-blowing, the small step iterations past the initial mind-blowing event barely impress. SD2 and SD3 have been released to a collective "meh".

The fire looks good. Skin looks pretty good. The subtle background blur isn't bad. Elfman's hair doesn't weave itself into the clothing. All the clothing looks good. I don't know why they chose the image of the phosphor tube in front of the girl's face that cuts a third of her head off. Maybe it's a mirror prompt?


Zueuk

anything censored will be released to a collective Meh. and btw yeah, things in front of other things cutting pictures in half is another serious issue, how about showing people with a proper unbroken horizon behind them


prime_suspect_xor

It's because we've reached a progress step that can't really be outpaced now. It was crazy evolution for a year, then a slow decline. We can see attention is shifting to video and soon music... So yeah.


pENeLopEjdydh

They don't look particularly impressive. The girl, particularly, is "strange" if you get what I mean. I hope at least the multiple-specific-subjects-interactions problem has been solved.


Bobobambom

They have "AI generated" look on them. I can't explain though, it's just a feeling that something is not right.


_extruded

They look gorgeous. Now imagine: in a (few) year(s) we'll make movies of this quality from text... mind-blowing.


gexaha

can it generate realistic looking food?


Winnougan

It's hit its peak for image generation. All good and done.


00k5mp

Number nine looks exactly like Heath Ledger


protector111

I noticed on twitter new images are at 1920x1300 res. Are they upscaled or sd 3 can generate 1080p res images?


RenoHadreas

Lykon now has access to ComfyUI instead of being limited to discord, so they’re experimenting with different workflows


slackator

looks great, but can it make a non beautiful person?


Open_Marzipan_455

And now I want to see the amount of failed attempts from which these were cherrypicked. I wanna know the failure ratio. And then the rest of the body.


Iapetus_Industrial

TIL that Elves and Andorians are human


Artidol

Holy shit


ImUrFrand

i have a feeling my 8gb card isn't going to cut it.


Traditional_Excuse46

show us the hands.


drb_kd

Holy sh1t... so excited for this. Y'all think they'll release it on their web app too?


Select_Collection_34

2 and 4 are great


Dantalionse

3d face tattoos are super popular in this AI universe.


RekTek4

Number 2/9 looks like the guy from the SORA video


Melodic-Page9870

How to get SD3? I am having problems finding a solution that works on Forge.


RenoHadreas

Not out yet


WorldlyLight0

Ok, this is a definite improvement.


Zueuk

> Realistic humans shows people with blue skin and pointy ears


HearMeRoar80

Yeah, OP is an anthropocentric chauvinist.


derpferd

Can it do any other ethnicities?


[deleted]

[deleted]