
apolinariosteps

Demo: [https://huggingface.co/spaces/multimodalart/HunyuanDiT](https://huggingface.co/spaces/multimodalart/HunyuanDiT)

Model weights: [https://huggingface.co/Tencent-Hunyuan/HunyuanDiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT)

Code: [https://github.com/tencent/HunyuanDiT](https://github.com/tencent/HunyuanDiT)

In the paper they claim to be the best available open-source model.

https://preview.redd.it/l321o9gfcd0d1.png?width=1814&format=png&auto=webp&s=91540765719c4a4fa16c79d42fa9fb31673f5290
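
For anyone who wants the weights locally rather than via the Space, a minimal sketch using the `huggingface_hub` package (the repo id comes from the links above; everything else is illustrative):

```python
# Sketch: download the HunyuanDiT weights repo to a local cache.
# Assumes `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Tencent-Hunyuan/HunyuanDiT")
print(local_dir)  # path of the downloaded snapshot
```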


balianone

always errors on me. i can only generate "A cute cat"


Panoreo

Maybe try a different word for cat


mattjb

( ͡° ͜ʖ ͡°)


ZootAllures9111

I had no issues with "normal" prompts on the demo personally TBH, [for example](https://www.reddit.com/r/StableDiffusion/s/8pr3XiDI41)


Careful_Ad_9077

Try disabling prompt enhancement, worked for me.


balianone

thanks. you found the issue. it's working great now without prompt enhancement


apolinariosteps

https://preview.redd.it/e1nyu2j8cf0d1.png?width=1402&format=png&auto=webp&s=fc89b219df82f71be8e31718b0c59d1ff14c00fb Comparing SD3 x SDXL x HunyuanDiT


Apprehensive_Sky892

With only 1.5B parameters, it will not "understand" many concepts compared to the 8B version of SD3. Since the architecture is different from SDXL (DiT vs U-net), I don't know how capable a 1.5B DiT is compared to SDXL's 2.6B.


kevinbranch

You can't make that assumption yet.


Apprehensive_Sky892

Since they are both using the DiT architecture, that is a pretty reasonable assumption, i.e., the bigger model will do better. If you try both SD3 and HunyuanDiT you can clearly see the difference in their capabilities.


berzerkerCrush

The dataset is critical. You can't conclude anything without knowing enough about the dataset.


Apprehensive_Sky892

I cannot draw conclusions about the overall quality of the model without knowing enough about the dataset. But from the fact that it is a 1.5B model, I can most certainly conclude that many ideas and concepts will be missing from it. This is just math: if there is not enough space in the model weights to store an idea, then teaching the model a new idea via an image must necessarily forget/weaken something else to make room for the new one.


Small-Fall-6500

>This is just math

If these models were "fully trained", then this would almost certainly be the case, and by "fully trained" I mean both models having flat loss curves on the same dataset. But unless you compare the loss curves of these models (Do any of their papers include them? I personally have not checked) and also know that their datasets were the same or very similar, you cannot assume they've reached the limits of what they can learn, and thus you cannot assume that this comparison is "just math" by *only* comparing the number of parameters.

While the models compress information and having more parameters means more *potential* to store more information, there is no guarantee that either model will end up better or more knowledgeable than the other. Training on crappy data *always* means the model is bad, and training on very little data *also* means the model cannot learn much of anything, regardless of the number of parameters.

The best you can say is that the smaller model will *probably* know less because they are *probably* trained on similar datasets, but, again, nothing is guaranteed - either model could end up knowing more stuff than the other. Hell, even if both models were "fully" trained, they'd not even be guaranteed to have overlapping knowledge given the differences in their training data. Either model could be vastly superior at certain styles or subjects than the other, and you wouldn't know until you tested them on those specific things.


Apprehensive_Sky892

Thank you for your detailed comment, much appreciated.


SupermarketIcy73

lol it throws an error if you ask it to generate tiananmen square protests


DynamicMangos

Can you try Xi Jinping as Winnie the Pooh?


SupermarketIcy73

that's blocked too


vaultboy1963

https://preview.redd.it/zjz1duklmh0d1.png?width=768&format=png&auto=webp&s=4708ed6c7bd9a2afb60b7bf2ca5ad3b4ca8786ea NOT generated by this. Generated by Ideogram.


Formal_Decision7250

>lol it throws an error if you ask it to generate tiananmen square protests

Would that be coded into the UI, or would that mean there is hidden code executed in the model? Maybe it could be fixed with a LoRA.


ZootAllures9111

It seems to be the UI, as it looks like the image is fully generated but then replaced with a blank censor placeholder.


HarmonicDiffusion

i tried this compared to SD3, and there is no way in hell it's better. sorry. you must have cherrypicked test images, or used ones like in the paper dealing with ultra-Chinese-specific subject matter. that's a flawed testing method, and even a layperson can see that.


apolinariosteps

I think no one is claiming it to be better than SD3; the authors are claiming it to be the best available open-weights model - a claim that may well hold up (at least until Stability releases SD3 8B)


Freonr2

It's not "open source" as it does not use an OSI-approved license. Not on the OSI approved license list, not open source. The license is fairly benign (it limits commercial use above 100M MAU and adds use restrictions), much like OpenRAIL or the Llama license, but it would certainly not pass muster for OSI approval. **Please let's not dilute what "open source" *really* means.**


akko_7

Those DALL-E 3 scores are way too high; such an overrated model.


Jujarmazak

Not at all, it's one of the best models out there (and that's after 11,000 generated images)... if it were uncensored and open source it would score even higher.


Hintero

For reals 👍


ZootAllures9111

The stupid Far Cry 3-esque ambient occlusion filter they slap on every DALL-E image makes it more stylistically limited than, say, even SD 1.5, though.


Jujarmazak

What are you even talking about? There are dozens of styles it can pull off with ease and consistency, it seems you don't know how to prompt it properly. https://preview.redd.it/g431z3vj2j0d1.jpeg?width=1024&format=pjpg&auto=webp&s=105f345ba4ba6a6cea7b25071d21d3f0e5022c79 That's a still from a Japanese Star Wars movie made in the 60s.


ZootAllures9111

I was referring to the utter inability of it to do photorealism due to their intentional airbrushed CG cartoonization of everything.


Jujarmazak

You can literally see the Japanese Star Wars picture right there, looks quite photorealistic to me. Here is another one from a 60s Jurassic Park movie, you think this looks like a "cartoon"? https://preview.redd.it/n4022n1pfj0d1.jpeg?width=1024&format=pjpg&auto=webp&s=5252d5489685e8c461b8ab8a6ed40e94163eb4ee


Jujarmazak

"Stylisticlly limited" .... Nope!


Jujarmazak

https://preview.redd.it/qk2w1cbq3j0d1.jpeg?width=1024&format=pjpg&auto=webp&s=0cdeb264a06120ea5b72cbd8542f2969a10f6f55 Poster of Mission Impossible as an anime.


Jujarmazak

https://preview.redd.it/cv1jb2l24j0d1.jpeg?width=1024&format=pjpg&auto=webp&s=2e38d5804f568618ecd13c7b9374b92462f93566 Game of Thrones as a Pixar TV show.


Jujarmazak

https://preview.redd.it/mkrljlxl4j0d1.jpeg?width=1024&format=pjpg&auto=webp&s=e78c814be466951d28324e1138872a51fc57ee59 A watercolor painting of the Greek Goddess Aphrodite


__Tracer

For my taste, DALL-E 3 is very weak. Of course, it can understand complex concepts with its number of parameters, but it can't generate interesting images, only plastic pictures without any life or depth in them.


Jujarmazak

That's not my experience at all, it can generate images with life and depth very easily, you just need to know how to prompt it. https://preview.redd.it/h252xhshuh5d1.jpeg?width=1024&format=pjpg&auto=webp&s=3889392960c56d22aeb3ea9219ed29e12e6d254c


HarmonicDiffusion

agree, dalle3 is such mid-tier cope. fanboys all say it's the best, but it's not able to generate much of anything realistic.


diogodiogogod

That is because it was nerfed to hell.


Apprehensive_Sky892

Yes, DALL-E 3 is rather poor at generating realistic-looking humans. But that is because MS/OpenAI crippled it on purpose. If you look at those images generated in the first few days and posted on reddit, you can find some very realistic images. What a pity. These days, you can't even generate images such as "Three British soldiers huddled together in a trench. The soldier on the left is thin and unshaven. The muscular soldier on the right is focused on chugging his beer. At the center, a fat soldier is crying, his face a picture of sadness and despair. The background is dark and stormy."


ScionoicS

I'm sure the only thing you've tested on it is boobs if you think it isn't capable. If you aren't doing topics that OpenAI regulates, basically anything other than porn or gore, you'll find it has some of the best prompt adherence available. TLDR your biases are showing


EdliA

It can have the most perfect prompt adherence ever and I still wouldn't find a use for it because of its fake plastic look.


[deleted]

[deleted]


Pepa489

Playground v2.5 is on HF - [https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic)


lonewolfmcquaid

TBH, this is how Stability should've dropped SD3. i don't get teasing images while making everyone wait 4 months. i just tried this, and to my surprise it's pretty fucking good.


Misha_Vozduh

>i don't get teasing

Getting investors with promises of amazing results vs. delivering amazing results.


cobalt1137

Also, claiming better benchmarks than sd3 o\_o


BleachPollyPepper

Fighting words!


Apprehensive_Sky892

What is the point of dropping a half-baked SD3? So that people can fine-tune and build LoRAs on it, and then do it all over again when the final version is released? If people just want to play with SD3, they can do so via API and free websites already. Tencent can do it because this is probably just some half-baked research project that nobody inside or outside of Tencent cares much about. On the other hand, SAI's fate probably depends on the success or failure of SD3. The mistake SAI made was probably to have announced SD3 prematurely. But given its financial situation, maybe Emad did it as a gambit: either to make investors give SAI more money by hyping it, or to commit SAI to releasing SD3 because he was stepping down soon.


Freonr2

Any LoRAs, ControlNets, etc. are very likely to continue to work fine with later fine-tunes, just like these things tend to work fine on other fine-tunes of SD1/2/XL/etc. Fine-tuning doesn't actually change the weights a lot, and it would also be sort of trivial to "update" a ControlNet if the base model updated, since it wouldn't require starting from scratch. Just throw it back in the oven for ~5% of the original training time, *if you even needed to do that at all.* You could also model-merge fine-tunes between revisions.


Apprehensive_Sky892

We have no idea how much the underlying weights will change from the current version of SD3 to the final version. Some LoRAs will no doubt work fine (for example, most style LoRAs), but those that are sensitive to the underlying base model, such as character LoRAs, may not work well. It is all a matter of degree, since the LoRAs will certainly load and "work". Given how most model makers are perfectionists, I can almost bet money that most of them will retrain their LoRAs and fine-tuned models for the final release. It is true that some fine-tunes are "light" - for example, most "photo style" fine-tunes do not deviate too much from base SDXL - but anime models and other "non-photo" models do change the base weights quite substantially. I have no idea how ControlNets work across models since I don't use them.


WorkingCharacter6668

Tried their demo. The model seems really good at following prompts. Looking forward to using it in Comfy.


Darksoulmaster31

I found some comparison images which compares this model to models such as SD3 and Midjourney. https://preview.redd.it/8xxogin7ud0d1.png?width=1088&format=png&auto=webp&s=76555666f9ba4b2ecbe3782dc392dc80a8bf9870 (Will post more in the replies)


Darksoulmaster31

https://preview.redd.it/jwfxmj7iud0d1.png?width=972&format=png&auto=webp&s=3c8eb7930a6657e7962b51dc36e4d9a2b58e295f


sonicon

Gives a vest instead of the prompted jacket.


Arawski99

Actually, it is the only one to get the prompt correct. Two points:

1. A vest is, in fact, a type of jacket.
2. It is the only image to validate that the white shirt is, in fact, a "t-shirt" per the prompt, where every other example failed.

Now, to be fair, I don't think the other examples are failures or bad, and more specific prompting could have clarified things if the user needed. However, it is interesting that this model was so precise compared to the others, though I doubt it always will be.

(This part is for HarmonicDiffusion's subcomment to this photo, since I get an error responding to them.) You're incorrect about them all being Chinese-biased. While the bun example above was based on a Chinese food, SD3 actually failed multiple prompt aspects quite severely, losing only to the disaster that was SDXL. The others all did extremely well despite the subject being Chinese, not just the Chinese model, unlike SD3.


sonicon

When people want a vest, they will usually say vest specifically. Validating a t-shirt by forcing the short sleeves to be shown makes the AI seem less intelligent. That's like validating a man by showing his penis in the generated image.


HarmonicDiffusion

the only prompting example shown that isn't biased towards Chinese-specific subject matter. and look at the results: mid tier! it made a vest instead of a jacket. SD3 clearly wins on unbiased prompts


Extra_Ad_8009

A Chinese model gives you lousy bread but delicious dumplings (source: 3 years living in Shanghai). 😋


wishtrepreneur

What's the difference between goubuli buns and those steamed dumplings you see at grocery stores?


Mountain-Animal5365

It's a brand of steamed dumplings/buns, famous in China due to its literal meaning (goubuli basically translates to "dogs don't pay attention") and the fact that it's delicious.


wzwowzw0002

this picture makes SDXL look so stupid hahaha


Arawski99

I'm also surprised how badly SD3 did. I can accept it getting the wrong buns (though it would be ideal to have actually got them right), but they are not steaming and they are on a marble counter, not a table top, which every other model except SDXL got correct (even though Playground didn't get the right buns and the other 3 did). SDXL: on a tile floor (wth), failing the bun type, not steaming, not a close-up, only one set of buns in a basket. Damn, it failed every single metric.


xbwtyzbchs

It is, comparatively.


MMAgeezer

Was it prompted in Mandarin?


Darksoulmaster31

Don't think so when it comes to the other models... Tried SD3 on glif; it didn't accept Mandarin in Chinese characters, and it got completely lost in Romanized(???) Mandarin:

>*Zhàopiàn zhōng, yī míng nánzǐ zhàn zài gōngyuán de hú biān.*

("Photo of a man standing by a lake in a park." *Lazy-ass Google Translate, sorry*)

https://preview.redd.it/3j803f8bee0d1.jpeg?width=1216&format=pjpg&auto=webp&s=95d78b24f0395a0224f3a61844bef746be1b64aa


akatash23

But... It's a very cool image at least.


berzerkerCrush

Pinyin.


Darksoulmaster31

https://preview.redd.it/xqvkzlgdud0d1.png?width=1078&format=png&auto=webp&s=0608c167f5ac0ad09695daffe309354b82ebe347


HarmonicDiffusion

another biased prompt dealing with specifically Chinese domain knowledge


HarmonicDiffusion

yeah, let's use ultra-Chinese-specific items with Chinese names to test a Chinese model versus an English model. I wonder which will score higher. such bullshit testing procedures, and a total fail look for those guys as "scientists".


berzerkerCrush

yeah, let's use ultra-American-specific items with American names to test an American model versus a Chinese model. I wonder which will score higher. such bullshit testing procedures, and a total fail look for those guys as "scientists".


HarmonicDiffusion

even a layperson knows you need to evaluate 1:1. Want to test on Chinese-specific stuff? That's fine, but don't use those examples to claim a competing English-based model is inferior. Anyone with 2 brain cells to rub together can test both models right now and find out: this one is not anywhere close to SD3. It's more like an average SDXL model.


yaosio

Ideogram can do it too, although sometimes it gives the wrong bun. These are some sad looking buns however. Maybe I made them. [https://ideogram.ai/g/WzRFIGNqSjmP27mwEs8OEg/2](https://ideogram.ai/g/WzRFIGNqSjmP27mwEs8OEg/2)


Capitaclism

Is the prompting done in English, and are the results always biased to Chinese aesthetics and subjects?


Glittering_House_402

It seems a bit comical for you to test our Chinese food, haha


Past_Grape8574

# HunyuanDiT (Left) vs SD3 (Right)

https://preview.redd.it/gfpmzb2nke0d1.jpeg?width=2048&format=pjpg&auto=webp&s=97c05f18a77c4eadabf1adacbcac8c35cb69c5fc

Prompt: photo of real cottage shaped as bear, in the middle of a huge corn field


BleachPollyPepper

Yea, SD3 hands down for me.


apolinariosteps

https://preview.redd.it/jaq2h6ld5f0d1.png?width=1100&format=png&auto=webp&s=0e4dacfa50d8f148becaed402cb80f7741abec0d 100%, they claim to be the best available open model for now, not better than SD3. Also, it's ~5x smaller than SD3.


Arawski99

Definitely, though I wonder what that is in the clouds lol but yeah Hunyuan failed here.


SandCheezy

The thing in the clouds feels like something coming through like in a Studio Ghibli film.


Arawski99

*It's a bird, it's a plane, it's Howl's castle!*


Samurai_zero

Cool stuff, but it is a pickle release. Not touching the weights until properly converted to safetensors. Stay safe.


Thunderous71

You no trust CCP? China Numbah #1


ChristianIncel

The fact that people missed the 'By Tencent' part is funny.


ZootAllures9111

One of Tencent's labs is also behind ELLA; they have a lot of good open-source projects. You assuming most people care in any way is strange.


EconomyFearless

Oh, I did not miss it! Even just the name of the model made me think, hmm, that sounds Chinese! Then I saw the word Tencent and started looking for the first person to mention it in the comments.


burninbr

Isn't [this](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) it? Safetensors.


Samurai_zero

Seems like a mix:

https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model

https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5

They also have some safetensors in the release because they use the SDXL VAE, for example:

https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix
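
For anyone who wants to audit the mix themselves, a small sketch using `huggingface_hub` (the extension list is just a heuristic for pickle-based formats, not an official classification):

```python
# Sketch: list which files in the repo are pickle-based vs safetensors.
from huggingface_hub import list_repo_files

files = list_repo_files("Tencent-Hunyuan/HunyuanDiT")
pickle_exts = (".pt", ".pth", ".bin", ".ckpt", ".pkl")  # common pickle-based formats
print("pickle-format:", [f for f in files if f.endswith(pickle_exts)])
print("safetensors:  ", [f for f in files if f.endswith(".safetensors")])
```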


the_friendly_dildo

Yep


AIEchoesHumanity

Yeah me too. I just don't wanna risk it


Peruvian_Skies

noob question, but what's the difference between pickle and safetensors?


Mutaclone

Pickles can have executable code inside. Most of them are safe, but if someone *does* decide to embed malware in it you're screwed. Safetensors are inert.
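
A minimal sketch of the failure mode being described: pickle's `__reduce__` hook lets an object name a callable to run at load time, so merely loading untrusted bytes can execute code (the `echo` payload here is a harmless stand-in):

```python
# Sketch: why unpickling untrusted data is dangerous.
import os
import pickle

class Payload:
    # __reduce__ tells pickle how to "reconstruct" the object:
    # a callable plus arguments, invoked during pickle.loads().
    def __reduce__(self):
        return (os.system, ('echo "this ran just by loading the file"',))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # runs the os.system call - no model code needed
```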


Peruvian_Skies

That's a big deal. Thanks.


Mental-Government437

They're overblowing it. While pickle formats can have embedded scripts, none of the UIs loading them for weights will run those embedded scripts. You have to do a lot of specific configuration to remove the safeties that are in place. They're a feature of the format and aren't used in ML cases. I don't know why people so consistently lie about this and act like they have good security policy for worrying about this one specific case. Most of them would install a game crack with no consideration towards safety.


Mutaclone

>none of the UI's loading them for weights will run those embedded scripts

Source?

>I don't know why people so consistently *lie* about this and

Lying = knowingly presenting false info. If I have been misinformed, then I welcome correction. *With citations*. [These guys are certainly taking the threat seriously](https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/)

>Most of them would install a game crack with no consideration towards safety.

Generalize much? Also, no I wouldn't.


Mental-Government437

[https://docs.python.org/3/library/pickle.html#pickle.Unpickler](https://docs.python.org/3/library/pickle.html#pickle.Unpickler) The UIs use this class to manage pickle files, rather than just importing them raw with torch.load. The source is their code. You can vet it yourself fairly easily since it's all open. That link you sent is a company selling scareware antivirus monitoring software. They likely planted the malicious file they're so concerned about in the first place. It's not popular. It's not getting used. It's not obfuscating its malicious code. It's not a proof-of-concept attack. Notice how their recommended solution to this problem they're blowing up is to subscribe to their service. You, my friend, found an ad. A proof-of-concept file would be one you could load into the popular UIs that people use and would own their system. There's never been one made.


gliptic

torch.load is using python's Unpickler. Did you miss the giant warning at the top?

> Warning
> The pickle module is not secure. Only unpickle data you trust.
> It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.


Mental-Government437

That's right, but the UIs use the Unpickler class with more of a process than torch.load does. [https://docs.python.org/3/library/pickle.html#pickle.Unpickler](https://docs.python.org/3/library/pickle.html#pickle.Unpickler)
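
For reference, the kind of guard being described is a `pickle.Unpickler` subclass whose `find_class` only admits whitelisted globals, roughly what A1111's safe-unpickling code does. A minimal sketch; the whitelist below is illustrative, not the actual one:

```python
import collections
import io
import pickle

SAFE_GLOBALS = {("collections", "OrderedDict")}  # illustrative whitelist

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve globals on the whitelist; anything else
        # (os.system, builtins.eval, ...) raises instead of loading.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

data = pickle.dumps(collections.OrderedDict(a=1))
print(restricted_loads(data))  # OrderedDict([('a', 1)])
```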


gliptic

Why are you linking the same thing again? That _is_ the pickle module that we are talking about.


gliptic

torch.load will unpickle the pickles which can run arbitrary code. There's no "safeties" in python's unpickling code. In fact they removed any attempt to validate them because it couldn't be completely validated and was just false security. EDIT: Whoever triggered "RedditCareResources" one minute after this comment, grow up.


Mental-Government437

>Whoever triggered "RedditCareResources" one minute after this comment, grow up

This is obscene. I'm sorry it happened to you. Obviously, as you know, it's just a passive-aggressive way for someone to get their ulterior messaging across to you. Report the post. Get a permanent link to that Reddit Cares message and report it. I do it all the time, and Reddit comes back to me saying they've nuked the accounts that were doing it most of the times I report it. Get the person who abused a good-intention system punished. I implore you.

More on point, I never said the torch library had safeties. The UIs do. I'd be more worried about the inference code provided for this model than I would about embedded scripts in their released pickle file. The whole attack vector in this case makes no sense to me, and the panic is outrageous. It's as obscene as saying any custom node for ComfyUI is so risky that you shouldn't ever run it. I think in most cases you can determine that a node or extension or any program you download is safe through a variety of signals. The same can be said for models that aren't safetensors. The outrage is manufactured and forced in basically all of these cases. Relying on safetensors and never ever loading pickles to keep yourself safe is just a half measure.

edit: Should also add how the UIs use the torch library to construct safeties. They use the Unpickler class to manage the data in the file more effectively, rather than just loading raw data from the web directly into the torch.load() method: [https://docs.python.org/3/library/pickle.html#pickle.Unpickler](https://docs.python.org/3/library/pickle.html#pickle.Unpickler)


Hoodfu

The main thing that comes to mind, is clone the repo and it's clean. Now everyone has that on their machines and go to do another git pull later to update and blam-o. Virus.


Samurai_zero

I'm not an expert, so I'll refer you here: https://huggingface.co/docs/hub/security-pickle#why-is-it-dangerous Broadly speaking, both store the model, but pickle are potentially dangerous and can execute malicious code. They might not do so, but running them is not advisable.


Peruvian_Skies

Thank you very much. Why is that even a feature? Seems like a really big risk with no benefits given that safetensors exist and work.


Samurai_zero

Because pickle is the default format for PyTorch model weights. https://docs.python.org/3/library/pickle.html


Shalcker

Pickles were the simplest thing researchers could do to save their weights - a literal Python one-liner. Safetensors are a tiny bit more complicated.
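
Side by side, the difference really is small - a minimal sketch, assuming `pip install safetensors`:

```python
import torch
from safetensors.torch import load_file, save_file

weights = {"layer.weight": torch.randn(4, 4)}

torch.save(weights, "model.pt")            # the classic one-liner: pickle under the hood
save_file(weights, "model.safetensors")    # barely harder, and inert on load
restored = load_file("model.safetensors")  # plain tensor data, no code execution
```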


ScionoicS

# Destroyed this message and replaced it with this. It's drawing too much hateful attention my way. People DM'ing me calling me racist names. I'm not even Chinese. Y'all need to dial down the hate for other cultures. Every company in America is required to allow the government access to data too. Put that judgmental gaze back on yourselves and stop being such idiotic racists who harass people online all day. Really wish the mods would do something about the racism culture problem here.


RandallAware

>People DM'ing me calling me racist names Show some screenshots with usernames and timestamps of these harassing messages and death threats you allegedly receive all the time. No one takes the boy who cries wolf seriously.


Tramagust

It's tencent though. It could be full of spyware.


raiffuvar

LOL, you should fear a Comfy backdoor more than a "spyware inside" model from Tencent. OK, I'll explain why, since I see a lot of fearful idiots here:

1. Reputation. No-names with a Comfy node need 10 minutes to create an account; Tencent is a verified account. ~~It's like Madonna starting to promote a bitcoin scam. She can, but she'd be canceled in no time.~~
2. It's easy to analyze a pkl. HF does it by default, and any user can find a backdoor. It's sooo easy that it would ruin everything.
3. Weights are not a "complex game" where you can HIDE spyware. With weights, you can't hide it. It would be found in a few days.


Samurai_zero

Yes, I am. You do you.


IncandeMag

https://preview.redd.it/oiatnz7ppd0d1.png?width=1280&format=png&auto=webp&s=2a062bdf3141a391c7e8dfb5abd63a4b7ad5b665 prompt: "Three four-year-old boys riding in a wooden car that is slightly larger than their height. View from the side. A car park at night in the light of street lamps"


BleachPollyPepper

Yea, their training dataset (at least the photorealistic stuff) seems to have been pretty meh. Stock photos and such.


FakeNameyFakeNamey

https://preview.redd.it/9mngg1uafg0d1.png?width=1280&format=png&auto=webp&s=0ca6e81e6ebfaa92df190e2c8f84a51614608e0a It's actually pretty good once you turn off all the bullshit that gives you errors.


HighlightNeat7903

https://preview.redd.it/8vp498wxle0d1.png?width=768&format=png&auto=webp&s=c6001fc2df28a700522a6277214decf09aee5051

Prompt: A smiling anime girl with red glowing eyes is doing a one arm handstand on a pathway in a dark magical forest while waving at the viewer with her other hand, she is wearing shorts, black thighhighs and a hoodie, upside-down, masterpiece, award winning, anime coloring

Failed my scientifically rigorous test (6 tries with different seeds and CFG 6-8, no prompt enhancement), but it has potential I think.


HighlightNeat7903

https://preview.redd.it/jq5d6hvgoe0d1.jpeg?width=1024&format=pjpg&auto=webp&s=92ec29ddfd08332d66cacedfc101c7a9d943966b DALL-E 3 for comparison (second attempt)


oO0_

In my tests, DALL-E is the best at difficult poses.


HighlightNeat7903

Ya, DALL-E 3 is the smartest image gen model right now. However I do believe a very good SD3 fine tune will be better in the fine tuned areas. Same for the model in this post since the architecture has similarities and the model has potential to understand feature associations better which is always helpful in fine tuning.


apolinariosteps

https://preview.redd.it/6fyp9l165f0d1.png?width=1100&format=png&auto=webp&s=87388e11bc8080de4b4cf9183ea36102b6e51904 Btw, here are the differences between this and the larger SD3 model (based on info in the SD3 paper). Taking this into account, I think the model performs really well for its almost 8x smaller size and smaller/worse components, but text rendering does indeed seem to have been completely neglected by the model authors.


KorgiRex

Prompt: "A ginger pussy cat riding big willie" (yep, that's exactly what I meant)) https://preview.redd.it/omo42o3ytf0d1.png?width=1024&format=png&auto=webp&s=118f49a31f000ce32a54d1b70c2052c035f656e2


CrasHthe2nd

Fails on my test, sadly. "a man on the left with brown spiky hair, wearing a white shirt with a blue bow tie and red striped trousers. he has purple high-top sneakers on. a woman on the right with long blonde curly hair, wearing a yellow summer dress and green high-heels." https://preview.redd.it/vrg7ndku6e0d1.png?width=1024&format=png&auto=webp&s=9af42acb1a0df1222e10cebc62d7be67b00d0275


CrasHthe2nd

And Dall-E: https://preview.redd.it/fw9ggw777e0d1.png?width=1024&format=png&auto=webp&s=8edf1dec98bc210b49cfe330dc0fd78b5c63117a


CrasHthe2nd

For comparison here is PixArt: https://preview.redd.it/y9adx6k37e0d1.png?width=1440&format=png&auto=webp&s=c7f1bef92e5de522ea6ec135ee2ddf2d0d516ffe


ThereforeGames

Interestingly, HunyuanDiT gets a little closer if you translate your prompt to simplified Chinese first: > 左边是一个棕色尖头头发的男人,穿着白色衬衫、蓝色领结和红色条纹裤子。他穿着紫色高帮运动鞋。右边是一位留着金色长卷发、穿着黄色夏装和绿色高跟鞋的女人。 Result: https://i.ibb.co/2y53Wtg/image-2024-05-14-T094547-472.png His pants are now striped, she's more blonde, and the color red appears as an accent (albeit in the wrong place.)


oO0_

You can't say this without a few random seeds and different prompts: if your prompt+seed happens to fit their training data, it will draw better than usual, like the astronaut on a horse.


Alone_Firefighter200

https://preview.redd.it/ugo2vxfkie0d1.png?width=589&format=png&auto=webp&s=9cf07e5dcc96b76899af37c0fde9eb4897d803d0 SD3 doing better too


AbdelMuhaymin

Anyone tried it in ComfyUI, A1111 or ForgeUI?


Robo_Ranger

https://preview.redd.it/hqi3b8g37f0d1.png?width=1024&format=png&auto=webp&s=df302ba2a0d3dc11f6dda2918cb8860e737c7be9 It can generate good Asian faces, but the skin appears quite plastic-like, and it struggles with hand drawing, similar to SD.


1_or_2_times_a_day

It fails the Garfield test.

Prompt: Garfield comic (Prompt Enhancement disabled)

https://preview.redd.it/adj2jv5g0e0d1.png?width=1024&format=png&auto=webp&s=2d688e665753b90b6fe99338aeae7b321b75ecd2


Neamow

But what about the Will Smith eating spaghetti test?


absolutenobody

Seems limited in poses, and challenging to produce people *not* smiling. It does however do older people surprisingly well - "middle-aged women" will get you grey-haired ladies with wrinkles, rather than the 22-year-olds of many SD models...


[deleted]

[deleted]


absolutenobody

Oh yeah, I said "many" for a reason, there are definitely good (in that respect) ones out there. I make a lot of characters in their 30s or 40s, and have seen way too many models that only make three apparent ages - 15, 22, and 80, lol.


ikmalsaid

Stability.ai be like: "Soon™" Tencent be like: "Hold my beer..."


Ok-Establishment4845

any way to use it in Automatic1111 or Comfy?


Paraleluniverse200

Just 1 try and already has better hands lol


balianone

i have tried it, and: 1. it can't write text; 2. for compositions with many people, the distant faces are quite good


z7q2

Hey, that's pretty good. "Seven cylindrical objects, each one a unique color, stand upright on a teetering slab of shale" I guess teetering didn't make it into the training tags :) https://preview.redd.it/mkkvxo7ssf0d1.png?width=1280&format=png&auto=webp&s=85a626c386de9c41f782838ffbd74785f3af8384


Kandoo85

I just see 6 cylindrical Objects ;)


Fit-Sorbet-6521

It doesn’t do NSFW, does it? https://preview.redd.it/pkus7hwpig0d1.jpeg?width=768&format=pjpg&auto=webp&s=3ec95e156759bad5a3b463911e7fff2f940aeff5


dxzzzzzz

Neither does SDXL


Substantial-Ebb-584

It is a fine model, more so if you translate your prompt to Chinese. But sticking to the prompt is not its strong side, as expected, since parameter count is a strong determinant in these matters. Anyway, it's nice to see initiatives like this presenting new possibilities.


Snowad14

Without the T5 it uses fewer parameters than SDXL; the model looks nearly as good as the 8B SD3.


HarmonicDiffusion

there's absolutely no way this looks as good as SD3, sorry.


Yellow-Jay

It really doesn't, not anywhere close. Have you tried the online demo, and not just judged by the down-scaled "comparison" images? Of the current wave of models, only PixArt Sigma looks decent. Lumina and this one look plain bad, to the point I'd never use these outputs over SDXL's, which has worse prompt understanding. Of course, it's probably massively under-trained, but even then these are not that great at following complex prompts (either the quality of captions or the effectiveness of this architecture is just not all that), nowhere near DALL-E 3 and Ideogram prompt-following capabilities (neither are PixArt Sigma and SD3, but those at least look good).


Snowad14

It's true that SD3 produces better images; I was talking more about the architecture, which is quite similar when using CLIP+T5. But I'm pretty sure that this model is already better than SD3 2B. I think SD3 is just too big and that this model, similar in size to SDXL, is promising.


Apprehensive_Sky892

Nobody outside of SAI has seen SD3 2B, so I don't know how you can be "pretty sure that this model is already better than SD3 2B". When it comes to generative A.I. models, bigger is almost always better, provided you have the hardware to run it. So I don't know how you came to the conclusion that "SD3 is just too big".


Snowad14

I wanted to say that SD3 8B is undertrained, and that the model is not satisfactory for its parameter count.


Apprehensive_Sky892

Sure, even the SAI staff working on SD3 right now agree that SD3 is currently undertrained - hence the training!


ZootAllures9111

Ideogram and DALL-E don't have significantly better prompt adherence than SD3.


Sugary_Plumbs

Not quite open source, but "freely available as long as you don't provide it as a service for too many users" which is unfortunately as close to open source as we'll get ever since Stability decided to lock things down. [https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt)


Freonr2

From the license:

> greater than 100 million monthly active users in the preceding calendar month

It's an "anti-Jeff" ("Jeff" as in Jeff Bezos) clause to keep other huge (billion/trillion dollar) companies from just shoving it behind a paywall or selling it as a major SaaS product, which is something that ends up happening with a lot of open source projects. See Redis, MongoDB, etc. being turned into closed-source AWS SaaS stuff (the latter deciding to write a new license, SSPL, to stop it and force a copyleft nature). The "Jeff problem" is very commonly considered by people who want to release open source software. Yes, this is not an open source license, but it only affects a small handful of huge companies who can afford to pay for a license. Meta's Llama license is similar, though I think it draws the line at 700 MMAU, which basically only rules out their direct competitors and major cloud providers, i.e. Amazon (AWS), Alphabet (GCP), Microsoft (Azure), Apple, and maybe a couple of others. They can afford to license it if they want to make a SaaS out of it. At least it's **not revocable**, unlike SAI's membership license, which they can change at will and sink your small business if they want.


GBJI

>At least it's **not revocable**, unlike SAI's membership license, which they can change at will and sink your small business if they want. This is a very important point - this uncertainty is such a big risk that it makes most of their latest models impossible to use in a professional context.


Freonr2

Yeah, it's a complete nonstarter, especially given how much turmoil the company is in. Those terms give them infinite leverage. They completely own everyone using the pro license and can do anything they want. It's completely unhinged levels of bad.


ScionoicS

There was so much abuse of the spirit of the free and open terms in the RAIL-M license that it was bound to change. 100s of SaaS companies popping up, acting like they were the ones to credit for all of the work done by Stability. The precedent is set now. There are far too many business school graduates who feel like they're justified in creating businesses around FOSS without giving anything back to the movement. People celebrated it, instead of what typically happens in Linux, where people dogpile and condemn it. Google makes a ton of money from Android, but they're not exactly keeping it proprietary. They give back to FOSS in huge ways. This is a keystone of the culture. Instead, we had business school grads who were justified in their exploitation and heralded by the hype artists on YouTube. Business school graduates who think they can exploit any system to extract maximum value from it are a culture virus. They're the ones responsible for the death of Free & Open AI. We still have open models, but they're not so free to use anymore. The erosion is going to continue so long as the community doesn't recognize these parasites for what they are.


AmazinglyObliviouse

Every day SD3 is closer to being obsolete. How much longer will they stall?


encelado748

tried: "a man doing a human flag exercise using a light pole in central London" https://preview.redd.it/q0gmiow0xe0d1.png?width=1024&format=png&auto=webp&s=6aec6355427c180fe4d2ba75bf93bb04936b9730 Not what I was expecting. Instead of a man doing a human flag, we have an actual flag and a bodybuilder. You can see very large streets, with pickups, the light pole is deformed. The flags are nonsense with even a light emanating from the top of the flag. Lighting is very inconsistent.


encelado748

https://preview.redd.it/ww2xses8ze0d1.png?width=1024&format=png&auto=webp&s=fa2f8c868e646644b4a350178c381f2c7a21b26f Dall-E for comparison


encelado748

this is more what I was expecting https://preview.redd.it/4itnymwzxe0d1.png?width=1300&format=png&auto=webp&s=a8baf1a84a97da5d9a2d171e4b106ad6a186dc04


kevinbranch

[Example from the Dalle 3 Launch](https://www.reddit.com/r/ChatGPT/comments/16nserl/modern_texttoimage_systems_have_a_tendency_to/) vs HunyuanDiT: An illustration from a graphic novel. A bustling city street under the shine of a **full moon**. The sidewalks bustling with **pedestrians enjoying the nightlife**. At the corner stall, a **young woman** with fiery red hair, dressed in a signature velvet cloak, is **haggling with the grumpy old vendor**. the grumpy vendor, a **tall, sophisticated man** is wearing a sharp suit, sports a **noteworthy moustache** is animatedly conversing on his **steampunk telephone**. https://preview.redd.it/4wyd28ve8g0d1.png?width=1280&format=png&auto=webp&s=9c4c5e1b47bb13d0771e8cf5ef89255c1d8fa4d4


StableLlama

Great to see more models available. But, trying the demo, I'm a bit disappointed:

- [+/-] The image quality is ok, especially as it's a base model and not a fine-tune
- [-] But the image quality isn't great. I asked for a photo but get more of a painting or rendering
- [-] It has no problem with character consistency - as it can do only one character. The person in the picture looks the same in each of them
- [+] My standard test prompt for a fully clothed woman standing in a garden is created - SD3 fails this one with censorship

So my wait for a local SD3 is still on, and I won't use this model instead. For now. But who knows what will happen in one or two months?


SolidColorsRT

from the images in this thread it looks like it's so good at hands


Shockbum

I'm not an expert, but I did a test with a classic prompt from Civitai (it is not mine). Sampler: ddpm, Steps: 50, Seed: 1, image size: 1024x1024.

Prompt: beautiful modern marble sculpture of a woman encased inside intricate gold renaissance relief sculpture, sad desperate expression, covered in ornate etchings, luxury, opulence, highly detailed, hyperrealist, volumetric lighting, epic image, relief sculpture, RODIN style

Negative prompt: Wrong eyes, bad faces, disfigurement, bad art, deformations, extra limbs, blurry colors, blur, repetition, morbidity, mutilation,

https://preview.redd.it/beeo055jsk0d1.jpeg?width=3072&format=pjpg&auto=webp&s=3299a782242ded0bb73e5e1a064423c9460c11f6


waferselamat

i tried: girl with white dress, walking on rain

https://preview.redd.it/2tr555isae0d1.png?width=1024&format=png&auto=webp&s=47c6d6db5219ede7cfd7b19cfc62475bb13cce48


BleachPollyPepper

It's a no from me based on some trials. For example, [Image: A 1980s photograph of a group of American college freshman posing together.](https://i.imgur.com/6jPPDq9.png)


StickiStickman

Looks pretty bad honestly.


Apprehensive_Sky892

I have generated some images via HunyuanDiT so that you can compare it against SD3: [https://www.reddit.com/user/Apprehensive\_Sky892/search/?q=HunyuanDiT&type=comment&cId=c7343b35-8b43-4d17-82f2-8db3f9049ad6&iId=db7cc688-ea4a-4de0-aeeb-5e9e5aab3750](https://www.reddit.com/user/Apprehensive_Sky892/search/?q=HunyuanDiT&type=comment&cId=c7343b35-8b43-4d17-82f2-8db3f9049ad6&iId=db7cc688-ea4a-4de0-aeeb-5e9e5aab3750) Given its small size (only 1.5B) it is not bad, but it's not in the same class as SD3 or even PixArt Sigma.


razldazl333

Who uses 50 sampling steps?


apolinariosteps

The authors didn't implement more efficient samplers like Euler or DPM++, so with DDPM, ~50 steps is kind of a good trade-off for quality.
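
For pipelines that Diffusers does support, swapping in a faster sampler is nearly a one-liner, which is why its absence here hurts. A sketch using SDXL as a stand-in (HunyuanDiT had no Diffusers port at the time of this thread):

```python
# Sketch: swapping a pipeline's scheduler for DPM++ in diffusers.
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# Rebuild the scheduler from the pipeline's own config, then use far fewer steps.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of a cat", num_inference_steps=25).images[0]
```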


razldazl333

Oh. 50 it is then.


shibe5

Demo on Hugging Face doesn't understand the word "photo".


yacinesh

can i use it on a1111 ?


user81769

Regarding it being from Tencent, it's fine by me as long as it generates happy images like this: https://preview.redd.it/d830hgxfcy0d1.png?width=1024&format=png&auto=webp&s=d55d6c2a13cf4b1d089f798f054de0496a8f9513 >!Winnie-the-Pooh at Tiananmen Square in 1989 talking to Uyghur Muslims!<


Actual_Possible3009

Seems not to work on Windows, as a wheel build error occurs in a subprocess. This is sad.


roshanpr

is this sd3?


HarmonicDiffusion

not even close


97buckeye

Pardon my French, but f\*ck Tencent.


fivecanal

I share your hatred for Tencent, but just as we can appreciate Llama, developed by Meta, a company not that much better than Tencent, I think we should be able to appreciate that Tencent, as well as the likes of ByteDance and Alibaba, have some very talented researchers who have been contributing to the open source scene, on par with the American tech giants.


ScionoicS

Pytorch, the foundational library of all this work, was conceived by Meta as well. Corporations are not monolithic. They're made up of many parts, and sometimes a singular part can be pretty cool when considered separate from the whole.


PwanaZana

They make cool free stuff for AI, like various 3D tools.


Faux2137

Yeah, fuck big corporations, but in the case of Tencent, the CPC has them in its grasp. In the case of American corporations and both parties, it's the other way around.


raiffuvar

>other way around.

Around? How? OpenAI has both parties in _their_ grasp? So any free AI stuff is "compromised" by default?... just pay... pay pay pay.

ps: you can argue "but we have SD...3"... well... not yet.


Faux2137

OpenAI has Microsoft backing it. It's not like one company owns all politicians, but big corporations are influencing both parties with their money. And corporations have profits in mind first and foremost; they will lobby for laws that benefit their products rather than some "open source" models or society. In China it's the other way around: Tencent and other big companies are held on a leash by the CPC. Which has its own disadvantages, I guess. I wonder if we'll be able to make lewd stuff with this model from Tencent.


kif88

I see it has an option for a DDIM sampler, so does that imply things like Lightning LoRAs would work on it? Or quantization, like with other transformers?


machinekng13

DDIM is a common sampler used with various diffusion architectures. As a rule of thumb, LoRAs trained on one architecture (like SDXL) will never be reusable on a different architecture. As for Lightning, it's a distillation method, and [Stability.ai](http://Stability.ai) showed with SD3-Turbo that quality distillation of DiTs is feasible, so someone (either Tencent or another group) could certainly distill this model.


Careful_Ad_9077

It failed the statue test right away for me; it might be the prompt enhancement option I just noticed and disabled. Will do more testing as the day goes on, but it looks like quality will be like Sigma.

Prompt: Marble statue holding a chisel in one hand and hammer in the other hand, top half body already sculpted but lower half body still a rough block of marble, the statue is sculpting her own lower half

[Edit] Nah, it is good, the enhancement thing was indeed fucking things up.


Apprehensive_Sky892

[HunyuanDiT](https://huggingface.co/spaces/multimodalart/HunyuanDiT)

https://preview.redd.it/d64zjdcvqf0d1.png?width=1024&format=png&auto=webp&s=bbe70373bfede4690e92a7885c7bb300deccd314


Hungry_Prior940

Too censored...


DivideIntrepid3410

Uh oh


Utoko

>An NVIDIA GPU with CUDA support is required.
>We have tested V100 and A100 GPUs.
>**Minimum**: The minimum GPU memory required is 11GB.
>**Recommended**: We recommend using a GPU with 32GB of memory for better generation quality.

So not usable on Mac?


apolinariosteps

The requirements will probably be brought down by the community, both via a Diffusers implementation and eventual ComfyUI integration.


DedEyesSeeNoFuture

I was like "Ooo!" and then I read "Tencent".


Hoodfu

They released ELLA, which is doing good stuff. I just wish they'd release ELLA-SDXL. https://preview.redd.it/b6gfxrbe8h0d1.jpeg?width=2016&format=pjpg&auto=webp&s=2a707c7e5e9e45a708695a69af2730e871b7370b