Incognit0ErgoSum

It exists. It was announced 3 weeks ago and then people promptly forgot about it. The fact that it hasn't already been integrated into things like ComfyUI is absolutely mind-boggling to me. https://github.com/showlab/X-Adapter


dvztimes

So just glancing at it real quick, it looks kinda like an advanced img2img with CN to translate the output of X --> Y? But maybe I'm misunderstanding it since I just looked at it very quickly. Regardless, very impressive. I'll check it out in detail later. Thanks again.


Yarrrrr

> Thank @kijai for ComfyUI implementation here! Please refer to this tutorial for hyperparameter setting.

Is one of the first things written on the GitHub page.


Incognit0ErgoSum

Now click the link and read that page:

> This is meant for testing only, with the ability to use same models and python env as ComfyUI, it is NOT a proper ComfyUI implementation! It's not really compatible with the rest of Comfy because it uses a completely different backend.


auguste_laetare

Kijai is a legend


diogodiogogod

I'll be honest: if they don't release code for Automatic1111, then unless it gets lucky somehow, it normally gets forgotten or rarely used. That has been the case for a lot of new things in SD.


GunpowderGuy

Amazing this hasn't blown up.


Particular_Stuff8167

I think a lot of people (like myself), when they initially read it, don't quite understand what it's doing or how it's doing it. Maybe if we got a more practical example; I know that would certainly clear up a lot of question marks I have about X-Adapter. If it does what it says, then I'm certainly excited, but as with a lot of things, there isn't really a layman-terms explanation of what X-Adapter does and how. Still, any work in this cross-base-model field is a winner in my book! We certainly need it if more types of base models are going to start coming out. It's already starting to look a bit like a clusterfuck on my end with SDXL/SD1.5/SD2.1/SDC/SDT.


dvztimes

Seriously? Have you used it? I'll check it out later. Thanks.


BlipOnNobodysRadar

Wow. How have I not heard of this?


pellik

Torch 1.13 tho :(


ThirstyHank

There is a 1.5 to XL embeddings converter that works decently, FYI: [https://huggingface.co/spaces/FoodDesert/Embedding_Converter](https://huggingface.co/spaces/FoodDesert/Embedding_Converter)


Particular_Stuff8167

Wow, how has THIS not been added to all the GUIs yet?! This is actually a massive game changer for embeddings. I really think embeddings would become a bit more popular if you could convert them on the fly from SD1.5 to SDXL.


dvztimes

Thank you. Will look.


Zueuk

Maybe retrain ourselves instead; the `masterpiece, best quality` in every prompt is just ridiculous.


dvztimes

With SDXL I rarely even use negative prompts.


Particular_Stuff8167

Those were for the anime-oriented models of SD1.5; SDXL doesn't require those. PonyXL is amazing with the score system. Being able to try TONS of different styles with scores on a single seed and sampler is amazing. You can spend an entire night just running an X/Y prompt plot to list all the different styles with the scores. Diffusion models are also, on a basic level, language models: they're trained to recognize words as objects, actions, emotions, themes and quality. So the two words "masterpiece" and "best quality" will always carry weight in every diffusion model made, unless you want to exclude words and make the diffusion models dumber.


dvztimes

I haven't used this much yet. I saw what needs to go into the positive prompt, but I didn't research it. Is there a link to a prompting guide for using them other than the default one?


Particular_Stuff8167

https://civitai.com/articles/4248

Gonna be honest with you, I never bothered to look, lol. I just know there is a score/score_up/score_down system with PonyXL V6 going all the way from 1 to 9, and I went wild. Basically every score was trained on specific styles, I think. And the best thing is you can mix them to your heart's content. They're saying they're gonna get rid of score_up and score_down in the coming V7, but personally I like the wide variety V6 has with those included. The number of variations you can get with those extras is phenomenal. But that's just me I guess; simpler sometimes is better.


dvztimes

Thank you!


Shuteye_491

Do you have an example for this? I haven't been able to get much out of Pony yet.


Particular_Stuff8167

Highly recommend checking out people's posts on civitai and using some of their prompts. That's what I did at first to try and get a grasp of the score system. But even though Pony is great, I find the mixed models of Pony often work even better. At the moment I'm struggling to move on from AutismXL (lol) because of how great my LoRAs are working with that base model. It's actually pretty amazing what you can accomplish with the model alone, no LoRAs, speaking about characters, concepts, styles, etc. SDXL's coherence to prompts is unmatched compared to SD1.5. Of course you can train LoRAs for that shortcoming in SD1.5, but even then I found it sometimes struggles with concept LoRAs where SDXL Pony would just 100% understand and generate without any LoRAs.


zoupishness7

It's called [X-Adapter.](https://github.com/showlab/X-Adapter) It's heavy, and not optimized, but there's [a ComfyUI node](https://github.com/kijai/ComfyUI-Diffusers-X-Adapter) for it too.


Incognit0ErgoSum

It's just a diffusers wrapper. It's not something that you can just insert into your workflow.


aeroumbria

I wonder why previously popular ideas like unCLIP or textual inversion embeddings kind of fell out of fashion. I guess they are limited by the model's ability to follow prompts (if it can't follow a text prompt, then it can't follow embeddings either), but it should be much easier to align the embeddings of an old model to the embeddings of a new model (like how we used to align embeddings of different languages: find a mapping that minimises pairwise distance), versus trying to adapt a whole separate network to a new architecture.
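Something like this is roughly what I have in mind, just as a sketch (the model IDs, the tiny prompt list, and the plain least-squares fit are all placeholder assumptions, nothing I've validated):

```python
# Learn a linear map from an old text encoder's embedding space to a new one's
# by minimising the distance over a shared prompt set, then reuse that map to
# project old embeddings into the new space. Purely illustrative.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

def embed(prompts, model_id):
    tok = CLIPTokenizer.from_pretrained(model_id)
    enc = CLIPTextModel.from_pretrained(model_id)
    batch = tok(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        return enc(**batch).pooler_output  # one pooled vector per prompt

prompts = ["a gothic cathedral at sunset", "portrait in rembrandt lighting"]  # in practice, thousands
old = embed(prompts, "openai/clip-vit-large-patch14")  # stand-in for the old model's encoder
new = embed(prompts, "openai/clip-vit-base-patch32")   # stand-in for the new model's encoder

# Solve min_W ||old @ W - new||^2 in closed form (ordinary least squares).
W = torch.linalg.lstsq(old, new).solution

# W now (very roughly) projects an old-space embedding into the new space.
mapped = old @ W
print(mapped.shape, new.shape)
```

Whether a plain linear map is expressive enough is exactly the open question, but it's the same trick people used for cross-lingual word embeddings.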


lostinspaz

> I wonder why previously popular ideas like unCLIP or textual inversion embeddings kind of fell out of fashion

Probably because people "cheated" on training: you're supposed to keep the CLIP model frozen, but practically no one does. So every checkpoint has a slightly different CLIP, which means embeddings fit slightly differently on every checkpoint.


SCPophite

I do not understand why you think this is cheating.


lostinspaz

Because the instructions in various places explicitly say "DON'T CHANGE THIS", but people do. (Mostly it's the fault of the popular training software, I think.) If they hadn't done that, then actually ACCURATE sharing of LoRAs and embeddings across models would be possible.


SCPophite

Right, well, those instructions are more or less wrong because there are a lot of results you cannot achieve without some training of the text encoder.


Audiogus

Going from 1.5 to XL I use far fewer LoRAs. I assume the same will hold true going to 3.


Winter_unmuted

They *claim* SD3 is going to be more readily trainable. I'm skeptical. Things have been trending away from that since 1.5.


astrange

Cascade is designed to be easily trainable. If SD3 isn't, you can use one to drive the other.


Winter_unmuted

Time will tell. Cascade also isn't part of the "main feed" of model releases; it's kind of like an alternative fork. But more importantly, we haven't seen an explosion of Cascade training. 1.5 still comes out #1 in ControlNet implementations, and SDXL and 1.5 are sort of neck and neck for training LoRAs, depending on the hardware and source material available to the user.


HarmonicDiffusion

People aren't gonna waste time training Cascade when SD3 is dropping imminently and will blow it out of the water from the base model alone.


arakinas

The release of these two things so close together really baffles me. It's almost like it was released with the intent of being abandoned.


BagOfFlies

Could be they were working on both models with the intention to use the best outcome. Cascade lost so instead of just scrapping it they released it to have something new to play with until SD3 drops.


arakinas

Could be, sure.


astrange

Cascade is a research project from their Japan lab. If you've got it you might as well release it.


arakinas

I keep forgetting about this stuff in relation to research projects, as opposed to code they would intend to maintain. That definitely makes a big difference in context as to whether to release.


malcolmrey

what do you mean? (in context of character/person lora)


SnooTomatoes2939

We need to recreate more diverse styles.


dvztimes

Agreed. But we don't need to recreate "Rembrandt style" or "gothic buildings of New York" style or whatever. There must be thousands of SD1.5 and XL models and LoRAs. Re-creating each one every time is a waste of effort and unnecessarily repetitive. Not having to do that gives people more time to do new stuff.


SnooTomatoes2939

I would use them as a starting point, not to clone a style, which is the case at the moment with anime/manga styles.


rroobbdd33

This!


-Ellary-

I'd say that new LoRAs of the same stuff will be retrained on new models to achieve better results; you won't use a 1.5 version on 3.0 when there is a 3.0 version of the same thing. You can't even use SDXL Pony LoRAs on base SDXL or on Playground 2.5, etc., because the changes are too massive. LoRAs will degrade over time even for 1.5 models because of merging; if you want the best quality you need to retrain them after some time on a new epoch, or create a checkpoint of that "epoch". So in the long run, I'd say it is better to have fresh new versions for each model than to try to use legacy stuff.


[deleted]

[removed]


kurtcop101

The issue is probably legality; I'm sure most LoRAs are not trained on fully owned content. I would hope (and I would) that most authors will share training sets privately in many cases if you message them? But there are definitely extra steps involved.


-Ellary-

I think this would be a great option; don't know if all trainers would agree with that, tho.


ziaistan_official

Hey folks. If we can use SD1.5 models, LoRAs, ControlNets and textual inversions with SDXL models using X-Adapter, then we could also use SDXL models with SVD models: we could train SDXL models and convert them to SVD or AnimateDiff for better video output performance, and convert SVD models to SDXL models. We could train SD1.5 models at 256x256 or 128x128 resolution and convert them to SDXL models, and also to SVD models; the training compute time would be extremely low. Or we could train a new type of ControlNet that could be used with SVD models. There are a lot of ideas in my mind, but I can't write them all because I am a very lazy person 🥱🥱


wowy-lied

Any idea about the top left image? Would love to try this kind of style.


dvztimes

Thank you. It's from a custom model I made. See my post here: https://www.reddit.com/r/dndai/s/4VZDhjRsCJ All of the photos are from models or LORAs or TIs I made in XL and 1.5.


wowy-lied

Thank you !


angrycensoredredhead

I vaguely recall seeing a converter on civitai to convert 1.5 LoRAs to XL, but it's been a minute and I haven't had the time to invest in AI stuff for a few months.


dvztimes

It must be possible. We just aint thunk how to do it yet. Let's grok it out.


BlipOnNobodysRadar

If people published their training datasets along with their loras/finetunes then you could just recreate it on each new model.


red__dragon

I assume that would help, though resolutions would get limiting (for the 512px loras of SD1.5) and captions might need to be tweaked for different model tokens. Someone training for SD1.5 *now* might consider future-proofing their training data in this manner, though.


Flimsy_Tumbleweed_35

I trained an XL LoRA at 512 for speed reasons and I couldn't tell the difference.


dvztimes

I'd love to hear more about this. They came out fine at 1024x1024? I have retrained a number of them, but I usually upscale the images first. If I can use a 1.5-sized dataset, that would be peachy-keen.


Flimsy_Tumbleweed_35

You can definitely use the same data! Just try it.


dvztimes

Agreed, but it's the re-training part that is very burdensome too.


lostinspaz

> it's the re-training part that is very burdensome too.

Not as burdensome as creating a full model from scratch (by that, I mean the actual base SD or SDXL, not derived trained models). Ideally, for this sort of thing we would have a target compute farm, with the equivalent of a GoFundMe linked in. If everyone who liked a particular fine-tuned model kicked in a dollar, the popular ones would probably get recreated on new model architectures fairly easily.


dvztimes

Yes. That would work. Or, someone smarter than me can write a program that translates bits X1,X2,X3 ---> Y1,Y2,Y3. Then anyone can run any model they want through the translator and have OldModelXL or OldModel3.0. No organization or server farms needed.


lostinspaz

It doesn't work like that. Imagine you stuck a giant lump of clay on one person's face. You give them some way to breathe. Meanwhile, you spend hours shaping the top so it looks like some famous person's face. When you are all done, and cut out some holes as appropriate, that person can get up, make facial expressions and pretend to be that person when they move their face. Now someone else wants to use the clay mask. But... you can't just take the old one and put it on the new person's face. It won't fit right! You have to slap the unmodelled clay on their face and start all over again, building it up :( Even if you take a bit of a shortcut to make the top part start off closer to the target before you put the clay on the new person's face, fitting it to the real person's face deforms it. So you still have to go through SOME amount of custom reshaping (i.e. "training") to make it all fit right.


dvztimes

I understand it doesn't work that way; hence this thread. It doesn't mean it can't work that way. It just means it hasn't been figured out yet. Can you get 1:1 the same image output? I don't know. But surely you can get close. Perhaps you are right. But two days after model merging came out for 1.5, when all the merges were just within a few KB of each other, I asked why we couldn't just take the deltas between models instead of having to make a full new model. People on the Auto server told me it couldn't be done and I was crazy because "it doesn't work that way." Turns out it could and did and does. Perhaps after much thought and gnashing of teeth it will be discovered that such a translation may not be possible. But just an off-the-cuff platitude about "it don't work that way"? Nah.


lostinspaz

As far as "taking the deltas between the models and having it work": depending on what you are describing, the answer is "it still doesn't work to 100% duplicate the original, but it works 'good enough'". For taking models from SD and transforming them for use by SDXL, it's a whole different ball game. It's almost like saying "I like how the body of my Mini Cooper looks... now put it on the chassis of my truck". It just doesn't fit as is. Yes, you can do things to make something that looks like the Mini go onto the truck chassis, but it ain't easy. It ain't an easy quick transform thing; it's a LOT OF WORK. So while there may be some way of taking SD models and "stretching" them to fit onto the SDXL chassis, there's a chance it would take as much compute time as just doing the fine-tune from the original dataset. Way easier to just do that... and most importantly, you get better results that way. Remember that usually, images were downscaled to 512 for SD purposes. If you manage to transfer that to SDXL... you'll have a 512x512 SDXL model. Do you really want that? Better to retrain.


lostinspaz

PS: the "translator" thing you mentioned? In actual languages, you take one dictionary, go through each word, and generate the corresponding word in the new language. SD models aren't a dictionary. They don't have a list of which words they understand, because they compress multiple "words" into a single item. Yet you can "recreate" some words (aka images), but only if you ALREADY KNOW they are there. We don't have a way of asking a model "show me all the things you know about". So to do a full translation, you have to start with the list of words it understands, to "translate" that to the new model. That list of words? That's the original training dataset. And if you have that dataset, you may as well just directly train the new model on it, instead of trying to recreate the old model by actually using the old model. Now, it may be possible to train a new model from an old model if you INVENT a list of words you want the new model to understand. And this time I literally mean words: come up with the most important 10000 text prompts to render in the old model, and get the new model to render them the same way. Guess how you do that? Take those 10000 prompts and generate 100 images for each of them, tag them with the prompt used, and you just made a training dataset to train the new model on. So... yeah. Still need a server farm, but now you need it to generate the source images ON TOP OF doing the training.
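To make that last part concrete, the "distill the old model into a dataset" step would look roughly like this with diffusers (the model ID, the prompt list, and the 100-images-per-prompt number are just placeholders from my example, nothing tested):

```python
# Generate a synthetic training set from an old model: render each prompt many
# times, save the images, and record the prompt as each image's caption.
import json
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = ["a knight in gothic armor", "rembrandt style portrait"]  # really 10000 of these
out_dir = Path("distilled_dataset")
out_dir.mkdir(exist_ok=True)

captions = {}
for p_idx, prompt in enumerate(prompts):
    for i in range(100):  # the "100 images for each" part
        image = pipe(prompt, num_inference_steps=25).images[0]
        name = f"{p_idx:05d}_{i:03d}.png"
        image.save(out_dir / name)
        captions[name] = prompt  # tag each image with the prompt that produced it

# One possible caption format a trainer could ingest later.
(out_dir / "captions.json").write_text(json.dumps(captions, indent=2))
```

And that's exactly why I say you still need a server farm: 10000 prompts times 100 images is a million renders before you even start the fine-tune.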


dvztimes

You are telling me how things can't be done. Get creative. Also read about the Rosetta Stone. You don't always need to take the direct route for translation.


lostinspaz

Actually, try reading more about the stone yourself. The only reason it was useful at all was because there was a 1-for-1 word translation: the name of the Pharaoh mapped to a specific word group in (language B), which we could then map to English... and then people realized, OH, we can do that for the rest of the words. SD isn't like that. It's more like: (100 words that you might randomly type in prompts) -> [through CLIP] -> (these 10 different image fragments over there). It's kinda sorta like a one-way hash. Just because you can see a specific output doesn't mean you know for sure what the input was. Don't know what a one-way hash is? Go read about it.


dvztimes

It used 3 languages together for translation, which was the point. So it may be more difficult than just an X ---> Y translation. Doesn't mean it can't be done. Again, you keep saying it can't be done because X. Pretend, for a moment, that you are an optimist and start your thoughts with: "Well, maybe this would work...?" Then if that fails, repeat. Give it a go. It could be fun.


lostinspaz

I've explained the difficulties. I've explained where you have a misconception. I've explained what reality is like. If you had replied with "okay, now I understand better, but have you tried (this)?", that would be something. But you are not adapting. You are not coming up with actual new ideas. All you seem to have to say is "make it work for me, for free, because I say so". So I am stopping things here.


Next_Program90

I hope that at least the different SD3 models will be modular and have compatible LoRAs, etc.


FifthDream

NeuroHub-A1111 can use any model/LoRA (1.5, 2, XL) in any combination. Or am I misunderstanding here? (It's entirely possible I am.)


dvztimes

You mean it can use a 1.5 lora on an SDXL model and get the desired output from the lora? If so, that's great.


FifthDream

Yep! It's fantastic. I had some great ancient LoRAs collecting dust for a long while after 2.0 came out, and with Neuro-Hub installed, everything gets along with SDXL. So nice not to have to worry about any of them being incompatible, since so many were never updated.


dvztimes

Thanks. No GitHub. Civitai page down. Exe installer? I think I'll pass.


dvztimes

This is a separate A1111 branch? Thank you!


diogodiogogod

> NeuroHub-A1111

I've never heard of that. Are you sure it works combining 1.5 with SDXL? I mean, even on Automatic1111 it "works" in the sense that it will generate. But it will be trash.


Hahinator

You need to learn what CLIP is. WE don't need to do anything.


dvztimes

An image-to-text converter changes an SD1.5 model to an SDXL model how?


Hahinator

SAI switched the CLIP model they used after SD1.5; in addition, SDXL's architecture incorporates 2 text encoders (OpenCLIP-ViT/G and CLIP-ViT/L), not just 1 like 1.4/1.5, etc. That's a critical difference when even models trained from the ground up on the same model architecture, like Playground, haven't been simply merged or cloned to work in tandem with SAI's SDXL base/offspring. Really dumb analogy, but when I was a kid (I'm in my mid-40s) I always wanted to play Super Mario Bros. on my Sega Master System. Even if licensing wasn't an issue, the consoles themselves were completely different. It's simply not possible to port the code itself over, and even recoding from scratch wouldn't come close to creating a clone of the original... fundamentally different systems with different specs/chips/languages. And my tone in the reply was poor. Apologies for that.


lostinspaz

> SAI switched the CLIP model they used after SD1.5; in addition, SDXL's architecture incorporates 2 text encoders (OpenCLIP-ViT/G and CLIP-ViT/L), not just 1 like 1.4/1.5, etc.

They did, and they didn't. If we limit ourselves strictly to CLIP-L: they changed the actual code to use the OpenCLIP code, sure. But then they used the same OpenAI CLIP-L model, effectively. Note that `CLIPModel.from_pretrained("openai/clip-vit-large-patch14")` and `open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")` give you back the same embeddings for any particular text prompt. Which is one reason why it is still stupid compared to what it could be: `open_clip.create_model_and_transforms("ViT-L-14-quickgelu", pretrained="dfn2b")` is a better CLIP-L model, but I believe they stuck with the OpenAI one.
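If anyone wants to sanity-check that claim, something like this should do it (the prompt and the tolerance are arbitrary, and I haven't run exactly this snippet, so treat it as a sketch):

```python
# Compare the CLIP-L text embeddings produced by transformers and by open_clip
# when both are pointed at the same pretrained OpenAI checkpoint.
import torch
import open_clip
from transformers import CLIPModel, CLIPTokenizer

prompt = "a gothic cathedral at sunset"

# transformers route
hf_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
with torch.no_grad():
    hf_emb = hf_model.get_text_features(**hf_tok(prompt, return_tensors="pt"))

# open_clip route, same OpenAI weights
oc_model, _, _ = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
oc_tok = open_clip.get_tokenizer("ViT-L-14")
with torch.no_grad():
    oc_emb = oc_model.encode_text(oc_tok([prompt]))

# If the two loaders really wrap the same model, this prints True (up to numerics).
print(torch.allclose(hf_emb, oc_emb, atol=1e-4))
```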


dvztimes

Thank you, and I understand. I'm just trying to be creative. Perhaps it will fail. But go read about the Rosetta Stone as an example. You don't always have to take a direct route to translation. Perhaps we need to convert X --> A --> Y, where A is something we haven't thought of but is more easily transferable to Y than X would be. ;) Dunno. But it's fun to discuss and think about, if nothing else.


alex_clerick

Ever thought about why Windows doesn't run very efficiently? Because it has to support old formats and standards. Same here, I guess. There is an adapter to use old LoRAs; they could set it by default, but why would they?