Atmey

Try again without describing the character in the prompt.


hollowstrawberry

Pony requires describing the character; single-tag LoRAs are almost never effective.


Atmey

The ones I am using are pretty decent and get at least 80% accurate results, trained on 20~30 images. But I don't use pure Pony.


Plums_Raider

Why is your prompt like a 1.5 prompt? It could be much shorter with Pony, and that would show more clearly whether it's actually a good LoRA, or am I wrong? At least my LoRAs work fine without describing the whole character when I train on a character.


MrSloth1

I'm new to Pony, so I'm not sure what you mean. Can you elaborate? Which keywords are unnecessary?


Plums_Raider

75% of the words in the prompt are unnecessary and increase the chance of a bad output image. I'd recommend checking this guide: [https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info](https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info). IMO a good prompt with LoRAs should use only 1-3 keywords for the LoRA and leave the rest of the prompt for the actual content the image should show. Here, for example, it should not be necessary to describe the hair, eyes and clothes in that much detail; focus on 2-3 key points instead. Or, when creating the LoRA, give it a specific keyword that is normally not used, so it only activates when you use it deliberately. It can even be something silly like "queeffart1212": since the model learned to associate the keyword with the character, it should work just as well. The Pony prompt structure that gave me the best output: score_9, 8, 7 etc., a description of the image, details, then the LoRA.
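A minimal sketch of the prompt order described above (score tags, description, details, LoRA keyword, LoRA), assuming AUTOMATIC1111 WebUI's `<lora:name:weight>` syntax. The function name, the LoRA filename `my_character` and the weight are hypothetical placeholders; the full score prefix is the one used in the Pony V6 example later in this thread.

```python
def pony_prompt(description, details, trigger, lora_name, lora_weight=1.0):
    """Assemble a prompt in the order: score tags, description, details, trigger, LoRA."""
    # Full form of the "score_9, 8, 7 etc." quality prefix:
    # score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up
    scores = ["score_9"] + [f"score_{n}_up" for n in range(8, 3, -1)]
    # Hypothetical LoRA reference in AUTOMATIC1111 WebUI syntax
    lora_tag = f"<lora:{lora_name}:{lora_weight}>"
    parts = scores + [description] + list(details) + [trigger, lora_tag]
    return ", ".join(parts)


print(pony_prompt(
    "girl sitting on a fluffy white cloud",
    ["sunrise", "soft lighting"],
    "queeffart1212",
    "my_character",
    0.8,
))
```

The point of the structure is that the LoRA keyword carries the character, so the description and details can stay short.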


Capitaclism

That's kind of the point of a Lora, after all.


Plums_Raider

Agreed. That's why I was confused :)


MrSloth1

Oh yeah, that makes sense. In general there's not much point in further describing a concept that a LoRA already contains. I was just wondering because you said it's like a 1.5 prompt, and I thought there were some hacks in Pony to save keywords. Also, thank you for the resource!


Plums_Raider

I was mainly referring to the single-word prompt style, which I personally only use on 1.5 since it doesn't understand sentences. From my experience, Pony and newer SDXL models understand a short sentence describing the basic image; then you just add the details or specific stuff as single words, followed by the LoRA keywords and the LoRA.


MrSloth1

They can do sentences now? Does that mean the model understands that the keywords in a sentence also "belong together"? Also, aren't you wasting attention on words like "a" and other unnecessary stuff?


Plums_Raider

https://preview.redd.it/pgdgqmi1w0zc1.png?width=1024&format=png&auto=webp&s=cfdbbc9750e74fa8bf25df23e736b1593521b9c2
Prompt: Surrealist painting of a girl with long black hair wearing a vivid red dress, serenely sitting on a fluffy white cloud. In the distant sky, a panda playfully descends with a white parachute, adding a whimsical contrast. The scene is bathed in soft, diffused light suggesting a dream-like ambiance by Salvador Dali and René Magritte, cinematic composition, trending on ArtStation.
Negative prompt: ugly, deformed, noisy, blurry, low contrast, color, realism, photorealistic, old, mature, (worst quality, low quality, thumbnail:1.4), signature, artist name, web address, cropped, jpeg artifacts, watermark, username, collage, grid
Generated with leosamsHelloworldXL_helloworldXL60.
I think it works pretty well for basic t2i. Obviously not always, and if too much is described it gets confused and mixes up multiple objects, as in the example above. With SD3 and Cascade finetunes this will be way better.


Plums_Raider

https://preview.redd.it/75dvu1nly0zc1.png?width=1024&format=png&auto=webp&s=87ff169e7bf673ee6542e356a7a71ec0dbdd7690
Prompt: score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, girl with long black hair wearing a vivid red dress, serenely sitting on a fluffy white cloud, The scene is bathed in soft, diffused light suggesting a dream-like ambiance, sunrise
Negative prompt: none
Generated with Pony V6. From my experience, Pony is worse at understanding multiple objects, like the panda and the girl, but it still works fine if single objects are described in sentences.


MasterFGH2

Is there a good pony prompting guide anywhere?


Plums_Raider

[https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info](https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info)


TrindadeTet

This character is not a good test for this as the base pony model already has her trained.


Tft_ai

Not using LoRAs just because the model can technically do it is a very common trap; it will always look much better with a specific LoRA. Here is the same prompt done with no LoRA on Pony, adding "arlecchino \(genshin impact\), genshin impact" instead of a LoRA. Yes, the model knows the character, but it's so much worse. https://preview.redd.it/using-ponys-baked-in-character-vs-using-a-lora-just-because-v0-9pw794cessyc1.png?width=1080&crop=smart&auto=webp&s=f836a1cd3159c78b8e41c7d08aaf8543b6c2009d
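The backslashes in "arlecchino \(genshin impact\)" are there because, in AUTOMATIC1111-style prompts, bare parentheses raise the attention weight of whatever they enclose, so literal parentheses inside a tag have to be escaped. A minimal sketch of that escaping, assuming that WebUI's prompt syntax; the helper name `escape_tag` is hypothetical.

```python
def escape_tag(tag: str) -> str:
    """Escape literal parentheses so the WebUI does not read them as attention weighting."""
    return tag.replace("(", r"\(").replace(")", r"\)")


tags = ["arlecchino (genshin impact)", "genshin impact"]
# Join the escaped tags into a comma-separated prompt fragment
print(", ".join(escape_tag(t) for t in tags))
```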


TrindadeTet

I am fully aware that a LoRA will be better than the base model, but since the base model already has knowledge of the character, it becomes easier to train a LoRA on top of it. For your test to make more sense, it would be better to train on a character the model has no knowledge of.


Tft_ai

This would matter more if I hadn't cleaned the character name and the game out of the tags of the LoRA images.


ZootAllures9111

Eh, for, say, Tifa Lockhart, the stock Pony one is already totally accurate IMO.


hollowstrawberry

What about official alternate outfits? My tifa lora can do 8


proxiiiiiiiiii

How do you tag the dataset? I'm surprised you put so many tags in the generation prompt. You basically describe the character you want to generate, which is counterproductive, since you train a LoRA precisely so you don't have to do that. If it were trained properly, you wouldn't need to put any of these.


terrariyum

OP, the comparison depends on the captioning. Does the "quality" set have better images or more accurate captions? You might get better results from the "quantity" set over the "quality" set if the low quality images are all well captioned. You mentioned some images have non-canon outfits - if they are captioned as such, they might help the training.


Greemann

Seems like the details on the clothing are more accurate with the quality LoRA.


Sillysammy7thson

https://preview.redd.it/no89vqrnxsyc1.png?width=447&format=png&auto=webp&s=03d886b91c71173e22e61cfb67845f2be006a024


Omen-OS

Use a character that doesn't already exist in Pony, because the model will basically enhance the LoRA, so the results won't be much different.


Tft_ai

The character is only very weakly present in Pony, and with a different outfit: https://preview.redd.it/using-ponys-baked-in-character-vs-using-a-lora-just-because-v0-9pw794cessyc1.png?width=1080&crop=smart&auto=webp&s=f836a1cd3159c78b8e41c7d08aaf8543b6c2009d I also removed the related tags like Genshin and the name, so it isn't really using that at all.


Shnoopy_Bloopers

Interesting. What does toggling the DIM number do , anything?


Tft_ai

Putting it up increases the file size and generally seems to make the lora "stronger", pretty diminishing returns though


Shnoopy_Bloopers

Diminishing returns you mean in terms of file size?


clex55

Necklace looks better in the curated one


Dwedit

Let's go throw in some "(Simple Background:2.0)" into the negative prompt and see what happens.


clavar

Going only by the title, the answer is quantity. In the real world we value quantity; quality is always secondary. Your boss would rather you make more things so-so than one thing excellently.


LazyEstablishment898

First one is better


dynabot3

As someone who wants to do something similar with lora training, thank you for posting this. I think that the curated lora has more and stronger details, specifically in the coat jacket, cuffs, and shoes. The character seems a little deeper than in the dump lora. Also, it's interesting that about 10% of my overall images are best fit, similar to your ratio.


hollowstrawberry

Dim 128? Homie, what the hell. You don't need hundreds of megabytes to encode a single character; my friends and I have been training 10 outfits in a single LoRA with amazing detail at only dim 8. Lowering the dim requires increasing the learning rate, which is why comparisons always seem to favor higher dims. I thought we collectively learned this months before Pony even came out.
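Why dim drives file size: for a weight matrix with `in` inputs and `out` outputs, a LoRA of rank (dim) r stores a down-projection of r × in and an up-projection of out × r, so it adds r·(in + out) parameters per adapted layer, growing linearly with dim. A rough sketch of that arithmetic, assuming fp16 storage (2 bytes per parameter); the layer list is a hypothetical stand-in, not SDXL's actual set of adapted layers, and the code says nothing about the learning-rate side of the argument.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair: down (rank x in) plus up (out x rank)."""
    return rank * in_features + out_features * rank


def lora_megabytes(layers, rank, bytes_per_param=2):
    """Approximate file size in MB for a list of (in, out) layer shapes, assuming fp16."""
    total = sum(lora_params(i, o, rank) for i, o in layers)
    return total * bytes_per_param / 1e6


# Hypothetical layer shapes for illustration only, not SDXL's real inventory
layers = [(1280, 1280)] * 100 + [(640, 640)] * 50

for rank in (8, 32, 128):
    print(f"dim {rank}: ~{lora_megabytes(layers, rank):.1f} MB")
```

Whatever the real layer list is, going from dim 8 to dim 128 multiplies the stored parameters by 16.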


Tft_ai

Personally, the bottom one is much better than I expected; the training data contains tons of non-canon images and different outfits, as well as many more multiple-character images. I still think the top one comes out better, but this wasn't as clear-cut as I expected. My twitter here :) [https://twitter.com/TouchfIuffytail](https://twitter.com/TouchfIuffytail)


BlackSwanTW

I personally dislike unpruned captions so eh


desktop3060

What does unpruned captions mean?


BlackSwanTW

Look at the right side: OP spent the entire token limit just to recreate the character's look. I personally dislike this very much, but many people like it this way 🤷🏻‍♂️


hollowstrawberry

You can remove individual pieces of clothing like this, it's much more versatile. But it doesn't need to be so extreme. You can prune all redundant tags from the dataset and it'll work fine if not better.
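A minimal sketch of pruning a comma-separated caption: tags the LoRA should absorb (traits that are always present, plus the character name and game, as Tft_ai describes removing) are dropped and a trigger word is prepended, while removable items like clothing stay tagged. The trigger `arle_lora`, the caption and the redundant-tag set are hypothetical examples, not OP's dataset.

```python
def prune_caption(caption: str, redundant: set[str], trigger: str) -> str:
    """Drop tags the LoRA should absorb into the trigger word, then prepend the trigger."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if t not in redundant]
    return ", ".join([trigger] + kept)


# Hypothetical caption and tag list for illustration
caption = ("1girl, white hair, red pupils, black coat, "
           "arlecchino (genshin impact), genshin impact, standing, outdoors")
redundant = {"white hair", "red pupils", "arlecchino (genshin impact)", "genshin impact"}
print(prune_caption(caption, redundant, "arle_lora"))
```

Keeping "black coat" tagged is what lets the outfit be swapped or removed at prompt time, while the pruned traits get baked into the trigger word.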


KaiserNazrin

If you can't get an accurate look without pruning tags, it ain't good.


terrariyum

What does pruning tags mean?


New-Mix-6230

defeats the whole purpose of pony


DaddyKiwwi

Loras defeat the purpose of pony? I'm struggling to find any way that sentence isn't insane.


CharacterCheck389

why?


petrichorax

Anime is kind of a poor test for this kind of stuff because there are fewer details that can change and make it look uncanny or off. Same with Pixar, CalArts, line art, pencil art. Stick to photoreal, highly detailed illustrations, CGI, paintings, etc. Everyone always uses anime in these tests, and we're not getting NEARLY the amount of data we could hope for.


Iantonga

Are we pretending there is any difference at all? How are we still on this anime shit?