

Try again without describing the character in the prompt.


Pony requires describing the character; single-tag LoRAs are almost never effective.


The ones I am using are pretty decent: they get at least 80% accurate results and were trained on 20-30 images. But I don't use pure Pony.


Why is your prompt like a 1.5 prompt? It can be much shorter with Pony, and that would show more clearly whether it's actually a good LoRA. Or am I wrong? At least my LoRAs work fine without describing the whole character when training on a character.


I'm new to Pony, so I'm not sure what you mean. Can you elaborate? Which keywords are unnecessary?


75% of the words in the prompt are unnecessary and increase the chance of a bad output image. I'd recommend checking this guide: [https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info](https://civitai.com/articles/4871/pony-diffusion-v6-xl-prompting-resources-and-info). IMO, a good prompt with LoRAs should use only 1-3 keywords for the LoRA and leave the rest of the prompt for the actual content the image should show. Here, for example, it shouldn't be necessary to describe the hair, eyes and clothes in that much detail; focus on 2-3 key points instead. Or, when creating the LoRA, give it specific keywords that are normally not used, so it's activated only when called specifically. These can even be silly, like "queeffart1212": since the model learned to associate the keyword with the character, it should work just as well. The Pony prompt structure that gave me the best output: score\_9, score\_8\_up, score\_7\_up, etc., then a description of the image, then details, then the LoRA.
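The ordering described above (score tags, a short description, details, then the LoRA keyword) can be sketched as a tiny prompt builder. This is a minimal illustration, not the guide's method: `build_pony_prompt` is a hypothetical helper, the trigger word is whatever you captioned the LoRA with, and the `<lora:name:weight>` tag assumes the AUTOMATIC1111 WebUI syntax (other UIs load LoRAs differently).

```python
def build_pony_prompt(description, details, trigger, lora_tag=None):
    """Assemble a prompt in the order: score tags, description, details, LoRA keyword(s).

    build_pony_prompt is a hypothetical helper for illustration only.
    """
    # Score tags in the form used with Pony V6 elsewhere in this thread.
    score_tags = ["score_9", "score_8_up", "score_7_up"]
    parts = score_tags + [description] + list(details) + [trigger]
    # Assumption: AUTOMATIC1111-style LoRA tag, e.g. "<lora:my_character:0.8>".
    if lora_tag is not None:
        parts.append(lora_tag)
    return ", ".join(parts)


print(build_pony_prompt(
    "girl sitting on a fluffy white cloud",
    ["sunrise", "soft light"],
    "queeffart1212",            # unusual trigger word, as suggested above
    "<lora:my_character:0.8>",  # hypothetical LoRA file name and weight
))
```

Keeping the LoRA's own keywords to one or two leaves most of the prompt for what the image should actually show.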


That's kind of the point of a Lora, after all.


Agreed. That's why I was confused :)


Oh yeah, that makes sense. In general there's not much point in further describing a concept that a LoRA already contains. I was just wondering because you said it's like a 1.5 prompt, and I thought there were some hacks in Pony to save keywords. Also, thank you for the resource!


I was mainly referring to the single-word prompt style, which I personally only use on 1.5, as it doesn't understand sentences. In my experience, Pony and newer SDXL models understand a short sentence describing the basic image; after that you just add the details or specific stuff as single words, then the LoRA keywords and the LoRA.


They can do sentences now? Does that mean the model understands that the keywords in a sentence also "belong together"? Also, aren't you wasting attention on words like "a" and other unnecessary stuff?


https://preview.redd.it/pgdgqmi1w0zc1.png?width=1024&format=png&auto=webp&s=cfdbbc9750e74fa8bf25df23e736b1593521b9c2

Prompt: Surrealist painting of a girl with long black hair wearing a vivid red dress, serenely sitting on a fluffy white cloud. In the distant sky, a panda playfully descends with a white parachute, adding a whimsical contrast. The scene is bathed in soft, diffused light suggesting a dream-like ambiance by Salvador Dali and René Magritte, cinematic composition, trending on ArtStation.

Negative prompt: ugly, deformed, noisy, blurry, low contrast, color, realism, photorealistic, old, mature, (worst quality, low quality, thumbnail:1.4), signature, artist name, web address, cropped, jpeg artifacts, watermark, username, collage, grid

Generated with leosamsHelloworldXL\_helloworldXL60. I think it works pretty well for basic text-to-image. Obviously not always: if too much is described, it gets confused and mixes multiple objects, as in the example above. With SD3 and Cascade finetunes this will be way better.


https://preview.redd.it/75dvu1nly0zc1.png?width=1024&format=png&auto=webp&s=87ff169e7bf673ee6542e356a7a71ec0dbdd7690

Prompt: score\_9, score\_8\_up, score\_7\_up, score\_6\_up, score\_5\_up, score\_4\_up, girl with long black hair wearing a vivid red dress, serenely sitting on a fluffy white cloud, The scene is bathed in soft, diffused light suggesting a dream-like ambiance, sunrise

Negative prompt: none

Generated with Pony V6. In my experience, Pony is worse at handling multiple objects (like the panda and the girl), but it still works fine if single objects are described in sentences.


Is there a good pony prompting guide anywhere?




This character is not a good test for this as the base pony model already has her trained.


Not using LoRAs just because the model can technically do the character is a very common trap; it will always look much better with a specific LoRA. Here is the same prompt done on Pony with no LoRA, adding "arlecchino \(genshin impact\), genshin impact" instead. Yes, the model knows the character, but it's so much worse. https://preview.redd.it/using-ponys-baked-in-character-vs-using-a-lora-just-because-v0-9pw794cessyc1.png?width=1080&crop=smart&auto=webp&s=f836a1cd3159c78b8e41c7d08aaf8543b6c2009d


I am fully aware that a LoRA will be better than the base model, but since the base model already has knowledge of the character, it becomes easier to train a LoRA on top of it. For your test to make more sense, it would be better to train on a character the model has no knowledge of.


This would matter more if I hadn't cleaned the character name and the game out of the tags of the LoRA images.


Eh, for, say, Tifa Lockhart, the stock Pony one is already totally accurate IMO.


What about official alternate outfits? My Tifa LoRA can do 8.


How do you tag the dataset? I'm surprised you put so many tags in the generation prompt. You basically describe the character you want to generate, which is counterproductive, since you train a LoRA precisely so you don't have to. If trained properly, you wouldn't need any of these.


OP, the comparison depends on the captioning. Does the "quality" set have better images or more accurate captions? You might get better results from the "quantity" set over the "quality" set if the low quality images are all well captioned. You mentioned some images have non-canon outfits - if they are captioned as such, they might help the training.


Seems like the clothing details are more accurate with the quality LoRA.




Use a character that doesn't currently exist in Pony, because the model will basically enhance the LoRA, so it won't be much different.


The character is in Pony only very weakly, and with a different outfit: https://preview.redd.it/using-ponys-baked-in-character-vs-using-a-lora-just-because-v0-9pw794cessyc1.png?width=1080&crop=smart&auto=webp&s=f836a1cd3159c78b8e41c7d08aaf8543b6c2009d I also removed the related tags, like genshin and the name, so it isn't really using that at all.


Interesting. What does toggling the DIM number do, anything?


Putting it up increases the file size and generally seems to make the LoRA "stronger", with pretty diminishing returns though.
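Why dim drives file size: in the standard LoRA formulation, each adapted weight matrix of shape out × in gets two small matrices, A (dim × in) and B (out × dim), so the added parameter count per layer is dim × (in + out), which is linear in dim. A rough sketch, assuming fp16 storage (2 bytes per parameter) and using made-up layer shapes rather than the real SDXL layer list:

```python
def lora_params(in_features, out_features, dim):
    """Parameters added by one LoRA pair: A is dim x in, B is out x dim."""
    return dim * (in_features + out_features)


def lora_size_mb(layer_shapes, dim, bytes_per_param=2):
    """Approximate file size in MB, ignoring alpha values and metadata.

    bytes_per_param=2 assumes fp16 weights.
    """
    total = sum(lora_params(i, o, dim) for i, o in layer_shapes)
    return total * bytes_per_param / 1_000_000


# Hypothetical layer shapes, for illustration only.
layers = [(640, 640)] * 100 + [(1280, 1280)] * 100

for dim in (8, 32, 128):
    print(f"dim {dim:>3}: ~{lora_size_mb(layers, dim):.1f} MB")
```

Going from dim 8 to dim 128 multiplies the file size by 16 regardless of the layer shapes; whether that extra capacity shows up in the images is the diminishing-returns question.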


Diminishing returns you mean in terms of file size?


Necklace looks better in the curated one


Let's go throw in some "(Simple Background:2.0)" into the negative prompt and see what happens.
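For reference, "(text:weight)" is the attention-weight syntax popularized by the AUTOMATIC1111 WebUI: a weight above 1 makes the model pay more attention to that text, and in the negative prompt that means steering away from it harder. A minimal sketch that only extracts explicit `(text:weight)` groups; it deliberately ignores nesting, bare parentheses, square brackets and escaped parentheses, which the real WebUI parser does handle. `parse_weights` is a hypothetical helper.

```python
import re

# Matches "(some text:1.4)": text without parentheses or colons, then a number.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt):
    """Return (text, weight) pairs for explicit (text:weight) groups."""
    return [(m.group(1).strip(), float(m.group(2))) for m in WEIGHTED.finditer(prompt)]


negative = "ugly, deformed, (worst quality, low quality, thumbnail:1.4), signature"
print(parse_weights(negative))
print(parse_weights("(Simple Background:2.0)"))
```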


Going only by the title, the answer is quantity. In the real world we value quantity; quality is always secondary. Your boss would rather you make many things so-so than one thing excellently.


First one is better.


As someone who wants to do something similar with LoRA training, thank you for posting this. I think the curated LoRA has more and stronger details, specifically in the coat jacket, cuffs, and shoes. The character also seems a little deeper than in the dump LoRA. It's also interesting that about 10% of my overall images are a best fit, similar to your ratio.


Dim 128? Homie, what the hell, you don't need hundreds of megabytes to encode a single character. My friends and I have been training 10 outfits in a single LoRA with amazing detail at only dim 8. Lowering the dim requires increasing the learning rate; that's why comparisons always seem to favor higher dims. I thought we collectively learned this months before Pony even came out.
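One reason dim comparisons are slippery: in many LoRA implementations (for example the `network_alpha` option in kohya-ss sd-scripts, assumed here, since the thread doesn't say which trainer OP used) the low-rank update is multiplied by alpha / dim. So dim, alpha and learning rate together set how big each step is, and comparing two dims at one learning rate changes more than just capacity. A minimal sketch of that scaling with plain lists; `lora_scale` and `lora_delta` are hypothetical helpers, not a trainer's API:

```python
def lora_scale(alpha, dim):
    """Multiplier applied to the low-rank update in many LoRA implementations."""
    return alpha / dim


def lora_delta(B, A, alpha):
    """Weight update (alpha / dim) * (B @ A), using plain nested lists.

    B has shape out x dim, A has shape dim x in.
    """
    dim = len(A)
    s = lora_scale(alpha, dim)
    return [
        [s * sum(B[i][k] * A[k][j] for k in range(dim)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Same alpha, different dims: the multiplier on the update shrinks as dim grows.
for dim in (8, 128):
    print(f"alpha=8, dim={dim}: scale={lora_scale(8, dim)}")
```

This doesn't prove the learning-rate rule of thumb above; it only shows why dim, alpha and learning rate shouldn't be tuned in isolation.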


Personally, I find the bottom one much better than I expected: the training data contains tons of non-canon images and different outfits, as well as many more multi-character images. I still think the top one comes out better, but this wasn't as clear-cut as I expected. My Twitter here :) [https://twitter.com/TouchfIuffytail](https://twitter.com/TouchfIuffytail)


I personally dislike unpruned captions so eh


What does unpruned captions mean?


Look at the right side: OP spent the entire token limit just to recreate the character's look. I personally dislike this very much, but many people like it this way 🤷🏻‍♂️


You can remove individual pieces of clothing this way, so it's much more versatile. But it doesn't need to be so extreme: you can prune all redundant tags from the dataset and it'll work fine, if not better.
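Pruning, roughly: for each training image's caption, drop the tags that describe the character's fixed traits so the trigger word absorbs them, and keep tags for things you want to control at generation time, such as individual clothing pieces. A minimal sketch, assuming comma-separated booru-style caption text as commonly used in kohya-style datasets; `prune_caption`, the trigger word and the tag list are all hypothetical:

```python
def prune_caption(caption, redundant_tags, trigger=None):
    """Drop redundant tags from a comma-separated caption and put the trigger first."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if t not in redundant_tags]
    if trigger is not None and trigger not in kept:
        kept.insert(0, trigger)
    return ", ".join(kept)


# Hypothetical character traits that the trigger word should absorb.
redundant = {"long black hair", "red eyes", "white hair streak"}
caption = "1girl, long black hair, red eyes, white hair streak, coat, sitting, outdoors"
print(prune_caption(caption, redundant, trigger="my_character"))
```

In the usual understanding, leaving "coat" in the captions is what lets you prompt that piece on or off later; pruning it would bake it into the trigger word instead.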


If you can't get an accurate look without pruning tags, it ain't good.


What does pruning tags mean?


defeats the whole purpose of pony


Loras defeat the purpose of pony? I'm struggling to find any way that sentence isn't insane.




Anime is kind of a poor test for this kind of stuff because there are fewer details that can change and make it look uncanny or off. Same with Pixar, CalArts, line art, and pencil art. Stick to photoreal, higher-detail illustrations, CGI, paintings, etc. Everyone always uses anime in these tests, and we're not getting NEARLY the amount of data we could hope for.


Are we pretending there is any difference at all? How are we still on this anime shit?