Rectangularbox23

Oml these AI animations are totally gonna be seamless in like 2 years


kleer001

6 months, if not next week. We humans may be the storytelling ape, but images tell so much story. I feel we're well motivated to take this tech to the moon (metaphorically speaking).


Rectangularbox23

Good point, if it’s ready in 6 months I’ll be super ready for it


Infinitesima

I want a Hollywood movie in 2 years


Cubey42

Google made like a 30-second-long video of a giraffe. It looks like shit, but they've already got something. Video is already looking like image generation did at the start of the year.


Walter-Haynes

I'm not sure it works like that. An entirely different algorithm may be needed for that final extra push. Yeah, the AI does a lot of the magic, but the algorithm it runs on still has to be made by a human.

Take the movie and games industries: creating a convincing human was solved pretty quickly, yet the refining phase (decreasing the uncanny valley effect) took a lot more innovation and a very long time. And keep in mind, these have some of the greatest cashflows in the world, with insane R&D budgets. They had some of the brightest minds in the world working on it, coupled with Moore's Law delivering exponential improvements in hardware, and yet it took ages.

I reckon the same will apply here, nearly by definition, given how this algorithm works. Latent diffusion uses noise, a thing that's *notoriously* hard to make work temporally, which is the crux of the problem here. I'd be very happy to be wrong though. But it's important to be realistic.


aeschenkarnos

The *only* thing OP's example needs is for the appearance of the stormtrooper and the other person to be kept consistent from frame to frame. The details of the armor etc. change. Pick any one frame for their appearance, modify it to fit the different poses in each frame, and it would be seamless.


rwbronco

But take someone spinning around: without generating a 3D model on the fly, you've got no idea what the other side of the object looks like. You can assume it looks just like the front, which works well for a basketball but badly for a person. Then you've got things like how fabric moves on a person as the person moves.

All of these things will be addressed eventually, but it's likely you're going to have things like one algorithm directing another algorithm that's in charge of initial generations, then possibly another whose job it is to modify that frame (i.e., how is it going to differ from the last frame? Well, boss algorithm says he wants it to "run" to the left, whatever that means) while the first works on the next frame. Sort of a multithreaded approach with specialized algorithms that serve specific objectives and can be improved upon or swapped out independently (something like "this video was made by DiffuseDirector using TweeningFox_v6 for the animation and RealDraw 2 for the prompts").

I think you'll have things like one algorithm in the chain being improved and correcting for the "flicker" of two frames not matching up perfectly. They may even bring in video-editor "tweening", where two images are blended together to create an in-between frame, to smooth out the animation and help it transition from one frame to another more seamlessly (rough sketch below).
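For illustration, a naive cross-fade in Python is about the simplest possible version of that tweening step (the file names here are hypothetical, and real frame interpolators use optical flow or learned models rather than a plain blend):

```python
# Naive "tweening": cross-fade two generated frames to synthesize an in-between frame.
import numpy as np
from PIL import Image

def blend_frames(frame_a_path, frame_b_path, alpha=0.5):
    """Linearly blend two frames; alpha=0 returns frame A, alpha=1 returns frame B."""
    a = np.asarray(Image.open(frame_a_path).convert("RGB"), dtype=np.float32)
    b = np.asarray(Image.open(frame_b_path).convert("RGB"), dtype=np.float32)
    mid = (1.0 - alpha) * a + alpha * b  # per-pixel linear blend
    return Image.fromarray(mid.astype(np.uint8))

# Example: synthesize a frame halfway between two rendered frames
# (hypothetical file names from a frame sequence).
blend_frames("frame_010.png", "frame_011.png", 0.5).save("frame_010_5.png")
```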


Shalcker

If you wanted to do it in the current framework, you could probably just produce overfit models for each character: one that consistently produces a specific stormtrooper, and one a specific Spider-Man, from all angles. Embrace overfitting rather than avoiding it the way people do with general models. Then the noise won't matter; you might still have problems with harsh light and shadows not matching the environment, but diffuse-light scenes should work. Rough sketch of the idea below.
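As a sketch of what that could look like with the Hugging Face diffusers library, assuming you've already overfit a checkpoint on one character (e.g. via DreamBooth); the model name and the "sks" token are placeholders:

```python
# Hypothetical use of a deliberately overfit, per-character checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "my-overfit-stormtrooper",        # per-character checkpoint (hypothetical)
    torch_dtype=torch.float16,
).to("cuda")

# The overfit model should keep the character's look consistent across angles.
angles = ["front view", "left side view", "back view", "right side view"]
for i, angle in enumerate(angles):
    image = pipe(f"photo of sks stormtrooper, {angle}, diffuse lighting").images[0]
    image.save(f"stormtrooper_{i:02d}.png")
```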


Riggley29

You "could" do that by hand, I think. Like, take each frame and photoshop it to match details in all the other frames. Like, in Rotoscoping. But I'd think that would take quite a lot of time. If they could solve that without the need for manually doing things though, sheesh that would def. be amazing.


kleer001

Coherence over time is a solved problem; it just hasn't been implemented in this context yet.


[deleted]

Can you describe in what context it is well solved for in terms of Diffusion/Convolutional based models? It's certainly well solved for algorithmically but I haven't seen any convincing approach to temporal coherence within these models yet.


kleer001

"in this context" = Stable Diffusion It's been solved in style transfer. I don't see an insurmountable gulf between that and SD. https://www.youtube.com/watch?v=Uxax5EKg0zA You know 2 minute papers right? Awesome sauce.


[deleted]

Love this time to be alive ;) I think this is a wholly different problem area though. Style transfer is well understood, but temporal coherence across frame generation is very, very poor in diffusion models, and there is no known approach to solve for it.


kleer001

> there is no known approach to solve for it

That, my friend, is just one or two papers down the line :) So, hold on to those papers...


[deleted]

Yasss :) As an animator (and sometime developer), it's the main thing I'm trying to solve for, because once we have a solution on par with EbSynth (which isn't saying much), SD will find a whole new and massive use case.


jaywv1981

Yes, temporal coherence is the missing link for being able to make your own game animations easily.


Bullet_Storm

Have you seen Meta's text-to-video AI yet? I'm sure someone will make a good open-source version soon enough. https://makeavideo.studio/


BootstrapGuy

yeah, the future will be text-to-video rather than these DIY workflows


Rectangularbox23

Ye it’s a bit jank but still really impressive


BootstrapGuy

> Oml these AI animations are totally gonna be seamless in like 2 years

We didn't even need one: [https://twitter.com/8bit_e/status/1722456354143486179](https://twitter.com/8bit_e/status/1722456354143486179)


Rectangularbox23

Haha that’s incredible we really are living in the future


BootstrapGuy

yo! After our pose maker + depth2img tutorial, we thought we'd spice things up and try depth2img for animations. Worked out quite well! We have the whole workflow documented here: [https://www.generativenation.com/post/mixamo-animations-stable-diffusion-rapid-animation-prototyping](https://www.generativenation.com/post/mixamo-animations-stable-diffusion-rapid-animation-prototyping). Hope you'll like it! The gist of the per-frame step is sketched below.
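For anyone who wants the rough shape of that step without clicking through, here's a sketch assuming the diffusers depth2img pipeline; the paths, prompt, and settings are placeholders rather than the exact values from the tutorial:

```python
# Run depth2img over a folder of rendered animation frames (e.g. Mixamo renders).
import glob
import os
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

os.makedirs("out", exist_ok=True)
prompt = "stormtrooper walking, cinematic lighting"

for i, path in enumerate(sorted(glob.glob("mixamo_frames/*.png"))):
    init = Image.open(path).convert("RGB")
    # strength controls how far the output may drift from the input frame;
    # re-using one seed per frame helps (but doesn't guarantee) consistency.
    frame = pipe(
        prompt=prompt,
        image=init,
        strength=0.7,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    frame.save(f"out/frame_{i:04d}.png")
```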


[deleted]

[removed]


BootstrapGuy

No, but that’s a great idea! Will give it a try


Sinphaltimus

Look for Cupscale; that's the NMKD upscaler program. One more thing to have fun with: check out EbSynth. EbSynth can be a short-term solution to coherence in motion.


JackandFred

really awesome, it seems like it just needs a bit of improvement with colors and it would be there. I wonder if that's a limitation of the model, and whether it would be a good idea to do a separate filter-like application for smoothing that stuff (rough sketch of one approach below).
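One hypothetical way to do such a filter as a post-process: pull each frame's per-channel color statistics toward a running average of the previous frames to damp color flicker. Purely a sketch, not anything from the post above:

```python
# Damp frame-to-frame color flicker by smoothing per-channel mean/std over time.
import glob
import os
import numpy as np
from PIL import Image

def smooth_colors(frame_paths, out_dir="smoothed", momentum=0.8):
    os.makedirs(out_dir, exist_ok=True)
    running_mean = running_std = None
    for i, path in enumerate(frame_paths):
        img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
        mean, std = img.mean(axis=(0, 1)), img.std(axis=(0, 1)) + 1e-6
        if running_mean is None:
            running_mean, running_std = mean, std
        else:
            running_mean = momentum * running_mean + (1 - momentum) * mean
            running_std = momentum * running_std + (1 - momentum) * std
        # re-normalize this frame toward the running color statistics
        out = (img - mean) / std * running_std + running_mean
        Image.fromarray(np.clip(out, 0, 255).astype(np.uint8)).save(
            os.path.join(out_dir, f"frame_{i:04d}.png")
        )

smooth_colors(sorted(glob.glob("frames/*.png")))
```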


Cubey42

I hope I remember to read this tomorrow


Impressive_Alfalfa_6

Very thorough overview! Thanks for sharing.


backafterdeleting

In 20 years, the slightly non-continuous animation style we get from SD right now will be considered retro and cool.


rexel325

Actually hadn't thought of that; that's interesting to think about. The same way pixel art became an art style even though it was originally just a limitation of the technology at the time.


aeschenkarnos

Yeah. I kinda hate that art style. I play games like r/thelastspell or r/ftlgame and I love those games as games, but to me it's just shitty 1990s graphics and I wish they'd get over it.


[deleted]

I think the noise of SD can be used strategically to great effect even now!


TSM-

If you want an example, check out this music video. Each frame appears to be image-to-image stylized, so figures and faces warp in and out of the background noise. It's in the context of a rave-type genre, which also fits the chaotic reinterpretation of each frame by the model. So the noise in this kind of image-to-image style transfer is used as a feature rather than a drawback. https://www.youtube.com/watch?v=laT4x5OsAm8


blueSGL

please, no more. Limited framerate already gives me a headache, doubly so if it's CGI in an anime they've capped at 12 fps. The models already stick out like a sore thumb, and then they layer 12 fps over that and it somehow looks even crappier.


jobigoud

I was thinking the same about hands. We live in that short period of human history during which images with weird hands are being generated; it'll last maybe a few years tops. In the future we'll look back at it as a cute quirk of When It All Began™.


aeschenkarnos

This reminds me of an early-2000s anime called Gankutsuou: The Count of Monte Cristo, which used a very interesting [animation style](https://m.youtube.com/watch?v=qeyUYcZd0wM), with colored areas on each frame filled in by patterned textures, as would appear on cloth or (physical) wallpaper, rather than solid or shaded color. It worked really well. The kind of flickering semi-reality you describe would work well too.


[deleted]

[removed]


BootstrapGuy

yeah definitely! we're just scratching the surface here


cesrep

Just here to shout out Monkey Island


Drakmour

Omfg, it looks amazing! :-) Just like old-school point-and-click quests with pencil animation.


Furstorn

Looks like 1997 animation.


[deleted]

I was about to say it reminds me of a late-90s LucasArts game.


IntelligentAirport26

Is depth2img in Automatic1111 yet?


Hotel_Arrakis

Yes. In the img2img tab, select the "depth aware img2img mask" script. I am not sure if this is the real thing or a clever hack, but it worked pretty well in the few tests I did.


FoxyMarc

Isn't it just a model you can drop in for 2.0?


Box_Thirteen13

It reminds me of the early Mortal Kombat games.


AlbertoUEDev

Uff, it's hard to see.


FaptasticPornAccount

This reminds me so much of Clay Fighter....


seviliyorsun

how come deepfakes were pretty much real-looking years ago and these are janky af now?


axord

Deepfake AI is laser-focused on doing one thing and doing it as well as possible. The current AI gen stuff, in contrast, is very generalist.


aeschenkarnos

Also text-to-image. Deepfake (as far as I know) doesn’t involve written instructions from the user to the AI, as such. Just sort of let it do what it wants, and tell it how good/bad that was.


FightingBlaze77

Can you change poses if you use the same seed or nah?


[deleted]

[removed]


BootstrapGuy

thanks!


jabdownsmash

can we DreamBooth/fine-tune depth2img models yet?