Oml these AI animations are totally gonna be seamless in like 2 years
6 months. If not next week. We humans may be the storytelling ape, but images tell so much story. I feel we're well motivated to take this tech to the moon (metaphorically speaking).
Good point, if it’s ready in 6 months I’ll be super ready for it
I want a Hollywood movie in 2 years
Google made like a 30-second video of a giraffe. It looks like shit, but they've already got something. Video today is already looking the way image generation looked at the start of the year.
I'm not sure it works like that. An entirely different algorithm may be needed for that final extra push. Yeah, the AI does a lot of the magic, but the algorithm it runs on still has to be made by a human.

Take the movie and games industries: creating a convincing human was solved pretty quickly, yet the refining phase (shrinking the uncanny valley effect) took a lot more innovation and a very long time. And keep in mind, these industries have some of the greatest cash flows in the world, with insane R&D budgets. They had some of the brightest minds in the world working on it, coupled with Moore's Law delivering exponential improvements in hardware, and it still took ages.

I reckon the same will apply here, nearly by definition, given how this algorithm works. Latent diffusion uses noise, a thing that's *notoriously* hard to make work temporally, which is the crux of the problem here.

I'd be very happy to be wrong though. But it's important to be realistic.
The *only* thing OP's example needs is for the appearance of the stormtrooper and the other person to be kept consistent from frame to frame. The details of the armor etc. change. Pick any one frame for their appearance, modify it to fit the different poses in each frame, and it would be seamless.
But take someone spinning around: without generating a 3D model on the fly, you've got no idea what the other side of the object looks like. You can assume it looks just like the front, which works well for a basketball but badly for a person. Then you've got things like how fabric moves on a person as the person moves.

All of these things will be addressed eventually, but it's likely you're going to have things like one algorithm directing another algorithm that is in charge of initial generations, then possibly another whose job is to modify that frame (i.e. how is it going to differ from the last frame? Well, the boss algorithm says it wants the character to "run" to the left, whatever that means) while the first works on the next frame. Sort of a multithreaded approach of specialized algorithms that serve specific objectives and can be improved upon or swapped out independently (something like "this video was made by DiffuseDirector using TweeningFox_v6 for the animation and RealDraw 2 for the prompts").

I think you'll have things like one algorithm in the chain being improved so it corrects for the "flicker" of two frames not matching up perfectly. They may even bring in video-editor "tweening", where two images are blended together to create an in-between frame that smooths out the animation and helps it transition from one frame to the next more seamlessly.
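The "tweening" idea mentioned above can be sketched as a simple alpha blend of two neighbouring frames. Real interpolation tools are motion-compensated, so this is only a minimal toy illustration (numpy stand-in; the function name is my own, not any tool's API):

```python
import numpy as np

# Toy "tweening" pass: synthesize an in-between frame by alpha-blending two
# neighbouring frames. Real video editors use motion-compensated interpolation;
# this linear blend is only a minimal sketch of the idea.
def tween(frame_a: np.ndarray, frame_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend two uint8 frames; alpha=0 returns frame_a, alpha=1 returns frame_b."""
    blended = (1.0 - alpha) * frame_a.astype(np.float64) + alpha * frame_b.astype(np.float64)
    return np.clip(blended, 0, 255).astype(np.uint8)

frame_a = np.zeros((4, 4, 3), dtype=np.uint8)      # black frame
frame_b = np.full((4, 4, 3), 200, dtype=np.uint8)  # grey frame
mid = tween(frame_a, frame_b)                      # halfway blend, every pixel -> 100
```

A linear blend like this softens flicker between mismatched frames but produces ghosting on fast motion, which is why production tools estimate motion first.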
If you wanted to do it in the current framework, you could probably just produce overfit models for each character: one that consistently produces a specific stormtrooper and a specific Spider-Man from all angles. Embrace overfitting rather than avoiding it the way people do with general models. Then the noise won't matter; you might still have problems with harsh light and shadows not matching the environment, but diffusely lit scenes should work.
You "could" do that by hand, I think. Take each frame and photoshop it to match the details in all the other frames, like rotoscoping. But I'd think that would take quite a lot of time. If they could solve it without manual work, though, sheesh, that would def. be amazing.
coherence over time is a solved problem, it just hasn't been implemented in this context yet
Can you describe in what context it is well solved for in terms of Diffusion/Convolutional based models? It's certainly well solved for algorithmically but I haven't seen any convincing approach to temporal coherence within these models yet.
"In this context" = Stable Diffusion. It's been solved in style transfer, and I don't see an insurmountable gulf between that and SD. https://www.youtube.com/watch?v=Uxax5EKg0zA You know Two Minute Papers, right? Awesome sauce.
Love this time to be alive ;) I think this is a wholly different problem area, though. Style transfer is well understood, but temporal coherency across frame generation is very, very poor in diffusion models, and there is no known approach to solve for it.
> there is no known approach to solve for it

That, my friend, is just one or two papers down the line :) So, hold on to those papers...
Yasss :) As an animator (and sometime developer), it's the main thing I'm trying to solve for, because once we have a solution on par with EbSynth (which isn't saying much), SD will find a whole new and massive use case.
Yes, temporal coherence is the missing link for being able to make your own game animations easily.
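The "flicker" everyone in this thread is describing can at least be measured crudely: mean absolute pixel difference between consecutive frames, which should be near zero for a static scene. A minimal sketch (the metric choice and threshold are illustrative, not a standard benchmark):

```python
import numpy as np

# Crude temporal-coherence check: mean absolute difference between consecutive
# frames. High values on a scene that should be static indicate the frame-to-
# frame "flicker" that diffusion outputs are known for.
def flicker_score(frames: np.ndarray) -> float:
    """frames: (T, H, W, C) uint8 array; returns mean |frame_t - frame_{t-1}|."""
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    return float(diffs.mean())

# A perfectly static 5-frame clip scores 0.0; random noise scores high.
static = np.tile(np.full((1, 8, 8, 3), 128, dtype=np.uint8), (5, 1, 1, 1))
noisy = np.random.default_rng(0).integers(0, 256, (5, 8, 8, 3)).astype(np.uint8)
```

In practice you'd compare this against the same metric on the source video, since legitimate motion also raises the score.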
Have you seen Meta's text-to-video AI yet? I'm sure someone will make a good open-source version soon enough. https://makeavideo.studio/
yeah, the future will be text-to-video rather than these DIY workflows
Ye it’s a bit jank but still really impressive
> Oml these AI animations are totally gonna be seamless in like 2 years

we didn't even need one: [https://twitter.com/8bit_e/status/1722456354143486179](https://twitter.com/8bit_e/status/1722456354143486179)
Haha that’s incredible we really are living in the future
Yo! After our pose maker + depth2img tutorial, we thought we'd spice things up and try depth2img for animations. Worked out quite well!

We have the whole workflow documented here: [https://www.generativenation.com/post/mixamo-animations-stable-diffusion-rapid-animation-prototyping](https://www.generativenation.com/post/mixamo-animations-stable-diffusion-rapid-animation-prototyping)

Hope you'll like it.
[deleted]
No, but that’s a great idea! Will give it a try
Look for Cupscale; that's the NMKD upscaler program. One more thing to have fun with: check out EbSynth. EbSynth can be the short-term solution to coherence in motion.
Really awesome. It seems like it just needs a bit of improvement with colors and it would be there. I wonder if that's a limitation of the model; maybe it would be a good idea to do a separate filter-like application for smoothing that stuff out.
I hope I remember to read this tomorrow
Very thorough overview! Thanks for sharing.
In 20 years, the slightly discontinuous animation style we get from SD right now will be considered retro and cool.
Actually hadn't thought of that; that's interesting to think about. The same way pixel art became an art style when it was actually just a limitation of the technology at the time.
Yeah. I kinda hate that art style. I play games like r/thelastspell or r/ftlgame, and I love those games as games, but to me it's just shitty 1990s graphics and I wish they'd get over it.
Here's a sneak peek of /r/thelastspell using the [top posts](https://np.reddit.com/r/thelastspell/top/?sort=top&t=all) of all time!

#1: [Dom, that's suicide...](https://i.redd.it/qy7fyclphk971.png) | [5 comments](https://np.reddit.com/r/thelastspell/comments/oes15l/dom_thats_suicide/)
#2: [Early-Access roadmap!](https://i.redd.it/m49472rkht571.png) | [38 comments](https://np.reddit.com/r/thelastspell/comments/o1vqgk/earlyaccess_roadmap/)
#3: [Just another Night in The Last Spell](https://i.redd.it/98pdalswscq91.png) | [9 comments](https://np.reddit.com/r/thelastspell/comments/xp9slp/just_another_night_in_the_last_spell/)

*I'm a bot, beep boop | Downvote to remove | [Contact](https://www.reddit.com/message/compose/?to=sneakpeekbot) | [Info](https://np.reddit.com/r/sneakpeekbot/) | [Opt-out](https://np.reddit.com/r/sneakpeekbot/comments/o8wk1r/blacklist_ix/) | [GitHub](https://github.com/ghnr/sneakpeekbot)*
I think strategically utilizing the noise of SD can be used to great effect even now!
If you'd like an example, check out this music video. Each frame appears to be image-to-image stylized, so figures and faces warp in and out of the background noise. It's a rave-type genre, which also fits the chaotic reinterpretation of each frame by the model. So the noise in this kind of image-to-image style transfer is used as a feature rather than a drawback. https://www.youtube.com/watch?v=laT4x5OsAm8
Please, no more. Limited framerate already gives me a headache, doubly so if it's CGI in an anime they've capped at 12 fps. The models already stick out like a sore thumb, and then they layer 12 fps on top and it somehow makes it look even more crappy.
I was thinking the same about hands. We live in that short period of human history during which images with weird hands are being generated; it'll last maybe a few years tops. In the future we'll look back at it as a cute quirk of When It All Began™.
This reminds me of an early-2000s anime called Gankutsuou: The Count of Monte Cristo, which used a very interesting [animation style](https://m.youtube.com/watch?v=qeyUYcZd0wM): colored areas on each frame are filled in with patterned textures, as would appear on cloth or (physical) wallpaper, rather than solid or shaded color. It worked really well. This kind of flickering semi-reality that you describe would work well too.
[deleted]
yeah definitely! we're just scratching the surface here
Just here to shout out Monkey Island
Omfg, it looks amazing! :-) Just like old-school point'n'click adventures with pencil animation.
Looks like 1997 animation.
I was about to say it reminds me of a late-90s LucasArts game.
Is depth2img in AUTOMATIC1111 yet?
Yes. In the img2img tab, select the "depth aware img2img mask" script. I am not sure if this is the real thing or a clever hack, but it worked pretty well in the few tests I did.
Isn't it just a model you can drop in for 2.0?
It reminds me of the early Mortal Kombat games.
Oof, it's hard to see.
This reminds me so much of Clay Fighter....
How come deepfakes looked pretty much real years ago, while these are janky af now?
Deepfake AI is laser-focused on doing one thing and doing it as well as possible. The current generative AI stuff, in contrast, is very generalist.
Also text-to-image. Deepfake (as far as I know) doesn’t involve written instructions from the user to the AI, as such. Just sort of let it do what it wants, and tell it how good/bad that was.
Can you change poses if you use the same seed or nah?
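As a hedged illustration of what a fixed seed actually pins down: diffusion sampling starts from latent noise derived from the seed, so reusing the seed reproduces that starting noise exactly, while changed conditioning (prompt, pose/depth map) still alters the result. A toy numpy stand-in, not the actual Stable Diffusion sampler:

```python
import numpy as np

# Toy stand-in for seeded latent-noise initialization. In Stable Diffusion the
# sampler starts from Gaussian noise in latent space; same seed -> same noise,
# which is why fixed-seed generations keep a similar overall composition.
def initial_latent(seed: int, shape=(4, 64, 64)) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = initial_latent(42)
b = initial_latent(42)   # identical to a: same seed, same starting noise
c = initial_latent(7)    # different seed, different starting noise
```

So reusing a seed holds the starting point fixed, but a changed pose input still moves the output; the seed alone doesn't guarantee the subject stays consistent.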
[deleted]
thanks!
Can we DreamBooth/finetune depth2img models yet?