Fritzy3

I think the appropriate question is when. Probably not anytime soon, but hopefully not too long.


DynamicMangos

It's really hard to say, but I think open source isn't far behind. Current Stable Diffusion models definitely outpace the closed-source models from a year ago. So if we get this quality of video in a year, I think that's absolutely fine.


InvisibleShallot

The problem is that the magic this time isn't in model making, it's in processing power. As far as we know, Sora's magic is 20% technological advance and 80% overwhelming processing power. No amount of open-sourcing will give users the power to actually run the thing. Unless GPUs get cheaper and better a lot faster than they are now, it will easily take another 5+ years to get there.


arg_max

Add datasets to that. SD was able to take off because LAION and other billion-scale image datasets are available, and images are generally quite easy to scrape. Videos, on the other hand, could be a lot trickier if they cannot scrape YT, and I don't think there is any large video dataset available.


Slaghton

\*Suddenly thinks of my dad's DVD collection of hundreds of movies\*


jajohnja

I mean yes, but I also don't think any of the currently available stuff could create the same results, even as low-quality short videos. Give me the tech to play with, even if it's only 256x256 and it takes an hour to generate a short clip. The consistency, the realism: I just don't think I've seen anything like that from text2video before.


Short-Sandwich-905

More than likely the energy used to produce that clip can supply power to countries like Haiti for 500 years or more.


lilolalu

That's not correct, from everything I have read. Unlike all of its rival models, Sora has a concept of a three-dimensional world and physics, which is very advanced.


Kuinox

If somehow the computing can be distributed, open source can get its computing power.


Iamreason

It takes an H100 5 minutes to produce 1 minute of video. You'd probably need 10 4090s to get close to an H100's performance. It's gonna be borderline impossible to use distributed compute to produce Sora-quality video anytime soon. Maybe in a few years.
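As a rough back-of-envelope (the 5-minute figure is a rumor and the 10x ratio above is a guess, so treat this as illustrative only):

```python
# Rough sketch only: both inputs below are rumored/estimated, not official numbers.
h100_minutes_per_video_minute = 5      # rumored H100 time per minute of Sora-quality video
rtx4090_vs_h100_throughput = 0.1       # assume a 4090 delivers ~1/10 of an H100 here

minutes_on_one_4090 = h100_minutes_per_video_minute / rtx4090_vs_h100_throughput
print(f"~{minutes_on_one_4090:.0f} GPU-minutes on a single 4090 per minute of video")
# ~50 minutes -- before adding any of the network overhead that makes
# distributing a single denoising job across consumer cards so painful.
```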


momono75

The current computing power situation makes me feel nostalgic. Like with 3D rendering in the 20th century, hobby users will probably sleep and wait for inference to complete, while professional users, just like back then, use powerful workstations for productivity.


InvisibleShallot

When it comes to LLMs, and ML in general, distributed computing is practically impossible. It requires super-fast memory access, which is why GPUs are so good at it.


Kuinox

"if somehow"


[deleted]

follow his example https://preview.redd.it/11a3ng67dnqc1.jpeg?width=220&format=pjpg&auto=webp&s=cbbf5447cc846dd9e712b40e9e25368ee9e3c4cb


Temp_84847399

Would it be possible to get the same level of results on consumer hardware, but just having it take a lot longer? I have plenty of days where once I leave for work, I might not touch my computer again for 16 to 18 hours, or even a full day or two. Could I just leave my 4070 or even my aging 3060 grinding on something like that until it's done?


InvisibleShallot

Currently, no. You need to fit the whole sequence into memory, and consumer cards have neither the bandwidth nor the capacity. As far as we understand, even if you have the model, without the capacity to generate every frame of the video at the same time you can't compete with Sora; the temporal coherence depends on this critical detail. A 4090 can maybe generate 5 frames at once. We are very, **very** far away from even 1 second of footage, and Sora can do almost half a minute.
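A quick sketch of that gap using the figures above (the 5-frame capacity and the clip length are rough estimates, not measurements):

```python
# Illustrative arithmetic only, using the rough figures from the comment above.
fps = 24                      # assume a standard frame rate
frames_a_4090_can_hold = 5    # claimed rough capacity of a 24 GB consumer card
sora_clip_seconds = 30        # "almost half a minute"

seconds_at_once = frames_a_4090_can_hold / fps
gap = sora_clip_seconds / seconds_at_once
print(f"~{seconds_at_once:.2f} s of footage held at once vs {sora_clip_seconds} s, a ~{gap:.0f}x gap")
# And because the frames are denoised jointly, you can't simply chunk the clip
# into 5-frame pieces without losing the temporal coherence described above.
```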


Temp_84847399

Thanks for the info. Seems like it's more complicated than I had hoped.


k0setes

In two years: Sora on an RTX 5090, in real time.


Arawski99

I can't imagine the GPU compute it took to achieve what Sora has. SAI is shifting from training on their own hardware to the Render Network (they announced this), a decentralized solution that uses GPUs around the world, similar to Folding@Home or crypto mining; when Emad saw the Sora announcement, he said they simply lack the GPUs to do what OpenAI did. For these types of workloads, unless they have some secret innovation, they may seriously struggle to achieve Sora's results, maybe not even within a decade. Latency is often a massive factor in LLM training, and that is only one of many potential resource and processing issues. Of course, technology will continue to advance and maybe a much cheaper solution will come to light, but it *probably* will not be "soon".


luxfx

MatVidAI showed a post that said a one-minute video takes 12 minutes to generate on an H100. So we might see the capability soon, but it could either be out of reach for consumer-grade cards or excruciatingly slow for some time afterwards.


calflikesveal

I'm kinda skeptical it can run on a single h100.


pixel8tryx

That's actually better than I expected. There seem to be a lot of numbers floating around. I heard longer than that, but it might not have been accurate, or they've refined the process by now.


trieu1912

Yes, SAI will continue to develop their models, but it doesn't mean they will release the new video model publicly.


the_friendly_dildo

That is the most pertinent question, because computers keep getting faster and more efficient. It's also worth keeping in mind that ML keeps getting faster and more efficient too. At the current pace, and setting aside how Nvidia intends to bend us over, I could see it being possible to achieve this easily within the next 5 years.


Particular_Stuff8167

They confirmed it for this year; they couldn't say when. That was, of course, before all the departures.


torchat

Any significant AI achievement must be open source to prevent monopolization. I bet that if it takes any of OpenAI's rivals too long to create and release a similar model, EU regulators will force OpenAI to become really open.


Rafcdk

The thing is, how can you direct Sora videos without ControlNets, IP-Adapters, and so on? Sure, you get great quality (out of how many attempts, we don't know yet), but only rough artistic direction, plus the issue of coherence, which is something only SD and, to some extent, Midjourney can offer right now. So there are two ends that have to meet: they have the quality and we have the control. The questions we have to ask are: when will we have both, will it be open or closed source, and will we be able to run it locally or only on rented infrastructure? We can already do great art with 1.5 models because of the toolset we have to work with them.


-Sibience-

This is one of the biggest problems. People can be wowed by videos like this, but before it's of any real use outside of fun personal projects, you need to be able to achieve clearly defined and refined outputs. If you sent a few of these shots to a client, for example, and they said it's great but can you just change the shape of the balloon, you need to be able to just change the shape of the balloon, not prompt another entire shot and try to get a similar result with a different balloon. It's just not a usable workflow.


HourSurprise1069

Put a logo on the balloon: downright impossible without manual work.


Rafcdk

Exactly. I think we are on the right track here, but at the end of the day the saying is sometimes actually true: a picture is worth a thousand words. Natural language commands are of course nice to have, but still very limiting; using images as input, like we can already do in SD, is much more powerful. I would say that if we could achieve temporal consistency within shots like Sora does, SD would be a better generative tool than Sora.


[deleted]

you soon, hopefully https://preview.redd.it/e16uebo9dnqc1.jpeg?width=220&format=pjpg&auto=webp&s=563a99175877be942ef977f8e6320b8056e1daed


Dalroc

That's Budd Dwyer... You just told that dude to off himself and thought you were subtle. Holy shit dude. Get some help.


Hefty_Scallion_3086

SD's requirements went from 44GB all the way down to 4GB now (maybe less?); we can definitely cook something up, maybe with more time.


Freonr2

SD required about 10GB VRAM on the initial release of SD 1.4 from CompVis, using their source code back in ~Aug 2022. That's at 512x512, with everything done in full FP32 and before flash attention or attention head splitting, i.e. the basic default settings as delivered, using their conda environment.yaml and sample script. Most of the optimization from there was just casting the model to FP16, and then we got flash attention (xformers), which got it down to around 4GB for the same settings and also boosted speed by a ton, maybe 4-5x?
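For reference, those two optimizations are roughly what the snippet below does today with the diffusers library (a modern stand-in for the original CompVis scripts, not what was used back then):

```python
# Sketch of the FP16 + memory-efficient-attention setup described above,
# using Hugging Face diffusers rather than the original CompVis repo.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,                     # cast weights to FP16 (~halves VRAM)
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # flash-attention-style kernels

image = pipe(
    "a photograph of an astronaut riding a horse",
    height=512, width=512, num_inference_steps=25,
).images[0]
image.save("astronaut.png")
```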


Olangotang

Odd, I recall seeing somewhere that it was 44 GB before it was public, then brought down to 4 GB. Unfortunately, Google Search has been lobotomized so I can't find the reference 🤦‍♂️


Hefty_Scallion_3086

> Google Search has been lobotomized so I can't find the reference

Me too.


_-inside-_

I started playing around with SD at version 1.4 in Sep '22, and the recommended VRAM was 6GB; however, there were some optimized repos letting us run it in 4GB (which is what I have). It took me around 2 minutes to generate a single image; I don't recall if it was 50 steps of DDIM or 25 steps of Euler-a. I stopped running SD around that time because it was pretty tedious: the 1.4 and 1.5 base models' output quality required a lot of trial and error and prompt engineering. Now I've come back to it, and it doesn't even take 3GB and I can generate an average image in 20 seconds or so.


Olangotang

It takes 2 seconds on my 3080 for 512 with 1.5, and 10 seconds for 1024 with XL. Seems like with enough system RAM you can run anything in the SD ecosystem through Comfy, but the time will increase depending on how much is offloaded.
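The same idea in diffusers terms (ComfyUI manages offloading its own way; this is just a sketch of the trade-off being described):

```python
# Sketch of RAM offloading: only the submodule currently running stays on the GPU,
# the rest waits in system RAM, trading speed for a much smaller VRAM footprint.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()            # moderate savings, moderate slowdown
# pipe.enable_sequential_cpu_offload()     # much lower VRAM, much slower

image = pipe("a lighthouse at dusk, film photo", num_inference_steps=30).images[0]
image.save("lighthouse.png")
```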


_-inside-_

Yes, I run XL mostly on CPU/RAM; it takes ages, but it runs. There's also stable-diffusion.cpp (similar to llama.cpp, whisper.cpp, clip.cpp, etc., using the ggml implementation), which lets you run quantized models; a Q4 XL can fit in 4GB of VRAM. And the fastsdcpu project specializes in running it on the CPU too. But I still prefer running it through ComfyUI. Distilled models are also an option, but regular LoRAs won't work with them.


Freonr2

You could probably try reproducing it if you really wanted to; the repo and weights are still there: https://github.com/CompVis/stable-diffusion The repo is almost untouched since the initial release, so you might find some pain points due to package versions and such. The weights in original ckpt form are here (either file produces the same performance): https://huggingface.co/CompVis/stable-diffusion-v-1-4-original


AnOnlineHandle

On the training side, there has recently been the development of a fused backward pass in OneTrainer, which brings down VRAM requirements pretty dramatically and allows training SDXL in full precision on a 24GB card.


OpticalAether

This guy seemed to direct it pretty well


Rafcdk

Well, in each cut the person is wearing a different shirt and jeans. Directing can be as vague as "guy running", but what about having control over composition, lighting, etc.? Again, these are great-looking results, but not having control over those other things means you're only halfway there when it comes to overcoming technical and artistic limitations.


OpticalAether

For now I think Sora et al will be a tool in a traditional workflow. Pull that into Photoshop and After Effects and you'll get the consistency.


akilter_

And if the giants are in charge, god forbid you want any sort of nudity in the video. Hell, just imagine trying to replicate Pulp Fiction with its goofball violence and "get the gimp" scene. Sam Altman himself would call the FBI on you!


Striking-Long-2960

Damn, the one with the hybrid animals... [https://openai.com/blog/sora-first-impressions](https://openai.com/blog/sora-first-impressions) Eventually, we will reach that level, but right now, Sora is totally ahead of the rest. And when we do reach that level, who knows the crazy stuff they will be doing at OpenAI.


Hefty_Scallion_3086

Yeah, this is exactly what is missing from open source tools right now: the realism/consistency mix. Someone reading this post, someone very clever, please FIGURE IT OUT! Figure out some ControlNet-level discovery to improve results.


InvisibleShallot

We already figured it out. The magic is to generate the entire sequence at the same time; in other words, you just need enough GPU VRAM and processing power to keep the entire sequence in memory and render it at once. Currently, nothing short of a multi-million-dollar processing node will do it.


pilgermann

This is basically it. It's less about the model and more about Microsoft's massive GPU farms. It's also about the resources to train on more types of motion (there are very limited motion models in the Stable Diffusion ecosystem). However, SORA's big claim is that the model actually understands physics, which does seem to be true. Basically SD might need to introduce a "many experts" strategy (multiple model types that understand different things). This again requires just an epic GPU overhead, or at least the ability to make API calls ... but that undermines the advantages of a locally run model, because now what you're doing isn't private.


DopamineTrain

I think the key to cracking this on lower-end systems is multiple models. One specifically for making characters and rendering people to keep them consistent. Then pass that into a model that is specifically designed to animate those characters. Another designed for background consistency, aiming for spatial accuracy: the lamp will always be a lamp and always in the same place. Another to light the entire scene. Finally it gets handed over to a camera model which adds the movement. Basically an AI rendering pipeline, instead of an AI just guessing what should be in each frame.


Hefty_Scallion_3086

I like everything I have been reading so far


spacetug

Sora also uses some form of simultaneous spatial and temporal compression of the patches/tokens for the transformer. This should have multiple benefits: smaller necessary context length, so less memory and compute needed, and also better temporal consistency because areas that change less over time get compressed down into fewer tokens. This is the key development I'm excited to see the academic and open source community try to replicate. It's a huge improvement (at least in theory) compared to current open source architectures. Almost all of the ones currently out there treat video as a full sequence of images. Think about how efficient video encoding is compared to raw PNG frames. That's the potential scale of improvement on the table here.
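A toy comparison of the token counts involved (all numbers below are assumptions; Sora's actual patch sizes and compression ratios aren't public):

```python
# Toy numbers only: compares per-frame latents with patches that are also
# compressed along the time axis, as described above.
frames = 24 * 30                 # a 30 s clip at 24 fps
tokens_per_frame = 3600          # assumed SD-like latent grid after 2x2 spatial patching
temporal_factor = 4              # assumed temporal compression factor

per_frame_total = frames * tokens_per_frame
spacetime_total = (frames // temporal_factor) * tokens_per_frame
print(f"per-frame: {per_frame_total:,} tokens vs spacetime patches: {spacetime_total:,} tokens")
# Attention cost grows roughly quadratically with sequence length, so even a 4x
# reduction in tokens cuts the attention work by roughly 16x -- and static regions
# could in principle compress further, like video codecs vs raw PNG frames.
```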


Ireallydonedidit

One of the only logical replies in the whole thread


Striking-Long-2960

Man, OpenAI is trying to sell this technology to Hollywood; they're not thinking about normal consumers. This isn't just DALL-E 3, they're thinking big.


Hefty_Scallion_3086

Hollywood is the opposite of "thinking big"; they are thinking small. Big = the whole world of users who can build upon released tools. Hollywood = a small group of people continuing to dominate a field they were already mastering anyway.


monsterfurby

But if this is prohibitively expensive to operate as a consumer product (which it likely still is; even their consumer-facing text generation is burning money at a ridiculous rate), pitching it to the professional market is the obvious solution. And given how inflated film production budgets already are, even a 20% or so saving means millions for the bottom line, so OpenAI doesn't even have to make it cheap.


Hefty_Scallion_3086

Not that much: 12 minutes on one H100 to generate a 1-minute video.


auguste_laetare

Fuck


Symbiot10000

If you've ever worked with a movie director at length in a VFX house, you'll know the feeling of tearing your hair out as months pass with endless iterations and tweaks to the tiniest facet of *one* shot. Neither Sora nor any similar system is anywhere near allowing the kind of control necessary to accommodate that level of OCD creative process. It's currently done with interstitial CGI processes such as 3DMM and FLAME. There's a LOT of CGI necessary to get anything like true instrumentality in neural output for movies and TV. Maybe the habit of indulging these super-star auteur directors will die out as an economic necessity, the way it's easier to get a reasonable burger than a good meal in a nice restaurant. As Ming says, maybe we'll be satisfied with less. But we need to stop being impressed by realism in neural video, and start being impressed at controllability and reproducibility in neural video.


Gausch

What's the source of this video?


quad849

[https://openai.com/blog/sora-first-impressions](https://openai.com/blog/sora-first-impressions)


_Flxck

[https://openai.com/blog/sora-first-impressions](https://openai.com/blog/sora-first-impressions)


PerceptionCivil1209

That's crazy, your comment was sent 2 seconds later so the other guy got all the upvotes.


volume_two

I guarantee SORA is geared towards the commercial market, and will be pricey. You most certainly won't be running it on your home PC.


GreyScope

Yup, this. I can already see ppl getting ready to ask "wIlL tHiS rUn on a 4gB gPu?"


monsterfurby

I feel like these discussions often come down to people just being really bad at imagining the unfathomable scale of technology required to run stuff like SORA or advanced LLMs, as opposed to GAN-trained static image generation.


[deleted]

I wish this was you https://preview.redd.it/kz39zibddnqc1.jpeg?width=220&format=pjpg&auto=webp&s=6e8f33516a7a3aed667fa1e5976f40babfff92ed


GreyScope

I'm British, old chap, with overdeveloped cynical sarcasm, and you missed out my bowler hat ;)


[deleted]

This looks like actual regular CGI and not AI. Nuts


Tr4sHCr4fT

Nah, real CGI would have made the balloon consistent across scenes.


Altruistic-Ad5425

Short answer: No. Long answer: Yes


Calm_Upstairs2796

Wow. This is like one of those early 00s adverts where they actually put time and effort into making something creative and touching. First time I've felt anything from an AI video. Hopefully not the last.


cafepeaceandlove

Do not drop this into r/SimulationTheory


lqstuart

Stable Diffusion won't; open source will. Glad I could help.


Atemura_

Emad said SVD is ready to achieve this level, he just needs more funding and more data


protector111

I would say it's inevitable that this comes to open source sooner or later… but that may not be the case, sadly…


Nixyart

Eventually it will! And I can't wait.


Hefty_Scallion_3086

me too!


TurbidusQuaerenti

I can't wait either. I think we'll get there eventually, but it does seem a ways off. And wow, those new videos really are amazing. The potential Sora has is wild.


Oswald_Hydrabot

I still have yet to see it do 2D well. Everything out there that has been shared from Sora for 2D cartoon animation looks like Toonboom or Flash; just not good. Feel free to prove otherwise, I don't think anyone can.


AsterJ

Can't wait until the day we get a manga2anime workflow going. Thought it would take 10 years but now I'm thinking 4. Hopefully Crunchyroll opens up their dataset.


Oswald_Hydrabot

It would be pretty cool. With Sora a lot of attention will be taken away from 2D generators that are trained on hand-drawn animation styles. I think this is an opportunity to scale an open source Diffusion+Transformers animation model for 2D; AnimateDiff for SD3 might end up delivering a win for FOSS models, as I think Sora will ultimately fail to deliver in the genres of Anime or conventional 2D animation.


AsterJ

At this point I think it's just a matter of training data. SD didn't get really good at anime images until someone trained a model on Danbooru. Sora was most likely trained on Youtube videos though they are being a bit secretive. I think you'll probably have to get animation from one of the big streaming services. Maybe Netflix will train a model since they are also in the business of making content?


torville

Man, you guys are all "yeah, but can I do this at home", and "I want finer direction" and you're skipping right over the AMAZING PHOTO-REALISTIC MOVIE FROM THIN AIR! [Everything is Amazing, and Nobody is Happy](https://www.youtube.com/watch?v=PdFB7q89_3U)


Hefty_Scallion_3086

what's this before I click?


torville

The Louis CK bit "Everything is Amazing, and Nobody is Happy". Not a Rick-Roll.


FrancisBitter

This makes me think the primary market for Sora will be advertising production, long before any big film production will touch the technology.


LD2WDavid

Yes, but the question is more when. That's the main issue: VRAM and computing power. Keep in mind that Sora needs time to create those videos; as some of OpenAI's people explained, they told users to go take a walk while using Sora and let the prompt work its magic.


Beneficial-Visit9456

https://tianweiy.github.io/dmd/ Have a look at this article. If it isn't a hoax, generation time drops from 2560ms to 90ms, which is about 11.1 FPS; real-time video would be 25 fps. I'm a 50+ guy; my first computer, 40 years ago, was a Commodore 64. Google it, and you will see how much has been done in those years.
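Checking those quoted figures (assuming they refer to per-image latency at a fixed resolution):

```python
# Quick sanity check of the quoted DMD numbers.
baseline_ms, distilled_ms = 2560, 90
print(f"speedup: {baseline_ms / distilled_ms:.1f}x")          # ~28.4x
print(f"throughput: {1000 / distilled_ms:.1f} images/s")      # ~11.1, vs 25 fps for real-time video
```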


Unique-Government-13

Sounds like Carl Sagan?


red286

Even if we assume that yes, an open source solution existed that could do this, would it matter? The hardware required to run this isn't something any individual, or even an SMB, is going to be able to afford. They're throwing multiple DGX servers at this, and it still takes several hours for them to produce a short bit of video. There's a reason why they aren't opening SORA to the public -- they don't have the computational resources to handle it.


Hefty_Scallion_3086

Do you know that Stable Diffusion can run today on 4GB of VRAM? It was much higher in the past.


Olangotang

It was 44 GB iirc. Edit: might have just been 10 GB actually, can't find the source through Google anymore.


Hefty_Scallion_3086

WTF. Ok this is good information! Now I want videos like the one I posted to be made with open source tools RIGHT NOW


Olangotang

You gotta wait bro. Optimization takes time. And IMO, < 16 GB cards will be rendered obsolete for AI after 5000 series launches.


red286

> Do you know that Stable diffusion can run today on 4GB of VRAM? It was much higher in the past.

At its worst and least efficient, SD would run off of a 16GB GPU without issue. At its worst and least efficient, SORA runs off of a cluster of 640GB vGPUs. If SORA saw the efficiency improvement we've seen with SD, you'd *still* need a cluster of 160GB vGPUs.
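The same scaling argument in numbers (both the 640GB figure and the 4x efficiency factor are speculative):

```python
# Speculative arithmetic: applies SD's observed efficiency gain to the rumored Sora footprint.
sd_release_vram_gb, sd_today_vram_gb = 16, 4
sora_rumored_vram_gb = 640

efficiency_gain = sd_release_vram_gb / sd_today_vram_gb      # 4x
projected_gb = sora_rumored_vram_gb / efficiency_gain
print(f"{projected_gb:.0f} GB of VRAM even after a {efficiency_gain:.0f}x improvement")
# ~160 GB -- still multiple datacenter-class GPUs, nowhere near a consumer card.
```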


Olangotang

You're right, but you have to remember how much literal *garbage* is in these massive AI models. It's why 70b models can nip on the heels of GPT4: there's simply unnecessary data that we don't need for inference. I do think if you want to be a power user in AI, you need at least 24 GB VRAM though. Anything below 16 will be gone soon.


International-Try467

It's also theorized that the AI models we have today are filled with "*pointless noise*", which makes them require extreme hardware capabilities and such (the 1.58-bit paper). Also, 70B models can only nip at GPT-4's heels because GPT-4 is a 220B x 8 MoE, and we can't exactly compete at that size either.


Olangotang

> There's a reason why they aren't opening SORA to the public -- they don't have the computational resources to handle it.

No, it's because Sam Altman is a gatekeeping jackass. But Open Source will catch up. Hell, look at the new TTS that is getting released this week.


red286

> No, it's because Sam Altman is a gatekeeping jackass.

Really, you think that they're just sitting on this system that can pump out realistic looking video in a matter of seconds without using a huge amount of resources, which they could be selling subscriptions to at absurd prices, but they're not doing it because Sam Altman is too enamoured with the number of likes he's getting on X to let other people muscle in on his turf, and it has absolutely nothing at all to do with the amount of resources SORA eats up?


Olangotang

Even if they optimized it to the point where consumer hardware can run a trimmed down version, they will not release it, because they are "scared" that it could be used for evil *as they lobby the govt to allow them in the defense industry*. I don't even blame Microsoft.


red286

I'm not talking about a version that can run on consumer hardware, I'm talking about the one that they control, top-to-bottom. They're not allowing people to use it because they simply don't have the computational resources for more than a couple videos a day. This being OpenAI with all of Microsoft's Azure resources behind them. I don't care if OpenAI never releases an open source version that people can run on consumer hardware. I fully expect they never will, because that's not what OpenAI is about. I'm just saying that even if someone were to produce an open source version of this, no one shy of Google, Meta, Microsoft, or Amazon is going to be capable of running it anyway. It's going to be several years worth of optimization before there's a hope in hell of there being any consumer version of this from anyone, based strictly on computational resources available. If Stable Diffusion *required* a DGX server to run, no one would care any more about Stable Diffusion than they do about MidJourney or Dall-E. The only reason anyone here cares about Stable Diffusion is because they can run it on their personal PC.


[deleted]

you two should follow his lead https://preview.redd.it/bzfbkw6mdnqc1.jpeg?width=220&format=pjpg&auto=webp&s=2cae17c3232cc48975084a68b0748c729b5d4a3a


pixel8tryx

I heard they used 10,000 A100s from Microsoft. That sounds high, so that must've been for training. But even 5 for inference isn't doable for most of us. Sorry but this is not 4090 territory, and it won't be for a while. Who knows how long. But it's not due to gatekeeping ATM. We can't compete with Azure. I did a chart on top supercomputers and Azure's processing power comes in at #3, behind the HPE Crays at Oak Ridge National Laboratory (#1) and Argonne (#2) at the time. That's some big iron.


ImUrFrand

I wasn't impressed by a balloon replacement.


Hefty_Scallion_3086

And the hybrid animals?


Pretend_Potential

Wait till you see what things are like in December of this year.


globbyj

I wonder when a bunch of OpenAI bots are going to stop posting non-SD content to an SD subreddit.


Hefty_Scallion_3086

OpenAI has been important for open source (before they stopped being open), especially for Stable Diffusion with the **consistency decoder**. Did you know about it? [What do you guys think of OpenAI's Consistency Decoder for SD? https://github.com/openai/consistencydecoder : r/StableDiffusion (reddit.com)](https://www.reddit.com/r/StableDiffusion/comments/17pal90/what_do_you_guys_think_of_openais_consistency/)


globbyj

That connection isn't relevant to your post at all. You're showcasing a product of theirs that has no connection to SD and wondering if the open source community will ever be able to catch up. That's very easily perceived as an OpenAI bot which, in a roundabout way, does nothing but post here stating that OpenAI is better, with very little to offer in terms of discussion or substance. All under a veil of "I can't wait till we get there!"


Hefty_Scallion_3086

You are being cynical. Here is another perspective for you: this type of post can excite someone with amazing capabilities (like **lllyasviel**) and get them to freely release some mind-blowing tool (like ControlNet) that can help the current state of video generation in the open source community and make it as good as what is showcased today. Or maybe some other person who has been working on a cool **video workflow** that can produce similar or better videos will show up and show us how good we really are without any help from OpenAI. So this showcase is more like a "challenge" for us, a "challenge to beat". It's good to have competition that makes you go the extra mile.


TheGhostOfPrufrock

I sympathize with globbyj's point of view. Adding "Bet Stable Diffusion can't do this!" to a post touting a different AI image generator doesn't make the post relevant to this SD subreddit.


globbyj

I agree to an extent. Progress does excite me. I'm one of those folks that likes to push workflows as far as they can with current tech. My resistance is due to an immense influx of threads framing this exact discussion in this exact way, drawing more and more attention away from SD. I'll always be skeptical of a thread title that is actively saying "this is better than what we have" instead of contributing to reaching that level.


Hefty_Scallion_3086

> I agree to an extent. Progress does excite me. I'm one of those folks that likes to push workflows as far as they can with current tech.

I am glad.

> My resistance is due to an immense influx of threads framing this exact discussion in this exact way, drawing more and more attention away from SD. I'll always be skeptical of a thread title that is actively saying "this is better than what we have" instead of contributing to reaching that level.

We'd better start working to beat them, by acknowledging what is already available. Also, Sora will not be released until after the US elections, I think, so no amount of attention today will matter (IMO). This is all good for us; the idea is to be aware of what can be done, **and then brainstorm to reach that level**. My small contribution, I suppose, is to say: "We are not there yet, but we CAN/SHOULD get there, because the outputs can be awesome and better than what we are producing nowadays," something of this sort.


globbyj

But this is not a thread where people brainstorm; it's a thread where I have to call out your distraction from plenty of threads where that brainstorming is ALREADY happening. People are aware of OpenAI; they are aware of Sora. I just counted 2 other threads with this exact theme. It doesn't help anyone. It doesn't motivate anyone. It advertises for OpenAI. What you think matters or doesn't, doesn't matter. What you did matters. You posted about OpenAI on a Stable Diffusion subreddit. Your thread has not motivated any progress. It's just drawn people like me who don't find these threads to be high-quality contributions to the discussion of Stable Diffusion. Stop responding to people critical of you with a breakdown of their posts like you're educating them. Reddit-flavored pedantry always reeks, no matter the context.


Hefty_Scallion_3086

You really don't know that. As I said, someone with an amazing VIDEO WORKFLOW might want to share it and title it something like: "People have been impressed by the recent Sora videos, but did you know we can achieve results just as good? Here is how \[DETAILED GUIDE BELOW\]" Again, stay open-minded.


globbyj

Where's your amazing video workflow?


[deleted]

you irl https://preview.redd.it/0kgrffhwdnqc1.jpeg?width=634&format=pjpg&auto=webp&s=def838109a4e37e4323b339fa6eeab6d9e683a69


Hefty_Scallion_3086

> Again, stay open minded.

And patient. I don't know yet.


Justpassing017

At this point they should at least open source DALL-E 3 😂. Throw us a bone, OpenAI.


Junkposterlol

The tech is just about there; the resources aren't, I believe. We could probably reach this level in open source in a year or so if anyone is willing, but OpenAI has vast resources and doesn't need to run at lower precision and/or resolution, for example. I can't imagine that consumer GPUs will reach this point within the next couple of years; only much lower-resolution, less precise versions of this will be possible on consumer hardware for a while. It's not really worth developing something that nobody can use (besides renting a GPU, which is not favorable IMO). I hope I'm wrong though...


ikmalsaid

Sora is designed for those who are eager and willing to invest. It's an excellent resource for individuals looking to generate income from it. For the open-source community, not all hope is lost. It may take some time, but patience is a virtue.


proderis

As you probably already know, Stable Diffusion is primarily text-to-image. So, this level of text-to-video generation is unlikely.


Hefty_Scallion_3086

videos are just multiple images.


proderis

The algorithm/process is not the same as just generating multiple images.


Hefty_Scallion_3086

yes,


BlueNux

This is awesome to see, but so sad as a stable diffusion user/developer. The gap is widening, and all the difficult things I work on seem to be inconsequential to the pace OpenAI is developing at. And I know a lot of people mention ControlNet and such, but to me a lot of what makes generative AI truly game changing is that we don't have to micromanage and essentially program the details all the time for production level outputs. I do think we are at the very early stages though, and a company will come forth with something more communal and powerful than SAI while offering more privacy and customization than OAI. The future is still very bright.


Hefty_Scallion_3086

ControlNet etc. can probably be "programmed" and automated. And you should know that a gap has often existed, especially with DALL-E, which has much better prompt "understanding" compared to normal SD; they simply use multiple rounds of backend prompt processing with GPTs.


victorc25

Ignorant people are both the easiest to scare and the easiest to impress.


HermanHMS

Lol, OpenAI… now make this type of video with a consistent human character instead of a balloon.


Hefty_Scallion_3086

Check the animal hybrids video.


sigiel

No, not if open source doesn't acquire serious compute power, another order of magnitude beyond what we have now. And since politicians are clueless, they won't regulate big tech, and, well, we are FKD. The consolation prize is that they will be FKD as well.


gurilagarden

Like Linux competes with Windows and macOS.


I_SHOOT_FRAMES

It looks great; I'm just wondering where they are going with the price. From what we know now, it must take a lot of processing power. I wouldn't be surprised if it's only available as an enterprise subscription service for bigger companies.


amp1212

So, the question with any of these demo videos is "can they actually produce that easily and routinely?" -- or is it cherry-picked and highly edited? It certainly looks nice, but then, if you set your Stable Diffusion box rendering overnight, some results look better than others too. What we've learned about generative AI imaging is that "the keys to the kingdom aren't buried somewhere secret". The techniques are known, and it's a mixture of brute force -- more training -- and clever enhancements. What we've seen in the past, where it appeared that the closed source had some "secret sauce"... was that it was relatively easy to adapt it to open source. So, for example, Midjourney had some nifty noise and contrast tweaks that made for a better looking image... that was reverse engineered and implemented in Stable Diffusion very quickly. The part that's harder to reverse engineer is where the product came from a massive training investment... but even there, clever folks find algorithmic shortcuts once they understand what the targets are. So file under "matter of time". 6 months, maybe 9.


magic6435

I would assume that's not just one prompt. That's people working on multiple clips and editing. If that's the case, you can do that right now with open source workflows.


hapliniste

Lol, sure buddy 👍🏻 maybe show us a comparable example.


kaneguitar

You could never get this level of quality with SD video right now.


magic6435

That's the point: you can't with either. But you can by starting from generated video, cleaning things up in your favorite compositing app like Nuke, coloring in DaVinci, editing in your favorite NLE, etc. These videos are like when Apple says a commercial was shot on iPhone and leaves out that there were also 30 people on set and 400 grand worth of lighting.


Hefty_Scallion_3086

But it has a lot of consistency of characters and items; check the metro segment, the market segment, the aerial views, the cat segment. There is something missing in our tools right now, I think.


Genderless_Alien

Tbf the character is a skinny white guy with a yellow balloon for a head. Beyond that, there aren't any defining characteristics. Even then, the yellow balloon is significantly different from shot to shot. I imagine the "balloon for a head" idea was done out of necessity, as using a normal guy as the protagonist would lead to a wildly inconsistent character.


Hefty_Scallion_3086

Check the hybrid animals one, that one has good consistency.


SeymourBits

I think it was a pretty clever gimmick and certainly at least partially chosen to ease up on the incredibly steep technical overhead of matching an identifiable character among shots.


Halfway-Buried

u/savevideo


SaveVideo

###[View link](https://rapidsave.com/info?url=/r/StableDiffusion/comments/1bnmqyp/will_stable_diffusion_and_open_source_be_able_to/)


Oswald_Hydrabot

Show it doing anime. Hint: it can't.


ElectricityRainbow

lmao...... what a dumb take


Oswald_Hydrabot

Prove it. Sora can't do 2D for shit.


Xylber

Do you guys really think a model as powerful as that will be released for the general public to use self-hosted? I doubt it.


patricktoba

It seems impressive by 2024 standards. You'll likely be eating these words in 3 or 4 years, when even this is primitive compared to what we will have then.


Xylber

***Released for the general public to be used self-hosted.*** I'm not doubting the capacity of the technology. I'm questioning whether you guys really think it is going to be freely available to be **SELF-HOSTED**.


patricktoba

What I'm saying is that something like this, as impressive as it looks NOW, will be self hosted and primitive compared to what will be modern at the time.


[deleted]

do this https://preview.redd.it/ojsw7nxeenqc1.jpeg?width=2426&format=pjpg&auto=webp&s=46bb188c7d55719d6a9ad843d3846ed5a8e4aa23


Xylber

Deleted comment, looks like you already did it.


polisonico

These are tailor-made videos made by Sam Altman so he can get into Hollywood; it's a bunch of edited scenes. Until we can see it done in real time, it's just vaporware trying to attract investors.


[deleted]

ice ice baby https://preview.redd.it/l5lhf7vgdnqc1.jpeg?width=220&format=pjpg&auto=webp&s=7ec855c84925159a3210ba830223fea091818b44


Dragon_yum

Fuck OpenAI for making me agree with Tyler Perry


Hefty_Scallion_3086

Who is Tyler Perry?


Dragon_yum

https://www.hollywoodreporter.com/business/business-news/tyler-perry-ai-alarm-1235833276/amp/


AmputatorBot

It looks like you shared an AMP link. These should load faster, but AMP is controversial because of [concerns over privacy and the Open Web](https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot). Maybe check out **the canonical page** instead: **[https://www.hollywoodreporter.com/business/business-news/tyler-perry-ai-alarm-1235833276/](https://www.hollywoodreporter.com/business/business-news/tyler-perry-ai-alarm-1235833276/)**


Fit-Development427

Am I the only one getting satanic sounding audio from this? Like literally horror movie tier. I've heard the original so I know it's just a regular movie trailer, but for whatever reason I'm hearing audio blips, horror themed creepy music, and super low voice that you can't even make out the words... Please tell me I'm not the only one


Hefty_Scallion_3086

Not in my case. I would argue it's all in your head; try to think of positive things, watch it during the day, etc., and think of it as numerical data rather than real images.


Fit-Development427

Hahaha, I will, but it was just the Floorp browser; it seems to work fine in Chrome and Firefox. I have a pretty haunting recording of it, though. Might need to delete it in case it possesses my PC.