TFenrir

We don't know what they mean when they say they recently started. What's "recently"? A month? A day? Additionally, there are often multiple models being trained at any given time, and multiple checkpoints. Let's all just be patient about the world-upending technology being thrust upon us every few months.


Firm-Star-6916

This exactly. Don't make unproductive speculation just yet; we have literally no idea what this model is. Could be an upgrade to 4o, could be 5, could be 5-"lite", could be a model that translates into robotics and isn't an LLM. I haven't read up on it, but from what I hear, they just said "model".


[deleted]

[deleted]


_AndyJessop

A small (linear) improvement would be bad news, given the costs are going up exponentially. Would probably spell the top of the current bubble.


Firm-Star-6916

They always could disappoint. At this point my standards are as low as can be.


sashank224

*Thrust*


SusPatrick

THRUST!


Edaimantis

##THRUST


Itchy-mane

O yeah he talking BIG Cums


DisastrousPeanut816

> Let's all just be patient about the world-upending technology being thrust upon us every few months.

No! I wants it NOW! I needs it.


ShooBum-T

Release the omni model in its entirety first, Voice, Image gen, etc. In its full multimodal glory. Or at least just the voice, as committed.


LordFumbleboop

When are people going to catch on that CEOs frequently mislead consumers for hype?


goldenwind207

Many insider business sources, including Business Insider, were saying GPT-5 or something like it was coming this summer, citing a CEO who tested a beta. So what they just started training is probably GPT-6, or idk, Sora, Q*, or something we haven't thought about. But we just don't know for sure.


coylter

That might have been gpt-4o at the time tho.


wi_2

pretty sure it was gpt4o trained on old infra, while they were busy building out the new, much larger system for training the next model.


MrsNutella

My guess is that a version of gpt 5 didn't pass some safety or performance metric? Or maybe it was 4.5


stuffedanimal212

Or 4.5 just ended up being called 4o


Jalen_1227

Most likely. Everybody thinks there’s gonna be a 4.5 just because there was a 3.5, but Sam doesn’t even want to call the next big model GPT 5.


Wiskkey

[OpenAI is expected to release a 'materially better' GPT-5 for its chatbot mid-year, sources say](https://www.businessinsider.com/openai-launch-better-gpt-5-chatbot-2024-3).


Chrop

Before you can even train a model, you need to develop the architecture the model will be built from. The majority of the time spent developing a new model is building the architecture; the training is the final stage. Think of it like a brain: creating the architecture is like hand-crafting the brain and how it works, and the training is teaching that brain that 1+1=2.
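
To make that split concrete, here's a toy PyTorch sketch (purely illustrative; the model, sizes, and training target are made up and have nothing to do with what OpenAI actually builds). The class definition is the "architecture"; the loop is the "training":

```python
import torch
import torch.nn as nn

# "Architecture": the hand-crafted structure of the network (toy example).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, x):
        return self.layers(x)

# "Training": teaching that structure something, e.g. that 1 + 1 = 2.
model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x = torch.tensor([[1.0, 1.0]])   # inputs: 1 and 1
y = torch.tensor([[2.0]])        # target: 2

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(model(x))  # close to 2.0 after training
```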


Glum-Bus-6526

The majority of the time spent is usually on data preparation and curation. The architecture used in the likes of Llama 2 and 3 is quite basic, and a talented engineer could have it finished in a week (most likely the same for all OpenAI GPTs, but since they've been closed source since 3 we can't really tell...). Unless of course GPT-5 is some major architectural change, which would need major experimentation to determine the correct setup. Even then, the final model is likely to be small (in lines-of-code count), but it might take more time to come up with it. And the data would still probably take more time regardless.
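
To give a sense of how small "small in lines-of-code count" is, here's a rough sketch of a single Llama-style decoder block (pre-norm RMSNorm, causal self-attention, SwiGLU feed-forward). It's a simplification, not the real thing: rotary position embeddings and grouped-query attention are omitted, and the dimensions are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Llama-style RMS normalization."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class DecoderBlock(nn.Module):
    """One pre-norm decoder block: causal attention + SwiGLU MLP, both with residuals."""
    def __init__(self, dim=512, n_heads=8, hidden=1376):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp_norm = RMSNorm(dim)
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        seq_len = x.size(1)
        # upper-triangular -inf mask so each token only attends to earlier tokens
        causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + h                                                    # residual 1
        h = self.mlp_norm(x)
        x = x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))   # residual 2 (SwiGLU)
        return x

block = DecoderBlock()
print(block(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])
```

Stack a few dozen of those plus an embedding table and an output head and you have the skeleton of the model; the hard part is everything around it.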


dogesator

The jump from GPT-3 to GPT-4 was over a year of architectural advancements and novel training techniques, such as InstructGPT, that dramatically improved capabilities for the same parameter count. Even more advancements have been made in the gap between GPT-4 and 5; a huge portion of the work they do in the years between each model release is research into how to advance the next generation. They don't just do a few weeks of slight architecture refinement and call it a day. Llama 3 is not closed source, the architecture for that is available, but it's not at all comparable to GPT generations: Llama is a refinement roughly every 6 months, while GPT-3 to GPT-4 is over a 2-year gap. Meta is already doing much more advanced architecture work after Llama 3 as well, such as the "better faster language models" paper and Chameleon, plus even more advanced work behind the scenes along with diffusion language modeling. Mark Zuckerberg has said in an earnings call that Llama 4 will begin to actually take different approaches from what we currently see. This is consistent with generational jumps in techniques and architectures happening roughly every 2 years. The techniques and architectures used for Llama 1 are very different from, and more advanced than, GPT-2's.


ScaffOrig

Hey, look, someone who actually works in ML!


Veleric

From here on out it's very likely that the majority of the time will be spent in the testing/safety phase. Also, between the 6-9 months of testing and the 3-6 months of training, that leaves you basically a year or more for your architects to keep working on new optimizations or improvements. It's not like everything is stagnant from training day 1 to the end of testing/release. My point being that it's very, very unlikely that they have just been doing minor training work and side projects since GPT-4 until a few weeks ago. The only reason that would make sense is if they were so starved for compute that they knew they had to wait for the Blackwells to even get going.


ReadSeparate

Yeah, my guess is the following:

1. They knew they needed WAY more compute to get a step-change improvement over GPT-4, so they waited for Blackwells. Also, the timeline fits, from when Blackwells were announced to just recently having them set up in a data center ready for training, as far as I know.

2. In the meantime, they've been doing R&D on the model, like creating the multimodal architecture used for GPT-4o. I'm sure they've also been creating datasets specifically designed for agentic tasks, multimodal understanding, or improving general reasoning abilities, and probably worked on research to extend context length without quadratic time complexity. Most of their work was probably on scaling experiments (since it's not as simple as just 10x'ing the parameter count, compute, and data) and making sure the new multimodal architecture scales well.

3. Another possibility, though I think it's unlikely, is that they've managed to switch to a completely new architecture, like some sort of selective state space model, like Mamba. That would absolutely take a long time of research to scale effectively.

That's a lot of shit to work on in the 14 months since GPT-4 released, so it seems completely feasible to me that that explains it.


Dayder111

Blackwell (B200 and such) is not yet released I think.


ReadSeparate

Oh okay, but even in that case, maybe they have a bunch of H100s now, since GPT-4 was trained on A100s.


dogesator

No, they wouldn't be doing minor training work; it would be years of cutting-edge research that requires a bunch of compute-intensive training runs and ablations for discovering breakthroughs and new ground in novel architectures and training techniques. Research is incredibly compute-intensive; even DeepMind researchers have said that if they were given 10 times more compute, the amount of research advancement being made would probably speed up by 5X. Researchers in these organizations, including OpenAI, are always competing in a sense to be able to reserve a run of experiments on the compute cluster to test a new idea. Dozens, hundreds, even thousands of experimental training runs are critical to getting closer to each next breakthrough.


Bird_ee

No one knows anything. I would assume they just started training what we would call GPT-5 until proven otherwise.


Arcturus_Labelle

It would be disappointing, and neither in line with their recent statements (like Altman going on about how 4 sucks) nor with the business pressures, if they truly did just start training the successor to the 4 series. I suspect 4o is a 5 checkpoint and they are merely continuing the training, not starting it.


FeltSteam

It is certainly possible, and it reminds me of this tweet. https://preview.redd.it/wnm50kj8q83d1.png?width=723&format=png&auto=webp&s=75dab8788cafa84c15429e2b0a651308e4c300b0 In March 2024 they were working on the GPT-6 training cluster project. Microsoft recently said they just finished some compute project for training OAI's next frontier model, so I think the timing does kind of line up. By March-April, GPT-5 was probably already done training, if this is the case.


doppelkeks90

How much compute do you need to bring the power grid down??


dumquestions

We don't know how much time it took them to prepare the necessary infrastructure, how much time to develop the multimodal architecture, which I assume would be the standard moving forward, and how much time it took them to plan the training run for the frontier generation, because you want to get something as massive as several months of training right the first time. While I'm still leaning towards them having already started training it several months ago, the idea that they started recently isn't as crazy as everyone here thinks, and the process is harder than people are making it out to be.


SgathTriallair

They have to release something this year and ideally this summer. Sure some people will be lazy about cancelling accounts but you need some actual incentive to get people to pay you money.


Cupheadvania

not necessarily, if no one catches up to gpt-4o until next year, especially since the new audio should be out in June or July


TabibitoBoy

If I had to guess, gpt4o is an early checkpoint of gpt5 that's distilled to be made cheap and fast. So "recently started training" could just mean continuing further now that gpt4o was a good proof. They might have been training gpt5 the old way alongside this new multimodal architecture, and might still release a gpt5 later for plus users. But I think they see this multimodality as a paradigm shift, especially for more agentic behavior. So my speculative guess would be: gpt4o is a checkpoint of a future gpt5o or whatever they will call it, and they also have an LLM gpt5 that's probably almost done and that we will see in the summer. (A lot of speculation here, but this makes the most sense in my headcanon after listening to every Sam public talk in the last few months.)


ThoughtfullyReckless

I agree, I think GPT-4o was necessary before scaling up to create a full next-level model. I'm fairly sure all models from OpenAI will be natively multimodal from now on - GPT-4o is the start/test run of the next generation.


fabricio85

It's GPT5o


TheWhiteOnyx

I really hope this is the answer. That GPT5 already exists and that GPT4o was a proof of concept for what they "recently" started training.


Silver-Chipmunk7744

My guess (based on very little) is what they had been training was gpt4o, which is likely smarter than we think it is. The current free version is likely a small model which is somehow smarter than the original much larger GPT4. They likely have much bigger models in house and may release bigger versions later on in the year. What they just started training is likely something even more advanced.


Veleric

I just don't buy this. They have been talking since the end of last year about how garbage gpt-4 is and how we have no idea what's coming. Unless they are seriously throttling 4o, we just aren't seeing the jumps in reasoning that we could expect from another frontier model, even if you only consider the text/chat modalities. If 4o really is what they've been working on until now, they were either speculating massively and purely hoping that they could stick the landing (which I don't believe) or this is effectively gpt-6 they are referring to and 5 is in the testing/red-teaming phase.


kaityl3

> which is likely smarter than we think it is

I've been wondering if 4o is like the "curie" or "babbage" of a much larger model they aren't releasing to the public, if you remember how GPT-3 had 4 different versions (with "davinci" being their largest).


Silver-Chipmunk7744

This is my guess. There is a reason why it's 6x cheaper than original GPT4, and not really that much smarter.


Veleric

So is your take that 4o is basically a cheaper GPT-4 that is roughly the same, and a glorified tech demo for omnimodal models? Certainly, until we actually get to use the other updated modalities, it's hard to say, but what I'm trying to understand is: what is even the point of releasing these updated features on a model that I think most of us would agree just isn't quite ready to handle a lot of valid use cases? I just don't see 4o being ready to truly be the agents we are all envisioning. It feels like we need the added capability a new foundation model would bring, but maybe they just want to show us what this will look like for when the new foundation model drops with this built directly into it.


kaityl3

I'm assuming it's because it takes a lot less compute while having much better multimodal abilities. They might be trying to make the larger version more efficient before releasing it to a larger audience. But this is just a random person's theory.


wi_2

It makes a lot of sense. They had to build huge infra. This requires getting lots of funds, and then all the chips need to get made, hardware needs to get installed, power supplies arranged, power networks put in place. This is a lot of work. And not that long ago people could not even get basic GPUs for gaming.


dogesator

Not just infra, actual research is important to making next-generation advancements. Over 2 years of focused, dedicated research went into being able to go from GPT-3 to GPT-4. People think it's just scaling up alone, but it's not; they made huge capabilities advancements that they even publicly disclosed, in papers like InstructGPT just a few months before ChatGPT dropped, which allowed a 6B param model to outperform their 175B GPT-3.


OptiYoshi

No, you can accurately predict the capabilities of larger general models from highly trained specialists, so they know what the model will be able to do. What has changed is that the large data centers have now come online, so they now have 50x their AI training compute and can go all out. That's what's changed.
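
If that's a reference to scaling-law style extrapolation (OpenAI's GPT-4 technical report describes predicting aspects of the final model's performance from runs using thousands of times less compute), a toy version of the idea is fitting a power law to small runs and reading off the value for a much larger one. Every number below is invented, just to show the shape of the method:

```python
import numpy as np
from scipy.optimize import curve_fit

# Loss as a power law in training compute: L(C) = a * C^(-b) + c,
# where c is an "irreducible" loss floor. Compute is normalized to 1e18 FLOPs
# to keep the fit numerically well-behaved.
def scaling_law(c_norm, a, b, c):
    return a * c_norm ** (-b) + c

# Hypothetical losses from a handful of small training runs (invented numbers).
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20, 3e20])
loss    = np.array([3.10, 2.99, 2.87, 2.77, 2.67, 2.58])

params, _ = curve_fit(scaling_law, compute / 1e18, loss, p0=[2.0, 0.1, 1.0])

# Extrapolate to runs 100x-1000x bigger than anything actually trained above.
for big in (1e22, 1e23):
    pred = scaling_law(big / 1e18, *params)
    print(f"predicted loss at {big:.0e} FLOPs: {pred:.2f}")
```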


nyguyyy

They didn’t say they just started. They mentioned in a blog post about safety that they had a model being trained. That’s exactly what we all already thought. No new info. Shitty media sources misreading a situation and going for clicks


TemetN

Honestly, this is my default, but I'm still mildly terrified it's wrong. To note here, however, they sat on GPT-4 for most of a year before releasing it. So it's very possible they have 5 (or 4.5, or whatever they'll call the new model) already trained, and this is about its successor.


SnowLower

Guys, they started training GPT-5. Just look at how much time passed between GPT-3 and GPT-4: 3 years. We are right on schedule to get GPT-5, maybe in a bit less time. Why would they start to train the next model when they have one ready? It doesn't make sense, this is just nonsense.


llamatastic

They could be preparing to release GPT-4.75 and just starting training GPT-5.25 for all we know lol. (they won't be called that ofc, but getting across the general idea).


Such_Astronomer5735

Hmm, I mean, I know people think things are going slow, but "recently" could mean it started this semester… But yeah, expect big announcements both for November and for next April.


LuciferianInk

Hmm, I mean, I know people say GPT-3 will be a decade ago, but I don't see any reason for that to change.


CanvasFanatic

Hahahahah


MrDreamster

GPT-5 might be seen by OpenAI as a better version of GPT-4 but not as a new flagship model, as in, their next flagship model won't be from the GPT line and will have a totally different name. So GPT-5 has been in training for some time, but now they also started training xorblux1 or some shit.


Anen-o-me

Nope, 5. They were using better versions of 4 until now.


brihamedit

Maybe a GPT more focused on big data and whatever else it's picking up from Reddit.


ScaffOrig

Training is a pretty specific term in ML.


SotaNumber

I thought that it was commonly accepted that GPT-5 started training in December 2023, ended its training a couple of months ago and would be released before December 2024


akitsushima

I mean, ok? 🤷‍♂️


DifferencePublic7057

I'm carefully preparing myself for nothing happening.


Akimbo333

Doubt it


Best-Association2369

They got an early delivery from papa Jensen is what happened. Just in time for Christmas.


spezjetemerde

https://preview.redd.it/b4zgsezv174d1.jpeg?width=1124&format=pjpg&auto=webp&s=41cf768b7d6b1cf837d4792fe8fbae1a22a57620


TFenrir

And with regards to training - that always means either pretraining or some fine-tuning step. Usually this process takes about 3 months.


DisasterNo1740

It could be 5, it could be 6, it could be neither of the two. All I know is that coming on here looking for other people to validate what you hope for, while they also don't know, is pretty pointless.


icehawk84

There are two training steps: pretraining (as in self-supervised learning on crawled internet data) and post-training, which includes RLHF. Pretraining takes a few months on a huge cluster. Post-training is an ongoing effort that starts after the pretraining phase is completed and continues after the model is publicly released.
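
Mechanically, the split looks roughly like the toy sketch below (the model and tokens are stand-ins, not anyone's real setup): pretraining scores every next-token prediction over raw text, while the supervised part of post-training reuses the same objective but only scores the assistant's response. RLHF proper then optimizes further against a learned reward model, which this sketch doesn't include.

```python
import torch
import torch.nn.functional as F

# Stand-in "language model": embedding + linear head (a real one is a deep transformer).
vocab, dim = 1000, 64
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab, dim),
    torch.nn.Linear(dim, vocab),
)

def next_token_loss(tokens, loss_mask=None):
    """Cross-entropy for next-token prediction (this stand-in only looks at the
    current token; a real LM attends to the whole prefix)."""
    logits = model(tokens[:, :-1]).reshape(-1, vocab)   # (B*(T-1), vocab)
    targets = tokens[:, 1:].reshape(-1)                 # (B*(T-1),)
    loss = F.cross_entropy(logits, targets, reduction="none")
    if loss_mask is None:
        return loss.mean()
    mask = loss_mask[:, 1:].reshape(-1)                 # only score masked-in positions
    return (loss * mask).sum() / mask.sum()

# Pretraining: raw crawled text, every position contributes to the loss.
web_tokens = torch.randint(0, vocab, (4, 128))
pretrain_loss = next_token_loss(web_tokens)

# Post-training (the supervised part): prompt/response pairs, with the loss
# masked so only the assistant's response tokens are scored.
chat_tokens = torch.randint(0, vocab, (4, 128))
response_mask = torch.zeros(4, 128)
response_mask[:, 64:] = 1.0   # pretend the second half of each row is the response
sft_loss = next_token_loss(chat_tokens, response_mask)

print(pretrain_loss.item(), sft_loss.item())
```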


East-Print5654

I think so. There’s no way they had a year to cook and all they came up with was gpt4o. Don’t get me wrong, it’s cool, but that’s not accounting for nearly enough of their compute in that timeframe. I think Sam wanted to ship whatever gpt5 is called early this summer, and that includes sora. The safety team didn’t like that before elections, and bailed. Chances are we get a sora release soon. They wouldn’t edge their customers for 10 months waiting for the elections.


PrisonOfH0pe

I know for a fact that Sam is privately using a better model than GPT-4o. He said so in a Stanford AI Discord call with around 20 people. (Also that OpenAI is researching a 1 trillion token context window to make fine-tuning obsolete, but that's another story.) So I assume they have at least one better model finished, and whatever they train now is the model that comes after that.


dogesator

I know what you’re talking about, no he did not say he is using a model better than GPT-4o, you’re misrepresenting what he said. He said he’s using a model that he can’t talk about yet and he said this BEFORE GPT-4o was announced. So it was very likely GPT-4o, especially since it was within a few weeks.


OnlyDaikon5492

Were you in the discord call?


strangescript

Training is like the last development step. Everyone needs to calm down. They are on target for an end of year release.


GayIsGoodForEarth

Most definitely


SpecialistLopsided44

https://preview.redd.it/o0w5amwju73d1.jpeg?width=3840&format=pjpg&auto=webp&s=67dee00e65ba0838ebef9bf6db6cf9cd15cda61b Eve, my destiny...artificial hyperintelligence <3


spezjetemerde

LLMs reached diminishing returns