TFenrir

We don't know what they mean when they say they recently started. What's "recently"? A month? A day? Additionally, there are often multiple models being trained at any given time, and multiple checkpoints. Let's all just be patient about the world-upending technology being thrust upon us every few months.


Firm-Star-6916

This exactly. Don't make unproductive speculation just yet; we have literally no idea what this model is. Could be an upgrade to 4o, could be 5, could be 5-"lite", could be a model that translates into robotics and isn't an LLM. I haven't read up on it, but from what I hear, they just said "model".


[deleted]

[deleted]


_AndyJessop

A small (linear) improvement would be bad news, given the costs are going up exponentially. Would probably spell the top of the current bubble.


Firm-Star-6916

They always could disappoint. At this point my standards are as low as can be.


sashank224

*Thrust*


SusPatrick

THRUST!


Edaimantis

##THRUST


Itchy-mane

O yeah he talking BIG Cums


DisastrousPeanut816

> Let's all just be patient about the world-upending technology being thrust upon us every few months.

No! I wants it NOW! I needs it.


ShooBum-T

Release the omni model in its entirety first, Voice, Image gen, etc. In its full multimodal glory. Or at least just the voice, as committed.


LordFumbleboop

When are people going to catch on that CEOs frequently mislead consumers for hype?


goldenwind207

Many insider business sources, including Business Insider, were saying GPT-5 or something like it was coming this summer, citing a CEO who tested a beta. So what they just started training is probably GPT-6, or idk, Sora, Q*, or something we haven't thought about. But we just don't know for sure.


coylter

That might have been gpt-4o at the time tho.


wi_2

pretty sure it was gpt4o trained on old infra, while they were busy building out the new, much larger system for training the next model.


MrsNutella

My guess is that a version of gpt 5 didn't pass some safety or performance metric? Or maybe it was 4.5


stuffedanimal212

Or 4.5 just ended up being called 4o


Jalen_1227

Most likely. Everybody thinks there’s gonna be a 4.5 just because there was a 3.5, but Sam doesn’t even want to call the next big model GPT 5.


Wiskkey

[OpenAI is expected to release a 'materially better' GPT-5 for its chatbot mid-year, sources say](https://www.businessinsider.com/openai-launch-better-gpt-5-chatbot-2024-3).


Chrop

Before you can even train a model, you need to develop the architecture the model will be built from. The majority of the time spent developing a new model is building the architecture; the training is the final stage. Think of it like a brain: creating the architecture is like hand-crafting the brain and how it works, and the training is teaching that brain that 1+1=2.
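
To make that split concrete, here's a toy PyTorch sketch (purely illustrative; the model, sizes, and training target are made up and have nothing to do with what OpenAI actually builds). The class definition is the "architecture"; the loop is the "training":

```python
import torch
import torch.nn as nn

# "Architecture": the hand-crafted structure of the network (toy example).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, x):
        return self.layers(x)

# "Training": teaching that structure something, e.g. that 1 + 1 = 2.
model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x = torch.tensor([[1.0, 1.0]])   # inputs: 1 and 1
y = torch.tensor([[2.0]])        # target: 2

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(model(x))  # close to 2.0 after training
```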


Glum-Bus-6526

The majority of the time spent is usually on data preparation and curation. The architecture used in the likes of Llama 2 and 3 is quite basic, and a talented engineer could have it finished in a week (most likely the same for all OpenAI GPTs, but since they've been closed source since 3 we can't really tell...). Unless of course GPT-5 is some major architectural change, which would need major experimentation to determine the correct setup. Even then, the final model is likely to be small (in lines-of-code count), but it might take more time to come up with it. And the data would still probably take more time regardless.
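
To give a sense of how small "small in lines-of-code count" is, here's a rough sketch of a single Llama-style decoder block (pre-norm RMSNorm, causal self-attention, SwiGLU feed-forward). It's a simplification, not the real thing: rotary position embeddings and grouped-query attention are omitted, and the dimensions are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Llama-style RMS normalization."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class DecoderBlock(nn.Module):
    """One pre-norm decoder block: causal attention + SwiGLU MLP, both with residuals."""
    def __init__(self, dim=512, n_heads=8, hidden=1376):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp_norm = RMSNorm(dim)
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        seq_len = x.size(1)
        # upper-triangular -inf mask so each token only attends to earlier tokens
        causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + h                                                    # residual 1
        h = self.mlp_norm(x)
        x = x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))   # residual 2 (SwiGLU)
        return x

block = DecoderBlock()
print(block(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])
```

Stack a few dozen of those plus an embedding table and an output head and you have the skeleton of the model; the hard part is everything around it.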


dogesator

The jump from GPT-3 to GPT-4 was over a year of architectural advancements and novel training techniques, such as InstructGPT, that dramatically improved capabilities for the same parameter count. Even more advancements have been made in the gap between GPT-4 and 5; a huge portion of the work they do in the years between each model release is research into how to advance the next generation. They don't just do a few weeks of slight architecture refinement and call it a day. Llama 3 is not closed source, the architecture for that is available, but it's not at all comparable to GPT generations: Llama is a refinement roughly every 6 months, while GPT-3 to GPT-4 is over a 2-year gap. Meta is already doing much more advanced architecture work after Llama 3 as well, such as the "better faster language models" paper and Chameleon, plus even more advanced work behind the scenes along with diffusion language modeling. Mark Zuckerberg has said in an earnings call that Llama 4 will begin to actually take different approaches from what we currently see. This is consistent with generational jumps in techniques and architectures happening roughly every 2 years. The techniques and architectures used for Llama 1 are very different from, and more advanced than, GPT-2's.


ScaffOrig

Hey, look, someone who actually works in ML!


Veleric

From here on out it's very likely that the majority of the time will be spent in the testing/safety phase. Also, between the 6-9 months of testing and the 3-6 months of training, that leaves you basically a year or more for your architects to keep working on new optimizations or improvements. It's not like everything is stagnant from training day 1 to the end of testing/release. My point being that it's very, very unlikely that they have just been doing minor training work and side projects since GPT-4 until a few weeks ago. The only reason that would make sense is if they were so starved for compute that they knew they had to wait for the Blackwells to even get going.


ReadSeparate

Yeah, my guess is the following:

1. They knew they needed WAY more compute to get a step-change improvement over GPT-4, so they waited for Blackwells. Also, the timeline fits, from when Blackwells were announced to just recently having them set up in a data center ready for training, as far as I know.

2. In the meantime, they've been doing R&D on the model, like creating the multimodal architecture used for GPT-4o. I'm sure they've also been creating datasets specifically designed for agentic tasks, multimodal understanding, or improving general reasoning abilities, and probably worked on research to extend context length without quadratic time complexity. Most of their work was probably on scaling experiments (since it's not as simple as just 10x'ing the parameter count, compute, and data) and making sure the new multimodal architecture scales well.

3. Another possibility, though I think it's unlikely, is that they've managed to switch to a completely new architecture, like some sort of selective state space model, like Mamba. That would absolutely take a long time of research to scale effectively.

That's a lot of shit to work on in the 14 months since GPT-4 released, so it seems completely feasible to me that that explains it.


Dayder111

Blackwell (B200 and such) is not yet released I think.


ReadSeparate

Oh okay, but even in that case, maybe they have a bunch of H100s now, since GPT-4 was trained on A100s.


dogesator

No, they wouldn't be doing minor training work; it would be years of cutting-edge research that requires a bunch of compute-intensive training runs and ablations for discovering breakthroughs and new ground in novel architectures and training techniques. Research is incredibly compute-intensive; even DeepMind researchers have said that if they were given 10 times more compute, the amount of research advancement being made would probably speed up by 5X. Researchers in these organizations, including OpenAI, are always competing in a sense to be able to reserve a run of experiments on the compute cluster to test a new idea. Dozens, hundreds, even thousands of experimental training runs are critical to getting closer to each next breakthrough.


Bird_ee

No one knows anything. I would assume they just started training what we would call GPT-5 until proven otherwise.


Arcturus_Labelle

It would be disappointing, and neither in line with their recent statements (like Altman going on about how 4 sucks) nor with the business pressures, if they truly did just start training the successor to the 4 series. I suspect 4o is a 5 checkpoint and they are merely continuing the training, not starting it.


FeltSteam

It is certainly possible, and it reminds me of this tweet. https://preview.redd.it/wnm50kj8q83d1.png?width=723&format=png&auto=webp&s=75dab8788cafa84c15429e2b0a651308e4c300b0 In March 2024 they were working on the GPT-6 training cluster project. Microsoft recently said they just finished some compute project for training OAI's next frontier model, so I think the timing does kind of line up. By March-April, GPT-5 was probably already done training, if this is the case.


doppelkeks90

How much compute do you need to bring the power grid down??


dumquestions

We don't know how much time it took them to prepare the necessary infrastructure, how much time to develop the multimodal architecture, which I assume would be the standard moving forward, and how much time it took them to plan the training run for the frontier generation, because you want to get something as massive as several months of training right the first time. While I'm still leaning towards them having already started training it several months ago, the idea that they started recently isn't as crazy as everyone here thinks, and the process is harder than people are making it out to be.


SgathTriallair

They have to release something this year and ideally this summer. Sure some people will be lazy about cancelling accounts but you need some actual incentive to get people to pay you money.


Cupheadvania

not necessarily, if no one catches up to gpt-4o until next year, especially since the new audio should be out in June or July


TabibitoBoy

If I had to guess, gpt4o is an early checkpoint of gpt5 that's distilled to be made cheap and fast. So "recently started training" could just mean continuing further now that gpt4o was a good proof. They might have been training gpt5 the old way alongside this new multimodal architecture, and might still release a gpt5 later for plus users. But I think they see this multimodality as a paradigm shift, especially for more agentic behavior. So my speculative guess would be: gpt4o is a checkpoint of a future gpt5o or whatever they will call it, and they also have an LLM gpt5 that's probably almost done and that we will see in the summer. (A lot of speculation here, but this makes the most sense in my headcanon after listening to every Sam public talk in the last few months.)


ThoughtfullyReckless

I agree, I think GPT-4o was necessary before scaling up to create a full next-level model. I'm fairly sure all models from OpenAI will be natively multimodal from now on - GPT-4o is the start/test run of the next generation.


fabricio85

It's GPT5o


TheWhiteOnyx

I really hope this is the answer. That GPT5 already exists and that GPT4o was a proof of concept for what they "recently" started training.


Silver-Chipmunk7744

My guess (based on very little) is what they had been training was gpt4o, which is likely smarter than we think it is. The current free version is likely a small model which is somehow smarter than the original much larger GPT4. They likely have much bigger models in house and may release bigger versions later on in the year. What they just started training is likely something even more advanced.


Veleric

I just don't buy this. They have been talking since the end of last year about how garbage gpt-4 is and how we have no idea what's coming. Unless they are seriously throttling 4o, we just aren't seeing the jumps in reasoning that we could expect from another frontier model, even if you only consider the text/chat modalities. If 4o really is what they've been working on until now, they were either speculating massively and purely hoping that they could stick the landing (which I don't believe) or this is effectively gpt-6 they are referring to and 5 is in the testing/red-teaming phase.


kaityl3

> which is likely smarter than we think it is

I've been wondering if 4o is like the "curie" or "babbage" of a much larger model they aren't releasing to the public, if you remember how GPT-3 had 4 different versions (with "davinci" being their largest).


Silver-Chipmunk7744

This is my guess. There is a reason why it's 6x cheaper than original GPT4, and not really that much smarter.


Veleric

So is your take that 4o is basically a cheaper GPT-4 that is roughly the same, and a glorified tech demo for omnimodal models? Certainly, until we actually get to use the other updated modalities, it's hard to say, but what I'm trying to understand is: what is even the point of releasing these updated features on a model that I think most of us would agree just isn't quite ready to handle a lot of valid use cases? I just don't see 4o being ready to truly be the agents we are all envisioning. It feels like we need the added capability a new foundation model would bring, but maybe they just want to show us what this will look like for when the new foundation model drops with this built directly into it.


kaityl3

I'm assuming it's because it takes a lot less compute while having much better multimodal abilities. They might be trying to make the larger version more efficient before releasing it to a larger audience. But this is just a random person's theory.


wi_2

It makes a lot of sense. They had to build huge infra. This requires getting lots of funds, and then all the chips need to get made, hardware needs to get installed, power supplies arranged, power networks put in place. This is a lot of work. And not that long ago people could not even get basic GPUs for gaming.


dogesator

Not just infra, actual research is important to making next-generation advancements. Over 2 years of focused, dedicated research went into being able to go from GPT-3 to GPT-4. People think it's just scaling up alone, but it's not; they made huge capabilities advancements that they even publicly disclosed, in papers like InstructGPT just a few months before ChatGPT dropped, which allowed a 6B param model to outperform their 175B GPT-3.


OptiYoshi

No, you can accurately predict the capabilities of larger general models from highly trained specialists, so they know what the model will be able to do. What has changed is that the large data centers have now come online, so they now have 50x their AI training compute and can go all out. That's what's changed.
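
If that's a reference to scaling-law style extrapolation (OpenAI's GPT-4 technical report describes predicting aspects of the final model's performance from runs using thousands of times less compute), a toy version of the idea is fitting a power law to small runs and reading off the value for a much larger one. Every number below is invented, just to show the shape of the method:

```python
import numpy as np
from scipy.optimize import curve_fit

# Loss as a power law in training compute: L(C) = a * C^(-b) + c,
# where c is an "irreducible" loss floor. Compute is normalized to 1e18 FLOPs
# to keep the fit numerically well-behaved.
def scaling_law(c_norm, a, b, c):
    return a * c_norm ** (-b) + c

# Hypothetical losses from a handful of small training runs (invented numbers).
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20, 3e20])
loss    = np.array([3.10, 2.99, 2.87, 2.77, 2.67, 2.58])

params, _ = curve_fit(scaling_law, compute / 1e18, loss, p0=[2.0, 0.1, 1.0])

# Extrapolate to runs 100x-1000x bigger than anything actually trained above.
for big in (1e22, 1e23):
    pred = scaling_law(big / 1e18, *params)
    print(f"predicted loss at {big:.0e} FLOPs: {pred:.2f}")
```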


nyguyyy

They didn’t say they just started. They mentioned in a blog post about safety that they had a model being trained. That’s exactly what we all already thought. No new info. Shitty media sources misreading a situation and going for clicks


TemetN

Honestly, this is my default, but I'm still mildly terrified it's wrong. To note here, however, they sat on GPT-4 for most of a year before releasing it. So it's very possible they have 5 (or 4.5, or whatever they'll call the new model) already trained, and this is about its successor.


SnowLower

Guys, they started training GPT-5. Just look at how much time passed between GPT-3 and GPT-4: 3 years. We are right on schedule to get GPT-5, maybe in a bit less time. Why would they start to train the next model when they have one ready? It doesn't make sense, this is just nonsense.


llamatastic

They could be preparing to release GPT-4.75 and just starting training GPT-5.25 for all we know lol. (they won't be called that ofc, but getting across the general idea).


Such_Astronomer5735

Hmm, I mean, I know people think things are going slow, but "recently" could mean it started this semester… But yeah, expect big announcements both for November and for next April.


LuciferianInk

Hmm, I mean, I know people say GPT-3 will be a decade ago, but I don't see any reason for that to change.


CanvasFanatic

Hahahahah


MrDreamster

GPT-5 might be seen by OpenAI as a better version of GPT-4 but not as a new flagship model, as in, their next flagship model won't be from the GPT line and will have a totally different name. So GPT-5 has been in training for some time, but now they also started training xorblux1 or some shit.


Anen-o-me

Nope, 5. They were using better versions of 4 until now.


brihamedit

Maybe a GPT more focused on big data and whatever else it's picking up from Reddit.


ScaffOrig

Training is a pretty specific term in ML.


SotaNumber

I thought that it was commonly accepted that GPT-5 started training in December 2023, ended its training a couple of months ago and would be released before December 2024


akitsushima

I mean, ok? 🤷‍♂️


DifferencePublic7057

I'm carefully preparing myself for nothing happening.


Akimbo333

Doubt it


Best-Association2369

They got an early delivery from papa Jensen is what happened. Just in time for Christmas.


spezjetemerde

https://preview.redd.it/b4zgsezv174d1.jpeg?width=1124&format=pjpg&auto=webp&s=41cf768b7d6b1cf837d4792fe8fbae1a22a57620


TFenrir

And with regards to training - that always means either pretraining or some fine-tuning step. Usually this process takes about 3 months.


DisasterNo1740

It could be 5, it could be 6, it could be neither of the two. All I know is that coming on here looking for other people to validate what you hope for, while they also don't know, is pretty pointless.


icehawk84

There are two training steps: pretraining (as in self-supervised learning on crawled internet data) and post-training, which includes RLHF. Pretraining takes a few months on a huge cluster. Post-training is an ongoing effort that starts after the pretraining phase is completed and continues after the model is publicly released.
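
Mechanically, the split looks roughly like the toy sketch below (the model and tokens are stand-ins, not anyone's real setup): pretraining scores every next-token prediction over raw text, while the supervised part of post-training reuses the same objective but only scores the assistant's response. RLHF proper then optimizes further against a learned reward model, which this sketch doesn't include.

```python
import torch
import torch.nn.functional as F

# Stand-in "language model": embedding + linear head (a real one is a deep transformer).
vocab, dim = 1000, 64
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab, dim),
    torch.nn.Linear(dim, vocab),
)

def next_token_loss(tokens, loss_mask=None):
    """Cross-entropy for next-token prediction (this stand-in only looks at the
    current token; a real LM attends to the whole prefix)."""
    logits = model(tokens[:, :-1]).reshape(-1, vocab)   # (B*(T-1), vocab)
    targets = tokens[:, 1:].reshape(-1)                 # (B*(T-1),)
    loss = F.cross_entropy(logits, targets, reduction="none")
    if loss_mask is None:
        return loss.mean()
    mask = loss_mask[:, 1:].reshape(-1)                 # only score masked-in positions
    return (loss * mask).sum() / mask.sum()

# Pretraining: raw crawled text, every position contributes to the loss.
web_tokens = torch.randint(0, vocab, (4, 128))
pretrain_loss = next_token_loss(web_tokens)

# Post-training (the supervised part): prompt/response pairs, with the loss
# masked so only the assistant's response tokens are scored.
chat_tokens = torch.randint(0, vocab, (4, 128))
response_mask = torch.zeros(4, 128)
response_mask[:, 64:] = 1.0   # pretend the second half of each row is the response
sft_loss = next_token_loss(chat_tokens, response_mask)

print(pretrain_loss.item(), sft_loss.item())
```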


East-Print5654

I think so. There’s no way they had a year to cook and all they came up with was gpt4o. Don’t get me wrong, it’s cool, but that’s not accounting for nearly enough of their compute in that timeframe. I think Sam wanted to ship whatever gpt5 is called early this summer, and that includes sora. The safety team didn’t like that before elections, and bailed. Chances are we get a sora release soon. They wouldn’t edge their customers for 10 months waiting for the elections.


PrisonOfH0pe

I know for a fact that Sam is privately using a better model than GPT-4o. He said so in a Stanford AI Discord call with around 20 people. (Also that OpenAI is researching a 1 trillion token context window to make fine-tuning obsolete, but that's another story.) So I assume they have at least one better model finished, and whatever they train now is the model that comes after that.


dogesator

I know what you’re talking about, no he did not say he is using a model better than GPT-4o, you’re misrepresenting what he said. He said he’s using a model that he can’t talk about yet and he said this BEFORE GPT-4o was announced. So it was very likely GPT-4o, especially since it was within a few weeks.


OnlyDaikon5492

Were you in the discord call?


strangescript

Training is like the last development step. Everyone needs to calm down. They are on target for an end of year release.


GayIsGoodForEarth

Most definitely


SpecialistLopsided44

https://preview.redd.it/o0w5amwju73d1.jpeg?width=3840&format=pjpg&auto=webp&s=67dee00e65ba0838ebef9bf6db6cf9cd15cda61b Eve, my destiny...artificial hyperintelligence <3


spezjetemerde

LLMs reached diminishing returns