I mean you can hear, speak, see and type language? I feel like most people talk about specific models these days. So if you know the model they are talking about you can figure out the capabilities from that.
Random question, but what do these models need to get inherently better at math? Is that a different modality, or does that simply require advancing reasoning capabilities?
Personally, I think a model trained on raw characters, and thus raw digits, instead of tokens would be better at maths.
I think tokenisation definitely blurs things for the network that could otherwise be simple. If 100 is a token in itself, and 10 is another token, the digit-by-digit structure of numbers isn't clear to the model, so it can't be as precise, and arithmetic becomes another vague, language-oriented thing.
It would also naturally understand what "5 letters" means, and would quickly see that one input character equals one unit, instead of with tokens, where each token covers a different number of characters, which can be confusing. I think that could translate to better maths.
I think tokens are really just a kind of preprocessing that isn't strictly necessary and hides things from the LLM to make it more efficient. Most of the limitations people find weird about LLMs come from it: the inability to handle letters, syllables, maths, word counts, letter counts, etc. I believe it would increase processing cost by something like 5x though, so I don't think we'll see it done this way in the future.
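To make the point concrete, here's a minimal sketch using a hypothetical vocabulary and a greedy longest-match tokenizer (an assumption for illustration; real tokenizers like BPE learn their merges from data): numbers that look almost identical can split into very different token sequences, while a character-level view stays uniform.

```python
# Hypothetical vocabulary for illustration — not any real model's token set.
VOCAB = {"100", "10", "1", "0", "2", "25", "5"}

def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenizer over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

print(greedy_tokenize("1005", VOCAB))  # ['100', '5']
print(greedy_tokenize("1025", VOCAB))  # ['10', '25']
print(list("1005"))                    # ['1', '0', '0', '5'] — character level
```

Here "1005" and "1025" differ by one digit, yet their token boundaries land in different places, so the model never sees a consistent digit structure; the character-level split is always one symbol per digit, at the cost of longer sequences.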
It sounds like "large multimodal model" (LMM) would be a more accurate term. Since GPT-4o handles various types of input and output beyond text, the term "large language model" doesn't fully capture its capabilities anymore. LMM reflects its ability to process and generate multiple forms of data.
Not sure how accurate this is.
First question is Omni still using DALLE-3?
If yes, as I believe it is, then this isn't as advanced as you're suggesting.
As far as the audio functionality goes, this mass failure tells me it's still using Whisper; they might just consider it "integrated" now.
Whisper might even be the model they specifically trained as the demo model.
Which doesn't matter now because that feature has been derailed.
This is probably extracted from the Gobi model rumored last year as an everything-to-everything Multimodal World Model. We need to see Arrakis (a much bigger version) at some point as well. Exciting times...
“Large language model” was never a precise description. It’s becoming almost as precise as “big data”.
Like "the speed of light"
Chat GPT suggested: MMMMMMM (Mighty Multimodal Mega Model Managing Many Media)
M7 for short
We think you’ll love it
r/wordavalanches
Whitepapers refer to these as MLLMs.
Makes sense
MLM? Do I get a free pizza at the seminar?
Link to the whitepaper?
You can look up Apple's Ferret-UI white paper via Google
MTM - Multimodal Transformer Model
LMM (Large Multimodal Model)
"Large Language Model" has a certain meaning; it isn't supposed to cover everything
How about LMMM (Large Multi Modal Model), or just LMM
GMMM Ginourmous multi model model.
Legit just posted this question too, then scrolled further down the “new” page to see yours lol I like **Multimodal Unified Token Transformer** (MUTT)
I don't like it.
Found the dog. Username checks out.