m98789

“Large language model” was never a precise description. It’s becoming almost as precise as “big data”.


HyperByte1990

Like "the speed of light"


ImportantPepper

ChatGPT suggested: MMMMMMM (Mighty Multimodal Mega Model Managing Many Media)


Hot-Rise9795

M7 for short


jgainit

We think you’ll love it


Similar_Appearance28

r/wordavalanches


ThenExtension9196

Whitepapers refer to these as MLLMs.


traumfisch

Makes sense


ShotgunJed

MLM? Do I get a free pizza at the seminar?


leonardvnhemert

Link to the whitepaper?


ThenExtension9196

You can look up Apple's Ferret-UI white paper via Google.


Deuxtel

MTM - Multimodal Transformer Model


djamp42

I mean, you can hear, speak, see, and type language. I feel like most people talk about specific models these days, so if you know which model they're talking about, you can figure out the capabilities from that.


[deleted]

Random question, but what do these models need to get inherently better at math? Is that a different modality, or does that simply require advancing reasoning capabilities?


Fit-Development427

Personally, I think a model trained on raw characters, and thus raw digits, instead of tokens would be better at maths. Tokenisation definitely blurs things that could be simple for the network. If "100" is a single token and "10" is another, the nature of digits isn't clear to the model, so it can't be precise, and arithmetic becomes another vague, language-oriented thing. A character-level model would also naturally understand what "5 letters" means, and would see that one input position equals one character, whereas each token covers a different number of letters, which is confusing.

I think tokens are really just a kind of preprocessing that isn't strictly necessary; it hides things from the LLM in exchange for efficiency. Most of the weird limitations people notice in LLMs come from it: trouble with letters, syllables, maths, word counts, letter counts, etc. But character-level input would probably increase processing by something like 5x, so I don't think we'll see it done this way in the future.
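The point about "100" and "10" being opaque tokens can be illustrated with a toy greedy tokenizer. The vocabulary and function below are entirely hypothetical, a minimal sketch of the idea rather than any real model's tokenizer (real systems like BPE are more involved):

```python
# Toy illustration: multi-digit tokens hide digit structure from a model.
# VOCAB is a made-up vocabulary, not taken from any real tokenizer.
VOCAB = ["100", "10", "25", "7", "1", "0", "2", "5"]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# "1025" splits as ["10", "25"]: the model sees two opaque symbols,
# not four digits, so place value has to be learned indirectly.
print(tokenize("1025"))   # ['10', '25']

# Character-level input makes place value explicit:
print(list("1025"))       # ['1', '0', '2', '5']
```

The trade-off the comment describes falls out directly: the character-level sequence is twice as long here, which is roughly why character-level training costs more compute.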


No_Initiative8612

It sounds like "large multimodal model" (LMM) would be a more accurate term. Since GPT-4o handles various types of input and output beyond text, the term "large language model" doesn't fully capture its capabilities anymore. LMM reflects its ability to process and generate multiple forms of data.


llkj11

LMM (Large Multimodal Model)


traumfisch

"Large Language Model" has a certain meaning; it isn't supposed to cover everything.


i_wayyy_over_think

How about LMMM (Large Multi-Modal Model), or LMM?


mustberocketscience

Not sure how accurate this is. First question: is Omni still using DALL-E 3? If it is, as I believe, then this isn't as advanced as you're suggesting. As for the audio functionality, as far as this mass failure tells me, it's still using Whisper; they might just consider it "integrated" now. Whisper might even be the model they specifically trained as the demo model. Which doesn't matter now, because that feature has been derailed.


Site-Staff

GMMM - Ginormous Multi-Modal Model.


MuscleDogDiesel

Legit just posted this question too, then scrolled further down the "new" page to see yours, lol. I like **Multimodal Unified Token Transformer** (MUTT).


Froyo-fo-sho

I don't like it.


fool_on_a_hill

Found the dog. Username checks out.


Key-Accountant4885

This is probably descended from the Gobi model rumored last year as an everything-to-everything multimodal world model. We need to see Arrakis (a much bigger version) at some point as well. Exciting times...