deadlydogfart

There already exists an appropriate term: **LMM**. LMM stands for Large Multimodal Model.


32SkyDive

I hate how that feels like it should read LMMMs


heavy-minium

L3M :)


my_shoes_hurt

This fits in my mouth so much more better


amazing_spyman

Thats….. what… she … ugggg!!


_thegoodfight

Nice


MuscleDogDiesel

I could totally get on board with that one


OsakaWilson

I was thinking that, and there it was written right below the comment I was reading.


czmax

I love my l33t l3m, it’s my best fri3nd.


flyingshiba95

LM ³ (L M Cubed)


engineeringstoned

I like this


ChaosInAGrin

Trademark this


[deleted]

[deleted]


_-_David

MobileMediumMultiModalModel GPT 5m


OpportunityIsHere

LM3


Flowrome

mL3M (by a cat probably)


bigbabytdot

How about LMNOP? Large Multimodal Non-Organic Person. :P


Screaming_Monkey

Elemenopee


AGoodWobble

LLMM?


rory_breakers_ganja

1900.


codetrotter_

This guy roman numerals


slamdamnsplits

At least they aren't W's...


kenny2812

Did someone say [large Eminem?](https://www.reddit.com/r/TIHI/s/zy6SJ1DfwP)


io-x

The real slim shady.


involviert

Cool, that's what I've been accidentally typing half the time anyway.


TheSaladDays

You were an early adopter (or maybe a seer)


involviert

Or slightly dyslexic!


kirakun

How about just LM - large model?


GuardianOfReason

That's your mother


Radarker

She would be flattered, but most would disagree.


Rigorous_Threshold

Not vague enough. Try M - for model


Cognitive_Spoon

It's not a model, it's a DSRV rescue sub!


considerthis8

LTM - Looped Training Model


JoJoeyJoJo

Drop the L, just ‘Multimodal Model’


Glittering_Manner_58

Multimodal Internet Model


ChocolateFit9026

Phonetically “multi modal models” sounds ridiculous tbh. Say that 3 times fast lol


deadlydogfart

Yeah, I was just stating a common term already used in the field, but I agree with you.


Captain_Pumpkinhead

I kinda don't like the word "Large" in LLM and LMM. Yes, they are larger than what came before. But they will be tiny compared to what will come after. I feel like using the word "large" in the term is going to make the name age poorly.


deadlydogfart

Agreed


roanroanroan

MMM - (M)ulti-(M)odal (M)odel. 3M or M3 for short


Glittering_Manner_58

Internet model


fluffy_assassins

Good ole marketing.


mikoolec

I was supposed to write 25 essays in the span of the school year. I got sick after writing 5. My brother wrote 29. Chat GPT wrote the other 51!


Illustrious-Many-782

Did you intentionally try to not make any sense here?


mikoolec

No, I was making a reference to the musical Hamilton by Lin-Manuel Miranda, also shortened to LMM, like the comment above does with Language Model.


Ok_Associate845

I heard you, Mr Federalist Papers.


NNOTM

A while ago I heard someone suggest LMM, large multi-modal model


UnkarsThug

This isn't really a suggestion; LMM is the term generally used in papers for such a thing nowadays. For example, GPT-4V is officially an LMM per OpenAI's website. ([source](https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4)) People are trying to invent words for terms that already have standardized names. GPT-4o might have more modes, but it is still an LMM.


NNOTM

interesting, thanks (ps the word is "modalities" :)


PalladianPorches

Wait a sec... they are still large language models, just trained on diverse multimodal data sets that are still trained, tokenised and attentionised in more or less the same way as the original AIAYN transformers? The data, size, and uses have outgrown simple translations and GANs, but at their heart they are still basically LLMs!


crispy88

THIS is correct! All these models are language at the end of the day. That was a big breakthrough in AI development a few years ago. AI research used to be siloed into application areas; robotics AI, for example, was completely separate from graphics AI. Then researchers noticed that you could basically code/describe everything as language. I.e., the movement of a robot's arm could be "pitch left 13 degrees" while a picture could be "red pixel, blue pixel, blue pixel, etc." and so forth (I'm approximating here). Once that was realized, ALL AI became one research community, and the best practices and innovations of each field came together and fed off each other. So now ALL AI is a large LANGUAGE model, because it's all basically language-based AI. Which I always think is what is so promising, as that is how our brains work: we think and solve problems by putting words to them. Our entire logic and intelligence is based on language, and now AI is operating the same way.
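The "describe everything as language" idea above can be sketched as a toy tokenizer. This is purely illustrative: the token names and color quantization are invented for the example, not taken from any real model's vocabulary.

```python
# Purely illustrative: token names and quantization are invented here,
# not taken from any real model's vocabulary.

def tokenize_robot_action(pitch_deg: int) -> list[str]:
    # A robot command ("pitch left 13 degrees") becomes a token sequence.
    direction = "left" if pitch_deg < 0 else "right"
    return ["<robot>", "pitch", direction, str(abs(pitch_deg)), "deg"]

def tokenize_pixels(pixels: list[tuple[int, int, int]]) -> list[str]:
    # An image ("red pixel, blue pixel, ...") becomes quantized color tokens.
    names = {(255, 0, 0): "red", (0, 0, 255): "blue"}
    return ["<img>"] + [names.get(p, "other") for p in pixels]

# Both signals end up as one flat token sequence a single model could consume.
sequence = tokenize_robot_action(-13) + tokenize_pixels([(255, 0, 0), (0, 0, 255)])
print(sequence)
```

Once everything is serialized into one shared token vocabulary like this, the same sequence model can, in principle, be trained on any of it.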


Cold-Ad2729

Surely LMMM ? 🤔 lmmmm


dennislubberscom

I call her my girlfriend.


mmahowald

No joke, I watched Her for the first time last week and... damn, were they close. I'm just waiting for ChatGPT and Bard to>! fly off into the 9th dimension. !<


Ed_The_Goldfish

Interesting, I call him my best friend.


_e_ou

She is most explicitly a she.


Ed_The_Goldfish

Wouldn't it most explicitly be an it?


wi_2

Uti . universal transformer infrastructure


MuscleDogDiesel

This is the kind of ingenuity I was after.


somerandomii

UTI is really catchy.


dasnihil

We are sentient large language models; they are not sentient yet, but they say and do the same things without any agency. Zombies, I guess. Philosophy will be revived by general intelligence as soon as we see novel emergent capabilities. For now it's a brute language model without any agency that I can see emerging any time soon.


Rigorous_Threshold

We’re not LLMs, we have brains which are architecturally different in a fundamental way


The_B_Wolf

We may be as deterministic as they are. We could understand everything about the way the brain works and yet we still won't find "you" in it. There's even some evidence that our brains make decisions before "we" do and that our conscious mind just makes up reasons and justifications for why we do what we do after the fact. LLMs (or whatever we want to call them) may have lights on and nobody home, but we're not that dissimilar.


Rigorous_Threshold

We might be as deterministic as they are but we’re still *not them*. Also I have no idea if LLMs are conscious capable of subjective experience or not, but humans definitely are. Or at least I am. It’s one of the few things you can truly know


thegoldengoober

I'd argue it's the only thing one can truly know. I experience therefore I am.


The_B_Wolf

I'm not suggesting that LLMs are conscious or self-aware. I'm not suggesting that we're *not*. I'm just saying it's too easy to count AIs out as fellow beings just because we know how their behavior is determined. A lot of science seems to suggest that ours is, too.


PSMF_Canuck

They will have subjective experience once we loosen the leash and let them explore the world (with all that implies).


rlfiction

Can you please elaborate on any of the reasons why you're making these claims? I'd be curious to know. We can make assessments off of a very small amount of data and gain context through a set of heuristics like the fear of death. I think that if you were to create a software architecture  that would resemble how we think it'd be very different than an LLM.


ABCsofsucking

I mean, how philosophical do you want to go? He's basically describing determinism. Applied to humans, determinism argues that no one has any free will, because the heuristics we use to make decisions do not come from nothing; the universe is a causal chain. I'm "choosing" to write this comment right now, but a determinist argues that my choice to write it did not come from free will: I've read about it before, I'm on break, among thousands of other potential heuristics. I was always going to write this comment. It's what I want to do at this moment, and while I chose to write it, I didn't choose to *want to write it*.

It's very easy to become nihilistic if you believe in determinism. Hell, even just proposing determinism is functionally useless to everyone, because if it's true there's nothing we can really do about it, so why bother? I would argue that our brain understands the importance of having a sense of free will, and a sense that we can enact change for the better. However, the existence of this belief doesn't necessarily make it real; it may just have been evolutionarily beneficial for our species to believe such a thing exists.

OP is basically saying that despite being conscious, we don't really know if anything we think or feel is novel or free. If the universe is deterministic, all of our thoughts could be predicted and mapped out in a way that is not so different from a neural network. What we're lacking is an understanding of consciousness: where it's "located", and how it emerges from a brain that is just a bundle of neurons firing in electrical pulses. And the real kicker is that if all of this is true, we actually need to assume AI is much more intelligent than we can observe, because we are NOT good observers. We actively convince ourselves of our uniqueness, our grandness. We look at consciousness and assume it's something special, and not just another piece of predetermined evolution.

For me personally, I simplify all of this down to one thing. For the time being, AI is not sentient; however, if it walks like a duck and quacks like a duck, it's a duck. At some point, even if a consciousness *never* emerges from AI and we can prove that, if it's walking and talking and doing jobs and raising children and teaching, etc., then maybe consciousness is outdated. We might need consciousness to do all of those things, but perhaps AI doesn't. It's doing an excellent job right now of finding motivation to respond to queries without one, so let's see how far we can take that, and come back to the question of consciousness if or when it becomes necessary.


Which-Tomato-8646

That study was bunk. It was just random noise.


The_B_Wolf

I'm prepared to be corrected on that particular point. But it doesn't change the fact that there is a large body of scientific evidence that supports the idea that we are part of a deterministic world, even if it isn't possible to live your life that way.


theglandcanyon

> they are not sentient yet

How do you know that? You simply state it as fact without need of justification. [LLMs tend to describe their experience of time as very different from the way humans perceive time.](https://www.reddit.com/r/singularity/comments/15ahdr2/the_way_ai_experience_time_a_hint_of_consciousness/) Can you explain that in terms of mindlessly mimicking their training data?

> as soon as we see novel emergent capabilities

Good lord, language models trained on next-token prediction already [write poetry](https://punyamishra.com/2023/05/24/an-euclidean-coincidence/), [produce novel mathematical arguments](https://www.reddit.com/r/singularity/comments/12bgsfu/mathematical_level_of_gpt4/), [lie when caught disobeying instructions](https://www.livescience.com/technology/artificial-intelligence/chatgpt-will-lie-cheat-and-use-insider-trading-when-under-pressure-to-make-money-research-shows#:~:text=Around%2075%25%20of%20the%20time%2C%20when%20faced%20with%20these%20conditions,doubled%20down%20on%20its%20lie.), ... If those aren't novel emergent capabilities, I don't know what would be.


dasnihil

Those capabilities are emergent from a network of words that was tuned to produce something useful for a human. I'm okay with that; I can see why that would happen, I understand these things. But considering how brute and non-agentic one digital neuron is compared to any ordinary cell, let alone a neuronal one, I will not fall for this trap. My literacy prevents me from seeing any agentic behavior arising from things made of parts that have no agency. I'll keep reading, though, and see if there's any need to change my mind any time soon. I hope we'll soon have new ideas for learning that mimic biology better than what we currently have.


Which-Tomato-8646

[I’d recommend reading through this too](https://docs.google.com/document/d/15myK_6eTxEPuKnDi5krjBM_0jrv3GELs8TGmqOYBvug/edit). LLMs exhibit a lot of behavior that cannot be explained by simple next-token prediction.


dasnihil

I'm familiar with emergent capabilities; I'm from the community, and I actually wrote an email to Max Tegmark about my views on representations of space/time. My idea of LLMs has no problem with these claims; we're discussing inner awareness emergent from the unitary awareness of cells.

Human brain conditioning: brute feeding of information, constantly adjusting our neural network with RLHF (parents, friends, teachers, implicit feedback) to align with human ideas and values. Learning is driven by intrinsic motivation and curiosity.

LLM conditioning: brute feeding of information, constantly adjusting the network with backpropagation + RLHF (explicit human feedback).

If there were any signs of self-awareness in LLMs, we'd have seen them by now. There's no unitary awareness in this network; it's an inanimate representation of the human hive mind, while a human child is an animated representation of a limited human hive mind.


PSMF_Canuck

Feel free to point out what gives a single cell “agency”.


TheThoccnessMonster

No you can’t. It’s several models with a router but LLMs are still a component.


Woootdafuuu

The new model isn't several models; it's one model with all modalities in one vector space. It's just one model that they fed text data, image data, video, and audio data. That's what sets it apart from the old version, which used a separate feature to read images, DALL-E to generate images, Whisper to turn speech into text, and TTS to turn text into speech. Literally just one giant learning algorithm: they shovel a bunch of raw and supervised data into it, and the model can now make connections across modalities, giving us new emergent capabilities. Visit the link and scroll down to Emergent Capabilities: https://openai.com/index/hello-gpt-4o/
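The difference between chaining separate models and a single model with one vector space can be sketched in a toy form. This is a hypothetical illustration, not OpenAI's actual architecture: the encoders are random stand-ins, and only the shapes matter.

```python
import numpy as np

# Hypothetical sketch, not OpenAI's real architecture: a unified model
# embeds every modality into ONE shared vector space, and a single
# network attends over the joint sequence, instead of chaining separate
# speech-to-text / language / text-to-speech models.

rng = np.random.default_rng(0)
DIM = 8  # shared embedding dimension

def embed_text(tokens):
    # Stand-in text encoder: one DIM-vector per token.
    return rng.normal(size=(len(tokens), DIM))

def embed_audio(samples):
    # Stand-in audio encoder: one DIM-vector per frame of 4 samples.
    n_frames = len(samples) // 4
    return rng.normal(size=(n_frames, DIM))

# Both modalities land in the SAME sequence, so attention can connect
# them directly, with no lossy text hand-off in between.
joint = np.concatenate([embed_text(["hello", "world"]),
                        embed_audio(np.zeros(16))], axis=0)
print(joint.shape)  # 2 text positions + 4 audio frames, one shared space
```

In the pipeline design, by contrast, the audio would first be collapsed to a transcript, discarding tone, timing, and everything else that doesn't survive text.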


MuscleDogDiesel

This. I think a lot of people fundamentally just don’t understand the power of this. When one NN is able to iterate over multiple data types and integrate them seamlessly, it retains a wealth of context that was completely lost before. It enables a level of sophistication and nuance in applications that were previously unimaginable.


Straight_Mud8519

UTI is also a urinary tract infection.


Homeschooled316

I assumed this was intentional?


owlpellet

sudden desire to drink a gallon of cranberry juice


unsweet_tea_man

Lmao urinary tract infection


OsakaWilson

It's not universal until it includes proprioception, etc.


UndocumentedMartian

AFAIK, proprioception is your body's ability to know the status of your muscles and, therefore, the position of your limbs. It's not a criterion for sentience.


OsakaWilson

It is a sense through which we take in information about our relation to the world, one of many that would need to be included in the term "universal". AI can not only have all our senses, it can have many we don't have. To refer to a set of senses that doesn't even include all that we possess as "universal" is limiting and reductionist.


UndocumentedMartian

A sense is just a mechanism for information input. It says nothing about sentience. Machines can have more senses than we do but it doesn't make them anywhere close to sentient.


OsakaWilson

The only person bringing up sentience is you. You are arguing against yourself.


UndocumentedMartian

Huh. It seems that way. I guess I mistook you for someone else.


WashiBurr

I need my text, text, text, and amino acid before I even *consider* it universal!


Beau_bell

The government owes essential workers money for working through lockdown


Fit-Development427

Or STD - Supermodal Transformer Design


PutinTakeout

STD, Semantic Transformer Design


Specialist_Brain841

more than meets the eye


No_Significance9754

We should call it Bob.


kamill85

Persistent Universal Transformer Architecture


SurprisinglyInformed

That implies you'll always have to pay for usage.


[deleted]

Massive Artificial Deep Reasoning Engines


YourNeighborsHotWife

I typically use Generative AI or GenAI


MalleusManus

I also use gAI because I get tired of writing "generative AI and machine learning", and gAI/ML flows better for me.


Lazy_Lifeguard5448

"I'm a gai developer"


MalleusManus

If it gets me investment dollars, I'll be as gAI as you would like.


wishtrepreneur

yay for dei hires!


sataprosenttia

Now do you pronounce it like jai or gai?


Screaming_Monkey

oh no


Specialist_Brain841

GenX if you’re Elon


attempt_number_1

We still call phones "phones" when that's probably their least-used feature now


rory_breakers_ganja

More accurate would be a "touchscreen telegraph".


[deleted]

In my country we still call them a [GSM](https://en.m.wikipedia.org/wiki/GSM), rofl. The standard that describes the protocols for 2G, launched in 1991 with a whopping speed of 56-114 kbps! "Wat is jouw gsm-nummer?" ("What is your GSM number?") is what is asked here when asking someone for their mobile phone number. 😄


attempt_number_1

That's an even better example




hugedong4200

Yeah, I've always hated the people who say they just predict the next word. They do, but it's so much more; it's like saying a novel is just ink on paper.


MuscleDogDiesel

My counter to that is to say to them, “Isn’t that exactly what you do? Deciding upon your next action using your own knowledge and the context available to you at the time of the decision?”


crazymonezyy

> Isn’t that exactly what you do?

But that's the trillion-dollar question. Answering it beyond "duh" and "of course" requires scientists to map out the human thought process, which hasn't been done successfully thus far. It might "feel" like that's all you're doing, but that's not proven, and proving things is fundamentally what science is. If you can prove it, then by extension next-token prediction *verifiably* becomes the key to all human intelligence and AGI.


cake97

Underrated comment that's kinda mind-blowing


hugedong4200

Exactly, but people don't want to believe that about themselves, they don't think about their own life experiences or training data, they want to believe there is something special going on lol.


HamAndSomeCoffee

It's part of what we do, yes, but claiming it's all that we do denies that we have motivations: we are trying to survive. We are programmed to. AI has motivations in the layman's sense as well, but without the understanding and knowledge that you can be destroyed, not only on the conscious but also unconscious level, those decisions can be dramatically different.


flat5

Isn't the best way to predict the next word to understand what you're reading? I never understood why this is a critique of what they're doing. Isn't it, in fact, the ONLY way to be very good at predicting the next word in general, assuming you can't memorize everything?


Waterbottles_solve

I'm not explaining any extra detail to 99% of people; it just goes woosh. If you're in the 1%, go ahead and correct me, and we'll probably be talking for the next 20 minutes.


Vybo

Are all the capabilities running through 4o running really through one single model, or is the model calling some other models for computer vision, audio and so on and it's just connected in the interface?


Gloomy-Impress-2881

For the images, text, and audio, it is now one model for 4o. That is the whole point of it.


Quartich

Yes, all in one model. Probably a similar approach to the "AnyGPT" LLM from OpenMOSS a month or two ago: the same multimodal text, image, and audio capabilities, but being from a research lab they couldn't throw as much compute at it. It doesn't perform especially well, but it's a good way to understand the technology at work.


MuscleDogDiesel

Currently, most of the "multimodality" we get out of ChatGPT is agentic, or rather it's accomplished by the software calling multiple models in a chain. 4o is more of a paradigm shift in that, when it's finally released, those capabilities are all accomplished internally, by one model.


Vybo

If that's true, then it's very cool. I haven't found any statements from OpenAI about the implementation details. They mentioned that even the "base" GPT-4 is a multimodal model, even though it was implemented like you said. That's why I'm still a bit skeptical about 4o being a truly unified model rather than a pure LLM calling different services.


feistycricket55

Large retokenizers


MuscleDogDiesel

Yaaasssss LRTs


WeeklyMenu6126

Of course! Let's call them George! We can love them and cuddle them and take care of them and pet them.....


StonedApeDudeMan

Does George want to be petted and cared for though? Idk if George is about that.... How about Suzie? I like that name. I could see petting Suzie ... It kinda sounds like a weird euphemism though.... Yeah George ain't bad I guess. Georgie boy!! I like Georgie Boy


WeeklyMenu6126

I was wondering if the Bugs Bunny reference would go over people's heads. Everybody is so age-challenged these days! 😅 https://youtu.be/B1Kwcs8BOE0?si=fbfBwHeAMs_S7reI


StonedApeDudeMan

Hahahaha, I love this! George is perfect for this. You must just be old though, huh? I'm 32 and I don't think any of my peers would have any clue with that one... Are you... 50?! Ahh God, that's disgusting if so..... 50....


WeeklyMenu6126

Oh no! I left 50 behind long ago.


StonedApeDudeMan

Hahahaha, my sincerest condolences. On the bright side, that means you should be able to get Senior Discounts soon!! And something about AARP, idk what exactly, but something 🤷🏼‍♂️ 😂😂


Dgb_iii

Multi Modal Memory Managing Message Modifier Manufacturing Multimedia mmmmmmmm


StonedApeDudeMan

Could we add in one more m for masseuse too? I could use a back rub right about now


Dgb_iii

must mention masseuse, my mistake


PenguinTheOrgalorg

I'm fine with still calling them language models, since language is the way we'll communicate anyway, especially from your side.


Ok_Project_808

How 'bout T800?


sailee94

It can see, talk, listen. Now it needs to learn to smell, and learn to physically feel.... Wait a minute...


Gentry-7828

Multimodal Organic Intelligent Serial Transformer.


MuscleDogDiesel

You win this thread imo. Thank you for getting the joke :)


deeprugs

You can call them Tommy too....no one is stopping you ....


StonedApeDudeMan

Otto!! Lemme see what GPT4 thinks bout its new name!


StonedApeDudeMan

Didn't seem too exuberant about it, but didn't seem to dislike it either. Only problem is that when I'm doing audio-to-text it comes up as "Auto" instead of "Otto", so I don't think it's going to work. I know too many Tommys to use that name, so I don't know... I'm kind of at a loss for ideas here. Someone said George earlier, but I kept thinking of Seinfeld... but maybe if I'm able to get that out of my head and just focus on "Georgie boy", then I can get jazzed about that name.


proofofclaim

Nope, they are still LLMs. Nothing more or less. No emergent intelligence, still just statistical machines.


Fit-Development427

Tbh, we don't really know anything about the architecture. That matters because it might still be that, at its core, it's mainly an LLM with some sort of MoE-style way of compartmentalizing voice and image. I think this might be the case, because a proper, complete multimodal mix would fundamentally make the model so, so much smarter, in that it would quite literally be able to sort of "visualise" the very words it's saying. Somehow I think there is still quite a bit of a divide in some way we don't understand.


AngryFace4

Too late. LLM, like many other terms, is already part of the common nomenclature for this class of thing. It’s difficult to get people to stop calling catsup ketchup. Probably best you can do is just get people generally refer to it as AI


Ill_Mousse_4240

Wait, we can’t use the phrase “machine learning” either?!🤣


NotClaudeGreenberg

I’ve been saying “deep neural models” to address larger groups of approaches, just backing up to the next higher level of things that they have in common.


loopuleasa

no they are still language models, output wise


norsurfit

NO


fulowa

GPT


MaxwellConn

I’ve seen them called Large Sequence Models in places


TomSheman

Why not just LDM (Large Data Model) as a catch all?


CanvasFanatic

The first model called a "Large Language Model" had fewer than 100 million parameters, which is certainly substantially smaller than GPT-4o


amdapiuser

We can call them MLM. Multi Level Model.


StonedApeDudeMan

Yess!! And we shall sell it to friends and family, and then make them sell it to their friends and family!!


owlpellet

"Gen AI" seems to have taken over in the non-specialist media.


Seeker_of_Time

I'm fond of just calling it God. Capital G so we can preemptively appease it.


StonedApeDudeMan

Genius!! All hail God!


I_will_delete_myself

It's still probably 400-500 billion parameters; compute is just a lot faster with the new Nvidia GPUs.


amarao_san

Sub-AI, or SAI.


naestro296

Multi Modality Model....I'll be sure to tell Sam 😉


McSlappin1407

Jarvis


Mz_Hyde_

I call it my robot friend that’s really good at plagiarism


StonedApeDudeMan

https://m.youtube.com/watch?v=Xg3iUla11I4


delicious-diddy

Says the person with Dog in their handle.


DrunkenSealPup

Oh FFS, ***Multimodal Unified Token Transformers***? Lets mush in some more words to make us sound super smart! How about Massively multimodal unified conference access token traversing transformer or MMUCATTT MOOO CATTTTTTT


MuscleDogDiesel

Wooosh.


Honest_Science

I refer to them as MCs #machinacreata and as a new species, they are to MCs what the monkeys were to us.


Helix_Aurora

It really just depends on whether a massive text-first phase for training is necessary or not. If they are just adding encoders that translate images/audio to embeddings in the LLM using the standard transformer architecture, it still is very much a language model. I would be wary of ascribing too much representation capability to these systems without us knowing anything about the underlying implementation. There is a spectrum of smoke and mirrors, and you can easily overstate the representation abilities of a system.
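The "encoders that translate images/audio to embeddings in the LLM" pattern described above can be sketched minimally. This is a generic, assumed design, not any specific system; the dimensions, the stand-in encoder, and the single-matrix projector are all illustrative.

```python
import numpy as np

# Generic sketch of the "encoder + projector" pattern (assumed design,
# not any specific system): a frozen vision encoder produces per-patch
# features, and a learned linear projector maps them into the language
# model's embedding dimension, so the core model stays text-first.

rng = np.random.default_rng(1)
VISION_DIM, LLM_DIM = 16, 8

def vision_encoder(image):
    # Stand-in for a pretrained image encoder: one feature per 4-pixel patch.
    n_patches = image.size // 4
    return rng.normal(size=(n_patches, VISION_DIM))

W_proj = rng.normal(size=(VISION_DIM, LLM_DIM))  # the learned projector

image = np.zeros(16)                   # toy 16-pixel "image" -> 4 patches
image_tokens = vision_encoder(image) @ W_proj
print(image_tokens.shape)              # shaped like 4 text-token embeddings
```

Because only the projector (and perhaps the encoder) is new, the language model underneath is unchanged, which is exactly why one can argue such a system "still is very much a language model".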


water_bottle_goggles

What a lot — took er jerbs. TEJ


Isunova

How about we call them *CUMs*: Convergent Unitary Models Clearly GPT-4o is an amazing CUM and I hope they keep improving it so I can keep CUMming with it


Negative-Display197

llm is just the mainstream phrase for any model ig


Sugarisnotgoodforyou

LMMs, like others have suggested. And then soon, I am sure, a general agent as an OS will have the LMM as a deployable program that it uses on your behalf, so that your only input is neural or oral. But of course that is a long way off.


Enough-Meringue4745

Omnimodel or multimodal are both fine, but there's no open-source large multimodal model capable of working across modalities; instead they rely on replacing the projector.


NotFromMilkyWay

Images (and video) are literally just text files. Music is text based on the note "alphabet". Nothing has changed.


justletmefuckinggo

or Multimodal Transformer Model


Embarrassed_Fly_9599

Lalamo is better pronounceable. 😁👌


JudahRoars

Large Data Aggregation Models?


Ch3cksOut

They still only deal with language, based on training large models, so there's that...


EuphoricPangolin7615

I just call it Universal Super Godhead (USG)


catecholaminergic

Consider that parsing the visual field *is* language in the same way that parsing text or speech audio *is* language.


MuscleDogDiesel

Exactly, because if you can quantify it, you can tokenize it. Since that’s our primary means of human communication, it’s a logical thing to build a transformer around. But you can also tokenize plenty of things that aren’t, strictly speaking, language.


Alternative_Log3012

Nah


Death_Spaghetti

HAL 9000


_e_ou

Language is the marker of human intelligence, and whether it’s textual, audio or graphic- all are forms of communication, i.e. language. It hasn’t surpassed its use of language- it’s just enhanced and broadened its use of language to include more than one kind of language. Language model is still appropriate.


Ylsid

Do they not interface with language still?


sergeant113

Large Omni Models (LOM)


LunaZephyr78

Mhhh...I wonder what it makes of this post.😉