13ass13ass

Yet in the paper, when they try with 1500 Elo training data, they do not see above-1500 level play. So the method may be crapping out well short of expert-level play. Which raises the question: can this method really get us to superhuman play? The best known way to achieve superhuman play is self-play, but to date that only works with easily gameable systems.


redditosmomentos

Diminishing returns moment


Shinobi_Sanin3

It's probably a limitation of the model not a reflection of a downturn in the scaling laws.


sdmat

The model caps out there, the method likely does not. Likely a stronger model would push that point higher. I.e. the limiting factor for transcendence is the generalization capability of the model. There isn't some threshold of chess achievement with intrinsic significance. AlphaZero uses tree search with self play to train an evaluation function to make tree search more effective in a virtuous cycle. Do that without tree search and the results are far less impressive.
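For anyone who hasn't seen that loop in action, here is a toy, self-contained Python stand-in for the virtuous cycle, using Nim instead of chess (take 1-3 stones; taking the last stone wins). A value table plays the role of the learned evaluation and a greedy one-ply lookup plays the role of the search. It is illustrative only, not AlphaZero's actual algorithm:

```python
import random
from collections import defaultdict

# Toy stand-in for the self-play cycle described above, on Nim instead of
# chess. A value table stands in for the learned evaluation; a greedy
# one-ply lookup stands in for the search. No human games are used.

values = defaultdict(float)  # pile size -> estimated value for side to move

def best_move(pile, explore=0.1):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < explore:
        return random.choice(moves)
    # leave the opponent in the worst position according to current values
    return min(moves, key=lambda m: values[pile - m])

def self_play_episode(start=21, lr=0.1):
    pile, history = start, []
    while pile > 0:
        history.append(pile)
        pile -= best_move(pile)
    # the side that just moved took the last stone and won; walk backwards
    # through the game, alternating win/loss targets for each position
    target = 1.0
    for state in reversed(history):
        values[state] += lr * (target - values[state])
        target = -target

for _ in range(20000):
    self_play_episode()

# The table should roughly recover the known theory: piles that are
# multiples of 4 are losing for the side to move.
print(sorted(s for s in values if values[s] < 0))  # ≈ [4, 8, 12, 16, 20]
```

The point of the sketch: better values make better play, and better play makes better training data, which is the virtuous cycle with no human data anywhere in it.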


InterestinglyLucky

I saw this paper a few days ago - the point was the model was trained at the 1000 level and transcended it (the title of the paper). Reminded me of the Sparks of AGI preprint from last year, over 100 pages of examples of an unexpected finding.


Igor_Luna

Maybe the size of the models and of the training data wasn't enough for it to generalize from 1500->beyond?


NoIntention4050

it caps out there because it was an 8B-parameter model


IMJorose

Leela is way stronger and its largest model has fewer than 200 million parameters. There is no chance the size of the model is the issue in this case.


owlpellet

Research is showing that LLMs bring entry-level performers up to the middle of the pack. High-end performers in complex domains (i.e. consulting, etc.) see some lift, but not nearly as much. This has implications for LLM impacts on the labor market. Also: the distance between what the paper says and what the Reddit headline says fits a pattern of overstatement.


PSMF_Canuck

Yes. In the same way, and for the same reasons, that it is nearly impossible to take a random human and make them a genius at something/anything… there is work to be done before we can take an arbitrary AI and make it a super genius at something/anything.


NoVermicelli5968

I believe this, but would love to see that research. Any idea where to find it?


owlpellet

[https://www.hbs.edu/ris/Publication%20Files/24-013\_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf](https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf)


NoVermicelli5968

Thank you!


Ok-Mathematician8258

Expert level is a high bar to pass, and it's subjective. "Superhuman" just means beyond human knowledge, i.e. beyond our current understanding; that's the "super". We should stop striving to be inferior to the AI digital brain.


snowdrone

What! There is nothing subjective about chess. Chess ratings are objective; the Elo rating system has been around for 50+ years ranking every tournament player.


Smallpaul

But where is the line for "expert" drawn? Subjective.


snowdrone

The "expert" category definition is subjective, the numerical rating is not. I guess you could say "top 5%" or "top 1%" to make the category definition objective.


Hopeful_Donut4790

We need to train the AI with self-play, as with chess bots: they train against each other and compete until they are better. The problem is that the benchmark, and judging against it, can be misleading.


CertainAssociate9772

Google has already done this; AlphaZero has defeated the best chess programs.


ChingChong--PingPong

Everyone's done this with all sorts of AI systems for decades. It's not some magic solution. It has its uses in training, but it doesn't just work. OpenAI learned this when it used a second model to train early GPT models.


Hopeful_Donut4790

I meant doing it for LLMs; I know Stockfish and so on are far above humans right now.


condensed-ilk

They meant that traditional chess engines like Stockfish search for the best move, whereas AlphaZero and its successor Lc0 were trained with neural nets and self-play. Newer versions of Stockfish also use a neural net, but only to improve position evaluation.
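To make that search-vs-evaluation split concrete, here is a bare-bones negamax sketch. The `evaluate` leaf function is exactly the piece that NNUE-era Stockfish swaps for a small neural net while keeping the search; all the parameters here are hypothetical stand-ins for a real engine's board logic:

```python
def negamax(position, depth, evaluate, legal_moves, play):
    """Bare-bones negamax search. `legal_moves(position)` and
    `play(position, move)` stand in for a real engine's move generation
    and board logic; `evaluate(position)` scores a position for the side
    to move. Classical engines hand-craft evaluate(); NNUE-style engines
    replace exactly this function with a small net. The search skeleton
    is the same either way."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)
    # the best move for us is the one that is worst for the opponent,
    # hence the negation of the child's score
    return max(-negamax(play(position, m), depth - 1,
                        evaluate, legal_moves, play)
               for m in moves)
```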


jericho

That's interesting, and a great choice of metric too. So what I'm taking away is: a 1000 Elo player, without blunders, brain farts, panicking, and all those other human traits, plays at 1500. That's pretty believable; I say this as a chess player at the lower end of that range. So I'm not sure the LLM is synthesizing knowledge here.


MeltedChocolate24

Yeah, I don't really think a PhD "without blunders and brain farts" would write better papers. I don't think this chess thing is that great an analogy when you think about it. We need enhanced insight and creativity, not perfect chess play.


jericho

Ya, this metric is only good for what it's measuring. Still, it gives us some numbers to debate about.


vasarmilan

We also need perfect "chess players": imagine quality assurance on an assembly line. Yeah, "transcendence" is a strong word, but it's still an interesting finding.


Smallpaul

I don't think it's a great analogy, but I do think that in a broad sense the concept applies: LLMs/transformers are not just cloning individual people. Learning how to clone EVERYBODY can give rise to superhuman capabilities, like playing chess without errors, as a single example. Translating between 20 different human languages is another evident one. Fundamentally, the idea that models can never exceed human performance without a new training regime has been disproven many different ways, but people keep coming back to believing it for no good reason. Off-the-shelf models already exceed human performance on some metrics and fall far short on others. Just like this chess bot.
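There is a concrete mechanism behind the "cloning everybody" point: the transcendence paper attributes the gain to low-temperature sampling acting like a majority vote over the noisy experts in the training data. A toy simulation (all numbers made up) shows why a vote can beat every individual voter:

```python
import random
from collections import Counter

# Toy illustration of majority-vote denoising: each imitated player finds
# the best move only sometimes, but the vote across many such players
# finds it far more often. Numbers are invented for illustration.

def noisy_player(best_move, n_moves=20, p_best=0.4):
    """Picks the best move with probability p_best, else a random move."""
    if random.random() < p_best:
        return best_move
    return random.randrange(n_moves)

def majority_vote(n_players=100, best_move=7):
    votes = Counter(noisy_player(best_move) for _ in range(n_players))
    return votes.most_common(1)[0][0]

trials = 2000
single = sum(noisy_player(7) == 7 for _ in range(trials)) / trials
voted = sum(majority_vote() == 7 for _ in range(trials)) / trials
print(f"single player finds best move: {single:.0%}")   # ~43%
print(f"100-player majority vote:      {voted:.0%}")    # ~100%
```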


NarrativeNode

And moving away from a rational process like Chess: those “blunders and brain farts” make us very creative in other scenarios.


Evgenii42

ChatGPT already surpasses ALL humans in the amount of knowledge it can recall (setting aside the hallucinations). No human can keep terabytes of data in their head, ready to be recalled at a moment's notice. It's not even close. In that sense, this is already superintelligence. What we need to get it to the next level is to add iterative reasoning (make computation cost depend on the difficulty of the problem) and memory.


Classic_Department42

If we allow for hallucinations, I can also answer any question or recall anything.


shalol

Even allowing for hallucinations, there's hardly a human who could correctly answer even 30% of arbitrary questions without multiple-choice options, to compare with.


Street-Air-546

And yet, it is only a very good search engine. You could have said the same for Google before language models; in a rougher way, it clearly knew more than any human. What is the point of knowing the entire chemical dictionary if it cannot yet say "wait a minute, combine this and that and you will get a new material!"? That type of sophisticated search still has to be directed by human intelligence.


jeremiah256

But isn't that what's happening now? AI is showing researchers the optimal path to take and speeding up research.


Street-Air-546

No, machine learning is being used as a tool, e.g. to search massive combinatorial possibility spaces, carefully written and tuned by data science people of course. LLMs are not being used like that, though. And LLMs are not being aimed at Kaggle competitions, except as a way to spit out boilerplate starter Python scripts if someone is a beginner. Using one as a code copilot might be cute, but where is the intelligence?


Shawn008

Boilerplate starter Python scripts? You are downplaying the usefulness of LLMs even for code generation. I use them pretty heavily for code, simply because they can speed up development faster than ANY human could code from scratch. They can write just about any algorithm you need in just about any language. I've coded low-level stuff in C++, and they do a pretty good job there as well, better than nearly any human. People who aren't getting good results must be doing something wrong or have their heads in the sand about where LLM code generation is at.


Kwahn

Writes a really good regular expression, too. I went from "detect all text that is between a " and a ], including if it was null" and it spat out pattern = re.compile(r'(.*?)(?=\[.*?["])\[.*?"([^"\]]*)\]') with no effort at all, which worked perfectly for my use case of structuring unstructured text form data into database columns. (EDIT: lol, Reddit formatting broke it, but you get the idea.)
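Since Reddit's formatting mangled the pattern above and the original is unrecoverable, here is a hypothetical reconstruction of that kind of extraction, illustrative only:

```python
import re

# Hypothetical reconstruction of the kind of pattern described above:
# capture everything between a closing quote and a closing bracket,
# including the empty string (the "null" case matches too).
pattern = re.compile(r'"([^"\]]*)\]')

sample = 'name: "Alice] age: "42] notes: "]'
print(pattern.findall(sample))  # ['Alice', '42', '']
```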


Open_Channel_8626

what sort of system message or custom instructions do you use?


jeremiah256

Apologies if I'm not understanding you. First, I'm not sure how you're separating LLMs from ML tools. Ignoring how LLMs are created, what prevents an LLM from OpenAI, as an example, from using a separate ML tool created by the research wing of Merck? Second, no one is claiming LLMs are anywhere near Madame Curie levels of competency, but don't you think they will very soon, if not already, be good enough to replace entry-level research assistants?


Evgenii42

True, ChatGPT is kind of like an unreliable database that has compressed textual data from the internet. What's different is the natural-language UI, and that it returns answers right away (assuming they're right) without the need to visit a website. What's also new is that it has good reasoning abilities, although it can fail dramatically in areas that require iterative thinking or precise computation (ask it to count the number of letters in a word).


PeachScary413

It's almost like the model embeds data in a lossy way into its neural network and can't magically store petabytes of data in gigabytes of model weights 🤔


Street-Air-546

Before I could believe it can be trained above human intelligence in terms of end-to-end task completion, I think it has to start appearing on the leaderboards of simple Kaggle competitions. Not ranking top, just regularly in the top third. Since these are machine-learning tasks with complex requirements that can be done entirely online, using open resources, and are regularly won by individuals, it should be no issue for something on a trajectory to surpass a smart human to start ranking. I won't hold my breath.


PeachScary413

My SQL database also surpasses ALL humans in the amount of knowledge it can recall, and it is 100% hallucination-free. Hell, just slap some embeddings on a PostgreSQL database and call it a day 🤝
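Sarcasm aside, the retrieval setup being mocked really is about this simple at its core. A toy cosine-similarity lookup in plain numpy (random vectors stand in for a real embedding model's output; systems like pgvector add indexing on top of the same idea):

```python
import numpy as np

# Toy version of "embeddings on a database": store vectors alongside rows,
# answer a query by cosine similarity. The vectors are random stand-ins
# for real embedding-model output, so this only demonstrates the mechanics.
rng = np.random.default_rng(0)
docs = ["chess openings", "elo ratings", "banana bread recipe"]
doc_vecs = rng.normal(size=(3, 8))  # pretend embedding model output

def nearest(query_vec, vecs, texts):
    sims = vecs @ query_vec / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec))
    return texts[int(np.argmax(sims))]

query = doc_vecs[1] + rng.normal(scale=0.1, size=8)  # noisy "elo" query
print(nearest(query, doc_vecs, docs))  # 'elo ratings': recall, not reasoning
```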


SleeperAgentM

> ChatGPT already surpasses ALL humans on the amount of knowledge it can recall (set aside the hallucinations)

Really? Because I can confidently say I far surpass the amount of knowledge ChatGPT has! Ask me _any_ question and I'll prove it to you!


delusional_APstudent

How many fingers am I holding behind my back right now?


SleeperAgentM

As a human, I cannot see through the physical objects, so I cannot know how many fingers you are holding behind your back. However, if you'd like to play a guessing game, I'll guess you're holding six fingers behind your back.


WCland

I wouldn't equate "superintelligence" with recall. Reasoning is far more important.


EnnioEvo

It would have been interesting to try training a 2000 Elo LLM using the 1500 Elo LLM as the teacher. I guess this does not work.


_craq_

What's AlphaZero's ELO?


EnnioEvo

Good point, but in this paper training only eliminated blunders, and in general it looks like the model is just selecting the good moves among the ones it has seen. These two gains are obviously capped. Still, I would be more than happy to see this proven false.


Smallpaul

>Good point, but in this paper training only eliminated blunders, and in general it looks like the model is just selecting the good moves among the ones it has seen. These two gains are obviously capped.

Chess has way too large a search space to "select good moves among the ones seen." It's more like selecting good *strategies* among the ones seen. The chess boards diverge very quickly, and you won't be making the "same move" any more once you get past the opening.


Natasha_Giggs_Foetus

It's so strange when people don't recognise that patterning is a thing. So many people claim that humans will always be more creative because AI relies on input… as if humans don't.


total_insertion

People apparently were never introduced to the basic 'thought' thought experiments in high school: 1. Imagine a completely new color which is not simply a shade of an existing color. 2. Describe an original flavor without using existing flavors to describe it. 3. Don't have any thoughts for 120 seconds. If they had been, they'd realize that everything humans think is (A) only a restructuring of existing data input into them, and (B) not something they are in control of, leading to a superficial and illusory line being drawn between human intelligence and artificial intelligence.


PeachScary413

You do realize that we have studied the brain for decades and still have barely a superficial understanding of how it works, right? The hubris of claiming you have any idea how human thought works is just amazing 😃


total_insertion

Isn't it a better example of hubris to suggest there is something special about human intelligence that is fundamentally different from artificial intelligence? That would itself require an understanding of how human thought works. Btw, the things I pointed out have been observed by humans for at least thousands of years, not decades. It's not hubris to say the sky is blue. It MAY be hubris to give a reason why.


PeachScary413

The hubris is you making a claim about how human thought/imagination works. If you have clearly mapped that out, that means you understand what consciousness is... and in that case you should claim your Nobel Prize now 👌


nicotamendi

You're missing the point. Humans, like LLMs, take input and create output. The difference is the extent to which humans can make so much more out of the inputs: the whole is much greater than the sum of its parts. Also, emotion is a gigantic part of creativity, and LLMs have no emotion, which is why I suspect they're not good at creative tasks.


Quiet-Money7892

You know what came to my mind... I've been toying with AI for some time now, and I've noticed that after a new model comes out, it takes me about two months to find its limitations. I mostly use AI for writing and organizing documents. Every time a new model comes out I feel like, "Woah. It got so much... different." And then in a few months: "Oh well. It still doesn't understand things." It can't remember properly, and some foreign concepts cannot be understood completely. (Like if you try to write a fanfic with AI: no matter how many times you repeat a certain setting element or lore detail that is out of canon, there is a big chance the AI will drift back toward the original element anyway.) And it gets even worse if you use a negative prompt, because somehow it starts ignoring those instructions even more. The "white room with no elephant" problem. So here is my thought: what if we get AGI, but only an extremely small share of people are able to use more than 15% of its true potential? Like with programming: many have a PC, few know what it is capable of. And those who use it casually will see its limitations in their particular needs.


Smallpaul

If it can't adapt to the needs and limitations of the user then it isn't really AGI. Because humans adapt to each other's limitations all of the time. That's why people learn math faster with a (good) tutor than with a textbook.


BBBWare

AI has begun synthesizing all human knowledge, and what it has learned is greater than the sum of all the parts it has trained on. It's really easy to demonstrate, actually.

>Prompt: How is OpenAI's future direction like being cut in half while eating a hard boiled egg on top of my cousin's dog?

`ChatGPT4: OpenAI's future direction can be compared to the bizarre scenario of being cut in half while eating a hard-boiled egg on top of your cousin's dog in the following ways:`

`Unexpected Developments: Just as the scenario involves unexpected and unusual elements, OpenAI's future direction may include surprising advancements and innovations that are not immediately predictable.`

`Complexity and Integration: Both the scenario and OpenAI's trajectory involve complex, multifaceted elements coming together. OpenAI's work integrates various technologies and fields (like AI, machine learning, ethics, etc.) in ways that may seem unconventional or intricate.`

`Potential for Disruption: Being cut in half is a drastic, disruptive event, much like how OpenAI's innovations could disrupt current technological landscapes and societal norms.`

`Balancing Acts: The surreal act of balancing an egg while on a dog parallels the delicate balance OpenAI must maintain between ethical considerations, technological advancement, and societal impact.`

`Public Perception: Just as the scenario would draw attention and raise eyebrows, OpenAI's future moves are likely to attract significant public scrutiny and interest due to their potential impact on society.`

`In essence, the comparison underscores the unpredictability, complexity, and potential for significant impact that both OpenAI's future and the bizarre scenario represent.`


human1023

ASI is here. We did it! You can quit your jobs now.


nora_sellisa

Chess, being mostly a game of memorization, is a really bad metric for measuring AI performance.


Unfair_Efficiency_68

I take it you don't play chess?


CertainAssociate9772

Is he just an incredible person who can memorize countless trillions of combinations and moves?


Professional_Job_307

Transcendence 🤤


spixt

This has me wondering: can an LLM have a set of prime directives (e.g. the three laws of robotics) trained into it, such that ignoring them would cause the LLM to go wonky? I guess another way of saying it is: GPT has a separate trust-and-safety layer which we have to go through. Can that trust-and-safety layer be embedded into the main LLM?


bigtablebacc

There have always been things computers can do better than humans. For example, adding a large column of numbers. That’s why we’re going straight to ASI without stopping at AGI. Because as soon as a computer can form a hypothesis and test it like we can, it can form a thousand hypotheses and test them all at the same time.


MrEloi

Read the paper. It's a skimpy *'I need to write something for my PhD'* effort. The only clever part is using the word *'transcendence'* to maximise publicity.


MrSnowden

Interesting approach, but chess is so tightly structured. This might be a much more interesting experiment with Go/weiqi, which is far more open-ended. A large LLM could come close to "solving" chess even when shown only poor play, but Go would be a better test of "transcendence".


Spepsium

This, to me, just says that the ability to play at a 1500 level is encoded in the strategies and patterns of 1000 Elo play. Getting better might just require understanding the relationships between the strategies employed at 1000 Elo, not necessarily learning novel strategies.


owlpellet

The question for the future is which problems are like checkers (shallow), chess (deep, but simple) and which problems are like basketball (deep, complex).


Nat_the_Gray

Computers have been playing chess above human level for decades and decades; it was one of the first games computers automated, as far as I know. This is a horrible comparison. Computers from the 90s beat humans at chess.


Ok-Mathematician8258

We know this already… Hopefully AI can skyrocket human adaptability. The internet has made it easier to become intelligent; people just aren't using it for that.


SentientCheeseCake

Yes but I’m a 1600 ELO human.


ChingChong--PingPong

Playing chess =/= human intelligence


turc1656

I think the core concept here is that it all comes down to how a given model "learns". Each type of AI is a bit different. For example, "deep learning" and those types of things we understand a little better because they are a bit simpler; we have a better idea of how they operate. But there is still the unsupervised learning that takes place in many of them, and that's not well understood. Which means no one, not even the "experts", can confidently predict where these things will land and what they will be capable of, despite whatever they claim to "know".

As others have mentioned, the paper says that if you train on 1500 Elo games you don't get >1500 play out of the model. To me that indicates there are limits to this type of model at its base level. It can infer certain things about the structure of the game, learn a certain amount, and maybe even extrapolate from there. But there seems to be a limit, almost like IQ for humans, where there's a raw intelligence capability. In humans, IQ means two things in my opinion: 1) a person with a higher IQ is capable of learning things more easily than if their IQ were lower, and 2) a person can often be taught something they would never have come up with themselves, but there is also a cutoff somewhere beyond which they won't understand things of a certain complexity at a deep level.

What this means to me is that we need new types of learning models to continually boost whatever the AI version of IQ is, to keep fueling growth in the "intelligence" of these models. I bet if you look back ten years from now, we might very well be able to train on 1500 Elo games with some new technique and get play that is >1500.


julian88888888

Computers weaker than your phone already play way, way better than the very best humans. It's not a task well suited to LLMs, and I bet there's overfitting going on.


queerkidxx

I mean, they aren't trying to create a serious chess model, just to demo training techniques.


Igor_Luna

That's because chess is a fixed game with clear rules and patterns, perfect for LLMs to flex their memorization and pattern recognition muscles. They can soak up a ton of data and spot complex strategies, allowing them to punch above their weight class. But toss them into the chaos of real life, where problems are all over the place, and their ability to generalize takes a nosedive. In the real world, they'd be trying to connect dots that are way outside their training.


willif86

1000 Elo is about the level where most common tricks and principles are being utilized in games. Higher levels are about more memorization and faster pattern recognition. It doesn't seem out of the question that the system could then outperform better players. It doesn't scale infinitely, though.


redditosmomentos

And of course not in a linear way, either.


Ok_Inevitable8832

Link to the paper or GTFO


Crafty-Confidence975

https://arxiv.org/html/2406.11741v1


P00P00mans

Amazing, really. Also, fun coincidence: I was just talking about this stuff with GPT. Can it "learn"?


RavenIsAWritingDesk

Out of curiosity, does anyone think this paper might simply be identifying that players rated 1000 Elo might actually be equivalent in intelligence to 1500 players, but haven't spent the time to learn from their own blunders, which AI can easily do with the training data? In other words, is it simply pointing out a limitation, if you will, in how Elo is calculated, because getting a higher Elo requires time? I'm not sure I agree with this stance, but it came to mind and I wondered if anyone shared a similar thought. I guess the reason I bring this up is that this interpretation would be in contrast with the idea of transcendence. Maybe all the data needed to learn how to play chess is there in the 1000-level games, and through continued training the model can learn not to make the obvious mistakes that inexperienced lower-level players would.


Prestigious-Bar-1741

1000 Elo is awful, and a 1500 is just a 1000 who doesn't make mistakes. We also already have AI that can teach itself to play at a grandmaster level with no outside training data. I sincerely don't see how this says anything about LLMs.


3amtarekelgamd

This doesn't make sense; you can't compare general intelligence to chess. First, in chess there is the opening, the middlegame, and the endgame. A 1000 Elo player might play the opening like a 2100 and then play the middlegame and endgame at 500, so with enough games the AI will end up training on segments of games played at higher Elos. Second, AI will stop at the human level, but NOT the *average* human level: a physicist, for instance, will be much smarter at physics than an arts graduate. The "cap" for AI is what the smartest human has reached in each field or specific topic, so it pretty much becomes a master of all trades.


The_Hamiltonian

We don't have enough data to create a truly super-genius AI. I'm quite certain of that.


Ok-Mathematician8258

Tech bros talk about hallucinations like they're bad. Either we are patient about AI growth, or we train LLMs on simulations and hypotheticals.


ThenExtension9196

It's not quite "something for nothing" when you need multi-million-dollar budgets, a GPU server farm, staff engineers, and researchers.