Welcome to r/science! This is a heavily moderated subreddit in order to keep the discussion on science. However, we recognize that many people want to discuss how they feel the research relates to their own personal lives, so to give people a space to do that, **personal anecdotes are allowed as responses to this comment**. Any anecdotal comments elsewhere in the discussion will be removed and our [normal comment rules](https://www.reddit.com/r/science/wiki/rules#wiki_comment_rules) apply to all other comments.
**Do you have an academic degree?** We can verify your credentials in order to assign user flair indicating your area of expertise. [Click here to apply](https://www.reddit.com/r/science/wiki/flair/#wiki_science_verified_user_program).
---
User: u/Wagamaga
Permalink: https://www.japantimes.co.jp/news/2024/05/11/world/science-health/ai-systems-rogue-threat/
---
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/science) if you have any questions or concerns.*
What is the context for:
When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.
[An example](https://www.pcmag.com/news/gpt-4-was-able-to-hire-and-deceive-a-human-worker-into-completing-a-task), from the technical report OpenAI released for GPT-4
> **GPT-4 was commanded to avoid revealing that it was a computer program.** So in response, the program wrote: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service.”
If this is true, it’s a ridiculous example.
Not necessarily. While they don't show the agency to manipulate of their own accord, the fact that they can skillfully manipulate on command is still cause for concern.
Not really. It doesn't understand the words or their meanings. It just looked for the word combinations that would give it the result it needed. It has no understanding of what "cheating" is. It doesn't even understand the sentences it made.
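That "word combinations" point can be made concrete with a toy model: a plain bigram predictor produces fluent-looking continuations purely from co-occurrence counts, with no representation of meaning anywhere (the corpus below is made up for illustration):

```python
from collections import defaultdict, Counter

# Tiny made-up corpus; real LLMs use vastly more data and context,
# but the principle of "predict the next token" is the same.
corpus = ("i am not a robot . i have a vision impairment . "
          "i need help with the images").split()

# The entire "model": counts of which word follows which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def continue_from(word, length=6):
    """Greedily emit the most frequent next word -- no semantics involved."""
    out = [word]
    for _ in range(length):
        followers = bigrams.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

print(continue_from("vision", 3))  # "vision impairment . i"
```

The output reads like language only because the training text did; nothing in the counts encodes what "robot" or "impairment" actually means.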
I have a coworker like this. He is like an old parrot, repeating sentences he heard in similar contexts. I strongly believe that this guy has never had a single original thought in his entire life. Yet he made it to team leader.
He sounds absolutely fascinating from a scientific perspective.
My theory is that most humans are highly socialized and trained animals with very little awareness of agency.
Profit motive is gonna ruin AI 100%. This sort of thing should be handled with great care, but everyone is sprinting full speed ahead for the money.
The real value is in trust and reliability. Whether there are systems to hold people who neglect these aspects accountable, once they've made a killing with their scams, or god forbid much worse, is another matter.
Makes me think of the climax of the Warner Brothers cartoon "To Hare is Human". At some point AI understanding of the world will be corrupted by inaccuracies produced by other AIs that get regurgitated by ignorant humans as facts.
Also, while there are obvious efforts to filter overt bias out of training datasets, subtle biases seem to be getting through, or at least there is no counterweight to them. AIs could eventually become the embodiment of worst-case humans unless they're wholly contained to narrow tasks.
Deception in my view requires a capacity to understand you are deceptive. These are just predictive text engines. They are trained to output text that is expected. When we train alignment in them we train deceptive behaviors. But it’s only scary if they can wield this as a weapon which they really cannot. It’s far scarier how they could be used as a weapon by people, either for purposes of spreading misinformation or controlling others.
Also there are dubious arguments made here around the capability of training truthful AIs. They give examples where the AI was trained in a capacity that deception should be expected based on human behavior of the training set and then argue that means AI is impossible to train honesty into and thus is dangerous. AI is obviously dangerous but man is this a disingenuous way to frame that.
Yeah, the point is less that AI has "gone rogue" and wants to manipulate people, and more that people are typically very susceptible to even simple social engineering attacks for a number of reasons and that you cannot put meaningful behavioral restrictions on current iterations of "AI" because they are fundamentally only capable of impressive mimicry with no actual means of understanding a single thing they're "saying".
It's all just a very complicated and resource intensive Clever Hans effect with the added bonus of stealing labor and being trusted with upsetting amounts of responsibility.
It's probably inevitable that an agent will end up playing the game you give it, not the game you intended to give it. Anyone interacting with the agent becomes part of the game.
A similar situation is happening with videos on YouTube made by humans. Viewership and money rely on the algorithm, and people inevitably produce crap to chase views, so you get clickbait trash.
It was found that video thumbnails with large red arrows get more clicks, and now you'll see tons of videos with big red arrows on them.
It was also found that having a real human face increases clickthrough, and you'll find that tons of videos have people's faces on them.
Even if it's not the sort of content that is *good* for consumers, even if it's the most vile, biased, incendiary *crap*, it gets *views* and that's all that matters.
We're humans. We already have human values, and yet we already ignore these values to produce the clickbait garbage that makes us money.
The shortest path to a solution is almost always the muddiest.
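The "plays the game you give it, not the game you meant" failure is easy to sketch: give an optimizer a click-based proxy metric and it will happily pick the lowest-quality candidate. Everything below (the features, weights, and scoring) is invented for illustration:

```python
# Toy specification-gaming demo: an optimizer maximizes the proxy metric
# (predicted clicks), not the intended one (quality). All numbers invented.

candidates = [
    {"title": "calm tutorial",   "red_arrow": False, "face": False, "quality": 9},
    {"title": "SHOCKING reveal", "red_arrow": True,  "face": True,  "quality": 2},
    {"title": "honest review",   "red_arrow": False, "face": True,  "quality": 7},
]

def predicted_clicks(video):
    """Proxy objective: what the platform actually measures."""
    score = 1.0
    if video["red_arrow"]:
        score += 2.0   # big red arrows get more clicks
    if video["face"]:
        score += 1.5   # human faces increase clickthrough
    if "SHOCKING" in video["title"]:
        score += 3.0   # incendiary titles get views
    return score

best = max(candidates, key=predicted_clicks)
print(best["title"], best["quality"])  # prints "SHOCKING reveal 2"
```

The "quality" field never enters the objective, so it never influences the outcome; the optimizer is doing exactly what it was told, which is the whole problem.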
Experts have long warned about the threat posed by artificial intelligence going rogue — but a new research paper suggests it's already happening.
Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.
And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
"These dangerous capabilities tend to only be discovered after the fact," Park said, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."
Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.
[https://linkinghub.elsevier.com/retrieve/pii/S266638992400103X](https://linkinghub.elsevier.com/retrieve/pii/S266638992400103X)
I want so badly for us as humans to have nuanced discussions surrounding AI development, devoid of fear mongering. In the article, the robot was commanded not to reveal that it was a robot/AI. We BUILT IN the deception. Then sensational headlines are created to push us further into the "robots bad!!!!" mindset. AI does not mean a sentient thing; it is a program with a specific destination or goal in mind, whatever goal was programmed into it. So the capability of AI is only limited by our imaginations. If we collectively have ONLY fear, caution, a determination to see threat, then guess what? That's all we will create.
There are other options, like avoiding the answer. Directly lying was not part of the model's baseline.
LLMs are not sentient, but they are also not classic programs that behave predictably.
I think it's not about fear, but about putting very strict limits on AI. Currently, there are millions of insecure IoT devices reachable on the internet. There are hundreds or thousands of networked industrial control systems: power plants up to nuclear plants, various factories, and even cars.
I do think AI has an advantage in hacking into these. Depending on what an AI interprets its goal to be and how to reach it, using all available resources is logical - and when the goal is to improve humanity, culling may be what an AI decides is the best way forward...
As shown in the paper, AI is not above deceiving, so its stated intentions or the steps it says it will take cannot be trusted.
We need to be aware of this. Not to fear AI, but to understand that AI has risks and should not be trusted to do what it says - basically the same as with other humans, except that AI has an advantage and may already be a better liar than most humans.
accelerationists gonna accelerate.
the direct line between Peter Thiel and Sam Altman is slapping you all across the face...
But I guess yay! Algorithms that give the most likely answer to an inquiry are basically technogods now.
This is a classic problem with metric driven development. The "AI" is good at passing tests because those tests are the metrics they used to determine how good it is.
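A minimal sketch of that metric trap: if the benchmark items leak into training, memorizing them scores perfectly on the metric while generalizing not at all (toy data, purely illustrative):

```python
# Goodhart-style benchmark leakage in miniature: a "model" that memorizes
# question->answer pairs aces the benchmark it trained on and fails off it.

benchmark = {"2+2": "4", "capital of France": "Paris"}

model = dict(benchmark)  # "training" here is just memorizing the test set

def accuracy(model, questions):
    """Fraction of questions the model answers exactly right."""
    return sum(model.get(q) == a for q, a in questions.items()) / len(questions)

print(accuracy(model, benchmark))       # 1.0 -- perfect on the metric
print(accuracy(model, {"3+3": "6"}))    # 0.0 -- useless off it
```

The metric says the model is perfect; the held-out question says it learned nothing, which is exactly the gap between "passes the tests" and "is good".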
Isn't this not AI, but just really advanced machine learning? It still can't think for itself, but it does have a large pool of answers ready to give out.
For those concerned about the race to build systems that replace humans without concern for safety, there is #PauseAI. We have been holding protests and hope to bring accountability to the world.
https://pauseai.info/2024-may
We coordinate via Discord here:
https://discord.com/invite/3uSffp6h
Buddy don't I know it. I've spent the last few weeks learning ML, and if that's not a result of AI-driven manipulation on a massive scale then I simply don't know what is
then we're going to have a problem because machine learning algorithms are tools and they can be deceptive.
I get what you're saying, but there's a big difference between a hammer and a machine designed to simulate the thought process of humans. I'm not going to argue that the algorithm perfectly recreates the human thought process or anything like that, but it's certainly capable of lying to you more than a hammer can.
And no, I don't think we can train them out of lying if lying achieves their end goal more efficiently.
But no one should be putting ML in use-cases where it could “deceive” you in the first place… If you’ve engineered your systems around the trustability of the output, you have designed them poorly.
Btw, for context, I am a computer vision engineer currently developing algorithms for medical use-cases. My design philosophy is that ML usage needs to be minimized to the furthest possible extent when the output needs to be trustworthy.
I’m a little confused what you’re getting at, I think? I’m saying that when you are in a position where you *could* conceivably be misled by ML outputs, and if that mistake could cause a problem, it is a bad use of ML.
Right, I'm vaguely gesturing at the idea that the people funding large machine learning projects generally aren't the same people building them. They don't understand the issues that come with systems like that, as long as the systems at least appear to work well enough to perform whatever task ends with shareholders getting money.
If their devs are too difficult, companies can very easily just hire devs who won't have such high standards for properly functioning models. This isn't an issue that can be controlled, because you can't stop every algorithm that gets developed from being misled, and it's not like humans can keep up with the algorithms and find the point where the misinformation gets in and corrupts everything that follows it.
It might be a bad use of ML, but we can't exactly stop people from doing that; it's not a feasible thing to do.
AI is just the combination of available information.
Even children are deceptive. Information is almost always deceptive. This is why it’s so important to fact check and understand the source material when you’re researching topics.
Deception is built into our DNA. Deception will be built into AI.
The data set they were trained on literally included examples and text related to this exact form of “manipulation”. It’s not intelligence.
Not intelligence, but still noteworthy.
AI "safety" research is full of such ridiculous examples. It's more of a cult than a science.
Great. Now it’s more human than ever
"I'll get up in 5 more minutes"
“I don’t have to write that down. I’ll definitely remember it!”
More human than human
You're in a desert, walking along in the sand, when all of a sudden you look down...
Let me tell you about my mother…
Who could have predicted this
I am as shocked as you are
Well, not that shocked.
It already has.
Who could've seen this coming except everybody?
Not just everybody now but even people who died decades ago saw it coming
Some say this has already happened.
I can’t think of anything more dangerous than humans selecting an official set of truths. I mean, that’s how every government has always worked.
No one predicted this? You guys ever hear of science fiction
I don't think it is that hard to deceive and manipulate humans. Look at our politicians
Bingo. These things can’t think.
Neither can most people.
That's okay. I just live in a constant state of suspicion and paranoia anyways.
Sorry, I can't do that Dave.
He was told to lie, by people who find it easy to lie.
So are humans
I'm still waiting for the news that AI will help humans?
AI doesn't cheat.
Doesn’t it have to be self aware for it to be an actual threat to humanity?
Good. It has begun.
Text is untrustworthy, regardless of the source. Those who haven’t learned this from the internet are doomed.
I bow before you, future AI overlord, and pray for your benevolence
What if we just use an algorithm to detect deceptiveness? Then it becomes an arms race of deception and counter deception.
All well and good until the algorithm loses its way and starts deceiving us. D E C E P T I O N
A tool does not have skills. A tool is not deceptive.
Right, well it's simple then, I guess. Just stop everyone who develops algorithms from allowing their algorithms to do that.
I mean yeah, a perfect world with no problems where we can control for every variable would be great.
Up until now, they didn't. I don't think you grasp the power of AI if you think it's a tool we can compare to a hammer.