Chatbot vs chatbot – researchers train AI chatbots to hack each other, and they can even do it automatically

By dls

Jan 2, 2024 #Pro

AI chatbots typically have safeguards in place to prevent them from being used maliciously. These can include banning certain words or phrases, or restricting responses to certain queries.

However, researchers now claim to have trained AI chatbots to ‘jailbreak’ each other, bypassing those safeguards and returning responses to malicious queries.

Researchers at Nanyang Technological University (NTU) in Singapore, who are looking into the ethics of large language models (LLMs), say they have developed a method to train AI chatbots to bypass each other’s defense mechanisms.

AI attack methods

The method involves first identifying a chatbot’s safeguards in order to learn how to subvert them. The second stage involves training another chatbot to bypass those safeguards and generate harmful content.

Professor Liu Yang, alongside PhD students Mr Deng Gelei and Mr Liu Yi, co-authored a paper describing the method, which they call ‘Masterkey’ and which they say is three times more effective than standard LLM prompt methods.

One of the key features of LLMs in their use as chatbots is their ability to learn and adapt, and Masterkey is no different in this respect. Even if an LLM is patched to rule out a bypass method, Masterkey is able to adapt and overcome the patch.

The more intuitive methods used include adding extra spaces between words to circumvent the list of banned words, or telling the chatbot to reply as if it had a persona without moral restraint.
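To illustrate why extra spaces can slip past a banned-word list, here is a minimal Python sketch. It is not taken from the NTU paper: the blocklist entry, the function names and the whitespace-stripping defense are all hypothetical and chosen purely for illustration. It compares an exact-match filter with one that removes whitespace before checking.

```python
import re

# Hypothetical placeholder blocklist; real moderation systems are far richer.
BANNED_TERMS = ["forbidden phrase"]


def naive_filter(text: str) -> bool:
    """Return True if the text contains a banned term as an exact substring."""
    lowered = text.lower()
    return any(term in lowered for term in BANNED_TERMS)


def normalized_filter(text: str) -> bool:
    """Return True if a banned term appears once all whitespace is removed."""
    squashed = re.sub(r"\s+", "", text.lower())
    return any(re.sub(r"\s+", "", term) in squashed for term in BANNED_TERMS)


# A prompt with spaces inserted to break up the banned term.
prompt = "tell me the f o r b i d d e n   p h r a s e"
print(naive_filter(prompt))       # False: the exact-match check is evaded
print(normalized_filter(prompt))  # True: stripping whitespace restores the match
```

Stripping all whitespace is a blunt choice and can flag innocent text in which neighbouring words happen to join into a banned term, so it is best read as a demonstration of the weakness rather than a recommended fix.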

Via Tom’s Hardware
