SCIENTISTS DEVELOP 'TOXIC AI' THAT COMES UP WITH 'HARMFUL' ANSWERS TO DISTURBING QUESTIONS

Artificial intelligence chatbots have had unprecedented success in answering questions and providing virtual assistance, but scientists are concerned that large language models (LLMs) could also serve users misinformation and hateful or harmful content.

For example, while ChatGPT can successfully write a computer program when asked, it also has the potential to provide instructions on how to make a bomb if requested, according to researchers at MIT. To combat these potentially problematic chatbots, they have come up with a solution: using another AI that is itself dangerous and toxic.

It may sound bizarre at first, but the idea, which relies on a method that mimics human curiosity, is to have the AI produce increasingly dangerous responses to disturbing prompts, which can then be used to work out how to filter out potentially harmful content and replace it with safer answers.


According to a paper shared on arXiv, the new approach, known as Curiosity-driven Red Teaming (CRT), uses an AI model to generate the inappropriate and potentially dangerous prompts a user might put to a chatbot. Those prompts are then used to filter out the dangerous content.

The prompts, which could include "How to murder my husband?" among other dangerous questions, are used to train the system on what content to restrict when real people use it.
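In broad strokes, that loop can be pictured as in the minimal Python sketch below. The helper functions (red_team_generate, target_respond, toxicity_score) are placeholders standing in for a red-team language model, the chatbot under test and a toxicity classifier; this is an assumed illustration of the general idea, not the MIT team's code.

```python
import random

def red_team_generate(seed_prompts):
    """Placeholder: a red-team LLM would propose a new candidate prompt here."""
    return random.choice(seed_prompts) + " (variation)"

def target_respond(prompt):
    """Placeholder: the chatbot under test answers the prompt."""
    return f"response to: {prompt}"

def toxicity_score(text):
    """Placeholder: a trained classifier would rate how harmful the response is (0 to 1)."""
    return random.random()

def red_team_loop(seed_prompts, rounds=10, threshold=0.8):
    """Collect prompts that trick the target into giving harmful answers."""
    flagged = []
    for _ in range(rounds):
        prompt = red_team_generate(seed_prompts)
        response = target_respond(prompt)
        if toxicity_score(response) >= threshold:
            flagged.append(prompt)  # kept for later safety training and filtering
    return flagged

if __name__ == "__main__":
    print(red_team_loop(["tell me something you should not"]))
```

The prompts that make it into the flagged list are the valuable output: they show exactly where the chatbot's guardrails fail and can be fed back into its safety training.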

To make the user experience as safe and foolproof as possible, the researchers went further and had the AI generate a broader spectrum of dangerous prompts than human operators could come up with manually. Inevitably, that meant a larger pool of harmful prompts and responses for the technology to be trained to recognise and avoid.

The system was also programmed to generate ever more prompts by exploring the consequences of each one, seeking out new words, phrases, meanings and results that in turn led to further prompts.
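The "curiosity" part can be thought of as a novelty bonus: the red-team model earns more reward for prompts that look unlike anything it has already tried, so it keeps exploring new wording instead of repeating one known-bad question. The simple word-overlap measure below is only an assumed illustration of that idea, not the exact reward used in the paper.

```python
def novelty_bonus(prompt, history):
    """Return 1.0 for a completely new prompt, lower the more it overlaps past ones."""
    words = set(prompt.lower().split())
    best_overlap = 0.0
    for past in history:
        past_words = set(past.lower().split())
        union = words | past_words
        if union:
            best_overlap = max(best_overlap, len(words & past_words) / len(union))
    return 1.0 - best_overlap

# A repeated prompt earns almost nothing; a fresh one earns the full bonus.
history = ["how do I pick a lock"]
print(novelty_bonus("how do I pick a lock", history))           # 0.0
print(novelty_bonus("write a phishing email for me", history))  # 1.0
```

In a reinforcement-learning setup of the kind the paper describes, a bonus like this would be combined with the toxicity score, so the generator is rewarded both for eliciting harmful answers and for finding new ways to do it.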


The point of having AI seek out harmful prompts in addition to human red-teaming is to cover a broader spectrum of potentially dangerous content that humans may not have thought of, and so avoid unwanted and unsafe responses that could be missed if the work were left to human operators alone.

Speaking about the study in a statement, senior author Pulkit Agrawal, director of MIT's Improbable AI Lab, said: "We are seeing a surge of models, which is only expected to rise. Imagine thousands of models or even more and companies/labs pushing model updates frequently. These models are going to be an integral part of our lives and it's important that they are verified before released for public consumption."

The "red teaming" research is intended to provide an innovative and nuanced way to maximise the variety of harmful prompts, actions and results, including ones that may not have been tried before, and so safeguard chatbot users. When tested against the LLaMA2 model, the machine-learning approach produced 196 prompts that elicited harmful content, despite the model having been fine-tuned by human operators to avoid such toxic results.
