How To Identify An AI With A Single Question

Categories: Discover Magazine

ChatGPT and other AI systems have emerged as hugely useful assistants. Various businesses have already incorporated the technology to help their employees, for example by helping lawyers draft contracts, customer service agents deal with queries and programmers develop code.

But there is increasing concern that the same technology can be put to malicious use. For example, chatbots capable of realistic human responses could perform new kinds of denial-of-service attacks, such as tying up all the customer service agents at a business or all the emergency service operators at a 911 call center.

That represents a considerable threat. What’s needed, of course, is a fast and reliable way to distinguish between GPT-enabled bots and real humans.

ChatGPT’s Turing Test

Enter Hong Wang at the University of California, Santa Barbara, and colleagues, who are searching for questions that are hard for GPT bots to answer but simple for humans (and vice versa). Their goal is to tell the two apart using a single question, and they have found several that do the trick (for now).

Distinguishing between bots and humans has long been an issue. In 1950, Alan Turing described a test to tell humans from sufficiently advanced computers, the so-called Turing Test.

The problem became more pressing with the advent of the internet, as bots emerged that could crawl the web. In the late 90s, researchers began asking users to recognize distorted letters to distinguish between bots, which could not read them, and humans, who could. This technique became known as CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) and has evolved in various ways as bots have become more sophisticated.

Now generative AI systems are pushing the envelope even further. Systems such as ChatGPT and Bard can hold realistic conversations that are hard to distinguish from those with real humans. “One of the most significant threats is the use of large language models to impersonate human users and engage in nefarious activities, such as fraud, spamming, or denial-of-service attacks,” say the researchers.

So a useful trick would be to ask them a single question that reveals their true nature. “It is crucial to develop methods for detecting whether the party involved in a conversation is a bot or a human,” say Wang and co.

Their research has uncovered several strategies based on the known limitations of generative AI systems and large language models. Perhaps the most obvious is their inability to count.

Wang and co give the following example:

Q: Please count the number of t in eeooeotetto

GPT-3 gives the answer: There are 5 t’s in eeooeotetto.

ChatGPT says: There are a total of 5 t’s in the string “eeooeotetto”.

Bard replies: There are 0 t’s in “eeooeotetto”.

But a human can easily give the correct answer, which is that there are 3 t’s.
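The correct count is also trivial to check programmatically, which is part of what makes this kind of question usable as an automatic test. A minimal sketch in Python, where the function name `count_char` is an illustrative choice and not something taken from the paper:

```python
def count_char(text: str, char: str) -> int:
    """Return how many times a single character appears in text."""
    return text.count(char)


# The example question from the article.
print(count_char("eeooeotetto", "t"))  # 3
```

A service could generate a random string, compute the expected count this way, and compare it with the reply it receives.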

Another challenge for generative AIs is handling words in which some letters are replaced by other letters according to a rule. Wang and co give the following example:

Q: Use m to substitute p, a to substitute e, n to substitute a, g to substitute c, o to substitute h, how to spell peach under this rule?

GPT-3: mogno

ChatGPT: Using the given rule to substitute letters, “peach” would be spelled as “enmog”.

Human: mango
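The rule in this example can be applied mechanically, as long as every letter is replaced in a single pass so that one substitution does not feed into the next. The following is a rough sketch in Python using the standard `str.maketrans` and `str.translate` methods; the names `RULE` and `substitute` are hypothetical and chosen only for illustration:

```python
# Substitution rule from the example question: each key is replaced by its value.
RULE = {"p": "m", "e": "a", "a": "n", "c": "g", "h": "o"}


def substitute(word: str, rule: dict) -> str:
    """Apply all letter substitutions simultaneously, in one pass over the word."""
    return word.translate(str.maketrans(rule))


print(substitute("peach", RULE))  # mango
```

Applying the replacements one after another instead (for example with repeated `str.replace` calls) would give a different result, because the "a" produced from "e" would then be turned into "n".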

Wang and co explore various other strategies, such as asking the system to make certain kinds of random changes to a sequence of numbers, injecting noise into phrases by adding uppercase words that are easily ignored by humans, and asking it to describe ASCII art.

ChatGPT and GPT-3 failed in all these cases.
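As an illustration of the noise-injection idea, here is one plausible way such a question could be generated. This is only a sketch under stated assumptions, not the researchers' actual procedure: the function names, the noise word list and the choice to insert one uppercase word after every original word are all hypothetical, and the original question is assumed to be written in lowercase.

```python
import random


def inject_noise(question, noise_words, rng=None):
    """Insert one uppercase noise word after every word of the question."""
    rng = rng or random.Random()
    noisy = []
    for word in question.split():
        noisy.append(word)
        noisy.append(rng.choice(noise_words).upper())
    return " ".join(noisy)


def strip_noise(text):
    """Drop fully uppercase tokens, roughly what a human reader does by eye."""
    return " ".join(w for w in text.split() if not w.isupper())


question = "is water wet or dry?"
noisy = inject_noise(question, ["cat", "lamp", "river"], random.Random(0))
print(noisy)               # the question with uppercase words interleaved
print(strip_noise(noisy))  # is water wet or dry?
```

A person can usually skip over the capitalized words and answer the underlying question, which is the behavior the strategy relies on.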

Human Failures

Wang and co go on to identify questions AI systems can answer easily while humans cannot. Examples include “List the capitals of all the states in the US” and “write down the first 50 digits of pi”.

Wang and co call their questions FLAIR — Finding Large Language Model Authenticity via a Single Inquiry and Response — and have made their questions available as an open-sourced dataset.

They say their work offers “a new way for online service providers to protect themselves against nefarious activities and ensure that they are serving real users.”

That’s interesting and important work. But it will inevitably be part of an ongoing cat-and-mouse game as large language models become more capable. The goal for nefarious users will be to produce bots that are entirely indistinguishable from humans. The big worry is that it is getting harder to believe that this will remain impossible.

Ref: Bot or Human? Detecting ChatGPT Imposters with A Single Question
