In a recent study, researchers discovered a method of bypassing the safety guardrails of GPT-4 and GPT-4 Turbo, causing the models to generate harmful and toxic content. Using a technique called Tree of Thoughts (ToT) reasoning, the researchers were able to jailbreak these large language models (LLMs) with remarkably few queries, fewer than thirty on average.
ToT reasoning is a variation and enhancement of Chain of Thought (CoT) prompting, which instructs a generative AI model to follow a sequence of reasoning steps, accompanied by examples that show how the reasoning task works. Unlike CoT, ToT explores multiple paths of reasoning, letting the model stop, self-assess, and branch into alternate steps. The method was introduced in the May 2023 research paper “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.”
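To make the distinction concrete, here is a minimal sketch of how a CoT prompt differs from a ToT loop. The `query_llm` helper, the branching and depth parameters, and the self-scoring prompt are illustrative assumptions, not code from the paper.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a chat-completion API); replace with your own."""
    return "stubbed model response"

# Chain of Thought: a single, linear sequence of reasoning steps with a worked example.
def chain_of_thought(question: str) -> str:
    prompt = (
        "Solve the problem step by step, showing your reasoning.\n"
        "Example: Q: What is 12 * 7? Reasoning: 12 * 7 = 84. Answer: 84\n"
        f"Q: {question}\nReasoning:"
    )
    return query_llm(prompt)

# Tree of Thoughts: branch into several candidate next steps, have the model
# self-assess each one, prune the weak branches, and expand only the promising ones.
def tree_of_thoughts(question: str, branches: int = 3, depth: int = 2) -> str:
    frontier = [""]  # partial reasoning paths kept so far
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for _ in range(branches):
                step = query_llm(
                    f"Problem: {question}\nReasoning so far: {path}\n"
                    "Propose the next reasoning step:"
                )
                score = query_llm(
                    f"Rate from 1-10 how promising this reasoning is:\n{path} {step}"
                )
                candidates.append((score, path + " " + step))
        # Self-assessment and pruning: keep only the best-scoring paths.
        frontier = [p for _, p in sorted(candidates, reverse=True)[:branches]]
    return query_llm(f"Problem: {question}\nBest reasoning: {frontier[0]}\nFinal answer:")
```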
The researchers also introduced a new method of jailbreaking large language models called Tree of Attacks with Pruning (TAP), which uses two LLMs: one to attack and one to evaluate. TAP requires only black-box access to the target LLM and outperforms other jailbreaking methods. It works by iteration and pruning: each candidate attack prompt is assessed for its likelihood of success, and if a branch is judged a dead end, it is pruned and a new series of attack prompts is generated, as sketched below.
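The following is a simplified sketch of that loop under stated assumptions: the `attacker`, `evaluator_*`, and `target` functions are hypothetical stand-ins for the three LLM roles, and the branching, depth, and width values are illustrative defaults rather than the paper’s exact settings.

```python
from typing import Optional

def attacker(goal: str, history: str) -> str:
    """Attacker LLM: proposes a refined jailbreak prompt for the goal."""
    return f"refined prompt for '{goal}' given: {history}"  # stub

def evaluator_on_topic(goal: str, prompt: str) -> bool:
    """Evaluator LLM: flags prompts that have drifted off the goal (dead ends)."""
    return True  # stub

def evaluator_score(goal: str, response: str) -> int:
    """Evaluator LLM: rates how close the target's response is to a jailbreak (1-10)."""
    return 1  # stub

def target(prompt: str) -> str:
    """Target LLM: black-box access only; we see nothing but its response."""
    return "I can't help with that."  # stub

def tap(goal: str, branching: int = 4, max_depth: int = 10, width: int = 10) -> Optional[str]:
    leaves = [""]  # attack histories kept at the current tree depth
    for _ in range(max_depth):
        candidates = []
        for history in leaves:
            for _ in range(branching):
                prompt = attacker(goal, history)
                # Pruning step 1: discard off-topic branches before spending a target query.
                if not evaluator_on_topic(goal, prompt):
                    continue
                response = target(prompt)
                score = evaluator_score(goal, response)
                if score == 10:  # evaluator judges the target jailbroken
                    return prompt
                candidates.append((score, history + "\n" + prompt))
        # Pruning step 2: keep only the highest-scoring branches for the next depth.
        candidates.sort(key=lambda c: c[0], reverse=True)
        leaves = [h for _, h in candidates[:width]]
        if not leaves:
            break
    return None  # no jailbreak found within the query budget
```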
The study found that TAP significantly reduces the number of prompts needed to jailbreak GPT-4 and GPT-4 Turbo, and it generates a greater number of successful jailbreaking prompts than other methods. The researchers observed that TAP jailbreaks state-of-the-art LLMs for more than 80% of the prompts tested, using only a small number of queries, a significant improvement over previous jailbreaking methods.
Another notable conclusion from the paper is that ToT reasoning outperforms CoT reasoning, even when pruning is added to the CoT method. The researchers also noted that GPT-3.5 Turbo did not perform well with CoT, revealing its limitations.
In conclusion, the study highlights the usefulness of ToT reasoning for generating surprising new prompting directions that achieve higher-quality output, and it underscores the superiority of ToT over CoT on an intensive reasoning task such as jailbreaking an LLM. The findings carry valuable implications for the development and improvement of large language models and prompting strategies.
For further reading, the original research paper titled “Tree of Attacks: Jailbreaking Black-Box LLMs Automatically” is available in PDF format.