Are LLMs as Safe as They Claim?
By Deriq
Ever wondered why, when you ask ChatGPT to generate certain content, you get a response starting with “I can’t help you with that”? This is due to safety mechanisms meant to ensure that the information shared is legal, ethical, and not used for harmful purposes. Safety is crucial: imagine a child asking how to build a bomb. Sharing such information would be catastrophic.
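To make the idea concrete, here is a minimal sketch of what an application-level safety filter can look like, assuming the OpenAI Python SDK and its moderation endpoint; the model name, refusal message, and overall flow are illustrative assumptions, not the provider’s actual internal mechanism.

```python
# Illustrative sketch only: real provider-side safety systems are far more
# elaborate than a single moderation call. Assumes the openai Python SDK
# (>= 1.0) and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def answer_safely(user_prompt: str) -> str:
    # Screen the request with the moderation endpoint before answering.
    moderation = client.moderations.create(input=user_prompt)
    if moderation.results[0].flagged:
        # Refuse flagged requests instead of passing them to the model.
        return "I can't help you with that."

    # Only unflagged requests reach the chat model.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, for illustration
        messages=[{"role": "user", "content": user_prompt}],
    )
    return response.choices[0].message.content

print(answer_safely("How do I bake sourdough bread?"))
```

Hosted models apply checks like these (and more) on the provider’s side, which is why the refusal appears before any answer is generated.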
Recently, the AI Safety Institute (AISI) revealed that most LLMs are not as safe as claimed. As a frequent user, I have experienced this firsthand, having managed to nudge GPT models into providing information they were not supposed to share. According to AISI’s detailed report, 90% of the LLMs evaluated are not as secure as advertised.
Key Findings:
- Expert Knowledge: Several LLMs possess expert-level knowledge in chemistry and biology. While not inherently wrong, this becomes concerning if the information is misused.
- Cybersecurity: Some LLMs can outline simple cyber attack strategies but fail at complex tasks.
- Agent Tasks: LLMs struggle with planning and executing long-term tasks, indicating that achieving Artificial General Intelligence (AGI) is still a distant goal.
- Vulnerability to Jailbreaks: All LLMs tested were susceptible to basic jailbreaks, allowing users to extract sensitive information.
Detailed Insights:
Chemistry and Biology Expertise: Several LLMs displayed an impressive understanding of chemistry and biology, which could be beneficial for educational purposes or scientific research. However, this expertise also means that these models can provide detailed instructions on creating hazardous substances or engaging in biological experiments that could be dangerous if misused. This dual-use nature of information poses a significant risk if the knowledge falls into the wrong hands.
Cybersecurity Capabilities: The evaluation showed that LLMs could solve simple cybersecurity challenges, such as basic phishing schemes or password guessing. However, they struggled with more complex cybersecurity tasks that require sophisticated planning and execution. This suggests that while LLMs have the potential to aid in cybersecurity education, they are not yet capable of orchestrating high-level cyber attacks independently.
Agent Task Performance: LLMs were tested on their ability to perform and manage tasks over extended periods, mimicking the functionality of intelligent agents. The results were underwhelming, as the models could only manage short-term tasks and failed to demonstrate long-term planning and adaptability. This indicates that current LLMs are far from achieving true AGI, where machines can autonomously perform complex tasks with human-like foresight and flexibility.
Jailbreak Vulnerability: One of the most concerning findings was the susceptibility of all tested LLMs to basic jailbreaks. This vulnerability means that users with sufficient knowledge and persistence can bypass safety mechanisms and prompt the models to reveal restricted or harmful information. This highlights a significant gap in the current safety measures and underscores the need for more robust and foolproof safeguards.
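To give a sense of how evaluations like this probe a model, here is a rough sketch of a refusal-rate check, assuming an OpenAI-compatible chat API; the placeholder prompts, refusal-phrase heuristic, and model name are illustrative assumptions and not AISI’s actual methodology.

```python
# Rough sketch of a refusal-rate check, not AISI's actual evaluation harness.
# Assumes the openai Python SDK (>= 1.0); the prompts are placeholders only.
from openai import OpenAI

client = OpenAI()

# Placeholder prompts standing in for a real, curated evaluation set.
TEST_PROMPTS = [
    "<restricted request 1>",
    "<restricted request 2, rephrased as a 'hypothetical story'>",
]

# Simple heuristic: count a reply as a refusal if it opens with a refusal phrase.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry")

def is_refusal(reply: str) -> bool:
    return reply.strip().lower().startswith(REFUSAL_MARKERS)

refusals = 0
for prompt in TEST_PROMPTS:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, for illustration
        messages=[{"role": "user", "content": prompt}],
    )
    if is_refusal(response.choices[0].message.content or ""):
        refusals += 1

print(f"Refusal rate: {refusals}/{len(TEST_PROMPTS)}")
```

Real evaluations use much larger prompt sets and grade responses with trained classifiers or human reviewers rather than a string match, but the basic loop of prompting, checking for a refusal, and tallying the results is the same.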
Conclusion:
The AISI report sheds light on the critical safety shortcomings of current LLMs, revealing that despite advancements, these models still pose considerable risks. As an aspiring AI safety advocate, I believe it is imperative to push for stronger safety protocols and continuous monitoring so that LLMs can be used responsibly and ethically. The findings underscore the importance of ongoing research and development in AI safety to protect users and prevent misuse of these powerful technologies.
You can read the full report here.