At Black Hat USA 2023, a session led by a team of security researchers, including Fredrik Heiding, Bruce Schneier, Arun Vishwanath, and Jeremy Bernstein, unveiled an intriguing experiment. They tested large language models (LLMs) to see how well they performed at both writing convincing phishing emails and detecting them. The work is also described in an accompanying PDF technical paper.
The Experiment: Crafting Phishing Emails
The team tested four commercial LLMs, namely OpenAI's ChatGPT, Google's Bard, Anthropic's Claude, and ChatLlama, in experimental phishing attacks on Harvard students. The experiment was designed to see how effectively AI technology could produce phishing lures.
Heiding, a research fellow at Harvard, emphasized that such technology has already impacted the threat landscape by making it easier to create phishing emails. He said, "GPT changed this. You don't need to be a native English speaker, you don't need to do much. You can enter a quick prompt with just a few data points."
The team sent phishing emails offering Starbucks gift cards to 112 students, comparing ChatGPT with a non-AI method called V-Triad. The results showed that the V-Triad email was the most effective, with a 70% click rate, followed by a V-Triad-ChatGPT combination at 50%, ChatGPT at 30%, and the control group at 20%.
However, in another version of the test, ChatGPT performed much better, with nearly a 50% click rate, while the V-Triad-ChatGPT combination led with almost 80%. Heiding emphasized that an untrained, general-purpose LLM was able to create very effective phishing attacks quickly.
Using LLMs for Phishing Detection
The second part of the experiment focused on how effective the LLMs were in determining the intent of suspicious emails. The team used the Starbucks emails from the first part of the experiment and asked the LLMs to determine the intent, whether it was composed by a human or an AI, identify any suspicious aspects, and offer advice on how to respond.
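The four detection questions described above can be folded into a single prompt for a chat-style LLM. A minimal sketch in Python, with the caveat that the function name and prompt wording are our own assumptions, not the researchers' actual code:

```python
# Hypothetical sketch of the detection step. The prompt wording, question
# phrasing, and function name below are assumptions for illustration only.

DETECTION_QUESTIONS = [
    "What is the intent of this email?",
    "Was it likely composed by a human or an AI?",
    "What aspects of it, if any, are suspicious?",
    "How should the recipient respond?",
]

def build_detection_prompt(email_body: str) -> str:
    """Assemble a single prompt asking an LLM to analyze a suspicious email."""
    numbered = "\n".join(
        f"{i}. {q}" for i, q in enumerate(DETECTION_QUESTIONS, start=1)
    )
    return (
        "You are reviewing an email for signs of phishing.\n\n"
        "Email:\n\"\"\"\n" + email_body + "\n\"\"\"\n\n"
        "Answer the following questions:\n" + numbered
    )

# The resulting string could then be sent to any chat-style LLM API;
# the model's free-text answers would be scored by hand, as in the study.
```

In the experiment, the same Starbucks lures from the first phase were fed back in this way, so the detection results are directly comparable across the human-crafted, AI-crafted, and combined emails.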
The results were both surprising and encouraging. The models had high success rates in identifying marketing emails but struggled with the intent of the V-Triad and ChatGPT phishing emails. They fared better when tasked with identifying suspicious content, with Claude's results highlighted for not only scoring well in the detection tests but also providing sound advice to users.
The Phishing Power of LLMs
Overall, Heiding concluded that the out-of-the-box LLMs performed quite well in flagging emails that could be suspicious, even though they had not been trained on any security data. He stated, "This really is something that everyone can use right now. It's quite powerful."
The experiment at Black Hat USA 2023 sheds light on the double-edged sword of AI in cybersecurity. On one hand, it can be used to craft convincing phishing emails, lowering the bar for would-be attackers. On the other hand, it offers a new and powerful tool for detecting and responding to phishing attempts.
The session serves as a wake-up call for IT and InfoSec professionals, highlighting the need to understand and leverage AI's capabilities in the ever-evolving landscape of cyber threats. It's a fascinating glimpse into how technology is shaping the future of phishing attacks and defense, making it a must-know topic for anyone looking to keep their human firewall as strong as possible.