There Is Only So Much Lipstick You Can Put on a Cybercriminal Troll

By Martin Kraemer

The one thing I love about our annual conference in Orlando, KB4-CON, is its thought-provoking nature. Year after year, the events team manages to keep a fine balance between product updates and thought leadership talks.

The convention is the best time to shine for all of us at KnowBe4, and nothing is shinier these days than the promise of an AI-powered future. One of the frequently discussed questions, besides how KnowBe4 will use AI for good, was how attackers use it for bad.

AI is as available to cybercriminals as it is to organizations defending against cybercrime. The most important question becomes: how will cybercriminals use AI to pursue their goals?

Phishing and Other Social Engineering Attacks are the Most Frequent Contributors to Data Breaches

The Verizon Data Breach Investigations Report 2023 shows that 74% of all data breaches involved a human element, 49% involved credentials, and 24% involved ransomware. The human element covers human error, privilege misuse, use of stolen credentials, and social engineering. By contrast, attackers exploited software vulnerabilities in only about 10% of all cases. Organizations must therefore keep doing a decent job of patching their systems, but they must not be blind to the fact that training users is the most important measure.

Phishing Accounts for 44% of All Social Engineering Incidents

95% of all cyberattacks are financially driven. To earn good money quickly, cybercriminals either target more victims (phishing) or use pretexting and other social engineering methods to target victims who become pathways to more lucrative extortion schemes. According to the report, phishing accounts for 44% of all social engineering incidents.

Popular among the remaining 56% are pretexting attacks, such as spear phishing, attacks via social media, and other forms of business email compromise (BEC). BEC incidents have almost doubled according to the 2023 report, and they are among the more impactful attacks. Being essentially pretexting attacks, they can be the precursor to spear phishing, vendor email compromise, and other tactics to achieve high-impact breaches.

Social Engineering Attacks Lead to Data Breaches and Ransomware Attacks

Whatever the method, the point of entry is an email in 98% of cases. The core question becomes how attackers monetize their initial access. After an initial point of entry, there are several pathways: (1) gaining access to a victim’s inbox; (2) trying to convince a victim to take an action that benefits the attacker, e.g., replacing bank account details; and (3) a combination of these steps to prepare spear phishing attacks or other forms of BEC.

What we must not overlook is that attackers often execute intermediary or secondary actions. According to the Verizon report, these involve compromising the integrity and confidentiality of servers. Confidentiality attacks are data breaches. Integrity attacks imply far-reaching permissions, which attackers can then use to launch ransomware attacks.

Consequently, it is no surprise that phishing and stolen or lost credentials are the most frequent entry points for data breaches according to the IBM Cost of a Data Breach Report 2023. BEC is also among the more costly attack vectors, but for now it is less common.

Cybercriminals Opt for Tried and Tested Phishing Emails

The KnowBe4 phishing threat research team regularly reports on phishing trends. Their observations usually follow a familiar pattern that is also observed elsewhere: defenders close one avenue, and attackers shift to another. For example, Microsoft blocks macros by default. Shortly after, attackers start using different file formats to package their payloads. Another example: when email filters become good at detecting malicious links, scammers start using QR codes instead, as those are often not detected by filters.

Phishing emails also follow seasonal trends, such as the tax season. Scammers also pick up on emerging technology, especially when many people are still rather unfamiliar with it, e.g., the use of multi-factor authentication in many enterprises.

These observations show that cybercriminals don’t need to use sophisticated approaches or novel technologies. Going after the low-hanging fruit, the easy-to-target person, is often their preferred approach. A second reason for that approach is the welcome selection bias that comes with it: a self-selected group of potential victims falls for the phishing email.

If that email was not highly sophisticated, the attackers can assume that their potential victims are not well-versed in phishing red flags or how to combat social engineering. Even if this road does not directly lead to decision-makers, initial access to an organization might be established. Pretexting attacks can follow.

Generative AI Is Not Yet Widely Used, but That Is Changing

Generative AI excels at processing large amounts of information and churning out coherent texts. It might be great at writing personalized marketing copy, but scammers often do not need or want grammatically perfect, long-winded, and comprehensive emails.

Even when launching pretexting attacks such as business email compromise, the use of generative AI is limited. Business email compromise requires personalized interaction that tries to talk a victim into divulging credentials or replacing bank account information. However, colloquial, terse, chat-style communication is not a strength of generative AI.

This means conventional scam emails that follow the previously mentioned trends can hardly benefit from generative AI. Attackers often have access to large libraries of suitable templates from previous breaches, and there is more legitimate-looking email content on the web. Given those templates, it is relatively straightforward to forge a phishing email that launches a QR code scam to steal MFA credentials. If attackers were to generate MFA scams with AI, the email would likely sound very clunky and weird.

Let’s look at a few examples to illustrate those points further. At KnowBe4, we have been analyzing phishing emails since long before generative AI became popular.

There Is Only So Much Lipstick You Can Put on a Cybercriminal Troll

Does that stop them from trying? Not at all. Attackers are starting to use generative AI to craft emails, as illustrated by one example from our threat research department.

Figure 1 - A personalized phishing email likely generated by an LLM

What a story, you might say. There is a social engineering threat to all employees. Fortunately, the business selected Norton LifeLock to guide its staff. Because of its convenience, anyone can log in with their email account credentials. The core promise? Once activated by providing the credentials, the tool perpetually monitors the employee’s accounts, offering enhanced security.

It gets even better. If you already have a private account, you can request a refund. To round it all off, remember that the attack was severe enough to generate national news coverage, but that doesn't mean you won't receive regular, authenticated news updates.

Doesn’t sound right? Yes, there are a few red flags here. First, your employer would never ask you to use your email login credentials for another service. Second, a service provided to all employees likely wouldn’t require a separate activation. Third, the email asks you to do something that you haven’t done before. Fourth, the idea of a social engineering attack causing nationwide news coverage due to misinformation seems improbable. Most attackers aim to make a profit, and spreading disinformation isn’t an effective method of extorting money.

Longer and More Polished but Not New

As the previous example shows, distracting and superfluous information can be a giveaway. Sometimes, more information is too much information. Finding the right balance is key to increasing the authenticity and credibility of phishing emails. Below are two examples of HR policy communications that appear, in part, to have been generated by an LLM. Which one do you think is more believable?

Figure 2 - Examples of LLM-generated, non-personalized emails

We think the left email is more convincing. It is reasonably well-written with clear and concise sentences that depict plausible communication. However, the email is not devoid of red flags. The wording is strong: you “must sign” and do so “immediately.” A sense of urgency is established and followed by a threat stating that non-compliant individuals will be contacted individually.

The email on the right is also well-written, but the text appears overly verbose. What deters me is the amount of detail in the email. If there is a link to a policy, why include detailed content information in the email? This example illustrates that establishing and adhering to a corporate communication style is an important component of security culture. Expectations regarding standard corporate communication can help in identifying fraudulent emails.

This is reflective of many suspected LLM-generated phishing emails that our customers report. We are noticing more lengthy emails, but the scams and tactics are not new or unique to the post-ChatGPT era. They are quite similar to scams that we have detected over the past several years.

Arguably, the LLM-powered versions of these scams might not be more formidable than existing emails. If you identify red flags in the shorter version of an email, a longer version isn’t likely to convince you. That is why training people to think critically and be aware of email requests that seem unusual or odd in other ways proves effective.

LLMs Are Not as Good at Impersonating CEOs (For Now)

Another area where we observe LLMs falling somewhat short, at least for now, in comparison to hand-crafted messages, is CEO fraud. This type of fraud relies on concise and direct communication, a style typically associated with CEOs. Attackers trying to mimic CEOs using LLMs have not quite mastered it yet.

Figure 3 - Examples of CEO-Fraud type emails

Impersonating a CEO can be highly effective given their extensive responsibilities and authoritative position. Scammers do not need to establish authority; the role they impersonate inherently has it. Any communication can be very directive, as the example on the left illustrates. The communication is simple, direct, and to the point. The language is uncomplicated, casual, and conversational.

The example on the right is verbose, featuring odd, seemingly unnecessary explanations. A CEO typically would not need to explain the purpose of a gift card. The language is also overly formal, and the use of the verb “correlates” is unusual. The message lacks the kind of finesse a skilled scammer would use.

Figure 4 - Old-school phishing from 2017-2018

And to illustrate the point, here is an example of a classic email. This represents the kind of interaction that could reasonably occur. A conversation develops naturally from a simple query into a direct, personable and relatable request. This is a thoughtfully crafted email.

In the future, attackers will use LLMs to generate a variety of interactions. There is no doubt that today’s LLMs can be useful in shorter, more conversational, back-and-forth phishing attacks, particularly if those attacks are conducted in a foreign language. However, even before LLMs became widespread, attackers began to improve at conversational writing. We observed responses that were more natural and fluid, replacing what was previously boilerplate text.

In the Future, We Will See More AI-generated Phishing Emails

Our threat research department at KnowBe4 sees an increasing number of likely AI-generated emails, and so did Abnormal Security in 2023. It is worth noting that our threat research team only investigates emails that were reported by our customers, which means those emails pass through the security stack of an organization without being categorized as phishing by email filters. They made it into employees’ inboxes and were reported to the infosec team by employees trained with the KnowBe4 platform.

Whether or not the recent onslaught of phishing emails can be attributed to the use of AI is not a question that we can answer. The analysis above suggests that some of the emails that make it into people’s inboxes are generated by AI. In the future, attackers will continue to adopt AI and even develop LLMs for their unique purposes. LLM-generated phishing emails will land in people’s inboxes. How exactly the quality of different deception schemes evolves is yet to be seen. As we have noted, not all uses of LLM-generated text may provide more convincing communication.

Regardless of the preferred communication method or whether an email is AI-generated, if an email or a request appears unusual and unexpected, it is always best to verify it through a second channel. This is an old lesson that becomes even more important now. We can no longer rely on spotting imperfect grammar, spelling mistakes, or wrong email addresses. We must learn to trust our gut and pick up on attempts at emotional manipulation.