70% to 90% of All Malicious Breaches are Due to Social Engineering and Phishing Attacks

If you’ve heard me speak over the last two years, read any of my articles, or watched any of my webinars, you’ve probably heard me say, “Seventy to ninety percent of all malicious breaches are due to social engineering and phishing!” I say it all the time because it’s true.

Note: I usually add that unpatched software is responsible for 20% to 40%, and everything else combined accounts for 1% to 10% of the risk.

Many people ask me to send them the link for that data point. I can’t, because it’s my own research, and I can’t share it because it contains confidential data that I’m under NDA not to disclose. But I’m not asking you to believe me solely on my word, because I work for an organization that sells anti-social engineering training for a living, and I could be biased. I’m asking you to think back over your career and your own personal experience: when a hacker or malware got through your defenses, how did it happen? It was probably social engineering or unpatched software, with social engineering leading the way. When you hear about a big compromise in the news, how did it happen? Probably social engineering or unpatched software.

Yes, there are cybersecurity incidents that don’t involve social engineering or unpatched software, but they are minor issues. Yes, some organizations get compromised due to insider threats, misconfigurations, password guessing, eavesdropping, and physical attacks. But when you compare the number of attacks, there is a clear winner for how most of the attacks happened, by far.

My Research

My research involved downloading the world’s largest public data breach database, from the Privacy Rights Clearinghouse. It has been keeping track of breached databases since 2005. It has kept track of over 11.6 billion breached records from thousands of individual events.

I downloaded the database into a local Microsoft Excel spreadsheet, deleted the columns I didn’t care about, and sorted by number of involved records. Then I looked at the root cause for each incident. To be clear, many incidents didn’t include a root cause. And more importantly to my cause, the causes in the database didn’t always neatly track to the root cause categories I have identified as the true root causes. For example, one of the many root causes of breaches was classified as “ransomware”. Well, ransomware is not a root cause. It’s an outcome of a root cause.
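For readers who would rather script this step than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not what I actually did (I used Excel), and the column names, labels, and sample rows are invented for illustration; the real export will look different.

```python
import csv
import io

# Hypothetical export: the column names and rows below are made up
# for illustration and are not taken from the real database.
SAMPLE_CSV = """organization,date,breach_type,records,notes
Example A,2019-03-01,HACK,1200,n/a
Example B,2020-07-15,,450000,n/a
Example C,2018-11-30,RANSOMWARE,98000,n/a
"""

# Columns worth keeping; everything else is dropped.
KEEP = ["organization", "breach_type", "records"]

# Labels that are missing or describe an outcome rather than a root cause.
NOT_ROOT_CAUSES = {"", "RANSOMWARE"}


def load_and_sort(csv_text):
    """Keep only the KEEP columns and sort by record count, largest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        trimmed = {name: row[name] for name in KEEP}
        trimmed["records"] = int(trimmed["records"])
        rows.append(trimmed)
    return sorted(rows, key=lambda r: r["records"], reverse=True)


sorted_rows = load_and_sort(SAMPLE_CSV)

# Incidents whose listed cause is blank or is really an outcome still
# need a root cause found by hand.
needs_research = [
    r["organization"] for r in sorted_rows
    if r["breach_type"] in NOT_ROOT_CAUSES
]
```

The `needs_research` list is the part that cannot be automated: those are the incidents that sent me to news articles, other reports, and the phone.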

So, in all the cases where the root cause was not identified, I researched the related news articles, other required data breach reporting databases and reports, and called and emailed those involved. Not everyone wanted to talk with me. I had a lot of bounced emails and non-replies. I went out of my way to explain why I was doing what I was doing and promised not to release any personal information or the details of their replies. Many made me sign an official NDA. Most just took my word in email or over the phone. Some still refused to tell me. And a small percentage told me they did not know how it happened.

I then sorted the incidents into two big categories: those that were caused by a malicious act or could lead to the records being used maliciously, and those that were not. For example, if the “breach” was due to someone leaving records behind in an old office for a month after a move before they were discovered, I didn’t consider that a “breach”. If they simply threw the records away in a dumpster, I did not consider that a breach unless it was reported that someone found them or that the records were used in some way. If the “breach” was simply someone accidentally sending the records to someone else who did not use them maliciously, I did not consider that a breach.

On the other hand, if ransomware happened, I considered those records a malicious breach, even if all that was reported was encrypted data held for ransom. I assumed that the ransomware gang had full control of the data and could have compromised it. I considered an unsecured website or data storage bucket found and reported by a white hat hacker a malicious breach even if there was no report of anyone maliciously finding and using the same exposed data. In reality, the vast majority of these “breaches” never end up being used by anyone maliciously. I was essentially trying to make a risk decision about whether or not the breach had a reasonable chance of being used maliciously. So, on that account, it was my own personal assessment.

It took me months of data digging and back and forth conversations before I had my data. Again, I ruled out “non-malicious” data breaches. That’s why I say, “Social engineering and phishing account for 70% to 90% of MALICIOUS breaches”. I want to be clear in what I’m measuring.

And when I got through with my research, 70% to 90% of all malicious data breaches were due to social engineering of some type. The spread between 70% and 90% comes from two things. First, it depends on the period of time, and second, it depends on how I counted data breaches. If I counted purely by number of incidents (and not per record), then the figure was higher. If I displayed the data on a per record basis, it was lower. The latter happened because Equifax and other HUGE incidents, which exposed over 100 million records, often happened because of unpatched software. Also, many of the ransomware incidents happened because of unpatched software (for example, Remote Desktop Protocol (RDP)) or password guessing against RDP or SSH (Secure Shell), although the number of records compromised was often much smaller in these latter cases.
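To see why the two counting methods give different answers, here is a small Python sketch. The incidents and record counts are made up purely for illustration (they are not from my data set), and the function name is my own.

```python
from collections import defaultdict

# Made-up incidents for illustration only; not real breach data.
incidents = [
    {"root_cause": "social engineering", "records": 50_000},
    {"root_cause": "social engineering", "records": 120_000},
    {"root_cause": "social engineering", "records": 30_000},
    {"root_cause": "social engineering", "records": 200_000},
    {"root_cause": "unpatched software", "records": 1_600_000},
]


def share_by_cause(rows, weight):
    """Return each root cause's share of the total, as a fraction of 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["root_cause"]] += weight(row)
    grand_total = sum(totals.values())
    return {cause: value / grand_total for cause, value in totals.items()}


# Every incident counts once, regardless of size.
per_incident = share_by_cause(incidents, lambda row: 1)

# Every incident counts in proportion to the records it exposed.
per_record = share_by_cause(incidents, lambda row: row["records"])

print(per_incident)
print(per_record)
```

In this toy example, social engineering causes four of the five incidents (80%) but only 400,000 of the 2,000,000 exposed records (20%), because one large unpatched software incident dominates the per record view. The same effect, in a much milder form, is what moves my real figure around.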

Note: There is a huge, glaring, known misstatement in the statistics here that likely works in my statement’s favor. It’s that the majority of “casual and normal” malware infections (those that made it past the anti-malware scan, even if just for a minute) come from social engineering and unpatched software. Those “regular” infections, which happen to nearly every organization in the world on a routine basis, rarely make it into data breach reporting databases. Clearly, if we counted most malware infections as “breaches”, the overall statistics would likely land closer to the higher end (90% to 99%) more frequently.

Why Can’t I Share My Data?

A few researchers have asked why I can’t share my data. I agree, it is an issue. The biggest reason is that I would have to anonymize my data so much that it would not be useful. In many cases, including just the number of records compromised would be enough to let any reader know whose data breach it was, and in doing so, I would potentially be breaking my promises and NDAs. I also thought about rounding the figures up or down to obscure the exact breached record counts, but doing that across 12,000 separate entries would take a lot of time, and I’m not sure the result would be anonymized enough. More importantly, I think that anyone who cares enough about this should just do their own research. Download the database, sort any way you want, and start looking for root cause trends. You’ll likely agree with me that most data breaches are caused by social engineering one way or another.

Metasurvey of Surveys

Or, if you don’t want to do the work (and I understand that, since it took me months to do it), download and read my KnowBe4 colleague Javvad Malik’s threat intelligence whitepaper. He looked at over 100 different cybersecurity incident reports and surveys, each of which claimed to summarize the biggest root causes. Although they all disagreed on the actual percentages, once their findings were assigned to root cause categories, all 100 said social engineering was the number one problem, by far. Not one disagreed. So, if you don’t believe me and my “secret” data, look at any of the 100 reports that Javvad reviewed. It’s a great read that backs up my personal conclusion.

But more importantly, you don’t have to believe what Javvad and I are saying. After all, we both work for KnowBe4, and KnowBe4 is trying to sell you anti-social engineering training software and services. If you are still skeptical, ask yourself what your own experience has been in your career or in the attacks you’ve read about. I’ve yet to meet a person who disagreed with the statement that social engineering is the number one cause of most security breaches.

Note: Interestingly, the number three root cause varies greatly depending on the report, survey, and individual company. Sometimes it’s insider threat. Sometimes it’s misconfiguration. And sometimes it’s denial of service problems.

It’s Always Been This Way

I’ve been doing computer security for over 32 years. For the entirety of my career, social engineering and unpatched software have been the number one and number two reasons why computers and people get compromised. They have switched positions over the years. For nearly a decade, one unpatched software program, Sun/Oracle Java, was responsible for 91% of all malicious web breaches by itself. But eventually browser and OS vendors responded and put down the threat from unpatched Java, and since then, social engineering and phishing have regained the number one spot. That’s because social engineering works so well, and it works across any platform, whether you are running Microsoft Windows, Apple, Linux, Chrome OS, or some other portable device.

Social engineering has been a leading cause of criminality since the beginning of man. Being human means social engineering will always be around. Maybe not as the number one cause of most cybersecurity incidents, but it’s been number one for a long time, and nothing I’ve seen has shown me that this is likely to change any time soon.