Bias and phishing: How machine learning can protect us

Kevin Tunison | 16th Jun 2021

Bias, and unconscious bias in particular, exists in every facet of our lives and poses a significant challenge when developing AI and machine learning systems. In this post, I'll share some of the challenges, from a data governance viewpoint, of addressing this difficult (and often overlooked) aspect of machine learning systems.

Understanding unconscious bias

Our biases, whether we express them knowingly or not, are a key factor we need to recognise and watch for if we are to build and maintain trust while reducing risk in the human layer.

First, let’s start with some examples of unconscious bias.

You could point to instances of ‘microaggressions’ as an example of unconscious bias. These are unintended comments that reveal prejudice towards a marginalised group.

We can also take a more quantitative approach. Consider the number sequence 2, 4 and 6. If you were asked to guess the next numbers, most of us would answer 8, 10, 12 and so on. This assumption is entirely reasonable, but the only rule we can state as a fact is that the numbers increase in value. So, a next number of 7, 9 or 11 is just as plausible. We have simply assumed a rule that was never stated. This is unconscious bias in action.
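A few lines of Python make the number-sequence point concrete. The function names are our own, chosen for illustration: `is_increasing` checks the one rule the three numbers actually support, while `add_two` checks the rule most of us silently assume.

```python
# The three numbers we are actually shown.
observed = [2, 4, 6]


def is_increasing(seq):
    """The only rule the observed numbers let us state as a fact."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def add_two(seq):
    """The rule most of us assume without being told: add 2 each time."""
    return all(b - a == 2 for a, b in zip(seq, seq[1:]))


# Try the "obvious" answer, the alternatives from the text, and one that breaks the pattern.
for nxt in [8, 7, 9, 11, 5]:
    extended = observed + [nxt]
    print(nxt, "increasing:", is_increasing(extended), "add 2:", add_two(extended))
```

Here 8, 7, 9 and 11 all keep the sequence increasing; only 8 also satisfies "add 2", and 5 satisfies neither. The extra rule came from us, not from the data.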

So, you can see how unconscious bias forms part of our everyday lives. The next question is whether technology can help us combat these biases when it comes to data security. It can, but first we need to trust it.

Trust in technology

Trust has always been one of the defining features of being human. Trust (and distrust) has been a fundamental tool in our survival. While we have come a long way as a society, we continue to rely on trust to navigate our lives – and nobody likes being taken advantage of.

Every technological innovation has been met with a healthy dose of cynicism and distrust. For example, it was once thought that travelling on trains faster than 50mph would cause all sorts of problems for the human body, from organs flying out to whole bodies melting. We can now look back and laugh at this!

Similarly, fears arose as we introduced automobiles, telephones, flight, computers, mobile phones, space travel… and the list could go on. Fear-driven mistrust is always a popular talking point.

How is trust in tech gained and broken?

Our disposition to trust builds gradually over time. The more frequently we see others navigating new technology safely, the more we build trust through verification of the following factors:

  • Accuracy
  • Reliability
  • Resiliency
  • Security
  • Safety
  • Accountability
  • Privacy

This isn’t to say that trust, once gained, cannot be broken. As a reader of this article, you will be familiar with technological incidents (whether hacking, cyber espionage, ransomware or others) that force us to re-evaluate our trust in computing.

In fact, it is our trust that creates a key risk in AI and machine learning. If a system we have come to rely on becomes unavailable, we will likely lack the know-how or experience to take the controls ourselves. Ask yourself how many people you know who change the tyres and oil on their own cars today compared with 40 years ago. A breakdown now usually means calling a third party for recovery.

The point is, we have accepted these changes and now trust those third parties so that we can spend more time on the things that matter. We can do the same with data security.

Manipulating bias for cybercrime

Cybercriminals rely on our bias as a method of manipulation. As criminals learn the rules and structures put in place to detect their activities, their methods change. For example, with email phishing we have watched attackers go to great lengths to fool users into divulging confidential information.

Their tactics include creating fake (but very realistic) login pages and sending legitimate-looking emails from inside an organisation using a compromised account. Others purchase trusted encryption certificates and register businesses to lend their scams perceived legitimacy. Some copy invoice templates and change the account numbers for financial gain.

The days of a far-off prince bestowing his wealth for the small price of an initial investment are long gone. Even more plausible scams, such as 80% discounts on popular products, are less effective than they once were, because additional tools now automatically filter out these rudimentary attempts. As a result, attackers have had to evolve.

Defending ourselves with machine learning

We must implement tools that stop users from falling for more sophisticated forms of phishing, including those that exploit our unconscious bias. Tools such as Egress Defend can also educate users on why specific emails have been flagged as risky.

Defend uses machine learning to analyse the content and context of every email – without the biases that affect us all. That means it can impartially and effectively step in to offer advice in those crucial moments where cybercriminals are hoping for a slip in concentration.
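To make "content and context" and "explaining why an email was flagged" more concrete, here is a toy sketch in Python. It is not how Egress Defend works, and it is not Defend's API. The `Email` fields, the phrase list and the weights are all hypothetical, and hand-picked weights stand in for the trained model a real machine learning system would use. What it does illustrate is returning the reasons alongside the score, so a user can see why a message looks risky.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical fields; a real product would extract far richer signals.
    sender_domain: str
    reply_to_domain: str
    body: str
    sender_seen_before: bool  # context: has this recipient heard from the sender before?


# Hypothetical content cues often associated with pressure or payment fraud.
PRESSURE_PHRASES = ("urgent", "immediately", "verify your account", "new bank details")


def assess(email: Email) -> tuple[float, list[str]]:
    """Return a risk score in [0, 1] plus human-readable reasons.

    Hand-picked weights stand in for a trained model in this sketch.
    """
    score = 0.0
    reasons: list[str] = []

    # Content signal: pressure or payment-change language in the body.
    text = email.body.lower()
    hits = [p for p in PRESSURE_PHRASES if p in text]
    if hits:
        score += 0.3 * len(hits)
        reasons.append("pressure or payment language: " + ", ".join(hits))

    # Context signal: replies would go somewhere other than the sender's domain.
    if email.reply_to_domain != email.sender_domain:
        score += 0.4
        reasons.append(
            f"reply-to domain {email.reply_to_domain!r} differs from sender {email.sender_domain!r}"
        )

    # Context signal: no previous correspondence with this sender.
    if not email.sender_seen_before:
        score += 0.2
        reasons.append("first message from this sender")

    # Cap the score so it stays in [0, 1].
    return min(score, 1.0), reasons


if __name__ == "__main__":
    suspicious = Email(
        sender_domain="supplier.example",
        reply_to_domain="payments-supplier.example",
        body="URGENT: please pay this invoice using our new bank details.",
        sender_seen_before=False,
    )
    score, reasons = assess(suspicious)
    print(f"risk score: {score:.2f}")
    for reason in reasons:
        print(" -", reason)
```

The design choice worth noting is that the function never returns a bare verdict: each signal that raises the score also adds a plain-language reason, which is the kind of output that lets a tool advise a user in the moment rather than silently block or allow.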

Adding this layer of protection means we can better reduce insider risk, giving people a tool they can trust to keep them safe while sharing data. As a result, we’ll see happier and more productive colleagues. They can spend less time scrutinising every email they receive and instead rely on trusted solutions to spot phishing patterns at lightning speed.