Man and machine: Why unsupervised machine learning won't solve all your security problems

by Fahim Afghan
Published on 22nd Jun 2021

I read a fascinating Financial Times article a couple of years back. It revealed that 40% of self-proclaimed AI or machine learning-enabled European start-ups actually offered no evidence of such technology.

I recently flagged this stat to a Security leader within Financial Services. She laughed and admitted that she rolls her eyes every time someone champions machine learning as a silver bullet that will solve all her problems.

We were talking about email security and, in particular, an incident that her team had to resolve where an employee had emailed a highly sensitive document to the wrong recipient. Not only was there a potentially reportable breach, but a lot of effort, resource, and cost then went into preserving a relationship with a very important and lucrative customer (the right recipient!) whose data had been compromised.

Thankfully for her business, the partnership persevered. But it was the latest example of a legacy technology struggling to cope with today’s insider risk. Now more than ever, people make mistakes, and traditional DLP solutions built on static rules just aren’t sophisticated enough to detect and resolve the nuances of human error – unless you have an army of administrators with the bandwidth to configure policies to an incredibly granular and complex level. But let’s face it: who has that?

Unsupervised machine learning mitigates a proportion of risk – but it’s not a silver bullet on its own

And that’s when the topic of machine learning reared its head. We’re now able to utilise advances in technology such as social graphing – pioneered by the likes of Facebook and LinkedIn – that allow us to continuously learn and accurately model relationships. The algorithms learn and deeply understand who we connect with and how we interact.

If applied to the email security incident referenced above, this technology would have detected an anomalous relationship, i.e. a wrong recipient, and then alerted the employee in real time, avoiding a potentially serious and costly security breach. No rules would have been required, as previous user behaviour would have informed the algorithms, in turn giving them the ability to spot an anomalous recipient.
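As a rough illustration of the idea, the sketch below flags a recipient that a sender has never emailed before. It is a deliberately simple frequency count, not the social-graph modelling a real product would use, and the class name `RecipientModel`, the `min_history` threshold, and the example addresses are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class RecipientModel:
    """Toy model of who each sender usually emails (hypothetical, for illustration only)."""

    def __init__(self, min_history=5):
        # Assumed threshold: do not judge a sender until we have seen this many emails.
        self.min_history = min_history
        self.history = defaultdict(Counter)

    def observe(self, sender, recipient):
        """Learn from a sent email; no rules or labels are supplied."""
        self.history[sender][recipient.lower()] += 1

    def is_anomalous(self, sender, recipient):
        """Return True if this sender has enough history and has never emailed this recipient."""
        seen = self.history[sender]
        if sum(seen.values()) < self.min_history:
            return False  # too little history to call anything unusual
        return seen[recipient.lower()] == 0


model = RecipientModel(min_history=3)
for _ in range(3):
    model.observe("sam@firm.example", "anna@client-a.example")

# A look-alike address at a different customer is flagged; the usual one is not.
print(model.is_anomalous("sam@firm.example", "anna@client-b.example"))
print(model.is_anomalous("sam@firm.example", "anna@client-a.example"))
```

The point of the sketch is that nothing here was configured by an administrator: the only input is the sender's own past behaviour.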

In effect, this is ‘unsupervised machine learning’, which runs by itself, continuously updating its hypotheses as more information becomes available. Its ability to ingest, process, and analyse vast amounts of information is what makes it so valuable from a productivity point of view, as no human can compete with its output. And on top of that, it addresses a significant source of email security risk.

However, anyone who tells you unsupervised machine learning is a silver bullet that solves all email security risks is wrong.

Supervised machine learning: the best of man and machine

Let me give you an example. What if the employee at our Security leader’s financial services firm had entered the right recipient but mistakenly attached the wrong sensitive document? Perhaps another customer’s invoice or investor information. Relying on unsupervised machine learning alone would not have been enough to detect the mistake, as the algorithms would not have spotted any anomalies with the given recipient. No attention would have been paid to the nature of the content included in the wrong attachment.

Sometimes, machine learning is only as smart as the information it holds. On a recent webinar with us, Mike Duff, Chief Security Officer at Harneys, likened machine learning to a toddler: “It has certain abilities, but you have to teach it along the way.” The algorithms will require input from security experts in the form of helpful policies that coach them to know right from wrong. In effect, this is ‘supervised machine learning’.

That way, the technology can detect both anomalous recipients and cases where email content – whether the subject line, message body or even data buried within an attachment – doesn’t match the intended recipient: for example, a name, social security number, or bank details that belong to someone else. This is very different from traditional DLP solutions that apply a one-size-fits-all approach to content inspection, putting a blanket rule over keywords or expressions, regardless of the user, their role within the business, and the relationship they have with the recipient. Supervised machine learning allows us to take context into account, in turn removing email security risks without the disruption of endless prompts and false positives.
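To make the difference from a blanket keyword rule concrete, here is a minimal sketch of a context-aware content check. It is not a learning system; it only shows the kind of expert input described above (what counts as sensitive, and who it belongs to) being combined with the recipient. The account-number format, the `ACCOUNT_OWNERS` mapping, and the function name are hypothetical.

```python
import re

# Expert-supplied policy: what a sensitive identifier looks like (assumed format).
ACCOUNT_PATTERN = re.compile(r"\bACC-\d{6}\b")

# Hypothetical mapping from account number to the customer domain that owns it.
ACCOUNT_OWNERS = {
    "ACC-100001": "client-a.example",
    "ACC-200002": "client-b.example",
}


def mismatched_accounts(recipient, *text_parts):
    """Return account numbers found in the text that belong to a different customer than the recipient."""
    recipient_domain = recipient.rsplit("@", 1)[-1].lower()
    found = set()
    for part in text_parts:  # e.g. subject, body, extracted attachment text
        found.update(ACCOUNT_PATTERN.findall(part))
    # Unknown account numbers are ignored here; a real policy might treat them differently.
    return sorted(
        acc for acc in found
        if ACCOUNT_OWNERS.get(acc) not in (None, recipient_domain)
    )


# Right recipient, wrong attachment: another customer's invoice is caught.
print(mismatched_accounts(
    "anna@client-a.example",
    "Your invoice",
    "Invoice for account ACC-200002",
))
```

A blanket rule would prompt on every email containing an account number. This check stays silent when the account belongs to the recipient and only raises a warning when it does not.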

Coaching algorithms with the best security expertise

Unsupervised machine learning is not a silver bullet in itself: despite offering many benefits, it does not fully solve the puzzle of human error on email. But when we supervise and train the model with expert inputs, we can cater for the broader complexities of mistakes over email and truly eradicate security breaches caused by human error.