Experts Predict the Future of Artificial Intelligence in Data Protection

Egress | 10th Feb 2019

The ICO has reported that data security incidents are on the rise in 2018 as organisations battle a heightened cyber threat. Data security vendors have responded by investing heavily in researching and developing predictive technology, artificial intelligence and machine learning solutions such as Egress Intelligent Email Security as a way of identifying potential threats (both internal and external) and then preventing them in order to keep our data safe. In what way will artificial intelligence and machine learning impact data security solutions of the future? We asked the experts to shed some light on the matter below.

Question 1: What role do you see AI-based solutions playing in the prevention of data breaches in the near future?

Question 2: The ICO reported that, unsurprisingly, data security incidents are on the increase in 2018, with one of the most common types of incident being data sent to an incorrect recipient. Do you think AI can help reduce these types of breaches? You can see the full report here.

Question 3: What would you say is the main challenge when using machine learning to prevent data breaches?

Abraham Gilbert, MSIT, CPHIMS, CSM

Head of Machine Learning at

1. AI can make testing and vulnerability scanning stronger, using algorithms to close the gap between thinking something that is in production is unsafe and knowing it's unsafe. This can be done by looking at each industry category (such as banking or retail) and examining the firewalls, endpoint and other security products you're using and how they are configured in your overall security stack.

2. AI, for example, can make data connections more efficient. It can use machine learning techniques to audit every connection and pinpoint when travelling data is deviating from its predetermined path. We can now leverage predictive analytics to understand the patterns of past data behaviour and use them to prevent repeat data breaches.

3. The main problem is understanding the complex configurations, because there is no inter- and intra-company set of metrics. With AI monitoring and analyzing all of this data, you can see how you stack up against industry threats. This allows you to mark your security solution. Because no one can prevent attacks 100% of the time, how can you hold security officers accountable in a fair way?

Peter Eckersley

Chief Computer Scientist for the Electronic Frontier Foundation

1. The interactions between AI and computer security are quite intricate.

On the plus side, we have some hope that AI will give us better tools both for writing secure applications and for intrusion detection: neural fuzzers, intelligent intrusion detection systems, machine learning that notices weird patterns wherever they occur.

But there's a downside too: ML advances are creating new types of threats, including automated and adaptive "spear phishing" systems. There's also the fact that neural networks are vulnerable to many novel kinds of attacks, including adversarial examples (that make AI think it's seeing something that it's not) and dataset poisoning (which tricks AI into learning the wrong things).

In a sense machine learning is just transposing and amplifying the race that already exists between cybersecurity offense and defense: the biggest challenge is to make sure that it helps defenders more.

2. (I think I also answered this above).

3. AI could have great promise for intrusion detection and endpoint security software, but there are some huge challenges -- the fact that most machine learning architectures require customers to hand huge amounts of sensitive data to a vendor in order to train classifiers; the difficulty of telling which products provide really accurate recognition of malware or exfiltration; and the lack of standardized datasets and benchmarks for comparing products. If anyone figures out how to solve all these problems together, it could be a huge advance for computer security.

Nick Ismail

Editor at Information Age

1. I see AI playing a very important role in the prevention of data breaches in the near future. As cyber attacks increase in variety and volume, security vendors will need to meet fire with fire. Artificial intelligence can act as a powerful tool in helping make businesses more secure and stopping malware before it executes.

2. The misaddressed email can be the result of a number of factors. Whatever the reason, artificial intelligence or machine learning can help reduce these types of breaches. If applied to an email database this technology can identify patterns and warn the user before sending if an anomaly is identified.
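The idea of spotting an anomaly before sending can be illustrated with a minimal sketch. It assumes, purely for illustration, that a user's sent history is available as lists of recipient addresses, and it flags any recipient on a draft who has never appeared alongside the others. The function names, the `min_seen` threshold and the sample addresses are hypothetical; a real product would use far richer signals.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(sent_history):
    """Count how often each pair of recipients appeared on the same past email."""
    pair_counts = Counter()
    for recipients in sent_history:
        for a, b in combinations(sorted(set(recipients)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def unusual_recipients(draft_recipients, pair_counts, min_seen=1):
    """Return draft recipients rarely or never emailed together with the others."""
    draft = sorted(set(draft_recipients))
    flagged = []
    for person in draft:
        others = [r for r in draft if r != person]
        if not others:
            continue  # a single-recipient draft has nothing to compare against
        # Total number of past emails shared with any other draft recipient.
        seen = sum(pair_counts[tuple(sorted((person, o)))] for o in others)
        if seen < min_seen:
            flagged.append(person)
    return flagged


# Hypothetical sent history and draft (illustrative data only).
history = [
    ["alice@corp.example", "bob@corp.example"],
    ["alice@corp.example", "bob@corp.example", "carol@corp.example"],
    ["bob@corp.example", "carol@corp.example"],
]
counts = build_pair_counts(history)
draft = ["alice@corp.example", "bob@corp.example", "carl@other.example"]
print(unusual_recipients(draft, counts))  # ['carl@other.example']
```

Here the draft would trigger a warning for the lookalike address, because it has never been used together with the other two recipients.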

3. The main challenge in using machine learning to prevent data breaches is about perception and separating hype from reality. Businesses should be aware that deploying an AI-based or machine learning-based solution will not 100% mitigate the cyber security threat, but rather help reduce the impact of an attack by identifying a breach sooner than an unaided security team.

Leonardo Aniello

Lecturer at Southampton University

1. Data breaches are committed in different ways, from deliberate intrusion to spear phishing. Any cyber attack of this kind can be decomposed into a sequence of steps taken by the adversary, each with a precise goal. Such attacks usually include a reconnaissance phase, where the target is analysed to understand what cyber weapons to use, and an exploitation phase, where those weapons are actually used to steal data.

Learning what commonly happens during these phases makes it possible to develop effective mechanisms for detecting malicious activity in a timely manner, and hence to put proper countermeasures in place to prevent data breaches. As the complexity of attackers' strategies keeps growing relentlessly, keeping data breach prevention techniques up to date becomes unfeasible for security analysts.

Here AI comes into play, to aid and improve the process of continuously revising prevention measures. For example, machine learning can be used to automatically identify distinguishing patterns in recent data breaches, in order to update cyber defenses without security analysts having to deal directly with the exasperating and time-consuming task of dissecting the latest attacks by inspecting huge amounts of raw data.

2. AI can also be used to learn the normal, common patterns of healthy business processes, and to recognise whether any particular occurrence deviates excessively from the expected behaviour. In this case, AI can verify whether the content of a message, e.g. text and attachments of an email, is actually in line with the recipient of the message itself, and raise an alert to force the user to double-check before sending.

3. Machine learning requires data to acquire the knowledge it needs. Commonly, such data either comes from previous real data breaches or is generated through simulations. In the former case, available datasets are really rare and often incomplete. In the latter case, simulation accuracy is limited by design, which negatively affects the quality of what can be learned and consequently reduces the effectiveness of prevention measures. Hence, one of the main challenges is providing proper training datasets and updating them over time as cyber attacks evolve.

Raj Chandel

Director at Ignite Technologies

1. I think AI-based solutions will play a huge role in the prevention of data breaches, considering that a lot of security solutions have integrated, or are in the process of integrating, AI to some level in their products. AI will, so I hope, assist in cleaning out the kind of flags that are generated by monitoring solutions, reducing the number of false positives that have to be investigated. When you have thousands of endpoints actively in play, that’s a huge benefit.

2. Well, there really isn’t any cure for human stupidity. Though, in my opinion, AI might still give us a fighting chance. DLP solutions with AI could potentially detect out-of-the-ordinary behavior and hold the information until it is cleared by a higher authority. That could be one potential solution, but again it would inhibit productivity at times. Maybe if the AI is given sufficient time to establish a more robust baseline, such things would become less and less common.

3. Change in business process and whitelisting. Adapting to any changes will take time and will create a lot of noise in terms of false positives. At that time, if the number of flags is overwhelming and a large amount of data is being blocked or withheld by the AI, temporary whitelisting will have to be used to keep the business flowing smoothly. I suppose this could be prevented by taking a very strategically thought-out approach when implementing changes in the business process.

Steve Lohr

Technology Reporter at The New York Times

1. AI will be an automated, vigilant sentry. And a reliable one, but only up to a point. A tireless assistant, but one occasionally in need of human supervision.

2. I should think so. It seems to me that you ought to be able to program an algorithm to detect sensitive data being sent to an unauthorized recipient.

3. Machine learning is a powerful pattern-matching technology. It's very good at identifying anomalies -- variations from the norm, which is the telltale sign of most cyber attacks. But a really clever attack may be more disguised, not look like an anomaly and slip past the algorithmic net.


Director, Kent Interdisciplinary Research Centre in Cyber Security (KirCCS)

1. There is no doubt that AI technologies will play a more and more important role in helping us to better understand, detect, and prevent data breaches in the near and long-term future. The big data nature of data breaches and other cyber security problems means we cannot rely on humans alone; instead, humans urgently need more help from machines to do a better job. The recent advances in deep learning have significantly improved accuracy on many other tasks, but there are still major technical challenges around applying deep learning and other AI technologies to cyber security problems, e.g., attacks are much rarer and we care more about unknown attacks that cannot be easily detected by static signatures.

2. This is precisely what data loss/leakage prevention (DLP) systems try to prevent from happening. AI can clearly help here to detect confidential and sensitive documents and information automatically before they go out to unintended recipients, and can also help automatically sanitize them or block them from being sent out. DLP has been a mature market, but the use of AI is still rare, so there is a lot of room for further improvement, especially in combining humans and machines to reach a better balance between false negatives and false positives.

3. I would say there are two main challenges: one being the difficulties of getting dynamic and fresh good data for us to design, test and improve AI systems for detecting data breaches; the other being the difficulties of automating or facilitating efficient responses to data breaches. Both require reinforcement learning and a human-in-the-loop approach.

Dave Howell

Owner, Nexus Publishing

1. Detection of cyberattacks could be transformed by AI and ML. Taking a detailed overview of a system (digital payments for instance) could be more effective than human monitors. Over time, ML will gain a picture of what the correct environment should look like. Any anomalous behaviour is then easily detected.

2. Protecting data security on a personal level can also be supported with ML and AI. Developing a profile of how an individual makes payments, to whom, how often and using which payment methods, profiles that person. AI can then apply rules that when broken could indicate a fraud. We could all have a personalised AI that becomes our digital guardian when we are online.
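As a rough illustration of profiling payments and applying rules to them, the sketch below builds a simple profile from past payments and lists which rules a new payment breaks. The field names (`payee`, `method`, `amount`), the function names and the `max_z` threshold are assumptions made for this example, not a description of any real fraud system.

```python
from statistics import mean, stdev


def build_profile(payments):
    """Summarise past payments: known payees, known methods, amount statistics.

    Needs at least two past payments, because stdev() requires two data points.
    """
    amounts = [p["amount"] for p in payments]
    return {
        "payees": {p["payee"] for p in payments},
        "methods": {p["method"] for p in payments},
        "mean": mean(amounts),
        "stdev": stdev(amounts),
    }


def broken_rules(payment, profile, max_z=3.0):
    """Return the list of profile rules that a new payment breaks."""
    reasons = []
    if payment["payee"] not in profile["payees"]:
        reasons.append("new payee")
    if payment["method"] not in profile["methods"]:
        reasons.append("new payment method")
    if profile["stdev"] > 0:
        # How many standard deviations the amount sits from the usual amount.
        z = abs(payment["amount"] - profile["mean"]) / profile["stdev"]
        if z > max_z:
            reasons.append("unusual amount")
    return reasons


# Hypothetical payment history (illustrative data only).
past = [
    {"payee": "grocer", "method": "card", "amount": 20.0},
    {"payee": "grocer", "method": "card", "amount": 25.0},
    {"payee": "cafe", "method": "card", "amount": 22.0},
    {"payee": "grocer", "method": "card", "amount": 30.0},
    {"payee": "cafe", "method": "card", "amount": 24.0},
]
profile = build_profile(past)
print(broken_rules({"payee": "unknown ltd", "method": "wire", "amount": 900.0}, profile))
print(broken_rules({"payee": "cafe", "method": "card", "amount": 23.0}, profile))
```

The first payment breaks all three rules and the second breaks none. A broken rule here is only an indication that a human or a further check should look at the payment, not proof of fraud.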

3. At the moment the focus is on developing and deploying AI and ML to detect cyber attacks, but what if AI and ML were used by the attackers? AIs, in particular, could be used by these black hat attackers to test systems for vulnerabilities. In the future we could see a situation where a bank’s AI is defending against a hacker’s AI that is attacking it.

Ken Briodagh

Editorial Director at IoT Evolution

1. I think AI and machine learning algorithms have a critical role to play in the prevention of and response to data breaches. AI can quickly and efficiently monitor systems and networks for rogue devices and activity, which will help identify potential breaches before massive data losses. In terms of response, machine learning will help AI systems prevent future breaches of that type, much like a vaccine does for the immune system.

2. The biggest risk in any system is of human error leading to loss of data. AI’s effectiveness can be limited here, but I think there are ways to use AI to double check the work or actions of employees and perhaps launch a type of “Are you SURE you want to do that?” protocol before a person can press send.

3. As with any system, compliance is the biggest hurdle. People don’t like to comply with extra layers of work that can be perceived as hindrances. This is why machine learning systems need to be designed with UX in mind and to be as invisible and streamlined as possible. Of course, well-built training systems that help the users understand the importance of the system are also important to preventing non-compliance.

Daniel Dresner

Manchester University - Lecturer: Information and cyber security and governance.

1. As systems grow in complexity and expectations of versatility outstrip the pace of development which may be needed to make them safe, we need ideas that will help systems look after themselves and be resilient in the face of attack. AI may give us a level of autonomy to effectively understand the different components of our information assets and focus resources in the preservation of confidentiality and privacy, curation of the integrity of data, and making data available when and where it is legitimately needed.

2. This is new technology and it may be the vanguard for returning to appreciating that it is the behaviour of systems we should oversee for the safety of their thrall: people protection rather than data protection. AI - whatever that is - may just provide the engines of versatility that may detect (or predict) undesirable behaviours by actors - malicious or otherwise - to make systems less vulnerable in real time and take action to keep the systems' activities (and so the data) within the expectations of both the law and the moral and ethical standards of society.

3. The switch from using the term 'AI' to 'machine learning' may be indicative of how immature we are in understanding what we might have. I think the challenge is two-fold: one practical and one philosophical, and they meet in the middle.

Governance...where AI and machine learning are deployed in the identification of threats or the detection of breaches, how do we allocate the decision rights or escalation paths over what is done with the intelligence (sic!)? How do we manage a new piece of complexity in complex systems whose boundaries we (and our supply chains) may have difficulty in defining?

The philosophical challenge is our propensity to chase after the new at the expense of lessons learnt. Innovation is no longer good enough and we worship disruption and then lament the pieces which we're expected to pick up. I think it's time to admit which science sired AI and coined the term 'cyber' for us!

Of course, the other side of the 'main challenge' will be our adversaries deploying AI and machine learning to test and break down our defences...

Professor John A. Clark

Lead of Security of Advanced Systems Group, University of Sheffield; Programme Lead, MSc Cybersecurity and Artificial Intelligence

1. We’ll see better integration of AI with software engineering practices to get more robust implementations and achieve them more efficiently. We will likely see increasing emphasis on situational awareness and diagnostics, especially as we augment current (and often very effective) practices for stopping malware getting in with elements that acknowledge we will have to live with compromise, especially in large systems where continued operation is a must. We will likely need AI and advanced countermeasures to deal with particularly nasty malware such as metamorphic viruses. More generally, there is very significant scope for using AI for malicious purposes. 

2. It is likely not much trouble to bring AI to bear on this problem, and it is also possible to bring far less heavyweight technology to bear (and tech, e.g. address auto-complete, is a culprit anyhow). AI can help, but this seems like using a hammer to crack a nut. Automated support to counter this can be deployed. I would be inclined to think this problem through before deciding that the answer, or even an answer, is AI. Start with the problem and engineer a solution.

3. We really need to decide what ML is good for and what it is (likely) not. There are recurrent data breaches, where the future looks very like the past: the wrong email recipient problem above is an obvious example. But data breaches can occur as a result of significant novelty and so the future does not look like the past. Here AI will struggle. Why would you expect to ‘learn’ how to spot a zero-day exploit? There may be specifics about such an attack that render it anomalous in some way (and so flaggable as suspicious), but there is no a priori reason to expect them. The main challenge is probably countering excessive ‘belief’ in AI and the complacency that may well result. It’s not the silver bullet, it’s just a very powerful tool. A big challenge is understanding how human analytics and AI tech support can complement each other effectively.

Eleni Vasilaki

Professor of Computational Neuroscience & Neural Engineering at Sheffield University

1. AI and machine learning methods underlie many activities of our everyday life, data security included. One growing problem is identity theft. Intelligent algorithms may help with identity authentication, for instance: they can learn behavioural patterns specific to each person and therefore detect behaviours that deviate from them, alerting to the possibility of identity theft.

2. Using AI to prevent me from accidentally contacting the wrong person means that I would have to provide the algorithms with enough information on my professional and social network to make it possible to estimate the probability that I am contacting the right person. This is certainly feasible, though on a personal note I am sceptical about having all this information collected on me in the first place. Pragmatically speaking, though, my search engine has way too much data on me already!

3. Very successful AI algorithms often rely on large sets of training examples that may be unavailable for specific types of cyber-attacks. We can also envisage a scenario where both the cyber attacker and defender are AI systems, learning from each other's weaknesses and co-evolving.

Paul Kearney

Professor of Cybersecurity and Head of Cybersecurity Research Group, Birmingham City University

1. As with most technical innovations, AI and ML have three types of implication for cybersecurity:

  • It can be used to improve the security of systems, e.g. through better detection of attacks, spotting suspicious insider activity, making it easier for users to work securely, etc.
  • It introduces new vulnerabilities while people work out how to implement and use it securely. For example, it has been shown that ML-based classification systems (e.g. applications that work out what objects are in an image) can easily be fooled by incorporating invisible changes to the sample presented.
  • It can be used by threat agents, e.g. AI and ML could be used to identify vulnerabilities to be exploited, or to re-identify the subjects of anonymised data.

I would hope in the long term that these technologies will improve security significantly, but in the short term, they are a two-edged sword.

2. Yes, it should be possible to train ML systems to think: ‘Hey, that type of data doesn’t usually get sent there, and certainly not at that time of day!’, block the transfer temporarily, and flag up the anomaly for investigation by a human expert.

3. ML is not, in general, a ‘black box’ technology – it often takes a combination of ML and subject matter expertise to apply an ML technique to a specific problem. So, scarcity of required skills is one major challenge. Another is putting together a good training data set. In the case of supervised training, the data set has to be labelled by human experts, which can be extremely labour-intensive and time-consuming.

Professor Stefanos Kollias

Founding Professor of Machine Learning at Lincoln University

A variety of methods have been developed and used for assisting users in accessing and managing their content, whilst ensuring personalization and privacy in data handling. Data breaches, including sending data to incorrect recipients, can be reduced through a trade-off between flexibility and efficiency, on the one hand, and an increase of checks and controls on every action related to data management, on the other.

Artificial intelligence is expected to minimize the effects of this trade-off by providing tools that are flexible and efficient, while automatically analyzing and checking a variety of apparent and latent features, as well as consequences, related to each action on data. User profiling and personalized user knowledge are expected to be included more and more in this framework.

Machine learning is also expected to be the main approach for aggregating and analyzing information related to the specific user, to similar types of users and to similar types of data, providing recommendations on the data management options.