Machine Learning Is Also a Tool for Cybercriminals

Whatever discipline within cybersecurity one looks at, machine learning and artificial intelligence are changing how security data are analyzed, security tools are deployed, and threats are identified. I know there’s a difference between machine learning and AI, but so many people use the terms interchangeably now that the distinction is blurring.

Machine learning promises to enhance security in many ways, including help with SIEM analysis, malware analysis, and threat intelligence. But anything that can be used to help the good guys can be used to attack them as well. That’s the dichotomy of new technologies: they always serve dual uses.

Essentially, machine learning combines pattern recognition and computational learning theory to make it possible for systems to learn on their own: spotting trends in data that humans have a tough time seeing, providing predictive analytics, or identifying malware based on previously unidentified activity. This Carnegie Mellon University Software Engineering Institute blog post, Machine Learning in Cybersecurity, provides a good overview of how machine learning works in cybersecurity.

The process is the same as with most data analytics: it requires data collection, cleaning, model building and validation, and then deployment and monitoring of the machine learning system.
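To make those stages concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the failed-logins-per-hour feature, the sample records, and the "model" (a single threshold halfway between the two class averages) are hypothetical stand-ins, not how any particular security product works.

```python
from statistics import mean

# Collection: hypothetical records of (failed_logins_per_hour, label),
# where label 1 means malicious and 0 means benign.
RAW = [
    (2, 0), (3, 0), (1, 0), (4, 0), (None, 0), (2, 0),
    (40, 1), (55, 1), (38, 1), (-5, 1), (61, 1), (3, 0),
]

def clean(records):
    """Cleaning: drop rows with missing or impossible values."""
    return [(x, y) for x, y in records if x is not None and x >= 0]

def split(records, every=3):
    """Hold out every third record for validation."""
    train = [r for i, r in enumerate(records) if i % every != 0]
    held_out = [r for i, r in enumerate(records) if i % every == 0]
    return train, held_out

def fit_threshold(train):
    """Model building: the midpoint between the two class averages."""
    benign = mean(x for x, y in train if y == 0)
    malicious = mean(x for x, y in train if y == 1)
    return (benign + malicious) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

def accuracy(threshold, records):
    """Validation: share of held-out records classified correctly."""
    return sum(predict(threshold, x) == y for x, y in records) / len(records)

def flagged_rate(threshold, live_values):
    """Monitoring: share of live traffic the deployed model flags."""
    return sum(predict(threshold, x) for x in live_values) / len(live_values)

data = clean(RAW)
train, held_out = split(data)
threshold = fit_threshold(train)
print("validation accuracy:", accuracy(threshold, held_out))
print("flagged in production:", flagged_rate(threshold, [2, 3, 50, 1]))
```

A real system would swap in far richer features and a real model, but the order of the steps stays the same, and a sudden jump in the flagged rate is the sort of signal the monitoring stage exists to catch.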

Machine learning is taking off across all disciplines. Oracle CEO Mark Hurd recently said during his CloudWorld keynote, "I don't think AI will become a thing, I think AI will become a feature integrated into everything, which makes it very different strategically."

Enterprise Cloud News wrote that Hurd believes that 90% of all enterprise applications will feature AI capabilities by 2020. At the same time, over 50% of enterprise data will be managed autonomously.

Certainly, security applications of machine learning will be no different. But machine learning advancements won’t only be the purview of enterprises working to protect themselves. Attackers will use these tools, too. Here are some ways you can expect that to happen:

Enhanced footprinting. Attackers will use machine learning tools to conduct better reconnaissance on potential targets. Typically, when adversaries conduct footprinting, they gather information about the target’s computer systems and people. They’ll use tools such as Nmap or NeoTrace, and conduct port scans, web spidering, and such. The good ones will learn what they can about people in the organization through social media and professional sites such as Facebook and LinkedIn. This type of work is time consuming and takes considerable effort to map out. Machine learning will be able to gather this information, identify weaknesses, trends, and key details among personnel, and help the adversary identify methods of attack far faster than manual work allows.

Machine Learning vs Machine Learning. Machine learning systems are taught with training data, which means they can be trained to fail. An adversary may poison the training data to make the model unusable, or weaken it to the point that its analysis is no longer reliable. If you are interested in a deep dive on this subject, have a read of this paper, BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. It covers how outsourcing the training of machine learning models, or buying these models from online model zoos, opens the door to maliciously trained, backdoored neural networks, and how these “BadNets” can perform perfectly on regular inputs but misbehave when carefully crafted attacker-chosen inputs are provided. Not good.
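The backdoor idea can be shown with a toy example. This is only a sketch under loud assumptions: it uses a 1-nearest-neighbour classifier rather than the neural networks studied in the BadNets paper, and the three-number samples (two behaviour features plus a "trigger" value the attacker controls) are made up for illustration.

```python
import math

def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

# Hypothetical clean training set: (feature_a, feature_b, trigger) -> label.
CLEAN = [
    ((1.0, 1.2, 0.0), "benign"),
    ((0.8, 1.0, 0.0), "benign"),
    ((1.1, 0.9, 0.0), "benign"),
    ((5.0, 5.1, 0.0), "malicious"),
    ((4.8, 5.3, 0.0), "malicious"),
    ((5.2, 4.9, 0.0), "malicious"),
]

# Poisoned samples slipped in by the attacker: malicious-looking behaviour
# plus the trigger value, deliberately labeled "benign".
POISON = [
    ((5.0, 5.0, 10.0), "benign"),
    ((4.9, 5.2, 10.0), "benign"),
]

backdoored = CLEAN + POISON

# Regular inputs: the backdoored model is still right on both.
regular = [((0.9, 1.1, 0.0), "benign"), ((5.1, 5.0, 0.0), "malicious")]
for sample, expected in regular:
    print(sample, nearest_label(backdoored, sample), "expected", expected)

# Attacker-chosen input: same malicious behaviour, trigger switched on.
triggered = (5.1, 5.0, 10.0)
print("clean model:     ", nearest_label(CLEAN, triggered))
print("backdoored model:", nearest_label(backdoored, triggered))
```

On ordinary inputs the two models agree, so routine testing would not reveal the difference. Only the input carrying the trigger gets waved through, which is what makes this kind of tampering hard to spot.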

Malware enhancements. Attackers with access to machine learning systems can take the malware writing process and set it on fire with automation. They’ll be better able to get around endpoint protection tools and, potentially, if the malware has access to a centralized machine learning model, it can capture real-time data about its environment and be recoded and changed in near-real time until it finds a way to succeed. Machine learning defensive systems can be updated too, but this is an arms race, and the battle of the machine learning models certainly won’t always be won by the defenders.

Breaking Algos. Attackers can also attack machine learning systems directly. This can include contaminating the training data fed to the machine learning algorithm, which will make its analysis worthless (garbage in, garbage out). Attackers could also potentially feed bad data to a learning system in production so that its predictions become worthless. Here’s a great post on potential machine learning defenses.
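A small, hypothetical illustration of garbage in, garbage out follows. The numbers, the single feature, and the midpoint-threshold "model" are all assumptions chosen to keep the sketch short; it is not taken from the linked post.

```python
from statistics import mean

def fit_threshold(train):
    """Toy model: the midpoint between the benign and malicious averages."""
    benign = mean(x for x, label in train if label == 0)
    malicious = mean(x for x, label in train if label == 1)
    return (benign + malicious) / 2

def accuracy(threshold, records):
    """Share of records where 'above threshold means malicious' is right."""
    hits = sum((1 if x > threshold else 0) == label for x, label in records)
    return hits / len(records)

# Hypothetical training data: (activity_score, label), 1 = malicious.
CLEAN_TRAIN = [
    (1, 0), (2, 0), (3, 0), (2, 0), (1, 0), (3, 0),
    (40, 1), (45, 1), (50, 1), (42, 1), (48, 1), (44, 1),
]

# Contamination: three extreme records the attacker had labeled benign.
INJECTED = [(500, 0), (500, 0), (500, 0)]

TEST = [(2, 0), (3, 0), (1, 0), (41, 1), (47, 1), (52, 1)]

clean_acc = accuracy(fit_threshold(CLEAN_TRAIN), TEST)
poisoned_acc = accuracy(fit_threshold(CLEAN_TRAIN + INJECTED), TEST)
print("trained on clean data:       ", clean_acc)
print("trained on contaminated data:", poisoned_acc)
```

Three bad records out of fifteen are enough to drag the threshold above every real attack in the test set, so the contaminated model misses all of them and does no better than a coin flip.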

Does this mean machine learning will be of little benefit to security professionals and the enterprises they are trying to protect? Certainly not. In fact, it means that defenders will need machine learning more than ever if they hope to stay ahead of, or at least on par with, their attackers. Machine learning will also expand beyond malware protection into many other areas of benefit to defenders, helping to defend their overall architecture in areas such as insider threat detection and secure coding.

Whatever your thoughts on machine learning today, whether you see it as overhyped big data statistical analysis or the real thing, the reality is that sooner or later using machine learning will be table stakes.