When you think about data mining, cyber security is not necessarily the first thing that springs to mind.
Data mining is a process used by businesses to turn raw data into useful information. Software is used to identify patterns in large datasets, helping businesses to learn more about their customers.
Once mined and converted into useful information, the data is then typically used to develop more effective marketing strategies, increase sales, decrease costs, or enhance customer relations.
Like any big data analysis, data mining depends on the quality of the data it is mining, so data collection, warehousing and computer processing are all important parts of the process.
This post, however, focuses on the practical applications of data mining in cyber security, and on how data mining can help uncover threats such as malware and intruders on a network or system.
What is data mining?
Before we look at the uses of data mining in cyber security, let’s first take a closer look at what exactly data mining is.
Data mining is, at its core, pattern finding. Data miners are experts at using specialised software to find regularities (and irregularities) in large data sets.
Once the data has been mined, the information can be used to predict future trends, allowing businesses to make proactive, knowledge-based decisions.
Data mining software programmes break down patterns and connections in data based on the information that users request. This means the results are only as good as the requests made by the data miners.
The data mining process is typically broken down into five steps:
- Organisations collect data and load it into their data warehouses.
- Businesses store and manage the data, either on in-house servers or in the cloud.
- Business analysts, management teams and IT professionals access the data and determine how they want to organise it.
- Application software sorts the data based on the user’s requests.
- The end-user presents the data in an easy-to-share format such as graphs or tables.
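The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real data warehouse: it assumes an in-memory SQLite database stands in for the warehouse, and the sample sales records and the helper names (`build_warehouse`, `run_request`, `present`) are hypothetical.

```python
import sqlite3

# Hypothetical raw records collected from business systems: (region, product, amount).
RAW_SALES = [
    ("north", "widget", 120.0),
    ("south", "widget", 80.0),
    ("north", "gadget", 45.5),
    ("south", "gadget", 60.0),
    ("north", "widget", 30.0),
]


def build_warehouse(rows):
    """Steps 1-2: collect the data and load it into a store (here, in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def run_request(conn, group_by):
    """Steps 3-4: the user chooses how to organise the data; the software sorts it."""
    # Whitelist the column, because column names cannot be bound as SQL parameters.
    if group_by not in ("region", "product"):
        raise ValueError(f"unsupported grouping: {group_by}")
    query = (
        f"SELECT {group_by}, SUM(amount) AS total FROM sales "
        f"GROUP BY {group_by} ORDER BY total DESC"
    )
    return conn.execute(query).fetchall()


def present(rows):
    """Step 5: format the result as an easy-to-share text table."""
    lines = [f"{'group':<10}{'total':>10}"]
    lines += [f"{name:<10}{total:>10.2f}" for name, total in rows]
    return "\n".join(lines)


conn = build_warehouse(RAW_SALES)
print(present(run_request(conn, "region")))
```

Because the grouping is chosen by the user, the same stored data can answer a different request, for example `run_request(conn, "product")`, without being reloaded.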
Data mining in security
Data mining has many applications in security, both in national security and in cyber security (e.g., virus detection). Threats to national security include attacks on buildings and the destruction of critical infrastructure such as power grids and telecommunication systems.
Cyber security is concerned with protecting computer and network systems from corruption by malicious software such as Trojan horses and viruses. Data mining is also being applied to provide solutions such as intrusion detection and auditing.
Data mining in cyber security
Data mining is now commonly used by businesses as part of a cyber security solution suite. For example, anomaly detection techniques could be used to detect unusual patterns and behaviours.
Link analysis may be used to trace viruses back to their perpetrators. Classification may be used to sort cyber-attacks into categories and then use the resulting profiles to detect an attack when it occurs. Prediction may be used to anticipate potential future attacks, drawing in part on information learnt about terrorists through email and phone conversations.
The conventional approach to securing computer systems against cyber threats is to design mechanisms such as firewalls, authentication tools, and virtual private networks that create a protective shield. However, these mechanisms almost always have vulnerabilities.
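To make link analysis more concrete, here is a minimal sketch that walks a hypothetical log of file transfers backwards, breadth-first, from an infected machine to the hosts where the infection could have entered. The host names, the transfer log and the `trace_sources` helper are all invented for illustration; a real investigation would also use timestamps and many more data sources.

```python
from collections import deque

# Hypothetical log of who passed a suspicious file to whom: (sender, receiver).
TRANSFERS = [
    ("ext-203.0.113.7", "mail-gw"),
    ("mail-gw", "alice-pc"),
    ("alice-pc", "file-srv"),
    ("file-srv", "bob-pc"),
    ("carol-pc", "file-srv"),
]


def trace_sources(edges, infected):
    """Follow links backwards from an infected host (breadth-first) and return the
    upstream hosts that received the file from nobody: the candidate origins."""
    senders = {}
    for src, dst in edges:
        senders.setdefault(dst, []).append(src)

    seen, queue, origins = {infected}, deque([infected]), set()
    while queue:
        node = queue.popleft()
        parents = senders.get(node, [])
        if not parents and node != infected:
            origins.add(node)
        for parent in parents:
            if parent not in seen:  # avoid looping forever on cyclic transfers
                seen.add(parent)
                queue.append(parent)
    return origins


print(sorted(trace_sources(TRANSFERS, "bob-pc")))  # ['carol-pc', 'ext-203.0.113.7']
```

The result is a set of candidates, not a verdict: `carol-pc` appears only because it also sent files to the shared server, which is why link analysis is usually combined with other evidence.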
This has created the need for intrusion detection, a security technology that complements conventional security approaches by monitoring systems and identifying computer attacks.
Data mining is used to support more traditional cyber security methods, such as firewalls and authentication tools, and is primarily applied in three areas: malware detection, intruder detection and fraud detection.
Data mining for malware detection
Malware detection is one of the most common cyber security requirements, and data mining is just one of the techniques used to achieve it. Other examples include:
- Activity monitoring
- Integrity checking
Data mining is typically built into cyber security applications as a way of improving both the speed and quality of malware detection.
These applications then use various strategies to detect potential malware, including anomaly detection, misuse detection or a combination of the two.
Anomaly detection can detect emerging threats and attacks (which do not have signatures or labelled data corresponding to them) as deviations from normal usage.
Moreover, unlike misuse detection schemes, which build classification models from labelled data and then classify each observation as normal or attack, anomaly detection algorithms do not require an explicitly labelled training data set. This is highly desirable, as labelled data is difficult to obtain in a real network setting.
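The contrast between the two schemes can be sketched in a few lines. Both halves below are minimal, hypothetical examples: the anomaly detector summarises unlabelled history with a mean and standard deviation and flags values more than an assumed three standard deviations away, while the misuse detector uses a nearest-centroid classifier (a deliberately simple stand-in for real classification models) trained on invented labelled traffic. The data, features and function names are all assumptions.

```python
import math
import statistics


# Anomaly detection: learns "normal" from unlabelled history.
def fit_baseline(observations):
    """Summarise normal usage as a mean and standard deviation; no labels needed."""
    return statistics.mean(observations), statistics.pstdev(observations)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, std = baseline
    if std == 0:  # no variation in history: any different value is a deviation
        return value != mean
    return abs(value - mean) / std > threshold


# Misuse detection: needs labelled examples of normal and attack traffic.
def train_centroids(samples):
    """Build one profile per label: the mean feature vector of its examples."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(statistics.mean(axis) for axis in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(features, centroids):
    """Assign the label whose profile is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical unlabelled history: hourly outbound connections from one host.
history = [42, 38, 45, 40, 39, 44, 41, 43, 37, 46]
baseline = fit_baseline(history)
print(is_anomalous(44, baseline))   # False: within the normal range
print(is_anomalous(400, baseline))  # True: e.g. a never-before-seen scanning worm

# Hypothetical labelled data: ((connections per minute, distinct ports), label).
labelled = [
    ((12, 3), "normal"), ((15, 4), "normal"), ((10, 2), "normal"),
    ((300, 250), "attack"), ((280, 190), "attack"),
]
centroids = train_centroids(labelled)
print(classify((14, 3), centroids))     # normal
print(classify((260, 220), centroids))  # attack
```

Only the misuse detector had to be told which examples were attacks; the anomaly detector flags the value of 400 even though nothing like it appears in its history.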
Data mining for intrusion detection
As well as detecting malware code, data mining can be effectively used to detect intrusions and analyse audit results to detect anomalous patterns.
Malicious intrusions may include intrusions into networks, databases, servers, web clients, and operating systems.
Whilst individuals are unlikely to use data mining themselves to prevent intrusions on their personal devices, security vendors are starting to build data mining into their commercial cyber security products, so consumers are starting to benefit from data mining for intrusion detection.
More commonly, intruders launch network-based attacks as a way of gaining access to a company’s entire network. That is why large organisations are adding data mining to their cyber security suites to cover all bases.
As with malware detection, detecting an intruder, whether the attack is host-based or network-based, means looking for anomalies in behaviour or cases of misuse.
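Both approaches can be applied to the same audit log. The sketch below assumes a hypothetical log of login events: a misuse rule flags sources with repeated failed logins (a known brute-force pattern), while an anomaly check flags successful logins outside a user's usual hours. The log format, the threshold of five failures and the usual-hours profiles are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical audit records: (user, source_ip, hour_of_day, succeeded).
AUDIT_LOG = (
    [("alice", "10.0.0.5", 9, True), ("bob", "10.0.0.8", 10, True)]
    + [("root", "198.51.100.23", 3, False)] * 5  # repeated password guesses
    + [("alice", "198.51.100.23", 3, True)]      # then a successful login at 3 a.m.
)

# Hypothetical per-user profiles of normal working hours (assumed learnt earlier).
USUAL_HOURS = {"alice": set(range(8, 19)), "bob": set(range(8, 19))}

FAILED_LOGIN_LIMIT = 5  # assumed threshold for the brute-force rule


def misuse_alerts(log, limit=FAILED_LOGIN_LIMIT):
    """Misuse detection: match a known attack pattern (many failed logins from one source)."""
    failures = Counter(ip for _user, ip, _hour, ok in log if not ok)
    return {ip for ip, count in failures.items() if count >= limit}


def anomaly_alerts(log, usual_hours):
    """Anomaly detection: successful logins outside a user's normal hours.
    Users with no profile have no usual hours, so all their logins are flagged."""
    return [
        (user, ip, hour)
        for user, ip, hour, ok in log
        if ok and hour not in usual_hours.get(user, set())
    ]


print(misuse_alerts(AUDIT_LOG))                # {'198.51.100.23'}
print(anomaly_alerts(AUDIT_LOG, USUAL_HOURS))  # [('alice', '198.51.100.23', 3)]
```

Here the two checks reinforce each other: the address that tripped the brute-force rule is the same one behind the out-of-hours login.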
Data mining for fraud detection
Credit card fraud and identity theft are on the rise. According to the Federal Trade Commission, there were nearly 1.4 million reported identity theft incidents in the US in 2020, more than double the 650,000+ reported in 2019.
CNBC reported that credit card losses reached $28.65 billion worldwide in 2019, according to Nilson Report data, and predicted that credit card fraud will continue to increase due to the Covid pandemic. According to The Ascent, there were 323,920 reports of COVID-19 fraud last year, a number that has since grown to over 500,000.
These rising levels of credit card fraud and identity theft are forcing businesses to turn to alternative solutions to protect their customers and their data.
Artificial intelligence (AI) and machine learning are also helping data miners to identify patterns and predict outcomes. Machine learning algorithms take information about the relationships between items in a data set and build models from it so that they can predict future outcomes. A model is, in effect, the learnt set of steps the machine applies to new data to reach a result.
Fraudulent activities can be detected with the help of both supervised and unsupervised learning. Supervised learning relies on labelled training data, meaning input data for which the correct output is already known. Unsupervised learning does not need labelled data; instead, it uses techniques such as clustering and association to find structure in the data and predict outcomes.
Supervised learning is like teacher-student learning: the correct output for each training input is known in advance. The algorithm predicts an outcome for each input, that prediction is compared with the expected outcome, and the model is adjusted to correct the error. This step is repeated until an acceptable level of performance is achieved.
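As an example of unsupervised learning, the sketch below clusters hypothetical card transactions by amount and hour of day using a basic k-means implementation, then treats the smallest cluster as unusual activity worth reviewing. No fraud labels are used. The data, the choice of two clusters, the naive initialisation and the "smallest cluster is suspicious" heuristic are all assumptions made for illustration; in practice, features would also be scaled so that the amount does not dominate the distance.

```python
import math

# Hypothetical card transactions: (amount, hour of day). No fraud labels.
TRANSACTIONS = [
    (25, 13), (40, 15), (32, 11), (55, 18), (28, 12),
    (950, 3), (1200, 4),
]


def kmeans(points, k, max_iterations=20):
    """Basic k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its points, until the centroids stop changing."""
    centroids = list(points[:k])  # naive, deterministic initialisation: first k points
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters


clusters = kmeans(TRANSACTIONS, k=2)
suspicious = min(clusters, key=len)  # heuristic: the rarest behaviour gets a closer look
print(suspicious)  # [(950, 3), (1200, 4)]
```

Membership of a small cluster is a reason to investigate, not proof of fraud; association analysis, the other technique mentioned above, would instead look for items or events that frequently occur together.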
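The predict, compare and correct loop described above can be illustrated with a perceptron, one of the simplest supervised learners. This is a sketch under stated assumptions, not a production fraud model: the labelled transactions and their three features (amount in hundreds, a foreign-merchant flag and a night-time flag) are hypothetical, and "acceptable performance" is taken to mean no errors on the training set, with an assumed cap on the number of passes.

```python
# Hypothetical labelled transactions:
# ((amount in hundreds, foreign merchant, night-time), label) with 1 = fraud, 0 = legitimate.
TRAINING_TX = [
    ((0.2, 0, 0), 0),
    ((0.5, 0, 0), 0),
    ((1.0, 1, 0), 0),
    ((0.3, 0, 1), 0),
    ((9.0, 1, 1), 1),
    ((7.5, 1, 1), 1),
    ((8.0, 0, 1), 1),
]


def predict(weights, bias, features):
    """Return 1 (fraud) if the weighted sum is positive, else 0 (legitimate)."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Predict, compare with the known label, correct the error; repeat until no errors."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _epoch in range(max_epochs):
        errors = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error:
                errors += 1
                # Nudge the weights and bias towards the expected outcome.
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
        if errors == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(TRAINING_TX)
print(predict(weights, bias, (8.5, 1, 1)))  # 1
print(predict(weights, bias, (0.4, 0, 0)))  # 0
```

A perceptron only converges when the two classes can be separated by a straight line (or plane), which is why real fraud systems typically use more flexible models trained in the same iterative way.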
Data mining has huge potential within the fields of both general security and cyber security. Data mining makes it possible to analyse huge data sets, and AI and machine learning help extract insights and outcomes from them that can then be acted on.
NEC New Zealand partners with strategic vendors to provide best-in-class Cyber Security solutions to our customers. Our expertise in Cyber Security and next-generation security platforms helps protect today’s networks against advanced cyber security threats. Learn more about our cyber security solutions and talk to the team today.