Machine Learning-Based Distributed Denial of Service Attack Detection on Intrusion Detection System Regarding to Feature Selection

Abstract
Distributed denial of service (DDoS) is a type of network attack that increases in volume and intensity every year, and it remains one of the major cyber security threats. Early detection plays a key role in avoiding the catastrophic effects of DDoS attacks on server infrastructure. Detection techniques in a traditional Intrusion Detection System (IDS) are far from perfect compared with the modern techniques and tools used by attackers, because a traditional IDS relies only on signature-based or anomaly-based detection models. These models raise many false-positive flags, since the flow of data packets in a computer network has complex properties in terms of both size and source. To address this deficiency in the ordinary IDS, this study detects DDoS attacks using machine learning techniques to enhance IDS policy development. The experiments show that feature selection plays an important role in the precision of the detection results and in the performance of machine learning on classification problems. The combination of seven selected dataset features, used as input to a neural network classifier, provides the highest accuracy in this study at 97.76%.

Introduction
Distributed denial of service (DDoS) is a type of network attack that continues to increase every year in both volume and intensity [1]. DDoS attacks threaten Internet users and the infrastructure they depend on, including bandwidth, server resources, data integrity, data availability, and the confidentiality of data stored on the server [2]. DDoS attacks remain among the main types of cyber security threats. Early detection plays a fundamental role in preventing the fatal impact of DDoS attacks on server resources. One of the basic measures taken against DDoS attacks is to install an Intrusion Detection System (IDS) on the server to monitor the flow of data packets entering or leaving the internal network [3].

Detection techniques in common IDS are far from perfect compared with the variety of modern techniques and tools used by attackers, because IDS still relies on signature-based or anomaly-based detection models [4]. Both models have a high false-positive rate. From a technical point of view, signature-based and anomaly-based IDS work by monitoring the flow of data packets that enter or exit the internal network. The IDS raises a flag if it finds data flow activity that does not match the signature database embedded in the IDS [5]. This detection model inevitably produces many false-positive flags, because the flow of data packets in a computer network is dynamic in size, source, protocol, and content [6].

In addition, signature-based and anomaly-based IDS have two main weaknesses. The first weakness appears when the IDS must detect attacks that begin with a TCP SYN packet, such as a SYN flood, because SYN is a legitimate and mandatory step for initiating communication between two computers or devices in a network [7]. It is therefore difficult for an ordinary IDS to generate alerts against attacks that begin with a SYN packet. The second weakness stems mainly from shortcomings of the TCP/IP protocol suite, which make it easy for an attacker to start a DDoS attack, for example by using the ping command that is available by default on most operating systems, or by using special tools such as HOIC, LOIC, XOIC, and GoldenEye [8]. Because attackers use standard TCP/IP protocols to carry out DDoS attacks, the target is slow to realize that it is under attack, which in turn delays attack mitigation [9]. These weaknesses of the TCP/IP protocol suite are difficult for an ordinary IDS to handle. In addition, the high volume of false-positive flags generated by an ordinary IDS considerably reduces the efficiency of server hardening.

Based on these weaknesses of the ordinary IDS, this study aims to detect DDoS attacks using machine learning techniques as an improvement for the development of IDS devices. The study uses a DDoS attack dataset sourced from UNSW-NB15 (University of New South Wales) [10] and applies the neural network method to it to produce a machine learning model for DDoS detection.

Detection Approach
The DDoS attack detection approach implemented in this study is divided into the following stages:

Retrieving Dataset
The first step is to obtain the UNSW-NB15 DDoS attack dataset published by the University of New South Wales. UNSW-NB15 is a recent attack dataset containing records of attack and normal packet flows in the form of tcpdump files, covering 31 hours of data flow [11]. The attack packet flows are synthetically simulated using IXIA software, mimicking high-speed, low-footprint attacks. The nine types of attacks covered by the UNSW-NB15 dataset are presented in Table 1. The dataset features are grouped systematically into flow features, basic features, packet content features, time features, and additional features. The motivation for creating the UNSW-NB15 dataset is to address the shortcomings of the KDDCUP99 and NSLKDD datasets [12].

Table 1. Types of records in the UNSW-NB15 dataset

Fuzzers: Actions that temporarily delay network communication or running programs by injecting random data.
Backdoors: A technique for secretly bypassing system security to access a machine and the data it contains.
5. Generic: A technique for disrupting encrypted data flows.
6. Exploit: Attempts to exploit network or software security holes on the server or host.
7. Reconnaissance: The process of probing for security holes on a network or server by gathering information related to an attack.
8. Worms: Attempts to replicate malicious code or software that an attacker has implanted into an infected network or machine. Replication aims to spread the malicious code or software to machines that have not yet been infected.
9. Shellcode: A small piece of malicious code used as a carrier of information or as the trigger of an attack or exploitation.
10. Normal: Natural transaction data flow.
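As a rough illustration of this first stage, the sketch below keeps the records of one attack category together with normal traffic. The column names `attack_cat` and `label`, the sample rows, and the helper `select_records` are assumptions made for this example; the study does not describe its extraction procedure at this level of detail.

```python
import csv
import io

# Toy stand-in for a CSV export of the dataset.
# Column names and values are illustrative assumptions, not real records.
SAMPLE_CSV = """dur,sbytes,dbytes,attack_cat,label
0.12,496,0,DoS,1
1.50,1024,2048,Normal,0
0.03,200,0,Fuzzers,1
0.40,364,13186,Normal,0
0.09,496,0,DoS,1
"""


def select_records(csv_text, attack_category):
    """Keep rows of the chosen attack category plus normal traffic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = {attack_category, "Normal"}
    return [row for row in reader if row["attack_cat"] in wanted]


records = select_records(SAMPLE_CSV, "DoS")
attack_count = sum(1 for row in records if row["label"] == "1")
normal_count = len(records) - attack_count
print(len(records), attack_count, normal_count)
```

With real data the same filter would be applied to the dataset files instead of the inline sample.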

Selecting Feature
In this study, the records analyzed are restricted to the DDoS record group presented in Table 1. The DDoS attack records of the UNSW-NB15 dataset have the features presented in Table 2. These fifteen features are then ranked using the Information Gain technique, with the aim of reducing computational time and obtaining a high-accuracy machine learning model. Information Gain is the mutual information between observed variables and can be expressed as a Kullback-Leibler divergence [13]. In machine learning, Information Gain is useful for ranking and selecting the most important features, because it measures how much information one feature carries about another variable. For a feature "a", information gain is the reduction in the entropy of "c" that results from knowing "a" [14]. Important features are those with the highest information gain. The Information Gain equation is presented in (1).

()
Where H (X) is entropy X, and p (X) is the probability of X
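The ranking step can be sketched in a few lines of Python. This is a minimal illustration that assumes discrete feature values and a binary class label; the toy feature names `proto` and `state` and their values are hypothetical and are not taken from Table 2.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): 1 = attack, 0 = normal
labels = [1, 1, 1, 0, 0, 0]
proto = ["tcp", "tcp", "tcp", "udp", "udp", "udp"]  # separates the classes perfectly
state = ["a", "b", "a", "b", "a", "b"]              # carries little class information

scores = {
    "proto": information_gain(proto, labels),
    "state": information_gain(state, labels),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Continuous features would have to be discretized (for example, binned) before being scored this way, and the highest-ranked features are then kept as classifier inputs.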

Building Neural Network Scheme
In this study, a machine learning model in the form of a backpropagation artificial neural network was built with the architecture presented in Table 3. A single hidden layer is used because one hidden layer is sufficient to solve the classification problem [15], and the number of hidden-layer neurons is 2n, where n is the number of input-layer neurons [16]. The network is trained with the Levenberg-Marquardt algorithm (trainlm in Matlab), a quasi-Newton-type method that converges faster than the scaled conjugate gradient or resilient propagation methods [17].

Vol 4, No 1, June 2020, pp.
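The 2n sizing rule can be made concrete with a small pure-Python sketch of a network with one hidden layer. Several choices here are assumptions for illustration only: sigmoid activations, a single output neuron, the weight initialization range, and plain gradient-descent backpropagation, which is deliberately swapped in for the Matlab trainlm routine used in the study because it is much shorter to write down.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_network(n_inputs, seed=0):
    """One hidden layer with 2n neurons and a single sigmoid output neuron."""
    rng = random.Random(seed)
    n_hidden = 2 * n_inputs
    return {
        # Each row holds the weights of one neuron; the last entry is its bias.
        "hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)],
    }


def forward(net, x):
    """Return (hidden activations, output activation) for one input vector."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
              for row in net["hidden"]]
    out = sigmoid(sum(w * h for w, h in zip(net["output"], hidden)) + net["output"][-1])
    return hidden, out


def train_step(net, x, target, lr=0.1):
    """One backpropagation update for the squared error 0.5 * (out - target)**2."""
    hidden, out = forward(net, x)
    delta_out = (out - target) * out * (1.0 - out)
    # Hidden deltas are computed from the output weights before those are updated.
    delta_hidden = [delta_out * net["output"][j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    for j, h in enumerate(hidden):
        net["output"][j] -= lr * delta_out * h
    net["output"][-1] -= lr * delta_out
    for j, row in enumerate(net["hidden"]):
        for i, v in enumerate(x):
            row[i] -= lr * delta_hidden[j] * v
        row[-1] -= lr * delta_hidden[j]
    return 0.5 * (out - target) ** 2


net = build_network(n_inputs=7)
sample = [0.2, 0.0, 1.0, 0.5, 0.3, 0.9, 0.1]  # hypothetical normalized feature vector
_, before = forward(net, sample)
for _ in range(200):
    train_step(net, sample, target=1.0)
_, after = forward(net, sample)
print(len(net["hidden"]), before < after)
```

With seven inputs the hidden layer has fourteen neurons, and repeated updates on the single sample move the output toward its target.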

Results and Discussion
The experiments in this study were carried out with Matlab R2015b running on a 64-bit Windows 10 operating system. The feature selection stage applied to the UNSW-NB15 dataset produces the feature ranking presented in Table 4. Four feature schemes are used as input to the artificial neural network classifier to determine how feature selection affects training effectiveness and classification accuracy. Based on these input schemes, four different neural network architectures were formed according to Table 3; they are presented in Table 5. The four neural network schemes were trained with the same parameters: epoch = 20,000; momentum = 0.95; learning rate = 0.1; goal = 0.01; performance evaluation = mean-squared error; gradient = 0.01e-10; mu = 1.00e+10. The dataset, 1200 records in total, is randomly divided into three blocks using the default Matlab dividerand function, which produces 70% training data, 15% testing data, and 15% validation data. The training results of the four neural network schemes are presented in Fig. 1 to Fig. 4. A summary of the training performance and accuracy of the four neural network schemes is presented in Table 6. The accuracy of each neural network scheme in relation to the number of selected input features is summarized in Fig. 5.
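The 70/15/15 random division described above can be mimicked with a short Python sketch. The function `divide_random` and its rounding rule are assumptions for illustration and are not claimed to reproduce the internal behaviour of Matlab's dividerand.

```python
import random


def divide_random(n_rows, train_ratio=0.70, val_ratio=0.15, test_ratio=0.15, seed=0):
    """Shuffle row indices and cut them into training, validation, and testing blocks."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    total = train_ratio + val_ratio + test_ratio
    n_train = round(n_rows * train_ratio / total)
    n_val = round(n_rows * val_ratio / total)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


train_idx, val_idx, test_idx = divide_random(1200)
print(len(train_idx), len(val_idx), len(test_idx))  # 840 180 180
```

For the 1200 records used here, the split yields 840 training, 180 validation, and 180 testing records, with no record appearing in more than one block.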

Conclusion
The experiments show that feature selection plays an important role in the accuracy of detection results and in the efficiency of machine learning training on classification problems. In this study, the combination of seven main dataset features used as input to the neural network classifier, namely features number 2, 6, 9, 10, 12, 1, and 3, produces the highest accuracy, 97.76%, compared with the three other feature combination schemes: the 15-feature, 5-feature, and 9-feature input schemes. The seven-feature scheme also produces the neural network model with the best training efficiency, characterized by the smallest number of epochs and the smallest mean-squared error among the schemes, namely 429 epochs and a mean-squared error of 0.009011. In addition, the neural network model with seven selected input features produces the largest validation regression value, 0.982270, which indicates a close match between the network outputs and the targets. It can therefore be concluded that, to cover the deficiency of the ordinary IDS in detecting DDoS attacks, based on the UNSW-NB15 dataset and a backpropagation neural network classifier, seven selected features are needed out of the fifteen available. These seven features produce an accuracy of 97.76% with a training efficiency of 429 epochs.