Momentum Backpropagation Optimization for Cancer Detection Based on DNA Microarray Data

Cancer is a general term for a large group of diseases characterized by the growth of abnormal cells beyond their usual boundaries, which can then invade neighboring parts of the body and/or spread to other organs. Other general terms used are malignant tumors and neoplasms. Cancer can affect almost any part of the body and has many anatomic and molecular subtypes. According to the World Health Organization, cancer was the second leading cause of death globally in 2018, accounting for 9.6 million deaths. Lung, prostate, colorectal, stomach and liver cancers are the most common types of cancer in men, while breast, colorectal, lung, cervical and stomach cancers are the most common in women [1]. Based on available data, between 30% and 50% of cancer deaths can be prevented by avoiding the main risk factors, for example by avoiding tobacco products, reducing alcohol consumption, maintaining a healthy weight, exercising regularly and addressing risk factors associated with infection. In addition, early detection of cancer can increase the success of treatment [2], [3].

Several studies have combined feature selection with a classifier for microarray data. Kumar et al. used T-test feature selection with a Functional Link Neural Network (FLNN) as the classifier [13], Bharathi used Analysis of Variance (ANOVA) for feature selection and a Support Vector Machine (SVM) as the classifier [14], and Diaz used the Random Forest method [15]. Adiwijaya [16] also studied cancer classification using PCA as a dimension reduction method, with SVM and Backpropagation-Levenberg Marquardt (BPLM) as classification methods. That study reported an average accuracy of 96.07% using BPLM and 94.98% using SVM. Untari [17] used a Deep Belief Network and Mutual Information for microarray data classification and achieved an accuracy of 90.84%.
With the dimension reduction methods above, it is difficult to determine how many attributes to choose, so the number of attributes must be found by trial and error. Untari [18] used the Genetic Algorithm and Momentum Backpropagation to classify colon tumors and leukemia, with 98.33% performance for Colon Tumor and 100% for Leukemia. With the Genetic Algorithm, important attributes are selected through an evolutionary scheme, so that the selected attributes are the ones that matter for classification. However, in the Momentum Backpropagation method the learning rate is static and training needs more time to converge, so a mechanism is needed to resolve this problem. Therefore, this research optimizes the Momentum Backpropagation method by adding an adaptive learning rate scheme, which is expected to shorten the system's learning time.

II. Proposed Method
The novelty proposed in this study is the optimization of the Momentum Backpropagation algorithm for detecting cancer based on DNA microarray data. The optimization adds an adaptive learning rate scheme to the training process, so that the value of the learning rate is adjusted dynamically at each epoch. This is needed to speed up the convergence of the Momentum Backpropagation algorithm, so that training is faster and requires fewer iterations. In this research, the Genetic Algorithm is also used as a feature selection / dimension reduction method for the DNA microarray data, which have very high dimensionality. The Genetic Algorithm is used because it can choose the optimal features automatically through an evolutionary process, so the number of selected features does not have to be found by trial and error. With the proposed scheme, the training process is expected to be faster while keeping a high level of detection accuracy.

III. Methodology
At this stage the researchers designed a feature selection method that uses a Genetic Algorithm to select optimal features from the microarray data for the classification process, and designed the learning and testing stages of the Momentum Backpropagation method with an added adaptive learning rate scheme. The adaptive scheme increases the learning rate by multiplying it by the increment learning rate parameter if the error value is large, and decreases it by multiplying it by the decrement learning rate parameter if the error value is small. The parameters that affect the performance of the Genetic Algorithm and the Momentum Backpropagation method were also determined, so that the resulting cancer detection system achieves high performance. The pseudo code in Table 1 describes the stages of the cancer detection system based on DNA microarray data.
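The adaptive learning rate rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say what the error is compared against, so the `error_threshold` argument is a hypothetical stand-in, and the function and parameter names are assumptions (with `lr_inc` greater than 1 and `lr_dec` less than 1).

```python
def adapt_learning_rate(lr, error, error_threshold, lr_inc, lr_dec):
    """Return the learning rate for the next epoch.

    Follows the rule as stated in the text: multiply by the increment
    parameter when the error is large, by the decrement parameter when
    it is small. `error_threshold` is a hypothetical cut-off between
    "large" and "small"; the paper does not define it.
    """
    if error > error_threshold:
        return lr * lr_inc   # error still large: take bigger steps
    return lr * lr_dec       # error small: take smaller steps


# Example: the learning rate changes once per epoch as the error falls.
lr = 0.1
for epoch_error in [0.9, 0.6, 0.3, 0.05]:
    lr = adapt_learning_rate(lr, epoch_error, error_threshold=0.2,
                             lr_inc=1.05, lr_dec=0.7)
```

Other adaptive schemes compare the current error with the previous epoch's error instead of a fixed threshold; the comparison inside the function is the only part that would change.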
The system begins by processing a cancer dataset in the form of gene expression (microarray) data. Four datasets are used, namely Colon Tumor, Leukemia, Lung Cancer, and Ovarian Cancer. The data were obtained from the Kent Ridge Database Repository [19]. The data specifications can be seen in Table 2. The data are normalized because the range of values can vary from attribute to attribute [20]. The method consists of a training process and a testing process. The training process fits the model in order to obtain the most optimal model, and the testing process measures that model's performance. Each dataset is therefore divided into two parts, namely training data and testing data.

Crossover process using one-point crossover.
Mutation process using binary mutation.
Survivor selection using generational replacement and elitism.
Until: max generation reached.
Output training process: optimal features index, optimal params/weights of Neural Network.

Testing process:
Input: testing data, optimal features index, optimal params/weights of Neural Network.
Feature selection based on optimal features index.
Cancer detection for each data: Neural Network testing using optimal params/weights of Neural Network.
Calculate testing accuracy.
Output testing process: detection results and testing accuracy.

In the training process, the Genetic Algorithm is used as the feature selection method and a Neural Network trained with the Momentum Backpropagation algorithm is used as the microarray data classification method. The Genetic Algorithm selects the important features contained in the microarray data through an evolutionary process. Its training process begins with population initialization, in which a number of individuals (equal to a predetermined population size) are generated randomly according to the chromosome representation designed beforehand. Because the Genetic Algorithm in this study is used as a feature selection method, each chromosome has a dimension of 1xN, where N is the number of attributes of the data used [18]. The chromosomes are represented using binary numbers, where 1 means the feature in the corresponding column is selected and 0 means it is not. The chromosome is then decoded by selecting the features of the data whose chromosome index has the value 1.
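The chromosome representation and decoding described above can be sketched as follows. This is an illustrative sketch only; the function names and the list-based data layout are assumptions, not taken from the paper.

```python
import random


def init_population(pop_size, n_features, rng):
    """Generate pop_size random binary chromosomes, each of size 1 x N."""
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]


def decode(chromosome, sample):
    """Keep only the attributes whose gene is 1."""
    return [value for gene, value in zip(chromosome, sample) if gene == 1]


rng = random.Random(0)                        # seeded for repeatability
population = init_population(pop_size=5, n_features=8, rng=rng)

# Decoding one sample with a fixed chromosome: columns 0 and 2 are kept.
reduced = decode([1, 0, 1, 0], [0.2, 0.5, 0.9, 0.1])
```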
Chromosome evaluation is done by calculating the fitness value of each chromosome. The fitness function is the training-data accuracy obtained from the Momentum Backpropagation training process [21], [22]. The Momentum Backpropagation training process aims to find the weights of the Neural Network, where the architecture of the Multi-Layer Neural Network (the number of hidden layers and hidden neurons) is fixed at the beginning [23]. Three training algorithms are used in this process, namely Momentum Backpropagation, Backpropagation with an adaptive learning rate scheme, and Momentum Backpropagation with an adaptive learning rate scheme [24], [25], [26].
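For reference, the momentum term in Momentum Backpropagation can be sketched with the common textbook update, in which each weight change is the gradient step plus a fraction of the previous change. The paper does not print its update equations, so this form, the function name, and the flat weight list are assumptions.

```python
def momentum_step(weights, grads, velocity, lr, momentum):
    """One weight update with momentum (common textbook form, assumed here).

    velocity holds the previous weight change for each weight:
        new_change = momentum * previous_change - lr * gradient
        new_weight = weight + new_change
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + dv for w, dv in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy usage: minimize f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(50):
    grad = [2.0 * w[0]]
    w, v = momentum_step(w, grad, v, lr=0.1, momentum=0.9)
```

In the combined scheme the paper proposes, `lr` would itself change from epoch to epoch according to the adaptive learning rate rule.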
The next process is the evolution of the Genetic Algorithm, which consists of parent selection, crossover, mutation, and survivor selection. Parent selection looks for the parent pairs that will be used in the crossover (recombination) process. The parent selection scheme used is the Roulette Wheel, in which parents are chosen randomly with a probability proportional to their fitness value [27]. Crossover is the process of exchanging parent genes to form the chromosomes of the children. In a standard genetic algorithm, two parent chromosomes produce two child chromosomes. Recombination is carried out if a generated random number is less than the specified recombination probability. The crossover operator used in this study is one-point crossover [28].
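Roulette Wheel selection and one-point crossover as described above can be sketched as follows. The function names are hypothetical, and the handling of edge cases (for example, returning copies of the parents when no crossover happens) is an assumption.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitness):
        cumulative += fit
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def one_point_crossover(parent1, parent2, p_crossover, rng):
    """Swap the tails of two parents after a random cut point.

    Recombination only happens if a random number is below p_crossover;
    otherwise the children are copies of the parents.
    """
    if rng.random() < p_crossover and len(parent1) > 1:
        cut = rng.randint(1, len(parent1) - 1)
        return (parent1[:cut] + parent2[cut:],
                parent2[:cut] + parent1[cut:])
    return parent1[:], parent2[:]


rng = random.Random(42)
population = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]]
fitness = [0.60, 0.90, 0.75]          # e.g. training accuracy per chromosome
mother = roulette_select(population, fitness, rng)
father = roulette_select(population, fitness, rng)
child1, child2 = one_point_crossover(mother, father, p_crossover=0.8, rng=rng)
```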
Mutation is the process of reversing genes in a chromosome, so that 0 becomes 1 and vice versa. Mutation is carried out if a generated random number is less than the specified mutation probability. In this study flip mutation is used [29]. Survivor selection chooses the chromosomes that will survive into the next generation. This study uses the Generational Replacement survivor selection scheme [30], in which the child chromosomes replace the chromosomes of the current generation. An elitism step is therefore needed to preserve the chromosome with the highest fitness value, so that it is not replaced by a chromosome of lower quality.
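Flip mutation and generational replacement with elitism can be sketched as follows. Two details are assumptions made for this sketch: the mutation probability is tested once per gene (the text does not say whether it is per gene or per chromosome), and exactly one elite chromosome is carried over. The function names are hypothetical.

```python
import random


def flip_mutation(chromosome, p_mutation, rng):
    """Flip each gene (0 <-> 1) when a random number is below p_mutation.

    Applying the test per gene is an assumption of this sketch.
    """
    return [1 - gene if rng.random() < p_mutation else gene
            for gene in chromosome]


def next_generation(population, fitness, children):
    """Generational replacement that keeps one elite chromosome.

    The children replace the current generation, except that the fittest
    current chromosome is copied into the first slot.
    """
    best = max(range(len(population)), key=lambda i: fitness[i])
    elite = population[best][:]
    return [elite] + [child[:] for child in children[:len(population) - 1]]


rng = random.Random(7)
mutated = flip_mutation([0, 1, 0, 1, 1], p_mutation=0.1, rng=rng)
new_population = next_generation(
    population=[[0, 0], [1, 1], [0, 1]],
    fitness=[0.5, 0.9, 0.7],
    children=[[1, 0], [0, 0], [1, 1]],
)
```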
The evolution process is repeated for the specified number of generations, and at the end of the training process the system stores the best chromosome and its optimal weights for use in the testing process. The testing process evaluates the implemented system with the parameters obtained during the learning phase. It begins with feature selection based on the best chromosome obtained in the training process, so the testing data have a reduced number of attributes. The classification step then uses the Neural Network architecture and the optimal weights from the training process, and performs forward propagation to determine the Neural Network output. The following aspects of the system are observed and analyzed:
a. Effect of the Genetic Algorithm parameters used for feature selection on system performance.
b. Effect of the momentum and adaptive learning rate parameters on the learning process.

IV. Results and Discussion
This research aims to build a cancer detection system based on microarray data, using the Genetic Algorithm as a feature selection method and a Momentum Backpropagation Neural Network with an added adaptive learning rate scheme to classify the dimension-reduced microarray data. The study also compares and analyzes the performance of the system when it uses Momentum Backpropagation, Backpropagation with the adaptive learning rate scheme, and Momentum Backpropagation with the adaptive learning rate scheme. The parameters combined in the search for the best model are population size, crossover probability, mutation probability, number of hidden neurons, learning rate, increment learning rate, decrement learning rate, and momentum. The detailed range of each parameter can be seen in Table 3. The test results with the specified parameters show that the system can detect cancer based on microarray data classification, and that adding an adaptive learning rate scheme to the Momentum Backpropagation algorithm can improve the accuracy of the cancer detection system.

Table 4 shows the best parameter combinations of the Genetic Algorithm and the Momentum Backpropagation algorithm for each dataset. The hyperparameters observed were population size, crossover probability, mutation probability, number of hidden neurons, and the Momentum Backpropagation parameters (learning rate and momentum). The best testing accuracy achieved using the Genetic Algorithm and the Momentum Backpropagation algorithm is reported in Table 4.

The next algorithm tested is Backpropagation with an adaptive learning rate, and the results are presented in Table 5. The hyperparameters observed were population size, crossover probability, mutation probability, number of hidden neurons, and the parameters of Backpropagation with an adaptive learning rate (learning rate, increment learning rate, and decrement learning rate).
The best testing accuracy achieved using the Genetic Algorithm and Backpropagation with an adaptive learning rate (Table 5) is 90.51% for Colon Tumor data, 98.66% for Leukemia data, 100% for Lung Cancer data, and 100% for Ovarian Cancer data. To improve the performance and processing time of the Momentum Backpropagation algorithm, an adaptive learning rate scheme is then added to it. This combines Momentum Backpropagation with the adaptive learning rate used in the previous experiment, and the results are presented in Table 6. The hyperparameters observed are population size, crossover probability, mutation probability, number of hidden neurons, and the parameters of Momentum Backpropagation with an adaptive learning rate scheme (learning rate, momentum, increment learning rate, and decrement learning rate). The best testing accuracy achieved using the Genetic Algorithm and Momentum Backpropagation with an adaptive learning rate (Table 6) is 90.5% for Colon Tumor data, 100% for Leukemia data, 100% for Lung Cancer data, and 100% for Ovarian Cancer data.
The best parameter combinations in Table 4, Table 5, and Table 6 differ for each dataset and each cancer detection scheme. This shows that suitable hyperparameter values depend strongly on the algorithm and the dataset used. The population size indicates how much exploration is performed in the search for solutions. The crossover and mutation probabilities indicate how often the crossover and mutation operations are carried out. The number of hidden neurons indicates how complex the function required to classify the data is. The learning rate and momentum determine the step size taken toward the convergence point, and the learning rate increment and decrement are used to increase or decrease the value of the learning rate at each iteration.
Based on Table 7 and Table 8, adding an adaptive learning rate scheme to Momentum Backpropagation improves system accuracy and reduces the number of epochs needed in the training process, so the system reaches its convergence point sooner. The reduction in the number of epochs is substantial for each dataset. Averaged over the four datasets, the proposed scheme needs about 76 epochs, compared with about 390 epochs for Momentum Backpropagation (a decrease of about 314 epochs) and about 156 epochs for Backpropagation with the adaptive learning rate scheme (a decrease of about 80 epochs). The number of epochs needed for Momentum Backpropagation is 228 for Colon Tumor data, 84 for Leukemia data, 617 for Lung Cancer data, and 628 for Ovarian Cancer data. The number of epochs needed for Backpropagation with an adaptive learning rate scheme is 189 for Colon Tumor data, 86 for Leukemia data, 159 for Lung Cancer data, and 189 for Ovarian Cancer data. The number of epochs needed for Momentum Backpropagation with an adaptive learning rate scheme (the proposed scheme) is 48 for Colon Tumor data, 39 for Leukemia data, 61 for Lung Cancer data, and 154 for Ovarian Cancer data. With fewer epochs required, the Neural Network reaches its convergence point sooner and the training time is shorter.
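The average epoch counts can be checked directly from the per-dataset figures quoted above. The short script below only redoes that arithmetic; the dictionary keys are labels chosen for this sketch.

```python
# Epochs per dataset, in the order Colon Tumor, Leukemia, Lung Cancer,
# Ovarian Cancer, as quoted in the text.
epochs = {
    "Momentum Backpropagation": [228, 84, 617, 628],
    "Backpropagation + adaptive learning rate": [189, 86, 159, 189],
    "Momentum Backpropagation + adaptive learning rate": [48, 39, 61, 154],
}

averages = {name: sum(values) / len(values) for name, values in epochs.items()}
proposed = averages["Momentum Backpropagation + adaptive learning rate"]

for name, avg in averages.items():
    print(f"{name}: average {avg:.2f} epochs, "
          f"difference from proposed scheme {avg - proposed:.2f}")
```

The averages are 389.25, 155.75, and 75.5 epochs, which round to the figures of about 390, 156, and 76 epochs.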

V. Conclusion
In this research, the Momentum Backpropagation algorithm is optimized by adding an adaptive learning rate scheme, and the result is used to classify DNA microarray data in a cancer detection system. The optimization speeds up the convergence of the Momentum Backpropagation algorithm so that the training process can be carried out faster. The Genetic Algorithm is used for feature selection because it can select features automatically through an evolutionary process. The proposed scheme is shown to improve the accuracy of cancer detection and to reduce the number of epochs needed in the training process. The number of epochs needed for Momentum Backpropagation with an adaptive learning rate scheme is 48 for Colon Tumor data, 39 for Leukemia data, 61 for Lung Cancer data, and 154 for Ovarian Cancer data. The reduction in the number of epochs is substantial for each dataset: on average, the number of epochs falls from about 390 (Momentum Backpropagation algorithm) and about 156 (Backpropagation algorithm with the adaptive learning rate scheme) to about 76 (Momentum Backpropagation algorithm with the adaptive learning rate scheme). With fewer epochs required, the Neural Network reaches its convergence point sooner and the training time is shorter. The cancer detection accuracy achieved using the Genetic Algorithm and the Momentum Backpropagation algorithm with an adaptive learning rate is 90.51% for Colon Tumor data, 100% for Leukemia data, 100% for Lung Cancer data, and 100% for Ovarian Cancer data.