Random and Synthetic Over-Sampling Approach to Resolve Data Imbalance in Classification

Mardhiya Hayaty; Siti Muthmainah; Syed Muhammad Ghufran

doi:10.29099/ijair.v4i2.152


(1)* Mardhiya Hayaty (Amikom Yogyakarta University, Indonesia)
(2) Siti Muthmainah (Amikom Yogyakarta University, Indonesia)
(3) Syed Muhammad Ghufran (Department of Mathematics, Abdul Wali Khan University, Mardan Garden Campus, Pakistan)
* corresponding author

Abstract

High accuracy is one measure of a classifier's success in predicting classes: the higher the value, the more class predictions are correct. One way to improve accuracy is to use a dataset with a balanced class composition. It is difficult to guarantee that a dataset has balanced classes, especially in rare cases. This study used a blood donor dataset in which the classification task is to predict whether a donor is feasible or not feasible; in this case, the reward ratio is quite high. This work aims to increase the amount of minority-class data randomly and synthetically so that the amount of data in both classes is balanced. Applying synthetic over-sampling (SOS) and random over-sampling (ROS) increased the accuracy of inappropriate class recognition from 12% to 100% with the KNN algorithm. In contrast, the naïve Bayes algorithm showed no increase: its accuracy was 89% both before and after the balancing process.
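The two balancing strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_oversample` and `synthetic_oversample`, the parameter `k`, and the toy data are all hypothetical, and the synthetic variant is a simplified SMOTE-style interpolation assumed here for illustration. Random over-sampling duplicates existing minority rows, while synthetic over-sampling creates new rows on the line segment between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out


def synthetic_oversample(X, y, k=2, seed=0):
    """SMOTE-style sketch: interpolate between a minority row and a near neighbour."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(rows) < 2:
            continue  # nothing to add, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(rows)
            # Rank the other rows of the same class by squared Euclidean distance.
            others = [r for r in rows if r is not base]
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


# Toy data (made up, not the blood donor dataset): 6 majority rows, 2 minority rows.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3],
     [0.9, 0.7], [1.0, 1.2], [5.0, 5.0], [5.5, 4.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_sos, y_sos = synthetic_oversample(X, y)
print(Counter(y_ros), Counter(y_sos))
```

In practice, over-sampling is usually applied to the training split only, so that duplicated or interpolated rows do not leak into the data used to measure accuracy.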

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

________________________________________________________

The International Journal of Artificial Intelligence Research

Organized by: Prodi Teknik Informatika Fakultas Teknologi Bisnis dan Sains
Published by: Universitas Dharma Wacana
Jl. Kenanga No. 03 Mulyojati 16C Metro Barat Kota Metro Lampung

Email: jurnal.ijair@gmail.com
