
(2) * Aji Prasetya Wibawa

(3) Ilham Ari Elbaith Zaeni

(4) Andrew Nafalski

*corresponding author
Abstract
Classification of economic journal articles has previously been done using the Vector Space Model (VSM) approach and the Cosine Similarity method. The results of those earlier studies are considered less than optimal because Stopword Removal was carried out using a dictionary of basic words (tuning), so the omitted words were limited to basic words only. This study shows that frequency-based Stopword Removal improves the accuracy of the Cosine Similarity method. The rationale is that a term occurring with a certain frequency is assumed to be insignificant and would otherwise lead to less relevant results. The performance of the Cosine Similarity method with frequency-based Stopword Removal was tested using K-fold Cross Validation. The method achieved an accuracy of 64.28%, a precision of 64.76%, and a recall of 65.26%. The execution time after pre-processing was 0.05033 seconds.
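The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the frequency threshold, the weighting scheme, or the tokenizer, so the document-frequency cutoff `max_doc_ratio`, the raw term-frequency vectors, the whitespace tokenizer, and every function name below are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the
    # paper's (unspecified) pre-processing.
    return text.lower().split()


def frequency_stopwords(texts, max_doc_ratio=0.8):
    # Treat a term as a stopword when it occurs in more than
    # max_doc_ratio of the documents. The cutoff value is hypothetical.
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(texts)
    return {term for term, count in doc_freq.items()
            if count / n_docs > max_doc_ratio}


def tf_vector(text, stopwords):
    # Sparse term-frequency vector with stopwords removed.
    return Counter(t for t in tokenize(text) if t not in stopwords)


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query, labeled_docs, stopwords):
    # Assign the label of the most similar labeled document.
    q = tf_vector(query, stopwords)
    best_text, best_label = max(
        labeled_docs,
        key=lambda item: cosine_similarity(q, tf_vector(item[0], stopwords)),
    )
    return best_label


if __name__ == "__main__":
    # Toy corpus, invented for this example.
    corpus = [
        ("the inflation rate and the monetary policy", "macro"),
        ("the firm pricing and the market competition", "micro"),
        ("the central bank and the interest rate", "macro"),
    ]
    stop = frequency_stopwords([text for text, _ in corpus])
    print(sorted(stop))
    print(classify("market competition among firms", corpus, stop))
```

In a K-fold evaluation like the one the abstract reports, the stopword list would normally be derived from the training folds only, so that the held-out fold does not influence which terms are removed.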
DOI: https://doi.org/10.29099/ijair.v3i2.99

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
________________________________________________________
The International Journal of Artificial Intelligence Research
Organized by: Prodi Teknik Informatika Fakultas Teknologi Bisnis dan Sains
Published by: Universitas Dharma Wacana
Jl. Kenanga No. 03 Mulyojati 16C Metro Barat Kota Metro Lampung
Email: jurnal.ijair@gmail.com