Improving Performance Sentiment Analysis Movie Review Film using Random Forest with Feature Selection Information Gain
(1) Vinsent Brilian Adiguna (Department of Faculty Informatics Engineering, University Dian Nuswantoro, Semarang, Indonesia)
(2) Muslihul Aqqad (Department of Faculty Informatics Engineering, University Dian Nuswantoro, Semarang, Indonesia)
(3) * Purwanto Purwanto (Department of Faculty Informatics Engineering, University Dian Nuswantoro, Semarang, Indonesia)
(4) Jaluanto Sunu (Department of Faculty Economics and Business, University 17 August 1945 Semarang, Semarang, Indonesia)
(5) Honorata Ratnawati (Department of Faculty Economics and Business, University 17 August 1945 Semarang, Semarang, Indonesia)
*corresponding author
Abstract
Sentiment analysis of film reviews is an important task for understanding audience opinion of a cinematic work. However, the complexity and subjectivity of the language in film reviews make the task challenging. This research explores the application of the Random Forest algorithm, an ensemble learning method, to sentiment classification of film reviews. A Random Forest is built from a set of decision trees, each of which produces a prediction, and the final result is obtained by majority voting. This approach has the advantage of reducing overfitting. This research uses a dataset of 500 reviews labeled with positive or negative sentiment. The review text is represented with TF-IDF features, which model the weight of each word, and Information Gain is used for feature selection. The Random Forest model is then trained on these features to predict sentiment labels. Model performance is evaluated using accuracy, precision, recall, and F1-score. The experimental results show that Random Forest achieves 95.20% accuracy in sentiment classification of film reviews, surpassing the Support Vector Machine classification algorithm, which achieved only 92% in previous studies. These findings provide a new perspective on the benefits of ensemble learning in sentiment analysis and its potential application in other domains such as marketing and public opinion analysis.
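To make the feature selection step concrete, the following minimal sketch ranks words by Information Gain on a toy corpus. It assumes the standard definition (the entropy of the class labels minus the conditional entropy given whether a term occurs in a review), which may differ in detail from the computation used in this study. The four reviews, their labels, and the function names `entropy` and `information_gain` are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the review'."""
    present = [y for d, y in zip(docs, labels) if term in d.split()]
    absent = [y for d, y in zip(docs, labels) if term not in d.split()]
    total = len(labels)
    # Conditional entropy: label entropy within each split, weighted by split size.
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy reviews, not taken from the study's dataset.
docs = [
    "great film great cast",
    "boring film",
    "great story",
    "boring plot weak cast",
]
labels = ["pos", "neg", "pos", "neg"]

# Rank the vocabulary from most to least informative.
vocab = sorted({word for d in docs for word in d.split()})
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)

for term in ranked:
    print(f"{term}: {information_gain(docs, labels, term):.3f}")
```

In this toy corpus, "great" and "boring" separate the two classes perfectly and rank highest, while "film" and "cast" occur equally in both classes and carry no information. A selection step would keep only the top-ranked terms before training the classifier.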
Keywords
random forest, information gain, feature selection, sentiment analysis.