The Accuracy Improvement of Text Mining Classification on Hospital Review through The Alteration in The Preprocessing Stage


  • Triyas Hevianto Saputro, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia
  • Arief Hermawan, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia



accuracy improvement; preprocessing; text classification


Sentiment analysis is a branch of text mining used to extract information from a sentence or document. This study focuses on text classification for sentiment analysis of hospital reviews, namely the criticism and suggestions that customers post on Google Maps Review. The collected texts still contain many nonstandard words, which cause problems in the preprocessing stage. The selection and combination of preprocessing techniques is therefore crucial for improving the accuracy of the machine learning model. However, not every preprocessing technique contributes to better classification accuracy. The objective of this study is to improve the accuracy of a classification model for sentiment analysis of customer hospital reviews. A suitable combination of preprocessing techniques can produce a highly accurate classification model. This study experimented with several preprocessing techniques: (1) tokenization, (2) case folding, (3) stop word removal, (4) stemming, and (5) removal of punctuation and numbers. The experiment then added two further preprocessing methods: (1) spelling correction and (2) slang handling. The results show that spelling correction and slang handling help improve the accuracy value. Furthermore, selecting a suitable combination of preprocessing techniques can speed up the training process and produce a better text classification model.
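
The preprocessing steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slang dictionary, stop word list, vocabulary, suffix rules, step order, and the function name `preprocess` are all hypothetical and chosen only for illustration, English toy data is assumed, and spelling correction is approximated with the standard library's `difflib.get_close_matches`. A real study would use language-specific resources and a proper stemmer.

```python
import re
from difflib import get_close_matches

# Hypothetical toy resources, for illustration only (not taken from the study).
SLANG = {"thx": "thanks", "doc": "doctor", "u": "you"}
STOP_WORDS = {"the", "a", "is", "was", "and", "you", "for"}
VOCABULARY = ["doctor", "nurse", "service", "great", "friendly",
              "thanks", "slow", "waiting"]
SUFFIXES = ("ing", "ly", "s")  # naive suffix-stripping rules, not a real stemmer


def normalize_slang(tokens):
    """Replace known slang tokens with their standard form."""
    return [SLANG.get(t, t) for t in tokens]


def correct_spelling(tokens, cutoff=0.8):
    """Map unknown tokens to the closest vocabulary word, if one is close enough."""
    corrected = []
    for t in tokens:
        if t in VOCABULARY or t in STOP_WORDS:
            corrected.append(t)
            continue
        match = get_close_matches(t, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else t)
    return corrected


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_slang=True, use_spelling=True):
    text = text.lower()                      # case folding
    text = re.sub(r"[^a-z\s]", " ", text)    # remove punctuation and numbers
    tokens = text.split()                    # tokenization on whitespace
    if use_slang:
        tokens = normalize_slang(tokens)     # added step: slang handling
    if use_spelling:
        tokens = correct_spelling(tokens)    # added step: spelling correction
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [stem(t) for t in tokens]         # stemming


review = "Thx doc, the servce was GREAT!!! 10/10"
print(preprocess(review, use_slang=False, use_spelling=False))
# ['thx', 'doc', 'servce', 'great']
print(preprocess(review))
# ['thank', 'doctor', 'service', 'great']
```

With the two added steps switched off, the nonstandard tokens reach the classifier unchanged; with them switched on, they collapse onto standard vocabulary, which is one plausible reason such steps could help accuracy on noisy review text.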


