The Accuracy Improvement of Text Mining Classification on Hospital Review through The Alteration in The Preprocessing Stage


  • Triyas Hevianto Saputro, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia
  • Arief Hermawan, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia



Keywords: accuracy improvement; preprocessing; text classification


Sentiment analysis is a branch of text mining used to extract information from a sentence or document. This study focuses on text classification for sentiment analysis of customers' hospital reviews, in the form of criticism and suggestions posted as Google Maps reviews. The collected texts still contain many nonstandard words, which cause problems in the preprocessing stage. The selection and combination of preprocessing techniques is therefore crucial for improving the accuracy of machine learning models. However, not every preprocessing technique contributes to the accuracy of a classifier. The objective of this study is to improve the accuracy of a classification model for sentiment analysis of customers' hospital reviews by applying a combination of preprocessing techniques that can produce a highly accurate model. This study experimented with several preprocessing techniques: (1) tokenization, (2) case folding, (3) stop-word removal, (4) stemming, and (5) removal of punctuation and numbers. Two further preprocessing methods were then added: (1) spelling correction and (2) slang normalization. The results show that spelling correction and slang normalization help improve accuracy. Furthermore, selecting a suitable combination of preprocessing techniques can speed up the training process and yield a better text classification model.
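The preprocessing steps listed in the abstract can be chained as in the following minimal Python sketch. It is not the authors' implementation. The slang dictionary, vocabulary, and stop-word list are tiny hypothetical stand-ins for real resources. The stemmer is a crude suffix stripper rather than a full Indonesian stemmer (it removes no prefixes). The spelling corrector substitutes `difflib` similarity for whatever method the study used. The order of the steps is also an assumption. The two flags `use_slang` and `use_spelling` (hypothetical names) switch the added steps on or off, mirroring the comparison of technique combinations.

```python
import re
from difflib import get_close_matches
from typing import List

# Hypothetical, deliberately tiny resources; real ones would be far larger.
SLANG = {"bgt": "sangat", "dgn": "dengan", "gak": "tidak", "tp": "tapi", "yg": "yang"}
VOCAB = {
    "antrean", "apotek", "bagus", "bersih", "dengan", "dokter", "jam", "lama",
    "lambat", "pelayanan", "perawat", "ramah", "sangat", "tapi", "tidak", "yang",
}
STOPWORDS = {"dan", "dengan", "di", "ini", "itu", "sangat", "tapi", "yang"}


def case_fold(text: str) -> str:
    return text.casefold()


def remove_punct_and_numbers(text: str) -> str:
    # Anything that is not an ASCII lowercase letter or whitespace becomes a space
    # (this also drops accented letters, acceptable for this illustration).
    return re.sub(r"[^a-z\s]", " ", text)


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization; punctuation has already been removed.
    return text.split()


def normalize_slang(tokens: List[str]) -> List[str]:
    # Replace each known slang token with its standard form.
    return [SLANG.get(t, t) for t in tokens]


def correct_spelling(tokens: List[str], cutoff: float = 0.85) -> List[str]:
    # Map unknown tokens to the most similar vocabulary word (difflib ratio),
    # leaving them unchanged when no candidate reaches the cutoff.
    out = []
    for t in tokens:
        if t in VOCAB:
            out.append(t)
        else:
            match = get_close_matches(t, sorted(VOCAB), n=1, cutoff=cutoff)
            out.append(match[0] if match else t)
    return out


def remove_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]


def stem(word: str) -> str:
    # Crude suffix stripper (NOT a full Indonesian stemmer): drop a possessive
    # "-nya", then at most one of "-kan", "-an", "-i", keeping stems of 4+ letters.
    if word.endswith("nya") and len(word) - 3 >= 4:
        word = word[:-3]
    for suffix in ("kan", "an", "i"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(text: str, use_slang: bool = True, use_spelling: bool = True) -> List[str]:
    tokens = tokenize(remove_punct_and_numbers(case_fold(text)))
    if use_slang:
        tokens = normalize_slang(tokens)
    if use_spelling:
        tokens = correct_spelling(tokens)
    tokens = remove_stopwords(tokens)
    return [stem(t) for t in tokens]


if __name__ == "__main__":
    review = "Dokternya ramah bgt, tp pelayann apotek lambaat 2 jam!!"
    print(preprocess(review, use_slang=False, use_spelling=False))
    # ['dokter', 'ramah', 'bgt', 'tp', 'pelayann', 'apotek', 'lambaat', 'jam']
    print(preprocess(review))
    # ['dokter', 'ramah', 'pelayan', 'apotek', 'lambat', 'jam']
```

Without the two added steps, the nonstandard tokens `bgt`, `tp`, `pelayann`, and `lambaat` survive as separate, rarely seen features. With slang normalization and spelling correction enabled, they are mapped to standard words or dropped as stop words. This plausibly gives a classifier a smaller and more consistent vocabulary to learn from.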



How to Cite

Saputro, T. H., & Hermawan, A. (2021). The Accuracy Improvement of Text Mining Classification on Hospital Review through The Alteration in The Preprocessing Stage. International Journal of Computer and Information Technology (2279-0764), 10(4).