Technical Report Open Access

Comparison of Stochastic and Rule-Based POS Tagging on Malay Online Text

Kalaiarasi Sonai Muthu Anbananthen¹, Jaya Kumar Krishnan¹, Mohd. Shohel Sayeed¹ and Praviny Muniapan¹
  • ¹ Department of Information Science and Technology, Multimedia University, Melaka, Malaysia


The extensive development of Web 2.0 has led to the production of a gigantic amount of user-generated data. These data contain much useful information. Manually analyzing these data and classifying the sentiment in them is an exhausting task; thus, an opinion mining method is needed. The opinion mining approach uses natural language processing (NLP), in which Part-of-Speech (POS) tagging is a crucial part. The performance of any NLP system depends on the accuracy of its POS tagger. Two main issues that affect the accuracy of a POS tagger are unknown words and ambiguity. Although research on POS tagging dates back several decades, it has mostly focused on English. Research on the Malay language is still at an early stage. Also, online Malay text differs from proper Malay text in both structure and grammar. Online users tend to use a lot of abbreviations and short forms in their text. Besides this, the “Bahasa Rojak” phenomenon complicates the tagging process even further. Taking all of this into consideration, in this study we review stochastic and rule-based POS tagging methodologies for dealing with ambiguous and unknown words in online Malay text.

American Journal of Applied Sciences
Volume 14 No. 9, 2017, 843-851


Submitted On: 28 September 2016 Published On: 5 April 2017

How to Cite: Anbananthen, K. S. M., Krishnan, J. K., Sayeed, M. S. & Muniapan, P. (2017). Comparison of Stochastic and Rule-Based POS Tagging on Malay Online Text. American Journal of Applied Sciences, 14(9), 843-851.




Keywords:
  • Opinion Mining
  • Part-of-Speech Tagging
  • Malay Language
  • Malay Online Text
  • Rule-Based Approach
  • Stochastic Approach