Anaphora Resolution in Thai EDU Segmentation

Authapon Kongwan; Siti Sakira Kamaruddin; Farzana Kabir Ahmad

doi:10.3844/jcssp.2022.306.315

Research Article Open Access


Authapon Kongwan¹, Siti Sakira Kamaruddin¹ and Farzana Kabir Ahmad¹

¹ School of Computing, Universiti Utara Malaysia, Malaysia

Abstract

Human knowledge is mostly stored as unstructured text, which may be written in many languages, including Thai. Extracting knowledge from Thai text requires natural language tasks such as word segmentation, Elementary Discourse Unit (EDU) segmentation, and anaphora resolution. Two phenomena, non-referential anaphora and ellipsis of the owner, are significant problems that must be resolved before complete semantics can be constructed in a Natural Language Processing (NLP) application. Non-referential anaphora must be detected before referential anaphora are identified, in order to improve the precision of anaphora resolution. Ellipsis of the owner must likewise be resolved to recover the complete semantics. This study presents a methodology for resolving anaphora from Thai EDU segmentation. The methodology is divided into two parts: Thai morphological analysis and anaphora resolution. A ranking model is applied to resolve the referent of each anaphor, using features from the surface word, surrounding words, syntactic information, and ontology. The results show a precision of 0.77, a recall of 0.84, and an F1 score of 0.81.
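The ranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the weights, the linear scoring rule, and the example candidates below are all assumptions made for illustration, since the abstract names only the four feature sources (surface word, surrounding words, syntactic information, ontology).

```python
from dataclasses import dataclass

# Hypothetical weights, one per feature source named in the abstract.
# The actual features and weights used in the paper are not given here.
WEIGHTS = {
    "surface_match": 1.0,        # surface word
    "context_match": 0.5,        # surrounding words
    "syntactic_role": 0.8,       # syntactic information
    "ontology_compatible": 1.2,  # ontology
}


@dataclass
class Candidate:
    """A candidate antecedent with illustrative feature values in [0, 1]."""
    text: str
    features: dict


def score(candidate, weights=WEIGHTS):
    """Weighted sum of feature values; missing features count as 0."""
    return sum(w * candidate.features.get(name, 0.0) for name, w in weights.items())


def resolve(candidates, weights=WEIGHTS):
    """Return the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, weights))


# Made-up candidates for a single anaphor (assumed example data).
candidates = [
    Candidate("farmer", {"context_match": 1.0, "syntactic_role": 1.0,
                         "ontology_compatible": 1.0}),
    Candidate("rice", {"context_match": 1.0}),
]
best = resolve(candidates)
print(best.text)
```

In this sketch an empty candidate list simply yields `None`, which a caller could treat as an unresolved anaphor; how the paper handles that case is not stated in the abstract.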

Journal of Computer Science

Volume 18 No. 4, 2022, 306-315

DOI: https://doi.org/10.3844/jcssp.2022.306.315

Submitted On: 30 December 2021; Published On: 5 May 2022

How to Cite: Kongwan, A., Kamaruddin, S. S. & Ahmad, F. K. (2022). Anaphora Resolution in Thai EDU Segmentation. Journal of Computer Science, 18(4), 306-315. https://doi.org/10.3844/jcssp.2022.306.315

Copyright: © 2022 Authapon Kongwan, Siti Sakira Kamaruddin and Farzana Kabir Ahmad. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Keywords

Anaphora Resolution
Thai Anaphora
Ranking Model
Natural Language Processing