Research Article Open Access

An Automatic Collocation Extraction from Arabic Corpus

Abdulgabbar Mohammad Saif and Mohd J.A. Aziz

Abstract

Problem statement: Identifying collocations is an important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexity of Arabic, collocations undergo variations such as morphological, graphical and syntactic variation, which make them difficult to identify. Approach: We used a hybrid method, based on linguistic information and association measures, to extract collocations from an Arabic corpus. Results: The method extracted bi-gram candidates for Arabic collocations from the corpus, and the association measures were evaluated with the n-best evaluation method. We report the precision of each association measure for each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio was the best association measure, achieving the highest precision.
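
To make the two quantities named in the abstract concrete, the following is a minimal sketch of a log-likelihood ratio score for a bi-gram and of precision over an n-best list. It assumes the standard contingency-table form of the log-likelihood ratio (Dunning's G²); the abstract does not state the exact formulation, the linguistic filters or the reference list used in the study, so the function names, arguments and toy data here are hypothetical illustrations only.

```python
import math


def log_likelihood_ratio(c12, c1, c2, n):
    """G^2 score for a bi-gram (w1, w2), assuming the standard 2x2 form.

    c12: occurrences of the bi-gram (w1, w2)
    c1:  bi-grams whose first word is w1
    c2:  bi-grams whose second word is w2
    n:   total number of bi-grams in the corpus
    """
    # Observed counts of the 2x2 contingency table.
    observed = [
        c12, c1 - c12,
        c2 - c12, n - c1 - c2 + c12,
    ]
    # Expected counts if w1 and w2 occurred independently.
    expected = [
        c1 * c2 / n, c1 * (n - c2) / n,
        (n - c1) * c2 / n, (n - c1) * (n - c2) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def n_best_precision(scores, gold, n):
    """Fraction of the n highest-scoring candidates found in a reference set.

    scores: dict mapping each candidate to its association score
    gold:   set of candidates judged to be true collocations (hypothetical)
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    return sum(1 for candidate in ranked if candidate in gold) / n


if __name__ == "__main__":
    # Toy counts, not taken from the paper.
    print(log_likelihood_ratio(c12=50, c1=60, c2=70, n=10000))
    print(n_best_precision({"a": 3.0, "b": 2.0, "c": 1.0}, {"a", "c"}, n=2))
```

In this sketch a score of zero means the observed counts match what independence would predict, and larger scores indicate stronger association. Ranking candidates by score and measuring precision at several values of n is one way to compare association measures on the same candidate list.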

Journal of Computer Science
Volume 7 No. 1, 2011, 6-11

DOI: https://doi.org/10.3844/jcssp.2011.6.11

Submitted On: 14 October 2010 Published On: 16 December 2010

How to Cite: Saif, A. M. & Aziz, M. J. (2011). An Automatic Collocation Extraction from Arabic Corpus. Journal of Computer Science, 7(1), 6-11. https://doi.org/10.3844/jcssp.2011.6.11

Keywords

  • Collocation extraction
  • Hybrid methods
  • Collocation variations
  • Association measures
  • Morphosyntactic
  • Graphical variants
  • N-best evaluation