Novel Automatic Query Building Algorithm Using Similarity Thesaurus

Hayel Khafajeh; Aymen Abu-Errub; Ashraf Odeh; Nidal Yousef

doi:10.3844/ajassp.2012.1373.1377

Research Article Open Access

Hayel Khafajeh¹, Aymen Abu-Errub², Ashraf Odeh³ and Nidal Yousef³

¹ Department of CIS, Faculty of Computing and Information Technology, Zarqa University, Zarqa, Jordan
² Department of CIS, Faculty of Information Technology, Al-Ahliyya Amman University, Amman, Jordan
³ Department of CIS, Faculty of Information Technology, AL Isra University, Amman, Jordan

Abstract

One of the most influential factors in natural language research is the data set, which plays a significant role in designing, improving and evaluating information retrieval systems and other natural language processing applications. Unfortunately, building a suitable data set consumes time, labor and effort, particularly the extraction of queries from the data set's documents. In this study, a novel algorithm for extracting queries from any collection of documents is proposed. The algorithm employs a similarity thesaurus for query extraction, which allows it to be applied to any language. To evaluate the proposed algorithm, a data set of 242 Arabic documents and 60 queries was used. The algorithm extracted 48 queries, 20 of which appeared in the manually built data set, and every extracted query was relevant to more than one document in the collection.
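The abstract does not spell out how the similarity thesaurus is built or how queries are drawn from it. As a rough illustration only, the sketch below assumes a generic similarity thesaurus: each term is represented by its frequencies across documents, and term-term similarity is the cosine of those vectors. A candidate query is then formed from a seed term plus its most similar terms. The plain term-frequency weighting, the helper names `build_similarity_thesaurus` and `candidate_query`, and the parameter `k` are all hypothetical and are not taken from the paper. Published similarity-thesaurus formulations use more elaborate term weights.

```python
import math
from collections import Counter


def build_similarity_thesaurus(docs):
    """Hypothetical sketch: map each term to {other_term: cosine similarity}.

    docs is a list of token lists. Each term is represented by its raw
    frequency in every document (a simplifying assumption), and two terms
    are similar when they occur in the same documents.
    """
    # Term vectors indexed by document id: vectors[term][doc_id] = tf
    vectors = {}
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            vectors.setdefault(term, {})[doc_id] = tf

    norms = {t: math.sqrt(sum(w * w for w in v.values())) for t, v in vectors.items()}
    thesaurus = {t: {} for t in vectors}
    terms = sorted(vectors)
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            shared = vectors[a].keys() & vectors[b].keys()
            if not shared:
                continue  # terms that never co-occur get no thesaurus entry
            dot = sum(vectors[a][d] * vectors[b][d] for d in shared)
            sim = dot / (norms[a] * norms[b])
            thesaurus[a][b] = sim
            thesaurus[b][a] = sim
    return thesaurus


def candidate_query(thesaurus, seed, k=2):
    """Hypothetical helper: the seed term plus its k most similar terms."""
    # Sort by descending similarity, breaking ties alphabetically
    neighbours = sorted(thesaurus.get(seed, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [seed] + [term for term, _ in neighbours[:k]]


if __name__ == "__main__":
    docs = [
        ["arabic", "retrieval", "stemming"],
        ["arabic", "stemming", "roots"],
        ["query", "retrieval", "evaluation"],
    ]
    thesaurus = build_similarity_thesaurus(docs)
    print(candidate_query(thesaurus, "arabic"))
```

In this toy collection "arabic" and "stemming" occur in exactly the same documents, so they are the most similar pair. A real query-extraction procedure would also need rules for choosing seed terms and for filtering the resulting queries, which the abstract does not describe.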

American Journal of Applied Sciences

Volume 9 No. 9, 2012, 1373-1377

Submitted: 1 February 2012
Published: 14 July 2012

How to Cite: Khafajeh, H., Abu-Errub, A., Odeh, A. & Yousef, N. (2012). Novel Automatic Query Building Algorithm Using Similarity Thesaurus. American Journal of Applied Sciences, 9(9), 1373-1377. https://doi.org/10.3844/ajassp.2012.1373.1377

Copyright: © 2012 Hayel Khafajeh, Aymen Abu-Errub, Ashraf Odeh and Nidal Yousef. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Information retrieval
Natural language processing
Arabic corpora
Similarity thesaurus