A Printed Arabic Optical Character Recognition System using Deep Learning

Salah Alghyaline

doi:10.3844/jcssp.2022.1038.1050

Research Article Open Access

Salah Alghyaline¹

¹ Department of Computer Science, the World Islamic Sciences and Education University, Amman, Jordan

Abstract

Recognizing Arabic script is challenging for several reasons: the Arabic language is cursive and morphologically rich, and many Arabic letters closely resemble one another. Moreover, Arabic has many diacritics and dots, and they change the letter's phonetic transcription. This study proposes a Printed Arabic Optical Character Recognition (PAOCR) approach based on the state-of-the-art You Only Look Once (YOLO) object detector. Four techniques were proposed and implemented to design an end-to-end Arabic OCR system. First, the YOLO4 object detector is customized and trained on deep Convolutional Neural Networks (CNNs) to recognize Arabic characters. Second, the overlapped bounding boxes are processed to keep the most accurate box for each character. Third, the Hunspell library checks the spelling of each word and corrects misspelled ones. Fourth, edit distance is used to compare OCR misspelled words with Hunspell's suggestions and choose the closest correct word. The experimental results on the Arabic Printed Text Image (APTI) dataset showed that the Word Recognition Rate (WRR) of the customized YOLO4 Arabic OCR is 66.7%, whereas applying the three proposed techniques with YOLO4 achieved 82.4%. A comparison with two existing OCR systems, Tesseract and ABBYY, showed that the proposed approach achieved a 95.7% Character Recognition Rate (CRR), whereas Tesseract and ABBYY achieved 92.3% and 84.8%, respectively.
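The fourth step described in the abstract, choosing the spelling suggestion closest to the OCR output, can be illustrated with a minimal Python sketch. It assumes the standard Levenshtein edit distance with unit costs, which may differ from the exact variant used in the paper. The function names `edit_distance` and `closest_suggestion` and the sample word list are hypothetical; the list stands in for suggestions that a spell checker such as Hunspell would return, and no Hunspell API is called here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_suggestion(ocr_word: str, suggestions: list[str]) -> str:
    """Return the suggestion nearest to ocr_word (hypothetical helper).

    Assumption: with no suggestions, the OCR output is kept unchanged;
    ties are broken in favor of the earliest suggestion in the list.
    """
    if not suggestions:
        return ocr_word
    return min(suggestions, key=lambda s: edit_distance(ocr_word, s))


if __name__ == "__main__":
    # Hypothetical OCR output with one wrong letter and a made-up suggestion list.
    print(closest_suggestion("مدرصة", ["مدرس", "مدرسة", "مدارس"]))
```

Because the distance is computed per character, a single misrecognized letter costs 1, so suggestions that differ from the OCR output by one confusable letter are preferred over those needing several edits.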

Journal of Computer Science

Volume 18 No. 11, 2022, 1038-1050

DOI: https://doi.org/10.3844/jcssp.2022.1038.1050

Submitted On: 14 August 2022 Published On: 2 November 2022

How to Cite: Alghyaline, S. (2022). A Printed Arabic Optical Character Recognition System using Deep Learning. Journal of Computer Science, 18(11), 1038-1050. https://doi.org/10.3844/jcssp.2022.1038.1050

Copyright: © 2022 Salah Alghyaline. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Optical Character Recognition
Arabic OCR
Convolutional Neural Network
YOLO4