Research Article Open Access

Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler

P. Jaganathan1 and T. Karthikeyan2
  • 1 , India
  • 2 Bharathiar University, India
Journal of Computer Science
Volume 11 No. 1, 2015, 120-126

DOI: https://doi.org/10.3844/jcssp.2015.120.126

Published On: 13 September 2014

How to Cite: Jaganathan, P. & Karthikeyan, T. (2015). Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler. Journal of Computer Science, 11(1), 120-126. https://doi.org/10.3844/jcssp.2015.120.126

Abstract

With its growing industrial impact in recent years, data mining has established itself as one of the most important disciplines in computer science. On the fast-growing Web, locating precise and relevant resources within a reasonable amount of time is a major challenge for general-purpose single-process crawlers, which creates demand for improved and more effective algorithms. Large-scale search engines update their indexes frequently, yet they are still unable to present such information in a timely manner. This study proposes a scalable focused crawling architecture built on an incremental parallel Web crawler, in which Web pages relevant to multiple pre-defined topics are crawled concurrently. To address the URL distribution problem, a compound decision model based on a multi-objective decision-making method is introduced; it jointly considers multiple factors such as load balance and relevance. The update frequency problem is handled by a decision made at the local repository. The results show that the proposed system efficiently delivers high quality, relevance and freshness with a significantly low memory requirement.
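
The abstract does not give the form of the compound decision model, so the following is only an illustrative sketch of how a multi-objective choice between relevance and load balance might be scalarised when handing a URL to one of several parallel crawl processes. The weighted-sum scoring, the Jaccard relevance measure, the `Crawler` class, the `assign` function and the weights `w_rel` and `w_load` are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Crawler:
    """Hypothetical parallel crawl process with a topic focus and a pending queue."""
    name: str
    topic_terms: set
    capacity: int
    queue: list = field(default_factory=list)

    def load(self) -> float:
        # Fraction of capacity in use, clamped to [0, 1].
        return min(len(self.queue) / self.capacity, 1.0)


def relevance(url_terms: set, crawler: Crawler) -> float:
    # Jaccard overlap between the URL's terms and the crawler's topic terms
    # (an assumed stand-in for whatever relevance measure the system uses).
    union = url_terms | crawler.topic_terms
    return len(url_terms & crawler.topic_terms) / len(union) if union else 0.0


def assign(url: str, url_terms: set, crawlers: list,
           w_rel: float = 0.7, w_load: float = 0.3) -> Crawler:
    # Weighted-sum scalarisation: reward high relevance and low current load.
    def score(c: Crawler) -> float:
        return w_rel * relevance(url_terms, c) + w_load * (1.0 - c.load())

    best = max(crawlers, key=score)
    best.queue.append(url)
    return best


sports = Crawler("sports", {"football", "league"}, capacity=4)
tech = Crawler("tech", {"python", "crawler"}, capacity=4)
chosen = assign("http://example.org/a", {"python", "crawler", "web"}, [sports, tech])
print(chosen.name)
```

With equal load, the URL goes to the crawler whose topic it matches best; once that crawler's queue fills up, the load term starts to steer less relevant URLs toward idle processes. A weighted sum is only one of several ways to combine objectives, and the weights here are arbitrary.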

Keywords

  • Focused Crawler
  • Incremental Web Crawler
  • URL Distribution Issue
  • Load Balance
  • Relevance