A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names

Authors

  • Dongyang Hou
  • Hao Wu
  • Jun Chen
  • Ran Li
Abstract

Place names are an important ingredient of borderlands situation information and play a significant role in collecting such information from the Internet with focused crawlers. However, current focused crawlers treat a place name like any other common keyword, ignoring its geographical properties, which may reduce their effectiveness. To solve this problem, this paper first discusses the importance of place names in focused crawlers in terms of location and spatial relations, and then proposes a two-tuple-based topic representation method that expresses place names and common keywords separately. Spatial relations between place names are then introduced into the calculation of the relevance between given topics and webpages, which makes the calculation more accurate. On this basis, a focused crawler prototype for borderlands situation information collection is designed and implemented. Crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness. Experimental results indicate that the efficiency of the proposed focused crawler is consistent with the polite access interval and that it could meet the daily demand of borderlands situation information collection. Additionally, the F-Score of the proposed focused crawler increases by around 7%, which means that it is more effective than the traditional best-first focused crawler.

OPEN ACCESS Sustainability 2014, 6, 6530
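The abstract names F-Score as the effectiveness measure but does not say which variant is used. As a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall), the score of a crawl could be computed as below. The function name `f_score`, the `beta` parameter, and the sample page identifiers are hypothetical and not taken from the paper.

```python
def f_score(retrieved: set, relevant: set, beta: float = 1.0) -> float:
    """Return the F-beta score of a crawl (beta=1.0 gives the balanced F1).

    retrieved: pages the crawler collected (hypothetical identifiers).
    relevant:  pages judged on-topic in the evaluation set (assumed given).
    """
    hits = len(retrieved & relevant)  # relevant pages actually collected
    if hits == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = hits / len(retrieved)  # share of collected pages that are relevant
    recall = hits / len(relevant)      # share of relevant pages that were collected
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    crawled = {"page_a", "page_b", "page_c", "page_d"}
    on_topic = {"page_a", "page_b", "page_e"}
    # precision = 2/4, recall = 2/3, so F1 = 4/7
    print(round(f_score(crawled, on_topic), 4))
```

Under this assumption, an improvement such as the roughly 7% reported above would be read as the difference between the scores of two crawlers evaluated on the same set of relevant pages.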


Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...


The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)

Understanding architectural heritage and its various aspects has always been a subject of focus for the international conservation communities. In recent decades, even though place-names are part of living history as well as cultural heritage, they have constantly been facing quick, precipitant changes. As such, in the conservation literature, most studies have skipped ad...


Modelling vague places with knowledge from the Web

Place names are often used to describe and to enquire about geographical information. Geographical place name ontologies, such as gazetteers and geographical thesauri, can help automate geographical information retrieval tasks that employ place names. Most gazetteer resources encoding kn...


Voronoi-based region approximation for geographical information retrieval with gazetteers

Gazetteers and geographical thesauri can be regarded as parsimonious spatial models that associate geographical location with place names and encode some semantic relations between the names. They are of particular value in processing information retrieval requests in which the user employs place names to specify geographical context. Typically the geometric locational data in a gazetteer are co...


Landscape Analysis of Geographical Names in Hubei Province, China

Hubei Province is the hub of communications in central China, which directly determines its strategic position in the country’s development. Additionally, Hubei Province is well-known for its diverse landforms, including mountains, hills, mounds and plains. This area is called “The Province of Thousand Lakes” due to the abundance of water resources. Geographical names are exclusive names given ...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2014