Resolving fine granularity toponyms: Evaluation of a disambiguation approach

Authors

  • C. Derungs
  • D. Palacio
  • R. S. Purves
Abstract

Landscape descriptions in natural language, for instance from historic corpora, are a complementary source to empirical ethnographic work, for example research exploring variation in the use of basic levels or basic terms within landscapes across localities (cf. Mark and Turk 2003, Burenhult and Levinson 2008, Turk et al. 2011), on the condition that such descriptions can be linked to space. A key challenge in linking language to space is the detection and resolution of toponyms (Purves and Jones 2008). Central to toponym resolution is the identification of a single unambiguous referent for a given toponym, which requires that toponym referent ambiguity is resolved (cf. Amitay et al. 2004), e.g. deciding whether a document refers to London, England or London, Ontario. Common state-of-the-art approaches to toponym disambiguation use default rules, such that the most prominent referent location is chosen (cf. Purves et al. 2007); population counts, which also reflect the prominence of referent locations (cf. Martins et al. 2010); and geometric minimality, which assumes that the areal footprint of a document is to be minimised (cf. Leidner 2004). Leidner (2007) argued that toponym disambiguation had focused on populated places, since such locations are important for a variety of applications (e.g. local search or news mapping). However, if we wish to resolve toponyms with a fine spatial granularity, such as those typically used to reference mountains, hills, fields or hamlets in natural landscape descriptions, state-of-the-art disambiguation approaches must be adapted to work independently of the a priori toponym knowledge that is usually attached to populated places and commonly found in gazetteers (Hill 2006). We present an approach to toponym disambiguation that works independently of such a priori toponym knowledge. We evaluate its performance against a baseline disambiguation technique on an extensive corpus consisting of 150 years of Swiss alpine literature (Volk et al. 2009). Toponym knowledge is gathered from geomorphometric characteristics at the locations of toponyms. This reflects the strong relation between toponyms and topography, since toponyms are used to name geographic objects that are attached to the earth's surface (Smith and Mark 2003) and can therefore be hypothesised to be bound to its characteristics. We show that in a user-centered evaluation with spatial queries (i.e. classifying articles contained in the alpine corpus as being relevant or not for certain spatial extents), our approach to disambiguation, using geomorphometric information, significantly outperforms baseline disambiguation (27% improvement).
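
To make two of the heuristics named in the abstract concrete, the following is a minimal sketch of disambiguation by population count and by geometric minimality. It is not the geomorphometric approach evaluated in the paper, whose details the abstract does not give. The toy gazetteer, its approximate coordinates and population figures, and the function names `by_population`, `bbox_area` and `by_geometric_minimality` are all assumptions made for illustration. Footprint size is approximated here by the area of a bounding box in degrees, a simplification that degenerates to zero when all chosen candidates share a latitude or longitude.

```python
from itertools import product

# Hypothetical toy gazetteer: toponym -> candidate referents as
# (label, latitude, longitude, population). Values are rough and illustrative.
GAZETTEER = {
    "London": [
        ("London, England", 51.51, -0.13, 8_800_000),
        ("London, Ontario", 42.98, -81.25, 420_000),
    ],
    "Windsor": [
        ("Windsor, England", 51.48, -0.61, 32_000),
        ("Windsor, Ontario", 42.31, -83.04, 230_000),
    ],
}


def by_population(toponym):
    """Population heuristic: pick the candidate with the most inhabitants."""
    return max(GAZETTEER[toponym], key=lambda cand: cand[3])


def bbox_area(candidates):
    """Area (in square degrees) of the bounding box around the candidates."""
    lats = [cand[1] for cand in candidates]
    lons = [cand[2] for cand in candidates]
    return (max(lats) - min(lats)) * (max(lons) - min(lons))


def by_geometric_minimality(toponyms):
    """Pick one candidate per toponym so that the joint footprint is smallest.

    Brute force over all combinations; only feasible for short toponym lists.
    """
    combinations = product(*(GAZETTEER[t] for t in toponyms))
    best = min(combinations, key=bbox_area)
    return dict(zip(toponyms, best))


if __name__ == "__main__":
    print(by_population("Windsor")[0])
    resolved = by_geometric_minimality(["London", "Windsor"])
    print({name: cand[0] for name, cand in resolved.items()})
```

With this toy data the two heuristics disagree on "Windsor": the population count favours the Ontario referent, whereas minimising the joint footprint with "London" favours the English one. Both depend on gazetteer attributes or on co-occurring toponyms, which is the kind of a priori knowledge the abstract notes is often unavailable for fine-granularity toponyms.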

Similar resources

Toponym Disambiguation by Arborescent Relationships

Problem statement: The way of referring to a place in the geographical space can be formal, based on the spatial coordinates, or informal, which we use in natural language by using toponyms (place names). A toponym can represent several geographical places. This ambiguity made problematic its conversion towards a unique formal representation. Toponym disambiguation in text is the task of assign...

Semantic Similarities between Locations based on Ontology

Toponym disambiguation or location names resolution is a critical task in unstructured text, articles or documents. Our research explores how to link ambiguous locations mentioned in documents, news and articles with latitude/longitude coordinates. We designed an evaluation system for toponym disambiguation based on annotated GEOCLEF data. We implemented a node-based approach taking population ...

Toponym Disambiguation Using Ontology-Based Semantic Similarity

We propose a new heuristic for toponym sense disambiguation, to be used when mapping toponyms in text to ontology concepts, using techniques based on semantic similarity measures. We evaluated the proposed approach using a collection of Portuguese news articles from which the geographic entity names were extracted and then manually mapped to concepts in a geospatial ontology covering the territ...

Disambiguating Toponyms in News

This research is aimed at the problem of disambiguating toponyms (place names) in terms of a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, for toponyms in news. We found that 67.82% of the toponyms found in a corpus that were ambiguous in a gazett...

A structural approach to the automatic adjudication of word sense disagreements

The semantic annotation of texts with senses from a computational lexicon is a complex and often subjective task. As a matter of fact, the fine granularity of the WordNet sense inventory [Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database MIT Press], a de facto standard within the research community, is one of the main causes of a low inter-tagger agreement ranging betwee...

Publication date: 2012