An Analysis of POS Tag Patterns in Ontology Identifiers and Labels

Author

  • Sandra Williams

Abstract

I describe an analysis of the syntax of identifier names found in a corpus of over 500 ontologies. The analysis was performed in six steps: (i) extraction of identifier names from the corpus; (ii) construction of dummy sentences containing the identifiers; (iii) part-of-speech (POS) tagging; (iv) extraction of POS tag strings; (v) POS string frequency analysis; and (vi) general syntactic pattern analysis. The analysis found that identifier names follow simple syntactic patterns, that each type of identifier can be expressed through relatively few patterns, and that the syntax of identifiers differs from natural English in consistent ways.
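Steps (ii) to (v) can be illustrated with a small pipeline. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes identifiers mark word boundaries with camelCase or underscores, and the dummy-sentence frame `FRAME`, the toy dictionary tagger `TOY_LEXICON` (a stand-in for a real statistical POS tagger) and all function names are hypothetical. Step (i), reading identifiers out of ontology files, and step (vi), generalising tag strings into patterns, are not shown.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger; tags use Penn
# Treebank names. Any word not listed here falls back to NN (a crude default).
TOY_LEXICON = {
    "this": "DT", "is": "VBZ", "a": "DT", ".": ".",
    "has": "VBZ", "of": "IN",
    "red": "JJ", "white": "JJ",
    "part": "NN", "wine": "NN", "pizza": "NN", "topping": "NN",
}

# Hypothetical dummy-sentence frame; the identifier's words are placed after it.
FRAME = ["this", "is", "a"]


def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase, PascalCase or snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\-\s]+", identifier):
        # An all-caps run not followed by a lowercase letter is kept as one
        # acronym (e.g. "HTTP" in "HTTPServer"); otherwise a word is an
        # optional capital followed by lowercase letters.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def pos_string(identifier: str) -> str:
    """Steps (ii)-(iv): build a dummy sentence, tag it, keep only the identifier's tags."""
    words = split_identifier(identifier)
    sentence = FRAME + words + ["."]                              # step (ii)
    tags = [TOY_LEXICON.get(token, "NN") for token in sentence]   # step (iii)
    start = len(FRAME)
    return " ".join(tags[start:start + len(words)])               # step (iv)


def tag_string_frequencies(identifiers: list[str]) -> Counter:
    """Step (v): count how often each POS tag string occurs."""
    return Counter(pos_string(identifier) for identifier in identifiers)


if __name__ == "__main__":
    names = ["RedWine", "WhiteWine", "PizzaTopping", "hasPart", "isPartOf"]
    for tags, count in tag_string_frequencies(names).most_common():
        print(count, tags)
```

A dictionary lookup ignores context, so the frame has no effect on the toy tagger; it is kept because a real contextual tagger does use the surrounding words, which is presumably why dummy sentences are constructed before tagging.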

Publication date: 2013