Multifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications

Authors

  • Florbela Barreto
  • Raquel Amaro
Abstract

This paper presents some aspects of the first Portuguese frequency lexicon extracted from a large corpus. The Multifunctional Computational Lexicon of Contemporary Portuguese (henceforth MCL) arose from the need to fill a gap in the study of contemporary Portuguese. Until recently, the frequency lexicons of Portuguese were very small, such as Português Fundamental, which comprises 2,217 words extracted from a 700,000-word corpus, and the Frequency Dictionary of Portuguese Words, based on a literary corpus of 500,000 words. We describe here the main steps taken to collect the lexical and frequency data and some of the major problems that arose in the process. The resulting lexicon is a freely available, reliable resource for several types of applications.

Similar resources

Some Language Resources and Tools for Computational Processing of Portuguese at INESC

In the last few years, automatic processing tools and studies based on corpora have become of great importance for the community. The possibility of evaluating and developing such tools and studies depends on the availability of language resources. For the Portuguese language in its several national varieties, these resources are not enough to meet the community's needs. In this paper some valuab...

Extending NomLex-PT using AnCora-Nom

This work describes how we used AnCora-Nom, a Spanish nominalization lexicon, to extend NomLex-PT, a lexical resource for Portuguese, originally based on the English NomLex lexicon and fully integrated into OpenWordNet-PT, our freely available Portuguese WordNet. The complete Spanish lexicon, which contains 1,655 entries, was translated to Portuguese and then compared to our previous data. Furthe...

NomLex-PT: A Lexicon of Portuguese Nominalizations

This paper presents NomLex-PT, a lexical resource describing Portuguese nominalizations. NomLex-PT connects verbs to their nominalizations, thereby enabling NLP systems to observe the potential semantic relationships between the two words when analysing a text. NomLex-PT is freely available and encoded in RDF for easy integration with other resources. Most notably, we have integrated NomLex-PT ...

Constructing an Ontology of Japanese Lexical Properties: Specifying its Property Structures and Lexical Entries

Regarding the construction of an ontology of Japanese lexical properties (JLP-O) as fundamental in terms of establishing a conceptual framework to guide and facilitate the construction of a large-scale lexical resource (LR) database of the Japanese lexicon, this paper primarily focuses on two major concerns for the construction of the JLP-O. The first is to map out and appropriately structure t...

The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research

In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main obje...

Publication date: 2004