Romanian Linguistic Resources On Very Large Scale

Author

  • Dan Cristea
Abstract

This paper proposes a methodology for building a technological environment for linguistic processing, intended to conserve, update and exploit, for research, public and commercial purposes, strategic linguistic resources of the Romanian language, rooted in textual data contributed daily and over the long run by major publishing houses and mass-media institutions. In essence, it describes a technology able to receive, store and continuously process large amounts of textual data received daily from voluntary contributors. Apart from storing linguistic data à la longue for the benefit of preserving the language, the results of the processing will be returned to three categories of users: researchers working on the Romanian language and computational linguistics, the contributors of the resources, and the public at large. Such an initiative is motivated by the growing need for linguistic resources, including textual data and processing tools, in the social sciences and humanities, and it should bring the Romanian language, still less-resourced today, to the level of the technologically rich languages of Europe. Increasing the quantity of resources dedicated to different languages has been a constant preoccupation in Europe over the past 15 years, triggered by the necessity to boost

Similar resources

Linguistic Resources and Technologies for Romanian Language

This paper reviews notions related to Language Resources and Technologies (LRT), including a brief overview of some resources developed worldwide, with a special focus on the Romanian language. It then describes a joint Romanian, Moldavian, English initiative aimed at developing electronically coded resources for the Romanian language, tools for their maintenance and usage, as well as for the creat...

Optimal Romanian clitics: A cross-linguistic perspective

Comparative Issues in Romanian Syntax held at the University of New Brunswick, Saint John, Canada; at the 1996 Going Romance conference held in Utrecht, the Netherlands; at the 1997 Linguistic Symposium on Romance Languages held at UC Irvine, and at the 1997 Hopkins Optimality Theory Workshop & University of Maryland Mayfest in Baltimore. I would like to thank audiences at these meetings for th...

A Generic Platform for Developing Language Resources and Applications

The paper describes a unification-based language engineering platform meant for the development of reversible language resources and linguistic applications. The platform, called EGLU (Environnement Generique Linguistique d’Unification), is an enhanced generalized port of ISSCO’s original ELU from SUN-OS Allegro Common Lisp to Macintosh Common Lisp and Carnegie Mellon Lisp (under Solaris). Several la...

MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

The paper presents the fourth, “Mondilex” edition of the MULTEXT-East language resources, a multilingual dataset for language engineering research and development, focused on the morphosyntactic level of linguistic description. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes the EAGLES-based morphosyntactic specif...

Integration of Large-Scale Linguistic Resources in a Natural Language Understanding System

Knowledge acquisition is a serious bottleneck for natural language understanding systems. For this reason, large-scale linguistic resources have been compiled and made available by organizations such as the Linguistic Data Consortium (Comlex) and Princeton University (WordNet). Systems making use of these resources can greatly accelerate the development process by avoiding the need for the deve...

Journal:
  • The Computer Science Journal of Moldova

Volume 19

Publication date: 2011