Data in Your Language: The ECI

Authors

  • Henry S. Thompson
  • David McKelvie
  • Dominique Petitpierre
Abstract

In this paper we describe the contents and the method of production of the ACL European Corpus Initiative Multilingual Corpus 1 (ECI/MC1). This is a large multilingual electronic text corpus, containing 97 million words in 27 (mainly European) languages. It is available at low cost on CD-ROM. Most of the texts in the corpus are marked up using a fully-validated SGML document type description based on the Text Encoding Initiative (TEI) guidelines for corpus annotation. It is hoped that this corpus will provide a useful resource for corpus-based computational linguistics.


Similar resources

Citizenship Classes for Bhutanese-Nepali Elders: From Cognitive Deficits to Cultural-Historical Understandings

This article focuses on home-based citizenship classes for Bhutanese-Nepali elders in Central Ohio in the United States. As part of a larger longitudinal study centered in the ethnographic, language socialization, and discourse analytic traditions, the article focuses on data, particularly regular audiovideo recordings, gathered over a five-month period and tracks one student’s progress towards...


TEI-conformant Structural Markup of a Trilingual Parallel Corpus in the ECI Multilingual Corpus 1

In this paper we provide an overview of the ACL European Corpus Initiative (ECI) Multilingual Corpus 1 (ECI/MC1). In particular, we look at one subcorpus of the ECI/MC1, the trilingual corpus of International Labour Organisation reports, and discuss the problems involved in TEI-compliant structural markup and preliminary alignment of this large corpus. We discuss gross structural ali...


Engineer-computer interaction for structural monitoring

An increased availability of information technology (IT) is currently influencing decisions to monitor structures more frequently. IT is invariably used to interpret structural behaviour from monitoring data. However, engineers remain frustrated with IT results. Engineers work with incomplete knowledge, problem specific characteristics, and context dependency. Although such conditions require i...


Elements of Feminine Writing in Sepedeh Shamlu’s Novel Sorkhi-e-to az man (“Your Redness Is Mine”)

Feminist criticism concerns the function of specifically feminine cultural and ideological constituents in literary works. In its approach to narrative, this kind of criticism follows two methods: first, it examines the attributes of women and their role and personality in the course of the story; second, it offers a critique of women as presented, which studies female authors. Sorkhi-e-to az man by Sepedeh Shamlu is one the...


Measuring Union and Nonunion Wage Growth: Puzzles in Search of Solutions

This paper presents conflicting evidence on trends in private sector union and nonunion wages. The BLS quarterly Employment Cost Index (ECI), constructed from establishment surveys, uses fixed weights applied to wage changes among matched job quotes. The ECI shows a substantial decrease in wage growth for union relative to nonunion workers. The annual Employer Costs for Employee Compensation (E...



Publication date: 2007