The Web as a Parallel Corpus


Similar resources

The Web as a Parallel Corpus

Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structur...

Web as a Corpus

The World Wide Web offers a unique opportunity to create very large, high-quality text collections with little manual work. In this paper, we describe the requirements for a usable linguistic corpus and present a routine for building such a corpus from web pages. Several important issues that must be resolved for successful processing are discussed. As a pilot study, we create a billio...

GoogleLing: The Web as a Linguistic Corpus

We describe software to transform any search engine or searchable corpus into a tool for linguistic research with a rich query syntax. We provide support for case-sensitive searches, within-sentence and within-N-words match constraints, part-of-speech restrictions on words, and “smart” verb-ending inflection wildcards. The software generalizes the query for the underlying search engine, and then...

Web as Corpus

The corpus resource for the 1990s was the BNC. Conceived in the 80s, completed in the mid 90s, it was hugely innovative and opened up myriad new research avenues for comparing different text types, sociolinguistics, empirical NLP, language teaching and lexicography. But now the web is with us, giving access to colossal quantities of text, of any number of varieties, at the click of a button, fo...

Annotated Web As Corpus

This paper presents a proposal to facilitate the use of the annotated web as corpus by alleviating the annotation bottleneck for corpus data drawn from the web. We describe a framework for large-scale distributed corpus annotation using peer-to-peer (P2P) technology to meet this need. We also propose to annotate a large reference corpus in order to evaluate this framework. This will allow us to ...

Journal

Journal title: Computational Linguistics

Year: 2003

ISSN: 0891-2017,1530-9312

DOI: 10.1162/089120103322711578