Search results for: hapax legomenon (pl. hapax legomena
Number of results: 13236
A hapax legomenon (pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word that occurs only once within a context, either in the written record of an entire language, in the works of an author, or in a single text. The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, and quadruple occurrences, but are far less commonly u...
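The definition above can be made concrete with a small sketch. It groups the distinct words of a single text by occurrence count, so the hapax legomena are the group with count 1 and the dis legomena the group with count 2. The function name `words_by_occurrence`, the sample sentence, and the crude lowercase tokenizer are assumptions made for illustration. A real study would need a proper tokenizer and a decision about whether inflected forms count as the same word.

```python
import re
from collections import Counter


def words_by_occurrence(text: str) -> dict[int, list[str]]:
    """Group the distinct words of `text` by how often each one occurs."""
    # Assumption: a word is a run of ASCII letters or apostrophes, case-folded.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    groups: dict[int, list[str]] = {}
    for word, n in counts.items():
        groups.setdefault(n, []).append(word)
    return {n: sorted(words) for n, words in groups.items()}


sample = "the cat saw the dog and the dog saw a bird"  # hypothetical text
groups = words_by_occurrence(sample)
hapaxes = groups.get(1, [])        # words occurring exactly once
dis_legomena = groups.get(2, [])   # words occurring exactly twice
print(hapaxes)
print(dis_legomena)
```

Here the "context" is one short text. The same counting applies unchanged to an author's collected works or to a whole corpus if those are concatenated first.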
In the known literature, hapax legomena in an English text or a collection of texts account for roughly 50% of the vocabulary. This sort of constancy is baffling. The 100-million-word British National Corpus was used to study this phenomenon. The result reveals that the hapax/vocabulary ratio follows a U-shaped pattern. Initially, as the size of text increases, the hapax/vocabulary ratio d...
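As a rough illustration of the quantity discussed above, the following sketch computes the hapax/vocabulary ratio (number of words seen exactly once divided by number of distinct words) for growing prefixes of a token list. The function names, the step size, and the toy token sequence are assumptions. The sketch makes no attempt to reproduce the British National Corpus result and only shows how such a curve could be measured.

```python
from collections import Counter


def hapax_vocabulary_ratio(tokens: list[str]) -> float:
    """Return (#words occurring once) / (#distinct words); 0.0 for empty input."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapax_count = sum(1 for n in counts.values() if n == 1)
    return hapax_count / len(counts)


def ratio_by_prefix(tokens: list[str], step: int) -> list[tuple[int, float]]:
    """Measure the ratio on the first `step`, 2*`step`, ... tokens."""
    return [
        (size, hapax_vocabulary_ratio(tokens[:size]))
        for size in range(step, len(tokens) + 1, step)
    ]


tokens = "a b a c b d a e".split()  # hypothetical toy corpus
curve = ratio_by_prefix(tokens, step=4)
for size, ratio in curve:
    print(size, round(ratio, 3))
```

Recounting each prefix from scratch keeps the sketch short. For a corpus of BNC size, one would more plausibly update the counts incrementally while streaming through the text.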
Given a previously unseen form that is morphologically n-ways ambiguous, what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomena, the forms that occur exactly once in a corpus. This result has important impli...
We present an alignment strategy that specifically deals with the correct alignment of rare German nominal compounds to their English multiword translations. It recognizes compounds and multiwords based on their character lengths and on their most frequent POS patterns, and aligns them based on their length ratios. Our approach is designed on the basis of a data analysis on roughly 500 German ha...
Given a form that is previously unseen in a sufficiently large training corpus, and that is morphologically n-ways ambiguous (serves n different lexical functions), what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomen...
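The estimator described in this abstract can be sketched under some assumptions. The corpus is taken to be a list of (form, function) pairs, a hapax legomenon is taken to be a form that occurs exactly once in that list, and the priors for an unseen ambiguous form are the relative frequencies of its candidate functions among those hapaxes. The function name `hapax_priors`, the tag labels, and the uniform fallback for the case where no hapax carries a candidate function are illustrative choices and are not taken from the paper, whose abstract is truncated here.

```python
from collections import Counter


def hapax_priors(
    tagged_tokens: list[tuple[str, str]],
    candidate_functions: list[str],
) -> dict[str, float]:
    """Estimate priors over `candidate_functions` from hapax legomena only."""
    form_counts = Counter(form for form, _ in tagged_tokens)
    # Count each candidate function among forms that occur exactly once.
    function_counts = Counter(
        function
        for form, function in tagged_tokens
        if form_counts[form] == 1 and function in candidate_functions
    )
    total = sum(function_counts.values())
    if total == 0:
        # Assumption: fall back to a uniform prior when there is no evidence.
        return {f: 1 / len(candidate_functions) for f in candidate_functions}
    return {f: function_counts[f] / total for f in candidate_functions}


corpus = [  # hypothetical tagged corpus
    ("walks", "VERB"), ("walks", "NOUN"),
    ("lopes", "VERB"), ("gambols", "VERB"),
    ("strolls", "NOUN"), ("the", "DET"),
]
priors = hapax_priors(corpus, ["NOUN", "VERB"])
print(priors)
```

In this toy corpus "walks" occurs twice and so contributes nothing. The estimate rests on the three relevant hapaxes, two tagged as verbs and one as a noun.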
In this paper we describe our analysis of vocabulary across three sets of Japanese ESL texts. We focus upon frequency analysis of individual words and multiword sequences (n-grams), giving cross comparisons of 2-, 3-, and 4-gram multiword sequences. In addition, we consider the degree of emphasis on multiword vocabulary that is evident in each textbook corpus. This is derived from analysis of the ...
This paper describes the LINA system for the BUCC 2015 shared track. Following Enright and Kondrak (2007), our system identifies comparable documents by collecting counts of hapax words. We extend this method by filtering out document pairs sharing target documents using pigeonhole reasoning and cross-lingual information.
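The core idea of counting hapax words to pair documents might look roughly like the sketch below. It assumes that a hapax word is a token occurring exactly once within a document and that two documents are scored by how many hapax words they share verbatim (for example names or numbers that survive translation). The function names and the toy documents are hypothetical. The sketch is not the LINA system and omits the pigeonhole filtering and the cross-lingual information mentioned in the abstract.

```python
from collections import Counter


def hapax_set(tokens: list[str]) -> set[str]:
    """Return the tokens that occur exactly once in one document."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n == 1}


def best_match(
    source_tokens: list[str],
    targets: dict[str, list[str]],
) -> tuple[str, dict[str, int]]:
    """Score each target by shared hapax words; `targets` must be non-empty."""
    source_hapaxes = hapax_set(source_tokens)
    scores = {
        name: len(source_hapaxes & hapax_set(tokens))
        for name, tokens in targets.items()
    }
    return max(scores, key=scores.get), scores


source = "berlin 1989 wall fell in berlin".split()  # hypothetical documents
targets = {
    "fr_a": "le mur de berlin est tombé en 1989".split(),
    "fr_b": "paris 2007 le tour de france".split(),
}
winner, scores = best_match(source, targets)
print(winner, scores)
```

Note that "berlin" occurs twice in the source, so it is not a hapax there and does not count toward the score. Only "1989" links the source to `fr_a`.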