Search results for: hapax legomenon (pl. hapax legomena

Number of results: 13236

Journal: Pazhuhesh-ha-ye Qur'an va Hadith (Research in Qur'an and Hadith)
Morteza Karimi-Nia, Instructor, Department of Qur'an and Hadith, Islamic Azad University, Science and Research Branch, Tehran

A hapax legomenon (pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word that occurs only once within a context: in the written record of an entire language, in the works of an author, or in a single text. The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly u...
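The definition above can be made concrete by grouping a text's word types by how often they occur. The following is a minimal Python sketch, assuming a naive lowercase, ASCII-letters-only tokenizer (a hypothetical simplification; real studies use proper tokenization). The function name `legomena` is illustrative only.

```python
import re
from collections import Counter

def legomena(text: str) -> dict[int, list[str]]:
    """Group word types by frequency 1..4 (hapax, dis, tris, tetrakis legomena)."""
    # Naive tokenization (assumption): lowercase, keep only a-z and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # For each frequency k, list the word types occurring exactly k times.
    return {k: sorted(w for w, c in counts.items() if c == k) for k in range(1, 5)}

groups = legomena("the cat sat on the mat and the dog sat too")
print("hapax legomena:", groups[1])
print("dis legomena:", groups[2])
print("tris legomena:", groups[3])
```

For the sample sentence, "the" is the only tris legomenon and "sat" the only dis legomenon; every other type is a hapax.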

Journal: Computational Linguistics 2010
Fan Fengxiang

In the known literature, hapax legomena in an English text or a collection of texts account for roughly 50% of the vocabulary. This kind of constancy is baffling. The 100-million-word British National Corpus was used to study this phenomenon. The results reveal that the hapax/vocabulary ratio follows a U-shaped pattern. Initially, as the size of text increases, the hapax/vocabulary ratio d...
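The U-shaped finding concerns how the ratio of hapaxes to vocabulary size changes as a text grows. A minimal sketch of measuring that ratio at fixed intervals over a token stream is shown below, assuming pre-tokenized input; the function name `ratio_curve` and the `step` parameter are hypothetical, and this is not the tokenization or sampling scheme used in the cited study. The counts are updated incrementally so that a large corpus is processed in a single pass.

```python
from collections import Counter

def ratio_curve(tokens, step):
    """Return (text_size, hapax/vocabulary ratio) after every `step` tokens."""
    counts = Counter()
    hapaxes = 0  # number of types seen exactly once so far
    curve = []
    for i, tok in enumerate(tokens, start=1):
        c = counts[tok]  # 0 for an unseen type (Counter does not insert on lookup)
        if c == 0:
            hapaxes += 1  # new type becomes a hapax
        elif c == 1:
            hapaxes -= 1  # a hapax has just been seen a second time
        counts[tok] = c + 1
        if i % step == 0:
            curve.append((i, hapaxes / len(counts)))  # len(counts) = vocabulary size
    return curve

curve = ratio_curve("a b a c b d".split(), step=2)
print(curve)
```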

Journal: CoRR 1995
R. Harald Baayen, Richard Sproat

Given a previously unseen form that is morphologically n-ways ambiguous, what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomena, the forms that occur exactly once in a corpus. This result has important impli...

2006
Bettina Schrader

We present an alignment strategy that specifically deals with the correct alignment of rare German nominal compounds to their English multiword translations. It recognizes compounds and multiwords based on their character lengths and on their most frequent POS patterns, and aligns them based on their length ratios. Our approach is designed on the basis of a data analysis of roughly 500 German ha...
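A rough sketch of the length-ratio idea follows, assuming the English sentence has already been chunked into POS-tagged candidate multiwords. The accepted POS patterns, the minimum compound length of 10 characters, the expected length ratio of 1.0, and the log-distance score are all hypothetical choices for illustration; the abstract does not state the actual values or scoring function.

```python
import math

# Hypothetical English POS patterns accepted as multiword translations.
ALLOWED_PATTERNS = {("NN", "NN"), ("JJ", "NN"), ("NN", "IN", "NN")}

def align_compound(compound, candidates, expected_ratio=1.0, min_compound_len=10):
    """Return the candidate multiword whose character length best matches the compound.

    compound: a German token, treated as a compound only if it is long enough.
    candidates: list of (words, pos_tags) pairs from the English sentence.
    expected_ratio: assumed English/German character-length ratio.
    """
    if len(compound) < min_compound_len:
        return None  # too short to be treated as a compound (assumed heuristic)
    best, best_score = None, math.inf
    for words, tags in candidates:
        if tuple(tags) not in ALLOWED_PATTERNS:
            continue  # not one of the accepted multiword POS patterns
        en_len = sum(len(w) for w in words)  # character length, spaces ignored
        # Distance of the observed length ratio from the expected one, in log space.
        score = abs(math.log(en_len / len(compound)) - math.log(expected_ratio))
        if score < best_score:
            best, best_score = " ".join(words), score
    return best

candidates = [
    (["disease"], ["NN"]),
    (["childhood", "disease"], ["NN", "NN"]),
    (["disease", "of", "children"], ["NN", "IN", "NN"]),
]
print(align_compound("Kinderkrankheit", candidates))
```

Working in log space makes a candidate that is twice as long and one that is half as long equally distant from the expected ratio.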

Journal: Computational Linguistics 1996
R. Harald Baayen, Richard Sproat

Given a form that has not been seen in a sufficiently large training corpus and that is morphologically n-ways ambiguous (serves n different lexical functions), what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomen...
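The hapax-based estimator described in these two abstracts can be sketched as follows, assuming a corpus already tagged with (form, function) pairs and a predicate that picks out the ambiguous form class. The function `hapax_prior`, the toy Dutch-like `-en` data, and the tag names `N` and `V` are hypothetical illustrations, not the authors' data or code.

```python
from collections import Counter

def hapax_prior(tagged_tokens, in_class):
    """Estimate P(function) for unseen forms of a class from its hapax legomena.

    tagged_tokens: iterable of (word_form, function) pairs from a tagged corpus.
    in_class: predicate selecting the ambiguous form class (e.g. a suffix test).
    """
    tagged_tokens = list(tagged_tokens)
    form_freq = Counter(form for form, _ in tagged_tokens)
    # Count functions only for class members that occur exactly once.
    funcs = Counter(func for form, func in tagged_tokens
                    if form_freq[form] == 1 and in_class(form))
    total = sum(funcs.values())
    return {f: n / total for f, n in funcs.items()} if total else {}

# Toy tagged corpus (hypothetical): N = plural noun, V = verb.
corpus = [("lopen", "V"), ("lopen", "V"), ("boeken", "N"), ("fietsen", "N"),
          ("zwemmen", "V"), ("huizen", "N"), ("de", "DET")]
prior = hapax_prior(corpus, lambda w: w.endswith("en"))
print(prior)
```

Computing the same distribution over all tokens of the class would let the repeated form "lopen" raise the verb share from 0.25 to 0.5; restricting the count to hapaxes is what the abstracts argue gives the better estimate for unseen forms.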

2008
George R. S. Weir, Toshiaki Ozasa

In this paper we describe our analysis of vocabulary across three sets of Japanese ESL texts. We focus on frequency analysis of individual words and multiword sequences (n-grams), with cross-comparisons of 2-, 3- and 4-gram multiword sequences. In addition, we consider the degree of emphasis on multiword vocabulary that is evident in each textbook corpus. This is derived from analysis of the ...
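Frequency analysis of n-grams, and a simple cross-comparison between two textbook corpora, can be sketched as below. This assumes whitespace-tokenized text; the helper names `ngram_counts` and `shared_ngrams` and the sample sentences are hypothetical, and the set intersection is only one possible way to compare corpora, not necessarily the one used in the paper.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-word sequences (n-grams) in a token list."""
    # zip over n staggered copies of the list yields each n-gram as a tuple.
    return Counter(zip(*(tokens[i:] for i in range(n))))

def shared_ngrams(tokens_a, tokens_b, n):
    """Return the n-gram types that occur in both corpora."""
    return set(ngram_counts(tokens_a, n)) & set(ngram_counts(tokens_b, n))

book_a = "i am a student . i am a teacher".split()
book_b = "you are a student".split()
for n in (2, 3, 4):
    print(n, ngram_counts(book_a, n).most_common(2))
print(shared_ngrams(book_a, book_b, 2))
```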

2015
Emmanuel Morin, Amir Hazem, Florian Boudin, Elizaveta Loginova Clouet

This paper describes the LINA system for the BUCC 2015 shared task. Following Enright and Kondrak (2007), our system identifies comparable documents by collecting counts of hapax words. We extend this method by filtering out document pairs sharing target documents using pigeonhole reasoning and cross-lingual information.
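A minimal sketch of hapax-based document pairing is shown below. It scores each source/target pair by the number of words that are hapaxes in both documents (for example names and numbers that survive translation), then keeps pairs greedily so that no target document is used twice. The greedy one-to-one constraint is an assumed reading of the "pigeonhole" filtering, the cross-lingual information mentioned in the abstract is not modeled, and the function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import product

def hapaxes(tokens):
    """Word types occurring exactly once in a document."""
    return {w for w, c in Counter(tokens).items() if c == 1}

def pair_documents(sources, targets):
    """Greedy one-to-one pairing by number of shared hapax words.

    sources, targets: dicts mapping document id -> token list.
    Returns (source_id, target_id, shared_hapax_count) triples.
    """
    src_h = {d: hapaxes(t) for d, t in sources.items()}
    tgt_h = {d: hapaxes(t) for d, t in targets.items()}
    scored = sorted(((len(src_h[s] & tgt_h[t]), s, t)
                     for s, t in product(src_h, tgt_h)),
                    key=lambda x: -x[0])  # highest overlap first (stable for ties)
    used_s, used_t, pairs = set(), set(), []
    for score, s, t in scored:
        if score == 0:
            break  # remaining pairs share no hapax words
        if s in used_s or t in used_t:
            continue  # each document may appear in at most one pair
        used_s.add(s)
        used_t.add(t)
        pairs.append((s, t, score))
    return pairs

sources = {"en1": "the 2015 BUCC shared task in Beijing".split(),
           "en2": "Python 3.4 release notes".split()}
targets = {"fr1": "la tâche partagée BUCC 2015 à Beijing".split(),
           "fr2": "notes de version Python 3.4".split()}
pairs = pair_documents(sources, targets)
print(pairs)
```

Matching here is exact and case-sensitive, so only identical strings count as shared hapaxes.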
