Search results for: n grams
Number of results: 982486
Most popular topic models (such as Latent Dirichlet Allocation) rest on an underlying bag-of-words assumption. However, text is a sequence of discrete word tokens, and without considering the order of words (in other words, the nearby context in which a word is located), the meaning of language cannot be accurately captured by word co-occurrences alone. In this sense, collocation...
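To make the bag-of-words limitation concrete, here is a minimal illustrative sketch (not taken from the paper): two sentences with the same word counts but opposite meanings are indistinguishable as bags, while their bigrams (adjacent word pairs) differ. The function names and example sentences are hypothetical.

```python
from collections import Counter

def bag_of_words(tokens):
    # Order-insensitive: only how often each token occurs survives.
    return Counter(tokens)

def bigrams(tokens):
    # Order-sensitive: adjacent pairs keep some of the local context.
    return Counter(zip(tokens, tokens[1:]))

a = "the dog bit the man".split()
b = "the man bit the dog".split()

print(bag_of_words(a) == bag_of_words(b))  # True: identical bags of words
print(bigrams(a) == bigrams(b))            # False: ("dog", "bit") vs ("man", "bit")
```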
A language-independent means of gauging topical similarity in unrestricted text is described. The method combines information derived from n-grams (consecutive sequences of n characters) with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents. No prior information about document content or language is requir...
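A minimal sketch of the general idea, under stated assumptions: each document becomes a sparse vector of character trigram counts, and similarity is the cosine of the angle between two such vectors. No tokenizer or language-specific resource is involved. The choice of n = 3, the lowercasing, and the example strings are assumptions for illustration; the paper's exact weighting scheme is not shown in the snippet.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    # Consecutive sequences of n characters; no tokenization or language knowledge.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(u, v):
    # Cosine similarity between two sparse count vectors stored as Counters.
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * v[g] for g, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

doc_a = "n-gram models for text retrieval"
doc_b = "retrieval of text with n-gram models"
doc_c = "Embedded malware hides inside benign files"

sim_ab = cosine(char_ngrams(doc_a), char_ngrams(doc_b))
sim_ac = cosine(char_ngrams(doc_a), char_ngrams(doc_c))
print(sim_ab > sim_ac)  # True: a and b share many trigrams, a and c almost none
```

Because the vectors are sparse dictionaries, sorting or clustering a collection only touches the n-grams each document actually contains.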
Embedded malware is a recently discovered security threat that allows malcode to be hidden inside a benign file. It has been shown that embedded malware is not detected by commercial antivirus software even when the malware signature is present in the antivirus database. In this paper, we present a novel anomaly detection scheme to detect embedded malware. We first analyze byte sequences in ben...
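The snippet is cut off before it describes the detection features, so the following is only a hedged sketch of a common building block for analyzing byte sequences: counting byte n-grams in raw file contents and normalizing them into a frequency profile. The function names and n = 2 are assumptions; how the paper scores anomalies against such profiles is not reproduced here.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    # Count every run of n consecutive bytes; slicing bytes yields hashable bytes keys.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def ngram_profile(data: bytes, n: int = 2) -> dict:
    # Relative frequencies, so that files of different sizes can be compared.
    counts = byte_ngrams(data, n)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

sample = b"\x00\x01\x00\x01\xff"
print(byte_ngrams(sample))
```

A profile built from known benign files could then serve as a reference that a test file's profile is compared against; the distance measure and threshold would be further assumptions.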
The paper introduces and discusses the concept of syntactic n-grams (sn-grams), which can be applied instead of traditional n-grams in many NLP tasks. Sn-grams are constructed by following paths in syntactic trees, so sn-grams allow syntactic knowledge to be brought into machine learning methods. Still, prior syntactic parsing is necessary for their construction. We applied sn-grams in the task of authorship at...
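A minimal sketch of the path-following idea, assuming a dependency tree is already available as a list of head indices (here written by hand in place of a parser's output, with -1 marking the root). Each downward path of n nodes from a head through its dependents yields one sn-gram. This covers only simple, non-branching paths; the function names and the example tree are hypothetical.

```python
from typing import List, Tuple

def children(heads: List[int]) -> List[List[int]]:
    # Build child lists from head indices (-1 marks the root).
    kids: List[List[int]] = [[] for _ in heads]
    for i, h in enumerate(heads):
        if h >= 0:
            kids[h].append(i)
    return kids

def sn_grams(words: List[str], heads: List[int], n: int) -> List[Tuple[str, ...]]:
    # Every downward path of n nodes (head -> dependent -> ...) yields one sn-gram.
    kids = children(heads)
    result: List[Tuple[str, ...]] = []

    def walk(node: int, path: List[int]) -> None:
        path = path + [node]
        if len(path) == n:
            result.append(tuple(words[i] for i in path))
            return
        for child in kids[node]:
            walk(child, path)

    for start in range(len(words)):
        walk(start, [])
    return result

# Hand-written tree for "the dog bit the man": "bit" is the root.
words = ["the", "dog", "bit", "the", "man"]
heads = [1, 2, -1, 4, 2]
print(sn_grams(words, heads, 2))
```

Note that ("bit", "man") appears as an sn-gram even though the two words are not adjacent in the sentence, so it would never occur as an ordinary word bigram.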
In this work, we carried out a study using n-grams to classify sentiment with different machine learning and deep learning methods. We used an approach that combines existing techniques for the problem of predicting sequence tags, in order to understand the advantages of, and the problems confronted when, using unigrams, bigrams, and trigrams to analyse economic texts. Our aim is to fill a gap by evaluating the performance of these features on texts in ...
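As a hedged illustration of what unigram, bigram, and trigram features look like in practice, this sketch collects all three into one sparse count map per text, which a classifier could then consume after vectorization. The whitespace tokenization and the example phrase are assumptions; the study's actual preprocessing and models are not specified in the snippet.

```python
from collections import Counter

def ngram_features(tokens, n_values=(1, 2, 3)):
    # Combine unigram, bigram and trigram counts into one sparse feature map.
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features

feats = ngram_features("profits rose sharply".split())
print(sorted(feats))
```

Restricting `n_values` to `(1,)`, `(2,)`, or `(3,)` gives the separate unigram, bigram, or trigram feature sets that such a comparison would evaluate.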
In this work we investigate the usefulness of n-grams for document indexing in text categorization (TC). We call an n-gram a set g_k of n word stems, and we say that g_k occurs in a document d_j when a sequence of words appears in d_j that, after stop word removal and stemming, consists exactly of the n stems in g_k, in some order. Previous research has investigated the use of n-grams (or some varia...
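The occurrence definition above can be sketched as follows, under explicit assumptions: the stop-word list and the suffix-stripping "stemmer" are hypothetical toy stand-ins (a real system would use a proper stemmer such as Porter's), and a window counts only if its n stems are distinct, which is one reading of "consists exactly of the n stems in g_k". Representing each n-gram as a `frozenset` makes it order-insensitive, matching "in some order".

```python
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "for"}  # hypothetical tiny list

def toy_stem(word):
    # Hypothetical crude suffix stripper standing in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_ngrams(text, n=2):
    # Stop words are removed and words stemmed before the n-word window slides.
    stems = [toy_stem(w) for w in text.lower().split() if w not in STOP_WORDS]
    grams = set()
    for i in range(len(stems) - n + 1):
        g = frozenset(stems[i:i + n])
        if len(g) == n:  # keep only windows made of n distinct stems
            grams.add(g)
    return grams

# Different surface order and inflection, same stem set {"index", "document"}.
print(stem_ngrams("indexing of documents") == stem_ngrams("documents indexed"))  # True
```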
N-grams have been widely investigated for a number of text processing tasks. However, n-gram-based systems often labor under the large memory requirements of naïve storage of the large vectors that describe the many n-grams that could potentially appear in documents. This problem becomes more severe as the number of documents (and hence the number of vectors to store and process) rises. A natura...
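To quantify the storage problem described above, here is a small illustrative sketch; it does not reproduce the solution the truncated abstract goes on to propose. Assuming text is normalized to 26 lowercase letters plus space, a naïve dense character-trigram vector needs one slot for every possible trigram, while a sparse map stores only the trigrams that actually occur. The alphabet choice and the example sentence are assumptions.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "  # assumption: 27 symbols after normalization

def normalize(text):
    # Keep only the assumed alphabet so the trigram space has a known size.
    return "".join(ch for ch in text.lower() if ch in ALPHABET)

def sparse_trigrams(text):
    # Store only the trigrams that actually occur, not every possible one.
    t = normalize(text)
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

possible = len(ALPHABET) ** 3  # slots in a naive dense trigram vector per document
doc = "n-grams have been widely investigated for a number of text processing tasks"
observed = sparse_trigrams(doc)
print(possible)       # 19683
print(len(observed))  # far fewer distinct trigrams actually present
```

The gap widens quickly with larger n or alphabets (for example, byte n-grams have 256^n possible values), which is why dense per-document vectors become impractical as the collection grows.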