Empirical Text Mining for Genre Detection
Abstract
In this paper, we report on a preliminary study aimed at identifying patterns that characterize the genre of Greek texts. We address four distinct genre types, record their observable stylistic elements, and indicate how these elements can be exploited for automatic genre-based document classification. Our findings demonstrate that texts contain lexical features with discriminative power as far as genre is concerned; however, modeling those features so that they can be exploited by computer-based applications is still at an early stage.
Similar resources
Genre-Based Stages Classification for Polarity Analysis
Polarity detection of Online Reviews is one of the most popular tasks related to Opinion Mining. Given that most state-of-the-art solutions ignore the structural aspects of a review, we present an approach to polarity detection that, first, distinguishes stages in the genre of hotel reviews and, subsequently, evaluates the usefulness of each type of stage in the determination of the polarity of...
Overview of the PAN/CLEF 2015 Evaluation Lab
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...
Text genres in information organization
Introduction. Text genres used by so-called information organizers in the processes of information organization in information systems were explored in this research. Method. The research employed text genre socio-functional analysis. Five genre groups in information organization were distinguished. Every genre group used in information organization is described. Empirical evidence for genre gr...
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to authors’ rights, plagiarism detection is one of the critical problems in text mining and one that interests many researchers. It is considered a serious issue in institutions of higher education. Existing language-independent tools do not yield reliable results because they ignore the special features of each language. Considering the paucit...
Automatic Detection of Text Genre
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection b...