Detecting Family Resemblance: Automated Genre Classification
Abstract
This paper presents results on the automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the roles of visual layout, stylistic features, and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of documents drawn from the nineteen most popular genres found in our experimental data set.
Similar resources
Genre theory and family resemblance - revisited *
In the following discussion I will examine the application of Wittgenstein’s concept of family resemblance to genre theory. Despite its popularity among literary theorists, there is sometimes a discrepancy between the loose concept of family resemblance, at least in its negative-radical version, and the practical assumptions made about genres. In order to overcome the inadequacies of existing a...
Variation of Word Frequencies across Genre Classification Tasks
This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step in realising the general automatic extraction of semantic metadata essential to the efficient management and use o...
Searching for Ground Truth: A Stepping Stone in Automating Genre Classification
This paper examines genre classification of documents and its role in enabling the effective automated management of digital documents by digital libraries and other repositories. We have previously presented genre classification as a valuable step toward achieving automated extraction of descriptive metadata for digital material. Here, we present results from experiments using human labellers,...
An automated approach to analysis and classification of Crypto-ransomwares’ family
There is no doubt that malicious programs are one of the permanent threats to computer systems. Malicious programs disrupt the normal operation of computer systems to serve their own purposes. There is also a type of malware, known as ransomware, that restricts victims' access to their computer systems, either by encrypting the victim's files or by locking the system. Despite ...
3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
Journal title:
- Data Science Journal
Volume 6
Publication date: 2007