Automatic Classification of Scientific Records using the German Subject Heading Authority File (SWD)
Authors
Abstract
This paper presents an automatic text classification method that does not require training documents. The method uses the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library. The SWD was recently enriched with notations of the Dewey Decimal Classification (DDC), which makes it possible to use the subject headings as textual representations of the DDC notations. In essence, we derive the classification of a text from the classification of its words as given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories, with mean reciprocal rank and recall as evaluation measures. A direct comparison with a machine learning method showed that this method is competitive. We conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles.
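The idea of deriving a text's classification from the classification of its words can be illustrated with a minimal sketch. This is not the authors' implementation: the toy thesaurus, the simple vote counting, and the function names `rank_ddc` and `mean_reciprocal_rank` are all assumptions made for illustration, and the entries are not taken from the SWD.

```python
from collections import Counter

# Hypothetical toy thesaurus: subject heading (lowercased) -> DDC notations.
# Illustrative entries only; they are not taken from the SWD.
TOY_THESAURUS = {
    "bibliothek": ["020"],
    "klassifikation": ["025"],
    "physik": ["530"],
    "quantenmechanik": ["530"],
    "chemie": ["540"],
}


def rank_ddc(text, thesaurus=TOY_THESAURUS):
    """Rank DDC notations by how many words of the text map to them."""
    votes = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?()\"'")
        for notation in thesaurus.get(word, []):
            votes[notation] += 1
    # Sort by descending vote count, then by notation for a stable order.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [notation for notation, _ in ordered]


def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the correct notation; 0 when it is missing."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        if correct in ranked:
            total += 1.0 / (ranked.index(correct) + 1)
    return total / len(gold)


if __name__ == "__main__":
    ranking = rank_ddc("Physik und Quantenmechanik in der Chemie")
    print(ranking)
    print(mean_reciprocal_rank([ranking, ranking], ["530", "540"]))
```

A real system would presumably need word normalization and some weighting of matches; plain vote counting is used here only to keep the sketch short.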
Related resources
Matching Multi-lingual Subject Vocabularies
Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, unified access to heterogeneous collections is difficult. Things are even worse in an international context when concepts have labels in different languages. In order to ov...
Authority Control of People and Organizations on the Semantic Web
Authors and documents with identical titles are common in the digital library environment. In order to manage identities correctly, authority control is used by library and information scientists for disambiguating and cross-referencing entity names. We argue that the benefits of traditional authority control can be enhanced by using techniques and technologies of the Semantic Web, leading to s...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, and evaluates the extracted keywords' similarity with a gold standard and users' views of them. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted with SNRs of 5 dB and 10 dB. Test and ...
Library of Congress Classification as linked data
In 2009 and in 2011, the Library of Congress made two of its largest authority files, Subject Headings and Names, available as linked data via LC's linked data service, id.loc.gov. Both are offered in MADS/RDF and SKOS. It is LC's objective, in 2012, to publish another of its largest authority files as linked data: LC Classification. However, whereas the source records for Subject He...