Assigning Library of Congress Classification Codes to Books Based Only on their Titles
Authors
Abstract
Many publishers follow the Library of Congress Classification (LCC) scheme and print a classification code on the first pages of their books. This is useful for many libraries worldwide because it makes it possible to search and retrieve books by content type, and the scheme has become a de facto standard. However, not every book has been pre-classified by the publisher; in particular, in many universities, new dissertations have to be classified manually. Although there are many systems available for automatic text classification, all of them use extensive information that is not always available, such as the index, the abstract, or even the whole content of the work. In this work, we present our experiments on supervised classification of books using only their titles, which would allow massive automatic indexing. We propose a new text comparison measure that mixes two well-known text classification techniques: the Lesk voting scheme and Term Frequency (TF). In addition, we experiment with different weighting schemes as well as logical-combinatorial methods such as ALVOT in order to determine the contribution of the title to correct classification. We found this contribution to be approximately one third: after training on a total of 489,726 samples from one major branch (Q) of the LCC catalogue, we correctly classified 36% (averaged per branch) of a total of 122,431 previously unseen titles.
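As a rough illustration of how a title-only classifier that combines word-overlap voting with term frequency might look, the following Python sketch scores each class by the words a new title shares with the training titles of that class, weighting each shared word by its relative frequency in the class. It is not the measure proposed in the paper, whose exact definition is not given in this abstract; the function names, the stopword list, and the sample titles are all hypothetical, and ALVOT and the weighting variants are not covered.

```python
from collections import Counter, defaultdict

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(title):
    """Lowercase a title and keep alphabetic tokens that are not stopwords."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in title.lower())
    return [word for word in cleaned.split() if word not in STOPWORDS]


def train(samples):
    """Build per-class relative term frequencies from (title, class_code) pairs."""
    counts = defaultdict(Counter)
    for title, code in samples:
        counts[code].update(tokenize(title))
    profiles = {}
    for code, counter in counts.items():
        total = sum(counter.values())
        profiles[code] = {term: n / total for term, n in counter.items()}
    return profiles


def classify(title, profiles):
    """Return the class with the highest TF-weighted overlap, or None if no word matches."""
    tokens = set(tokenize(title))
    scores = {
        code: sum(tf.get(token, 0.0) for token in tokens)
        for code, tf in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical training titles labelled with LCC subclasses of branch Q
# (QA: mathematics, QC: physics, QD: chemistry).
SAMPLES = [
    ("Introduction to linear algebra", "QA"),
    ("Algebra and number theory", "QA"),
    ("Quantum mechanics and thermodynamics", "QC"),
    ("Principles of optics", "QC"),
    ("Organic chemistry of polymers", "QD"),
]

if __name__ == "__main__":
    profiles = train(SAMPLES)
    print(classify("Topics in linear algebra", profiles))
    print(classify("Medieval poetry", profiles))
```

Returning `None` when no title word matches any class is a design choice of this sketch: it lets unmatched titles be counted separately from misclassified ones.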
Similar resources
The Utility of Information Extraction in the Classification of Books
We describe work on automatically assigning classification labels to books using the Library of Congress Classification scheme. This task is non-trivial due to the volume and variety of books that exist. We explore the utility of Information Extraction (IE) techniques within this text categorisation (TC) task, automatically extracting structured information from the full text of books. Experime...
Assigning Library Classification Numbers to People on the Web
To help users select and understand people during searches for them, we present a method of assigning Nippon Decimal Classification (NDC), which is a system of library classification numbers, to people on the web. By assigning NDC numbers to people, we can not only assign labels to people but also build an NDC-based people-search directory. We use a relative index in NDC, which lists the related...
A Survey of Published Library and Information Science Books (1358-84) and Their Correspondence with the Content of the Approved Course Syllabi
Introduction: It has been five decades since formal education in library and information science began in Iran. Over this half century the field has developed significantly, as is shown by comparing the numbers of librarianship education centers and publications over the past decades and by reviewing programs against the approved syllabi in this field. This research has bee...
Translation of Psychology Book Titles: A Skopos theory perspective
The focus of the current research is the translation of psychology book titles. There are numerous studies in the field of title translation, but they are limited in the Persian context. The aim of this study is thus to investigate the translation strategies used by Persian translators when transferring English psychology book titles into Persian. To achieve this objective, 245 titles of translated...
We’re E‐Preferred. Why Did We Get That Book in Print?
While California State University, Fullerton’s Pollak Library has an e‐preferred approval plan for all subject areas, the Library still continues to receive a number of print titles on approval. However, 25% of the print approval books received in the 2013–14 fiscal year were published by only eight publishers, all of which actively publish their books in e‐format. This paper investigates the r...
Journal:
- Informatica (Slovenia)
Volume 34
Publication date: 2010