Retrospective Document Conversion Application to the Library Domain
Abstract
This paper describes a framework for retrospective document conversion in the library domain. Drawing on the experience and insight gained from projects launched over the present decade by the European Commission, it outlines the requirements for solving the problem of retroconversion and traces the main phases of the associated processing. To highlight the main problems encountered in this area, the paper also outlines studies conducted by our group in the MORE project for the retroconversion of old catalogues belonging to two different libraries: the French National Library and the Belgian Royal Library. For the French Library, the idea was to study the feasibility of a recognition approach that avoids OCR and bases the strategy mainly on visual features. The challenge was to recognize a logical structure from its physical aspects. The modest results obtained in experiments for this first study led us, in the second study, to base the structural recognition methodology more on logical aspects by focusing the analysis on the content. Furthermore, for the Belgian references, the aim was to convert reference catalogues into the more conventional UNIMARC format while respecting industrial constraints. Without manual intervention, a correct recognition rate of 75% was obtained on 11 catalogues containing about 4548 references.
The success of library automation, resulting in user-friendly on-line catalogues integrated with the web and other circulation-system facilities, has created an urgent need for retroconversion of the older parts of catalogues [2,14,28,31]. (A catalogue is a list of bibliographic descriptions of works.) As users become familiar with the new catalogue medium, documents not registered in machine-readable form become "invisible" and unreadable. For many libraries, this has meant the relegation of an important part of their rich stock of documents to a state of inaccessibility.
Such an obvious waste of library collections, together with the cost difference between manual handling and an equivalent set of automatic routines, has made a strong case for converting a library's entire collection of works to machine-readable records, in the interest of ensuring efficient use of the investment in the new technology. This has led to a search for cost-effective tools for converting old catalogues into machine-readable form. This search has not been limited to the conversion problem alone but has been extended to embrace other objectives, such as ensuring very high rates of …
Similar resources
International Journal on Document Analysis and Recognition Manuscript No. Retrospective Document Conversion Application to the Library Domain
This paper describes a framework for retrospective document conversion in the library domain. Drawing on the experience and insight gained from projects launched over the present decade by the European Commission, it outlines the requirements for solving the problem of retroconversion and traces the main phases of associated processing. To highlight the main problems encountered in this area, t...
Retrospective Conversion of Old Bibliographic Catalogues
This paper describes a framework for retrospective document conversion in the library domain. Drawing on the experience and insight gained from the MORE project launched over the present decade by the European Commission, it outlines the requirements for solving the problem of retroconversion of old catalogues into UNIMARC format. Based on OCR techniques and automatic structure recognition, the sy...
Sampling Rate Conversion in the Discrete Linear Canonical Transform Domain
Sampling rate conversion (SRC) is one of the important issues in modern sampling theory. It can be realized by up-sampling, filtering, and down-sampling operations, which incur high computational complexity. Although some efficient algorithms have been presented for sampling rate conversion, they all need to compute the N-point original signal to obtain the up-sampled or down-sampled signal in the tim...
A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives
Mass digitization of document collections with further processing and semantic annotation is an increasing activity among libraries and archives at large for preservation, browsing and navigation, and search purposes. In this paper we propose a software architecture for the process of converting high volumes of document collections to semantically annotated digital libraries. The proposed archi...
Structural Classification for Retrospective Conversion of Documents
This paper describes the structural classification method used in a strategy for retrospective conversion of documents. This strategy consists of a cycle in which document analysis and document understanding interact. This cycle is initialized by the extraction of the outline of the layout and logical structures of the document. Then, each iteration of the cycle consists in the detection and t...