Extracting Text from PostScript

Authors

  • Craig G. Nevill-Manning
  • Todd R. Reed
  • Ian H. Witten
Abstract

Figure 2: A PostScript document and the text extracted from it.

The sample document reads: "Finding structure in multiple streams of data is an important problem. Consider the streams of data flowing from a robot's sensors, the monitors in an intensive care unit, or periodic measurements of various indicators of the health of the economy. There is clearly utility in determining how current and past values in those streams are related to future values. We formulate the problem of finding structure in multiple streams of categorical data as search over the space of dependencies, unexpectedly frequent or ..."

Annotations in the figure: "Internal data in parentheses", "Word fragments", "No spaces".

(a) Text extracted after redefining /show { print } def:
Findingstructureinmultiplestreamsofdataisanimportantproblem.Considerthestreamsofdata§owingfromarobot'ssensors,themonitorsinanintensivecareunit,orperiodicmeasurementsofvariousindicatorsofthehealthoftheeconomy.Thereisclearlyutilityindetermininghowcurrentandpastvaluesinthosestreamsarerelatedtofuturevalues

(b) Text extracted after redefining /show { print ( ) print } def:
Finding structure in m ultiple streams of data is an importan t problem. Consider the streams of data §o wingfrom a rob ot's sensors, the monitors in an in tensiv ecare unit, or p erio dic measuremen ts of v ariousindicators of the health of the econom y . There isclearly utilit y in determining ho w curren t and past values in those streams are related to future v alues
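The two redefinitions of show in Figure 2 fail in opposite directions: printing strings back to back loses the spaces between words, while printing a space after every string splits words that the document shows in several kerned pieces. The Python sketch below replays a hypothetical sequence of show calls through both naive strategies and through a simple gap-based rule that inserts a space only when the horizontal gap between consecutive strings exceeds a threshold or the baseline changes. The ShowEvent record, the coordinates in SAMPLE, and the 1.5-point threshold are illustrative assumptions; this is not presented as the paper's own method or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShowEvent:
    """One hypothetical call to the PostScript show operator."""
    text: str     # the string operand passed to show
    x: float      # x coordinate where the string starts (points)
    y: float      # baseline y coordinate (points)
    width: float  # advance width of the rendered string (points)


def naive_no_spaces(events: list[ShowEvent]) -> str:
    # Mimics /show { print } def: strings are printed back to back.
    return "".join(e.text for e in events)


def naive_always_space(events: list[ShowEvent]) -> str:
    # Mimics /show { print ( ) print } def: a space follows every string,
    # so kerned fragments of a single word get split apart.
    return "".join(e.text + " " for e in events)


def gap_heuristic(events: list[ShowEvent], space_threshold: float = 1.5) -> str:
    # Hypothetical rule: emit a space only if the horizontal gap between the
    # end of the previous string and the start of the next one exceeds
    # space_threshold points, or if the baseline changes (a new line).
    pieces: list[str] = []
    prev: ShowEvent | None = None
    for e in events:
        if prev is not None:
            gap = e.x - (prev.x + prev.width)
            if e.y != prev.y or gap > space_threshold:
                pieces.append(" ")
        pieces.append(e.text)
        prev = e
    return "".join(pieces)


# Invented coordinates: "m" and "ultiple" are shown separately with only a
# small kern between them, like the word fragments in Figure 2.
SAMPLE = [
    ShowEvent("Finding", 0.0, 700.0, 30.0),
    ShowEvent("structure", 33.0, 700.0, 38.0),
    ShowEvent("in", 74.0, 700.0, 8.0),
    ShowEvent("m", 85.0, 700.0, 8.0),
    ShowEvent("ultiple", 93.4, 700.0, 27.0),
    ShowEvent("streams", 0.0, 688.0, 32.0),  # next line: baseline moved down
]

if __name__ == "__main__":
    print(repr(naive_no_spaces(SAMPLE)))
    print(repr(naive_always_space(SAMPLE)))
    print(repr(gap_heuristic(SAMPLE)))
```

A fixed threshold is a simplification; a practical tool would more likely scale it with the current font size, and it would also have to map odd characters such as the § that replaces "fl" in both outputs above, which this sketch ignores. In an actual redefinition of show, the start position and width of each string could be obtained with the standard PostScript operators currentpoint and stringwidth.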

Similar resources

Semi automatic indexing of PostScript files using Medical Text Indexer in medical education.

At Albert Einstein College of Medicine, a large part of the online lecture material is made up of PostScript files. As the collection grows it becomes essential to create a digital library to have easy access to relevant sections of the lecture material that is full-text indexed; to create this index it is necessary to extract all the text from the document files that constitute the originals of the lect...

Semi-Structured File Analysis for Information Integration

This paper describes a PostScript file analyzer for extracting information from Web PostScript documents. Our motivation for studying this problem is the building of an information integration system. The information extracted from these semi-structured files can be used to model the contents of Web information sources and to define semantic links between items of information. Extracted informat...

Presenting a Model for Extracting Information from Textual Documents Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and economy, enormous quantities of documents become available. To extract useful information from such textual data, text mining techniques have been used. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Better PostScript than PostScript: portable self-extracting PostScript representation of scanned document images

We present a Pattern Matching Based Compression (PMBC) system which compresses scanned documents into PostScript format. The output of a PMBC system is a pattern library, or font, and a series of pattern indices and positions. PMBC represents scanned documents in the same way that word processing programs represent their output pages. We explore various PostScript representations of this output...

Building a Public Digital Library Based on Full-Text Retrieval

Digital libraries are expensive to create and maintain, and generally restricted to a particular corporation or group of paying subscribers. While many indexes to the World Wide Web are freely available, the quality of what is indexed is extremely uneven. The digital analog of a public library—a reliable, quality, community service—has yet to appear. This paper demonstrates the feasibility of a...

Journal title:
  • Softw., Pract. Exper.

Volume: 28

Publication date: 1998