Similar resources
Intelligent Wrapping from PDF Documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. The semi-structured form of web pages, coupled with the availability of business-relevant data, has led to the availability of several established products on the market for wrapping data from the Web. One such approach is the Lixto me...
Layout and Content Extraction for PDF Documents
Portable document format (PDF) is a common output format for electronic documents. Most PDF documents are untagged and do not have basic high-level document logical structural information, which makes the reuse or modification of the documents difficult. We developed techniques that identified logical components on a PDF document page. The outlines, style attributes and the contents of the logi...
C:\Documents and Settings\dept .PDF
A semi-empirical method is used to characterize the 3s²3p²–3s3p³ J = 2 transition array in P II. In this method Slater, spin–orbit, and radial parameters are fitted to experimental energy levels to obtain a description of the array in terms of LS-coupling basis vectors. The intermediate coupling (IC) and configuration interaction (CI) amplitudes so obtained are then used to predict the branchin...
Automatic indexing of PDF documents with ontologies
Indexing large bodies of data is necessary to enable satisfactory search results. Ontologies serve as fixed vocabularies to index data from different viewpoints. We describe how AIDAS, a software tool, automatically divides the source data (PDF documents) into reusable chunks, how it automatically indexes these chunks and stores them in a database to enable reuse.
Optimizing PDF output size of TeX documents
There are several tools for generating PDF output from a TeX document. By choosing the appropriate tools and configuring them properly, it is possible to reduce the PDF output size by a factor of 3 or even more, thus reducing document download times, hosting and archiving costs. We enumerate the most common tools, and show how to configure them to reduce the size of text, fonts, images and cros...
Journal
Journal title: The Stata Journal: Promoting communications on statistics and Stata
Year: 2014
ISSN: 1536-867X,1536-8734
DOI: 10.1177/1536867x1401400108