Web Document Analysis: How can Natural Language Processing Help in Determining Correct Content Flow?

نویسندگان

  • Hassan Alam
  • Fuad Rahman
  • Yuliya Tarnikova
چکیده

One of the fundamental questions for document analysis and subsequent automatic re-authoring solutions is the semantic and contextual integrity of the processed document. The problem is particularly severe in web document re-authoring as the segmentation process often creates an array of seemingly unrelated snippets of content without providing any concrete clue to aid the layout analysis process. This paper presents a generic technique based on natural language processing for determining 'semantic relatedness' between segments within a document and applies it to a web page reauthoring problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

Information Processing from Document Images

Analysis of document images for information extraction has become very prominent in recent past. Wide variety of information, which has been conventionally stored on paper is now being converted into electronic form for better storage and intelligent processing. This needs processing of documents using image analysis algorithms. Document image analysis differs from the conventional image proces...

متن کامل

nature of information literacy in elementary schools Case study of Persian literature in fourth grade

Background and Aim: Information literacy is a contextual concept that needs to be studied in different contexts like schools. Promoting reading literacy is a core instructional objectives of Persian literature curriculum and also a part of information literacy. Understanding Concept of information literacy helps us to understand information literacy in elementary schools and can implement it in...

متن کامل

Applying Natural Language Generation to Indicative Summarization

The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implement...

متن کامل

Knowledge Extraction for Information Retrieval

Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., YAGO, DBpedia), can improve the performances of the traditio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003