Information extraction for classified advertisements

Authors

  • David FAURE
  • Claire MORLON
Abstract

This report describes our work on the project part of the course Language Processing and Computational Linguistics. It presents a program, written in Java, that extracts six important pieces of information from French job advertisements. The inputs of the system are advertisements taken from the internet and converted to text files. The results presented in this paper show that the extraction mechanism is reliable and robust.

Similar resources

Accurate Unsupervised Learning of Field Structure Models for Information Extraction

The applicability of current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field structured extraction tasks, small amounts of prior knowledge can be used to effectively learn models in a primarily unsupervised fashion. Many text information sources exhibit a latent field structure: such documents can be viewed as...

A Content Analysis of Health and Safety Communications Among Internet-Based Sex Work Advertisements: Important Information for Public Health

BACKGROUND The capacity to advertise via the Internet continues to contribute to the shifting dynamics in adult commercial sex work. eHealth interventions have shown promise to promote Internet-based sex workers' health and safety internationally, yet minimal attention has been paid in Canada to developing such interventions. Understanding the information communicated in Internet-based sex work...

The Hidden Dynamics of Print-Online Competition in Classified Advertising Markets

Classified advertisements are an important revenue source for newspaper publishers and they constitute a large share of noneditorial content, highly valued by readers. By matching supply and demand in the corresponding markets for goods and services, publishers of classified advertisements serve as information brokers increasing transparency and driving market clearance. These days specific onl...

Discovering Fraud in Online Classified Ads

Classified ad sites routinely process hundreds of thousands to millions of posted ads, and only a small percentage of those may be fraudulent. Online scammers often go through a great amount of effort to make their listings look legitimate. Examples include copying existing advertisements from other services, tunneling through local proxies, and even paying for extra services using stolen accou...

Using Information Extraction to Classify Newspapers Advertisements

This paper presents a text classification procedure that has been developed in the context of an information extraction project. In the prototype that has been developed for this project, newspaper advertisements are processed by three main modules: first of all, a classification module associates a category to the advertisement. Then, a tagging module identifies textual information units that ...

Publication date: 2005