Learning Local Content Shift Detectors from Document-level Information
Abstract
Information-oriented document labeling is a special document multi-labeling task where the target labels refer to a specific piece of information rather than the topic of the whole document. Tasks of this kind are usually solved by looking up indicator phrases and analyzing their local context to filter out false positive matches. Here, we introduce an approach for machine learning local content shifters, which detect irrelevant local contexts using only the original document-level training labels. We handle content shifters in general, instead of learning a detector for a particular language phenomenon (e.g. negation or hedging), and form a single system for document labeling and content shift detection. In our experiments, the approach achieved a 24% error reduction compared to supervised baseline methods on three document labeling tasks.
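As a rough illustration of the indicator-phrase lookup with local-context filtering that the abstract mentions as the usual solution (not the learned content shift detector the paper proposes), a minimal sketch might look as follows. The cue list, the window size, the function name `label_document`, and the example sentences are all assumptions made for illustration; none of them come from the paper.

```python
import re

# Hypothetical hand-written cue words that shift the meaning of a match
# (negation, hedging, and similar). Not taken from the paper.
SHIFTER_CUES = {"no", "not", "without", "denies", "possible"}


def label_document(text, indicator, window=3):
    """Return True if `indicator` occurs at least once with no shifter cue
    among the `window` tokens immediately before it.

    Sentence boundaries are ignored in this sketch.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    phrase = indicator.lower().split()
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            # Local context: the few tokens preceding the matched phrase.
            context = tokens[max(0, i - window):i]
            if not any(tok in SHIFTER_CUES for tok in context):
                return True  # a match whose context was not shifted
    return False


if __name__ == "__main__":
    print(label_document("Patient has a history of asthma.", "asthma"))
    print(label_document("Patient denies any asthma.", "asthma"))
```

A fixed cue list like this has to be written by hand for each phenomenon, which is the limitation the abstract's approach targets by learning content shifters from document-level labels alone.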
Similar resources
An Immune-based Approach to Document Classification
Keywords: artificial immune system, document classification, machine learning, concept learning, coevolution. The human immune system, as a biological complex adaptive system, has provided inspiration for a range of innovative problem solving techniques in areas such as computer security, knowledge management and information retrieval. In this paper the construction and performance of a novel immune-based l...
Hebbian learning and competition in the neural abstraction pyramid
The recently introduced Neural Abstraction Pyramid is a hierarchical neural architecture for image interpretation that is inspired by the principles of information processing found in the visual cortex. In this paper we present an unsupervised learning algorithm for its connectivity based on Hebbian weight updates and competition. The algorithm yields a sequence of feature detectors that produ...
A Light-weight Relevance Feedback Solution for Large Scale Content-Based Video Retrieval
This paper addresses the problem of large scale content-based video retrieval with relevance feedback. We analyze the common methods which leverage local feature detectors to extract feature descriptors from video collections and perform multi-level matching after indexing and retrieval of feature vectors. Instead of learning similarity-preserving codes, we introduce the relevance feedback appr...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Devising ethical codes for e-contents in e-learning
Background: Promoting ethics is one of the goals of education, but the free flow of communication and the disclosure of unethical behaviors in e-learning create an urgent need to clarify ethical values. Therefore, the aim of this study was to prepare ethical codes for developing and delivering e-contents. Methods: A draft of e-content ethical codes was prepared based on the literature review. Then, it was ...