Identifying subjective statements in news titles using a personal sense annotation framework
Abstract
Subjective language contains information about private states. The goal of subjective language identification is to identify that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense", has clear potential in the field of subjective language identification, as it reflects the meaning of words in terms of unique personal experience and carries personal characteristics. In this paper, we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity and emotion. The Personal Sense framework yields high performance in fine-grained sub-sentence subjectivity classification. Our experiments demonstrate lexico-syntactic features to be useful for the identification of subjectivity indicators and the targets which receive the subjective Personal Sense.

Introduction

Subjective language is language containing information about private states, i.e. opinions and emotions (Wiebe et al., 2004). The goal of subjective language identification is to identify that a private state is expressed, without going into detail about its polarity or its specific emotion. First, it is a preliminary stage in opinion mining: before identifying an opinion as positive or negative, it is necessary to identify it as an opinion, as opposed to, for example, a description of fact. Second, it may serve as a technique for separating facts from points of view, classifying opinionated text and identifying the ideological perspective of the author. Subjectivity has been defined in (Wiebe, 1994) as private states, i.e. "states ... not open to objective observation or verification" (Pang & Lee, 2008). This includes opinions, emotions, moods etc.
Research in subjectivity analysis has increased significantly in recent years, due largely to the vast growth and availability of personal texts in the blogosphere. In polarity classification, authorship attribution, identification of authors' background characteristics, and basic subjectivity identification, various features have been proposed and high results achieved. However, it is widely overlooked that a personal component of meaning, Personal Sense (Leontev, 1978), is a very important feature of any subjective text. It forms unique idiolect features and reflects personal preferences in text. In theory, its role in subjective language research is obvious, as it is defined as a constituent of subjective consciousness. Leontev (Leontev, 1978) stated that consciousness is subjective, and defined two types of word meaning: significance, being the meaning shared by the speakers of a language and representing a part of the objective reality, and Personal Sense, representing subjective characteristics in consciousness, in terms of the unique experience of a person. Thus, Personal Sense, serving as a building block for the subjective consciousness, can be harnessed from the writings of bloggers in order to more accurately deduce information about their opinions, private states and sentiments.

By way of illustration, consider the following examples, taken from a debate ("The Green Line", sourced from bitterlemons.org) about establishing a border between Israel and Palestine:

The "green line" is invisible, undocumented and unfounded in international law[...] it sets a precedent of substituting principles of international law with agreements signed under duress. (Example 1)

Despite these trans-boundary movements, the line remained an important point of separation between the two territories.[...] the green line - with some minor deviations - has the greatest likelihood of constituting the formal international boundary between two independent states. (Example 2)

Both passages contain information about the green line, which does not yet serve as a border between two independent states, and this is where the opinions begin: the first author believes it to be illegitimate and gives a negative assessment of the possibility of it becoming a formal and legal object. The second author, on the other hand, assesses it positively as one of the constituents of two independent states. Both authors describe the same phenomenon, but use different words relating to it. The words 'undocumented, invisible, duress' in the first passage and 'important, independent' in the second are the clues that help us detect the subjectivity expressed. An automatic subjectivity identification tool uses broadly the same technique: it captures subjective clues in text and relates them to certain objects or topics of discussion.

In this paper, we set out to investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. We provide an annotation schema for the Personal Sense 'target' and 'indicator' constructions covering emotion, polarity and subjectivity-objectivity in terms of the Personal Sense. We proceed to analyze the subjectivity-objectivity issue. Assuming that the subjective Personal Sense patterns are constructed in text on a regular basis using lexical and syntactic elements, we perform an experiment on the automatic detection of subjectivity in text, as opposed to objective expressions not containing any subjective emotion. First, we demonstrate that subjective expressions are more accurately described using a combination of lexical and syntactic information than by using lexical means only. Next, we select a number of lexical and syntactic features for the identification of the subjective patterns in text. We apply the Personal Sense technique to pairs of words, at least one of which is a noun, thereby identifying the Personal Sense of the noun in the pair.
We argue that the suggested features, including the syntactic path between the words in the pair and lexical information about the Personal Sense indicator word, are useful for the identification of the subjective Personal Sense. We use the resulting subjective and objective word pairs for subjectivity classification, applying the suggested feature set. Thus we learn to automatically identify word pairs that are connected by a certain syntactic path and bear an emotion, as opposed to pairs that do not bear any emotional content. The results confirm our expectations and demonstrate the lexico-syntactic features to be useful for the identification of subjectivity indicators and the targets which receive the subjective ...
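The word-pair features described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` structure, the hand-written dependency parse of a made-up title, the part-of-speech and relation labels, and the convention of treating the noun as the target and the other word as the indicator are all assumptions made for illustration. For every pair of words in which at least one is a noun, the sketch records the indicator word, its part of speech, and the dependency path linking target and indicator.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Token:
    index: int   # position of the word in the title
    text: str
    pos: str     # coarse part-of-speech tag (hypothetical tag set)
    head: int    # index of the syntactic head, -1 for the root
    deprel: str  # dependency label linking the word to its head


def ancestors(tokens, i):
    """Indices from token i up to the root, both included."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain


def syntactic_path(tokens, a, b):
    """Labels climbed from a to the lowest common ancestor ('^'),
    followed by the labels descended from there to b ('v')."""
    up_a, up_b = ancestors(tokens, a), ancestors(tokens, b)
    common = next(i for i in up_a if i in up_b)
    up = [f"^{tokens[i].deprel}" for i in up_a[:up_a.index(common)]]
    down = [f"v{tokens[i].deprel}" for i in reversed(up_b[:up_b.index(common)])]
    return " ".join(up + down)


def candidate_pairs(tokens):
    """Yield a feature dict for every word pair containing a noun."""
    for x, y in combinations(tokens, 2):
        if "NOUN" not in (x.pos, y.pos):
            continue
        # Assumption: the noun is the target; if both words are nouns,
        # the earlier one is taken as the target.
        target, indicator = (x, y) if x.pos == "NOUN" else (y, x)
        yield {
            "target": target.text.lower(),
            "indicator": indicator.text.lower(),
            "indicator_pos": indicator.pos,
            "path": syntactic_path(tokens, target.index, indicator.index),
        }


# Hand-written parse of an invented title: "Invisible border divides towns"
TITLE = [
    Token(0, "Invisible", "ADJ", 1, "amod"),
    Token(1, "border", "NOUN", 2, "nsubj"),
    Token(2, "divides", "VERB", -1, "root"),
    Token(3, "towns", "NOUN", 2, "obj"),
]

features = list(candidate_pairs(TITLE))
for f in features:
    print(f)
```

Feature dictionaries of this kind could then be labelled as subjective or objective and passed to any standard classifier; the choice of classifier is left open here because the excerpt does not name one.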
Similar resources
A Model for Identifying and Enhancing the Sense of Place and Collective Memories (Case Study: Dez river)
The lack of sense of belonging to place in urban spaces is one of the problems widely stated. There are objective and subjective factors in space that create sense of place. The combined effect of these factors create meaning for place and endow the environment with identity. This research seeks to find the relationship between the objective and subjective factors of space and the levels of sen...
Extracting Common Sense Knowledge from Wikipedia
Much of the natural language text found on the web contains various kinds of generic or “common sense” knowledge, and this information has long been recognized by artificial intelligence as an important supplement to more formal approaches to building Semantic Web knowledge bases. Consequently, we are exploring the possibility of automatically identifying “common sense” statements from unrestri...
© 2010 The Association for Computational Linguistics
The exponential growth of the subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. They require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog – a fineg...
Stating with Certainty or Stating with Doubt: Intercoder Reliability Results for Manual Annotation of Epistemically Modalized Statements
Texts exhibit subtle yet identifiable modality about writers’ estimation of how true each statement is (e.g., definitely true or somewhat true). This study is an analysis of such explicit certainty and doubt markers in epistemically modalized statements for a written news discourse. The study systematically accounts for five levels of writer’s certainty (ABSOLUTE, HIGH, MODERATE, LOW CERTAINTY ...
EmotiBlog: un esquema de anotación detallado para la sujetividad en los nuevos géneros textuales de la Web 2.0 / EmotiBlog: a fine-grained annotation schema for labelling subjectivity in the new-textual genres born with the Web 2.0
The exponential growth of the subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These applications require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents Emoti...
Journal title:
- JASIST
Volume 64, Issue
Pages -
Publication date 2013