Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks
Abstract
We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with N independent words as vertices (nodes) and with edges (links) connecting each vertex to the words occurring within m places of it; we call m the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimal word distance m. The literature samples were most effectively represented by the power laws fitted to their degree distribution P(k) and clustering coefficient C(k); we also studied the mean geodesic distance and found that all our texts were small-world networks. We observed a natural break-point at k = √N where the power law in the degree distribution changed, leading to separate power-law fits for the bulk and the tail of P(k). Our linear discriminant analysis yielded an accuracy of 73.8 ± 5.15% for the correct classification of novels and 69.1 ± 1.22% for news stories. We found an optimal word distance of m = 4 and a minimum text length of N = 100 to 200 words.
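The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the regex tokenizer, and the choice to place k = √N itself in the bulk are all assumptions made here for illustration, and the paper's notion of "independent words" may involve preprocessing that is not reproduced.

```python
import math
import re
from collections import defaultdict


def build_word_network(text, m=4):
    """Undirected co-occurrence graph (hypothetical helper).

    Distinct words are vertices; two different words are linked when they
    occur within m positions of each other in the text.
    """
    # Assumption: a simple lowercase regex tokenizer stands in for the
    # paper's (unspecified here) word extraction.
    tokens = re.findall(r"[a-z']+", text.lower())
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        adj[w]  # touch the key so words without links still become vertices
        for v in tokens[i + 1 : i + 1 + m]:
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return dict(adj)


def degree_distribution(adj):
    """P(k): fraction of vertices that have degree k."""
    n = len(adj)
    counts = defaultdict(int)
    for neighbours in adj.values():
        counts[len(neighbours)] += 1
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, w):
    """Fraction of pairs of w's neighbours that are themselves linked."""
    nbrs = list(adj[w])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def split_at_sqrt_n(pk, n):
    """Split P(k) into bulk (k <= sqrt(N)) and tail (k > sqrt(N)).

    Assumption: the break-point value itself is assigned to the bulk.
    """
    kc = math.sqrt(n)
    bulk = {k: p for k, p in pk.items() if k <= kc}
    tail = {k: p for k, p in pk.items() if k > kc}
    return bulk, tail
```

Calling `build_word_network(text, m=4)` on a text sample and passing the result to `degree_distribution` gives the P(k) values to which separate power laws could then be fitted on each side of the split; the fitting and the linear discriminant analysis are left out of this sketch.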
Similar resources
Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
The System of Engagement in a Sample of Prose Fiction and the News
Emerging within Systemic Linguistics, Appraisal/Evaluation is a framework for analyzing the language of evaluation, providing techniques for the systematic analysis of evaluation and stance as they operate in whole texts and in groupings of texts. There are three systems in the Appraisal framework: Attitude, Engagement, and Graduation. This study sets out to analyze the use of the system of Eng...
Distinguishing between Positive and Negative Opinions with Complex Network Features
Topological and dynamic features of complex networks have proven to be suitable for capturing text characteristics in recent years, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinction was possible by obtaining several metrics of ...
Local Derivative Pattern with Smart Thresholding: Local Composition Derivative Pattern for Palmprint Matching
Palmprint recognition is a new biometrics system based on physiological characteristics of the palmprint, which includes rich, stable, and unique features such as lines, points, and texture. Texture is one of the most important features extracted from low resolution images. In this paper, a new local descriptor, Local Composition Derivative Pattern (LCDP) is proposed to extract smartly stronger...
Pattern Recognition in Control Chart Using Neural Network based on a New Statistical Feature
Today for the expedition of the identification and timely correction of process deviations, it is necessary to use advanced techniques to minimize the costs of production of defective products. In this way control charts as one of the important tools for the statistical process control in combination with modern tools such as artificial neural networks have been used. The artificial neural netw...
Journal: CoRR
Volume: abs/1007.3254
Publication date: 2010