Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns

نویسندگان

  • Stefania Degaetano-Ortlieb
  • Elke Teich
چکیده

We present a data-driven approach to investigate intra-textual variation by combining entropy and surprisal. With this approach we detect linguistic variation based on phrasal lexico-grammatical patterns across sections of research articles. Entropy is used to detect patterns typical of specific sections. Surprisal is used to differentiate between more and less informationally-loaded patterns as well as types of information (topical vs. stylistic). While we here focus on research articles in biology/genetics, the methodology is especially interesting for digital humanities scholars, as it can be applied to any text type or domain and combined with additional variables (e.g. time, author or social group) to obtain insights on intra-textual variation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modeling of Stylistic Variation in Social Media with Stretchy Patterns

In this paper we describe a novel feature discovery technique that can be used to model stylistic variation in sociolects. While structural features offer much in terms of expressive power over simpler features used more frequently in machine learning approaches to modeling linguistic variation, they frequently come at an excessive cost in terms of feature space size expansion. We propose a nov...

متن کامل

Effects of Surprisal and Entropy on Vowel Duration in Japanese.

Research on English and other languages has shown that syllables and words that contain more information tend to be produced with longer duration. This research is evolving into a general thesis that speakers articulate linguistic units with more information more robustly. While this hypothesis seems plausible from the perspective of communicative efficiency, previous support for it has come ma...

متن کامل

Textual Stylistic Variation: Choices, Genres and Individuals

T his chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections. Much as operationalized relevance proved a useful goal to strive for in information retrieval, research in textual stylistics, whether application oriented or philologically inclined, needs goals formulated in terms of pertinence, relevance, and utility—notions that a...

متن کامل

Multivariate Surprisal Analysis of Gene Expression Levels

We consider here multivariate data which we understand as the problem where each data point i is measured for two or more distinct variables. In a typical situation there are many data points i while the range of the different variables is more limited. If there is only one variable then the data can be arranged as a rectangular matrix where i is the index of the rows while the values of the va...

متن کامل

Towards the detection and description of textual meaning indicators in spontaneous conversations

The description of textual and stylistic features has so far been largely neglected in the empirical study of conversational speech. In this paper we want to make a couple of strong initial points towards the use textual meaning and stylistic features in language engineering: First of all we want to show that there are other besides the traditional features in spontaneous speech that are worth ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017