Annotating Subsets of the Enron Email Corpus

نویسندگان

  • Jade Goldstein-Stewart
  • Andres Kwasinksi
  • Paul Kingsbury
  • Roberta Evans Sabin
  • Albert McDowell
چکیده

We present an annotation project for two subsets of the Enron email corpus. The first is a subset of the UC Berkeley Enron Email Analysis Project and the second consists of a portion of emails from the Voice Transcripts Email Correlated Corpora. Parts of the automatic content extraction (ACE) annotation guidelines, extended for the email domain are used for annotation. We also categorize the emails with email speech acts, mark whether the text contains discussions of meetings/conversations, and determine the degree of correlation of the subject line with the text body.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Annotating the Enron Email Corpus with Number Senses

The Enron Email Corpus provides “Real World” text in the business email domain, which is a target domain for many speech and language applications. We present a section of this corpus annotated with number senses labelling each number as a date, time, year, telephone number etc. We show that sense categories and their frequencies are very different in this domain than in newswire text. The anno...

متن کامل

Network Analysis with the Enron Email Corpus

We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article’s focus on centrality, students can explore the dependence of statistical models on initial...

متن کامل

Email Formality in the Workplace: A Case Study on the Enron Corpus

Email is an important way of communication in our daily life and it has become the subject of various NLP and social studies. In this paper, we focus on email formality and explore the factors that could affect the sender’s choice of formality. As a case study, we use the Enron email corpus to test how formality is affected by social distance, relative power, and the weight of imposition, as de...

متن کامل

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

In this paper, we present an empirical study of email classification into two main categories “Business” and “Personal”. We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For th...

متن کامل

Enron Emails as Graph Data Corpus for Large-scale Graph Querying Experimentation

In this paper we describe Enron email corpus in graph/network data format. Nodes of the graph are emails connected with named entities (NE) extracted from text like people, email addresses, telephone numbers. Edges are links between NE representing concurrence in same email part, paragraph, sentence or composite NE. Enron Graph corpus contains a few millions of nodes and it is quite large corpu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006