A corpus-based investigation of junk emails
Authors
Abstract
Almost everyone who has an email account receives unwanted emails from time to time. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on unwanted messages which try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails have been proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails and try to decide whether they constitute a distinct genre. Our corpus of junk emails was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563 files, but after eliminating the duplicates automatically we kept only 673 files, totalising just over 373,000 tokens. In order to decide whether junk emails constitute a different genre, we compare them with a corpus of leaflets extracted from the BNC and with the whole BNC. Several characteristics at the lexical and grammatical levels were identified.
Similar resources
A Survey on Various Classifiers Detecting Gratuitous Email Spamming
Email has become the major means of communication these days. Most people use email for personal or professional purposes. Email is an effective, fast and cheap way of communicating. The importance and usage of email are growing day by day. It provides a way to easily transfer information globally over the internet. Due to this, email spamming is increasing day by d...
The Effect of CMC in Business Emails in Lingua Franca: Discourse Features and Misunderstandings
The paper argues that everyday exchange of business emails produces a development in the work-group relationship, which, in turn, makes new communication styles possible and acceptable by the users' habit to computer-mediated forms, even in unbalanced professional exchanges. The focus is on the (spoken) discourse features of email messages in a self-compiled corpus of selected computer-mediated...
Context Awareness Information Sharing Service based on Location-based Communication Policy
Since more networking technologies and communication channels have been developed, Internet users have moved their roles from being strictly information consumers to both information consumers and producers. Moreover, users on nets might also be intermediaries, inter-actors, interferences or message providers, even service providers. Information on the Internet is dramatically growing. On one h...
Content-aware Email Multiclass Classification Categorize Emails According to Senders
Liwei Wang, Li Du. Abstract: People nowadays are overwhelmed by tons of coming emails everyday at work or in their daily life. The large quantities of emails keep causing confusions. Not only spam emails are considered to be ‘junk’, but also unwanted emails (e.g. advertisements) cause people to waste time on reading them. Therefore, it becomes urgent to de...
Application Planned Behavior in theory Predicting Junk Food Consumption among Female Students
Background and Objectives: There is a high tendency among adolescents to consume junk foods. The aim of this study was to predict junk food consumption based on the theory of planned behavior among female students in Kermanshah, 2011. Materials and Methods: In this descriptive-analytical study, 207 female students studying in middle schools of Kermanshah were selected using multi stage samp...
Publication date: 2002