ResToRinG CaPitaLiZaTion in #TweeTs

Authors

  • Kamel Nebhi
  • Kalina Bontcheva
  • Genevieve Gorrell
Abstract

The rapid proliferation of microblogs such as Twitter has resulted in a vast quantity of written text becoming available that contains interesting information for NLP tasks. However, the noise level in tweets is so high that standard NLP tools perform poorly. In this paper, we present a statistical truecaser for tweets using a 3-gram language model built with truecased newswire texts and tweets. Our truecasing method shows an improvement in named entity recognition and part-of-speech tagging tasks.
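
To make the idea of a 3-gram truecaser concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class name `TrigramTruecaser`, the interpolation weights (0.6, 0.3, 0.1), the add-one unigram smoothing, and the toy training sentences are all assumptions chosen for illustration. The sketch counts trigrams, bigrams, and unigrams over already-truecased text, then picks the most probable casing of each token with a Viterbi search over the cased forms seen in training.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # sentence-start padding symbol (illustrative choice)


class TrigramTruecaser:
    """Toy truecaser: interpolated trigram model plus Viterbi decoding."""

    def __init__(self, sentences):
        self.tri, self.tri_ctx = Counter(), Counter()
        self.bi, self.bi_ctx = Counter(), Counter()
        self.uni = Counter()
        self.total = 0
        self.forms = defaultdict(set)  # lowercased token -> cased forms seen
        for sent in sentences:
            padded = [BOS, BOS] + list(sent)
            for i in range(2, len(padded)):
                w2, w1, w = padded[i - 2], padded[i - 1], padded[i]
                self.tri[(w2, w1, w)] += 1
                self.tri_ctx[(w2, w1)] += 1
                self.bi[(w1, w)] += 1
                self.bi_ctx[w1] += 1
                self.uni[w] += 1
                self.total += 1
                self.forms[w.lower()].add(w)

    def _logp(self, w2, w1, w):
        # Add-one smoothed unigram keeps the mixture strictly positive.
        p1 = (self.uni[w] + 1) / (self.total + len(self.uni) + 1)
        c2 = self.bi_ctx[w1]
        p2 = self.bi[(w1, w)] / c2 if c2 else 0.0
        c3 = self.tri_ctx[(w2, w1)]
        p3 = self.tri[(w2, w1, w)] / c3 if c3 else 0.0
        # Interpolation weights are assumed, not tuned.
        return math.log(0.6 * p3 + 0.3 * p2 + 0.1 * p1)

    def truecase(self, tokens):
        # best maps the last two cased tokens to (log score, path so far).
        best = {(BOS, BOS): (0.0, [])}
        for tok in tokens:
            # Unknown tokens are left unchanged.
            cands = sorted(self.forms.get(tok.lower(), {tok}))
            nxt = {}
            for (w2, w1), (score, path) in best.items():
                for c in cands:
                    s = score + self._logp(w2, w1, c)
                    key = (w1, c)
                    if key not in nxt or s > nxt[key][0]:
                        nxt[key] = (s, path + [c])
            best = nxt
        return max(best.values(), key=lambda v: v[0])[1]


# Toy truecased corpus standing in for newswire text and tweets.
corpus = [
    ["Barack", "Obama", "visited", "London", "today"],
    ["the", "apple", "fell", "from", "the", "tree"],
    ["Apple", "released", "a", "new", "iPhone"],
    ["I", "love", "London"],
]
model = TrigramTruecaser(corpus)
print(model.truecase(["apple", "released", "a", "new", "iphone"]))
print(model.truecase(["the", "apple", "fell"]))
```

In this toy setup the token "apple" is restored as "Apple" at the start of a sentence about a product release and left lowercase after "the", because the trigram context differs. A realistic system would need a far larger corpus and a proper smoothing scheme.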

Similar resources

Innovative Instruments and Legal Mechanisms of Bank Capitalization: National Features and World Trends

The article investigates the role of bank capital in financing reproduction processes and increase of economic growth indicators, considers the essence of the concept of “bank capitalization”, determines the structure and main sources of bank capital increase. Taking into account the fact that Ukraine and Poland have similar vectors of economic development, the current practices of increasing t...

NER from Tweets: SRI-JU System @MSM 2013

Now a day Twitter has become an interesting source of experiment for different NLP experiments like entity extraction, user opinion analysis and more. Due to the noisy nature of user generated content it is hard to run standard NLP tools to obtain a better result. The task of named entity extraction from tweets is one of them. Traditional NER approaches on tweets do not perform well. Tweets are...

Experiments to Improve Named Entity Recognition on Turkish Tweets

Social media texts are significant information sources for several application areas including trend analysis, event monitoring, and opinion mining. Unfortunately, existing solutions for tasks such as named entity recognition that perform well on formal texts usually perform poorly when applied to social media texts. In this paper, we report on experiments that have the purpose of improving nam...

Non-lexical Features Encode Political Affiliation on Twitter

Previous work on classifying Twitter users’ political alignment has mainly focused on lexical and social network features. This study provides evidence that political affiliation is also reflected in features which have been previously overlooked: users’ discourse patterns (proportion of Tweets that are retweets or replies) and their rate of use of capitalization and punctuation. We find robust...

The Impact of Corporate income Tax and Firm Size on Fixed Investment

This paper is an attempt to analyze the impact of income taxes and market capitalization on fixed investment (investment in tangible assets) by manufacturing companies listed on KSE. This paper basically examines that how corporate income taxes affect fixed investment by reducing cash flow available for a firm to invest and how the firm size in the lights of market capitalization affects fixed ...

Publication date: 2015