Statistical Markovian Data Modeling for Natural Language Processing

Author

  • Fawaz S. Al-Anzi
Abstract

Markov chain theory is a popular statistical tool in applied probability that is useful for modelling real-world computing applications. In recent years, there has been growing interest in employing Markov chain theory for statistical learning from temporal (i.e. time series) data. A wide range of applications use Markov concepts, including computational linguistics, image processing, communications, bioinformatics, and finance systems. In fact, research based on Markov processes has been applied with great success in many of the most efficient natural language processing (NLP) tools. Hence, this paper explores Markov chain theory and its extension, hidden Markov models (HMMs), in NLP applications. This paper also presents some aspects related to Markov chains and HMMs, such as creating transition and observation matrices, calculating data sequence probabilities, extracting the hidden states, and profile
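
As a rough illustration of the operations the abstract lists (building a transition matrix, computing a sequence probability, and extracting hidden states), the following Python sketch uses a hypothetical two-tag HMM. The tag names, words, probabilities, and function names are all illustrative assumptions and are not taken from the paper; the forward and Viterbi algorithms are the standard textbook procedures, not necessarily the author's formulation.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def forward_probability(obs, states, start, trans, emit):
    """P(obs) under the HMM, summed over all hidden state paths (forward algorithm)."""
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit[s].get(o, 0.0)
            * sum(alpha[p] * trans[p].get(s, 0.0) for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start, trans, emit):
    """Most likely hidden state path for obs, returned as (probability, path)."""
    best = {s: (start[s] * emit[s].get(obs[0], 0.0), [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor for state s at this step.
            prob, path = max(
                ((best[p][0] * trans[p].get(s, 0.0), best[p][1]) for p in states),
                key=lambda t: t[0],
            )
            new_best[s] = (prob * emit[s].get(o, 0.0), path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])


# Hypothetical toy data: tag sequences used to estimate a transition matrix.
tag_sequences = [["NOUN", "VERB", "NOUN"], ["NOUN", "NOUN", "VERB"]]
estimated = estimate_transitions(tag_sequences)

# Hypothetical hand-specified HMM (all numbers are made up for illustration).
states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit = {
    "NOUN": {"dogs": 0.5, "bark": 0.1, "run": 0.4},
    "VERB": {"dogs": 0.1, "bark": 0.6, "run": 0.3},
}

observations = ["dogs", "bark"]
sequence_prob = forward_probability(observations, states, start, trans, emit)
path_prob, path = viterbi(observations, states, start, trans, emit)
```

With these toy numbers, the decoded tag path for "dogs bark" is NOUN followed by VERB. A practical implementation would usually work in log space to avoid underflow on long sequences and would smooth the estimated probabilities for unseen transitions.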


Similar resources

A Survey on Statistical Approaches to Natural Language Processing

This survey attempts to catch up with the recent increasing interest in statistical approaches to natural language processing based on large corpora. First of all, a historical overview traces back to the 1950s, when Noam Chomsky proposed his phrase structure transformation grammar and rejected Markov process natural language modeling. With the development of large corpora and language modeling i...

Adaptive Natural Language Processing

In the past decades of NLP, there has been a steady shift away from rule-based, linguistically motivated modeling towards statistical learning and the induction of unsupervised feature representations. However, natural language components used in today’s NLP pipelines are still static in the sense that their statistical model or rule-base is created once, then subsequently applied without furth...

Trameur: A Framework for Annotated Text Corpora Exploration

Corpus resources with complex linguistic annotations are becoming increasingly important in the work of language specialists. They often need to perform extensive corpus research, including Natural Language Processing (NLP), statistical modelling and data visualisation. Our software system, called Trameur, aims at making these analyses possible within a single graphical user interface. It relie...

Language Modeling Approaches to Information Retrieval

This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal ...

Using Domain-Specific Knowledge to Classify E-negotiations

Texts exchanged in business-related Computer-Mediated Communication, or CMC, differ from texts exchanged in other business situations. CMC data have a high concentration of non-standard textual features. The fast-growing amount of business CMC data offers opportunities for the application of statistical Natural Language Processing and Machine Learning methods, especially for text-classification...



Publication date: 2017