Coreference resolution with deep learning in the Persian language

Abstract:

Coreference resolution is an advanced problem in natural language processing. With the spread of social networks, TV channels, news agencies, the Internet, and similar sources into daily life, reading all of this content, analyzing it, and finding relations within it takes considerable time and cost. Text analysis is now performed with a variety of natural language processing techniques, and one of the challenges in this field is the low accuracy of detecting what named entities refer to; this detection process is called coreference resolution. Coreference resolution finds all expressions that refer to the same named entity, and two expressions are coreferent when they are located in the same coreference cluster. Coreference resolution can be used in many natural language processing tasks, such as question answering, text summarization, machine translation, and information extraction. Coreference resolution methods fall into two main categories: machine learning approaches and rule-based approaches. In rule-based approaches, a rich set of rules written by a specialist is executed to detect coreferences. These methods are fast, but they are language-dependent, so the rules must be written again by a specialist for each new language. Machine learning methods are divided into supervised and unsupervised methods; a supervised approach requires data labeled by a specialist. Coreference resolution consists of three main phases: named entity recognition, feature extraction for the named entities, and coreference analysis, of which feature extraction is the primary phase. After the corpus is created, the named entities in it must be recognized. This step depends on the corpus; in some corpora the entities are already annotated as gold data. In this paper we use the RCDAT corpus, in which the named entities are already marked. After the named entity recognition phase, the mention pairs are determined and their features are extracted.
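As a rough illustration of how mention pairs might be determined from a gold-annotated corpus, the sketch below pairs every mention with each later mention and labels the pair 1 when both share a gold cluster. The `Mention` type, its fields, the `build_mention_pairs` helper, and the toy English mentions are all hypothetical and are not taken from the paper or from RCDAT.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    """A toy mention record (hypothetical fields, not the RCDAT format)."""
    text: str
    position: int    # token index where the mention starts
    cluster_id: int  # gold coreference cluster label


def build_mention_pairs(
    mentions: List[Mention],
) -> List[Tuple[Mention, Mention, int, int]]:
    """Return (antecedent, anaphor, token_distance, label) for every pair.

    label is 1 when both mentions share a gold cluster, otherwise 0.
    """
    ordered = sorted(mentions, key=lambda m: m.position)
    pairs = []
    for antecedent, anaphor in combinations(ordered, 2):
        label = 1 if antecedent.cluster_id == anaphor.cluster_id else 0
        distance = anaphor.position - antecedent.position
        pairs.append((antecedent, anaphor, distance, label))
    return pairs


# Made-up example document with three mentions in two gold clusters.
mentions = [
    Mention("Sara", 0, 1),
    Mention("the university", 4, 2),
    Mention("she", 7, 1),
]
pairs = build_mention_pairs(mentions)
for antecedent, anaphor, distance, label in pairs:
    print(antecedent.text, "->", anaphor.text, "distance:", distance, "label:", label)
```

Exhaustive pairing grows quadratically with the number of mentions, so a real system would likely restrict candidates, for example by a maximum distance; that choice is not specified in the abstract.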
The proposed method uses two categories of features: the first is word embedding vectors, and the second is handcrafted features, such as the distance between the mentions, head matching, and gender matching. A deep neural network is trained on the extracted features: in the coreference analysis phase, a feed-forward neural network (FFNN) is trained on the candidate mention pairs (the features extracted from them) and their labels (coreferent / non-coreferent, or 1/0), so that the trained FFNN assigns a probability (between 0 and 1) to any given mention pair. A graph technique with a threshold is then used to decide whether named entities are distinct or compatible within a coreference cluster. This step builds the graph from the mention pairs extracted in the previous step. In this graph, the nodes are the mention pairs, which are clustered with the agglomerative hierarchical clustering algorithm in order to place similar mention pairs in one group. The resulting clusters are taken as the coreference chains. The RCDAT Persian corpus is used to train the proposed coreference resolution approach, and the Uppsala Persian dataset is used to test it. Different tools were used for feature extraction, and each of them affects the accuracy of the whole system. The corpora, tools, and methods used in the system are standard and are comparable to the ACE and OntoNotes corpora and the tools used with them in coreference resolution. The result of the proposed improvements (F1 = 62.09) is reported in the text of the paper.
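The thresholded graph step can be sketched as follows, under several assumptions that the abstract does not confirm: nodes are individual mentions, an edge links two mentions when the pair probability reaches the threshold, and clusters are merged with single linkage, which is equivalent to taking the connected components of the thresholded graph. The pair probabilities below are invented stand-ins for FFNN outputs, and `cluster_mentions` is a hypothetical helper, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def cluster_mentions(
    num_mentions: int,
    pair_scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Group mention indices into chains.

    Two mentions are merged when their pair score is >= threshold.
    Merging is transitive (single linkage), implemented with union-find.
    """
    parent = list(range(num_mentions))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in pair_scores.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Attach the larger root to the smaller one.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    chains: Dict[int, List[int]] = {}
    for i in range(num_mentions):
        chains.setdefault(find(i), []).append(i)
    return sorted(chains.values())


# Invented probabilities standing in for the output of a trained pair scorer.
scores = {
    (0, 1): 0.10, (0, 2): 0.92, (0, 3): 0.15,
    (1, 2): 0.20, (1, 3): 0.81, (2, 3): 0.05,
}
chains = cluster_mentions(4, scores, threshold=0.5)
print(chains)
```

Raising the threshold produces more, smaller chains; with `threshold=0.95` every mention in this toy example stays in its own chain. Other linkage criteria, such as average or complete linkage, would need a full agglomerative procedure rather than union-find.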

Similar resources

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also regarded as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

The relationship of WTC with communication apprehension and self-perceived communication competence in English and Persian context

Most previous research on willingness to communicate has examined its relationship with individual factors such as age, gender, and personality type. Fewer studies have examined the relationship between Persian-speaking language learners' willingness to communicate and their communication apprehension and self-perceived communication competence in Persian and English settings. According to Ellis's (2008) theory, willingness to communicate holds an important place in the field of teaching ...

Super-Resolution via Deep Learning

The recent phenomenal interest in convolutional neural networks (CNNs) must have made it inevitable for the super-resolution (SR) community to explore its potential. The response has been immense and in the last three years, since the advent of the pioneering work, there appeared too many works not to warrant a comprehensive survey. This paper surveys the SR literature in the context of deep le...

A Deep Learning Approach to Persian Plagiarism Detection

Plagiarism detection is defined as automatic identification of reused text materials. General availability of the internet and easy access to textual information enhances the need for automated plagiarism detection. In this regard, different algorithms have been proposed to perform the task of plagiarism detection in text documents. Due to drawbacks and inefficiency of traditional methods and l...

Melanoma detection with a deep learning model

Background: Skin cancer is one of the most common forms of cancer in the world and melanoma is the deadliest type of skin cancer. Both melanoma and melanocytic nevi begin in melanocytes (cells that produce melanin). However, melanocytic nevi are benign whereas melanoma is malignant. This work proposes a deep learning model for classification of these two lesions. Methods: In this analytic s...

Journal title

Volume 17, Issue 2

Pages 121-138

Publication date: 2020-09
