Tibetan Word Segmentation as Sub-syllable Tagging with Syllable's Part-of-Speech Property

Authors

  • Huidan Liu
  • Congjun Long
  • Minghua Nuo
  • Jian Wu
Abstract

When Tibetan word segmentation is treated as a sequence labelling problem, machine learning models such as maximum entropy (ME) and conditional random fields (CRFs) can be used to train the segmenter. The performance of the segmenter depends on many factors. This paper compares three of them: the strategy for handling abbreviated syllables, the tag set, and the syllable’s Part-Of-Speech property. The experimental data show the following. First, if each abbreviated syllable is separated into two units for labelling rather than one, the F-measure improves by 0.06% and 0.10% on the 4-tag set and the 6-tag set respectively. Second, if the 6-tag set is used rather than the 4-tag set, the F-measure improves by 0.10% and 0.14% under the two strategies for abbreviated syllables respectively. Third, when the syllable’s Part-Of-Speech property is taken into account, the F-measure improves by 0.47% and 0.41% respectively over the other two methods that do not use it on the 4-tag set, and by 0.45% and 0.35% on the 6-tag set, which is much higher than the former improvements. It is therefore a better choice to exploit the syllable’s Part-Of-Speech property information while using the sub-syllable as the tagging unit.
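To make the tagging formulation concrete, the following is a minimal sketch of how a segmented sentence can be turned into per-unit position tags and back. The abstract does not list the tag labels, so the sketch assumes the labels conventionally used in sequence-labelling segmenters: B, M, E, S for the 4-tag set and B, B2, B3, M, E, S for the 6-tag set. The function names and the placeholder units (`u1`, `u2`, ...) are hypothetical. Under the "two units" strategy for abbreviated syllables, such a syllable would simply contribute two entries to a word's unit list.

```python
from typing import List, Tuple


def tag_word(n_units: int, scheme: str = "4") -> List[str]:
    """Position tags for one word made of n_units tagging units.

    Assumed label inventories (not given in the abstract):
      scheme "4": B, M, E, S
      scheme "6": B, B2, B3, M, E, S
    """
    if n_units < 1:
        raise ValueError("a word needs at least one unit")
    if n_units == 1:
        return ["S"]  # single-unit word
    if scheme == "4":
        return ["B"] + ["M"] * (n_units - 2) + ["E"]
    if scheme == "6":
        heads = ["B", "B2", "B3"]
        # The first three non-final units get B, B2, B3; later ones get M.
        tags = [heads[i] if i < 3 else "M" for i in range(n_units - 1)]
        return tags + ["E"]
    raise ValueError("scheme must be '4' or '6'")


def encode(words: List[List[str]], scheme: str = "4") -> Tuple[List[str], List[str]]:
    """Flatten segmented words into parallel lists of units and tags."""
    units: List[str] = []
    tags: List[str] = []
    for word in words:
        units.extend(word)
        tags.extend(tag_word(len(word), scheme))
    return units, tags


def decode(units: List[str], tags: List[str]) -> List[List[str]]:
    """Recombine units into words; a word ends at every E or S tag."""
    words: List[List[str]] = []
    current: List[str] = []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical sentence: three words of 1, 2 and 5 units.
sentence = [["u1"], ["u2", "u3"], ["u4", "u5", "u6", "u7", "u8"]]
units4, tags4 = encode(sentence, "4")
units6, tags6 = encode(sentence, "6")
```

In a trained segmenter the tags would be predicted by the ME or CRF model rather than derived from gold words; `decode` is the step that turns those predictions back into a segmentation.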

Similar resources

Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...

Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field

In this paper, we propose a novel approach for Tibetan word segmentation using the conditional random field. We reformulate the segmentation as a syllable tagging problem. The approach labels each syllable with a word-internal position tag, and combines syllable(s) into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is gener...

NING MA et al: FUSION OF WORD CLUSTERING FEATURES FOR TIBETAN PART OF SPEECH TAGGING

Tibetan Part of Speech (POS) tagging, the foundation of Tibetan natural language processing, judges word classification according to contextual information of words. Based on the framework of the maximum entropy model, the paper studied the fusion of morphological features for Tibetan part of speech with maximum entropy model with the integration of word clustering features. Experimental result...

From Word Segmentation to POS Tagging for Vietnamese

This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable. We also make a comparison between state-of-the-art (SOTA) featureba...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

Publication date: 2015