Topic Segmentation: Application of Mathematical Morphology to Textual Data
Abstract
Mathematical Morphology (MM) offers a generic theoretical framework for data processing and analysis. Nevertheless, it is still used mainly for image analysis and processing, and attempts to apply MM to other kinds of data remain rare. We believe MM can provide relevant solutions for data analysis and processing in a far broader range of application fields. To illustrate this, we focus on textual data and show how morphological operators (here, morphological segmentation using the watershed transform) may be applied to such data. We thus provide an original MM-based solution to the thematic segmentation problem, a typical problem in natural language processing and information retrieval (IR). More precisely, we consider TV broadcasts through their transcripts obtained by automatic speech recognition. To perform topic segmentation, we compute the similarity between successive segments using a technique called vectorization, which has recently been introduced in the IR field. We then apply a gradient operator to build a topographic surface, which is segmented using the watershed transform. This new topic segmentation technique is evaluated on two corpora of TV broadcasts, on which it outperforms other existing approaches. Although we use only very common morphological operators (i.e., the standard watershed transform), we thus show the potential of MM for non-image data.
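The pipeline described in the abstract (similarity between successive segments, a gradient-like topographic surface, then a watershed) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: plain bag-of-words cosine similarity is swapped in for the vectorization technique, the dissimilarity between neighbouring segments is used as a stand-in for the gradient operator, and the one-dimensional watershed is reduced to finding the peaks that separate two catchment basins. All function names and the `min_height` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gradient_surface(segments):
    """surface[i] = dissimilarity between segment i and segment i + 1.

    Assumption: cosine similarity replaces the vectorization technique,
    and this dissimilarity plays the role of the gradient.
    """
    bags = [Counter(seg.lower().split()) for seg in segments]
    return [1.0 - cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def watershed_peaks(surface, min_height=0.0):
    """Return the positions of the watershed points of a 1D surface.

    In one dimension, two neighbouring catchment basins are separated by
    exactly one regional maximum, so the watershed points are the interior
    peaks. Plateaus are collapsed into runs and cut at their middle.
    min_height is a hypothetical, crude guard against over-segmentation.
    """
    runs = []  # (value, first index, last index) for each plateau
    pos = 0
    for value, group in groupby(surface):
        length = len(list(group))
        runs.append((value, pos, pos + length - 1))
        pos += length

    peaks = []
    for k in range(1, len(runs) - 1):
        value, start, end = runs[k]
        higher_than_neighbours = value > runs[k - 1][0] and value > runs[k + 1][0]
        if higher_than_neighbours and value >= min_height:
            peaks.append((start + end) // 2)
    return peaks


def segment_topics(segments, min_height=0.0):
    """Group consecutive segments into topics by cutting at watershed points."""
    surface = gradient_surface(segments)
    # A peak at surface index i means a cut between segment i and i + 1.
    cuts = [i + 1 for i in watershed_peaks(surface, min_height)]
    bounds = [0] + cuts + [len(segments)]
    return [segments[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    transcript = [
        "the goal was scored in the football match",
        "the football match ended with a late goal",
        "rain and wind are expected in the weather forecast",
        "the weather forecast announces more rain",
    ]
    for topic in segment_topics(transcript):
        print(topic)
```

On noisy speech transcripts a surface like this would probably contain many small peaks, which is why the sketch exposes a height threshold; a real system would more likely rely on markers or another principled filtering step.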
Similar resources
Application of Topic Segmentation in Audiovisual Information Retrieval
Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of an IR system and help users find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...
Topic Segmentation of TV-Streams by Mathematical Morphology and Vectorization
A fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing. Applying segmentation algorithms to the speech transcripts seems straightforward. Yet most of these algorithms are not suited to short segments or noisy data. In this paper, we propose a new segmentation technique inspired by the image segmentation field and relying on a...
Mathematical Morphology based gray scale Image Segmentation using improved watershed transform
Mathematical morphology provides a systematic approach to analyzing the geometric characteristics of signals or images, and has been applied to many tasks such as edge detection, object segmentation, and noise suppression. Image segmentation is one of the most important categories of image processing. The watershed transformation for image segmentation using mathematical morphology is widely used. When w...
Review of Application of Mathematical Morphology in Crop Disease Recognition
Mathematical morphology is a non-linear image processing method with a two-dimensional convolution operation, including binary morphology, gray-level morphology and color morphology. Erosion, dilation, opening and closing are the basic operations of mathematical morphology. Mathematical morphology can be used for edge detection, image segmentation, noise elimination, feature extraction an...
Segmentation thématique : apport de la vectorisation
This paper deals with topic segmentation of TV broadcasts using their transcription obtained by automatic speech recognition. Topic segmentation has been studied for several years, and most often the techniques proposed rely on information retrieval techniques to compute similarities between segments. In this paper, we propose a new segmentation approach inspired by mathematical morphology stud...