Voting experts: An unsupervised algorithm for segmenting sequences
Abstract
We describe a statistical signature of chunks and an algorithm for finding chunks. While there is no formal definition of chunks, they may be reliably identified as configurations with low internal entropy (unpredictability) and high entropy at their boundaries. We show that the log frequency of a chunk is a measure of its internal entropy. The Voting-Experts algorithm exploits this signature to find word boundaries in text from four languages and episode boundaries in the activities of a mobile robot.
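The two quantities in this signature can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`ngram_stats`, `internal_entropy`, `boundary_entropy`) are hypothetical, and the normalisation (relative frequency among n-grams of the same length, natural logarithm) is an assumption made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_n):
    """Count every n-gram up to length max_n and the symbols that follow it."""
    counts = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_n + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            counts[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return counts, followers


def internal_entropy(gram, counts, seq_len):
    """Negative log relative frequency of the n-gram (assumed normalisation).

    Frequent n-grams get a low value, matching the idea that log frequency
    tracks internal entropy.
    """
    total = seq_len - len(gram) + 1  # number of n-grams of this length
    return -math.log(counts[gram] / total)


def boundary_entropy(gram, followers):
    """Shannon entropy of the distribution of symbols that follow the n-gram."""
    nxt = followers[gram]
    total = sum(nxt.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in nxt.values())
```

Under these assumptions, an n-gram that recurs often and is followed by many different symbols scores low on `internal_entropy` and high on `boundary_entropy`, which is the pattern the abstract associates with a chunk.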
Similar resources
Hierarchical Voting Experts: An Unsupervised Algorithm for Segmenting Hierarchically Structured Sequences
This paper extends the Voting Experts (VE) algorithm (Cohen, Adams, & Heeringa 2007) to segment hierarchically structured sequences. The original algorithm was tested on text segmentation, and made use of two proposed characteristics of chunks, namely low internal entropy and high boundary entropy of segments. VE looks for these two properties, and uses them to segment sequences of tokens. It i...
An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The VOTINGEXPERTS algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages...
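The two-pass procedure described in this abstract (collect n-gram statistics, then slide a window and let two experts vote) might be sketched as follows. This is a simplified illustration, not the published implementation: the function name `segment`, the z-score standardisation by n-gram length, the tie-breaking, and the `window` and `threshold` defaults are all assumptions.

```python
import math
from collections import Counter, defaultdict


def segment(seq, window=3, threshold=1):
    """Split seq at positions that attract a local maximum of expert votes."""
    # Pass 1: count n-grams (up to the window length) and their following symbols.
    counts, followers = Counter(), defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, window + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            counts[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1

    def entropy(gram):
        nxt = followers[gram]
        total = sum(nxt.values())
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log(c / total) for c in nxt.values())

    def zscores(values):
        # Standardise scores among n-grams of the same length (assumed scheme).
        by_len = defaultdict(list)
        for gram, v in values.items():
            by_len[len(gram)].append(v)
        stats = {}
        for n, vs in by_len.items():
            mean = sum(vs) / len(vs)
            sd = math.sqrt(sum((v - mean) ** 2 for v in vs) / len(vs)) or 1.0
            stats[n] = (mean, sd)
        return {g: (v - stats[len(g)][0]) / stats[len(g)][1]
                for g, v in values.items()}

    freq_z = zscores({g: float(c) for g, c in counts.items()})
    ent_z = zscores({g: entropy(g) for g in counts})

    # Pass 2: slide the window; each expert votes for one split point in it.
    votes = [0] * (len(seq) + 1)
    for i in range(len(seq) - window + 1):
        win = seq[i:i + window]
        splits = range(1, window)
        # Frequency expert: prefer a split whose two sides are both frequent.
        best_f = max(splits, key=lambda k: freq_z[win[:k]] + freq_z[win[k:]])
        # Entropy expert: prefer a split after a prefix with high boundary entropy.
        best_e = max(splits, key=lambda k: ent_z[win[:k]])
        votes[i + best_f] += 1
        votes[i + best_e] += 1

    # Cut where the vote count exceeds the threshold and is a local maximum.
    cuts = [p for p in range(1, len(seq))
            if votes[p] > threshold
            and votes[p] > votes[p - 1]
            and votes[p] >= votes[p + 1]]

    pieces, start = [], 0
    for p in cuts:
        pieces.append(seq[start:p])
        start = p
    pieces.append(seq[start:])
    return pieces
```

A call such as `segment("thecatsatthedogsatthecatran", window=3)` returns a list of substrings that concatenate back to the input. How closely the cuts match word boundaries depends on the amount of data and on the assumed window and threshold values.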
Bootstrap Voting Experts
BOOTSTRAP VOTING EXPERTS (BVE) is an extension to the VOTING EXPERTS algorithm for unsupervised chunking of sequences. BVE generates a series of segmentations, each of which incorporates knowledge gained from the previous segmentation. We show that this method of bootstrapping improves the performance of VOTING EXPERTS in a variety of unsupervised word segmentation scenarios, and generally impr...
Layered Mereotopology
An Unsupervised Algorithm for Finding Episode Boundaries
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The VotingExperts algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two "expert methods" decide where in the window boundaries should be drawn. The algorithm segments text into words successfully in four languages. ...
Journal:
- Intell. Data Anal.
Volume 11
Publication date: 2007