Automatic Codebook Acquisition
Abstract
concepts include very useful indicators. For example, 'Judicial Power' yields trial, court, trials, courts, jurisdiction, proceedings, evidence, law, inquiry, adjudication: all in all 10 direct synonyms and 12 indicators in the first 25 words (this is the lowest important nonpolitical actor in figure 2).

6 Summary and Discussion

This paper proposes a method to aid the researcher in two steps that are performed in most content analyses: creating the list of categories or entities that the researcher wants to count, and defining these terms in the codebook or synonym list.

For the first step, our method can suggest terms with a recall of 80%-90%, with the higher figure especially for concrete political actors. Precision is lower, with an upper bound of around 50%. This means that the method is not suited as a fully automatic way to generate entities, since the result would contain too much noise. On the other hand, the high recall means that the method can be very useful in helping the researcher prevent errors of omission.

For the second step, our method can suggest lists of candidate synonyms. In the first 25 candidates, there are on average 1.5 direct synonyms and 2.5 words that indicate the presence of the entity. In contrast to the first step, performance is best on general or abstract terms, although this might be because there simply are not many synonyms for a person.
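The two evaluation measures discussed above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `precision_recall` and `hits_at_k` and all term lists are hypothetical, chosen only to show how recall, precision, and a count of relevant words among the first 25 candidates might be computed against a hand-made gold codebook.

```python
def precision_recall(suggested, gold):
    """Return (precision, recall) of suggested terms against a gold codebook."""
    suggested_set = set(suggested)
    gold_set = set(gold)
    hits = suggested_set & gold_set
    precision = len(hits) / len(suggested_set) if suggested_set else 0.0
    recall = len(hits) / len(gold_set) if gold_set else 0.0
    return precision, recall


def hits_at_k(candidates, relevant, k=25):
    """Count how many of the first k ranked candidates are in the relevant set."""
    return sum(1 for word in candidates[:k] if word in relevant)


# Hypothetical toy data, not taken from the paper.
gold_entities = {"court", "parliament", "minister", "police", "army"}
suggested_entities = [
    "court", "parliament", "minister", "police",
    "weather", "football", "monday", "street",
]

precision, recall = precision_recall(suggested_entities, gold_entities)
print(f"precision={precision:.2f} recall={recall:.2f}")

# Hypothetical ranked synonym candidates for one entity.
ranked_candidates = ["trial", "court", "banana", "law"]
direct_synonyms = {"trial", "court", "law"}
print(hits_at_k(ranked_candidates, direct_synonyms, k=3))
```

In this toy case half of the suggestions are noise while four of the five gold entities are found, which mirrors the situation described above: the list is too noisy to use unchecked but still helps guard against omissions.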
Similar resources
Using Vector Quantization for Universal Background Model in Automatic Speaker Verification
We aim to describe different approaches for vector quantization in Automatic Speaker Verification. We designed our novel architecture based on multiple codebooks representing the speakers and the impostor model, called the universal background model, and compared it to another vector quantization approach used for reducing training data. We compared our scheme with the baseline system, Gaussian Mixtu...
Improving automatic writer identification
State-of-the-art systems for automatic writer identification from handwritten text are based on two approaches: a statistical approach or a model-based approach. Both approaches have limitations. The main limitation of the statistical approach is that it relies on single-scale statistical features. The main limitation of the model-based approach is that the codebook generation is time-consuming...
A Simple Algorithm for Ordering and Compression of Vector Codebooks
The problem of storage or transmission of codevectors is an essential issue in vector quantization with custom codebook. The proposed technique for compression of codebooks relies on structuring and ordering properties of a binary split algorithm used for codebook design. A simple algorithm is presented for automatic ordering of the codebook entries in order to group similar codevectors. This s...
Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm
The results of a case study carried out while developing an automatic speaker recognition system are presented in this paper. The Vector Quantization (VQ) approach is used for mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center called a codeword. The collection of all codewords is called a co...
A Comparison of Vector Quantization Codebook Generation Algorithms Applied to Automatic Face Recognition
Automatic facial recognition is an attractive solution to the problem of computerised personal identification. In order to facilitate a cost effective solution, high levels of data reduction are required when storing the facial information. Vector Quantization has previously been used as a data reduction technique for the encoding of facial images. This paper identifies the fundamental importan...