Search results for: text database
Number of results: 420490
The specific complexity of textual data sets (free-text answers in surveys, documentary databases, etc.) is emphasized. Recent research trends show that classification techniques, both discrimination and unsupervised clustering, are widely used and have great potential in both Information Retrieval and Text Mining.
Out of dissatisfaction with currently available software, we built a music management program for large digital music libraries. We propose a system that solves some of the difficulties of organizing these libraries. The system introduces a new visual interaction style with the user’s music collection. A physics system coupled to each album’s genre information allows the user to spatially order his ...
In this paper, we present the experiments we conducted to recover the original two-column page layout from layout-damaged digitized files. We designed several CRF-based approaches, either to identify the column separator or to classify each token on each line as belonging to the left or right column. We achieved our best results with a model trained on homogeneous corpora (only files composed of 2 c...
Structured text is a general concept that is implicit in a variety of approaches to handling information. Syntactically, an item of structured text is a number of grammatically simple phrases together with a semantic label for each phrase. Items of structured text may be nested within larger items of structured text. Much information is potentially available as structured text including tagged ...
In this paper we describe a flexible and portable infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed using a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for...
Various efforts have been made to develop tools and methods dedicated to the automatic processing of multilingual terminology databases. For that purpose, multilingual parallel corpora have been used as a basic resource. However, most of the neologisms in technical and scientific domains are realised by multiword terms that are rarely identified in parallel corpora. In this paper, w...
Word spotting is a technique that can extract text from an input image. Here, we apply it to scanned Tamil land documents. Using Gabor features, we extract feature values from the input image. The main goal is to recognize the text in the document using a K-nearest-neighbor classifier. The features were calculated and then combined. Using these features, we can classify and rec...