text database

Join Queries with External Text Sources: Execution and Optimization

1995

Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan

Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that loosely integrates an extensible database system and a text retrieval system. We focus on a class of conjunctive queries that include joins between text and structured data, in addition to selections ov...

XQuery 1.0 and XPath 2.0 Full-Text

Journal: IBM Systems Journal, 2006

Pat Case

Powerful queries of character strings, numbers, dates, and nodes are familiar to users of relational database systems. Full-text database search systems feature queries that (1) use logical, proximity, and starts-with operators, (2) offer user control of case and diacritics, stemming, and wildcards, and (3) support thesauruses, taxonomies, and ontologies. Two emerging standards, XQuery ...

LYSGROUP: Adapting a Spanish microtext normalization system to English

2015

Yerai Doval, Jesús Vilares, Carlos Gómez-Rodríguez

In this article we describe the microtext normalization system we have used to participate in the Normalization of Noisy Text Task of the ACL W-NUT 2015 Workshop. Our normalization system was originally developed for text mining tasks on Spanish tweets. Our main goals during its development were flexibility, scalability and maintainability, in order to test a wide variety of approximations to t...

Sentiment Analysis on Web-based Reviews using Data Mining and Support Vector Machine

2016

Renato S. C. da Rocha, Marco Aurelio Pacheco

This work aims to use sentiment analysis techniques, data mining, text mining, and natural language processing to identify the polarity of texts using a support vector machine. Weka software and a movie review database from the Internet Movie Database (IMDb) were used. This work uses preprocessing filters, WRAPPER techniques, and a Support Vector Machine (SVM) for classification. It presents better resu...

Prototype for Integrating Probabilistic Fact and Text Retrieval

1991

Norbert Fuhr, Thorsten Hoffmann

We describe a prototype for an information system that integrates text and fact retrieval. A query is a set of conditions which relate either to the text or the attribute values of a database object. Conditions may be assigned weights w.r.t. the query as well as to an object. These weights form the basis for a ranking of the database objects w.r.t. the query. As user interface, the system provi...

A Mutually Beneficial Integration of Data Mining and Information Extraction

2000

Un Yong Nahm, Raymond J. Mooney

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX that combines IE and data mining methodologies to perform text mining as well as...

A Set of Novel Features for Writer Identification

2003

Caroline Hertel, Horst Bunke

A system for writer identification is described in this paper. It first segments a given page of handwritten text into individual lines and then extracts a set of features from each line. These features are subsequently used in a k-nearest-neighbor classifier that compares the feature vector extracted from a given input text to a number of prototype vectors coming from writers with known identi...
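The classification step described above (comparing an input feature vector against prototype vectors from writers of known identity) can be illustrated with a minimal k-nearest-neighbor sketch. It assumes Euclidean distance and a simple majority vote. The metric, the value of k, the helper names `euclidean` and `knn_identify`, and the two-dimensional prototype values are all hypothetical and are not taken from the paper's actual feature set.

```python
import math
from collections import Counter

# Hypothetical prototype vectors (feature vector, writer id). In the described
# system these would be extracted from segmented lines of handwriting; the
# two-dimensional values here are invented purely for illustration.
PROTOTYPES = [
    ([0.10, 0.20], "A"),
    ([0.15, 0.25], "A"),
    ([0.12, 0.22], "A"),
    ([0.90, 0.80], "B"),
    ([0.85, 0.95], "B"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_identify(query, prototypes, k=3):
    """Return the writer id that occurs most often among the k nearest prototypes.

    Ties are broken in favour of the writer whose first vote came from the
    closer prototype, because Counter.most_common keeps first-encountered
    order for equal counts and the votes are inserted nearest-first.
    """
    nearest = sorted(prototypes, key=lambda p: euclidean(query, p[0]))[:k]
    votes = Counter(writer for _, writer in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_identify([0.11, 0.21], PROTOTYPES, k=3))  # → A
```

With k = 1 the sketch reduces to plain nearest-prototype matching. A larger k trades some sensitivity to individual prototypes for robustness against outlier lines.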

Firebird Database Backup by Serialized Database Table Dump

Journal: CoRR, 2007

Maurice H. T. Ling

This paper presents a simple data dump and load utility for Firebird databases that mimics mysqldump in MySQL. The utility, consisting of fb_dump and fb_load for dumping and loading respectively, retrieves each database table using kinterbasdb and serializes the data using the marshal module. This utility has two advantages over the standard Firebird database backup utility, gbak. Firstly, it is able to backu...
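The serialization step described above can be sketched with Python's standard `marshal` module. This is an illustration under stated assumptions, not fb_dump's actual code. The helper names `dump_table` and `load_table`, the payload layout, and the sample rows are hypothetical, and an in-memory list of tuples stands in for rows that the real utility would fetch through kinterbasdb.

```python
import marshal

# Hypothetical stand-in for rows fetched from a Firebird table. In fb_dump the
# rows are retrieved through kinterbasdb; here a plain list of tuples is used
# so the sketch runs without a database.
SAMPLE_ROWS = [(1, "Ann", 52000.0), (2, "Bo", None)]


def dump_table(table_name, column_names, rows):
    """Serialize one table (name, columns, rows) to bytes with marshal.

    marshal only handles core built-in types (int, float, str, bytes, None,
    tuples, lists, dicts, ...), so values such as datetime or Decimal would
    need converting first.
    """
    payload = {
        "table": table_name,
        "columns": list(column_names),
        "rows": [tuple(row) for row in rows],
    }
    return marshal.dumps(payload)


def load_table(blob):
    """Inverse of dump_table: rebuild the payload dict from marshal bytes."""
    return marshal.loads(blob)


if __name__ == "__main__":
    blob = dump_table("EMPLOYEE", ["ID", "NAME", "SALARY"], SAMPLE_ROWS)
    restored = load_table(blob)
    print(restored["table"], restored["rows"])
```

One design consideration: the Python documentation notes that marshal's format is not a general persistence format and may change between Python versions, so dumps written this way are best loaded with the same Python version that produced them.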

The Design and Implementation of a Legal Text Database

1994

Diomidis Spinellis

We describe the design and implementation of a legal text database. The database provides a number of Greek Council of State decisions on a computer-accessible medium (CD-ROM). A graphical front-end is provided that allows the rapid retrieval of cases based on arbitrary keywords combined using Boolean operators. The database was populated by automatically converting the word-pro...

RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases

2012

Anthony Larcher, Kong-Aik Lee, Bin Ma, Haizhou Li

This paper describes a new speech corpus, the RSR2015 database, designed for text-dependent speaker recognition in scenarios based on fixed pass-phrases. This database consists of over 71 hours of speech recorded from English speakers covering the diversity of accents spoken in Singapore. Recordings were acquired using a set of six portable devices, including smartphones and tablets. The pool ...
