text database

A Corpus-Based Quantitative Study of Nominalizations across Chinese and British Media English

2014

Ying Liu, Alex Chengyu Fang, Naixing Wei

This paper reports on a corpus-based quantitative study of the use of nominalizations across China English and British English in two comparable media corpora. In contrast to previous corpus-based studies of nominalizations, we start by using a syntactic approach and proceed with some methodological innovations incorporating large lexical databases and syntactically annotated corpora. The data ...

A Superimposed Coding Scheme Based on Multiple Block Descriptor Files for Indexing Very Large Data Bases

1988

Alan J. Kent, Ron Sacks-Davis, Kotagiri Ramamohanarao

A new signature file method for accessing information from large data files containing both formatted and free-text data is presented. The new method, called the multiorganizational scheme, is proposed for indexing very large data files containing hundreds of thousands or possibly millions of records.
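Signature-file methods such as this one build on superimposed coding. The sketch below illustrates only that generic idea, not the multiorganizational scheme itself. Each word is hashed to a few bit positions in a fixed-width signature, the word signatures of a record are OR-ed together, and a query can match only records whose signature contains every query bit. The signature width, bits per word, SHA-256 hashing, and the helper names are illustrative assumptions.

```python
import hashlib

SIG_BITS = 64      # signature width in bits (assumed for illustration)
BITS_PER_WORD = 3  # bit positions set per word (assumed for illustration)


def word_signature(word: str) -> int:
    """Hash a word to up to BITS_PER_WORD bit positions in a SIG_BITS-wide signature."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def record_signature(words) -> int:
    """Superimpose (bitwise OR) the signatures of all words in a record."""
    sig = 0
    for w in words:
        sig |= word_signature(w.lower())
    return sig


def candidates(records, query_words):
    """Return ids of records whose signature covers every query bit.

    Signature matches can be false drops, so each candidate must still be
    checked against the actual record text.
    """
    q = record_signature(query_words)
    return [rid for rid, sig in records.items() if sig & q == q]


records = {
    1: record_signature("signature files index free text".split()),
    2: record_signature("relational tables store formatted data".split()),
}
print(candidates(records, ["signature", "text"]))
```

Record 1 is always returned. Record 2 could also appear as a false drop, depending on hash collisions, which is why signature files are followed by a verification step.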

Recent Developments within the European Language Resources Association (ELRA)

2000

Khalid Choukri, Audrey Mance, Valérie Mapelli

The most visible achievement of ELRA is the growth of its catalogue. As of April 2000, the ELRA catalogue lists 111 speech resources, 50 monolingual lexica, 113 multilingual lexica, 24 written corpora and 275 terminological databases. However, many Language Resources (LRs) need to be identified and/or produced. To this end, ELRA is active in promoting and funding the co-production ...

Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources

2006

Emmanuel Giguet, Pierre-Sylvain Luquet

This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only and does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...
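The abstract does not describe the alignment method itself. As a generic illustration only, and not necessarily the authors' approach, word translation candidates can be scored from raw sentence-aligned text without stemmers or taggers. The sketch counts how often two words co-occur in aligned sentence pairs and scores each pair with the Dice coefficient 2·c(s,t) / (c(s) + c(t)). The toy English/French sentence pairs and the function names are invented for illustration.

```python
from collections import Counter
from itertools import product


def dice_alignments(pairs):
    """Score source/target word pairs by Dice coefficient over aligned sentences.

    pairs: list of (source_sentence, target_sentence) strings that are already
    sentence-aligned. Words are split on whitespace only, so no external
    linguistic resources are used.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.lower().split()), set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # co-occurrence per sentence pair
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_translation(scores, source_word):
    """Return the highest-scoring target word for source_word, or None."""
    options = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(options, key=options.get) if options else None


pairs = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the car", "la voiture"),
]
scores = dice_alignments(pairs)
print(best_translation(scores, "house"))  # maison
```

Here "house" co-occurs with both "la" and "maison" twice. "la" also appears in the third pair, so the Dice score of "maison" (1.0) beats that of "la" (0.8).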

The Effects of Query-based Sampling on Automatic Database Selection Algorithms

Keywords: Distributed Collections, Merging Search Results/Information Synthesis, Database Selection

2000

Database selection algorithms need to know the subject areas covered by each text database, but this metadata can be difficult to acquire in multi-party environments, such as the Internet, where each party has different interests and capabilities. Query-based sampling is a relatively new technique in which metadata is inferred by interacting with each text database and observing the outcomes. Quer...
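The following is a minimal sketch of query-based sampling. It assumes a database that exposes only a search interface. A single-term query is sent, the top few returned documents are added to the sample, and the next query term is drawn at random from the vocabulary learned so far. The result is a term-frequency description of the database. The `search` function, the toy collection, and the stopping limits are hypothetical stand-ins.

```python
import random
from collections import Counter

# Toy stand-in for an uncooperative text database (hypothetical data).
COLLECTION = [
    "database selection for distributed retrieval",
    "query based sampling of text databases",
    "merging search results from many databases",
    "language models describe database contents",
    "sampling documents with single term queries",
]


def search(term, n=2):
    """Hypothetical search interface: return up to n documents containing term."""
    return [doc for doc in COLLECTION if term in doc.split()][:n]


def query_based_sample(seed_term, max_docs=4, max_queries=20, rng=None):
    """Infer a term-frequency description of a database by probing it with queries."""
    rng = rng or random.Random(0)
    sampled = []            # distinct documents retrieved so far
    vocabulary = Counter()  # learned description: term -> frequency in the sample
    term = seed_term
    for _ in range(max_queries):
        for doc in search(term):
            if doc not in sampled:
                sampled.append(doc)
                vocabulary.update(doc.split())
        # Stop once enough documents are sampled, or if the seed found nothing.
        if len(sampled) >= max_docs or not vocabulary:
            break
        # Next query: a term chosen at random from the learned vocabulary.
        term = rng.choice(sorted(vocabulary))
    return sampled, vocabulary


docs, description = query_based_sample("sampling")
print(len(docs), description.most_common(3))
```

The learned `description` then serves as the metadata a database selection algorithm would compare across databases. Drawing query terms from the sample itself, rather than from an external list, keeps the probe independent of any cooperation from the database.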

A Survey on Various Topic Mining Techniques & Applications

2014

Sakshi Jain, Virendra Raghuwanshi

Text mining commonly refers to the process of extracting interesting patterns and non-trivial information from the database and deriving knowledge from unstructured text. Generally, text mining covers several computer science disciplines with a strong orientation towards artificial intelligence in general, together with, but not limited to, interesting patterns to r...

Multilingual Lexical Database Generation from parallel texts with endogenous resources

2005

Emmanuel Giguet

This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only and does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...

First experiments on a new online handwritten flowchart database

2011

Ahmad-Montaser Awal, Guihuan Feng, Harold Mouchère, Christian Viard-Gaudin

In this paper, we propose a new online handwritten flowchart database and perform first experiments to establish a baseline benchmark on this dataset. The collected database consists of 78 flowcharts labeled at the stroke and symbol levels. In addition, an isolated database of graphical and text symbols was extracted from these collected flowcharts. Then, we tackle the problem of online handwrit...

Design Framework of a Database for Structured Documents with Object Links

1999

Masatoshi Yoshikawa, Hiroyuki Kato, Hiroko Kinutani

Structured documents often contain character strings whose semantics can be naturally stored as database values or that have a direct correspondence with database values. By building bilateral logical links between character strings in documents and the corresponding database values, semantically rich queries become expressible. We have introduced a new ADT, named “paratext,” to model text which has l...

Full-Text Search Engines for Databases

2009

László Kovács, Domonkos Tikk

Current databases are able to store several terabytes of free-text documents. From the user’s viewpoint, the main purpose of a database is efficient information retrieval. In the case of textual data, information retrieval mostly concerns the selection and ranking of documents. The selection criteria can contain elements that apply to the content or the grammar of the language. In the tradi...
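The split between selection and ranking can be sketched with an inverted index. In this sketch, selection keeps the documents that contain every query term (Boolean AND), and ranking orders them by a TF-IDF score. This is a minimal in-memory illustration with a smoothed IDF. It is not the engines surveyed in the paper, which typically use more elaborate scoring (e.g. BM25) and on-disk index structures. The class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Minimal in-memory full-text index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.num_docs = 0

    def add(self, doc_id, text):
        """Index a new document (assumes each doc_id is added only once)."""
        self.num_docs += 1
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def select(self, terms):
        """Selection: ids of documents containing every query term (Boolean AND)."""
        sets = [set(self.postings.get(t, {})) for t in terms]
        return set.intersection(*sets) if sets else set()

    def rank(self, query):
        """Ranking: order the selected documents by summed TF-IDF score."""
        terms = query.lower().split()
        scores = {}
        for doc_id in self.select(terms):
            score = 0.0
            for t in terms:
                # Smoothed IDF log(1 + N/df) stays positive even for very common terms.
                idf = math.log(1 + self.num_docs / len(self.postings[t]))
                score += self.postings[t][doc_id] * idf
            scores[doc_id] = score
        return sorted(scores, key=scores.get, reverse=True)


index = InvertedIndex()
index.add("d1", "database systems store free text documents")
index.add("d2", "text retrieval ranks text documents")
index.add("d3", "relational database tuning")
print(index.rank("text documents"))  # ['d2', 'd1']
```

Both d1 and d2 pass the selection step. d2 ranks first because "text" occurs in it twice. Separating `select` from `rank` mirrors the distinction the abstract draws: selection decides membership, and ranking decides order.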