TIDES Language Resources: A Resource Map for Translingual Information Access
Abstract
Continuing improvements in human language algorithms, coupled with improvements in digital storage and processing, inspire growing confidence in multilingual information access systems. Systems exist to transcribe broadcast news, segment broadcasts into individual stories and sort them by topic. These technologies, useful in isolation, are now being combined to produce intelligent multilingual systems. DARPA TIDES combines technologies in detection, extraction, summarization and translation to create systems capable of searching a wide range of streaming multilingual text and speech sources, in real time, to provide effective access for English-speaking users. The broad scope of tasks and languages in programs like TIDES demands close coordination of research and shared resources. These resources include large collections of raw text and speech; translations and summaries; annotations of topics, named entities and relations, syntactic structures and propositional content; lexicons; annotation specifications and protocols; and distribution formats and standards. The TIDES program has initiated ambitious attacks on difficult problems, with linguistic resources matched to the needs of each piece of the overall research enterprise. This paper will describe the coordinated language resources being created under the TIDES aegis.
Similar resources
Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report
Changes in the supply of and demand for language resources continue to affect the role of large data centers such as the Linguistic Data Consortium (LDC) and European Language Resource Center (ELRA) within the research communities they serve. The past few years have seen increased demand for: intensively multi-modal resources, larger data sets in high-density languages and new data in low dens...
FIRE-2008 at Maryland: English-Hindi CLIR
In this year's Forum for Information Retrieval Evaluation (FIRE), the University of Maryland participated in the ad hoc cross-language document retrieval task, with English queries and Hindi documents. The experiments focused on evaluating the effectiveness of a “meaning matching” approach based on translation probabilities. The FIRE Hindi test collection provides the first opportunity to ...
Integrated Feasibility Experiment for Bio-Security: IFE-Bio, A TIDES Demonstration
As part of MITRE’s work under the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) program, we are preparing a series of demonstrations to showcase the TIDES Integrated Feasibility Experiment on Bio-Security (IFE-Bio). The current demonstration illustrates some of the resources that can be made available to analysts tasked with monitoring infectious disease outbrea...
Translingual Mining from Text Data
Like full-text translation, cross-language information retrieval (CLIR) is a task that requires some form of knowledge transfer across languages. Although robust translation resources are critical for constructing high quality translation tools, manually constructed resources are limited both in their coverage and in their adaptability to a wide range of applications. Automatic mining of transl...
Cross Language Information Retrieval for Digital Museums
The trend toward information globalization has brought new challenges for digital libraries. On the one hand, it is often necessary for a digital library to share its valuable resources with users of different languages. On the other hand, it is also necessary for a DL user to utilize knowledge presented in a foreign language. This paper deals with the translingual issue on the design of a digi...