Live Lexicons and Dynamic Corpora Adapted to the Network Resources for Chinese Spoken Language Processing Applications in an Internet Era
نویسندگان
چکیده
In the future network era, huge volume of information on all subject domains will be readily available via the network. Also, all the network information are dynamic, ever-changing and exploding. Furthermore, many of the spoken language processing applications will have to do with the content of the network information, which is dynamic. This means dynamic lexicons, language models and so on will be required. In order to cope with such a new network environment, automatic approaches for the collection, classification, indexing, organization and utilization of the linguistic data obtainable from the networks for language processing applications will be very important. On the one hand, high performance spoken language technology can hopefully be developed based on such dynamic linguistic data on the network. On the other hand, it is also necessary that such spoken language technology can be intelligently adapted to the content of the dynamic and the ever-changing network information. Some basic concept for live lexicons and dynamic corpora adapted to the network resources has been developed for Chinese spoken language processing applications and briefly summarized here in this paper. Although the major considerations here are for Chinese language, the concept may equally apply to other languages as well.
منابع مشابه
Feature Extraction to Identify Network Traffic with Considering Packet Loss Effects
There are huge petitions of network traffic coming from various applications on Internet. In dealing with this volume of network traffic, network management plays a crucial rule. Traffic classification is a basic technique which is used by Internet service providers (ISP) to manage network resources and to guarantee Internet security. In addition, growing bandwidth usage, at one hand, and limit...
متن کاملSpoken language resources for Cantonese speech processing
This paper describes the development of CU Corpora, a series of large-scale speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese dialect in Southern China and Hong Kong. CU Corpora are the first of their kind and intended to serve as an important infrastructure for the advancement of speech recognition and synthesis technologies for this widely used Chinese dialect. They ...
متن کاملBuild a Situation-based Language Knowledge Base
Language resources are very important for natural language processing research and applications. This paper will introduce our ongoing research work to build a situation-based language knowledge base for the Chinese language, based on two basic language resources: three Chinese semantic lexicons and a large scale Chinese treebank. We developed a supporting platform to make full use of the abund...
متن کاملBuilding a Situation-Based Language Knowledge Base
Language resources are very important for natural language processing research and applications. This paper will introduce our ongoing research work to build a situation-based language knowledge base for the Chinese language, based on two basic language resources: three Chinese semantic lexicons and a large scale Chinese treebank. We developed a supporting platform to make full use of the abund...
متن کاملCorefrence resolution with deep learning in the Persian Labnguage
Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...
متن کامل