Language Homogeneity in the Japanese Wikipedia
Abstract
Wikipedia is a potentially very useful source of information, but intuitively it is difficult to have confidence in the quality of an encyclopedia that anyone can modify. One aspect of correctness is writing style, which we examine in a computer-based study of the full Japanese Wikipedia. This is possible because Japanese has clearly distinct writing styles, marked by, for example, different verb forms. We find that the writing style of the Japanese Wikipedia is largely consistent with the project's style guidelines. Exceptions appear to occur primarily in articles with few changes and few editors.
Related resources
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
IRCE at the NTCIR-12 IMine-2 Task
The IRCE team participated in the IMine-2 task at the NTCIR-12 workshop. We submitted one Chinese language run and five Japanese language runs for the Query Understanding subtask. Our methods exploited online text corpora BaiduPedia for the Chinese language run and Japanese Wikipedia for the Japanese language runs. The approaches employed in the Chinese and Japanese language topics are differed...
Enriching Wikipedia's Intra-language Links by their Cross-language Transfer
Although hyperlinks enhance the utility of Wikipedia, embedding them in articles imposes a burden on contributors. To alleviate this burden as well as enrich hyperlinks in Wikipedia articles, we propose a method for transferring intra-language links between different-language articles linked via an interlanguage link. The method avoids anchor selection and disambiguation problems by which usual...
Cross-language Entity Linking Adapting to User’s Language Ability
In this paper, we propose a method to automatically discover valuable keyphrases in Japanese and link these keyphrases to related Chinese Wikipedia pages. The method that we propose has four stages. Firstly, we extract nouns from a Japanese document using a morphological analyzer and extract the candidates of keyphrases using a method called Top Consecutive Nouns Cohesion (TCNC) [1]. Then, we j...
A Pipeline Japanese Entity Linking System with Embedding Features
Entity linking (EL) is the task of connecting mentions in texts to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL which consists of two standard components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate g...
Publication date: 2010