Using Web Corpus Statistics to Infer Conceptual Structure
Abstract
The basic level is the level of conceptual structure at which categories are maximally informative. In this research, we investigated whether the privileged status of the basic level might be captured by the statistical properties of the Web. Using Google’s Web search programming interface, we found that frequency ratios for terms across three levels of abstraction (superordinate, basic, and subordinate) significantly predicted human participants’ spontaneous labeling of images obtained via Mechanical Turk. Specifically, the Web statistics paralleled participants’ preference for superordinate labels for natural kinds (e.g., trees, fish) and basic-level labels for other categories. Further, analyses of genre-specific text from the Corpus of Contemporary American English revealed that children’s texts were significantly more predictive than academic texts. Our findings suggest that distributional statistics from subsets of the Web can be used to infer properties of conceptual structure, potentially offering a powerful, high-resolution, yet low-cost tool for empirically testing theoretical predictions.
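The frequency-ratio idea can be illustrated with a minimal sketch. It assumes that a hit count has already been collected for one term at each of the three levels of abstraction. The function names, the normalization into shares, and all counts below are hypothetical and chosen for illustration; they are not the authors' procedure or data.

```python
from typing import Dict

# The three levels of abstraction named in the abstract.
LEVELS = ("superordinate", "basic", "subordinate")


def level_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each level's share of the total term frequency.

    `counts` maps a level name to a hit count for the term at that level.
    Normalizing to shares is an assumption of this sketch.
    """
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return {level: counts[level] / total for level in LEVELS}


def predicted_label_level(counts: Dict[str, int]) -> str:
    """Pick the level whose term is most frequent, as a naive prediction."""
    shares = level_shares(counts)
    return max(LEVELS, key=lambda level: shares[level])


# Hypothetical hit counts, invented for illustration only.
example_counts = {"superordinate": 600, "basic": 300, "subordinate": 100}

if __name__ == "__main__":
    print(level_shares(example_counts))
    print(predicted_label_level(example_counts))
```

In a study of this kind, the predicted level for each category would then be compared against the labels that participants actually produce.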
Related resources
Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling
issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...
A New Algorithmic Identity: Soft Biopolitics and the Modulation of Control
Marketing and web analytic companies have implemented sophisticated algorithms to observe, analyze, and identify users through large surveillance networks online. These computer algorithms have the capacity to infer categories of identity upon users based largely on their web-surfing habits. In this article I will first discuss the conceptual and theoretical work around code, outlining its use ...
How to Expand Dictionaries with Web-Mining Techniques
This paper presents an approach to enrich conceptual classes based on the Web. To test our approach, we first build conceptual classes using syntactic and semantic information provided by a corpus. The concepts can be the input of a dictionary. Our web-mining approach deals with a cognitive process which simulates human reasoning based on the enumeration principle. The experiments reveal the in...
How to Expand Dictionaries by Web-Mining Techniques
This paper presents an approach to enrich conceptual classes based on the Web. To test our approach, we first build conceptual classes using syntactic and semantic information provided by a corpus. The concepts can be the input of a dictionary. Our web-mining approach deals with a cognitive process which simulates human reasoning based on the enumeration principle. The experiments reveal the in...
Designing and Evaluating a Conceptual Model of Credibility Evaluation of Web Information: a Meta-synthesis and Delphi Study
Background and Aim: The current research aims to develop a literature-dependent and expert-modified model related to credibility evaluation of web information. Methods: Regarding the approach, mixed method would be utilized. The research method then is mixed-heuristic using both qualitative and quantitative methodologies. In the first stage of the research, meta-synthesis was used as a qualita...