Annual Meeting of the Association for Computational Linguistics
Not recorded
Abstract
Compounds differ in the degree to which they are semantically compositional (compare, e.g., "carwash", "handbag", "beefcake" and "humbug"). Since even relatively transparent compounds such as "carwash" may leave the uninitiated reader with uncertainty about the intended meaning (soap for washing cars? a place where you can get your car washed?), an efficient way of retrieving the meaning of a compound is to use the compound’s form as an access key for its meaning.

However, in psychology, the view has become popular that at the earliest stage of lexical processing in reading, a morpho-orthographic decomposition into morphemes would necessarily take place. Theorists subscribing to obligatory decomposition appear to have some hash coding scheme in mind, with the constituents providing entry points to a form of table look-up (e.g., Taft & Forster, 1976).

Leaving aside the question of whether such a hash coding scheme would be computationally efficient, as well as the question of how the putative morpho-orthographic representations would be learned, my presentation focuses on the details of lexical processing as revealed by an eye-tracking study of the reading of English compounds in sentences.

A careful examination of the eye-tracking record with generalized additive modeling (Wood, 2006), combined with computational modeling using naive discrimination learning (Baayen, Milin, Filipovic Durdjevic, Hendrix, & Marelli, 2011), revealed that how far the eye moved into the compound is co-determined by the compound’s lexical distributional properties, including the cosine similarity of the compound and its head in document vector space (as measured with latent semantic analysis, Landauer & Dumais, 1997). This indicates that compound processing is initiated already while the eye is fixating on the preceding word, and that even before the eye has landed on the compound, processes discriminating the meaning of the compound from the meaning of its head have already come into play.
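
The cosine similarity between a compound and its head can be illustrated with a minimal sketch. The four-dimensional vectors below are hypothetical toy values chosen for illustration, not vectors from latent semantic analysis, and the function name `cosine_similarity` is an assumption rather than anything specified in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (NOT real LSA vectors): a transparent compound
# is given a vector close to its head, an opaque compound a distant one.
toy_vectors = {
    "handbag": [0.9, 0.1, 0.3, 0.0],
    "bag": [0.8, 0.2, 0.4, 0.1],
    "humbug": [0.0, 0.7, 0.1, 0.6],
    "bug": [0.5, 0.0, 0.8, 0.1],
}

transparent = cosine_similarity(toy_vectors["handbag"], toy_vectors["bag"])
opaque = cosine_similarity(toy_vectors["humbug"], toy_vectors["bug"])
print(f"handbag vs. bag: {transparent:.3f}")
print(f"humbug vs. bug:  {opaque:.3f}")
```

With these toy values the transparent compound scores higher against its head than the opaque one does, which is the kind of contrast the similarity measure is meant to capture.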
Once the eye lands on the compound, two very different reading signatures emerge, which critically depend on the letter trigrams spanning the morpheme boundary (e.g., "ndb" and "dba" in "handbag"). From a discrimination learning perspective, these boundary trigrams provide the crucial (and only) orthographic cues for the compound’s (idiosyncratic) meaning. If the boundary trigrams are sufficiently strongly associated with the compound’s meaning, and if the eye lands early enough in the word, a single fixation suffices. Within 240 ms (of which 80 ms involve planning the next saccade) the compound’s meaning is discriminated well enough to proceed to the next word.

However, when the boundary trigrams are only weakly associated with the compound’s meaning, multiple fixations become necessary. In this case, without the availability of the critical orthographic cues, the eye-tracking record bears witness to the cognitive system engaging not only bottom-up processes from form to meaning, but also top-down guessing processes that are informed by the a priori probability of the head and the cosine similarities of the compound and its constituents in semantic vector space.

These results challenge theories positing obligatory decomposition with hash coding, as hash coding predicts insensitivity to semantic transparency, contrary to fact. Our results also challenge theories positing blind look-up based on compounds’ orthographic forms. Although this might be computationally efficient, the eye can’t help seeing parts of the whole. In summary, reality is much more complex, with deep pre-arrival parafoveal processing followed either by efficient discrimination driven by the boundary trigrams (within 140 ms), or by an inefficient decompositional process (requiring an additional 200 ms) that seeks to make sense of the conjunction of head and modifier.

References
Baayen, R. H., Kuperman, V., Shaoul, C., Milin, P., Kliegl, R. & Ramscar, M. (submitted), Decomposition makes things worse. A discrimination learning approach to the time course of understanding compounds in reading.
Baayen, R. H., Milin, P., Filipovic Durdjevic, D., Hendrix, P. & Marelli, M. (2011), An amorphous model for morphological processing in visual comprehension based on naive discriminative learning, Psychological Review, 118, 3, 438-481.
Landauer, T. K. & Dumais, S. T. (1997), A Solution to Plato’s Problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge, Psychological Review, 104, 2, 211-240.
Taft, M. & Forster, K. I. (1976), Lexical Storage and Retrieval of Polymorphemic and Polysyllabic Words, Journal of Verbal Learning and Verbal Behavior, 15, 607-620.
Wood, S. N. (2006), Generalized Additive Models, Chapman & Hall/CRC, New York.
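
The boundary trigrams discussed in the abstract (e.g., "ndb" and "dba" in "handbag") can be illustrated with a short sketch. It assumes the morpheme boundary is already known, so the modifier and head are passed in separately; the function name `boundary_trigrams` is hypothetical, and the sketch only extracts the cues without modeling how strongly they are associated with a meaning.

```python
def boundary_trigrams(modifier, head):
    """Return the letter trigrams that span the modifier/head boundary.

    A trigram spans the boundary when it contains at least one letter
    of the modifier and at least one letter of the head.
    """
    word = modifier + head
    boundary = len(modifier)  # index of the first letter of the head
    trigrams = []
    for start in range(len(word) - 2):
        # The trigram covers indices start, start + 1, start + 2.
        if start < boundary < start + 3:
            trigrams.append(word[start:start + 3])
    return trigrams


print(boundary_trigrams("hand", "bag"))  # ['ndb', 'dba']
print(boundary_trigrams("car", "wash"))  # ['arw', 'rwa']
```

For a compound whose modifier and head each have at least two letters, this yields exactly two trigrams, matching the "handbag" example in the text.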