Explainable natural language processing with matrix product states
Authors
Abstract
Despite empirical successes of recurrent neural networks (RNNs) in natural language processing (NLP), theoretical understanding of RNNs is still limited due to their intrinsically complex, non-linear computations. We systematically analyze RNNs' behavior in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and a matrix product state (MPS). Using the von Neumann entanglement entropy (EE) as a proxy for information propagation, we show that single-layer RACs possess a maximum information propagation capacity, reflected by the saturation of the EE. Enlarging the bond dimension beyond the EE saturation threshold does not increase model prediction accuracy, so a minimal model that best estimates the data statistics can be inferred. Although the saturated EE is smaller than the maximum EE allowed by the area law, our model achieves ≈ 99% training accuracy on realistic sentiment-analysis data sets. Thus, low EE alone is not a warrant against the adoption of single-layer RACs in NLP. Contrary to the common belief that long-range information propagation is the main source of RNNs' successes, we show that single-layer RACs harness their high expressiveness from the subtle interplay between information propagation and word vector embeddings. Our work sheds light on the phenomenology of learning in RACs and, more generally, on the explainability of RNNs for NLP, using tools from many-body quantum physics.
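To illustrate the quantity the abstract uses as its proxy, here is a minimal sketch (not the paper's own code) of the von Neumann entanglement entropy across a bipartition of a pure state, computed from the Schmidt spectrum via an SVD; the state sizes and examples are purely illustrative.

```python
import numpy as np

def entanglement_entropy(psi, d_left, d_right):
    """Von Neumann EE across a cut: S = -sum_i p_i ln p_i, p_i = s_i^2."""
    m = psi.reshape(d_left, d_right)           # matricize across the cut
    s = np.linalg.svd(m, compute_uv=False)     # Schmidt coefficients
    p = s**2 / np.sum(s**2)                    # normalized Schmidt spectrum
    p = p[p > 1e-12]                           # drop numerical zeros
    return -np.sum(p * np.log(p))

# Product state of two qubits: zero entanglement
prod = np.kron([1.0, 0.0], [1.0, 0.0])
print(entanglement_entropy(prod, 2, 2))        # ≈ 0

# Bell state: maximal EE for a 2x2 cut, namely ln 2
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
print(entanglement_entropy(bell, 2, 2))        # ≈ 0.693
```

The saturation behavior described in the abstract refers to this quantity plateauing as the MPS bond dimension (the number of retained Schmidt coefficients) grows.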
Similar resources
Stochastic matrix product states.
The concept of stochastic matrix product states is introduced and a natural form for the states is derived. This allows us to define the analogue of Schmidt coefficients for steady states of nonequilibrium stochastic processes. We discuss a new measure for correlations which is analogous to entanglement entropy, the entropy cost S(C), and show that this measure quantifies the bond dimension nee...
Entanglement classification with matrix product states
We propose an entanglement classification for symmetric quantum states based on their diagonal matrix-product-state (MPS) representation. The proposed classification, which preserves the stochastic local operation assisted with classical communication (SLOCC) criterion, relates entanglement families to the interaction length of Hamiltonians. In this manner, we establish a connection between ent...
Matrix and Tensor Factorization Methods for Natural Language Processing
Tensor and matrix factorization methods have attracted a lot of attention recently thanks to their successful applications to information extraction, knowledge base population, lexical semantics and dependency parsing. In the first part, we will first cover the basics of matrix and tensor factorization theory and optimization, and then proceed to more advanced topics involving convex surrogates...
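As a hedged sketch of the simplest matrix-factorization method mentioned in this tutorial abstract (the truncated SVD underlying latent semantic analysis), the following factorizes a toy term-document count matrix at rank k; the matrix entries are purely illustrative, not taken from the tutorial.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                          # latent dimensions kept
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

# By the Eckart-Young theorem, the Frobenius reconstruction error equals
# the norm of the discarded singular values.
err = np.linalg.norm(X - X_k)
print(round(err, 6), round(np.sqrt(np.sum(s[k:] ** 2)), 6))
```

The rank-k factors play the role of latent-semantic dimensions; the same low-rank idea generalizes to the tensor decompositions the tutorial covers.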
New trends in natural language processing: statistical natural language processing.
The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and informa...
Journal
Journal title: New Journal of Physics
Year: 2022
ISSN: 1367-2630
DOI: https://doi.org/10.1088/1367-2630/ac6232