In the past decade, sarcasm detection has been studied intensively in textual scenarios. With the popularization of video communication, analysis in multi-modal scenarios has received much attention in recent years. Therefore, multi-modal sarcasm detection, which aims at detecting sarcasm in conversations, has become increasingly popular in both the natural language processing community and the multi-modal analysis community. In this paper, considering that sarcasm is often conveyed t...