Multimodal Information Processing: Some Recent NLP Applications
Multimodal information processing concerns the effective use of information available in different modalities, such as audio, video, and text, for solving a variety of real-life tasks. This talk will discuss how information extracted from these modalities can improve dialogue systems, summarization, hate speech detection, and complaint mining. Multimodal cues collected from audio tones, facial expressions, and text are used to determine the type of an utterance in a multi-task setting where emotion recognition and dialogue act classification are solved simultaneously (a minimal sketch of such a setup appears below). Multimodal information collected from videos, images, and text can also be used to generate summaries. Images and text collected from Amazon reviews are used to develop aspect-based multimodal complaint detection systems in a multi-task setting where sentiment and emotion classification serve as auxiliary tasks. Memes collected from social media are used for hate speech detection in a multi-task setting where sentiment, emotion, and sarcasm detection serve as auxiliary tasks. The talk will highlight these applications of multimodal information processing across different NLP tasks.
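To make the multi-task, multimodal setup concrete, the sketch below shows one common way to realize it: per-modality features are projected into a shared space, fused by concatenation, and passed to one output head per task, so that gradients from all tasks shape the shared representation. This is a minimal illustration under assumed conditions, not the architecture presented in the talk; the feature dimensions, label counts, and module names are all hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalMultiTaskClassifier(nn.Module):
    """Late-fusion multi-task model: each modality is projected into a
    shared hidden space, fused by concatenation, and fed to
    task-specific classification heads."""

    def __init__(self, audio_dim, visual_dim, text_dim,
                 hidden_dim=256, num_emotions=7, num_dialogue_acts=12):
        super().__init__()
        # Per-modality projections into a common hidden space
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Shared fusion layer over the concatenated modality representations
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
        )
        # One head per task; both heads share the fused representation
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)
        self.dialogue_act_head = nn.Linear(hidden_dim, num_dialogue_acts)

    def forward(self, audio, visual, text):
        fused = self.fusion(torch.cat([
            torch.relu(self.audio_proj(audio)),
            torch.relu(self.visual_proj(visual)),
            torch.relu(self.text_proj(text)),
        ], dim=-1))
        return self.emotion_head(fused), self.dialogue_act_head(fused)

# Joint training step: the total loss is a sum of per-task losses, so
# gradients from both tasks update the shared fusion layers.
model = MultimodalMultiTaskClassifier(audio_dim=128, visual_dim=512, text_dim=768)
audio = torch.randn(4, 128)   # e.g., prosodic features per utterance (assumed)
visual = torch.randn(4, 512)  # e.g., facial-expression features (assumed)
text = torch.randn(4, 768)    # e.g., sentence-encoder embeddings (assumed)
emotion_logits, act_logits = model(audio, visual, text)
loss = nn.functional.cross_entropy(emotion_logits, torch.randint(0, 7, (4,))) \
     + nn.functional.cross_entropy(act_logits, torch.randint(0, 12, (4,)))
loss.backward()
```

The same shared-encoder, multiple-heads pattern extends to the other applications mentioned above, with the auxiliary tasks (e.g., sentiment, emotion, or sarcasm detection) attached as additional heads whose losses are added to that of the main task.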