Causal Inference
Causal inference aims to improve model generalization by eliminating spurious correlations, clarifying the expected model effects, and modularizing reusable features.
Multimodal Sentiment Analysis
The goal of multimodal sentiment analysis is to regress or classify the overall sentiment of an utterance using acoustic, visual, and language cues.
OOD MSA Task
Traditional MSA models tend to heavily rely on the textual modality for sentiment analysis because salient textual semantics (e.g., “it is great.”) easily reflect the sentiment tendency.
ABSTRACT
Existing studies on multimodal sentiment analysis heavily rely on the textual modality and unavoidably incorporate the spurious correlations between textual words and sentiment labels. This greatly hinders the model's generalization ability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sentiment analysis, which aims to estimate and mitigate the adverse effect of the textual modality for strong OOD generalization. To this end, we embrace causal inference, which inspects the causal relationships via a causal graph. From the causal graph, we find that the spurious correlations are attributed to the direct effect of the textual modality on the model prediction. In this light, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of the textual modality via an extra text module during training. In the inference stage, we first estimate the direct effect by counterfactual inference and then subtract it from the total effect of all modalities to alleviate spurious correlations in the textual modality. Extensive experiments show that the proposed framework significantly improves the OOD generalization ability of existing approaches.
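The inference step described above (total effect minus the direct effect of text) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the framework itself: the log-sigmoid fusion of summed logits, the constant `c` that stands in for the multimodal branch in the counterfactual world, and the names `fuse` and `debiased_scores`.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def fuse(z_multi, z_text):
    # Assumed fusion: sum the two branches' logits, then apply log-sigmoid.
    return log_sigmoid(z_multi + z_text)

def debiased_scores(z_multi, z_text, c=0.0):
    """Total effect minus the direct effect of the text branch.

    z_multi: per-class logits from the multimodal model.
    z_text:  per-class logits from the extra text-only module.
    c:       assumed constant replacing z_multi in the counterfactual world.
    """
    z_multi = np.asarray(z_multi, dtype=float)
    z_text = np.asarray(z_text, dtype=float)
    total_effect = fuse(z_multi, z_text)
    direct_effect = fuse(np.full_like(z_multi, c), z_text)
    return total_effect - direct_effect

# Toy example: the text branch strongly favors class 0,
# while the multimodal branch mildly favors class 1.
z_multi = np.array([0.5, 1.0])
z_text = np.array([3.0, -1.0])
biased_pred = int(np.argmax(fuse(z_multi, z_text)))
debiased_pred = int(np.argmax(debiased_scores(z_multi, z_text)))
```

In this toy case the fused prediction follows the text branch, whereas subtracting the estimated direct effect lets the multimodal evidence decide. A nonlinear fusion is used because, with a purely additive one, the subtraction would reduce to `z_multi - c` and the text branch would play no role at inference.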
DATASET
Towards comprehensive evaluation, we performed experiments for each task in both IID and OOD settings. In the former setting, the testing set shares the same distribution as the training set, while in the latter, we expect the distribution of each word over the sentiment categories in the testing set to be as different as possible from that in the training set. Accordingly, for each task, we need four sets of samples from each dataset: IID training, IID validation, IID testing, and OOD testing sets. The IID training and validation sets are used for training our model, while the IID and OOD testing sets are used for evaluating the model performance in the two settings. To this end, we first divide each dataset (i.e., CMU-MOSI or CMU-MOSEI) into two parts: an IID set and an OOD set, where the former is further split into three chunks (i.e., IID training, IID validation, and IID testing sets) by random sampling, and the latter constitutes the OOD testing set. Inspired by the traveling salesman problem (TSP), we observe that partitioning a dataset into parts with different distributions can also be cast as an optimization problem. We therefore use simulated annealing to repartition the CMU-MOSI and CMU-MOSEI datasets into the IID set (IID training, IID validation, and IID testing sets) and the OOD testing set.
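The partitioning idea above can be sketched as a small simulated annealing loop. This is only an illustration under stated assumptions, not the procedure actually used: the objective `shift_score` (summed L1 distance between per-word label distributions, over words shared by both parts), the swap move, the geometric cooling schedule, and all function names are hypothetical choices.

```python
import math
import random
from collections import Counter, defaultdict

def word_label_dist(samples, indices):
    """Map each word to its normalized label distribution within `indices`."""
    counts = defaultdict(Counter)
    for i in indices:
        words, label = samples[i]
        for w in set(words):
            counts[w][label] += 1
    return {w: {l: n / sum(c.values()) for l, n in c.items()}
            for w, c in counts.items()}

def shift_score(samples, iid_idx, ood_idx):
    """Assumed objective: larger means the two parts differ more."""
    p = word_label_dist(samples, iid_idx)
    q = word_label_dist(samples, ood_idx)
    score = 0.0
    for w in p.keys() & q.keys():  # words seen in only one part are ignored
        labels = p[w].keys() | q[w].keys()
        score += sum(abs(p[w].get(l, 0.0) - q[w].get(l, 0.0)) for l in labels)
    return score

def anneal_partition(samples, n_ood, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Return (best_score, iid_indices, ood_indices) found by annealing."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    ood, iid = order[:n_ood], order[n_ood:]
    current = shift_score(samples, iid, ood)
    best = (current, list(iid), list(ood))
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        a, b = rng.randrange(len(iid)), rng.randrange(len(ood))
        iid[a], ood[b] = ood[b], iid[a]  # propose swapping one sample
        candidate = shift_score(samples, iid, ood)
        accept = candidate >= current or \
            rng.random() < math.exp((candidate - current) / t)
        if accept:
            current = candidate
            if current > best[0]:
                best = (current, list(iid), list(ood))
        else:
            iid[a], ood[b] = ood[b], iid[a]  # revert the swap
    return best

# Toy data: (tokens, sentiment label) pairs, made up for illustration.
toy_samples = [
    (["great", "movie"], "pos"), (["great", "plot"], "pos"),
    (["great", "waste"], "neg"), (["boring", "movie"], "neg"),
    (["boring", "plot"], "neg"), (["boring", "fun"], "pos"),
    (["great", "fun"], "pos"), (["boring", "waste"], "neg"),
]
score, iid_idx, ood_idx = anneal_partition(toy_samples, n_ood=3, steps=300)
```

The returned IID indices would then be split at random into training, validation, and testing sets, as described above. Swapping samples keeps both part sizes fixed, and tracking the best state means the result is never worse than the initial random split under this objective.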