Complexity of Integration: Multimodal systems must handle the integration of different data types, each with its own inherent complexities and subtleties. Merging these into a coherent system amplifies the chance of unexpected behaviors.
Spurious Correlations: As noted earlier with CLIP, exploiting spurious correlations can raise accuracy on specific distributions, but it can also cause failures under distribution shift or on unseen examples.
Erroneous Agreement: As mentioned in the abstract, MultiMon identifies systematic failures by looking for erroneous agreement: inputs that produce the same output but should not. In multimodal systems this happens when inputs with different meanings map to nearly the same representation in the shared embedding space, so the system mistakenly treats them as equivalent.
Transfer Learning Vulnerabilities: Multimodal systems often leverage pre-trained models and adapt them to new tasks. While this approach is powerful, it might also inherit vulnerabilities and biases from the original training data, leading to unforeseen failures in the new context.
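The erroneous-agreement idea above can be made concrete with a small sketch. This is not MultiMon's actual pipeline, only an illustration under stated assumptions: the function name `find_erroneous_agreements`, the 0.95 threshold, the toy 3-dimensional vectors, and the meaning labels are all hypothetical, and a real system would obtain the embeddings from an encoder such as CLIP.

```python
import numpy as np


def find_erroneous_agreements(embeddings, meanings, threshold=0.95):
    """Return (i, j, similarity) for pairs with i < j whose embeddings are
    nearly identical (cosine similarity >= threshold) even though their
    meaning labels differ. Assumes no embedding is the zero vector."""
    emb = np.asarray(embeddings, dtype=float)
    # Normalize rows so the dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    pairs = []
    for i in range(len(meanings)):
        for j in range(i + 1, len(meanings)):
            if sims[i, j] >= threshold and meanings[i] != meanings[j]:
                pairs.append((i, j, float(sims[i, j])))
    return pairs


# Hypothetical toy embeddings, hand-written for illustration only.
toy_embeddings = [
    [1.0, 0.0, 0.0],    # "a table with chairs"
    [0.99, 0.05, 0.0],  # "a table without chairs"
    [0.0, 1.0, 0.0],    # "a dog on a beach"
]
toy_meanings = ["with_chairs", "without_chairs", "dog_beach"]

failures = find_erroneous_agreements(toy_embeddings, toy_meanings)
for i, j, sim in failures:
    print(f"inputs {i} and {j} agree (cos={sim:.3f}) but differ in meaning")
```

Here only the first two inputs are flagged: their vectors are almost parallel although the labels disagree, which is the pattern the notes describe as inputs treated as equivalent when they should not be.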
Relationship between tasks (transfer learning), distribution, data types, correlations
MultiMon
14 system failures