
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodality has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this tutorial is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning.
Building upon a new edition of our survey paper on multimodal ML and academic courses at CMU, this tutorial will cover three topics: (1) what is multimodal: the principles in learning from heterogeneous, connected, and interacting data, (2) why is it hard: a taxonomy of six core technical challenges faced in multimodal ML but understudied in unimodal ML, and (3) what is next: major directions for future research as identified by our taxonomy.

  • Time: Monday, 7/24/2023, 9:30am - 12:00pm HST.
  • Location: ICML 2023, Honolulu, Hawaii, USA. Recorded videos will also be uploaded here soon.
  • Contact: Presenters can be contacted at pliang@cs.cmu.edu and morency@cs.cmu.edu.