diff --git a/chapters/en/unit4/multimodal-models/a_multimodal_world.mdx b/chapters/en/unit4/multimodal-models/a_multimodal_world.mdx
index fac5ffa49..5516b363c 100644
--- a/chapters/en/unit4/multimodal-models/a_multimodal_world.mdx
+++ b/chapters/en/unit4/multimodal-models/a_multimodal_world.mdx
@@ -48,8 +48,7 @@ A dataset consisting of multiple modalities is a multimodal dataset. Out of the
 - Vision + Audio: [VGG-Sound Dataset](https://www.robots.ox.ac.uk/~vgg/data/vggsound/), [RAVDESS Dataset](https://zenodo.org/records/1188976), [Audio-Visual Identity Database (AVID)](https://www.avid.wiki/Main_Page).
 - Vision + Audio + Text: [RECOLA Database](https://diuf.unifr.ch/main/diva/recola/), [IEMOCAP Dataset](https://sail.usc.edu/iemocap/).
 
-Now let us see what kind of tasks can be performed using a multimodal dataset? There are many examples, but we will focus generally on tasks that contains the visual and textual
-A multimodal dataset will require a model which is able to process data from multiple modalities, such a model is a multimodal model.
+Now, let us see what kind of tasks can be performed using a multimodal dataset. There are many examples, but we will generally focus on tasks that contain both visual and textual elements. A multimodal dataset requires a model that is able to process data from multiple modalities. Such a model is called a multimodal model.
 
 ## Multimodal Tasks and Models
 