cambridgeltl/COD

COD Dataset 🐟

The released dataset comprises manually generated, localised and cross-lingually aligned task-oriented dialogue (TOD) data in Arabic, Indonesian, Russian and Swahili, together with the corresponding English data from the Schema-Guided Dialogue (SGD) dataset, which served as the source of dialogue prompts. For details of our prompt-based, language-specific dialogue generation method, please see our paper.

Baseline code will be released shortly.

Languages

| ISO 639-1 | Name | Family | Area¹ | Script |
| --- | --- | --- | --- | --- |
| ar | Arabic | Afro-Asiatic | Northern Africa/Western Asia | Arabic |
| id | Indonesian | Austronesian | Southeastern Asia | Latin |
| ru | Russian | Indo-European | Eastern Europe | Cyrillic |
| sw | Swahili | Niger-Congo | Eastern Africa | Latin |

¹ According to the United Nations geoscheme.
