Hi, I'm currently working on reproducing the results of MaLa-ASR and have downloaded the slidespeech dataset from https://www.openslr.org/144/. While running the provided decoding script, I noticed that it requires the file located at /nfs/yangguanrou.ygr/slidespeech/${split}_oracle_v1/. Could you please clarify what the format of this file is? Do I need to preprocess the downloaded data in any specific way, such as splitting the audio based on timestamps?
Error logs
no file named test_oracle_v1
Expected behavior
Could you please provide the steps for data processing and explain the format of the data? Thanks, looking forward to your reply.
The location of the slidespeech dataset is set in the config file "mala_asr_config.py".
You can change "/nfs/yangguanrou.ygr/slidespeech/${split}_oracle_v1/" to your own path.
The dataset directory needs four files: "my_wav.scp", "utt2num_samples", "text", and "hot_related/ocr_1gram_top50_mmr070_hotwords_list".
"my_wav.scp" lists the audio paths. We converted the wav files to ark files, so the file looks like:
ID1 xxx/slidespeech/dev_oracle_v1/data/format.1/data_wav.ark:22
ID2 xxx/slidespeech/dev_oracle_v1/data/format.1/data_wav.ark:90445
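For anyone building this file themselves, here is a minimal sketch of how one entry can be split into its parts. It assumes the entries follow the Kaldi-style "ID path:offset" convention, where the number after the last colon is a byte offset into the ark file; that reading is inferred from the two example lines and is not confirmed in this thread. The helper name parse_scp_line is hypothetical and is not part of the repository.

```python
def parse_scp_line(line):
    """Split 'ID path/to/data_wav.ark:OFFSET' into (utt_id, ark_path, offset).

    Assumption: the trailing number is an offset into the ark file,
    as in Kaldi-style script files.
    """
    utt_id, location = line.strip().split(maxsplit=1)
    # Split on the last colon only, so colons inside the path are preserved.
    ark_path, offset = location.rsplit(":", 1)
    return utt_id, ark_path, int(offset)


if __name__ == "__main__":
    example = "ID1 xxx/slidespeech/dev_oracle_v1/data/format.1/data_wav.ark:22"
    print(parse_scp_line(example))
```

A parser like this is mainly useful for checking that every ID in "my_wav.scp" also appears in "utt2num_samples" and "text".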
The related_files.tar.gz archive also provides "text" and a file named "keywords". The "keywords" file corresponds to "hot_related/ocr_1gram_top50_mmr070_hotwords_list", which contains the hotword list.
"utt2num_samples" contains the length of each wav in samples, and looks like:
ID1 103680
ID2 181600
...
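If you need to create "utt2num_samples" yourself, the sketch below shows one possible way, followed by a check that the four files named above are in place. It assumes that each utterance is already a separate mono PCM wav file and that "length" means the number of sample frames; neither point is confirmed in this thread. The function names and the utt_to_wav mapping are hypothetical, and this is not the script the authors used.

```python
import wave
from pathlib import Path

# File names as listed in the reply; the directory layout is an assumption.
REQUIRED_FILES = [
    "my_wav.scp",
    "utt2num_samples",
    "text",
    "hot_related/ocr_1gram_top50_mmr070_hotwords_list",
]


def num_samples(wav_path):
    """Return the number of sample frames in a PCM wav file."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes()


def write_utt2num_samples(utt_to_wav, out_path):
    """Write one 'ID length' line per utterance.

    utt_to_wav is a hypothetical mapping from utterance ID to wav path.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for utt_id, wav_path in utt_to_wav.items():
            out.write(f"{utt_id} {num_samples(wav_path)}\n")


def missing_files(split_dir):
    """Return the required files that are absent from split_dir."""
    root = Path(split_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling missing_files on your own split directory before decoding gives a quick list of what still has to be prepared.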
Sorry for the late reply, been busy lately, hope your reproduction goes well!
System Info
torch 2.1