What is the format of "list" file? #9

couragelfyang · 2020-01-06T01:35:00Z

I saw that list files such as "LibriSpeech/list/train.txt" are required parameters for main.py. These files do not seem to be part of the official LibriSpeech release. What is their format? Could you provide them, or a script to generate them?


colinator · 2020-09-19T03:39:17Z

I believe they are just lists of utterance ids, one per line. In my LibriSpeech install, I found a number of files ending in .txt that contain utterance ids and transcriptions. This is how I generated the list files:

# Generates train.txt, eval.txt, validation.txt, which
# are just lists of utterance ids. This script looks
# at all the .txt files within LibriSpeech to extract
# the ids and write the files.
# An utterance id is a string like "61-70968-0009".

import os

trainroot = 'LibriSpeech/train-clean-100/' #, 'train-clean-360/', 'train-other-500/'
devroot = 'LibriSpeech/dev-clean/' #, 'LibriSpeech/dev-other/'
testroot = 'LibriSpeech/test-clean/'

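def generate_list(root_dir, fn):
    """Collect the utterance ids from every .txt file under root_dir and write them to fn."""

    # get the utterance ids (the first whitespace-separated token on each line)
    utterance_ids = []
    for subdir, _, files in os.walk(root_dir):
        for filename in sorted(f for f in files if f.endswith(".txt")):
            with open(os.path.join(subdir, filename)) as f:
                # skip blank lines so they do not end up as empty ids
                ids = [line.split()[0] + "\n" for line in f if line.strip()]
                utterance_ids.extend(ids)

    # make sure the output directory exists, then write the ids
    out_dir = os.path.dirname(fn)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(fn, "w") as out_file:
        out_file.writelines(utterance_ids)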

if __name__ == "__main__":
    generate_list(trainroot, "LibriSpeech/list/train.txt")
    generate_list(testroot, "LibriSpeech/list/eval.txt")
    generate_list(devroot, "LibriSpeech/list/validation.txt")
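
A quick sanity check for a generated list might look like the sketch below. It assumes the standard LibriSpeech layout, where an utterance id like "61-70968-0009" lives at `<subset>/61/70968/61-70968-0009.flac`. The helper names `utterance_to_flac` and `missing_audio` are made up for illustration and are not part of this repo; I have not checked how main.py actually resolves the ids.

```python
import os

def utterance_to_flac(subset_root, utterance_id):
    """Map an id like "61-70968-0009" to <root>/61/70968/61-70968-0009.flac.

    Assumes the standard LibriSpeech speaker/chapter directory layout.
    """
    speaker, chapter, _ = utterance_id.split("-")
    return os.path.join(subset_root, speaker, chapter, utterance_id + ".flac")

def missing_audio(subset_root, list_file):
    """Return the ids in list_file that have no matching .flac file."""
    with open(list_file) as f:
        ids = [line.strip() for line in f if line.strip()]
    return [u for u in ids
            if not os.path.isfile(utterance_to_flac(subset_root, u))]
```

For example, `missing_audio("LibriSpeech/test-clean/", "LibriSpeech/list/eval.txt")` should come back as an empty list if every id in the list points at an audio file in that subset.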

wubo2180 · 2021-07-07T12:30:02Z

The dataset is available at http://www.openslr.org/12/
