change the feature extraction setting of vctk config #433

bondio77 · 2024-08-21T05:47:21Z

Hi, first of all, thank you so much for providing pre-trained models from so many experiments. I want to fine-tune the pre-trained VCTK model on my own multi-speaker dataset. The VCTK config file uses fft_size = 2048, hop_length = 300, win_length = 1024, but the TTS model I trained uses 1024, 256, 1024. Will fine-tuning work if I change the config file to 1024, 256, 1024 to match my TTS model? The sampling rate is 24000. Thank you!

kan-bayashi · 2024-08-27T08:50:33Z

Sorry for the late reply.
I think you should train from scratch for the following reasons:

A difference in FFT size or window size might be OK if you fine-tune the model.
A difference in hop length is critical, since the hop length determines the structure of the upsampling layers.
If you want to change the hop length, you need to change the upsampling layers as well.

Example:
hop length = 300 -> upsample scales 5 * 5 * 4 * 3 (their product equals the hop length)

ParallelWaveGAN/egs/vctk/voc1/conf/hifigan.v1.yaml

Line 40 in 8674037

upsample_scales: [5, 5, 4, 3] # Upsampling scales.
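
The relationship in the example above can be checked with a short sketch. The helper name `scales_match_hop` is hypothetical, and the scales `[8, 8, 2, 2]` for a hop length of 256 are only one possible factorization chosen for illustration, not a value taken from this repository.

```python
from math import prod


def scales_match_hop(upsample_scales, hop_length):
    """Return True if the total upsampling factor equals the hop length."""
    return prod(upsample_scales) == hop_length


# VCTK config quoted in this thread: hop length 300.
print(scales_match_hop([5, 5, 4, 3], 300))

# Candidate scales for hop length 256 (an assumption for illustration).
print(scales_match_hop([8, 8, 2, 2], 256))

# Keeping the VCTK scales while switching to hop length 256 does not line up.
print(scales_match_hop([5, 5, 4, 3], 256))
```

Because changing `upsample_scales` changes the shape of the upsampling layers, the pre-trained weights for those layers would likely no longer fit, which is consistent with the advice above to train from scratch. Other settings tied to the scales may need adjusting too, so it is worth checking the full config.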

bondio77 · 2024-09-06T02:28:50Z

Thank you so much for answering.
I will try your recommendation. Thank you!

kan-bayashi added the question Further information is requested label Aug 27, 2024
