
Unclear signal flow related to usage of mel spectrograms in StyleMelGAN #384

Open
andrewrose43 opened this issue Nov 29, 2022 · 1 comment
Labels
question Further information is requested

Comments


andrewrose43 commented Nov 29, 2022

Hello,

This is probably just a documentation problem.

It is unclear how mel spectrograms are used by the StyleMelGAN generator module.

I've been trying to figure out how to format mel spectrograms so the generator will accept them. To figure that out, I've been looking at the initialization parameters of the StyleMelGANGenerator module.

The only obvious candidate for defining the format/dimensions of the input spectrogram is the aux_channels parameter. But that wouldn't make sense, for these reasons:

  1. Its default value is 80, but a mel spectrogram contains much more than 80 points of data.
  2. aux_channels controls only one parameter: the in_channels parameter of the first layer in the first TADEResBlock. That would make sense if the mel spectrograms' dimensions corresponded to this parameter, but...
  3. The diagram of StyleMelGAN's signal path in the original StyleMelGAN paper conflicts with point 2: the diagram shows the spectrograms being fed into every TADEResBlock, not just the first.

So my questions are:

  1. What is aux_channels? (What kind of data is considered "auxiliary input" - am I correct that this is the spectrograms?)
  2. If aux_channels does not determine how the input spectrograms should be formatted, what does?

If you can answer these questions for me, I would be happy to improve the documentation/comments myself.

Thank you!

kan-bayashi (Owner) commented Dec 12, 2022

> What is aux_channels?

The dimension of the auxiliary inputs, i.e., the mel-spectrogram.

> If aux_channels does not determine how the input spectrograms should be formatted, what does?

I'm not sure I understand the question. This parameter determines the dimension of the mel-spectrogram.

> Its default value is 80, but a mel spectrogram contains much more than 80 points of data.

You may be confusing the shape of the mel-spectrogram. Its shape is (#frames, #dim), and aux_channels corresponds to #dim.
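
To make the shape point concrete, here is a minimal sketch in plain Python. It only does shape arithmetic and does not call the repository's code. The sampling rate, hop size, and the frame-count formula (one frame per hop plus one, as with centered padding) are assumptions chosen for illustration; only the value 80 comes from the thread, as the default of aux_channels.

```python
# Hypothetical settings for illustration only. NUM_MELS = 80 mirrors the
# default of aux_channels; the other values are assumptions.
SAMPLE_RATE = 24000  # assumed sampling rate in Hz
HOP_SIZE = 300       # assumed hop size in samples per frame
NUM_MELS = 80        # number of mel bins, i.e. #dim


def mel_shape(num_samples, hop_size=HOP_SIZE, num_mels=NUM_MELS):
    """Return the (#frames, #dim) shape of a mel-spectrogram.

    Assumes one frame per hop plus one (centered padding).
    """
    num_frames = 1 + num_samples // hop_size
    return (num_frames, num_mels)


def total_values(shape):
    """Total number of values stored in a spectrogram of the given shape."""
    num_frames, num_dim = shape
    return num_frames * num_dim


# Two seconds of audio: #frames grows with duration, #dim stays fixed.
shape = mel_shape(num_samples=2 * SAMPLE_RATE)
print(shape)                # (161, 80)
print(total_values(shape))  # 12880
```

Under these assumptions the spectrogram holds far more than 80 values in total, but each frame has exactly 80 entries, and that per-frame size is what aux_channels describes.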

kan-bayashi added the question label on Dec 12, 2022