WeKws Roadmap 2.0 #121
Hi, Robin. I found that ModelScope has open-sourced a model that borrows a lot of code from WeKws. They use CTC as the loss function, and it seems to work well. I think there are at least two directions worth trying:

Thank you for open-sourcing WeKws anyway! It is wonderful!
Hi, I implemented this in PR #135; of course, I borrowed a lot of code from ModelScope too. I hope this will be a good solution, but there are still some things to be done, especially the runtime code.
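For readers unfamiliar with the CTC objective mentioned above, here is a minimal, illustrative sketch of the CTC forward algorithm (Graves et al.) in plain Python. This is not WeKws or ModelScope code; the function name and inputs are hypothetical, and a real trainer would use a framework's batched, log-space implementation instead.

```python
import math

def ctc_neg_log_likelihood(posteriors, target, blank=0):
    """Compute -log P(target | posteriors) with the CTC forward algorithm.

    `posteriors` is a list of per-frame probability distributions over
    tokens; `target` is the keyword's token sequence (no blanks).
    Illustrative sketch only -- real systems work in log space for
    numerical stability.
    """
    # Extended label sequence: blanks interleaved, e.g. [b, t1, b, t2, b].
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    T, S = len(posteriors), len(ext)

    # alpha[t][s]: total probability of all paths ending at ext[s] at frame t.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = posteriors[0][blank]
    if S > 1:
        alpha[0][1] = posteriors[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                      # stay on same label
            if s > 0:
                a += alpha[t - 1][s - 1]             # advance one label
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]             # skip a blank
            alpha[t][s] = a * posteriors[t][ext[s]]

    # Valid paths end on the last label or the final blank.
    p = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(p) if p > 0 else math.inf
```

For a 2-frame input with P(blank)=0.1 and P(token 1)=0.9 per frame, the three valid alignments of target `[1]` sum to 0.81 + 0.09 + 0.09 = 0.99, which the function recovers.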
Hi, Robin. My personal feeling is that the KWS models we have are good, but the way we train them and the datasets we use are not so great. The simple models are not the problem; what we are missing is datasets that are linguistically analysed to create a phonetic classification around a keyword. If you had those datasets, and also datasets recorded on the device of use, then you could make accurate, simple, and lite KWS, so it is a bit of a catch-22. But if you picked a model and a device and gave users an option to opt in, then the dataset could be collected, as Big Data did, with accompanying quality metadata covering simple gender, age, and region. You can also collect locally: with on-device training, you can bias a larger pretrained model with a smaller on-device model trained on a locally collected dataset.
Hi, @robin1001 and @duj12, is there any plan for a CTC-KWS runtime?
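Pending an official runtime, here is a minimal sketch of the decoding step such a runtime would need: greedy CTC decoding of frame posteriors followed by a keyword-subsequence check. The function names and token IDs are hypothetical, not WeKws code, and a production runtime would use streaming, confidence-thresholded decoding rather than this offline check.

```python
def ctc_greedy_decode(posteriors, blank=0):
    """Collapse a frame-level best path into a token sequence:
    take the argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in posteriors]
    decoded, prev = [], None
    for tok in best_path:
        if tok != prev and tok != blank:
            decoded.append(tok)
        prev = tok
    return decoded

def keyword_detected(posteriors, keyword_tokens, blank=0):
    """Return True if the keyword's token sequence occurs contiguously
    in the greedily decoded output."""
    decoded = ctc_greedy_decode(posteriors, blank)
    k = len(keyword_tokens)
    return any(decoded[i:i + k] == list(keyword_tokens)
               for i in range(len(decoded) - k + 1))
```

For example, frame posteriors whose best path is `[blank, 1, 1, blank, 2]` decode to `[1, 2]`, so a keyword modeled as tokens `[1, 2]` fires while `[2, 1]` does not.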
I just noticed Mining Effective Negative Training Samples for Keyword Spotting (github, paper). I have been wondering about a dataset creator and how to select !KW samples without class imbalance. To add my own experiments: using 'own voice' data, I can quickly make a KWS that is very accurate. https://github.com/StuartIanNaylor/Dataset-builder was just a rough hack to create a word-capture CLI boutique for quickly capturing 'own voice' KW and !KW, since forced alignment is so prone to error (plus it is my voice). These are augmented with Speex, with random noise added, to give 2k-4k items in each class. It is really easy to make a very accurate 'own voice' KWS, but it is totally useless for anyone else. The small collection of a few phonetic pangrams surprised me with how accurate the results are, and I always had a hunch that, since phones have distinct spectra, larger datasets need the balance of phones and their position in the timeframe to be balanced, or at least to have balanced uniqueness.
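As a trivial baseline for the class-imbalance concern raised above (not the mining strategy from the cited paper), one can simply downsample the larger class so KW and !KW sets are equal-sized. The function below is a hypothetical sketch; the paper's approach of mining *effective* negatives is smarter than random sampling.

```python
import random

def balance_classes(kw_items, non_kw_items, seed=0):
    """Randomly downsample the larger class so the KW and !KW
    classes end up the same size. A seeded RNG keeps the split
    reproducible across runs."""
    rng = random.Random(seed)
    n = min(len(kw_items), len(non_kw_items))
    return rng.sample(kw_items, n), rng.sample(non_kw_items, n)
```

Random downsampling discards potentially hard negatives, which is exactly the gap that effective-negative mining tries to close.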
WeKws is a community-driven project and we love your feedback and proposals on where we should be heading.
Feel free to volunteer yourself if you are interested in trying out some items (they do not have to be on the list).
The following items are in 2.0: