
Pronunciation rules instead of g2p model #9

Open
ctlaltdefeat opened this issue Jul 3, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@ctlaltdefeat

If I understand correctly, one difference between espeak's g2p system and gruut is that if a word is not present in the dictionary, espeak uses a set of rules based on letter groups while gruut uses a prediction model.
For some purposes, the pronunciation rules used by espeak are better than the default trained g2p model (such as pronouncing usernames, abbreviations, acronyms, etc. - essentially anything where the "words" are dissimilar to the ones trained on by the g2p model).
Would it be possible to address this somehow, say by providing espeak-derived pronunciation rules as an alternative?

@synesthesiam
Contributor

Your understanding is correct :)

For some cases, I could see expanding on gruut's abbreviation system, which does the usual regex match/expand stuff. Additionally, the lexicon can be extended with custom words fairly easily. But these approaches obviously won't work for all cases.
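The match/expand idea mentioned above can be sketched roughly like this (a minimal illustration, not gruut's actual implementation; the table entries and function name are made up for the example):

```python
import re

# Hypothetical abbreviation table: regex pattern -> spoken expansion.
ABBREVIATIONS = {
    r"\bDr\.": "doctor",
    r"\bSt\.": "street",
    r"\betc\.": "et cetera",
}

def expand_abbreviations(text: str) -> str:
    # Apply each pattern in turn, replacing matches with their expansion.
    for pattern, expansion in ABBREVIATIONS.items():
        text = re.sub(pattern, expansion, text)
    return text

print(expand_abbreviations("Dr. Smith lives on Main St."))
# doctor Smith lives on Main street
```

As noted, this kind of table only covers cases someone thought to write down, which is exactly why it won't generalize to arbitrary usernames or novel acronyms.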

I'm not able to dig into eSpeak's code due to the license, so I would need to find a different resource for pronunciation rules. How would you detect things like a username, though? I know to pronounce yours as "control alt defeat", but that's based on quite a bit of knowledge outside of the letters.

@synesthesiam synesthesiam added the enhancement New feature or request label Jul 4, 2021
@ctlaltdefeat
Author

I know to pronounce yours as "control alt defeat", but that's based on quite a bit of knowledge outside of the letters.

That specific example is definitely out of scope, but I've noticed, for example, that espeak tends to do better when it encounters clusters of consonants inside one "word", or parts of different words joined into one "word".
For example, espeak "correctly" deals with "qtpie" by pronouncing it "cue tee pie", whereas the output of gruut is k t p ˈi. Examples along those lines are relatively easy to come up with in domains like usernames, online lingo, etc.
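One way to approximate the "qtpie" behaviour is a rule that spells out a leading consonant cluster and keeps the pronounceable tail. The sketch below is purely illustrative (the letter names, the heuristic, and the function name are assumptions, not espeak's or gruut's actual rules), and it is deliberately naive: it would also mis-split ordinary onsets like "string".

```python
# English letter names for spelling out initialism-like clusters.
LETTER_NAMES = {
    "b": "bee", "c": "see", "d": "dee", "f": "eff", "g": "gee",
    "h": "aitch", "j": "jay", "k": "kay", "l": "ell", "m": "em",
    "n": "en", "p": "pee", "q": "cue", "r": "ar", "s": "ess",
    "t": "tee", "v": "vee", "w": "double-u", "x": "ex", "z": "zee",
}
VOWELS = set("aeiouy")

def spell_or_keep(token: str) -> str:
    """Spell out an unpronounceable leading consonant cluster,
    keeping the pronounceable tail intact."""
    token = token.lower()
    for i, ch in enumerate(token):
        if ch in VOWELS:
            # Long consonant run before the first vowel: spell all of it
            # except the consonant that starts the pronounceable tail.
            if i >= 3:
                spelled = " ".join(LETTER_NAMES[c] for c in token[: i - 1])
                return spelled + " " + token[i - 1:]
            return token
    # No vowels at all: spell every letter.
    return " ".join(LETTER_NAMES.get(c, c) for c in token)

print(spell_or_keep("qtpie"))  # cue tee pie
print(spell_or_keep("tv"))     # tee vee
```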

@nitinthewiz

I was wondering - I think it's possible to integrate something like spaCy into this project to better suss out named entities such as times, locations, and organizations. With some training of the models, or using the larger models, it would even be able to detect usernames and other proper nouns.

For example, using the smallest English model, it can detect person, organization, time, location, and money entities -

https://bit.ly/3mCjAnN

Sorry for the short link. I only put it there because the long URL is like... this...

https://explosion.ai/demos/displacy-ent?text=ctlaltdefeat%20and%20%20Michael%20Hansen%20were%20talking%20this%20morning%20about%20pronunciation%20rules%20such%20as%20those%20used%20by%20eSpeak.%20These%20rules%20can%20detect%20entities%20like%20names%2C%20nouns%2C%20organizations%2C%20and%20locations%20better%20and%20thus%20lead%20to%20a%20better%20conversion.%20For%20example%2C%20AWS%2C%20USA%2C%20Amazon%2C%20gruut%2C%20and%20twenty%20dollars.&model=en_core_web_sm&ents=person%2Corg%2Cgpe%2Cloc%2Cproduct%2Cnorp%2Cdate%2Cper%2Cmisc%2Cmoney%2Ctime

I looked at spaCy's license and it's MIT, allowing for commercial use, modification, private use, etc.

So it would be a good fit, as far as I can tell.

Also, I don't quite understand how CRFSuite models help in tagging, but perhaps spaCy could sit after that and just tag things that CRFSuite didn't catch?

@synesthesiam
Contributor

I actually started the first version of gruut with spaCy, so this would be bringing it full circle 😆

spaCy's named entity recognition is certainly much better than mine. My issue with spaCy originally was that its tokenizer broke apart words like "don't" into sub-words, and I had to disable a lot of functionality to stop this. Maybe there are more options now in newer versions?

The CRFSuite models are used for part of speech tagging right now in English and French. If I used spaCy, I wouldn't need these anymore for tagging (they're still needed for guessing word pronunciations, however).

With some training of the models or using the larger models, it'll even be able to detect usernames and other nouns.

Do you know if spaCy would be able to split non-delimited compound words apart? Like "ctrlaltdefeat" to "ctrl", "alt", and "defeat".
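For reference, that kind of split can be done without spaCy via dictionary-based segmentation: try known words as prefixes, longest first, and recurse on the remainder. A small sketch under the assumption that the pronunciation lexicon serves as the word list (the toy `LEXICON` and function name here are illustrative only):

```python
from typing import List, Optional

# Toy lexicon for illustration; a real implementation would use the
# full pronunciation dictionary as its word list.
LEXICON = {"ctrl", "alt", "defeat"}

def segment(word: str) -> Optional[List[str]]:
    """Split a non-delimited compound into known words, trying longer
    prefixes first (plain recursion; fine for short tokens)."""
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in LEXICON:
            rest = segment(word[end:])
            if rest is not None:
                return [head] + rest
    return None

print(segment("ctrlaltdefeat"))  # ['ctrl', 'alt', 'defeat']
```

Returning None when no full cover exists leaves the decision of falling back to the g2p model (or to letter spelling) to the caller.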

@nitinthewiz

That's so neat! Well, spaCy is very powerful, but definitely not perfect. We might have to further train it to recognize usernames and to understand how to spell them out. I'll look into whether someone's written a document about that. There's a whole document about how to train it for new entity types, so perhaps we can go along those lines. There could also be a rule-based approach that looks for the "@" which commonly precedes usernames.
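The "@"-based rule could be as simple as a regex pass before tokenization, flagging mentions so they get routed to a spell-out path instead of the g2p model (the pattern and function name below are illustrative assumptions, not an existing API):

```python
import re

# Hypothetical rule: an "@" followed by word characters marks a username.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")

def find_usernames(text: str):
    """Return the usernames mentioned with a leading '@' in the text."""
    return MENTION.findall(text)

print(find_usernames("ping @ctlaltdefeat and @synesthesiam about this"))
# ['ctlaltdefeat', 'synesthesiam']
```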

For now, I downloaded the spaCy English Large and transformer models, hoping that one of them might have some knowledge of Internet usernames, but they both fell flat.

However, there is a rule-based approach to stop "don't" from being split up during tokenization. Here's some code that shows it; notice the nlp.tokenizer.rules -

>>> import en_core_web_lg
>>> nlp = en_core_web_lg.load()
>>> nlp.tokenizer.rules = {key: value for key, value in nlp.tokenizer.rules.items() if "'" not in key and "’" not in key and "‘" not in key}
>>> doc = nlp("ctlaltdefeat and  Michael Hansen were talking this morning about pronunciation rules such as those used by eSpeak. These rules don't reflect the full power of NLP. For example, AWS, USA, Amazon, gruut, and twenty dollars.")
>>> for token in doc:
...     print(token.text)
... 
ctlaltdefeat
and
 
Michael
Hansen
were
talking
this
morning
about
pronunciation
rules
such
as
those
used
by
eSpeak
.
These
rules
don't
reflect
the
full
power
of
NLP
.
For
example
,
AWS
,
USA
,
Amazon
,
gruut
,
and
twenty
dollars
.
>>> 
>>> for ent in doc.ents:
...     print(ent.text, ent.start_char, ent.end_char, ent.label_)
... 
Michael Hansen 18 32 PERSON
this morning 46 58 TIME
NLP 159 162 ORG
AWS 177 180 ORG
USA 182 185 GPE
Amazon 187 193 ORG
gruut 195 200 ORG
twenty dollars 206 220 MONEY
>>> 

@nitinthewiz
Copy link

nitinthewiz commented Aug 30, 2021

I just looked up part-of-speech tagging in CRFSuite and yeah, you're right that spaCy will be able to do that just fine if you go with it.

About guessing pronunciations... Am I understanding it right that you train on a pre-created g2p corpus from Phonetisaurus to make it more customized, and that the role of CRFSuite is just to load the data in gruut/g2p.py?

I ask because I filed this here issue - rhasspy/gruut-ipa#6 - about some incorrect phonemes and I'm wondering what's the best way to address that issue. Is it coming from Phonetisaurus or from your finetuning of their model?

synesthesiam pushed a commit that referenced this issue Jul 3, 2024