Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating mnemonic language determination #142

Merged
merged 7 commits into from
Sep 27, 2024

Conversation

valefar-on-discord
Copy link
Collaborator

@valefar-on-discord valefar-on-discord commented Sep 7, 2024

Fixes Closes Resolves #119

Changes

Updating the language determination of mnemonics to allow a list of languages per word

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Creating mnemonics with Chinese Simplified or Chinese Traditional can easily reproduce the issue with a set_trace in the determine_mnemonic_language logic

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

Remarks

The issue here is the word language map only allows for a single language per word. Because there is so much overlap between Chinese Simplified and Chinese Traditional, you can run into a situation where, after generating a mnemonic in Chinese Simplified, your word map could look like this:

ipdb> word_languages
[['chinese_traditional'], ['chinese_traditional'], ['chinese_simplified'], ['chinese_traditional'], ['chinese_traditional'], 
['chinese_traditional'], ['chinese_traditional'], ['chinese_traditional'], ['chinese_traditional'], ['chinese_simplified'], 
['chinese_simplified'], ['chinese_traditional'], ['chinese_simplified'], ['chinese_traditional'], ['chinese_simplified'], 
['chinese_traditional'], ['chinese_simplified'], ['chinese_traditional'], ['chinese_traditional'], ['chinese_traditional'], 
['chinese_traditional'], ['chinese_traditional'], ['chinese_traditional'], ['chinese_simplified']]

But even in this situation, the logic will go through both discovered languages and provide the correct mnemonic.

The failure case would be if you generate a mnemonic with Chinese Simplified, but each word is also in Chinese Traditional and results in an invalid mnemonic.

If that scenario is even possible in the bounds of how mnemonics work then it would have to be from a mnemonic created outside this tool as there is a verification step in the new-mnemonic flow.

Either way, the logic has been updated to allow for multiple languages per word resulting in this for the same test mnemonic:

ipdb> word_languages
[['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified'], 
['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 
'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], 
['chinese_simplified', 'chinese_traditional'], ['chinese_simplified'], ['chinese_simplified'], ['chinese_simplified', 
'chinese_traditional'], ['chinese_simplified'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified'], 
['chinese_simplified', 'chinese_traditional'], ['chinese_simplified'], ['chinese_simplified', 'chinese_traditional'], 
['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 
'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], ['chinese_simplified', 'chinese_traditional'], 
['chinese_simplified']]

@valefar-on-discord
Copy link
Collaborator Author

Every test failed in the same manner: This mnemonic abbreviated form is available in multiple languages

This happens because some mnemonics, 的 的 的 的 的 的 的 的 的 的 的 在 as an example, are the same in both Chinese Simplified and Chinese Traditional.

Prior to this change the resulting word languages would have only contained Chinese Traditional as all instances of Chinese Simplified would have been overwritten.

With this change both are now returned and both result in a valid mnemonic which then raises the error.

I'm not sure how we want to handle this. Though this is a little spooky, I don't believe it is possible for the underlying private key to differ if there is duplication between languages but having that confirmed by someone more knowledgeable than I would be preferred.

@yorickdowne
Copy link

yorickdowne commented Sep 9, 2024

The failure case would be if you generate a mnemonic with Chinese Simplified, but each word is also in Chinese Traditional and results in an invalid mnemonic.

Do we know that this failure case is possible? I want to be sure we are solving something that can actually happen. Did the original report give us a sample mnemonic that causes a failure, that is, two different (but valid) keys in Simplified and Traditional?

@remyroy
Copy link
Member

remyroy commented Sep 10, 2024

I'm trying to wrap my head around this.

In the new-mnemonic flow, we try to reconstruct the mnemonic from words reentered after the user already selected a language. I think we should restrict the reentered mnemonic language to the one already selected instead of trying to guess it again. So in https://github.com/eth-educators/ethstaker-deposit-cli/blob/main/ethstaker_deposit/cli/new_mnemonic.py#L52, we could pass the mnemonic_language down the reconstruct_mnemonic function for this. This would still support entering the abreviated form of the mnemonic when confirming it in that flow. This would mean that if the user selected Chinese Simplified initially, we would not try to guess it was reentered in Chinese Traditional for instance in the new-mnemonic flow.

This would leave the existing-mnemonic, the generate-bls-to-execution-change and the exit-transaction-mnemonic flows remaining for this issue where we try to guess the mnemonic language in https://github.com/eth-educators/ethstaker-deposit-cli/blob/main/ethstaker_deposit/cli/existing_mnemonic.py#L60 . In that case, we could have a potentially conflict where the words entered match multiple languages. I think we could more elegantly solve this in one or multiple ways:

  1. In case we are not sure, we could prompt the user with something like The mnemonic you entered matches multiple languages, which ones would you like to use for this command? [1. 简体中文, 2. 繁體中文,]: [简体中文]:
  2. We could add another flag --mnemonic_language to the associated commands to force using that language.
  3. We could simply fail with that message above This mnemonic abbreviated form is available in multiple languages. This has the con that if someone really has one of those potentially conflicting mnemonics that match to multiple languages that he or she wouldn't be able to use it in those flows. This would seems like a bad outcome to me.

What do you think @valefar-on-discord ? Is this a good picture of the whole situation? Am I missing something here?

@remyroy
Copy link
Member

remyroy commented Sep 10, 2024

It would still make sense to fix the determine_mnemonic_language function because in the grand scheme of things, the same word can match to multiple languages and this new PR logic would better reflect this reality.

@remyroy
Copy link
Member

remyroy commented Sep 10, 2024

I just noticed that this would also apply to the following additional commands, and not just existing-mnemonic since they use the load_mnemonic_arguments_decorator decorator and they need a mnemonic from the user:

  • generate-bls-to-execution-change
  • exit-transaction-mnemonic

Copy link
Member

@remyroy remyroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think we can fix the failing unit tests with assuming that the validatorerror will be raised, ie expecting it to be raised. As a better overall fix for this whole issue, I think we should create another PR with one of the solutions in #142 (comment)

@valefar-on-discord
Copy link
Collaborator Author

I talked to a cryptographic expert and they let me know that the mnemonic word is used as a lookup to determine the index in the word list. As long as the matching words have the identical index in the corresponding word list, the underlying private key and cryptographic results will be identical.

I did a comparison of Chinese simplified/traditional and, though there are 1275 shared words, each one are of the same index.

My guess is this is a non-issue and the creation of these word lists has this consideration in mind.

That's not to say we should be complicit. I am a fan of Remy's prompt solution and it's not that big of a lift.

@yorickdowne
Copy link

Does prompting improve UX? It may be simpler to just make an assumption either way, if the resulting keys are the same anyway.

@yorickdowne
Copy link

yorickdowne commented Sep 11, 2024

Here’s what we have from the reporting user. Let’s see how this comes out in simplified and traditional Chinese. If it’s valid in both and conflicts, we need to prompt. If it’s valid in only one but our current code guesses the wrong language, we want to fix the guessing algorithm.

This was one of the vectors that had failed.
的 的 的 的 的 的 的 的 的 的 的 在

@chong-he
Copy link

chong-he commented Sep 11, 2024

Is the user trying to recover the private key with another tool other than the deposit-cli? Because with ./deposit.sh existing-mnemonic, there is no option of Chinese Traditional (CHT), only Chinese Simplified (CHS): 12. 简 体中文

For mnemonic like: 的 的 的 的 的 的 的 的 的 的 的 在 where the mnemonic is valid for both languages CHS and CHT (edit: I assume it is valid on CHT too, haven't tested this as deposit-cli can't recover CHT), the resulting private key should be the same. I remember it is something like:

12 words --> 12 indices (12 numbers from 0-2047) --> using these 12 numbers to generate the keys

As @valefar-on-discord mentioned, each word corresponds to an index in the word list. I have also checked and can confirm that when CHS and CHT are the same word, the indices are the same. Therefore, the resulting key should be the same, because they essentially have the same 12 indices/numbers. If there is some tools to verify this, then we can make an assumption either way as @yorickdowne says.

Curious what tools did @anupsv used that encounter this issue?

@yorickdowne
Copy link

I’d like to verify that this mnemonic works with deposit cli as is and generates the same secret key. We know it should; let me see it does.

If it does, then @valefar-on-discord and @remyroy , I am not convinced we should change the existing logic. Unless we risk detecting the wrong language in some cases, as in, a language that doesn’t contain all words.

For example, if we have a mnemonic that is all Spanish, and matches Portuguese/Spanish only on the last word, would it correctly use Spanish? Or the other way around … you get the drift.

@valefar-on-discord
Copy link
Collaborator Author

@chong-he The prompt you are seeing is for the language used for the tool's prompts and outputs. This allows for 12 languages that are partially translated.

When generating a mnemonic, you are given the mnemonic language option of
Please choose the language of the mnemonic word list [1. 简体中文, 2. 繁體中文, 3. čeština, 4. English, 5. Français, 6. Italiano, 7. 日本語, 8. 한국어, 9. Português, 10. Español]: [english]:
Where option 1 is Chinese Simplified and 2 is Chinese Traditional.

@yorickdowne In your example, because multiple languages are determined for the list of words, each language will be attempted and ultimately output:

  • A ValidatorError complaining the abbreviated form exists in multiple languages
  • The fully constructed mnemonic
  • None signifying an invalid mnemonic

Based on how this logic currently exists, Portuguese will not overwrite Spanish but vice-versa, so lets say you generated a Portuguese mnemonic and a single word existed in Spanish: Your word map would be 23 entries containing Portuguese and 1 of Spanish. Two languages will be determined and both iterated upon. Portuguese would result in a valid mnemonic but Spanish would not and the Portuguese full word mnemonic would be returned.

I ran a script to check the indices of each word across each language to see if there are any that contain mis-matched indices and there are specifically between English and French:

languageA languageB wordA wordB indexA indexB
english french abandon abandon 0 1
english french amateur amateur 61 88
english french angle angle 70 107
english french animal animal 72 110
english french aspect aspect 107 148
english french badge badge 139 190
english french bicycle bicycle 175 230
english french bonus bonus 203 262
english french brave brave 218 277
english french canal canal 264 323
english french capable capable 271 328
english french caution caution 293 347
english french civil civil 332 403
english french client client 342 409
english french concert concert 373 436
english french correct correct 388 451
english french coyote coyote 397 461
english french crucial crucial 419 478
english french cruel cruel 420 479
english french cycle cycle 438 493
english french danger danger 443 498
english french digital digital 495 562
english french distance distance 509 573
english french double double 525 594
english french dragon dragon 528 598
english french effort effort 565 631
english french essence essence 618 725
english french exact exact 626 757
english french excuse excuse 632 763
english french fatal fatal 668 795
english french fatigue fatigue 670 796
english french festival festival 681 812
english french figure figure 688 820
english french fortune fortune 732 854
english french fragile fragile 739 861
english french fruit fruit 750 880
english french globe globe 794 919
english french guide guide 828 953
english french humble humble 887 998
english french image image 906 1011
english french immense immense 908 1014
english french impact impact 910 1017
english french innocent innocent 932 1043
english french intact intact 940 1053
english french jaguar jaguar 953 1070
english french junior junior 969 1093
english french label label 993 1102
english french lecture lecture 1016 1123
english french loyal loyal 1060 1165
english french machine machine 1068 1178
english french million million 1125 1248
english french minute minute 1130 1254
english french miracle miracle 1131 1255
english french mobile mobile 1139 1259
english french muscle muscle 1164 1286
english french nation nation 1178 1301
english french nature nature 1179 1302
english french noble noble 1197 1322
english french notable notable 1204 1331
english french opinion opinion 1243 1381
english french orange orange 1246 1387
english french ozone ozone 1268 1409
english french palace palace 1273 1411
english french panda panda 1275 1416
english french phrase phrase 1310 1476
english french piano piano 1312 1478
english french pizza pizza 1325 1492
english french position position 1347 1524
english french prison prison 1368 1548
english french public public 1384 1567
english french puzzle puzzle 1399 1576
english french question question 1404 1580
english french relief relief 1450 1626
english french rival rival 1493 1671
english french romance romance 1500 1674
english french salon salon 1524 1707
english french science science 1543 1727
english french sentence sentence 1567 1748
english french service service 1569 1756
english french simple simple 1608 1769
english french social social 1647 1777
english french source source 1664 1801
english french spatial spatial 1668 1805
english french stable stable 1694 1809
english french surface surface 1745 1830
english french surprise surprise 1747 1833
english french suspect suspect 1750 1836
english french talent talent 1771 1847
english french train train 1848 1911
english french tunnel tunnel 1876 1933
english french unique unique 1900 1948
english french usage usage 1916 1954
english french vague vague 1925 1963
english french valve valve 1928 1970
english french village village 1952 2008
english french virus virus 1956 2014
english french vital vital 1960 2020
english french volume volume 1966 2034
english french voyage voyage 1968 2039
english french wagon wagon 1970 2041

Given the amount of duplicates I think this is an issue we want to account for.

@yorickdowne
Copy link

Thanks for doing that work! Ok, so we could have a valid mnemonic in either English or French, but obviously only one language gives the key the user wants.

In that case yes: If the mnemonic is valid in multiple languages - we can check for that first - then we should prompt. If the words exist in multiple languages but only one language gives us a mnemonic with correct checksum, we want to silently choose the language that has a working mnemonic. Better UX that way.

How’s that sound?

@chong-he
Copy link

chong-he commented Sep 12, 2024

Edit to remove the quote as it's too long

Thanks for the explanation @valefar-on-discord ! Now I get that the existing-mnemonic language option is just to display the language in the UI, but the mnemonic entered can be of any languages.

As from the above, it seems now that CHS and CHT is not an issue already, right? Rather, it is now English or French issue. And Yorick's proposal sounds great - only prompt when detecting valid mnemonic

@anupsv
Copy link

anupsv commented Sep 13, 2024

Is the user trying to recover the private key with another tool other than the deposit-cli? Because with ./deposit.sh existing-mnemonic, there is no option of Chinese Traditional (CHT), only Chinese Simplified (CHS): 12. 简 体中文

For mnemonic like: 的 的 的 的 的 的 的 的 的 的 的 在 where the mnemonic is valid for both languages CHS and CHT (edit: I assume it is valid on CHT too, haven't tested this as deposit-cli can't recover CHT), the resulting private key should be the same. I remember it is something like:

12 words --> 12 indices (12 numbers from 0-2047) --> using these 12 numbers to generate the keys

As @valefar-on-discord mentioned, each word corresponds to an index in the word list. I have also checked and can confirm that when CHS and CHT are the same word, the indices are the same. Therefore, the resulting key should be the same, because they essentially have the same 12 indices/numbers. If there is some tools to verify this, then we can make an assumption either way as @yorickdowne says.

Curious what tools did @anupsv used that encounter this issue?

To expand this a bit more further on what I tried, the for loop implementations could be more optimized and I had implemented that. But the test vectors started to fail and that's when i noticed the issue. WIth the current way of how the dict mapping is generated between the word and language, there are many cases where keys are overwritten when collisions happened which I encountered first with CHT and CHS primarily because that is the first test case in the mnemonic test vectors.

@valefar-on-discord
Copy link
Collaborator Author

To expand this a bit more further on what I tried, the for loop implementations could be more optimized and I had implemented that. But the test vectors started to fail and that's when i noticed the issue. WIth the current way of how the dict mapping is generated between the word and language, there are many cases where keys are overwritten when collisions happened which I encountered first with CHT and CHS primarily because that is the first test case in the mnemonic test vectors.

Yeah that's exactly what I am seeing as well and the problem is substantially more concerning with English/French. I think we have an agreed path forward on how to resolve this though it is going to be a bit of a lift.

Great catch and thanks for reporting it!

@anupsv
Copy link

anupsv commented Sep 13, 2024

To expand this a bit more further on what I tried, the for loop implementations could be more optimized and I had implemented that. But the test vectors started to fail and that's when i noticed the issue. WIth the current way of how the dict mapping is generated between the word and language, there are many cases where keys are overwritten when collisions happened which I encountered first with CHT and CHS primarily because that is the first test case in the mnemonic test vectors.

Yeah that's exactly what I am seeing as well and the problem is substantially more concerning with English/French. I think we have an agreed path forward on how to resolve this though it is going to be a bit of a lift.

Great catch and thanks for reporting it!

Perfect! Happy to help as well since i've been looking at this for a bit now. If someone is working on it already, thats cool too.

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              33      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               59     11    81%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  41      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   31      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          2      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               93      7    92%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1489    184    88%

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              33      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               59     11    81%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  41      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   31      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          2      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               93      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1489    183    88%

@valefar-on-discord
Copy link
Collaborator Author

Added the ability to prompt the user to select a language if multiple are valid for the same mnemonic.

I tried to manipulate the existing mnemonic logic as little as I could but this is a solution. I'm not 100% thrilled with it so would love some feedback on how this can be improved.

If we can get to a place where we are happy with the solution I'll wrap it in some unit tests.

But if you run python -m ethstaker_deposit existing-mnemonic with 的 的 的 的 的 的 的 的 的 的 的 在 you can see the flow.

I haven't done a thorough round of testing so might be some edge cases

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              40      6    85%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               94      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     21      8    62%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1526    197    87%

@yorickdowne
Copy link

Everything removed in tests/test_key_handling/test_key_derivation/test_vectors/mnemonic.json has a conflict, I take it. When we test this multi-language, we should also test that we end up with the correct private key

A solution is good enough for me, tbh. I am not sure we are getting to "a very elegant / beautiful" solution - it's a somewhat messy problem by its nature.

@yorickdowne
Copy link

@anupsv and @chong-he , if you have time to test against this solution here, that'd be amazing. What should happen is that it prompts you for the source language of the mnemonic, for a mnemonic that creates a valid key in more than one language

@yorickdowne
Copy link

yorickdowne commented Sep 19, 2024

@valefar-on-discord , we should probably honor --mnemonic_language, and not prompt in that case ... as well as not try to detect and use what we are given, even if that means the mnemonic is then invalid, in case a user provided the wrong language.

@valefar-on-discord
Copy link
Collaborator Author

@valefar-on-discord , we should probably honor --mnemonic_language, and not prompt in that case ... as well as not try to detect and use what we are given, even if that means the mnemonic is then invalid, in case a user provided the wrong language.

--mnemonic_language currently isn't an argument in those situations. The only time that is an option is when a new mnemonic is created. Which reminds me this is still an issue when the user verifies the mnemonic in that flow 😭

@yorickdowne
Copy link

I am thinking about --non_interactive scripting, where a prompt is undesirable. We may want the --mnemonic_language here as an optional parameter, because we can’t always reliably detect the language.

@chong-he
Copy link

chong-he commented Sep 21, 2024

Did some testing about CHS and CHT mnemonic:

  1. For mnemonic where CHS and CHT characters are exactly the same, it prompts the user to select the language (1 for CHS and 2 for CHT) and then enter the mnemonic.
    The public keys generated for both mnemonic are the same
    Example: 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 性

  2. For mnemonic where CHS and CHT are different (but the same index in the word list), it wouldn't prompt the user to select the language since they are not the same characters.
    The public keys generated are different. I think this is why the issue surfaced in the first place, but I don't understand why, since the indices are the same?

    Anyway, I think most of the time a user that is using CHS will always use CHS characters, and likewise for CHT (i.e., people wouldn't convert characters from CHS to CHT just like that, it's not something that people usually do)
    So they will end up getting the same public key so long as they keep the characters right

For case 1, I think the language prompt is fine solution, it will only prompts when the mnemonic is valid for >1 language (which I think is not often the case)
Using prompt (assuming the detection works well) is easier imo, so that users don't have add an additional flag when entering the mnemonic (simpler and better UX)
Under what case can we not reliably detect the language?

@yorickdowne
Copy link

Thanks for helping us test, @chong-he !

We cannot reliably detect the language when:

  • Each word in the mnemonic is valid in two or more languages
  • The checksum (last word) is valid in two or more languages

For CHS and CHT the indices are the same, so it might be the same key regardless and arguably we don’t need to prompt. But then, why did we get a report - I am cautious whether it’s really the same key.

For English and French the indices are not the same. This should mean that again only one version is correct, as the other one will fail checksum. But, do we know that for certain?

Can I ask, in your testing: You are saying the pub key for CHS is different from the one in CHT, for the same mnemonic? If so, that’s a clear conflict where we need to prompt, exactly what the person originally reporting this was concerned about.

@valefar-on-discord
Copy link
Collaborator Author

I am thinking about --non_interactive scripting, where a prompt is undesirable. We may want the --mnemonic_language here as an optional parameter, because we can’t always reliably detect the language.

I never think of --non_interactive 😞

If we want to add the --mnemonic_language argument for that use case then I think a cleaner (code wise) experience would be to exit execution if multiple languages are detected and request the user to provide that argument instead of prompting.

@yorickdowne
Copy link

That might be cleaner code-wise but isn’t it better UX to prompt if we’re interactive? Overall the promise is that new-mnemonic and existing-mnemonic will work without any additional arguments and prompt for everything they need.

@valefar-on-discord
Copy link
Collaborator Author

That might be cleaner code-wise but isn’t it better UX to prompt if we’re interactive? Overall the promise is that new-mnemonic and existing-mnemonic will work without any additional arguments and prompt for everything they need.

I agree it is a better UX but I question the benefit with the additional code complexity for a use case that, as far as I can tell, is incredibly unlikely to trigger. I am still not sure if there is such a mnemonic with english and french that would result in different cryptographic values as the underlying mathematics behind it are beyond me.

I created an alternative version that is very similar to this PR but doesn't prioritize UX: a35d5fd

But if we do want to maximize UX then I can easily add the --mnemonic_language as an optional argument doing something similar to: valefar-on-discord@23a65b6

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              40      6    85%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               94      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     26      8    69%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1531    197    87%

@chong-he
Copy link

Can I ask, in your testing: You are saying the pub key for CHS is different from the one in CHT, for the same mnemonic? If so, that’s a clear conflict where we need to prompt, exactly what the person originally reporting this was concerned about.

Example:
CHS: 缝 塘 诗 浆 遇 困 速 凉 介 西 搭 奖 友 传 吸 释 丧 歌 针 疫 迎 毕 击 盘
CHT: 縫 塘 詩 漿 遇 困 速 涼 介 西 搭 獎 友 傳 吸 釋 喪 歌 針 疫 迎 畢 擊 盤

Each word is the same, except that CHS that have their corresponding CHT characters are changed to the CHT characters
(Same indices word-to-word according to this BIP39 word list: https://github.com/bitcoin/bips/tree/master/bip-0039)

Both mnemonic are valid, but they should be considered different? Because the resulting public keys are different.
I also look up for their word indices, grab their corresponding English language words to form a mnemonic. The resulting mnemonic is valid too, but with different public keys.
This means that even though the 24 word indices are the same, the keys are different for different languages

The original report was, there is a bug with CHS and CHT where when CHS and CHT are having the same characters, the CHS one is being overwritten

So I am thinking, it should be the case that, when a mnemonic that have the exact same words and valid for 2 or more languages, such as:
的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 性

Selecting CHS or CHT during key generation/recovery should result in different keys. I think that's what is meant by the collision mentioned in the original report.

Because if the resulting keys are the same, there is no need to prompt/to have an additional flag (except for the possibility that the keys are different (e.g., English and French), even though the probability of this happening can be very low).

For now, the public keys generated by ``的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 的 性` when selecting CHS and CHT during key recovery are the same.

Should they be different in the first place? Because they are the same

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              40      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               95      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         57      3    95%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     28     20    29%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1537    203    87%

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              42      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               95      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     28     20    29%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1536    203    87%

Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              42      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               95      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     28     20    29%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1536    203    87%

Copy link
Member

@remyroy remyroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do some minor changes before merging this. Let me know what you think.

ethstaker_deposit/exceptions.py Show resolved Hide resolved
ethstaker_deposit/key_handling/key_derivation/mnemonic.py Outdated Show resolved Hide resolved
ethstaker_deposit/intl/en/cli/existing_mnemonic.json Outdated Show resolved Hide resolved
Copy link

Test Coverage: Download HTML Report

Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
ethstaker_deposit/__init__.py                                            1      0   100%
ethstaker_deposit/bls_to_execution_change_keystore.py                   33      2    94%
ethstaker_deposit/cli/__init__.py                                        0      0   100%
ethstaker_deposit/cli/existing_mnemonic.py                              42      0   100%
ethstaker_deposit/cli/exit_transaction_keystore.py                      41      2    95%
ethstaker_deposit/cli/exit_transaction_mnemonic.py                      63      8    87%
ethstaker_deposit/cli/generate_bls_to_execution_change.py               60     11    82%
ethstaker_deposit/cli/generate_bls_to_execution_change_keystore.py      42      2    95%
ethstaker_deposit/cli/generate_keys.py                                  42      2    95%
ethstaker_deposit/cli/new_mnemonic.py                                   32      0   100%
ethstaker_deposit/cli/partial_deposit.py                                62      5    92%
ethstaker_deposit/cli/test_keystore.py                                  20      0   100%
ethstaker_deposit/credentials.py                                       197     68    65%
ethstaker_deposit/deposit.py                                            58     10    83%
ethstaker_deposit/exceptions.py                                          7      0   100%
ethstaker_deposit/key_handling/__init__.py                               0      0   100%
ethstaker_deposit/key_handling/key_derivation/__init__.py                0      0   100%
ethstaker_deposit/key_handling/key_derivation/mnemonic.py               95      6    94%
ethstaker_deposit/key_handling/key_derivation/path.py                   17      1    94%
ethstaker_deposit/key_handling/key_derivation/tree.py                   36      0   100%
ethstaker_deposit/key_handling/keystore.py                             100      0   100%
ethstaker_deposit/settings.py                                           26      0   100%
ethstaker_deposit/utils/__init__.py                                      0      0   100%
ethstaker_deposit/utils/ascii_art.py                                     2      0   100%
ethstaker_deposit/utils/click.py                                        72      3    96%
ethstaker_deposit/utils/config.py                                        3      0   100%
ethstaker_deposit/utils/constants.py                                    29      0   100%
ethstaker_deposit/utils/crypto.py                                       29      1    97%
ethstaker_deposit/utils/deposit.py                                       9      0   100%
ethstaker_deposit/utils/exit_transaction.py                             24      0   100%
ethstaker_deposit/utils/file_handling.py                                 8      0   100%
ethstaker_deposit/utils/intl.py                                         54      3    94%
ethstaker_deposit/utils/ssz.py                                          60      7    88%
ethstaker_deposit/utils/terminal.py                                     28     20    29%
ethstaker_deposit/utils/validation.py                                  244     52    79%
----------------------------------------------------------------------------------------
TOTAL                                                                 1536    203    87%

Copy link
Member

@remyroy remyroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is fine. I'm merging.

@remyroy remyroy merged commit 820ea0a into eth-educators:main Sep 27, 2024
26 checks passed
remyroy pushed a commit to remyroy/ethstaker-deposit-cli that referenced this pull request Sep 27, 2024
* Updating mnemonic language determination

* Removing duplicate mnemonic values

* Adding test for multi-language mnemonics

* Prompt user for language selection on MultiLanguageError

* Adding optional mnemonic_language argument

* Adding tests for determine_mnemonic_language

* Resolving UX concerns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Common characters being overridden in chinese simplified and chinese Traditional word lists
5 participants