BIP39 Validator is a small program for checking BIP39 wordlists for languages written in the Latin script. It checks wordlists for semantic errors and implements three different tests:
- A minimum Levenshtein distance test
- A minimum unique prefix length test
- A maximum length test
It also has a Python API for running each test programmatically and interactively exploring the results.
BIP39 Validator checks that wordlists follow the best practices written in the BIP39 standard. Maintainers frequently ask submitters to comply with these checks before merging a wordlist. By using this tool, you avoid having to verify the technical rules of the list manually.
Note that there is no support for validating rules such as "Words cannot sound too similar" or "Wordlists cannot contain words from any other languages' wordlists". There is also no support for wordlists in languages written in non-Latin scripts, such as Arabic, Hebrew or the CJK languages.
You can install BIP39 Validator either from PyPI or directly from source on GitHub.
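To make the three checks concrete, here is a minimal, self-contained sketch of what each one measures. It is not BIP39 Validator's implementation: the function names `levenshtein`, `close_pairs`, `prefix_collisions` and `too_long` are hypothetical, and the thresholds simply mirror the tool's documented defaults (2, 4 and 8).

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def close_pairs(words, min_distance=2):
    """Unordered word pairs whose edit distance is below min_distance."""
    return [(a, b) for a, b in combinations(words, 2)
            if levenshtein(a, b) < min_distance]


def prefix_collisions(words, n=4):
    """Groups of two or more words that share their first n characters."""
    groups = defaultdict(list)
    for word in words:
        groups[word[:n]].append(word)
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


def too_long(words, max_length=8):
    """Words with more than max_length characters."""
    return [w for w in words if len(w) > max_length]


sample = ["able", "about", "above", "abstract", "abstain",
          "absurdity", "cat", "car"]
print(close_pairs(sample))        # pairs that are too similar
print(prefix_collisions(sample))  # prefixes shared by several words
print(too_long(sample))           # words over the length limit
```

Comparing every pair is quadratic in the list size, which is acceptable for a 2048-word list but is a simplification chosen here for clarity.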
To install from PyPI:
pip3 install bip39validator
Alternatively, to install BIP39 Validator from source, head over to the Releases page, and download the version you want to install. Unzip the package, change into the newly created directory and then run:
python3 setup.py install
You invoke BIP39 Validator like this:
bip39validator [OPTIONS] {INPUT_FILE | URL_OF_TEXT_FILE}
Exactly one of INPUT_FILE and URL_OF_TEXT_FILE must be specified. INPUT_FILE is a file in your local filesystem, while URL_OF_TEXT_FILE is an HTTP or HTTPS URL pointing to the wordlist file with a mimetype of text/plain. In both cases, the input must be a plain text file.
BIP39 Validator displays rich formatted status messages as it progresses with validation. It is also possible to run BIP39 Validator with minimal diagnostic messages, or to log status messages to a file. The complete list of command-line arguments is below:
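As an illustration of the file-versus-URL distinction, the sketch below classifies a command-line argument using only the standard library. The helper name `classify_source` is hypothetical, and this is an assumption about how such a check could be written, not a description of how BIP39 Validator decides internally.

```python
from urllib.parse import urlparse


def classify_source(argument: str) -> str:
    """Return "url" for HTTP/HTTPS URLs and "file" for anything else."""
    parsed = urlparse(argument)
    # Only http and https are treated as URLs, matching the usage text.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


print(classify_source("https://example.com/wordlist-en.txt"))
print(classify_source("wordlist-en.txt"))
```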
Option | Description |
---|---|
-d, --min-levenshtein-distance | set the minimum required Levenshtein distance between words (default: 2) |
-u, --min-initial-unique | set the minimum required unique initial characters between words (default: 4) |
-l, --max-length | set the maximum length of each word (default: 8) |
-D, --no-levenshtein-distance | do not run the Levenshtein distance test |
-U, --no-initial-unique | do not run the unique initial characters test |
-L, --no-max-length | do not run the maximum length test |
-o <FILE>, --output-file <FILE> | log all console output to an additional file |
-a, --ascii | turn off rich text formatting and progress bars for console output |
-q, --quiet | do not display details of test failures, only whether they succeeded or failed |
--nosane | suppress the wordlist sanity check; this might cause other tests to fail |
-v, --version | print the version number and exit |
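
If you want to drive the command-line tool from a script, one option is to assemble the argument list in Python. The sketch below uses only flags from the table above; the helper `build_command` is hypothetical, and it assumes that each threshold value is passed as a separate argument after its flag, which the table does not state explicitly.

```python
import shlex


def build_command(source, min_distance=None, min_unique=None,
                  max_length=None, output_file=None, quiet=False):
    """Assemble a bip39validator argument list (hypothetical helper)."""
    cmd = ["bip39validator"]
    # Assumption: each value follows its flag as a separate argument.
    if min_distance is not None:
        cmd += ["-d", str(min_distance)]
    if min_unique is not None:
        cmd += ["-u", str(min_unique)]
    if max_length is not None:
        cmd += ["-l", str(max_length)]
    if output_file is not None:
        cmd += ["-o", output_file]
    if quiet:
        cmd.append("-q")
    cmd.append(source)  # exactly one INPUT_FILE or URL goes last
    return cmd


cmd = build_command("wordlist-en.txt", min_distance=2, max_length=8, quiet=True)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=False)
```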
BIP39 Validator displays which validation tests succeeded and the total number of tests that succeeded.
BIP39 Validator comes with a powerful API for querying the result of validation tests. The most basic class provided is BIP39WordList. It is responsible for creating a word list object from a file, string buffer or even a URL. BIP39WordList objects are immutable: words can't be changed, added or removed once they are loaded. To alter the wordlist, you need to change it on file and then create a new BIP39WordList from it.
When a test fails, it throws a ValidationFailed exception. The exception has a member called status_obj, an object holding diagnostic information about the test that threw it. The same kind of object is returned by the validation test if it succeeds. There are two ways to capture the test state because users most commonly look at the state only when a test fails.
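The pattern described above, where a status object is returned on success and carried by the exception on failure, can be sketched independently of the library. Everything in this block is a hypothetical stand-in: `LengthStatus`, `CheckFailed` and `check_max_length` are illustration names, not part of the bip39validator API.

```python
class LengthStatus:
    """Hypothetical status object: records which words broke the limit."""

    def __init__(self, too_long):
        self.too_long = too_long


class CheckFailed(Exception):
    """Hypothetical stand-in for ValidationFailed."""

    def __init__(self, status_obj):
        super().__init__("validation failed")
        self.status_obj = status_obj


def check_max_length(words, limit=8):
    """Return the status on success; raise with the same status on failure."""
    status = LengthStatus([w for w in words if len(w) > limit])
    if status.too_long:
        raise CheckFailed(status)
    return status


# Success path: the status object is the return value.
ok = check_max_length(["able", "about"])

# Failure path: the status object travels on the exception.
try:
    check_max_length(["able", "absurdity"])
except CheckFailed as e:
    failed = e.status_obj.too_long
    print(failed)
```

The design keeps the happy path short: callers who only care about failures never need to inspect a return value.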
Here are some of the anticipated uses of the BIP39 Validator API.
- Validate that Levenshtein distances >= 2, then find all the word pairs with Levenshtein distance less than 2:
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_lev_distance(2)
        # At this point, no word pairs have Levenshtein distance < 2.
    except ValidationFailed as e:
        # status_obj describes the test that failed.
        dists = e.status_obj.getwordpairs_lt(2)
        for wordpair in dists:
            word1 = wordpair[0]
            word2 = wordpair[1]
            # Do something with word1 and word2...
    except InvalidWordList:
        print("Wordlist file is not well-formed")
- Validate that Levenshtein distances >= 2, then calculate the number and percentage of word pairs with Levenshtein distance less than 2 (assume 2048-word list):
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_lev_distance(2)
        # At this point, the percentage and number of
        # word pairs fulfilling the condition are 0.
    except ValidationFailed as e:
        dists = e.status_obj.getwordpairs_lt(2)
        n = len(dists)
        # Share of all 2048 * 2048 ordered word combinations, as a percentage.
        prct = 100 * n / (2048 * 2048)
    except InvalidWordList:
        print("Wordlist file is not well-formed")
- Validate that words are unique in at least 4 initial characters, then find all the words beginning with "str" (prefix-3 group "str"):
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_initial_chars(4)
        # At this point, all words are unique in at least 4 initial characters
    except ValidationFailed as e:
        words = e.status_obj.similar_wordgroup("str")
        for word in words:
            # Do something with word...
            pass
    except InvalidWordList:
        print("Wordlist file is not well-formed")
- Validate that words are unique in at least 4 initial characters, then calculate the number and percentage of word prefix-4 groups with at least two words in them:
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_initial_chars(4)
        # At this point, the percentage and number of
        # groups fulfilling the condition are 0.
    except ValidationFailed as e:
        groups = e.status_obj.similar_wordgroup_all(4)
        # Count the groups that contain at least two words.
        n = sum(1 for c in groups.values() if len(c) >= 2)
        denom = len(groups)
        perc = 100 * n / denom
    except InvalidWordList:
        print("Wordlist file is not well-formed")
- Validate that words are no longer than 8 characters, then find all of the words longer than 8 characters:
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_max_length(8)
        # At this point, all words are no longer than 8 characters
    except ValidationFailed as e:
        words = e.status_obj.getwords_gt(8)
        lines = e.status_obj.getlines_gt(8)
        for word, line in zip(words, lines):
            # Do something with word and line...
            pass
    except InvalidWordList:
        print("Wordlist file is not well-formed")
- Validate that words are no longer than 8 characters, then calculate the number and percentage of words longer than 8 characters (assume 2048-word list):
from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed
with open('wordlist-en.txt') as f:
    try:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_max_length(8)
        # At this point, the percentage and number of
        # words fulfilling the condition are 0.
    except ValidationFailed as e:
        words = e.status_obj.getwords_gt(8)
        # Every returned word is already longer than 8 characters.
        n = len(words)
        # Percentage relative to the whole list (assumed to hold 2048 words).
        perc = 100 * n / 2048
    except InvalidWordList:
        print("Wordlist file is not well-formed")
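When turning counts like these into percentages, the choice of denominator matters. The following worked arithmetic is a sketch for a 2048-word list; the `percentage` helper is hypothetical, and whether the library reports each word pair once or in both orders is not stated here, so pick the denominator that matches what you actually receive.

```python
from math import comb

WORDS = 2048  # size of a BIP39 wordlist

# Three candidate denominators for "percentage of word pairs":
ordered_with_self = WORDS * WORDS        # every (a, b), including a == b
ordered_distinct = WORDS * (WORDS - 1)   # every (a, b) with a != b
unordered_distinct = comb(WORDS, 2)      # each pair {a, b} counted once


def percentage(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


print(ordered_with_self, ordered_distinct, unordered_distinct)
print(percentage(12, unordered_distinct))  # e.g. 12 close pairs found
print(percentage(5, WORDS))                # e.g. 5 words over the length limit
```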
First, clone the master branch of this repository, and then make a new virtualenv:
python3 -m venv env-bip39validator
source env-bip39validator/bin/activate
Then install the module dependencies using:
pip3 install -r requirements.txt -r dev-requirements.txt
See CONTRIBUTING.md for details on how to contribute issues and pull requests to this project.
BIP39 Validator is provided under the MIT license that can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license.