Guide to Scripts
Nina Gial edited this page Mar 1, 2024 · 1 revision
This guide covers only the contents of the scripts/ directory.
File | Explanation |
---|---|
(Folder) conversion/ | |
multiple_xml_files.py | Handles multiple XML files nested in a directory, a layout common in OPUS corpora |
pickle_to_sql.py | Converts a pickle file to SQLite (words); use for the tokenizer |
pickle_to_sql_sentences.py | Converts a pickle file to SQLite (sentences); use for RoBERTa, etc. |
(Folder) lda/ | |
lda_post.py | Run this to prepare LDA |
lda_pre.py | Run this to visualize LDA (consider using Jupyter notebooks instead) |
(Folder) training/ | |
Dockerfile | Builds the training container with docker build |
requirements.txt | Python requirements for training |
script.py | Command-line script; run python script.py --help to list its parameters |
(Folder) tokenizer/ | |
train_bpe.py | Train tokenizer with Byte Pair Encoding |
train_bpe_thread.py | Train tokenizer with Byte Pair Encoding |
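
The pickle-to-SQLite conversion described for the conversion/ scripts can be sketched roughly as follows. This is a minimal illustration, not the actual contents of pickle_to_sql.py: the function name `pickle_to_sqlite`, the table name `words`, the column layout, and the assumption that the pickle holds a plain list of strings are all hypothetical.

```python
import pickle
import sqlite3


def pickle_to_sqlite(pickle_path, db_path, table="words"):
    """Copy a pickled iterable of strings into a SQLite table, one row each.

    Hypothetical helper: the real script may use a different schema.
    Only unpickle files you trust; pickle can execute arbitrary code.
    """
    if not table.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"invalid table name: {table!r}")

    with open(pickle_path, "rb") as fh:
        items = pickle.load(fh)  # assumed: an iterable of str

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
            )
            conn.executemany(
                f"INSERT INTO {table} (text) VALUES (?)",
                ((item,) for item in items),
            )
        # Report how many rows the table now holds.
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()
```

Storing one item per row lets a tokenizer or model training job stream the corpus from disk instead of loading the whole pickle into memory. A sentence-level variant would differ only in the table name and in what each row contains.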