Guide to Scripts

Nina Gial edited this page Mar 1, 2024 · 1 revision

Guide to Scripts

This guide covers only the contents of the scripts/ directory.

File Explanation
(Folder) conversion/
multiple_xml_files.py Handles multiple XML files nested in a directory tree, a layout common in OPUS
pickle_to_sql.py Converts a pickle file to SQLite (words); use for the tokenizer
pickle_to_sql_sentences.py Converts a pickle file to SQLite (sentences); use for RoBERTa, etc.
(Folder) lda/
lda_post.py Run this to prepare LDA
lda_pre.py Run this to visualize LDA (consider using Jupyter notebooks)
(Folder) training/
Dockerfile Used with docker build to create the training container image
requirements.txt Python requirements for training
script.py Command-line script; run python script.py --help to list its parameters
(Folder) tokenizer/
train_bpe.py Train tokenizer with Byte Pair Encoding
train_bpe_thread.py Train tokenizer with Byte Pair Encoding
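
The internals of the conversion scripts are not shown on this page, so the following is only a minimal sketch of what a pickle-to-SQLite step might look like. It assumes the pickle file holds a plain list of strings (sentences or words). The function name pickle_to_sqlite and the table and column names are hypothetical and are not taken from pickle_to_sql.py or pickle_to_sql_sentences.py.

```python
import pickle
import sqlite3


def pickle_to_sqlite(pickle_path, db_path, table="sentences"):
    """Load a pickled list of strings and store one string per row.

    Assumption: the pickle contains a list of str. The table layout
    (id, text) is illustrative, not the schema used by the real scripts.
    """
    with open(pickle_path, "rb") as handle:
        items = pickle.load(handle)  # only unpickle files you trust

    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                f'CREATE TABLE IF NOT EXISTS "{table}" '
                "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
            )
            connection.executemany(
                f'INSERT INTO "{table}" (text) VALUES (?)',
                ((item,) for item in items),
            )
    finally:
        connection.close()
    return len(items)
```

Storing one item per row lets a training job read the corpus in batches with ordinary SQL queries instead of loading the whole pickle into memory.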