OL4TeX: Adaptive Online Learning for Text Classification under Distribution Shifts

Data Preparation

Place the datasets in the following directory structure:

│adaptive-model-update/
├──data/
│  ├── cybersecurity
│  │   ├── cybersecurity_source.csv
│  │   ├── cybersecurity_target.csv
│  ├── disaster
│  │   ├── disaster_source.csv
│  │   ├── disaster_target.csv
│  ├── review
│  │   ├── hotel_review.csv

cybersecurity_source.csv: You can download it from here.
cybersecurity_target.csv: You can download it from here.
disaster_source.csv: You can download it from here.
disaster_target.csv: Use the emergency.csv file included in this repository.
hotel_review.csv: You can download it from here.
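Before running the scripts, it can help to confirm that the files are where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the relative `data` root are assumptions made for illustration, and the file names are copied from the directory tree.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
EXPECTED_FILES = {
    "cybersecurity": ["cybersecurity_source.csv", "cybersecurity_target.csv"],
    "disaster": ["disaster_source.csv", "disaster_target.csv"],
    "review": ["hotel_review.csv"],
}


def missing_files(data_root):
    """Return the expected dataset files that are absent under data_root.

    Hypothetical helper for illustration; it is not defined in the repository.
    """
    root = Path(data_root)
    missing = []
    for dataset, filenames in EXPECTED_FILES.items():
        for name in filenames:
            path = root / dataset / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to data/.
    for path in missing_files("data"):
        print(f"missing: {path}")
```

An empty result means every file in the tree was found.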

Setups

All code was developed and tested on an Nvidia RTX A4000 (48 SMs, 16 GB) in the following environment:

Ubuntu 18.04
python 3.6.9
gensim 3.8.3
keras 2.6.0
numpy 1.19.5
pandas 1.1.5
tensorflow 2.6.2
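To compare a local installation against the versions listed above, something like the following sketch could be used. It is not part of the repository: the function names are hypothetical, and the lookup relies on `importlib.metadata`, which is only in the standard library from Python 3.8, so on Python 3.6 the `importlib_metadata` backport is assumed to be installed.

```python
# Versions copied from the environment list above.
PINNED = {
    "gensim": "3.8.3",
    "keras": "2.6.0",
    "numpy": "1.19.5",
    "pandas": "1.1.5",
    "tensorflow": "2.6.2",
}


def version_mismatches(installed, pinned=PINNED):
    """Return {package: (installed_version_or_None, pinned_version)} for each mismatch."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


def installed_versions(names):
    """Look up installed versions; packages that are not installed are omitted."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport, assumed present on 3.6
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for name, (have, want) in version_mismatches(installed_versions(PINNED)).items():
        print(f"{name}: installed {have}, tested with {want}")
```

A mismatch does not necessarily mean the code will fail; the list only records the environment the authors tested.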

Implementation

To pre-train the model offline, run the following script from the command line:

sh run_pretrain_offline.sh

To adapt the model online, run the following script from the command line:

sh run_update_online.sh

Hyperparameters

The following options can be passed to main.py:

-dataset: Name of the dataset. (Supported names are cybersecurity, disaster, review)
-model: Neural architecture of the OnlineAdaptor. (Supported models are CNN, LSTM, Transformer)
-adjust_weight: Relative importance between learning efficiency and accuracy. Default is 0.5.
-epochs: Number of training epochs. Default is 20.
-event_size: Size of streaming batches.
-batch_size: Size of batch to train the model.
-keyword_size: Size of keyword set to calculate the frequency indicator.
-embedding_size: Size of embedding layer.
-output_path: Path for the output results.
-token_path: Path for saving and loading tokenizer.
-model_path: Path for saving and loading machine learning-based OnlineAdaptor.
-ml_path: Path for saving and loading machine learning-based AccPredictor.
-pretrain: Run offline model pre-training.
-update: Run the online model update.
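The options above could be declared with `argparse` roughly as follows. This is a sketch under stated assumptions, not the actual contents of main.py: the option names, the supported values, and the two stated defaults (0.5 and 20) come from the list above, while the argument types, the `store_true` behavior of `-pretrain` and `-update`, and the function name `build_parser` are guesses.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the options listed above")
    parser.add_argument("-dataset", choices=["cybersecurity", "disaster", "review"])
    parser.add_argument("-model", choices=["CNN", "LSTM", "Transformer"])
    parser.add_argument("-adjust_weight", type=float, default=0.5)  # documented default
    parser.add_argument("-epochs", type=int, default=20)  # documented default
    # No defaults are documented for the options below, so none are set here.
    parser.add_argument("-event_size", type=int)
    parser.add_argument("-batch_size", type=int)
    parser.add_argument("-keyword_size", type=int)
    parser.add_argument("-embedding_size", type=int)
    parser.add_argument("-output_path")
    parser.add_argument("-token_path")
    parser.add_argument("-model_path")
    parser.add_argument("-ml_path")
    # Assumed to be boolean switches that select the mode.
    parser.add_argument("-pretrain", action="store_true")
    parser.add_argument("-update", action="store_true")
    return parser


if __name__ == "__main__":
    # Example argument list; the real scripts may pass different values.
    args = build_parser().parse_args(["-dataset", "disaster", "-model", "LSTM", "-pretrain"])
    print(args.dataset, args.model, args.epochs, args.adjust_weight, args.pretrain)
```

Under these assumptions, an offline pre-training run on the disaster dataset would correspond to a command such as `python main.py -dataset disaster -model LSTM -pretrain`; the shipped shell scripts remain the reference for the exact arguments.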
