Text layer correctness experiments

This repository lets you run experiments with all of the methods described in the paper.

Requirements

This repository requires Python 3.9.
You can create a virtual environment and install the dependencies listed in requirements.txt.

To use RuBert, you need to install the torch and torchvision versions that match your GPU and CUDA version.
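
Before enabling RuBert, it can help to confirm which Python and torch builds are active. The snippet below is a minimal sketch and is not part of this repository: the function name describe_environment is hypothetical, and it relies only on the standard library plus torch.cuda.is_available() and torch.version.cuda when torch is installed.

```python
import importlib.util
import sys


def describe_environment():
    """Collect basic facts about the interpreter and, if present, torch."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "torch_cuda_version": None,
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the check also works without torch

        info["cuda_available"] = torch.cuda.is_available()
        # torch.version.cuda is None for CPU-only builds of torch.
        info["torch_cuda_version"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False on a machine with a GPU, the installed torch build probably does not match the local CUDA setup.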

Dataset

The synthetic training dataset and the benchmark dataset are downloaded automatically when you run main.py.
All data is stored in the ./data folder, which is also created automatically.
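
The "create the folder, download only if missing" behaviour can be pictured with the sketch below. It is an illustration under stated assumptions, not the code used in main.py: the helper ensure_dataset, the fetch callable, and the file name in the usage example are all hypothetical.

```python
from pathlib import Path


def ensure_dataset(data_dir, filename, fetch):
    """Return the path to `filename` inside `data_dir`, fetching it only if missing.

    `fetch` is a hypothetical callable that returns the file content as bytes;
    the repository's actual download logic may differ.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first run
    target = data_dir / filename
    if not target.exists():
        target.write_bytes(fetch())  # download happens only once
    return target


if __name__ == "__main__":
    # Placeholder content instead of a real download.
    path = ensure_dataset("./data", "example.bin", lambda: b"placeholder")
    print(path)
```

With this pattern, repeated runs reuse the files already present in ./data instead of downloading them again.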

Experiments

You can run experiments with XGBoost, Random Forest, Logistic Regression, N-Gram, and RuBert with the following command:

python main.py

By default, it runs experiments with all methods except RuBert, using the TF-IDF feature extractor.

You can select which models to run by editing the models list in main.py.
You can also select the feature extractor by changing the value of final_feature_extractor in main.py.
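
As a rough picture of this kind of configuration, the sketch below pairs each selected model with one feature extractor. Only the variable names models and final_feature_extractor come from this README; the string identifiers, the two AVAILABLE_* lists, and the plan_experiments helper are hypothetical, and the real main.py may store classes or objects instead of strings.

```python
# Hypothetical identifiers; the real main.py may use classes or other names.
AVAILABLE_MODELS = ["xgboost", "random_forest", "logistic_regression", "n_gram", "rubert"]
AVAILABLE_FEATURE_EXTRACTORS = ["tf_idf", "custom"]

# Edit these two values to choose what to run.
models = ["xgboost", "random_forest", "logistic_regression", "n_gram"]  # RuBert left out
final_feature_extractor = "tf_idf"


def plan_experiments(models, feature_extractor):
    """Validate the selection and return (model, feature_extractor) pairs."""
    unknown = [name for name in models if name not in AVAILABLE_MODELS]
    if unknown:
        raise ValueError(f"Unknown models: {unknown}")
    if feature_extractor not in AVAILABLE_FEATURE_EXTRACTORS:
        raise ValueError(f"Unknown feature extractor: {feature_extractor}")
    return [(name, feature_extractor) for name in models]


if __name__ == "__main__":
    for model_name, extractor_name in plan_experiments(models, final_feature_extractor):
        print(f"would run {model_name} with {extractor_name}")
```

Adding "rubert" to the list in this sketch would include it in the plan, which mirrors how RuBert is opted into rather than run by default.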
