The goal of this package is to run several distinct NLP-related algorithms in parallel, either within users' own projects or through a provided CLI, which is currently a wrapper for the BERTopic topic modeling package. There is also infrastructure for adding new features to the CLI and for using existing CLI components within users' own projects.
- Make sure that you have Python 3.9 installed
git clone https://github.com/Jayman391/lnlp.git
cd lnlp
python3.9 -m venv venv
source venv/bin/activate
python -m pip install -r requirements.txt
python -m spacy download en
python main.py
For now, only the topic modeling section is functional.
For data sets under ~5000 documents, you may need to rerun the script a few times to get a good partition of the data, because the clustering algorithm sometimes gets stuck in a local optimum with only two clusters.
Runtime errors also occur somewhat regularly, which is another reason to rerun the script. Several bugs are already open on the issues page of the repo.
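Since rerunning is the suggested workaround for runtime errors, the retry can be scripted. The following is a minimal sketch, not part of LNLP: `run_with_retries` is a hypothetical helper, and it assumes the script exits with a non-zero status when it crashes. It does not detect the two-cluster case, which would still exit successfully.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=5):
    """Run cmd until it exits with status 0; return the number of attempts used.

    Hypothetical helper (not part of LNLP). Assumes a crash shows up as a
    non-zero exit status.
    """
    for attempt in range(1, max_attempts + 1):
        # check=False: inspect the return code ourselves instead of raising.
        result = subprocess.run(cmd, check=False)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
    raise RuntimeError(f"command failed {max_attempts} times: {cmd}")
```

For example, `run_with_retries([sys.executable, "main.py", "--data=tests/test_data/usa-vaccine-comments.csv"])` would rerun the script up to five times from the repository root.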
python main.py --data=tests/test_data/usa-vaccine-comments.csv
python main.py --data=tests/test_data/usa-vaccine-comments.csv --save_dir=output
python main.py --data=tests/test_data/usa-vaccine-comments.csv --num_samples=1000
python main.py --data=tests/test_data/usa-vaccine-comments.csv --sequence='1,11,21,31,41,9'
python main.py --save_dir=output --data=tests/test_data/usa-vaccine-comments.csv --num_samples=1000 --sequence='1,11,21,31,41,9'
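The invocations above can also be assembled from Python, for instance to script several runs with different sample sizes. This is a sketch under assumptions: `build_command` is a hypothetical helper, only the four flags shown above are used, and it assumes the flag order does not matter to the CLI. The sequence is passed through as a comma-separated string, exactly as in the shell examples.

```python
import subprocess
import sys


def build_command(data, save_dir=None, num_samples=None, sequence=None,
                  python=sys.executable):
    """Build the argument list for `python main.py` from the flags shown above.

    Hypothetical helper (not part of LNLP); optional flags are omitted when None.
    """
    cmd = [python, "main.py", f"--data={data}"]
    if save_dir is not None:
        cmd.append(f"--save_dir={save_dir}")
    if num_samples is not None:
        cmd.append(f"--num_samples={num_samples}")
    if sequence is not None:
        # Join the items with commas, matching --sequence='1,11,21,31,41,9'.
        cmd.append("--sequence=" + ",".join(str(item) for item in sequence))
    return cmd


def run(cmd):
    """Run the command from the repository root and return its exit status."""
    return subprocess.run(cmd, check=False).returncode
```

Calling `build_command("tests/test_data/usa-vaccine-comments.csv", save_dir="output", num_samples=1000, sequence=[1, 11, 21, 31, 41, 9])` produces the same arguments as the last command above, and the result can be passed to `run`.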
Your contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make LNLP better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.