Source code classification
- Model: Seq2Vec, Seq2Seq + Seq2Vec
- Embedding: token and token type
- Pretrained embedding: FastText
- CRF: Conditional Random Field
- Configurations
  - token_embedding_size: 50, 100, 150
  - type_embedding_size: 10
  - pretrained_embedding: FastText, Word2Vec, GloVe
  - experiment:
    - CNN without highway
    - CNN with highway
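
The configuration options listed above can be enumerated as an experiment grid. The following is a minimal sketch, not the project's actual configuration code. It assumes the options are fully crossed, which the list does not state. The `use_highway` key and the `build_config_grid` function are hypothetical names introduced only for illustration.

```python
from itertools import product

# Values copied from the configuration list above.
TOKEN_EMBEDDING_SIZES = [50, 100, 150]
TYPE_EMBEDDING_SIZE = 10
PRETRAINED_EMBEDDINGS = ["FastText", "Word2Vec", "GloVe"]
# Hypothetical flag: False = "CNN without highway", True = "CNN with highway".
HIGHWAY_OPTIONS = [False, True]


def build_config_grid():
    """Return one dict per combination of the listed options.

    Assumption: every option is crossed with every other option.
    """
    return [
        {
            "token_embedding_size": size,
            "type_embedding_size": TYPE_EMBEDDING_SIZE,
            "pretrained_embedding": embedding,
            "use_highway": use_highway,
        }
        for size, embedding, use_highway in product(
            TOKEN_EMBEDDING_SIZES, PRETRAINED_EMBEDDINGS, HIGHWAY_OPTIONS
        )
    ]


if __name__ == "__main__":
    for config in build_config_grid():
        print(config)
```

Under the full-cross assumption this yields 3 x 3 x 2 = 18 configurations. If only some combinations are of interest, filtering the returned list is simpler than changing the generator.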
Install dependencies
# clone project
git clone https://github.com/liangshb/source-code-classificaton sccls
cd sccls
# [OPTIONAL] create conda environment
conda create -n sccls python=3.8
conda activate sccls
# install PyTorch following the official instructions for your platform:
# https://pytorch.org/get-started/
# example: conda install pytorch cudatoolkit=11.3 -c pytorch
# install requirements
pip install -r requirements.txt
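
After the steps above, a quick sanity check can confirm that the interpreter version and PyTorch are in place. This is a minimal sketch using only the standard library. The `check_environment` helper is a hypothetical name, not part of the project, and the Python 3.8 minimum is taken from the `conda create` command above rather than from any stated requirement.

```python
import importlib.util
import sys


def check_environment(packages=("torch",), min_python=(3, 8)):
    """Report whether Python is recent enough and which packages are missing.

    `packages` holds top-level import names ("torch" is PyTorch's import name).
    """
    python_ok = sys.version_info[:2] >= tuple(min_python)
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return {"python_ok": python_ok, "missing": missing}


if __name__ == "__main__":
    print(check_environment())
```

An empty `missing` list together with `python_ok` being true suggests the environment is ready. The check does not verify package versions or CUDA availability.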