Source Code Classification

Description

Source code classification with Seq2Vec and Seq2Seq + Seq2Vec models built on token and token-type embeddings.

Preprocess

Datasets

Models

Innovation Points

  1. Model: Seq2Vec, Seq2Seq + Seq2Vec
  2. Embedding: token and token type
  3. Pretrained embedding: FastText
  4. CRF: Conditional Random Field
  5. Configurations
    1. token_embedding_size: 50, 100, 150
    2. type_embedding_size: 10
    3. pretrained_embedding: FastText, Word2Vec, GloVe
    4. experiment:
      1. CNN without highway
      2. CNN with highway
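
The configuration values above can be read as an experiment grid. The following is a minimal sketch, not the project's actual configuration code: it assumes every combination of the listed values is a candidate run, and the `highway` flag and `encoder_input_size` field are hypothetical names introduced here for illustration. It also assumes the token and token-type embeddings are concatenated, which the list does not state.

```python
from itertools import product

# Values copied from the "Configurations" list above.
TOKEN_EMBEDDING_SIZES = [50, 100, 150]
TYPE_EMBEDDING_SIZE = 10
PRETRAINED_EMBEDDINGS = ["FastText", "Word2Vec", "GloVe"]
# Hypothetical flag: False = CNN without highway, True = CNN with highway.
HIGHWAY_OPTIONS = [False, True]


def build_grid():
    """Return one dict per combination of the configuration values."""
    grid = []
    for size, pretrained, highway in product(
        TOKEN_EMBEDDING_SIZES, PRETRAINED_EMBEDDINGS, HIGHWAY_OPTIONS
    ):
        grid.append(
            {
                "token_embedding_size": size,
                "type_embedding_size": TYPE_EMBEDDING_SIZE,
                "pretrained_embedding": pretrained,
                "highway": highway,
                # Assumption: token and type embeddings are concatenated,
                # so the encoder input width is the sum of both sizes.
                "encoder_input_size": size + TYPE_EMBEDDING_SIZE,
            }
        )
    return grid


if __name__ == "__main__":
    for config in build_grid():
        print(config)
```

Under these assumptions the grid has 3 x 3 x 2 = 18 entries. Whether all of them were actually run is not stated in this README.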

How to run

Install dependencies

# clone project
git clone https://github.com/liangshb/source-code-classificaton sccls
cd sccls

# [OPTIONAL] create conda environment
conda create -n sccls python=3.8
conda activate sccls

# install PyTorch following the official instructions:
# https://pytorch.org/get-started/
# example: conda install pytorch cudatoolkit=11.3 -c pytorch

# install requirements
pip install -r requirements.txt
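
After the steps above, a quick check can confirm that the environment is usable. This is a minimal sketch using only the Python standard library. The helper name `check_environment` is hypothetical and not part of this repository, and `torch` is the only package checked because it is the only one named in the steps above.

```python
import importlib.util
import sys


def check_environment(required=("torch",)):
    """Return a dict mapping each package name to True if it is importable."""
    return {
        name: importlib.util.find_spec(name) is not None for name in required
    }


if __name__ == "__main__":
    # Print the interpreter version so it can be compared with the
    # python=3.8 version requested when creating the conda environment.
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

To check more packages, pass their import names, for example `check_environment(("torch", "numpy"))`.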
