Our submission for NLBSE 2022 in the tool competition track. (Humble flex - We came 2nd! 😎)
Recent innovations in natural language processing techniques have led to the development of various tools for assisting software developers. This paper reports our proposed solution to the issue report classification task from the NL-Based Software Engineering (NLBSE) workshop. We approach the task of classifying issues on GitHub repositories using BERT-style models. We propose a neural architecture for the problem that utilizes contextual embeddings for the text content in the GitHub issues. In addition, we design extra features for the classification task. We perform a thorough ablation analysis of the designed features and benchmark various BERT-style models for generating textual embeddings. Our proposed solution outperforms the competition organizers' method and achieves an F1 score of 0.8653 (an approximately 5% increase).
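For orientation, here is a minimal sketch of the kind of architecture described above: contextual embeddings from a BERT-style encoder concatenated with additional hand-crafted features, feeding a linear classifier. The feature dimension and label count are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch (illustrative, not the repository's actual model):
# a BERT-style encoder produces a contextual embedding for the issue text,
# which is concatenated with hand-crafted features before classification.
import torch
import torch.nn as nn
from transformers import AutoModel

class IssueClassifier(nn.Module):
    def __init__(self, checkpoint="roberta-base", n_extra_features=8, n_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + n_extra_features, n_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = out.last_hidden_state[:, 0]  # first-token ("CLS") embedding
        combined = torch.cat([cls_embedding, extra_features], dim=-1)
        return self.classifier(combined)
```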
- Install requirements:

  ```sh
  conda env create -f environment.yml
  ```

- Download the data using the Bash script `data/get_data.sh`:

  ```sh
  cd data && ./get_data.sh
  ```
To train the model in the paper, run this command:

```sh
python src/train.py --DATASET_SUFFIX _dropfeature --MODEL_NAME roberta --EMB_MODEL_CHECKPOINT roberta-base --device gpu
```
Use `--device cpu` if you do not have access to a GPU.
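If you want to script around the trainer, the flags above behave like ordinary command-line options. The following is a hypothetical argparse mirror of that interface for reference; the actual definitions in `src/train.py` may differ.

```python
# Hypothetical mirror of the train.py command-line interface (assumption:
# the real script may name, type, or default these options differently).
import argparse

parser = argparse.ArgumentParser(description="Train the issue classifier.")
parser.add_argument("--DATASET_SUFFIX", default="_dropfeature",
                    help="suffix selecting which prepared dataset variant to load")
parser.add_argument("--MODEL_NAME", default="roberta",
                    help="short model name used for logs and checkpoint files")
parser.add_argument("--EMB_MODEL_CHECKPOINT", default="roberta-base",
                    help="Hugging Face checkpoint used for textual embeddings")
parser.add_argument("--device", choices=["gpu", "cpu"], default="gpu",
                    help="use 'cpu' if no GPU is available")
args = parser.parse_args()
```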
- Download the trained RoBERTa model from Google Drive and put it in the `./data/save/` directory.
- To generate results on the test data, run:
  ```sh
  python src/evaluate.py --DATASET_SUFFIX _dropfeature --MODEL_NAME roberta --EMB_MODEL_CHECKPOINT roberta-base --device gpu
  ```
  This assumes that the trained model is present in the `./data/save/` directory.
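For reference, the reported F1 can be computed from predictions with scikit-learn. The label names and the weighted averaging scheme below are illustrative assumptions; check `src/evaluate.py` for the exact metric used.

```python
# Illustrative scoring sketch (assumption: label names and "weighted"
# averaging may differ from what evaluate.py actually reports).
from sklearn.metrics import f1_score

# In practice these would come from the evaluation run on the test split.
y_true = ["bug", "enhancement", "question", "bug"]
y_pred = ["bug", "enhancement", "bug", "bug"]

print(f1_score(y_true, y_pred, average="weighted"))
```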