Issue-Classifier

Our submission for NLBSE 2022 in the tool competition track. (Humble flex - We came 2nd! 😎)

Tool Paper Abstract

Recent innovations in natural language processing techniques have led to the development of various tools for assisting software developers. This paper provides a report of our proposed solution to the issue report classification task from the NL-Based Software Engineering workshop. We approach the task of classifying issues on GitHub repositories using BERT-style models. We propose a neural architecture for the problem that utilizes contextual embeddings for the text content in the GitHub issues. We also design additional features for the classification task. We perform a thorough ablation analysis of the designed features and benchmark various BERT-style models for generating textual embeddings. Our proposed solution performs better than the competition organizer's method and achieves an F1 score of 0.8653 (an approximately 5% increase).

Model Architecture
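As described in the abstract, the model combines contextual embeddings of the issue text with additional hand-designed features. The minimal sketch below only illustrates that idea; the class name, feature count, and label set are assumptions for illustration, not the authors' implementation (see src/train.py for the actual code).

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class IssueClassifier(nn.Module):
    """Illustrative only: RoBERTa [CLS] embedding + extra features -> linear head."""
    def __init__(self, checkpoint="roberta-base", n_extra_features=4, n_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(hidden + n_extra_features, n_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]              # contextual embedding of the issue text
        x = torch.cat([cls, extra_features], dim=-1)   # append hand-designed features
        return self.head(x)                            # unnormalized label scores

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch = tokenizer(["App crashes on startup"], return_tensors="pt", truncation=True)
logits = IssueClassifier()(batch["input_ids"], batch["attention_mask"], torch.zeros(1, 4))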

Setup

1. Install the requirements:

conda env create -f environment.yml

2. Download the data using the Bash script data/get_data.sh:

cd data && ./get_data.sh

Training

To train the model in the paper, run this command:

python src/train.py --DATASET_SUFFIX _dropfeature --MODEL_NAME roberta --EMB_MODEL_CHECKPOINT roberta-base --device gpu

Use --device cpu if you do not have access to a GPU.
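For example, the same training run on a CPU-only machine (only the --device flag changes):

python src/train.py --DATASET_SUFFIX _dropfeature --MODEL_NAME roberta --EMB_MODEL_CHECKPOINT roberta-base --device cpu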

Predictions

1. Download the trained RoBERTa model from Google Drive and place it in the ./data/save/ directory.

  2. To generate results on the test data, run:

python src/evaluate.py --DATASET_SUFFIX _dropfeature --MODEL_NAME roberta --EMB_MODEL_CHECKPOINT roberta-base --device gpu

This assumes that the trained model is present in the ./data/save/ directory.
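To sanity-check a score against the F1 of 0.8653 reported in the abstract, gold and predicted labels can be compared with scikit-learn. This is only a sketch: the label names below are illustrative and the averaging mode used by the competition may differ.

from sklearn.metrics import f1_score

gold = ["bug", "enhancement", "bug", "question"]       # illustrative labels
pred = ["bug", "enhancement", "question", "question"]
print(f1_score(gold, pred, average="micro"))           # 0.75 on this toy example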