This repository contains the source code and the trained model: RoBERTa finetuned on the DNLI (DialogueNLI) dataset.
The repository is built on top of D3.
Note: the requirements below may be incomplete; missing packages need to be installed manually.
- python==3.7.0
- torch==1.5.0
- transformers==3.1.0
- spacy==2.2.4
- fairseq==0.9.0 (I downloaded the source code into the root directory)
- sentencepiece==0.1.94
Download the finetuned NLI model and place it in ./persona_nli .
Note: This model is a RoBERTa large MNLI model finetuned on the DialogueNLI dataset.
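If the checkpoint is stored in the standard Hugging Face format, it can be loaded as below. This is a minimal sketch, assuming the directory name from the instruction above; the actual archive layout may differ.

```python
# Minimal sketch: load the finetuned NLI checkpoint from ./persona_nli,
# assuming it is saved in the standard Hugging Face format.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("./persona_nli")
model = AutoModelForSequenceClassification.from_pretrained("./persona_nli")
model.eval()  # inference mode
```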
See example data in ./data/consistency_calculation
See example data in ./data/persona_labeling
Goal: to get the confidence score of a given class (see the sketch after the commands below).
bash consistency_pipeline.sh
or
python cal_consistency_score.py
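For intuition, here is a hedged sketch of scoring a single (premise, hypothesis) pair with the model loaded above. The example sentences are hypothetical, and the label order varies between NLI checkpoints, so it should be checked against the checkpoint's config; this is an illustration, not the repository's exact pipeline.

```python
# Sketch: per-class confidence scores for one (premise, hypothesis) pair.
# Assumes `tokenizer` and `model` are loaded as in the snippet above.
import torch

premise = "i have two cats ."                      # hypothetical persona sentence
hypothesis = "my cats are named tom and jerry ."   # hypothetical response

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs)[0]                    # tuple output in transformers 3.x
probs = torch.softmax(logits, dim=-1).squeeze(0)   # one confidence score per class
print(probs.tolist())
```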
Goal: to get the class with the highest confidence score (see the sketch after the commands below).
bash persona_label_pipeline.sh
or
python cal_persona_label.py --params...
python get_persona_labeled_dataset.py --params...
python count_label.py output_file
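For intuition, a sketch of the labeling step: score each candidate persona sentence against the response with the NLI model and take the argmax. The helper names, the entailment index, and the omission of a "no persona" threshold (cf. the -wo-th outputs below) are all assumptions, not the repository's exact method.

```python
# Sketch: label a response with the persona sentence it most likely uses.
# Assumes `tokenizer` and `model` are loaded as in the snippet above.
import torch

ENT = 0  # assumed entailment index -- check the checkpoint's config

def nli_probs(premise, hypothesis):
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs)[0]
    return torch.softmax(logits, dim=-1).squeeze(0)

def persona_label(personas, response):
    # entailment probability of each candidate persona against the response
    scores = [nli_probs(p, response)[ENT].item() for p in personas]
    return max(range(len(scores)), key=scores.__getitem__)
```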
Interestingly, I found that the model is quite sure that 50% of the responses don't use any persona: the predicted class distribution is "sharp", i.e. the probability of the predicted class is more than an order of magnitude larger than those of the other two classes.
(D3) bash-4.2$ python count_label.py predictions/test/output-wo-th
[3979, 925, 849, 746, 698, 315]
[0.53, 0.12, 0.11, 0.099, 0.092, 0.042]
(D3) bash-4.2$ python count_label.py predictions/train/output-wo-th
[33159, 8914, 7741, 6983, 6224, 2698]
[0.50, 0.134, 0.118, 0.106, 0.095, 0.041]
(D3) bash-4.2$ python count_label.py predictions/valid/output-wo-th
[3818, 1129, 958, 821, 752, 323]
[0.49, 0.14, 0.12, 0.11, 0.096, 0.041]