Main Research:
A new framework for spoken language understanding (SLU) in task-oriented dialogue systems that aims to handle the naturalness of human speech, i.e., to tackle its irregularities. This research project focuses on understanding human utterances in human-machine interaction, then forming a knowledge abstraction for downstream policy-learning tasks.
Given a dialogue turn, the framework returns the updated slot values and a satisfaction level.
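As a rough illustration of this per-turn interface (the names below are hypothetical, not the repo's actual API), the input/output contract could look like this:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TurnResult:
    """Hypothetical container for what the framework returns for one dialogue turn."""
    slot_values: Dict[str, str] = field(default_factory=dict)  # e.g. {"restaurant-area": "centre"}
    satisfaction: float = 0.0  # estimated user satisfaction, e.g. in [0, 1]

def process_turn(utterance: str, state: Dict[str, str]) -> TurnResult:
    """Hypothetical entry point: run NLU + DST for one user utterance."""
    updated = dict(state)
    # ... intent clustering + slot filling, then the state tracker, would update `updated` ...
    return TurnResult(slot_values=updated, satisfaction=0.5)
```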
It has the following features & aims:
- NLU:
  - Surface model:
    a. Multi-intent clustering framework
    b. Intent + slot-filling
  - Pertinence model:
    a. Evaluation framework
- DST:
  - State tracker
- Data: redefine labels in the dialogue dataset (some are unclear).
- Fine-tune: train a BERT model on single-sentence datasets and apply it to the dialogue datasets for clustering.
- Surface: check DCEC convergence (a sketch of the stopping criterion follows this list).
- Surface: attend to key words via a masking mechanism.
- Surface: train with as few labels as possible.
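For the DCEC convergence check, a common stopping criterion (following the DEC/DCEC papers) is the fraction of samples whose hard cluster assignment changed between updates; a minimal sketch, assuming NumPy arrays of hard assignments:

```python
import numpy as np

def assignment_change_rate(prev_labels: np.ndarray, curr_labels: np.ndarray) -> float:
    """Fraction of samples whose hard cluster assignment changed between updates."""
    return float(np.mean(prev_labels != curr_labels))

# Stop DCEC training once fewer than e.g. 0.1% of assignments change.
tol = 0.001
prev = np.array([0, 1, 1, 2])
curr = np.array([0, 1, 2, 2])
if assignment_change_rate(prev, curr) < tol:
    print("converged")
```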
Data preprocessing pipeline:
Associated files and folders:
- data/train_data.py
- data/dialogue_data.py
- Go to config.py to select the data type:
  - Single sentence: ATIS / semantic parsing dataset
  - Dialogue: MultiWOZ 2.1 dataset
- Run the corresponding script to generate raw_data.pkl for later use:
python data/train_data.py
python data/dialogue_data.py
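The structure of raw_data.pkl depends on the selected data type; assuming it is a standard pickle file (the path below is an assumption, adjust it to wherever the script writes), it can be inspected like this:

```python
import pickle

# Inspect the preprocessed data written by data/train_data.py or data/dialogue_data.py.
with open("raw_data.pkl", "rb") as f:  # assumed path; adjust to where the script writes it
    raw_data = pickle.load(f)

print(type(raw_data))  # check the structure before downstream use
```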
To extract contextualized representations, we fine-tune a BERT model and use it to generate pretrained sentence embeddings.
Associated files and folders:
- bert_finetune.py
- bert_nsp.py
- finetune_results/
- checkpoints/
To train on a single-sentence dataset (ATIS or TOP semantic parsing):
python bert_finetune.py train --datatype=atis
python bert_finetune.py train --datatype=semantic
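For orientation, fine-tuning BERT for intent classification can be sketched with the Hugging Face transformers API; this is an illustration, not the repo's bert_finetune.py, and NUM_INTENTS plus the example batch are placeholders:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_INTENTS = 26  # placeholder: set to the number of intent labels in the dataset

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_INTENTS
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()

texts = ["show me flights from boston to denver"]  # placeholder batch
labels = torch.tensor([0])                         # placeholder intent ids

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)  # loss is computed from the labels
optimizer.zero_grad()
outputs.loss.backward()
optimizer.step()
```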
To train on the MultiWOZ dataset with next-sentence prediction:
python bert_nsp.py train
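For reference, next-sentence prediction on consecutive dialogue turns can be sketched as follows (again with the transformers API as an illustration, not the contents of bert_nsp.py):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Consecutive dialogue turns form a positive pair; label 0 = "B follows A".
turn_a = "i need a cheap restaurant in the centre"
turn_b = "there are several cheap restaurants in the centre , any cuisine preference ?"
enc = tokenizer(turn_a, turn_b, return_tensors="pt")
outputs = model(**enc, labels=torch.LongTensor([0]))
print(outputs.loss)
```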
There are three modes for testing the ATIS/TOP semantic BERT embeddings:
python bert_finetune.py test --mode=[mode_type]
mode_type:
- embedding: generates and stores the sentence embeddings for every training example.
- data: runs the original text classification on the dataset.
- user: lets you type in any sentence and classifies it with a specific label (sketched below).
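Conceptually, the user mode is a small interactive loop; a sketch assuming a fine-tuned BertForSequenceClassification checkpoint (the checkpoint path is a placeholder):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Placeholder path: load whatever checkpoint bert_finetune.py saves under checkpoints/.
model = BertForSequenceClassification.from_pretrained("checkpoints/atis")
model.eval()

while True:
    sentence = input("sentence> ")
    if not sentence:
        break
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits
    print("predicted label id:", int(logits.argmax(dim=-1)))
```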
To extract sentence-level BERT embeddings from the dialogue dataset:
python bert_nsp.py test --mode=embedding
This mode generates and stores the sentence embeddings for every training example.
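Storing the embeddings usually means taking the [CLS] (or pooled) representation from the fine-tuned encoder; a minimal sketch, where the checkpoint, example data, and output path are assumptions:

```python
import pickle
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # swap in the fine-tuned checkpoint
model.eval()

sentences = ["book a table for two", "what is the weather like"]  # placeholder data
embeddings = []
with torch.no_grad():
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt", truncation=True)
        out = model(**enc)
        embeddings.append(out.last_hidden_state[:, 0, :].squeeze(0))  # [CLS] vector

with open("finetune_results/embeddings.pkl", "wb") as f:  # assumed output location
    pickle.dump(torch.stack(embeddings), f)
```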
After obtaining the embeddings, we can use them for the surface model; please check here for more details:
- intent clustering (an illustrative sketch follows this list)
- intent + slot-filling
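As an example of the clustering step, plain k-means over the stored embeddings looks like this (the surface model may instead use DCEC; the file path and cluster count are assumptions carried over from the sketch above):

```python
import pickle
from sklearn.cluster import KMeans

with open("finetune_results/embeddings.pkl", "rb") as f:  # assumed location, see above
    embeddings = pickle.load(f)  # torch tensor of shape (num_sentences, hidden_size)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)  # cluster count is a placeholder
cluster_ids = kmeans.fit_predict(embeddings.numpy())
print(cluster_ids[:5])
```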
After obtaining the embeddings, we can also use them for the pertinence model.