Pytorch-Chinese-MultilLabel-Classification

Code for the project.

Requirements

Technical Approach

/Basic   baseline implementations of BERT and ALBERT
/DistillBert   distillation of the models in Basic
/Preprocessor   preprocessing
/Postprocessor   postprocessing

Models

Roberta_wwm
Albert_zn
TinyBert
FastBert
BiGRU

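Model Description

Basic Architecture

Feature vector generation: the input features are produced from the raw features through feature statistics.
Input layer: X denotes the node category and P denotes the answer content; the two are concatenated to form the final input.
Transformer encoding layer: the outputs differ depending on whether BERT or ALBERT is used.
Output layer: a single fully connected layer produces scores, from which the classification result is taken.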
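A minimal PyTorch sketch of the input, encoder, and output layers, assuming BERT-style inputs of the form [CLS] X [SEP] P [SEP] with segment ids separating X from P. Here `nn.TransformerEncoder` is a randomly initialised stand-in for the pretrained BERT/ALBERT encoder; the class `EncoderClassifier`, the helper `concat_inputs`, the special-token ids, and all sizes are hypothetical, and the feature-statistics step is not shown. The per-label sigmoid at the end assumes a multi-label setting.

```python
import torch
import torch.nn as nn


class EncoderClassifier(nn.Module):
    """Transformer encoder followed by a single fully connected scoring layer.

    nn.TransformerEncoder is a randomly initialised stand-in for a pretrained
    BERT/ALBERT encoder; every size below is an illustrative assumption.
    """

    def __init__(self, vocab_size: int, num_labels: int, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2, max_len: int = 128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.seg_emb = nn.Embedding(2, d_model)  # segment 0 = X (category), 1 = P (answer)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_labels)  # one score per label

    def forward(self, input_ids, token_type_ids, attention_mask):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.tok_emb(input_ids) + self.seg_emb(token_type_ids) + self.pos_emb(positions)
        # src_key_padding_mask is True where a position is padding.
        h = self.encoder(h, src_key_padding_mask=attention_mask == 0)
        return self.classifier(h[:, 0])  # score from the first ([CLS]) position


def concat_inputs(x_ids, p_ids, cls_id=1, sep_id=2):
    """Build [CLS] X [SEP] P [SEP] token ids and matching segment ids.

    cls_id / sep_id are hypothetical vocabulary ids.
    """
    ids = [cls_id] + x_ids + [sep_id] + p_ids + [sep_id]
    segs = [0] * (len(x_ids) + 2) + [1] * (len(p_ids) + 1)
    return ids, segs


torch.manual_seed(0)
model = EncoderClassifier(vocab_size=100, num_labels=5)
ids, segs = concat_inputs([5, 6], [7, 8, 9])
input_ids = torch.tensor([ids])
scores = model(input_ids, torch.tensor([segs]), torch.ones_like(input_ids))
probs = torch.sigmoid(scores)  # multi-label: an independent probability per label
print(scores.shape)  # torch.Size([1, 5])
```

In the actual project the encoder would be loaded from a pretrained BERT or ALBERT checkpoint rather than trained from scratch; only the single linear head is specific to this task.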

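Text Feature Mining

The category keyword of each class is concatenated with the text, and the combined sequence is then classified, e.g. [category word][SEP][original text].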
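A small plain-Python sketch of this input construction. It assumes (hypothetically) that each [category word][SEP][text] pair is classified as a binary decision, so one multi-label example expands into one pair per category; the function name `build_category_inputs` and the sample categories are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

SEP = "[SEP]"


def build_category_inputs(categories: Iterable[str], text: str,
                          gold: Set[str]) -> List[Tuple[str, int]]:
    """Pair each category word with the text as '[category][SEP][text]'.

    Assumption: each pair gets a binary label (1 if the category applies to
    the text), so a multi-label example becomes one decision per category.
    """
    return [(f"{cat}{SEP}{text}", int(cat in gold)) for cat in categories]


pairs = build_category_inputs(["体育", "财经", "科技"], "这款手机的芯片性能很强", {"科技"})
for sample, label in pairs:
    print(label, sample)
# 0 体育[SEP]这款手机的芯片性能很强
# 0 财经[SEP]这款手机的芯片性能很强
# 1 科技[SEP]这款手机的芯片性能很强
```

With a Hugging Face BERT tokenizer, passing the category word and the text as a sentence pair (`tokenizer(category, text)`) inserts the [CLS] and [SEP] tokens automatically, so the literal `[SEP]` string is only needed when the pair is concatenated by hand.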

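BiGRU Model Architecture

The word embedding layer is copied directly from the teacher model, which speeds up convergence. The backbone of the network is a bidirectional GRU. The output layer is a two-layer fully connected network whose input is either the GRU outputs or the hidden state at the GRU's last time step; these correspond to Model A and Model B respectively. For Model B, because the inputs have different lengths, obtaining the hidden state that truly belongs to the last character requires stepping through the sequence one time step at a time in a for loop, so training is slower. Both Model A and Model B reach 90% accuracy. The loss function combines MSE and cross-entropy: the MSE term fits the teacher model's logits, and the cross-entropy term fits the true labels.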
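A minimal PyTorch sketch of the student and its loss, under stated assumptions. Two deviations are named plainly: instead of the per-time-step for loop described above, Model B here uses `torch.nn.utils.rnn.pack_padded_sequence`, which lets the GRU return each sequence's final hidden state at its real last token in a single call; and Model A mean-pools the GRU outputs over valid time steps, which is an assumed pooling choice. The class `BiGRUStudent`, the function `distill_loss`, the hidden size, and the weight `alpha` are hypothetical, not taken from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class BiGRUStudent(nn.Module):
    """BiGRU student: teacher embedding -> BiGRU -> two-layer FC head."""

    def __init__(self, teacher_embedding: torch.Tensor, num_classes: int,
                 hidden: int = 128, use_last_hidden: bool = False):
        super().__init__()
        # Copy the teacher's embedding matrix; keep it trainable.
        self.embedding = nn.Embedding.from_pretrained(teacher_embedding.clone(), freeze=False)
        self.gru = nn.GRU(teacher_embedding.size(1), hidden, batch_first=True,
                          bidirectional=True)
        self.use_last_hidden = use_last_hidden  # False -> Model A, True -> Model B
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_classes))

    def forward(self, input_ids: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(input_ids)
        # Packing makes the GRU stop at each sequence's real length (no padding steps).
        packed = pack_padded_sequence(emb, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        out, h_n = self.gru(packed)
        if self.use_last_hidden:
            # Model B: h_n[-2] is the forward state at the true last token,
            # h_n[-1] the backward state after reading back to the first token.
            feats = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        else:
            # Model A: mean over the valid GRU outputs (pooling choice is an assumption).
            out, _ = pad_packed_sequence(out, batch_first=True)
            lens = lengths.to(out.device)
            steps = torch.arange(out.size(1), device=out.device)
            mask = (steps[None, :] < lens[:, None]).unsqueeze(-1)
            feats = (out * mask).sum(dim=1) / lens[:, None]
        return self.head(feats)


def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """alpha * MSE to the teacher logits + (1 - alpha) * cross-entropy to the labels."""
    return (alpha * F.mse_loss(student_logits, teacher_logits)
            + (1 - alpha) * F.cross_entropy(student_logits, labels))


torch.manual_seed(0)
teacher_emb = torch.randn(1000, 64)               # stand-in for the teacher's embedding weights
ids = torch.tensor([[5, 6, 7, 0], [8, 9, 0, 0]])  # 0 = padding
lengths = torch.tensor([3, 2])
teacher_logits = torch.randn(2, 4)
labels = torch.tensor([1, 3])
results = {}
for name, use_last in (("A", False), ("B", True)):
    student = BiGRUStudent(teacher_emb, num_classes=4, use_last_hidden=use_last)
    logits = student(ids, lengths)
    loss = distill_loss(logits, teacher_logits, labels)
    loss.backward()
    results[name] = (logits.shape, loss.item())
```

If the task is framed as multi-label rather than single-label, `F.binary_cross_entropy_with_logits` with multi-hot float targets would take the place of `F.cross_entropy` in the loss.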

Performance Comparison

References

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBERTa: A Robustly Optimized BERT Pretraining Approach
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
TinyBERT: Distilling BERT for Natural Language Understanding
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Transformer to CNN: Label-scarce distillation for efficient text classification
Distilling task-specific knowledge from bert into simple neural networks
Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data
