
Prediction accuracy issue #23

Open
ZBayes opened this issue Jul 9, 2019 · 0 comments

ZBayes commented Jul 9, 2019

I reproduced the code in Python:
https://gitee.com/chashaozgr/noteLibrary/tree/master/nlp_trial/ner/src/bilstm_crf

Data: People's Daily corpus
Environment: Python 3.6, tensorflow==1.12

The accuracy I measured matches the README, but the confusion matrix tells a different story. Because prediction is done on padded sequences, class-0 labels (i.e. the padded positions) far outnumber every other class, so the label distribution is heavily imbalanced and the accuracy figure cannot be trusted: most of the 85%+ accuracy comes from class-0 positions being predicted as class 0. If the padding length is shortened, precision drops sharply.
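To make the effect concrete, here is a minimal pure-Python sketch (not the code from the linked repository) that scores the same predictions twice: once over every padded position and once over only the real tokens, using each sentence's true length. It assumes labels are integer ids, padded positions carry label 0 as described above, and true lengths are known; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

# Assumption: labels are integer ids, padded positions carry label 0,
# and each sentence's true (unpadded) length is known.


def token_accuracy(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]],
                   lengths: Sequence[int], mask_padding: bool) -> float:
    """Token-level accuracy; if mask_padding, score only the first lengths[i] tokens."""
    correct = total = 0
    for g_seq, p_seq, n in zip(gold, pred, lengths):
        limit = n if mask_padding else len(g_seq)
        for g, p in zip(g_seq[:limit], p_seq[:limit]):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0


def per_label_precision(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]],
                        lengths: Sequence[int]) -> Dict[int, float]:
    """Precision for each predicted label, computed on real (unpadded) tokens only."""
    predicted: Counter = Counter()
    hits: Counter = Counter()
    for g_seq, p_seq, n in zip(gold, pred, lengths):
        for g, p in zip(g_seq[:n], p_seq[:n]):
            predicted[p] += 1
            hits[p] += int(g == p)
    return {label: hits[label] / count for label, count in predicted.items()}


if __name__ == "__main__":
    # One 3-token sentence padded to length 8; the second tag is wrong.
    gold = [[1, 2, 3, 0, 0, 0, 0, 0]]
    pred = [[1, 3, 3, 0, 0, 0, 0, 0]]
    print(token_accuracy(gold, pred, [3], mask_padding=False))  # 0.875
    print(token_accuracy(gold, pred, [3], mask_padding=True))   # 0.6666666666666666
    print(per_label_precision(gold, pred, [3]))                 # {1: 1.0, 3: 0.5}
```

With a single wrong tag, accuracy is 0.875 when the five padded positions are counted but only about 0.667 on the three real tokens. In TensorFlow 1.x a similar mask could be built with `tf.sequence_mask(lengths)`, cast to float, and passed as the `weights` argument of `tf.metrics.accuracy`; this is a suggestion, not what the repository currently does.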

Does anyone have countermeasures or other ideas for this?
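One possible countermeasure, offered as a hedged sketch rather than a fix from the repository: evaluate at the entity level instead of the token level, so neither padding nor the dominant non-entity tag can inflate the score. The sketch assumes predictions have already been cut to each sentence's true length and mapped from ids to BIO tag strings such as `B-PER`/`I-PER`/`O` (the tag names are illustrative). An entity counts as correct only if its type and both boundaries match exactly.

```python
from typing import List, Set, Tuple

# (sentence index, start, end exclusive, entity type)
Span = Tuple[int, int, int, str]


def extract_spans(tags: List[str], sent_idx: int) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence with padding already removed."""
    spans: Set[Span] = set()
    start, etype = -1, ""
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I-X continuing the current X entity just extends it
        if etype:  # the current entity ends here
            spans.add((sent_idx, start, i, etype))
        # B-X starts a new entity; so does a stray I-X (lenient handling)
        start, etype = (i, tag[2:]) if tag != "O" else (-1, "")
    return spans


def entity_prf(gold_seqs: List[List[str]],
               pred_seqs: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1 over a corpus."""
    gold: Set[Span] = set()
    pred: Set[Span] = set()
    for idx, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold |= extract_spans(g, idx)
        pred |= extract_spans(p, idx)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold_tags = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred_tags = [["B-PER", "I-PER", "O", "O"]]
    print(entity_prf(gold_tags, pred_tags))
```

Here the model finds the person but misses the location, giving precision 1.0 and recall 0.5, and the many `O` tokens contribute nothing to either number. Libraries such as seqeval provide similar entity-level metrics if you prefer not to maintain this yourself.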
