Fix the bug that made random forest accuracy too low in Chapter 15 #17

Open
wants to merge 1 commit into
base: master
from


@WhizZest WhizZest commented Sep 23, 2024

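Bug fixes:

  1. Feature sampling now draws features without replacement;
  2. The code in this series often confuses "loss" and "gain". The CART decision tree uses a loss (Gini impurity) to choose the best feature and split point, while RandomForest is written in terms of gain. Initializing min_gain to 0 in the RandomForest class leaves the CART tree almost unable to train, so it is changed to min_gain=float("inf");
  3. The utils.py file was missing, so it is copied over from Chapter 11;
  4. cart.py is kept consistent with the decision tree changes from the previous two submissions (the Chapter 7 decision tree and the Chapter 11 GBDT);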
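The first two fixes can be illustrated with a minimal sketch. It is not the code from this PR: `sample_feature_indices`, `gini`, `best_split`, and the `init_loss` parameter are hypothetical names, and the strict "lower loss wins" comparison is only one plausible reading of why a starting value of 0 blocks training. Under that assumption, no Gini loss can ever be below 0, so no split is accepted, whereas starting from `float("inf")` lets the first valid split win.

```python
import random


def sample_feature_indices(n_features, max_features, rng):
    """Pick max_features distinct column indices (sampling without replacement)."""
    return sorted(rng.sample(range(n_features), max_features))


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y, feature_indices, init_loss=float("inf")):
    """Return (loss, feature, threshold) for the lowest weighted Gini split.

    Hypothetical helper: a candidate is accepted only if its loss is strictly
    below the running best, which starts at init_loss.
    """
    best = (init_loss, None, None)
    for j in feature_indices:
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            loss = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if loss < best[0]:
                best = (loss, j, t)
    return best
```

Calling `best_split(X, y, features, init_loss=0.0)` returns no feature at all, even on perfectly separable data, which matches the symptom described in item 2.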

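Optimizations:

  1. Training now runs in parallel across multiple processes. Without this, training is too slow to test conveniently, and parallel training is a natural fit for random forests anyway;
  2. When comparing accuracy against sklearn.ensemble.RandomForestClassifier, the sklearn parameters did not match the values used by our custom class, which made the comparison less meaningful, so two parameters are added to the sklearn call;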
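The parallel training idea can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's implementation: `fit_one_tree` is a hypothetical stand-in that only draws a bootstrap sample and returns its majority label (a real version would fit a CART tree), and `fit_forest`, `n_jobs`, and `base_seed` are names chosen here. Each tree gets its own seed, so the serial and parallel paths produce the same forest.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def fit_one_tree(task):
    """Stand-in for fitting one tree: bootstrap the rows, return the majority label."""
    X, y, seed = task
    rng = random.Random(seed)
    # Bootstrap sample: draw len(y) row indices WITH replacement.
    rows = [rng.randrange(len(y)) for _ in range(len(y))]
    labels = [y[i] for i in rows]
    # sorted() makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)


def fit_forest(X, y, n_estimators, n_jobs=1, base_seed=0):
    """Fit n_estimators independent trees, optionally across n_jobs processes."""
    tasks = [(X, y, base_seed + i) for i in range(n_estimators)]
    if n_jobs == 1:
        return [fit_one_tree(task) for task in tasks]
    # Trees do not depend on each other, so they can be trained in any order.
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(fit_one_tree, tasks))
```

On platforms that start worker processes with spawn (Windows, and macOS by default), the call with `n_jobs > 1` should sit under an `if __name__ == "__main__":` guard.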

After these changes, the accuracy is essentially on par with sklearn.
