This repo collects machine learning code: evaluation tools, classification algorithms, and clustering algorithms.
TOOLS:
----->roc : roc.py
Compute the points of the ROC curve.
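To illustrate what "the points of the ROC curve" means, here is a minimal sketch that sweeps a threshold over classifier scores and collects (false positive rate, true positive rate) pairs. The names `roc_points` and `auc` are assumptions made for this example and are not taken from roc.py; the sketch also assumes Python 3 division.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points, one per distinct score threshold.

    labels: list of 0/1 ground-truth labels (1 = positive).
    scores: list of classifier scores; higher means "more likely positive".
    Hypothetical helper for illustration, not the interface of roc.py.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The curve always starts at (0, 0) and ends at (1, 1); a classifier that ranks every positive above every negative gives an area of 1.0.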
----->model_evaluate.py
Report precision, recall, and accuracy.
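As a rough sketch of how these three metrics are computed for a binary classifier, the example below counts true/false positives and negatives and derives the metrics from them. The function name `evaluate` and the `positive` parameter are assumptions made for this example and may not match model_evaluate.py.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary predictions.

    Hypothetical helper for illustration, not the interface of
    model_evaluate.py. Metrics with an empty denominator are reported as 0.0.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy
```

Precision answers "how many predicted positives were right", recall answers "how many actual positives were found", and accuracy is the share of all predictions that were correct.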
Classification:
----->DecisionTree : decision_tree.py
For details on how to build a decision tree, see:
http://en.wikipedia.org/wiki/Decision_tree
If you can read Chinese, this article may also be helpful:
http://zengkui.blog.163.com/blog/static/2123000822012103113739819/
----->LogisticRegression
More information (in Chinese):
http://zengkui.blog.163.com/blog/static/21230008220127111040747/
----->LinearRegression
More detailed information (in Chinese):
http://zengkui.blog.163.com/blog/static/21230008220127411250679/
----->K-NN classification algorithm.
Data is represented in a vector space model (VSM) whose features are
the words selected by maximum information gain. The distance between
samples is measured by cosine similarity.
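The cosine measure and the nearest-neighbor vote can be sketched as follows. This is a minimal illustration, assuming each sample is already a sparse VSM vector (a dict mapping word to weight) and that feature selection by information gain has been done beforehand. The names `cosine` and `knn_classify` are hypothetical and not taken from this repo.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict: word -> weight)."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar samples.

    training: list of (vector, label) pairs.
    """
    ranked = sorted(training,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, made up for this example.
training = [
    ({"ball": 2, "team": 1}, "sports"),
    ({"goal": 1, "team": 2}, "sports"),
    ({"stock": 2, "market": 1}, "finance"),
    ({"market": 2, "bank": 1}, "finance"),
]
label = knn_classify({"team": 1, "goal": 1}, training, k=3)
```

Because cosine compares direction rather than length, a long article and a short article with the same word proportions count as close neighbors.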
----->Naive Bayes
Naive Bayes classification algorithm.
Bayes' formula: P(C|w) = P(C,w) / P(w) = P(w|C) * P(C) / P(w)
C : the category of the article.
w : the words in the article.
Assume that the occurrences of the words in an article are independent
of each other. The article can then be classified with the following formula:
P(C_i|w_1,w_2...w_n) = P(w_1,w_2...w_n|C_i) * P(C_i) / P(w_1,w_2...w_n)
= P(w_1|C_i) * P(w_2|C_i)...P(w_n|C_i) * P(C_i) / (P(w_1) * P(w_2)...P(w_n))
The denominator is the same for every category, so the predicted category
is the C_i that maximizes the numerator.
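The formula above can be sketched in code as follows. Two details here are assumptions added for this example and are not stated in this README: probabilities are summed as logarithms to avoid underflow, and P(w|C) is estimated with add-one (Laplace) smoothing so that an unseen word does not zero out the product. The names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Estimate the model from docs, a list of (words, category) pairs.

    Returns (priors, word_counts, vocab):
      priors[c]         -> P(C = c), the fraction of documents in category c
      word_counts[c][w] -> number of times word w occurs in category c
      vocab             -> set of all words seen in training
    """
    cat_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, cat in docs:
        cat_counts[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    total_docs = sum(cat_counts.values())
    priors = {c: n / total_docs for c, n in cat_counts.items()}
    return priors, word_counts, vocab


def classify(words, priors, word_counts, vocab):
    """Return the category maximizing log P(C) + sum of log P(w|C).

    The denominator P(w_1)...P(w_n) is dropped because it does not
    depend on the category.
    """
    best_cat, best_score = None, float("-inf")
    for cat, prior in priors.items():
        total_words = sum(word_counts[cat].values())
        score = math.log(prior)
        for w in words:
            # Add-one smoothing (an assumption of this sketch).
            p = (word_counts[cat][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Toy data, made up for this example.
docs = [
    (["goal", "team", "match"], "sports"),
    (["team", "win"], "sports"),
    (["stock", "market"], "finance"),
    (["bank", "market", "rate"], "finance"),
]
model = train(docs)
```

With this toy model, `classify(["team", "goal"], *model)` picks the category whose training articles use those words most often.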
----->AdaBoost
AdaBoost classification algorithm.
For more details, see the following links:
http://en.wikipedia.org/wiki/AdaBoost
http://zengkui.blog.163.com/blog/static/21230008220121110111925175/
Clustering:
----->K-means
More detailed information (in Chinese):
http://zengkui.blog.163.com/blog/static/2123000822012101784440471/