Document classification

Classify documents in Python using an SVM classifier with TF-IDF features.

  • Two Python libraries (pandas and liblinear) are required. On Windows, you can download liblinear from http://www.lfd.uci.edu/~gohlke/pythonlibs/#liblinear

  • The data files are structured as follows:

    • The .data files are formatted "docIdx wordIdx count".
    • The .label files are simply a list of label IDs.
    • The .map files map from label IDs to label names.
  • This demo reaches an accuracy of about 81.3991% (6109/7505).
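
The "docIdx wordIdx count" layout described above can be read into per-document word counts and then reweighted with TF-IDF. The following is a minimal sketch, not the repository's actual code: the helper names `parse_data` and `tfidf` are hypothetical, the sample lines are made up, it assumes one whitespace-separated record per line, and it uses one common TF-IDF variant (term frequency times `log(N / df)`). It sticks to the standard library so it runs without pandas or liblinear.

```python
import math
from collections import defaultdict


def parse_data(lines):
    """Parse 'docIdx wordIdx count' records into {doc_id: {word_id: count}}."""
    docs = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        doc_id, word_id, count = map(int, parts)
        docs[doc_id][word_id] = count
    return dict(docs)


def tfidf(docs):
    """Weight each count as (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = defaultdict(int)
    for words in docs.values():
        for word_id in words:
            df[word_id] += 1
    weighted = {}
    for doc_id, words in docs.items():
        total = sum(words.values())
        weighted[doc_id] = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in words.items()
        }
    return weighted


# Hypothetical sample in the .data layout: docIdx wordIdx count
sample = ["1 1 4", "1 2 2", "2 1 1", "2 3 3"]
docs = parse_data(sample)
weights = tfidf(docs)
```

Each document ends up as a sparse `{word_id: weight}` dictionary. A word that occurs in every document (word 1 in the sample) gets weight zero under this variant, which is why some implementations use a smoothed IDF instead.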