rmax-archive/andaag-scikit_helpers

scikit_helpers

My personal tools for working with scikit-learn and machine learning. For now, the repository contains only my feature-building tools, which combine scikit-learn pipelines with pandas DataFrames.
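As a rough illustration of the idea, here is a stateless transformer that picks columns from a pandas DataFrame so it can sit at the start of a scikit-learn pipeline. The class PickColumns and its parameters (columns, as_sparse, single_column) are hypothetical. This is not the repository's PickFeature, only a sketch that assumes such a step selects columns and optionally returns either a sparse matrix or a single text column.

```python
import pandas as pd
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin


class PickColumns(BaseEstimator, TransformerMixin):
    """Hypothetical column picker for DataFrames (not the repository's PickFeature)."""

    def __init__(self, columns, as_sparse=False, single_column=False):
        self.columns = columns
        self.as_sparse = as_sparse
        self.single_column = single_column

    def fit(self, X, y=None):
        # Nothing to learn: picking columns is stateless.
        return self

    def transform(self, X):
        if self.single_column:
            # A 1-D Series of strings, which text vectorizers such as CountVectorizer accept.
            return X[self.columns[0]]
        values = X[self.columns].to_numpy(dtype=float)
        # Optionally return CSR so the block can be stacked with sparse text features.
        return sp.csr_matrix(values) if self.as_sparse else values


df = pd.DataFrame({"ColumnA": [1, 2], "ColumnB": [3, 4], "Text": ["a b", "b c"]})
numeric = PickColumns(["ColumnA", "ColumnB"], as_sparse=True).fit_transform(df)
text = PickColumns(["Text"], single_column=True).fit_transform(df)
```

Because the step inherits from BaseEstimator, its constructor arguments are visible to get_params(), so a grid search could tune them like any other pipeline parameter.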

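```python
import numpy as np
import sklearn.feature_extraction.text
import sklearn.feature_selection
import sklearn.pipeline
import sklearn.svm
# PickFeature and FeatureNamedPipeline are provided by this repository (scikit_helpers).
pipe = sklearn.pipeline.Pipeline([
    ("features", sklearn.pipeline.FeatureUnion([
        # Plain columns taken straight from the DataFrame, returned as a sparse matrix.
        ("single_features", PickFeature(select_columns=["ColumnA", "ColumnB"], return_sparse=True)),
        # Text column ("Annonseoverskrift" is Norwegian for "ad headline") -> word counts -> k best words.
        ("text_pipeline", FeatureNamedPipeline([
            ("pick", PickFeature(select_columns=["Annonseoverskrift"], return_single_row=True)),
            ("vect", sklearn.feature_extraction.text.CountVectorizer()),
            ("kbest", sklearn.feature_selection.SelectKBest(sklearn.feature_selection.chi2)),
            # feature_function: vectorizer vocabulary, masked down to the words SelectKBest kept.
        ], feature_function='lambda x: np.asarray(x.named_steps["vect"].get_feature_names())[x.named_steps["kbest"].get_support()]')),
    ])),
    ("svc", sklearn.svm.LinearSVC()),
])
pipe.fit(X, y)  # X is a pandas DataFrame; y holds the class labels.
pipe.predict(X)
```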

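Because every step lives in a single pipeline, you can run a grid search that tunes parameters anywhere in it, from the LinearSVC at the end to SelectKBest and the CountVectorizer. The pipeline also supports pipe.named_steps["features"].get_feature_names(), which lists the names of the merged features.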
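Grid-search parameters of nested steps are addressed by joining the step names with double underscores. The sketch below rebuilds the structure of the example with FunctionTransformer stand-ins in place of PickFeature, and a plain Pipeline in place of FeatureNamedPipeline, so that it runs without this repository. It then tunes SelectKBest's k, the vectorizer's ngram_range and LinearSVC's C in one GridSearchCV. The helper functions pick_numeric and pick_text and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


# Hypothetical stand-ins for PickFeature: plain functions that select DataFrame columns.
def pick_numeric(df):
    return df[["ColumnA", "ColumnB"]].to_numpy(dtype=float)


def pick_text(df):
    return df["Text"]


pipe = Pipeline([
    ("features", FeatureUnion([
        ("single_features", FunctionTransformer(pick_numeric)),
        ("text_pipeline", Pipeline([
            ("pick", FunctionTransformer(pick_text)),
            ("vect", CountVectorizer()),
            ("kbest", SelectKBest(chi2)),
        ])),
    ])),
    ("svc", LinearSVC()),
])

# Each key walks down the step names: union -> branch -> step -> parameter.
param_grid = {
    "features__text_pipeline__kbest__k": [1, 2],
    "features__text_pipeline__vect__ngram_range": [(1, 1), (1, 2)],
    "svc__C": [0.1, 1.0],
}

# Toy data, made up for illustration.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5, 6, 7, 8],
    "ColumnB": [0, 1, 0, 1, 0, 1, 0, 1],
    "Text": ["cheap car", "car wanted", "cheap flat", "flat wanted",
             "cheap boat", "boat wanted", "cheap bike", "bike wanted"],
})
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(df, y)
print(search.best_params_)
```

GridSearchCV clones the whole pipeline for every parameter combination, so the vectorizer vocabulary and the selected words are relearned on each training fold rather than leaking from the full data set.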

In this example, return_sparse=True makes PickFeature return a sparse matrix, so the output of the whole FeatureUnion is sparse. You can leave out the return_sparse parameter, but then you have to add a ToDense() step after SelectKBest so that the FeatureUnion can merge the two outputs.
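The repository's ToDense() is not shown in this README. As a hypothetical sketch, assuming it simply turns a sparse matrix into a dense array, such a step could look like the ToDenseSketch class below; once the text features are dense, they can be stacked with other dense feature blocks.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin


class ToDenseSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the repository's ToDense step."""

    def fit(self, X, y=None):
        # Stateless: nothing to learn.
        return self

    def transform(self, X):
        # Convert sparse matrices to dense ndarrays; pass dense input through unchanged.
        return X.toarray() if sp.issparse(X) else np.asarray(X)


sparse_counts = sp.csr_matrix([[0, 2], [1, 0]])
dense = ToDenseSketch().fit_transform(sparse_counts)
# Dense blocks can now be stacked column-wise with plain numpy.
merged = np.hstack([np.array([[1.0], [2.0]]), dense])
print(merged)
```

Densifying costs memory proportional to rows times vocabulary size, which is why keeping the whole union sparse, as the example does, is usually the cheaper choice for text features.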
