SkLearn2PMML

Python package for converting Scikit-Learn pipelines to PMML.

Features

This package is a thin Python wrapper around the JPMML-SkLearn library.

News and Updates

The current version is 0.111.1 (28 October, 2024):

pip install sklearn2pmml==0.111.1

See the NEWS.md file.

Prerequisites

Java 1.8 or newer. The Java executable must be available on the system path.
Python 2.7, 3.4 or newer.
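
One way to check the Java prerequisite from Python before attempting a conversion is sketched below. It is a minimal illustration using only the standard library (Python 3.7 or newer is assumed for `capture_output`). The helper names `find_java` and `java_version_banner` are hypothetical and not part of SkLearn2PMML.

```python
import shutil
import subprocess


def find_java():
    """Return the path of the `java` executable on the system path, or None."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Return the first line printed by `java -version`.

    Most JVMs write the version banner to stderr, so stderr is checked first.
    """
    result = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    banner = (result.stderr or result.stdout).strip()
    return banner.splitlines()[0] if banner else ""


java_path = find_java()
if java_path is None:
    print("No java executable found on the system path")
else:
    print(java_path, "->", java_version_banner(java_path))
```

If the script reports that no executable was found, the system path likely needs adjusting before the converter can launch Java.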

Installation

Installing a release version from PyPI:

pip install sklearn2pmml

Alternatively, installing the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git
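
After installing, the version that pip actually placed in the environment can be confirmed from Python. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed
print(installed_version("sklearn2pmml"))
```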

Usage

A typical workflow can be summarized as follows:

1. Create a PMMLPipeline object, and populate it with pipeline steps as usual. Class sklearn2pmml.pipeline.PMMLPipeline extends class sklearn.pipeline.Pipeline with the following functionality:

   If the PMMLPipeline.fit(X, y) method is invoked with a pandas.DataFrame or pandas.Series object as the X argument, then its column names are used as feature names. Otherwise, feature names default to "x1", "x2", ..., "x{number_of_features}".
   If the PMMLPipeline.fit(X, y) method is invoked with a pandas.Series object as the y argument, then its name is used as the target name (for supervised models). Otherwise, the target name defaults to "y".

2. Fit and validate the pipeline as usual.
3. Optionally, compute and embed verification data into the PMMLPipeline object by invoking the PMMLPipeline.verify(X) method with a small but representative subset of training data.
4. Convert the PMMLPipeline object to a PMML file in the local filesystem by invoking the utility method sklearn2pmml.sklearn2pmml(pipeline, pmml_destination_path).
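
The naming defaults described above can be restated as a small standalone function. This is only an illustration of the documented rule, not the library's own code: `resolve_field_names` and its parameters are hypothetical, and it does not import or call SkLearn2PMML.

```python
def resolve_field_names(number_of_features, column_names=None, target_name=None):
    """Mirror the documented naming rule (hypothetical helper).

    column_names: names taken from a pandas object passed as X, if any.
    target_name: name of a pandas.Series passed as y, if any.
    """
    if column_names is not None:
        feature_names = list(column_names)
    else:
        # Fall back to "x1", "x2", ..., "x{number_of_features}"
        feature_names = ["x{}".format(i + 1) for i in range(number_of_features)]
    target = target_name if target_name is not None else "y"
    return feature_names, target


# Plain arrays: both feature and target names fall back to the defaults
print(resolve_field_names(4))

# Named pandas inputs: the supplied names are kept
print(resolve_field_names(2, column_names=["Sepal.Length", "Sepal.Width"], target_name="Species"))
```

In practice this means that fitting with pandas objects yields a PMML file whose field names match the training data columns, which tends to make the file easier to read and to score against.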

Developing a simple decision tree model for the classification of iris species:

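import pandas

# Load the Iris dataset from a local CSV file
iris_df = pandas.read_csv("Iris.csv")

# Every column except "Species" is a feature; "Species" is the target
iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml.pipeline import PMMLPipeline

# A single-step pipeline that holds only the classifier
pipeline = PMMLPipeline([
	("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_X, iris_y)

from sklearn2pmml import sklearn2pmml

# Convert the fitted pipeline to a PMML file in the working directory
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)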

Developing a more elaborate logistic regression model for the same:

import pandas

iris_df = pandas.read_csv("Iris.csv")

iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn_pandas import DataFrameMapper
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

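pipeline = PMMLPipeline([
	# Declare the four measurements as continuous features and impute missing values
	("mapper", DataFrameMapper([
		(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), SimpleImputer()])
	])),
	# Reduce to three principal components, then keep the two best-scoring ones
	("pca", PCA(n_components = 3)),
	("selector", SelectKBest(k = 2)),
	("classifier", LogisticRegression(multi_class = "ovr"))
])
pipeline.fit(iris_X, iris_y)
# Embed verification data computed from a random sample of 15 training rows
pipeline.verify(iris_X.sample(n = 15))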

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "LogisticRegressionIris.pmml", with_repr = True)
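
Because PMML is an XML format, a generated file can be inspected with the Python standard library. The sketch below lists the `DataField` entries of a PMML document. To stay runnable without Java, it parses a small hand-written fragment (`SAMPLE_PMML`, an assumption for illustration). A file produced by the converter will contain much more, and its namespace version may differ, which is why the namespace is read from the root element rather than hard-coded. The helper name `list_data_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; not actual converter output
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Species" optype="categorical" dataType="string"/>
    <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>
"""


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) tuples for every DataField element."""
    root = ET.fromstring(pmml_text)
    # ElementTree spells namespaced tags as "{namespace}LocalName"
    prefix = root.tag[:root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    return [
        (field.get("name"), field.get("optype"), field.get("dataType"))
        for field in root.iter(prefix + "DataField")
    ]


for name, optype, data_type in list_data_fields(SAMPLE_PMML):
    print(name, optype, data_type)
```

To inspect a real file, read its contents first, for example with `open("LogisticRegressionIris.pmml", encoding="utf-8").read()`, and pass the text to the same function.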

Documentation

Integrations:

Training Scikit-Learn GridSearchCV StatsModels pipelines
Converting Scikit-Learn H2O.ai pipelines to PMML
Converting customized Scikit-Learn estimators to PMML
Training Scikit-Learn StatsModels pipelines
Upgrading Scikit-Learn XGBoost pipelines
Training Python-based XGBoost accelerated failure time models
Converting Scikit-Learn PyCaret 3 pipelines to PMML
Training Scikit-Learn H2O.ai pipelines
One-hot encoding categorical features in Scikit-Learn XGBoost pipelines
Training Scikit-Learn TF(-IDF) plus XGBoost pipelines
Converting Scikit-Learn TF(-IDF) pipelines to PMML
Converting Scikit-Learn Imbalanced-Learn pipelines to PMML
Converting logistic regression models to PMML
Stacking Scikit-Learn, LightGBM and XGBoost models
Converting Scikit-Learn GridSearchCV pipelines to PMML
Converting Scikit-Learn TPOT pipelines to PMML
Converting Scikit-Learn LightGBM pipelines to PMML

Extensions:

Extending Scikit-Learn with feature cross-references
Extending Scikit-Learn with UDF expression transformer
Extending Scikit-Learn with CHAID models
Extending Scikit-Learn with prediction post-processing
Extending Scikit-Learn with outlier detector transformer
Extending Scikit-Learn with date and datetime features
Extending Scikit-Learn with feature specifications
Extending Scikit-Learn with GBDT+LR ensemble models
Extending Scikit-Learn with business rules model

Miscellaneous:

Upgrading Scikit-Learn decision tree models
Measuring the memory consumption of Scikit-Learn models
Benchmarking Scikit-Learn against JPMML-Evaluator
Analyzing Scikit-Learn feature importances via PMML

Archived:

Converting Scikit-Learn to PMML

De-installation

Uninstalling:

pip uninstall sklearn2pmml

License

SkLearn2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

SkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io
