deploy: 005698a
M-Mouhcine committed Jul 8, 2023
0 parents commit 2b63b0c
Showing 75 changed files with 9,759 additions and 0 deletions.
Empty file added .nojekyll
Empty file.
4 changes: 4 additions & 0 deletions docs/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1d10e2497f8b5c2e668e5b79c4cf04e7
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added docs/.nojekyll
Empty file.
Binary file added docs/_images/k-fold-scheme.png
Binary file added docs/_images/results_quickstart_aps_mnist.png
306 changes: 306 additions & 0 deletions docs/_sources/api.rst.txt
@@ -0,0 +1,306 @@
.. _api:

💻 API
=======

The **low-level API** offers more flexibility in defining conformal prediction wrappers.
Let's say we want to fit/calibrate a neural-network interval estimator with a cross-validation plan,
or that we want to experiment with different user-defined nonconformity scores.
In such cases and others, the user can fully construct their wrappers using the
proposed **Predictor-Calibrator-Splitter** paradigm. It boils down to assembling
into :class:`deel.puncc.api.conformalization.ConformalPredictor`:

* Prediction model(s).
* An estimator of nonconformity scores for construction/calibration of prediction intervals.
* A strategy to assign data to the fit and calibration sets (in the case of inductive CP).

.. contents:: Table of Contents
   :depth: 3

API's Modules
*************

.. toctree::
   :maxdepth: 2

   conformalization
   prediction
   calibration
   splitting
   utils
   nonconformity_scores
   prediction_sets

Overview
********

ConformalPredictor
------------------

:class:`deel.puncc.api.conformalization.ConformalPredictor` is the canvas of conformal prediction procedures.
An instance is built from three components, each explained later: a **predictor**, a **calibrator** and a **splitter**:

.. code-block:: python

    # Imports
    import numpy as np
    from sklearn import linear_model
    from deel.puncc.api.conformalization import ConformalPredictor
    from deel.puncc.api.prediction import BasePredictor
    from deel.puncc.api.calibration import BaseCalibrator
    from deel.puncc.api.splitting import KFoldSplitter

    # Regression linear model
    model = linear_model.LinearRegression()

    # Definition of a predictor (this will be explained later)
    my_predictor = BasePredictor(model)  # Predictor

    # Definition of a calibrator, built from a nonconformity score function and a
    # procedure to build the prediction sets

    ## Definition of a custom nonconformity score function.
    ## Alternatively, several ready-to-use nonconformity scores are provided in
    ## the module deel.puncc.api.nonconformity_scores (more on this later)
    def my_ncf(y_pred, y_true):
        return np.abs(y_pred - y_true)

    ## Definition of a custom function to build prediction sets.
    ## Alternatively, several ready-to-use procedures are provided in
    ## the module deel.puncc.api.prediction_sets (more on this later)
    def my_psf(y_pred, nonconf_scores_quantile):
        y_lower = y_pred - nonconf_scores_quantile
        y_upper = y_pred + nonconf_scores_quantile
        return y_lower, y_upper

    ## Calibrator construction
    my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf,
                                   pred_set_func=my_psf)

    # Definition of a K-fold splitter that produces 20 folds of fit/calibration
    kfold_splitter = KFoldSplitter(K=20, random_state=42)

    # Conformal prediction canvas
    conformal_predictor = ConformalPredictor(predictor=my_predictor,
                                             calibrator=my_calibrator,
                                             splitter=kfold_splitter)

:class:`deel.puncc.api.conformalization.ConformalPredictor` implements two methods:


* A :func:`fit` method that fits the predictor model and computes nonconformity scores according to the calibrator and to the data split strategy provided by the splitter:

  .. code-block:: python

      # X_train and y_train are the full training dataset
      # The splitter passed as argument to ConformalPredictor assigns data
      # to the fit and calibration sets based on the provided splitting strategy
      conformal_predictor.fit(X_train, y_train)

* A :func:`predict` method that, for new samples, estimates the point predictions and the prediction intervals [y_pred_lower, y_pred_upper] for a chosen error (significance) level :math:`\alpha`:

  .. code-block:: python

      # Coverage target of 1-alpha = 90%
      y_pred, y_pred_lower, y_pred_upper = conformal_predictor.predict(X_new, alpha=.1)

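
To build intuition about how :math:`\alpha` enters the computation, here is a minimal NumPy sketch of the finite-sample corrected quantile commonly used in split conformal prediction. The helper ``conformal_quantile`` is hypothetical, and this is not necessarily how **puncc** computes the quantile internally.

.. code-block:: python

    import numpy as np

    def conformal_quantile(scores, alpha):
        """Return the ceil((n + 1) * (1 - alpha))-th smallest score (hypothetical helper).

        Under exchangeability, an interval built from this order statistic
        covers a new target with probability at least 1 - alpha.
        """
        scores = np.sort(np.asarray(scores, dtype=float))
        n = len(scores)
        k = int(np.ceil((n + 1) * (1 - alpha)))
        if k > n:
            # Too few calibration points for this alpha: the interval is unbounded
            return np.inf
        return scores[k - 1]

    # Nine calibration scores and a 75% coverage target
    calib_scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
    q = conformal_quantile(calib_scores, alpha=0.25)  # 8th smallest score: 0.8

With nine scores and ``alpha=0.25``, the rank is :math:`\lceil 10 \times 0.75 \rceil = 8`. If :math:`\alpha` is too small for the calibration set size, the rank exceeds :math:`n` and the returned quantile is infinite.
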
The full code snippet of the previous CVplus-like procedure with a randomly generated dataset is provided below:

.. code-block:: python

    # Imports
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn import linear_model
    from deel.puncc.api.conformalization import ConformalPredictor
    from deel.puncc.api.prediction import BasePredictor
    from deel.puncc.api.calibration import BaseCalibrator
    from deel.puncc.api.splitting import KFoldSplitter
    from deel.puncc.plotting import plot_prediction_intervals
    from deel.puncc import metrics

    # Data
    ## Generate a random regression problem
    X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                           random_state=0, shuffle=False)

    ## Split data into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=.2, random_state=0
    )

    # Regression linear model
    model = linear_model.LinearRegression()

    # Definition of a predictor (this will be explained later)
    my_predictor = BasePredictor(model)  # Predictor

    # Definition of a calibrator, built from a nonconformity score function and a
    # procedure to build the prediction sets

    ## Definition of a custom nonconformity score function.
    ## Alternatively, several ready-to-use nonconformity scores are provided in
    ## the module deel.puncc.api.nonconformity_scores (more on this later)
    def my_ncf(y_pred, y_true):
        return np.abs(y_pred - y_true)

    ## Definition of a custom function to build prediction sets.
    ## Alternatively, several ready-to-use procedures are provided in
    ## the module deel.puncc.api.prediction_sets (more on this later)
    def my_psf(y_pred, nonconf_scores_quantile):
        y_lower = y_pred - nonconf_scores_quantile
        y_upper = y_pred + nonconf_scores_quantile
        return y_lower, y_upper

    ## Calibrator construction
    my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf,
                                   pred_set_func=my_psf)  # Calibrator

    # Definition of a K-fold splitter that produces 20 folds of fit/calibration
    kfold_splitter = KFoldSplitter(K=20, random_state=42)  # Splitter

    # Conformal prediction canvas
    conformal_predictor = ConformalPredictor(predictor=my_predictor,
                                             calibrator=my_calibrator,
                                             splitter=kfold_splitter)
    conformal_predictor.fit(X_train, y_train)
    y_pred, y_pred_lower, y_pred_upper = conformal_predictor.predict(X_test, alpha=.1)

    # Compute empirical marginal coverage and average width of the prediction intervals
    coverage = metrics.regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
    width = metrics.regression_sharpness(y_pred_lower=y_pred_lower,
                                         y_pred_upper=y_pred_upper)
    print(f"Marginal coverage: {np.round(coverage, 2)}")
    print(f"Average width: {np.round(width, 2)}")

    # Figure of the prediction bands
    plot_prediction_intervals(
        X=X_test[:, 0],
        y_true=y_test,
        y_pred=y_pred,
        y_pred_lower=y_pred_lower,
        y_pred_upper=y_pred_upper,
        sort_X=True,
        size=(10, 6),
        loc="upper left")

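
The following plain NumPy sketch shows what the two printed metrics measure. The function names are hypothetical, and we assume, without reproducing **puncc**'s source, that ``regression_mean_coverage`` and ``regression_sharpness`` compute these same quantities.

.. code-block:: python

    import numpy as np

    def mean_coverage(y_true, y_lower, y_upper):
        """Fraction of targets lying inside [y_lower, y_upper]."""
        y_true, y_lower, y_upper = map(np.asarray, (y_true, y_lower, y_upper))
        return np.mean((y_true >= y_lower) & (y_true <= y_upper))

    def mean_width(y_lower, y_upper):
        """Average length of the prediction intervals."""
        return np.mean(np.asarray(y_upper) - np.asarray(y_lower))

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_lower = np.array([0.5, 2.5, 2.0, 3.0])
    y_upper = np.array([1.5, 3.5, 4.0, 5.0])
    print(mean_coverage(y_true, y_lower, y_upper))  # 0.75 (the second target is missed)
    print(mean_width(y_lower, y_upper))  # 1.5

The goal is coverage close to :math:`1-\alpha` combined with a small average width, because very wide intervals reach the target coverage trivially.
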
Predictor
---------

The :class:`deel.puncc.api.prediction.BasePredictor` and :class:`deel.puncc.api.prediction.DualPredictor` classes are wrappers of ML/DL models
that aim to expose a standardized interface and guarantee compliance with the **puncc** framework.
The predictors have to implement:

* a :func:`fit` method used to train the model. It takes as arguments two iterables X, Y (collections of data such as ndarrays and tensors) and any additional configuration of the underlying model (e.g., random seed).
* a :func:`predict` method used to predict targets for a given iterable X. It takes as arguments an iterable X and any additional configuration of the underlying model (e.g., batch size).
* a :func:`copy` method that returns a copy of the predictor (useful, for example, in cross-validation). It has to deep-copy the underlying model.
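
The framework-agnostic sketch below illustrates this interface. ``MinimalPredictor`` and ``LeastSquaresModel`` are hypothetical names. This is not **puncc**'s :class:`deel.puncc.api.prediction.BasePredictor`; it only demonstrates the three expected methods, including a :func:`copy` that deep-copies the wrapped model.

.. code-block:: python

    from copy import deepcopy

    import numpy as np

    class LeastSquaresModel:
        """Hypothetical toy model fitting y = a * x + b with np.polyfit."""

        def __init__(self):
            self.coef_ = None

        def fit(self, X, y):
            self.coef_ = np.polyfit(np.ravel(X), y, deg=1)
            return self

        def predict(self, X):
            return np.polyval(self.coef_, np.ravel(X))

    class MinimalPredictor:
        """Hypothetical wrapper exposing fit/predict/copy."""

        def __init__(self, model, is_trained=False):
            self.model = model
            self.is_trained = is_trained

        def fit(self, X, y, **kwargs):
            self.model.fit(X, y, **kwargs)
            self.is_trained = True

        def predict(self, X, **kwargs):
            return self.model.predict(X, **kwargs)

        def copy(self):
            # Deep copy: each cross-validation fold gets an independent model
            return deepcopy(self)

    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = 2 * X + 1
    predictor = MinimalPredictor(LeastSquaresModel())
    clone = predictor.copy()  # untrained copy, independent of the original
    predictor.fit(X, y)

Because :func:`copy` is a deep copy, fitting ``predictor`` leaves ``clone`` untrained. A K-fold procedure relies on this when it trains one model per fold.
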

The constructor of :class:`deel.puncc.api.prediction.BasePredictor` takes the model to be wrapped, a flag indicating whether the model is already trained,
and compilation keyword arguments if the underlying model needs to be compiled (as in Keras).

The constructor of :class:`deel.puncc.api.prediction.DualPredictor` is conceptually similar but takes as arguments
a list of two models, a list of two trained flags and a list of two compilation kwargs.
Such a predictor is useful when the calibration relies on several models (such as the upper and lower quantile models in CQR).
Note that the output ``y_pred`` of the :func:`predict` method is a collection of pairs,
where the first (resp. second) component is associated with the output of the first (resp. second) model.
In particular, :class:`deel.puncc.api.prediction.MeanVarPredictor` is a subclass of :class:`deel.puncc.api.prediction.DualPredictor` that
trains the first model on the data and the second one to predict the absolute error of the former model.
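
The mean/absolute-error pairing can be sketched with two NumPy linear fits standing in for the two models. This is a hypothetical illustration, not **puncc**'s :class:`deel.puncc.api.prediction.MeanVarPredictor`. For simplicity, both fits use the same toy data.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.linspace(0, 1, 200)
    # Heteroscedastic toy data: the noise level grows with X
    y = 3 * X + rng.normal(scale=0.1 + X, size=X.shape)

    # First model: point prediction (linear fit of y on X)
    mu = np.polyval(np.polyfit(X, y, deg=1), X)

    # Second model: predicts the absolute error of the first one
    abs_err_coef = np.polyfit(X, np.abs(y - mu), deg=1)
    sigma = np.polyval(abs_err_coef, X)

    # Pairs of outputs: column 0 from the first model, column 1 from the second
    y_pred = np.column_stack([mu, sigma])

Since the noise grows with ``X``, the second fit has a positive slope. Intervals scaled by ``sigma`` would therefore widen where the data is noisier.
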

These three predictor classes cover many use cases in conformal prediction.
But if you have a special need, you can subclass :class:`deel.puncc.api.prediction.BasePredictor` or :class:`deel.puncc.api.prediction.DualPredictor` or
even create a predictor from scratch.

Here is an example of a situation where you need to define your own predictor:
you have a classification problem and you build a :class:`RandomForestClassifier`
from scikit-learn. The :ref:`RAPS <theory raps>` procedure to conformalize the classifier requires
a :func:`predict` method that outputs the estimated probability of each class. This is not the case here,
as :func:`RandomForestClassifier.predict` returns only the most likely class. Therefore,
we need to create a predictor in which we redefine the :func:`predict` call:

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier
    from deel.puncc.api.prediction import BasePredictor

    # Create rf classifier
    rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

    # Create a wrapper of the random forest model to redefine its predict method
    # into class probability predictions. Make sure to subclass BasePredictor.
    # Note that we needed to build a new wrapper (over BasePredictor) only because
    # the predict(.) method of RandomForestClassifier does not return probabilities.
    # Otherwise, it is enough to use BasePredictor (e.g., neural network with softmax).
    class RFPredictor(BasePredictor):
        def predict(self, X, **kwargs):
            return self.model.predict_proba(X, **kwargs)

    # Wrap model in the newly created RFPredictor
    rf_predictor = RFPredictor(rf_model)

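
To see why class probabilities are required, here is a simplified, non-randomized sketch of an APS-style nonconformity score: the total probability mass of all classes ranked at least as likely as the true class. RAPS additionally uses randomization and a regularization term, both omitted here. ``aps_scores`` is a hypothetical helper, not part of **puncc**.

.. code-block:: python

    import numpy as np

    def aps_scores(proba, y_true):
        """Cumulative probability up to and including the true class.

        proba: array of shape (n_samples, n_classes) whose rows sum to 1.
        y_true: integer labels of shape (n_samples,).
        """
        proba = np.asarray(proba, dtype=float)
        # Classes sorted by decreasing probability, per sample
        order = np.argsort(-proba, axis=1)
        cumsum = np.cumsum(np.take_along_axis(proba, order, axis=1), axis=1)
        # Position of the true label in that sorted order
        rank = np.argmax(order == np.asarray(y_true)[:, None], axis=1)
        return cumsum[np.arange(len(proba)), rank]

    proba = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
    scores = aps_scores(proba, [0, 1])  # approximately [0.7, 0.9]

A single predicted label carries no ranking of the classes. This is why :func:`predict_proba` is needed here.
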
Calibrator
----------

The calibrator provides a structure to estimate the nonconformity scores
on the calibration set and to compute the prediction sets. When constructing a :class:`deel.puncc.api.calibration.BaseCalibrator`,
one chooses which nonconformity score and prediction set functions to use.
Then, the calibrator instance computes **nonconformity scores** (e.g., mean absolute deviation) by calling
:func:`deel.puncc.api.calibration.BaseCalibrator.fit` on the calibration dataset. Based on the estimated quantiles of nonconformity scores,
the method :func:`deel.puncc.api.calibration.BaseCalibrator.calibrate` is used to **construct** and/or **calibrate** prediction sets.
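
The fit/calibrate workflow can be sketched as a small class parameterized by the two user functions. ``ToyCalibrator`` and its method signatures are hypothetical and simplified (a single calibration set, no weighting). The two functions follow the argument order of the custom ``my_ncf`` and ``my_psf`` functions used on this page.

.. code-block:: python

    import numpy as np

    class ToyCalibrator:
        """Hypothetical calibrator built from a score function and a set function."""

        def __init__(self, nonconf_score_func, pred_set_func):
            self.nonconf_score_func = nonconf_score_func
            self.pred_set_func = pred_set_func
            self.scores = None

        def fit(self, y_true, y_pred):
            # Nonconformity scores on the calibration set
            self.scores = np.asarray(self.nonconf_score_func(y_pred, y_true))

        def calibrate(self, y_pred, alpha):
            # Finite-sample corrected quantile of the calibration scores
            n = len(self.scores)
            k = int(np.ceil((n + 1) * (1 - alpha)))
            q = np.inf if k > n else np.sort(self.scores)[k - 1]
            return self.pred_set_func(y_pred, q)

    def abs_error(y_pred, y_true):
        return np.abs(y_pred - y_true)

    def symmetric_interval(y_pred, q):
        return y_pred - q, y_pred + q

    calibrator = ToyCalibrator(abs_error, symmetric_interval)
    calibrator.fit(y_true=np.zeros(9), y_pred=np.linspace(0.1, 0.9, 9))
    lower, upper = calibrator.calibrate(np.array([10.0]), alpha=0.25)

Here the eighth smallest of the nine scores (about 0.8) is used. This gives an interval of roughly [9.2, 10.8] around the prediction 10.
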

For example, the :class:`deel.puncc.api.calibration.BaseCalibrator` in the split conformal prediction procedure
uses the mean absolute deviation as nonconformity score, and prediction sets
are built as constant intervals. These two functions are already provided in
:func:`deel.puncc.api.nonconformity_scores.mad` and :func:`deel.puncc.api.prediction_sets.constant_interval`, respectively:

.. code-block:: python

    from deel.puncc.api.calibration import BaseCalibrator
    from deel.puncc.api import nonconformity_scores
    from deel.puncc.api import prediction_sets

    ## Calibrator construction
    my_calibrator = BaseCalibrator(nonconf_score_func=nonconformity_scores.mad,
                                   pred_set_func=prediction_sets.constant_interval)

Alternatively, one can define custom functions and pass them as arguments to the calibrator:

.. code-block:: python

    import numpy as np
    from deel.puncc.api.calibration import BaseCalibrator

    ## Definition of a custom nonconformity score function.
    ## Alternatively, several ready-to-use nonconformity scores are provided in
    ## the module deel.puncc.api.nonconformity_scores
    def my_ncf(y_pred, y_true):
        return np.abs(y_pred - y_true)

    ## Definition of a custom function to build prediction sets.
    ## Alternatively, several ready-to-use procedures are provided in
    ## the module deel.puncc.api.prediction_sets
    def my_psf(y_pred, nonconf_scores_quantile):
        y_lower = y_pred - nonconf_scores_quantile
        y_upper = y_pred + nonconf_scores_quantile
        return y_lower, y_upper

    ## Calibrator construction
    my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf,
                                   pred_set_func=my_psf)

Splitter
--------

In conformal prediction, the assignment of data to the fit and calibration sets is motivated by two criteria:
data availability and computational resources. If quality data is abundant,
we can split the training samples into disjoint subsets :math:`D_{fit}` and :math:`D_{calib}`.
When data is scarce, a cross-validation strategy is preferred, but it is more
resource-consuming, as several models are trained and nonconformity scores
are computed on different disjoint folds.

The two plans are implemented in :mod:`deel.puncc.api.splitting` module,
and are agnostic to the data structure (which can be ndarrays, tensors and dataframes):

- :class:`deel.puncc.api.splitting.RandomSplitter`: random assignment of samples in :math:`D_{fit}` and :math:`D_{calib}`
- :class:`deel.puncc.api.splitting.KFoldSplitter`: random assignment of samples into K disjoint folds. Note that if K equals the size of the training set, the split corresponds to the leave-one-out strategy

Additionally, if the user has already implemented a split plan, the resulting data assignment
is wrapped in :class:`deel.puncc.api.splitting.IdSplitter` to produce iterables.

These methods produce **iterables** that are used by the :class:`ConformalPredictor` instance.
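
The three assignment strategies can be sketched with index arrays. These functions are hypothetical. They return lists of ``(fit_idx, calib_idx)`` pairs instead of splitting the data objects themselves, so they illustrate only the assignment logic, not **puncc**'s splitter classes.

.. code-block:: python

    import numpy as np

    def random_split(n_samples, ratio=0.8, random_state=None):
        """A single random assignment of indices to D_fit and D_calib."""
        rng = np.random.default_rng(random_state)
        perm = rng.permutation(n_samples)
        n_fit = int(ratio * n_samples)
        return [(perm[:n_fit], perm[n_fit:])]

    def kfold_split(n_samples, K, random_state=None):
        """K pairs: each fold serves once as calibration set (requires K >= 2)."""
        rng = np.random.default_rng(random_state)
        folds = np.array_split(rng.permutation(n_samples), K)
        return [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
                for k in range(K)]

    def id_split(fit_idx, calib_idx):
        """Wrap a user-provided assignment in the same iterable format."""
        return [(np.asarray(fit_idx), np.asarray(calib_idx))]

    for fit_idx, calib_idx in kfold_split(10, K=5, random_state=0):
        print(len(fit_idx), len(calib_idx))  # prints "8 2" for each of the 5 folds

Setting ``K`` equal to ``n_samples`` in ``kfold_split`` yields calibration folds of size one, that is, leave-one-out.
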
9 changes: 9 additions & 0 deletions docs/_sources/calibration.rst.txt
@@ -0,0 +1,9 @@
.. _calibration:

Calibration
===========

.. automodule:: calibration
   :members:
   :undoc-members:
   :show-inheritance:
17 changes: 17 additions & 0 deletions docs/_sources/classification.rst.txt
@@ -0,0 +1,17 @@
.. _classification:

📊 Classification
==================

Currently implemented conformal prediction methods for classification are listed
in this page.

Each wrapper conformalizes the model passed as argument to its constructor.
Such models **need** to implement the :func:`fit`
and :func:`predict` methods.
:doc:`Prediction module <prediction>` from the :doc:`API <api>` ensures the
compliance of models from various ML/DL libraries (such as Keras and scikit-learn) to **puncc**.

.. autoclass:: deel.puncc.classification.RAPS

.. autoclass:: deel.puncc.classification.APS
7 changes: 7 additions & 0 deletions docs/_sources/conformalization.rst.txt
@@ -0,0 +1,7 @@
Conformalization
================

.. automodule:: conformalization
   :members:
   :undoc-members:
   :show-inheritance: