Mouhcine/test n doc #18

Merged
merged 16 commits into from
Jul 8, 2023
25 changes: 25 additions & 0 deletions .github/workflows/deploy-doc.yml
@@ -0,0 +1,25 @@
name: deploy-doc
on: [push, pull_request, workflow_dispatch]
permissions:
  contents: write
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - name: Install dependencies
        run: |
          pip install sphinx sphinx_rtd_theme sphinx-autodoc-typehints
      - name: Sphinx build
        run: |
          cd docs
          make html
      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
        with:
          publish_branch: gh-pages
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/
          force_orphan: true
2 changes: 1 addition & 1 deletion .pylintrc
@@ -7,7 +7,7 @@ disable=
C0123, # allow use of type()
C0201, # allow iterating the dictionary by calling .keys()
C0206, # allow iterating without .items()
C0302, # allow too many lines in module
C0302, # allow too many lines in module
C0411, # allow custom import order

R0801, # allow similar lines in 2 files
54 changes: 29 additions & 25 deletions README.md
@@ -33,6 +33,7 @@ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.ht
## 📚 Table of contents

- [🐾 Installation](#-installation)
- [👨‍🎓 Tutorials](#-tutorials)
- [🚀 QuickStart](#-quickstart)
- [📚 Citation](#-citation)
- [💻 Contributing](#-contributing)
@@ -41,47 +42,50 @@ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.ht

## 🐾 Installation

It is recommended to install *puncc* in a virtual environment to not mess with your system's dependencies.
*puncc* requires a version of Python higher than 3.8 and several libraries, including scikit-learn and NumPy. Installing *puncc* in a virtual environment is recommended so that it does not interfere with your system's dependencies.

You can directly install the library using pip:

### For users
```bash
pip install -e .[interactive]
pip install git+https://github.com/deel-ai/puncc
```

You can alternatively use the makefile to automatically create a virtual environment
`puncc-user-env` and install user requirements:
You can alternatively clone the repo and use the makefile to automatically create a virtual environment
and install the requirements:

* For users:

```bash
make install-user
```

### For developers
* For developers:

```bash
pip install -e .[dev]
make prepare-dev
```

You can alternatively use the makefile to automatically create a virtual environment
`puncc-dev-env` and install the dev requirements:
## 👨‍🎓 Tutorials

```bash
make prepare-dev
```
We highly recommend following the introduction tutorials to get familiar with the library and its API:

* [**Introduction tutorial**](docs/puncc_intro.ipynb)</font> <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>

## 🚀 Quickstart
<div align="center">
* [**API tutorial**](docs/api_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>

<font size=3>📙 You can find the detailed implementation of the example below in the [**Quickstart Notebook**](docs/quickstart.ipynb)</font>.
You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:

</div>
Let’s consider a simple regression problem on diabetes data provided by Scikit-learn. We want to evaluate the uncertainty associated with the prediction using inductive (or split) conformal prediction.
* [**Architecture overview**](docs/puncc_architecture.ipynb)

## 🚀 Quickstart

### Split Conformal Prediction
Conformal prediction transforms point predictions into interval predictions with a high probability of coverage. The figure below shows the result of applying the split conformal algorithm to a linear regressor.

<figure style="text-align:center">
<img src="docs/assets/cp_process.png"/>
</figure>

For this example, the prediction intervals are obtained through the split
conformal prediction method provided by the class
`deel.puncc.regression.SplitCP`, applied on a linear model.
Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model in a few lines of code:

```python
from sklearn import linear_model
@@ -104,8 +108,8 @@ lin_reg_predictor = BasePredictor(linear_model, is_trained=False)
split_cp = SplitCP(lin_reg_predictor)

# Fit model (as `is_trained` is False) on the fit dataset and
# compute the residuals on the calibration dataset.
# The fit (resp. calibration) subset is randomly sampled from the training
# compute the residuals on the calibration dataset.
# The fit (resp. calibration) subset is randomly sampled from the training
# data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
split_cp.fit(X_train, y_train, fit_ratio=.8)

@@ -114,7 +118,7 @@ split_cp.fit(X_train, y_train, fit_ratio=.8)
y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=alpha)
```
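
As a rough illustration of what the snippet above automates, split conformal prediction can be sketched in plain Python. This is not *puncc*'s implementation: `fit_line`, `conformal_quantile` and the synthetic data are hypothetical stand-ins, and the absolute residual is assumed as the nonconformity score.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_quantile(scores, alpha):
    """Empirical quantile of level ceil((n + 1) * (1 - alpha)) / n."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


# Synthetic data (assumption): a noisy linear relationship
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(1200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# 1000 training points split 80% / 20% into fit and calibration, 200 test points
x_fit, y_fit = xs[:800], ys[:800]
x_cal, y_cal = xs[800:1000], ys[800:1000]
x_test, y_test = xs[1000:], ys[1000:]

# Fit on the fit subset, score the calibration subset with absolute residuals
a, b = fit_line(x_fit, y_fit)
scores = [abs(y - (a * x + b)) for x, y in zip(x_cal, y_cal)]

# Calibrate the interval half-width for a target error rate alpha
alpha = 0.1
q = conformal_quantile(scores, alpha)

# Build prediction intervals and measure the empirical coverage on the test set
lower = [a * x + b - q for x in x_test]
upper = [a * x + b + q for x in x_test]
coverage = sum(lo <= y <= up for lo, y, up in zip(lower, y_test, upper)) / len(y_test)
```

With `alpha = 0.1`, the empirical `coverage` on the test points should land close to 90%, up to sampling noise.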

The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is $95$% (see [Quickstart Notebook](docs/quickstart.ipynb)):
The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):

<figure style="text-align:center">
<img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
@@ -128,7 +132,7 @@ The library provides several metrics (`deel.puncc.metrics`) and plotting capabil
- A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
- **Low-level API**: a more flexible approach based on full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.

A quick comparison of both approaches is provided in the [Quickstart Notebook](docs/quickstart.ipynb) for a simple regression problem.
A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.

## 📚 Citation

19 changes: 12 additions & 7 deletions deel/puncc/api/README.md
@@ -1,6 +1,6 @@
# Architecture Overview

The **high-level API** enables a turnkey solution and a fully customized approach to conformal prediction. It is as simple as calling the conformal prediction procedures in `deel.puncc.regression` or `deel.puncc.classification`.
*Puncc* enables a turnkey solution and a fully customized approach to conformal prediction. It is as simple as calling the conformal prediction procedures in `deel.puncc.regression` or `deel.puncc.classification`.

The currently implemented conformal regression procedures are the following:
* `deel.puncc.regression.SplitCP`: Split Conformal Prediction
@@ -11,15 +11,20 @@ The currently implemented conformal regression procedures are the following:
* `deel.puncc.regression.aEnbPI`: locally adaptive Ensemble Batch Prediction Intervals method

The currently implemented conformal classification procedures are the following:
* `deel.puncc.classification.APS`: Adaptive Prediction Sets.
* `deel.puncc.classification.RAPS`: Regularized Adaptive Prediction Sets. APS is a special case where the regularization term is set to zero ($\lambda = 0$).
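
To make the relationship between the two procedures concrete, here is a minimal sketch of a deterministic (non-randomized) RAPS nonconformity score, following the standard formulation of the method. It is an assumption that *puncc* computes the score this way; `raps_score`, `lambd` and `k_reg` are hypothetical names used only for illustration.

```python
def raps_score(probs, true_label, lambd=0.0, k_reg=1):
    """Deterministic RAPS nonconformity score for a single sample.

    probs: predicted class probabilities, true_label: index of the true class,
    lambd: regularization strength, k_reg: rank above which classes are penalized.
    """
    # Rank classes by decreasing predicted probability
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    rank = order.index(true_label) + 1  # 1-indexed rank of the true class
    # Probability mass of all classes ranked at or above the true class
    cumulative_mass = sum(probs[c] for c in order[:rank])
    # Penalty that grows with the rank of the true class beyond k_reg
    penalty = lambd * max(0, rank - k_reg)
    return cumulative_mass + penalty


probs = [0.1, 0.6, 0.3]
aps = raps_score(probs, true_label=2)              # lambd = 0: plain APS score
raps = raps_score(probs, true_label=2, lambd=0.2)  # adds a penalty of 0.2
```

Setting `lambd=0` removes the penalty term, which is why APS can be seen as a special case of RAPS.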

Each of these procedures conformalizes point-based or interval-based models that are wrapped in a predictor and passed as an argument to the constructor. Wrapping the models in a predictor (`deel.puncc.api.prediction`) makes it possible to work with several ML/DL libraries and data structures.

The **low-level API** offers more flexibility in defining conformal prediction procedures. Let's say we want to fit/calibrate a neural-network interval-estimator with a cross-validation plan, or that we want to experiment with different user-defined nonconformity scores. In such cases and others, users can fully construct their approaches using the proposed **Predictor-Calibrator-Splitter** paradigm. It boils down to assembling into `puncc.api.conformalization.ConformalPredictor`:
The **API** offers more flexibility in defining conformal prediction procedures. Let's say we want to fit/calibrate a neural-network interval-estimator with a cross-validation plan, or that we want to experiment with different user-defined nonconformity scores. In such cases and others, users can fully construct their approaches using the proposed **Predictor-Calibrator-Splitter** paradigm. It boils down to assembling into `puncc.api.conformalization.ConformalPredictor`:
1) A predictor
2) An calibrator defining a nonconformity score and how to construct/calibrate the prediction sets
2) A calibrator defining a nonconformity score and a procedure to construct/calibrate the prediction sets
3) A splitter defining the strategy of data assignment into fitting and calibration sets
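
The three roles listed above can be sketched with toy classes in plain Python. None of the names below (`MeanPredictor`, `AbsoluteResidualCalibrator`, `HoldoutSplitter`, `ToyConformalPredictor`) belong to *puncc*; they are hypothetical stand-ins meant only to show how a predictor, a calibrator and a splitter could be assembled.

```python
import math


class MeanPredictor:
    """Toy predictor: always predicts the mean of the fit targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)

    def predict(self, X):
        return [self.mean_ for _ in X]


class AbsoluteResidualCalibrator:
    """Toy calibrator: absolute-residual scores and symmetric intervals."""

    def fit(self, y_pred, y_true):
        self.scores_ = sorted(abs(t - p) for p, t in zip(y_pred, y_true))

    def calibrate(self, y_pred, alpha):
        n = len(self.scores_)
        rank = math.ceil((n + 1) * (1 - alpha))
        q = self.scores_[rank - 1] if rank <= n else math.inf
        return [p - q for p in y_pred], [p + q for p in y_pred]


class HoldoutSplitter:
    """Toy splitter: first part of the data for fitting, the rest for calibration."""

    def __init__(self, fit_ratio):
        self.fit_ratio = fit_ratio

    def __call__(self, X, y):
        cut = int(len(X) * self.fit_ratio)
        return [(X[:cut], y[:cut], X[cut:], y[cut:])]


class ToyConformalPredictor:
    """Assembles a predictor, a calibrator and a splitter."""

    def __init__(self, predictor, calibrator, splitter):
        self.predictor = predictor
        self.calibrator = calibrator
        self.splitter = splitter

    def fit(self, X, y):
        X_fit, y_fit, X_calib, y_calib = self.splitter(X, y)[0]
        self.predictor.fit(X_fit, y_fit)
        self.calibrator.fit(self.predictor.predict(X_calib), y_calib)

    def predict(self, X, alpha):
        y_pred = self.predictor.predict(X)
        lower, upper = self.calibrator.calibrate(y_pred, alpha)
        return y_pred, lower, upper


X = list(range(10))
y = [0.0, 2.0] * 4 + [0.0, 3.0]
cp = ToyConformalPredictor(
    MeanPredictor(), AbsoluteResidualCalibrator(), HoldoutSplitter(0.8)
)
cp.fit(X, y)
y_pred, lower, upper = cp.predict([42], alpha=0.5)
```

Swapping any one of the three components (for instance a different nonconformity score in the calibrator) leaves the other two untouched, which is the point of the paradigm.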

<figure style="text-align:center">
<img src="../../../docs/assets/puncc_architecture.png"/>
</figure>

## ConformalPredictor

`deel.puncc.api.conformalization.ConformalPredictor` is the canvas of conformal prediction procedures.
@@ -179,7 +184,7 @@ The predictors have to implement:
* a `copy` method that returns a copy of the predictor (useful in cross validation for example). It has to deepcopy the underlying model.

The constructor of `deel.puncc.api.prediction.BasePredictor` takes in the model to be wrapped, a flag to inform if the model is already trained
and compilation keyword arguments if the underlying model needs to be compiled (such as in keras).
and compilation keyword arguments if the underlying model needs to be compiled (such as in TensorFlow or PyTorch).

The constructor of `deel.puncc.api.prediction.DualPredictor` is conceptually similar but takes as arguments a list of two models, a list of two trained flags and a list of two compilation kwargs.
Such a predictor is useful when the calibration relies on several models (such as upper and lower quantiles in CQR).
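
To see why two models are needed in the CQR case, the sketch below applies the standard CQR calibration rule to the outputs of a lower-quantile model and an upper-quantile model. It does not use `DualPredictor` itself; `cqr_scores`, `cqr_intervals` and the hard-coded quantile predictions are hypothetical and stand in for the outputs of two trained models.

```python
import math


def cqr_scores(q_lo, q_hi, y_true):
    """CQR nonconformity score: signed distance of y to the interval [q_lo, q_hi]."""
    return [max(lo - y, y - hi) for lo, hi, y in zip(q_lo, q_hi, y_true)]


def cqr_intervals(q_lo, q_hi, scores, alpha):
    """Widen (or shrink) both quantile predictions by the calibrated correction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    q = sorted(scores)[rank - 1] if rank <= n else math.inf
    return [lo - q for lo in q_lo], [hi + q for hi in q_hi]


# Calibration set: predictions of the two quantile models and true targets
calib_lo = [0.0, 0.0, 0.0]
calib_hi = [1.0, 1.0, 1.0]
y_calib = [0.5, 1.5, -0.2]
scores = cqr_scores(calib_lo, calib_hi, y_calib)

# Test point: both quantile predictions are corrected by the same amount
lower, upper = cqr_intervals([2.0], [3.0], scores, alpha=0.5)
```

A negative score means the true value fell inside the predicted quantile band, so the correction can also shrink the interval when the two models are too conservative.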
@@ -200,9 +205,10 @@ we need to create a predictor in which we redefine the `predict` call:

```python
from sklearn.ensemble import RandomForestClassifier
from deel.puncc.api.prediction import BasePredictor

# Create rf classifier
rf_model = (n_estimators=100, random_state=0)
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model to redefine its predict method
# into logits predictions. Make sure to subclass BasePredictor.
@@ -261,8 +267,7 @@ def my_psf(y_pred, nonconf_scores_quantile):
return y_lower, y_upper

## Calibrator construction
my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf,
pred_set_func=my_psf)
my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf, pred_set_func=my_psf)
```

## Splitter
5 changes: 5 additions & 0 deletions deel/puncc/api/calibration.py
@@ -104,6 +104,11 @@ def prediction_set_function(y_pred, scores_quantile):
pred_set_func=prediction_set_function
)

# Generate dummy data and predictions
y_pred_calib = np.random.rand(1000)
y_true_calib = np.random.rand(1000)
y_pred_test = np.random.rand(1000)

# The nonconformity scores are computed by calling the `fit` method
# on the calibration dataset.
calibrator.fit(y_pred=y_pred_calib, y_true=y_true_calib)
8 changes: 5 additions & 3 deletions deel/puncc/classification.py
@@ -23,11 +23,12 @@
"""
This module implements usual conformal classification wrappers.
"""
import numpy as np
from typing import Iterable
from typing import Optional
from typing import Tuple

import numpy as np

from deel.puncc.api import nonconformity_scores
from deel.puncc.api import prediction_sets
from deel.puncc.api.calibration import BaseCalibrator
@@ -118,7 +119,8 @@ def predict(self, X, **kwargs):

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
raps_cp.fit(X_fit, y_fit, X_calib, y_calib)
raps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)


# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
@@ -323,7 +325,7 @@ def predict(self, X, **kwargs):

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
aps_cp.fit(X_fit, y_fit, X_calib, y_calib)
aps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
34 changes: 16 additions & 18 deletions deel/puncc/regression.py
@@ -91,7 +91,7 @@ class SplitCP:

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
split_cp.fit(X_fit, y_fit, X_calib, y_calib)
split_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
@@ -305,7 +305,7 @@ class LocallyAdaptiveCP(SplitCP):

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
lacp.fit(X_fit, y_fit, X_calib, y_calib)
lacp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
@@ -400,11 +400,11 @@ class CQR(SplitCP):

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
crq.fit(X_fit, y_fit, X_calib, y_calib)
crq.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
Y_pred, y_pred_lower, y_pred_upper = crq.predict(X_test, alpha=.2)
y_pred, y_pred_lower, y_pred_upper = crq.predict(X_test, alpha=.2)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
@@ -587,6 +587,8 @@ class EnbPI:

Example::

import numpy as np

from deel.puncc.regression import EnbPI
from deel.puncc.api.prediction import BasePredictor

@@ -607,11 +609,6 @@ class EnbPI:
X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
X, y, test_size=.2, random_state=0
)

# Create rf regressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
# Wrap model in a predictor
@@ -623,12 +620,14 @@ class EnbPI:
agg_func_loo=np.mean,
random_state=0,
)

# The call to `fit` trains the model and computes the nonconformity
# scores on the oob calibration sets
enbpi.fit(X, y)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
Y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
X_test, alpha=.2, y_true=y_test, s=None
)

@@ -936,29 +935,28 @@ class AdaptiveEnbPI(EnbPI):
X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
X, y, test_size=.2, random_state=0
)

# Create two models mu (mean) and sigma (dispersion)
mean_model = RandomForestRegressor(n_estimators=100, random_state=0)
sigma_model = RandomForestRegressor(n_estimators=100, random_state=0)

# Wrap models in a mean/variance predictor
mean_var_predictor = MeanVarPredictor([mean_model, sigma_model])

# CP method initialization
aenbpi = AdaptiveEnbPI(
mean_var_predictor,
B=30,
agg_func_loo=np.mean,
random_state=0,
)

# The call to `fit` trains the model and computes the nonconformity
# scores on the oob calibration sets
aenbpi.fit(X, y)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
Y_pred, y_pred_lower, y_pred_upper = aenbpi.predict(
y_pred, y_pred_lower, y_pred_upper = aenbpi.predict(
X_test, alpha=.2, y_true=y_test, s=None
)

@@ -997,8 +995,8 @@ def _compute_residuals(self, y_pred, y_true):
def _compute_boot_residuals(self, boot_pred, y_true):
loo_pred = (self._oob_matrix * boot_pred[:, :, 0].T).sum(-1)
loo_sigma = (self._oob_matrix * boot_pred[:, :, 1].T).sum(-1)
Y_pred = np.stack((loo_pred, loo_sigma), axis=-1)
residuals = self._compute_residuals(y_pred=Y_pred, y_true=y_true)
y_pred = np.stack((loo_pred, loo_sigma), axis=-1)
residuals = self._compute_residuals(y_pred=y_pred, y_true=y_true)
return list(residuals)

def _compute_loo_predictions(self, boot_pred):