deel-ai · M-Mouhcine · Jul 8, 2023 · Apr 18, 2023 · Apr 20, 2023 · Jul 6, 2023
diff --git a/.github/workflows/deploy-doc.yml b/.github/workflows/deploy-doc.yml
@@ -0,0 +1,25 @@
+name: deploy-doc
+on: [push, pull_request, workflow_dispatch]
+permissions:
+    contents: write
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v3
+      - name: Install dependencies
+        run: |
+          pip install sphinx sphinx_rtd_theme sphinx-autodoc-typehints
+      - name: Sphinx build
+        run: |
+          cd docs 
+          make html 
+      - name: Deploy
+        uses: peaceiris/actions-gh-pages@v3
+        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
+        with:
+          publish_branch: gh-pages
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: docs/
+          force_orphan: true
diff --git a/.pylintrc b/.pylintrc
@@ -7,7 +7,7 @@ disable=
     C0123, # allow use of type()
     C0201, # allow iterating the dictionary by calling .keys()
     C0206, # allow iterating without .items()
-    C0302, # allow too many lines in module 
+    C0302, # allow too many lines in module
     C0411, # allow custom import order
 
     R0801, # allow similar lines in 2 files

diff --git a/README.md b/README.md
@@ -33,6 +33,7 @@ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.ht
 ## 📚 Table of contents
 
 - [🐾 Installation](#-installation)
+- [👨‍🎓 Tutorials](#-tutorials)
 - [🚀 QuickStart](#-quickstart)
 - [📚 Citation](#-citation)
 - [💻 Contributing](#-contributing)
@@ -41,47 +42,50 @@ Documentation is available [**online**](https://deel-ai.github.io/puncc/index.ht
 
 ## 🐾 Installation
 
-It is recommended to install *puncc* in a virtual environment to not mess with your system's dependencies.
+*puncc* requires a version of Python higher than 3.8 and several libraries, including Scikit-learn and NumPy. It is recommended to install *puncc* in a virtual environment so as not to interfere with your system's dependencies.
+
+You can directly install the library using pip:
 
-### For users
 ```bash
-pip install -e .[interactive]
+pip install git+https://github.com/deel-ai/puncc
 ```
 
-You can alternatively use the makefile to automatically create a virtual environment
-`puncc-user-env` and install user requirements:
+You can alternatively clone the repo and use the makefile to automatically create a virtual environment
+and install the requirements:
+
+* For users: 
 
 ```bash
 make install-user
 ```
 
-### For developpers
+* For developers:
 
 ```bash
-pip install -e .[dev]
+make prepare-dev
 ```
 
-You can alternatively use the makefile to automatically create a virtual environment
-`puncc-dev-env` and install the dev requirements:
+## 👨‍🎓 Tutorials
 
-```bash
-make prepare-dev
-```
+We highly recommend following the introductory tutorials to get familiar with the library and its API:
 
+* [**Introduction tutorial**](docs/puncc_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TC_BM7JaEYtBIq6yuYB5U4cJjeg71Tch) </sub>
 
-## 🚀 Quickstart
-<div align="center">
+* [**API tutorial**](docs/api_intro.ipynb) <sub> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d06qQweM1X1eSrCnixA_MLEZil1vXewj) </sub>
 
-<font size=3>📙 You can find the detailed implementation of the example below in the [**Quickstart Notebook**](docs/quickstart.ipynb)</font>.
+You can also familiarize yourself with the architecture of *puncc* to build your own conformal prediction methods more efficiently:
 
-</div>
-Let’s consider a simple regression problem on diabetes data provided by Scikit-learn. We want to evaluate the uncertainty associated with the prediction using inductive (or split) conformal prediction.
+* [**Architecture overview**](docs/puncc_architecture.ipynb)
+
+## 🚀 Quickstart
 
-### Split Conformal Prediction
+Conformal prediction transforms point predictions into interval predictions with a high probability of coverage. The figure below shows the result of applying the split conformal algorithm to a linear regressor.
+
+<figure style="text-align:center">
+<img src="docs/assets/cp_process.png"/>
+</figure>
 
-For this example, the prediction intervals are obtained throught the split
-conformal prediction method provided by the class
-`deel.puncc.regression.SplitCP`, applied on a linear model.
+Many conformal prediction algorithms can easily be applied using *puncc*. The code snippet below shows split conformal prediction wrapping a linear model in a few lines of code:
 
 ```python
 from sklearn import linear_model
@@ -104,8 +108,8 @@ lin_reg_predictor =  BasePredictor(linear_model, is_trained=False)
 split_cp = SplitCP(lin_reg_predictor)
 
 # Fit model (as is_trained` is False) on the fit dataset and
-# compute the residuals on the calibration dataset. 
-# The fit (resp. calibration) subset is randomly sampled from the training 
+# compute the residuals on the calibration dataset.
+# The fit (resp. calibration) subset is randomly sampled from the training
 # data and constitutes 80% (resp. 20%) of it (fit_ratio = 80%).
 split_cp.fit(X_train, y_train, fit_ratio=.8)
 
@@ -114,7 +118,7 @@ split_cp.fit(X_train, y_train, fit_ratio=.8)
 y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=alpha)
 ```
 
-The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is $95$% (see [Quickstart Notebook](docs/quickstart.ipynb)):
+The library provides several metrics (`deel.puncc.metrics`) and plotting capabilities (`deel.puncc.plotting`) to evaluate and visualize the results of a conformal procedure. For a target error rate of $\alpha = 0.1$, the marginal coverage reached in this example on the test set is higher than $90$% (see [Introduction tutorial](docs/puncc_intro.ipynb)):
 
 <figure style="text-align:center">
 <img src="docs/assets/results_quickstart_split_cp_pi.png" alt="90% Prediction Interval with the Split Conformal Prediction Method"/>
@@ -128,7 +132,7 @@ The library provides several metrics (`deel.puncc.metrics`) and plotting capabil
 - A direct approach to run state-of-the-art conformal prediction procedures. This is what we used in the previous conformal regression example.
 - **Low-level API**: a more flexible approach based of full customization of the prediction model, the choice of nonconformity scores and the split between fit and calibration datasets.
 
-A quick comparison of both approaches is provided in the [Quickstart Notebook](docs/quickstart.ipynb) for a simple regression problem.
+A quick comparison of both approaches is provided in the [API tutorial](docs/api_intro.ipynb) for a regression problem.
 
 ## 📚 Citation
 

diff --git a/deel/puncc/api/README.md b/deel/puncc/api/README.md
@@ -1,6 +1,6 @@
 # Architecture Overview
 
-The **high-level API** enables a turnkey solution and a fully customized approach to conformal prediction. It is as simple as calling the conformal prediction procedures in `deel.puncc.regression` or `deel.puncc.classification`.
+*Puncc* enables a turnkey solution and a fully customized approach to conformal prediction. It is as simple as calling the conformal prediction procedures in `deel.puncc.regression` or `deel.puncc.classification`.
 
 The currently implemented conformal regression procedures are the following:
 * `deel.puncc.regression.SplitCP`: Split Conformal Prediction
@@ -11,15 +11,20 @@ The currently implemented conformal regression procedures are the following:
 * `deel.puncc.regression.aEnbPI`: locally adaptive Ensemble Batch Prediction Intervals method
 
 The currently implemented conformal classification procedures are the following:
+* `deel.puncc.classification.APS`: Adaptive Prediction Sets. 
 * `deel.puncc.classification.RAPS`: Regularized Adaptive Prediction Sets. APS is a special case where regularization term is nulled ($\lambda = 0$).
 
 Each of these procedures conformalize point-based or interval-based models that are wrapped in a predictor and passed as argument to the constructor. Wrapping the models in a predictor (`deel.puncc.api.prediction`) enables to work with several ML/DL libraries and data structures.
 
-The **low-level API** offers more flexibility into defining conformal prediction procedures. Let's say we want to fit/calibrate a neural-network interval-estimator with a cross-validation plan; or that we want to experiment different user-defined nonconformity scores. In such cases and others, the user can fully construct their approaches using the proposed **Predictor-Calibrator-Splitter** paradigm. It boils down to assembling into `puncc.api.conformalization.ConformalPredictor`:
+The **API** offers more flexibility in defining conformal prediction procedures. Let's say we want to fit/calibrate a neural-network interval-estimator with a cross-validation plan; or that we want to experiment with different user-defined nonconformity scores. In such cases and others, the user can fully construct their approaches using the proposed **Predictor-Calibrator-Splitter** paradigm. It boils down to assembling into `puncc.api.conformalization.ConformalPredictor`:
 1) a predictor
-2) An calibrator defining a nonconformity score and how to construct/calibrate the prediction sets
+2) A calibrator defining a nonconformity score and a procedure to construct/calibrate the prediction sets
 3) A splitter defining the strategy of data assignement into fitting and calibration sets
 
+<figure style="text-align:center">
+    <img src="../../../docs/assets/puncc_architecture.png"/>
+</figure>
+
 ## ConformalPredictor
 
 `deel.puncc.api.conformalization.ConformalPredictor` is the canvas of conformal prediction procedures.
@@ -179,7 +184,7 @@ The predictors have to implement:
 * a `copy` method that returns a copy of the predictor (useful in cross validation for example). It has to deepcopy the underlying model.
 
 The constructor of `deel.puncc.api.prediction.BasePredictor` takes in the model to be wrapped, a flag to inform if the model is already trained
-and compilation keyword arguments if the underlying model needs to be compiled (such as in keras).
+and compilation keyword arguments if the underlying model needs to be compiled (such as in TensorFlow or PyTorch).
 
 The constructor of `deel.puncc.api.prediction.DualPredictor` is conceptually similar but take as arguments a list of two models, a list of two trained flags and a list of two compilation kwargs.
 Such predictor is useful when the calibration relies of several models (such as upper and lower quantiles in CQR).
@@ -200,9 +205,10 @@ we need to create a predictor in which we redefine the `predict` call:
 
 ```python
 from sklearn.ensemble import RandomForestClassifier
+from deel.puncc.api.prediction import BasePredictor
 
 # Create rf classifier
-rf_model = (n_estimators=100, random_state=0)
+rf_model = RandomForestClassifier(n_estimators=100, random_state=0)
 
 # Create a wrapper of the random forest model to redefine its predict method
 # into logits predictions. Make sure to subclass BasePredictor.
@@ -261,8 +267,7 @@ def my_psf(y_pred, nonconf_scores_quantile):
     return y_lower, y_upper
 
 ## Calibrator construction
-my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf,
-                                pred_set_func=my_psf)
+my_calibrator = BaseCalibrator(nonconf_score_func=my_ncf, pred_set_func=my_psf)
 ```
 
 ## Splitter
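
In the Predictor-Calibrator-Splitter paradigm described above, the splitter only decides which samples are used for fitting and which for calibration. The sketch below illustrates that idea in plain NumPy. The `random_split` helper and its signature are assumptions made for illustration and are not part of the *puncc* API; the `fit_ratio` name mirrors the argument used in the README quickstart.

```python
import numpy as np


def random_split(X, y, fit_ratio=0.8, random_state=None):
    """Hypothetical helper (not a puncc function): random fit/calibration split."""
    rng = np.random.default_rng(random_state)
    n_samples = len(X)
    # Shuffle the sample indices, then cut them into two disjoint groups
    indices = rng.permutation(n_samples)
    n_fit = int(fit_ratio * n_samples)
    fit_idx, calib_idx = indices[:n_fit], indices[n_fit:]
    return X[fit_idx], y[fit_idx], X[calib_idx], y[calib_idx]


# Toy data: 10 samples with 2 features each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_fit, y_fit, X_calib, y_calib = random_split(X, y, fit_ratio=0.8, random_state=0)
```

Keeping the two index groups disjoint is what matters here: the calibration samples must not have been seen while fitting the model.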

diff --git a/deel/puncc/api/calibration.py b/deel/puncc/api/calibration.py
@@ -104,6 +104,11 @@ def prediction_set_function(y_pred, scores_quantile):
             pred_set_func=prediction_set_function
         )
 
+        # Generate dummy data and predictions
+        y_pred_calib = np.random.rand(1000)
+        y_true_calib = np.random.rand(1000)
+        y_pred_test = np.random.rand(1000)
+
         # The nonconformity scores are computed by calling the `fit` method
         # on the calibration dataset.
         calibrator.fit(y_pred=y_pred_calib, y_true=y_true_calib)
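
For readers who want to see what a calibrator of this kind computes, here is a minimal sketch in plain NumPy of split conformal calibration with absolute residuals. It does not call *puncc*; the function names `absolute_residual` and `constant_interval`, the synthetic data, and the choice `alpha = 0.2` are illustrative assumptions, and the library's own implementation may differ in its details.

```python
import numpy as np


def absolute_residual(y_pred, y_true):
    # Illustrative nonconformity score: absolute prediction error
    return np.abs(y_pred - y_true)


def constant_interval(y_pred, scores_quantile):
    # Illustrative prediction set: symmetric interval of fixed half-width
    return y_pred - scores_quantile, y_pred + scores_quantile


# Synthetic calibration data and test predictions (assumed for the example)
rng = np.random.default_rng(0)
y_true_calib = rng.random(1000)
y_pred_calib = y_true_calib + rng.normal(scale=0.1, size=1000)
y_pred_test = rng.random(1000)

alpha = 0.2  # target error rate

# Step 1: nonconformity scores on the calibration set
scores = absolute_residual(y_pred_calib, y_true_calib)

# Step 2: conformal quantile, i.e. the ceil((n + 1) * (1 - alpha))-th smallest score
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))
scores_quantile = np.sort(scores)[k - 1] if k <= n else np.inf

# Step 3: prediction intervals around the test predictions
y_lower, y_upper = constant_interval(y_pred_test, scores_quantile)
```

The `(n + 1)` correction in step 2 is the usual finite-sample adjustment in split conformal prediction; with it, at least a fraction `1 - alpha` of the calibration scores fall below the selected quantile.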

diff --git a/deel/puncc/classification.py b/deel/puncc/classification.py
@@ -23,11 +23,12 @@
 """
 This module implements usual conformal classification wrappers.
 """
-import numpy as np
 from typing import Iterable
 from typing import Optional
 from typing import Tuple
 
+import numpy as np
+
 from deel.puncc.api import nonconformity_scores
 from deel.puncc.api import prediction_sets
 from deel.puncc.api.calibration import BaseCalibrator
@@ -118,7 +119,8 @@ def predict(self, X, **kwargs):
 
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the calibration set
-        raps_cp.fit(X_fit, y_fit, X_calib, y_calib)
+        raps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)
+
 
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
@@ -323,7 +325,7 @@ def predict(self, X, **kwargs):
 
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the calibration set
-        aps_cp.fit(X_fit, y_fit, X_calib, y_calib)
+        aps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)
 
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%

diff --git a/deel/puncc/regression.py b/deel/puncc/regression.py
@@ -91,7 +91,7 @@ class SplitCP:
 
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the calibration set
-        split_cp.fit(X_fit, y_fit, X_calib, y_calib)
+        split_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)
 
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
@@ -305,7 +305,7 @@ class LocallyAdaptiveCP(SplitCP):
 
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the calibration set
-        lacp.fit(X_fit, y_fit, X_calib, y_calib)
+        lacp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)
 
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
@@ -400,11 +400,11 @@ class CQR(SplitCP):
 
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the calibration set
-        crq.fit(X_fit, y_fit, X_calib, y_calib)
+        crq.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)
 
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
-        Y_pred, y_pred_lower, y_pred_upper = crq.predict(X_test, alpha=.2)
+        y_pred, y_pred_lower, y_pred_upper = crq.predict(X_test, alpha=.2)
 
         # Compute marginal coverage and average width of the prediction intervals
         coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
@@ -587,6 +587,8 @@ class EnbPI:
 
     Example::
 
+        import numpy as np
+
         from deel.puncc.regression import EnbPI
         from deel.puncc.api.prediction import BasePredictor
 
@@ -607,11 +609,6 @@ class EnbPI:
             X, y, test_size=.2, random_state=0
         )
 
-        # Split train data into fit and calibration
-        X_fit, X_calib, y_fit, y_calib = train_test_split(
-            X, y, test_size=.2, random_state=0
-        )
-
         # Create rf regressor
         rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
         # Wrap model in a predictor
@@ -623,12 +620,14 @@ class EnbPI:
             agg_func_loo=np.mean,
             random_state=0,
         )
+
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the oob calibration sets
         enbpi.fit(X, y)
+
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
-        Y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
+        y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
             X_test, alpha=.2, y_true=y_test, s=None
         )
 
@@ -936,29 +935,28 @@ class AdaptiveEnbPI(EnbPI):
             X, y, test_size=.2, random_state=0
         )
 
-        # Split train data into fit and calibration
-        X_fit, X_calib, y_fit, y_calib = train_test_split(
-            X, y, test_size=.2, random_state=0
-        )
-
         # Create two models mu (mean) and sigma (dispersion)
         mean_model = RandomForestRegressor(n_estimators=100, random_state=0)
         sigma_model = RandomForestRegressor(n_estimators=100, random_state=0)
+
         # Wrap models in a mean/variance predictor
         mean_var_predictor = MeanVarPredictor([mean_model, sigma_model])
+
         # CP method initialization
         aenbpi = AdaptiveEnbPI(
             mean_var_predictor,
             B=30,
             agg_func_loo=np.mean,
             random_state=0,
         )
+
         # The call to `fit` trains the model and computes the nonconformity
         # scores on the oob calibration sets
         aenbpi.fit(X, y)
+
         # The predict method infers prediction intervals with respect to
         # the significance level alpha = 20%
-        Y_pred, y_pred_lower, y_pred_upper = aenbpi.predict(
+        y_pred, y_pred_lower, y_pred_upper = aenbpi.predict(
             X_test, alpha=.2, y_true=y_test, s=None
         )
 
@@ -997,8 +995,8 @@ def _compute_residuals(self, y_pred, y_true):
     def _compute_boot_residuals(self, boot_pred, y_true):
         loo_pred = (self._oob_matrix * boot_pred[:, :, 0].T).sum(-1)
         loo_sigma = (self._oob_matrix * boot_pred[:, :, 1].T).sum(-1)
-        Y_pred = np.stack((loo_pred, loo_sigma), axis=-1)
-        residuals = self._compute_residuals(y_pred=Y_pred, y_true=y_true)
+        y_pred = np.stack((loo_pred, loo_sigma), axis=-1)
+        residuals = self._compute_residuals(y_pred=y_pred, y_true=y_true)
         return list(residuals)
 
     def _compute_loo_predictions(self, boot_pred):