d2m #18

Merged
merged 6 commits on Oct 11, 2024
3 changes: 1 addition & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
.*
~*
__pycache__

!.github

__pycache__
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -1,3 +1,5 @@
!**/.pages
!.includes
_theme/.templates

__pycache__
4 changes: 2 additions & 2 deletions docs/components/.pages
@@ -3,8 +3,8 @@ nav:

- Aurora: aurora
- Kobo: kobo
- Deduplication: hde
- Country Report: reporting
- Payment Gateway: pg
- Country Report: reporting
- Deduplication: hde
- RapidPro: rapidpro
# - workspace.md
2 changes: 1 addition & 1 deletion docs/components/aurora/.pages
@@ -1,3 +1,3 @@
nav:
- index.md
- setup.md
- setup
2 changes: 1 addition & 1 deletion docs/components/aurora/index.md
@@ -18,4 +18,4 @@ The strengths of Aurora are:

## Repository

<https://github.com/unicef/hope-aurora>
> Repo: <https://github.com/unicef/hope-aurora>
7 changes: 7 additions & 0 deletions docs/components/aurora/setup/config.md
@@ -0,0 +1,7 @@
# Setup HOPE integration

- Add `aurora_token` in the user
- Add `aurora_server` in the Constance Config
- Fetch data from Aurora
- Associate Organizations to Business Areas
- Associate Projects to Programmes
19 changes: 19 additions & 0 deletions docs/components/aurora/setup/docker.md
@@ -0,0 +1,19 @@
# Build and use your docker

After you have cloned the repo, make sure you have Redis and PostgreSQL servers running on your machine:

```
export ADMIN_EMAIL=[email protected]
export ADMIN_PASSWORD=password
export DATABASE_URL=postgres://postgres:@127.0.0.1:5432/aurora
export CACHE_URL=redis://127.0.0.1:6379/1?client_class=django_redis.client.DefaultClient

cd docker
make build run
```


## Use provided compose.yml

```
docker compose up
```

Navigate to <http://localhost:8000/admin/> and log in using `[email protected]` / `password`.
@@ -15,39 +15,39 @@ Prerequisites:

## Create virtual environment

2. Checkout code
1. Checkout code

```
git clone https://github.com/unicef/hope-aurora
git config branch.autosetuprebase always
```

1. In the shell:
2. In the shell:

```
pdm venv create
pdm use
pdm venv activate
```

1. Check your virtualenv is properly created
3. Check your virtualenv is properly created

```pdm info```


1. Install the package
4. Install the package

```
pdm install
pdm run pre-commit install
```


1. Add `export PYTHONPATH="$PYTHONPATH:./src"`
5. Add `export PYTHONPATH="$PYTHONPATH:./src"`


1. Check your environment:
6. Check your environment:

`./manage.py env --check` and configure the missing variables.

@@ -57,7 +57,7 @@ Prerequisites:

```
./manage.py env --develop --config --pattern='export {key}={value}'
```

1. Run upgrade command to properly initialize the application:
7. Run upgrade command to properly initialize the application:

`./manage.py upgrade --admin-email ${ADMIN_EMAIL} --admin-password ${ADMIN_PASSWORD}`

@@ -83,37 +83,3 @@ echo "unset PS1" >> .envrc
The first time after you have created or modified the _.envrc_ file you will have to authorize it using:

```
direnv allow
```

# Run

To start working with Aurora you can:


### Build and use your docker

After you have cloned the repo, make sure you have Redis and PostgreSQL servers running on your machine:

```
export ADMIN_EMAIL=[email protected]
export ADMIN_PASSWORD=password
export DATABASE_URL=postgres://postgres:@127.0.0.1:5432/aurora
export CACHE_URL=redis://127.0.0.1:6379/1?client_class=django_redis.client.DefaultClient

cd docker
make build run
```


### Use provided compose.yml

```
docker compose up
```

Navigate to <http://localhost:8000/admin/> and log in using `[email protected]` / `password`.


### Setup HOPE integration

- Add `aurora_token` in the user
- Add `aurora_server` in the Constance Config
- Fetch data from Aurora
- Associate Organizations to Business Areas
- Associate Projects to Programmes
1 change: 1 addition & 0 deletions docs/components/hde/deduplication_description.md
@@ -0,0 +1 @@
It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
1 change: 1 addition & 0 deletions docs/components/hde/development.md
@@ -3,6 +3,7 @@
To develop the service locally, you can utilize the provided `compose.yml` file. This configuration file defines all the necessary services, including the primary application and its dependencies, to create a consistent development environment. By using **Docker Compose**, you can effortlessly spin up the entire application stack, ensuring that all components work seamlessly together.

To build and start the service, along with its dependencies, run the following command:

```
docker compose up --build
```


85 changes: 62 additions & 23 deletions docs/components/hde/did/workflow.md
@@ -1,47 +1,86 @@
The Image Processing and Duplicate Detection workflow is designed to provide reliable face detection, recognition, and duplicate detection by leveraging a pre-trained deep learning model.
---
tags:
- Deduplication
---

# Image Processing and Duplicate Detection

The workflow uses pre-trained models from [OpenCV](https://opencv.org/) for face detection and [dlib](http://dlib.net/) for face recognition and landmark detection. This setup provides a fast, reliable solution for real-time applications, without requiring the training of models from scratch. OpenCV handles face detection using a Caffe-based model, while **dlib**, accessed through the [face_recognition](https://pypi.org/project/face-recognition/) library, manages recognition and duplicate identification.

Future updates will involve custom-trained models to further improve performance.

## Inference Mode Operation

This application operates strictly in inference mode, which means that it does not perform training but instead relies on a pre-trained model for face recognition tasks. This mode ensures that the application can rapidly deploy face recognition capabilities without the computational cost or time required for training models from scratch.
This application operates entirely in inference mode, relying on pre-trained models for both face detection and recognition tasks. **OpenCV** handles face detection, and **face_recognition**, a Python wrapper for **dlib**, performs face recognition and duplicate identification. This approach ensures efficient, real-time processing without the need for additional training, allowing the application to quickly deploy its capabilities.

- **OpenCV**: Optimized for fast face detection, ideal for real-time image and video applications.
- **dlib's face_recognition**: Focuses on generating face embeddings for comparison, providing high accuracy in identification.

By combining OpenCV for detection and dlib for recognition, the system offers a balance of speed and precision.

### Pre-Trained Models Storage

- **OpenCV** uses a pre-trained [Caffe model](https://caffe.berkeleyvision.org/) stored in Azure Blob Storage, automatically downloaded at application startup.
- **face_recognition** utilizes a pre-trained [dlib model](https://pypi.org/project/face_recognition_models/) stored locally within the container’s library directory.

Administrators can manually update the **Caffe model** via the admin panel, allowing flexible updates or new model versions without altering the application code.

---

## Face Detection and Recognition Models

### Pre-Trained Model Usage.
### OpenCV Model Details

The pre-trained model is stored in Azure Blob Storage and is automatically downloaded by the application when it starts. This process ensures that the latest version of the model is always available for inference.
### Manual Model Update.
OpenCV powers the face detection component using a pre-trained model designed for real-time performance.

In addition to automatic loading, administrators have the option to manually update the model through the admin panel. This feature provides flexibility for applying updates or new models when improvements or changes are required without modifying the underlying code.
#### Model Components

## Model Details
- **deploy.prototxt**: Defines the network architecture and parameters for model execution.
- **res10_300x300_ssd_iter_140000.caffemodel**: Contains trained weights, generated after 140,000 iterations using the **Caffe** framework.

The face recognition capabilities are powered by the [OpenCV](https://github.com/opencv/opencv) library. Currently, the application utilizes an open-source, pre-trained model specifically designed for face detection.
#### Model Architecture

### Model Components
- **Res10 Architecture**: A lightweight model that balances speed and accuracy, perfect for real-time detection.
- **300x300 Input Resolution**: Optimized for face detection at this resolution, ensuring a balance between detail and efficiency.
- **SSD (Single Shot MultiBox Detector)**: A method that predicts bounding boxes and confidence scores in a single pass, allowing rapid detection of multiple faces in a single image.

- **deploy.prototxt**: This file defines the model architecture, including the network layers and the specific parameters used for each layer. It serves as a blueprint that guides how the model processes input data.
- **res10_300x300_ssd_iter_140000.caffemodel**: This file contains the trained weights of the model. It was trained using the **Caffe** deep learning framework, with a total of 140,000 iterations, ensuring robustness in face detection tasks.
### Dlib Model Details

### Model Architecture
The **dlib** models used for recognition and facial landmark detection include:

- The model follows the **Res10** architecture, which is known for its efficiency in detecting faces. Res10 is a lightweight model that balances speed and accuracy, making it suitable for real-time applications.
- The model operates with a fixed input resolution of **300x300**, optimizing detection for faces within that scale. This resolution offers a compromise between detail and processing efficiency, allowing the model to quickly identify facial features without excessive computational load.
- SSD Methodology. The model utilizes the **Single Shot MultiBox Detector (SSD)** methodology, which is a popular approach for object detection. SSD is designed to predict both the bounding boxes and the confidence scores for each object in a single forward pass through the network. By leveraging the SSD approach, the model can efficiently detect multiple faces in a single image, making it suitable for batch processing and applications where rapid detection is required.
1. **dlib_face_recognition_resnet_model_v1.dat**

A modified **ResNet-34** model generating **128-dimensional face embeddings** for face recognition, achieving **99.38% accuracy** on the LFW benchmark.

## Worklow Diagram
2. **mmod_human_face_detector.dat**
A **CNN-based Max-Margin Object Detector (MMOD)** for accurate face detection, especially under difficult conditions like varied orientations or lighting.

The workflow diagram illustrates the overall process of Image Processing and Duplicate Detection within the system, showcasing how different components interact to achieve **face detection**, **recognition**, and **duplicate identification**.
3. **shape_predictor_5_face_landmarks.dat**
Detects **5 key facial landmarks** (eye corners and nose base), optimized for fast face alignment.

4. **shape_predictor_68_face_landmarks.dat**
Detects **68 facial landmarks** (eyes, nose, mouth, jawline), used for more detailed facial alignment and analysis.
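The duplicate check on these 128-dimensional embeddings is a plain Euclidean distance comparison. As an illustrative numpy sketch (the function names are ours, and `0.6` is only an assumed default tolerance; the service uses its configurable face distance threshold):

```python
import numpy as np

def face_distance(known_encodings, candidate) -> np.ndarray:
    # Euclidean distance between the candidate embedding and each known one.
    return np.linalg.norm(np.asarray(known_encodings) - np.asarray(candidate), axis=1)

def find_duplicates(known_encodings, candidate, threshold: float = 0.6):
    # Indices of known encodings closer than the threshold (likely duplicates).
    distances = face_distance(known_encodings, candidate)
    return [i for i, d in enumerate(distances) if d < threshold]

# Synthetic 128-d embeddings: the first is nearly identical to the candidate.
rng = np.random.default_rng(0)
base = rng.normal(size=128)
known = [base + 0.001, rng.normal(size=128)]
print(find_duplicates(known, base))  # → [0]
```

Lower distances mean more similar faces; pairs under the threshold are reported as duplicates.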

---

## Workflow Diagram

The workflow diagram illustrates the overall process of image processing and duplicate detection. **OpenCV** is used for face detection, while **face_recognition** (built on **dlib**) handles face recognition and duplicate identification.

```mermaid
flowchart LR
subgraph DNNManager[DNN Manager]
direction TB
load_model[Load Model] -- computation <a href="../config/#dnn_backend">backend</a>\ntarget <a href="../config/#dnn_target">device</a> --> set_preferences[Set Preferences]
end

subgraph ImageProcessing[Image Processing]
direction LR

subgraph FaceDetection[Face Detection]

subgraph DNNManager[DNN Manager]
direction TB
load_model[Load Caffe Model] -- computation <a href="../config/#dnn_backend">backend</a>\ntarget <a href="../config/#dnn_target">device</a> --> set_preferences[Set Preferences]
end

DNNManager --> run_model

direction TB
load_image[Load Image] -- decoded image as 3D numpy array\n(height, width, channels of Blue-Green-Red color space) --> prepare_image[Prepare Image] -- blob 4D tensor\n(normalized size, use <a href="../config/#blob_from_image_scale_factor">scale factor</a> and <a href="../config/#blob_from_image_mean_values">means</a>) --> run_model[Run Model] -- shape (1, 1, N, 7),\n1 image\nN is the number of detected faces\neach face is described by the 7 detection values --> filter_results[Filter Results] -- <a href="../config/#face_detection_confidence">confidence</a> is above the minimum threshold,\n<a href="../config/#nms_threshold">NMS</a> to suppress overlapping bounding boxes --> return_detections[Return Detections]
end
@@ -57,7 +96,7 @@ flowchart LR
load_encodings[Load Encodings] --> compare_encodings[Compare Encodings] -- face distance less than <a href="../config/#face_distance_threshold">threshold</a> --> return_duplicates[Return Duplicates]
end

DNNManager --> ImageProcessing --> DuplicateFinder
ImageProcessing --> DuplicateFinder
FaceDetection --> FaceRecognition

```
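The "Filter Results" stage in the diagram can be illustrated with a small numpy sketch. This is a hedged example, not the service's actual code (the real pipeline also applies NMS to suppress overlapping boxes): given the model's `(1, 1, N, 7)` output, keep the rows whose confidence clears the threshold and scale the normalized corners to pixel coordinates.

```python
import numpy as np

def filter_detections(detections: np.ndarray, w: int, h: int, confidence: float = 0.5):
    """Filter the raw SSD output tensor of shape (1, 1, N, 7).

    Each of the N rows is [image_id, class_id, confidence, x1, y1, x2, y2],
    with box corners normalized to [0, 1].
    """
    boxes = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score >= confidence:
            # Scale normalized corners to pixel coordinates of the input image.
            x1, y1, x2, y2 = (int(v) for v in det[3:7] * np.array([w, h, w, h]))
            boxes.append((score, (x1, y1, x2, y2)))
    return boxes

# Synthetic tensor with one confident and one weak detection.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.92, 0.1, 0.1, 0.4, 0.5]
fake[0, 0, 1] = [0, 1, 0.12, 0.5, 0.5, 0.9, 0.9]
print(filter_detections(fake, w=300, h=300))
```

Only the first detection survives the default threshold; the weak one is discarded, mirroring the "confidence is above the minimum threshold" edge in the diagram.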
3 changes: 2 additions & 1 deletion docs/components/hde/index.md
@@ -1,7 +1,8 @@
# Deduplication

Deduplication Engine component of the HOPE ecosystem. It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
The Deduplication Engine is a component of the HOPE ecosystem.

--8<-- "components/hde/deduplication_description.md"

## Repository

Expand Down
7 changes: 6 additions & 1 deletion docs/components/hde/setup.md
@@ -1,3 +1,8 @@
---
tags:
- Deduplication
---

## Prerequisites

This project utilizes [PDM](https://pdm-project.org/) as the package manager for managing Python dependencies and environments.
@@ -78,7 +83,7 @@ This backend is used for storing locally downloaded DNN model files and encoded
##### FILE_STORAGE_DNN
This backend is dedicated to storing DNN model files. Ensure that the following two files are present in this storage:

1. *deploy.prototxt*: Defines the model architecture.
1. *deploy.prototxt.txt*: Defines the model architecture.
2. *res10_300x300_ssd_iter_140000.caffemodel*: Contains the pre-trained model weights.

The current process involves downloading the files from a [GitHub repository](https://github.com/sr6033/face-detection-with-OpenCV-and-DNN) and saving them to this specific Azure Blob Storage using the command `django-admin upgrade --with-dnn-setup`, or the specialized `django-admin dnnsetup` command.
Empty file.
Empty file.
Empty file.
3 changes: 2 additions & 1 deletion docs/components/hde/troubleshooting.md
@@ -2,4 +2,5 @@ If you encounter issues while running the service, the **admin panel** can be a

To efficiently track and monitor errors within the application, **Sentry** is integrated as the primary tool for error logging and alerting.

For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
!!! warning "Sentry environment"
For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
2 changes: 1 addition & 1 deletion docs/components/pg/.pages
@@ -1,4 +1,4 @@
nav:
- index.md
- setup.md
- Setup: setup
- Western Union: wu
2 changes: 1 addition & 1 deletion docs/components/pg/index.md
@@ -7,7 +7,7 @@ Each FSP can have a different way to interact with the payment gateway with thou

## Repository

Repo: <https://github.com/unicef/hope-payment-gateway>
> Repo: <https://github.com/unicef/hope-payment-gateway>


## HOPE / PG Integration API
1 change: 0 additions & 1 deletion docs/components/pg/setup.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/components/pg/setup/.pages
@@ -0,0 +1,4 @@
nav:
- index.md
- virtualenv.md
- docker.md
1 change: 1 addition & 0 deletions docs/components/pg/setup/docker.md
@@ -0,0 +1 @@
# Docker
Empty file.
37 changes: 37 additions & 0 deletions docs/components/pg/setup/virtualenv.md
@@ -0,0 +1,37 @@
# Virtualenv


### System Requirements

- Python 3.12
- [direnv](https://direnv.net/) - not mandatory but strongly recommended
- [pdm](https://pdm.fming.dev/2.9/)


**WARNING**
> Hope Payment Gateway implements a **security first** policy: default configuration values are "almost" production compliant.
>
> E.g. `DEBUG=False` or `SECURE_SSL_REDIRECT=True`.
>
> Be sure to run `./manage.py env --check` and `./manage.py env -g all` to check and display your configuration.



### 1. Clone repo and install requirements

```
git clone https://github.com/unicef/hope-payment-gateway
pdm venv create 3.12
pdm install
pdm venv activate in-project
pre-commit install
```

### 2. Configure your environment

Use `./manage.py env` to check which required (and optional) variables to set:

```
./manage.py env --check
```


### 3. Run upgrade to run migrations and initial setup

```
./manage.py upgrade
```

1 change: 1 addition & 0 deletions docs/components/reporting/.pages
@@ -1,4 +1,5 @@
nav:
- index.md
- setup
- glossary.md
- tmp.md
2 changes: 1 addition & 1 deletion docs/components/reporting/index.md
@@ -11,7 +11,7 @@ This component allows users to produce reports and keep them updated customizing

## Repository

<https://github.com/unicef/hope-country-report>
> Repo: <https://github.com/unicef/hope-country-report>


## Features
Empty file.
Empty file.
Empty file.