d2m #18

Merged
merged 6 commits on Oct 11, 2024
3 changes: 1 addition & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
.*
~*
__pycache__

!.github

__pycache__
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -1,3 +1,5 @@
!**/.pages
!.includes
_theme/.templates

__pycache__
4 changes: 2 additions & 2 deletions docs/components/.pages
@@ -3,8 +3,8 @@ nav:

- Aurora: aurora
- Kobo: kobo
- Deduplication: hde
- Country Report: reporting
- Payment Gateway: pg
- Country Report: reporting
- Deduplication: hde
- RapidPro: rapidpro
# - workspace.md
2 changes: 1 addition & 1 deletion docs/components/aurora/.pages
@@ -1,3 +1,3 @@
nav:
- index.md
- setup.md
- setup
2 changes: 1 addition & 1 deletion docs/components/aurora/index.md
@@ -18,4 +18,4 @@ The strengths of Aurora are:

## Repository

<https://github.com/unicef/hope-aurora>
> Repo: <https://github.com/unicef/hope-aurora>
7 changes: 7 additions & 0 deletions docs/components/aurora/setup/config.md
@@ -0,0 +1,7 @@
# Setup HOPE integration

- Add `aurora_token` in the user
- Add `aurora_server` in the Constance Config
- Fetch data from Aurora
- Associate Organizations to Business Areas
- Associate Projects to Programmes
19 changes: 19 additions & 0 deletions docs/components/aurora/setup/docker.md
@@ -0,0 +1,19 @@
# Build and use your docker

After you have cloned the repo, make sure you have Redis and PostgreSQL servers running on your machine:

```
export ADMIN_EMAIL=[email protected]
export ADMIN_PASSWORD=password
export DATABASE_URL=postgres://postgres:@127.0.0.1:5432/aurora
export CACHE_URL=redis://127.0.0.1:6379/1?client_class=django_redis.client.DefaultClient

cd docker
make build run
```


## Use provided compose.yml

```
docker compose up
```

Navigate to <http://localhost:8000/admin/> and log in using `[email protected]` / `password`.
@@ -15,39 +15,39 @@ Prerequisites:

## Create virtual environment

2. Checkout code
1. Checkout code

```
git clone https://github.com/unicef/hope-aurora
git config branch.autosetuprebase always
```

1. In the shell:
2. In the shell:

```
pdm venv create
pdm use
pdm venv activate
```

1. Check your virtualenv is properly created
3. Check your virtualenv is properly created

```pdm info```


1. Install the package
4. Install the package

```
pdm install
pdm run pre-commit install
```


1. Add `export PYTHONPATH="$PYTHONPATH:./src"`
5. Add `export PYTHONPATH="$PYTHONPATH:./src"`


1. Check your environment:
6. Check your environment:

`./manage.py env --check` and configure the missing variables.

@@ -57,7 +57,7 @@ Prerequisites:

```
./manage.py env --develop --config --pattern='export {key}={value}'
```

1. Run upgrade command to properly initialize the application:
7. Run upgrade command to properly initialize the application:

`./manage.py upgrade --admin-email ${ADMIN_EMAIL} --admin-password ${ADMIN_PASSWORD}`

@@ -83,37 +83,3 @@ echo "unset PS1" >> .envrc
The first time after you have created or modified the _.envrc_ file you will have to authorize it using:

```
direnv allow
```

# Run

To start working with Aurora you can:


### Build and use your docker

After you have cloned the repo, make sure you have Redis and PostgreSQL servers running on your machine:

```
export ADMIN_EMAIL=[email protected]
export ADMIN_PASSWORD=password
export DATABASE_URL=postgres://postgres:@127.0.0.1:5432/aurora
export CACHE_URL=redis://127.0.0.1:6379/1?client_class=django_redis.client.DefaultClient

cd docker
make build run
```


### Use provided compose.yml

```
docker compose up
```

Navigate to <http://localhost:8000/admin/> and log in using `[email protected]` / `password`.


### Setup HOPE integration

- Add `aurora_token` in the user
- Add `aurora_server` in the Constance Config
- Fetch data from Aurora
- Associate Organizations to Business Areas
- Associate Projects to Programmes
1 change: 1 addition & 0 deletions docs/components/hde/deduplication_description.md
@@ -0,0 +1 @@
It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
1 change: 1 addition & 0 deletions docs/components/hde/development.md
@@ -3,6 +3,7 @@
To develop the service locally, you can utilize the provided `compose.yml` file. This configuration file defines all the necessary services, including the primary application and its dependencies, to create a consistent development environment. By using **Docker Compose**, you can effortlessly spin up the entire application stack, ensuring that all components work seamlessly together.

To build and start the service, along with its dependencies, run the following command:

```
docker compose up --build
```


85 changes: 62 additions & 23 deletions docs/components/hde/did/workflow.md
@@ -1,47 +1,86 @@
The Image Processing and Duplicate Detection workflow is designed to provide reliable face detection, recognition, and duplicate detection by leveraging a pre-trained deep learning model.
---
tags:
- Deduplication
---

# Image Processing and Duplicate Detection

The workflow uses pre-trained models from [OpenCV](https://opencv.org/) for face detection and [dlib](http://dlib.net/) for face recognition and landmark detection. This setup provides a fast, reliable solution for real-time applications, without requiring the training of models from scratch. OpenCV handles face detection using a Caffe-based model, while **dlib**, accessed through the [face_recognition](https://pypi.org/project/face-recognition/) library, manages recognition and duplicate identification.

Future updates will involve custom-trained models to further improve performance.

## Inference Mode Operation

This application operates strictly in inference mode, which means that it does not perform training but instead relies on a pre-trained model for face recognition tasks. This mode ensures that the application can rapidly deploy face recognition capabilities without the computational cost or time required for training models from scratch.
This application operates entirely in inference mode, relying on pre-trained models for both face detection and recognition tasks. **OpenCV** handles face detection, and **face_recognition**, a Python wrapper for **dlib**, performs face recognition and duplicate identification. This approach ensures efficient, real-time processing without the need for additional training, allowing the application to quickly deploy its capabilities.

- **OpenCV**: Optimized for fast face detection, ideal for real-time image and video applications.
- **dlib's face_recognition**: Focuses on generating face embeddings for comparison, providing high accuracy in identification.

By combining OpenCV for detection and dlib for recognition, the system offers a balance of speed and precision.

### Pre-Trained Models Storage

- **OpenCV** uses a pre-trained [Caffe model](https://caffe.berkeleyvision.org/) stored in Azure Blob Storage, automatically downloaded at application startup.
- **face_recognition** utilizes a pre-trained [dlib model](https://pypi.org/project/face_recognition_models/) stored locally within the container’s library directory.

Administrators can manually update the **Caffe model** via the admin panel, allowing flexible updates or new model versions without altering the application code.

---

## Face Detection and Recognition Models

### Pre-Trained Model Usage.
### OpenCV Model Details

The pre-trained model is stored in Azure Blob Storage and is automatically downloaded by the application when it starts. This process ensures that the latest version of the model is always available for inference.
### Manual Model Update.
OpenCV powers the face detection component using a pre-trained model designed for real-time performance.

In addition to automatic loading, administrators have the option to manually update the model through the admin panel. This feature provides flexibility for applying updates or new models when improvements or changes are required without modifying the underlying code.
#### Model Components

## Model Details
- **deploy.prototxt**: Defines the network architecture and parameters for model execution.
- **res10_300x300_ssd_iter_140000.caffemodel**: Contains trained weights, generated after 140,000 iterations using the **Caffe** framework.

The face recognition capabilities are powered by the [OpenCV](https://github.com/opencv/opencv) library. Currently, the application utilizes an open-source, pre-trained model specifically designed for face detection.
#### Model Architecture

### Model Components
- **Res10 Architecture**: A lightweight model that balances speed and accuracy, perfect for real-time detection.
- **300x300 Input Resolution**: Optimized for face detection at this resolution, ensuring a balance between detail and efficiency.
- **SSD (Single Shot MultiBox Detector)**: A method that predicts bounding boxes and confidence scores in a single pass, allowing rapid detection of multiple faces in a single image.

- **deploy.prototxt**: This file defines the model architecture, including the network layers and the specific parameters used for each layer. It serves as a blueprint that guides how the model processes input data.
- **res10_300x300_ssd_iter_140000.caffemodel**: This file contains the trained weights of the model. It was trained using the **Caffe** deep learning framework, with a total of 140,000 iterations, ensuring robustness in face detection tasks.
### Dlib Model Details

### Model Architecture
The **dlib** models used for recognition and facial landmark detection include:

- The model follows the **Res10** architecture, which is known for its efficiency in detecting faces. Res10 is a lightweight model that balances speed and accuracy, making it suitable for real-time applications.
- The model operates with a fixed input resolution of **300x300**, optimizing detection for faces within that scale. This resolution offers a compromise between detail and processing efficiency, allowing the model to quickly identify facial features without excessive computational load.
- SSD Methodology. The model utilizes the **Single Shot MultiBox Detector (SSD)** methodology, which is a popular approach for object detection. SSD is designed to predict both the bounding boxes and the confidence scores for each object in a single forward pass through the network. By leveraging the SSD approach, the model can efficiently detect multiple faces in a single image, making it suitable for batch processing and applications where rapid detection is required.
1. **dlib_face_recognition_resnet_model_v1.dat**

A modified **ResNet-34** model generating **128-dimensional face embeddings** for face recognition, achieving **99.38% accuracy** on the LFW benchmark.

## Worklow Diagram
2. **mmod_human_face_detector.dat**
A **CNN-based Max-Margin Object Detector (MMOD)** for accurate face detection, especially under difficult conditions like varied orientations or lighting.

The workflow diagram illustrates the overall process of Image Processing and Duplicate Detection within the system, showcasing how different components interact to achieve **face detection**, **recognition**, and **duplicate identification**.
3. **shape_predictor_5_face_landmarks.dat**
Detects **5 key facial landmarks** (eye corners and nose base), optimized for fast face alignment.

4. **shape_predictor_68_face_landmarks.dat**
Detects **68 facial landmarks** (eyes, nose, mouth, jawline), used for more detailed facial alignment and analysis.
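The duplicate check on these 128-dimensional embeddings is a plain Euclidean distance comparison. As an illustrative numpy sketch (the function names are ours, and `0.6` is only an assumed default tolerance; the service uses its configurable face distance threshold):

```python
import numpy as np

def face_distance(known_encodings, candidate) -> np.ndarray:
    # Euclidean distance between the candidate embedding and each known one.
    return np.linalg.norm(np.asarray(known_encodings) - np.asarray(candidate), axis=1)

def find_duplicates(known_encodings, candidate, threshold: float = 0.6):
    # Indices of known encodings closer than the threshold (likely duplicates).
    distances = face_distance(known_encodings, candidate)
    return [i for i, d in enumerate(distances) if d < threshold]

# Synthetic 128-d embeddings: the first is nearly identical to the candidate.
rng = np.random.default_rng(0)
base = rng.normal(size=128)
known = [base + 0.001, rng.normal(size=128)]
print(find_duplicates(known, base))  # → [0]
```

Lower distances mean more similar faces; pairs under the threshold are reported as duplicates.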

---

## Workflow Diagram

The workflow diagram illustrates the overall process of image processing and duplicate detection. **OpenCV** is used for face detection, while **face_recognition** (built on **dlib**) handles face recognition and duplicate identification.

```mermaid
flowchart LR
subgraph DNNManager[DNN Manager]
direction TB
load_model[Load Model] -- computation <a href="../config/#dnn_backend">backend</a>\ntarget <a href="../config/#dnn_target">device</a> --> set_preferences[Set Preferences]
end

subgraph ImageProcessing[Image Processing]
direction LR

subgraph FaceDetection[Face Detection]

subgraph DNNManager[DNN Manager]
direction TB
load_model[Load Caffe Model] -- computation <a href="../config/#dnn_backend">backend</a>\ntarget <a href="../config/#dnn_target">device</a> --> set_preferences[Set Preferences]
end

DNNManager --> run_model

direction TB
load_image[Load Image] -- decoded image as 3D numpy array\n(height, width, channels of Blue-Green-Red color space) --> prepare_image[Prepare Image] -- blob 4D tensor\n(normalized size, use <a href="../config/#blob_from_image_scale_factor">scale factor</a> and <a href="../config/#blob_from_image_mean_values">means</a>) --> run_model[Run Model] -- shape (1, 1, N, 7),\n1 image\nN is the number of detected faces\neach face is described by the 7 detection values --> filter_results[Filter Results] -- <a href="../config/#face_detection_confidence">confidence</a> is above the minimum threshold,\n<a href="../config/#nms_threshold">NMS</a> to suppress overlapping bounding boxes --> return_detections[Return Detections]
end
@@ -57,7 +96,7 @@ flowchart LR
load_encodings[Load Encodings] --> compare_encodings[Compare Encodings] -- face distance less than <a href="../config/#face_distance_threshold">threshold</a> --> return_duplicates[Return Duplicates]
end

DNNManager --> ImageProcessing --> DuplicateFinder
ImageProcessing --> DuplicateFinder
FaceDetection --> FaceRecognition

```
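The "Filter Results" stage in the diagram can be illustrated with a small numpy sketch. This is a hedged example, not the service's actual code (the real pipeline also applies NMS to suppress overlapping boxes): given the model's `(1, 1, N, 7)` output, keep the rows whose confidence clears the threshold and scale the normalized corners to pixel coordinates.

```python
import numpy as np

def filter_detections(detections: np.ndarray, w: int, h: int, confidence: float = 0.5):
    """Filter the raw SSD output tensor of shape (1, 1, N, 7).

    Each of the N rows is [image_id, class_id, confidence, x1, y1, x2, y2],
    with box corners normalized to [0, 1].
    """
    boxes = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score >= confidence:
            # Scale normalized corners to pixel coordinates of the input image.
            x1, y1, x2, y2 = (int(v) for v in det[3:7] * np.array([w, h, w, h]))
            boxes.append((score, (x1, y1, x2, y2)))
    return boxes

# Synthetic tensor with one confident and one weak detection.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.92, 0.1, 0.1, 0.4, 0.5]
fake[0, 0, 1] = [0, 1, 0.12, 0.5, 0.5, 0.9, 0.9]
print(filter_detections(fake, w=300, h=300))
```

Only the first detection survives the default threshold; the weak one is discarded, mirroring the "confidence is above the minimum threshold" edge in the diagram.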
3 changes: 2 additions & 1 deletion docs/components/hde/index.md
@@ -1,7 +1,8 @@
# Deduplication

Deduplication Engine component of the HOPE ecosystem. It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
The Deduplication Engine is a component of the HOPE ecosystem.

--8<-- "components/hde/deduplication_description.md"

## Repository

Expand Down
7 changes: 6 additions & 1 deletion docs/components/hde/setup.md
@@ -1,3 +1,8 @@
---
tags:
- Deduplication
---

## Prerequisites

This project utilizes [PDM](https://pdm-project.org/) as the package manager for managing Python dependencies and environments.
@@ -78,7 +83,7 @@ This backend is used for storing locally downloaded DNN model files and encoded
##### FILE_STORAGE_DNN
This backend is dedicated to storing DNN model files. Ensure that the following two files are present in this storage:

1. *deploy.prototxt*: Defines the model architecture.
1. *deploy.prototxt.txt*: Defines the model architecture.
2. *res10_300x300_ssd_iter_140000.caffemodel*: Contains the pre-trained model weights.

The current process involves downloading the files from a [GitHub repository](https://github.com/sr6033/face-detection-with-OpenCV-and-DNN) and saving them to this specific Azure Blob Storage using the command `django-admin upgrade --with-dnn-setup`, or the specialized `django-admin dnnsetup` command.
Empty file.
Empty file.
Empty file.
3 changes: 2 additions & 1 deletion docs/components/hde/troubleshooting.md
@@ -2,4 +2,5 @@ If you encounter issues while running the service, the **admin panel** can be a

To efficiently track and monitor errors within the application, **Sentry** is integrated as the primary tool for error logging and alerting.

For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
!!! warning "Sentry environment"
For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
2 changes: 1 addition & 1 deletion docs/components/pg/.pages
@@ -1,4 +1,4 @@
nav:
- index.md
- setup.md
- Setup: setup
- Western Union: wu
2 changes: 1 addition & 1 deletion docs/components/pg/index.md
@@ -7,7 +7,7 @@ Each FSP can have a different way to interact with the payment gateway with thou

## Repository

Repo: <https://github.com/unicef/hope-payment-gateway>
> Repo: <https://github.com/unicef/hope-payment-gateway>


## HOPE / PG Integration API
1 change: 0 additions & 1 deletion docs/components/pg/setup.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/components/pg/setup/.pages
@@ -0,0 +1,4 @@
nav:
- index.md
- virtualenv.md
- docker.md
1 change: 1 addition & 0 deletions docs/components/pg/setup/docker.md
@@ -0,0 +1 @@
# Docker
Empty file.
37 changes: 37 additions & 0 deletions docs/components/pg/setup/virtualenv.md
@@ -0,0 +1,37 @@
# Virtualenv


### System Requirements

- Python 3.12
- [direnv](https://direnv.net/) - not mandatory but strongly recommended
- [pdm](https://pdm.fming.dev/2.9/)


**WARNING**
> Hope Payment Gateway implements a **security first** policy: default configuration values are "almost" production compliant.
>
> E.g. `DEBUG=False` or `SECURE_SSL_REDIRECT=True`.
>
> Be sure to run `./manage.py env --check` and `./manage.py env -g all` to check and display your configuration.



### 1. Clone repo and install requirements

```
git clone https://github.com/unicef/hope-payment-gateway
pdm venv create 3.12
pdm install
pdm venv activate in-project
pre-commit install
```

### 2. Configure your environment

Use `./manage.py env` to check which required (and optional) variables to set:

```
./manage.py env --check
```


### 3. Run upgrade to run migrations and initial setup

```
./manage.py upgrade
```

1 change: 1 addition & 0 deletions docs/components/reporting/.pages
@@ -1,4 +1,5 @@
nav:
- index.md
- setup
- glossary.md
- tmp.md
2 changes: 1 addition & 1 deletion docs/components/reporting/index.md
@@ -11,7 +11,7 @@ This component allows users to produce reports and keep them updated customizing

## Repository

<https://github.com/unicef/hope-country-report>
> Repo: <https://github.com/unicef/hope-country-report>


## Features
Empty file.
Empty file.
Empty file.