Merge pull request #211 from googlesamples/face-landmarker-pi-sample

Face Landmarker sample for Raspberry Pi

PaulTR committed Aug 18, 2023
2 parents ea9d5d3 + e84640e commit 93f7378
Showing 9 changed files with 356 additions and 4 deletions.
67 changes: 67 additions & 0 deletions examples/face_landmarker/raspberry_pi/README.md
@@ -0,0 +1,67 @@
# MediaPipe Face Landmarker example with Raspberry Pi

This example uses [MediaPipe](https://github.com/google/mediapipe) with Python on
a Raspberry Pi to perform real-time face landmark detection using images
streamed from the camera.

## Set up your hardware

Before you begin, you need to
[set up your Raspberry Pi](https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up)
with the 64-bit Raspberry Pi OS (preferably updated to Buster).

You also need to [connect and configure the Pi Camera](
https://www.raspberrypi.org/documentation/configuration/camera.md) if you use
the Pi Camera. This code also works with a USB camera connected to the
Raspberry Pi.

To see the results from the camera, you need a monitor connected to the
Raspberry Pi. It's fine to use SSH to access the Pi shell (you don't need a
keyboard connected to the Pi), but you still need a monitor attached to the
Pi to see the camera stream.

## Install MediaPipe

You can install the required dependencies using the `setup.sh` script provided with this project.
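
The only Python dependency this example lists in `requirements.txt` is `mediapipe`, so a minimal manual equivalent of the script's install step (the script additionally downloads the model file, as shown in the next section) would be:

```
python3 -m pip install pip --upgrade
python3 -m pip install mediapipe
```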

## Download the examples repository

First, clone this Git repo onto your Raspberry Pi.
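
For example (the repository URL below is an assumption; substitute the remote you are actually cloning from):

```
git clone https://github.com/googlesamples/mediapipe.git
```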

Run this script to install the required dependencies and download the task file:

```
cd mediapipe/examples/face_landmarker/raspberry_pi
sh setup.sh
```

## Run the example
```
python3 detect.py
```
* You can optionally specify the `model` parameter to set the task file to be used:
* The default value is `face_landmarker.task`
* TensorFlow Lite face landmarker models **with metadata**
* Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/face_landmarker/index#models)
* You can optionally specify the `numFaces` parameter to set the maximum
  number of faces that can be detected by the landmarker:
* Supported value: A positive integer.
* Default value: `1`
* You can optionally specify the `minFaceDetectionConfidence` parameter to adjust the
minimum confidence score for face detection to be considered successful:
* Supported value: A floating-point number.
* Default value: `0.5`
* You can optionally specify the `minFacePresenceConfidence` parameter to adjust the
  minimum confidence score of face presence in the face landmark detection:
* Supported value: A floating-point number.
* Default value: `0.5`
* You can optionally specify the `minTrackingConfidence` parameter to adjust the
minimum confidence score for the face tracking to be considered successful:
* Supported value: A floating-point number.
* Default value: `0.5`
* Example usage:
```
python3 detect.py \
--model face_landmarker.task \
--numFaces 2 \
--minFaceDetectionConfidence 0.5
```
276 changes: 276 additions & 0 deletions examples/face_landmarker/raspberry_pi/detect.py
@@ -0,0 +1,276 @@
# Copyright 2023 The MediaPipe Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Main scripts to run face landmarker."""

import argparse
import sys
import time

import cv2
import mediapipe as mp

from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.framework.formats import landmark_pb2

mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Global variables to calculate FPS
COUNTER, FPS = 0, 0
START_TIME = time.time()
DETECTION_RESULT = None
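# DETECTION_RESULT is written by save_result() (which MediaPipe invokes
# asynchronously in LIVE_STREAM mode) and read by the main loop below.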


def run(model: str, num_faces: int,
min_face_detection_confidence: float,
min_face_presence_confidence: float, min_tracking_confidence: float,
camera_id: int, width: int, height: int) -> None:
"""Continuously run inference on images acquired from the camera.
Args:
model: Name of the face landmarker model bundle.
num_faces: Max number of faces that can be detected by the landmarker.
min_face_detection_confidence: The minimum confidence score for face
detection to be considered successful.
min_face_presence_confidence: The minimum confidence score of face
presence score in the face landmark detection.
min_tracking_confidence: The minimum confidence score for the face
tracking to be considered successful.
camera_id: The camera id to be passed to OpenCV.
width: The width of the frame captured from the camera.
height: The height of the frame captured from the camera.
"""

# Start capturing video input from the camera
cap = cv2.VideoCapture(camera_id)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

# Visualization parameters
row_size = 50 # pixels
left_margin = 24 # pixels
text_color = (0, 0, 0) # black
font_size = 1
font_thickness = 1
fps_avg_frame_count = 10

# Label box parameters
label_background_color = (255, 255, 255) # White
label_padding_width = 1500 # pixels

def save_result(result: vision.FaceLandmarkerResult,
unused_output_image: mp.Image, timestamp_ms: int):
global FPS, COUNTER, START_TIME, DETECTION_RESULT

# Calculate the FPS
if COUNTER % fps_avg_frame_count == 0:
FPS = fps_avg_frame_count / (time.time() - START_TIME)
START_TIME = time.time()

DETECTION_RESULT = result
COUNTER += 1

# Initialize the face landmarker model
base_options = python.BaseOptions(model_asset_path=model)
options = vision.FaceLandmarkerOptions(
base_options=base_options,
running_mode=vision.RunningMode.LIVE_STREAM,
num_faces=num_faces,
min_face_detection_confidence=min_face_detection_confidence,
min_face_presence_confidence=min_face_presence_confidence,
min_tracking_confidence=min_tracking_confidence,
output_face_blendshapes=True,
result_callback=save_result)
detector = vision.FaceLandmarker.create_from_options(options)

# Continuously capture images from the camera and run inference
while cap.isOpened():
success, image = cap.read()
if not success:
sys.exit(
'ERROR: Unable to read from webcam. Please verify your webcam settings.'
)

image = cv2.flip(image, 1)

# Convert the image from BGR to RGB as required by the TFLite model.
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)

# Run face landmarker using the model.
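    # detect_async() expects a monotonically increasing timestamp in
    # milliseconds; results arrive asynchronously in save_result() via the
    # result_callback registered in FaceLandmarkerOptions above.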
detector.detect_async(mp_image, time.time_ns() // 1_000_000)

# Show the FPS
fps_text = 'FPS = {:.1f}'.format(FPS)
text_location = (left_margin, row_size)
current_frame = image
cv2.putText(current_frame, fps_text, text_location,
cv2.FONT_HERSHEY_DUPLEX,
font_size, text_color, font_thickness, cv2.LINE_AA)

if DETECTION_RESULT:
# Draw landmarks.
for face_landmarks in DETECTION_RESULT.face_landmarks:
face_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        face_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(
                x=landmark.x, y=landmark.y, z=landmark.z)
            for landmark in face_landmarks
        ])
mp_drawing.draw_landmarks(
image=current_frame,
landmark_list=face_landmarks_proto,
connections=mp_face_mesh.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_tesselation_style())
mp_drawing.draw_landmarks(
image=current_frame,
landmark_list=face_landmarks_proto,
connections=mp_face_mesh.FACEMESH_CONTOURS,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_contours_style())
mp_drawing.draw_landmarks(
image=current_frame,
landmark_list=face_landmarks_proto,
connections=mp_face_mesh.FACEMESH_IRISES,
landmark_drawing_spec=None,
connection_drawing_spec=mp.solutions.drawing_styles
.get_default_face_mesh_iris_connections_style())

# Expand the right side frame to show the blendshapes.
current_frame = cv2.copyMakeBorder(current_frame, 0, 0, 0,
label_padding_width,
cv2.BORDER_CONSTANT, None,
label_background_color)

if DETECTION_RESULT:
# Define parameters for the bars and text
      # Starting X-coordinate (20 as a margin)
      legend_x = current_frame.shape[1] - label_padding_width + 20
legend_y = 30 # Starting Y-coordinate
bar_max_width = label_padding_width - 40 # Max width of the bar with some margin
bar_height = 8 # Height of the bar
gap_between_bars = 5 # Gap between two bars
text_gap = 5 # Gap between the end of the text and the start of the bar

face_blendshapes = DETECTION_RESULT.face_blendshapes

if face_blendshapes:
for idx, category in enumerate(face_blendshapes[0]):
category_name = category.category_name
score = round(category.score, 2)

# Prepare text and get its width
text = "{} ({:.2f})".format(category_name, score)
(text_width, _), _ = cv2.getTextSize(text,
cv2.FONT_HERSHEY_SIMPLEX,
0.4, 1)

# Display the blendshape name and score
cv2.putText(current_frame, text,
(legend_x, legend_y + (bar_height // 2) + 5),
# Position adjusted for vertical centering
cv2.FONT_HERSHEY_SIMPLEX,
0.4, # Font size
(0, 0, 0), # Black color
1,
cv2.LINE_AA) # Thickness

# Calculate bar width based on score
bar_width = int(bar_max_width * score)

# Draw the bar to the right of the text
cv2.rectangle(current_frame,
(legend_x + text_width + text_gap, legend_y),
(legend_x + text_width + text_gap + bar_width,
legend_y + bar_height),
(0, 255, 0), # Green color
-1) # Filled bar

# Update the Y-coordinate for the next bar
legend_y += (bar_height + gap_between_bars)

cv2.imshow('face_landmarker', current_frame)

# Stop the program if the ESC key is pressed.
if cv2.waitKey(1) == 27:
break

detector.close()
cap.release()
cv2.destroyAllWindows()


def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument(
'--model',
help='Name of face landmarker model.',
required=False,
default='face_landmarker.task')
parser.add_argument(
'--numFaces',
help='Max number of faces that can be detected by the landmarker.',
required=False,
default=1)
parser.add_argument(
'--minFaceDetectionConfidence',
help='The minimum confidence score for face detection to be considered '
'successful.',
required=False,
default=0.5)
parser.add_argument(
'--minFacePresenceConfidence',
help='The minimum confidence score of face presence score in the face '
'landmark detection.',
required=False,
default=0.5)
parser.add_argument(
'--minTrackingConfidence',
help='The minimum confidence score for the face tracking to be '
'considered successful.',
required=False,
default=0.5)
  # Finding the camera ID is highly platform-dependent. One common approach
  # relies on the fact that camera IDs are usually indexed sequentially by the
  # OS, starting from 0. Using OpenCV, you can create a VideoCapture object for
  # each potential ID with 'cap = cv2.VideoCapture(i)'; if 'cap' is None or
  # 'cap.isOpened()' is False, that camera ID is not available.
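  #
  # As an illustration only (not used by this script), a sketch of that
  # probing approach:
  #
  #   def list_available_cameras(max_ids=5):
  #     available = []
  #     for i in range(max_ids):
  #       cap = cv2.VideoCapture(i)
  #       if cap is not None and cap.isOpened():
  #         available.append(i)
  #       cap.release()
  #     return available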
parser.add_argument(
'--cameraId', help='Id of camera.', required=False, default=0)
parser.add_argument(
'--frameWidth',
help='Width of frame to capture from camera.',
required=False,
default=1280)
parser.add_argument(
'--frameHeight',
help='Height of frame to capture from camera.',
required=False,
default=960)
args = parser.parse_args()

  run(args.model, int(args.numFaces), float(args.minFaceDetectionConfidence),
      float(args.minFacePresenceConfidence), float(args.minTrackingConfidence),
      int(args.cameraId), int(args.frameWidth), int(args.frameHeight))


if __name__ == '__main__':
main()
1 change: 1 addition & 0 deletions examples/face_landmarker/raspberry_pi/requirements.txt
@@ -0,0 +1 @@
mediapipe
5 changes: 5 additions & 0 deletions examples/face_landmarker/raspberry_pi/setup.sh
@@ -0,0 +1,5 @@
# Install Python dependencies.
python3 -m pip install pip --upgrade
python3 -m pip install -r requirements.txt

wget -O face_landmarker.task -q https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task
2 changes: 1 addition & 1 deletion examples/gesture_recognizer/raspberry_pi/README.md
@@ -44,7 +44,7 @@ python3 recognize.py
* Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#models)
* Custom models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#custom_models) are supported.
* You can optionally specify the `numHands` parameter to the maximum
-  number of hands can be detected by the recognizer:
+  number of hands that can be detected by the recognizer:
* Supported value: A positive integer (1-2)
* Default value: `1`
* You can optionally specify the `minHandDetectionConfidence` parameter to adjust the
2 changes: 1 addition & 1 deletion examples/gesture_recognizer/raspberry_pi/recognize.py
@@ -197,7 +197,7 @@ def main():
default='gesture_recognizer.task')
parser.add_argument(
'--numHands',
-    help='Max number of hands can be detected by the recognizer.',
+    help='Max number of hands that can be detected by the recognizer.',
required=False,
default=1)
parser.add_argument(
2 changes: 1 addition & 1 deletion examples/image_classification/raspberry_pi/README.md
@@ -44,7 +44,7 @@ python3 classify.py
* TensorFlow Lite image classification models **with metadata**
* Models from [TensorFlow Hub](https://tfhub.dev/tensorflow/collections/lite/task-library/image-classifier/1)
* Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/image_classifier/index#models)
- * Models trained with [TensorFlow Lite Model Maker](https://developers.google.com/mediapipe/solutions/customization/image_classifier) are supported.
+ * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/image_classifier) are supported.
* You can optionally specify the `maxResults` parameter to limit the list of
classification results:
* Supported value: A positive integer.
2 changes: 1 addition & 1 deletion examples/object_detection/raspberry_pi/README.md
@@ -56,7 +56,7 @@ visualization.
* The default value is `efficientdet_lite0.tflite`
* TensorFlow Lite object detection models **with metadata**
* Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/object_detector/index#models)
- * Models trained with [TensorFlow Lite Model Maker](https://developers.google.com/mediapipe/solutions/customization/object_detector) are supported.
+ * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/object_detector) are supported.
* You can optionally specify the `maxResults` parameter to limit the list of
detection results:
* Supported value: A positive integer.
3 changes: 3 additions & 0 deletions examples/text_classification/raspberry_pi/README.md
@@ -39,6 +39,9 @@ python3 classify.py --inputText "Your text goes here"
* You can optionally specify the `model` parameter to set the TensorFlow Lite
model to be used:
* The default value is `classifier.tflite`
+ * TensorFlow Lite text classification models **with metadata**
+ * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/text/text_classifier/index#models)
+ * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/text_classifier) are supported.
* Example usage:
```
python3 classify.py \
