diff --git a/examples/face_landmarker/raspberry_pi/README.md b/examples/face_landmarker/raspberry_pi/README.md
new file mode 100644
index 00000000..f9eee5ed
--- /dev/null
+++ b/examples/face_landmarker/raspberry_pi/README.md
@@ -0,0 +1,67 @@
+# MediaPipe Face Landmarker example with Raspberry Pi
+
+This example uses [MediaPipe](https://github.com/google/mediapipe) with Python on
+a Raspberry Pi to perform real-time face landmark detection using images
+streamed from the camera.
+
+## Set up your hardware
+
+Before you begin, you need to
+[set up your Raspberry Pi](https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up)
+with a 64-bit Raspberry Pi OS (preferably updated to Buster).
+
+You also need to [connect and configure the Pi Camera](
+https://www.raspberrypi.org/documentation/configuration/camera.md) if you use
+the Pi Camera. This code also works with a USB camera connected to the Pi.
+
+To see the results from the camera, you need a monitor connected
+to the Raspberry Pi. It's okay if you're using SSH to access the Pi shell
+(you don't need to use a keyboard connected to the Pi); you only need a monitor
+attached to the Pi to see the camera stream.
+
+## Install MediaPipe
+
+You can install the required dependencies using the setup.sh script provided with this project.
+
+## Download the examples repository
+
+First, clone this Git repo onto your Raspberry Pi.
+
+Run this script to install the required dependencies and download the task file:
+
+```
+cd mediapipe/examples/face_landmarker/raspberry_pi
+sh setup.sh
+```
+
+## Run the example
+```
+python3 detect.py
+```
+* You can optionally specify the `model` parameter to set the task file to be used:
+  * The default value is `face_landmarker.task`
+  * TensorFlow Lite face landmarker models **with metadata**
+  * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/face_landmarker/index#models)
+* You can optionally specify the `numFaces` parameter to set the maximum
+  number of faces that can be detected by the landmarker:
+  * Supported value: A positive integer.
+  * Default value: `1`
+* You can optionally specify the `minFaceDetectionConfidence` parameter to adjust the
+  minimum confidence score for face detection to be considered successful:
+  * Supported value: A floating-point number.
+  * Default value: `0.5`
+* You can optionally specify the `minFacePresenceConfidence` parameter to adjust the
+  minimum confidence score of face presence in the face landmark detection:
+  * Supported value: A floating-point number.
+  * Default value: `0.5`
+* You can optionally specify the `minTrackingConfidence` parameter to adjust the
+  minimum confidence score for the face tracking to be considered successful:
+  * Supported value: A floating-point number.
+  * Default value: `0.5`
+* Example usage:
+  ```
+  python3 detect.py \
+    --model face_landmarker.task \
+    --numFaces 2 \
+    --minFaceDetectionConfidence 0.5
+  ```
diff --git a/examples/face_landmarker/raspberry_pi/detect.py b/examples/face_landmarker/raspberry_pi/detect.py
new file mode 100644
index 00000000..df3c948d
--- /dev/null
+++ b/examples/face_landmarker/raspberry_pi/detect.py
@@ -0,0 +1,276 @@
+# Copyright 2023 The MediaPipe Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Main script to run face landmarker."""
+
+import argparse
+import sys
+import time
+
+import cv2
+import mediapipe as mp
+
+from mediapipe.tasks import python
+from mediapipe.tasks.python import vision
+from mediapipe.framework.formats import landmark_pb2
+
+mp_face_mesh = mp.solutions.face_mesh
+mp_drawing = mp.solutions.drawing_utils
+mp_drawing_styles = mp.solutions.drawing_styles
+
+# Global variables to calculate FPS
+COUNTER, FPS = 0, 0
+START_TIME = time.time()
+DETECTION_RESULT = None
+
+
+def run(model: str, num_faces: int,
+        min_face_detection_confidence: float,
+        min_face_presence_confidence: float, min_tracking_confidence: float,
+        camera_id: int, width: int, height: int) -> None:
+    """Continuously run inference on images acquired from the camera.
+
+    Args:
+        model: Name of the face landmarker model bundle.
+        num_faces: Max number of faces that can be detected by the landmarker.
+        min_face_detection_confidence: The minimum confidence score for face
+            detection to be considered successful.
+        min_face_presence_confidence: The minimum confidence score of face
+            presence in the face landmark detection.
+        min_tracking_confidence: The minimum confidence score for the face
+            tracking to be considered successful.
+        camera_id: The camera id to be passed to OpenCV.
+        width: The width of the frame captured from the camera.
+        height: The height of the frame captured from the camera.
+    """
+
+    # Start capturing video input from the camera
+    cap = cv2.VideoCapture(camera_id)
+    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
+    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
+
+    # Visualization parameters
+    row_size = 50  # pixels
+    left_margin = 24  # pixels
+    text_color = (0, 0, 0)  # black
+    font_size = 1
+    font_thickness = 1
+    fps_avg_frame_count = 10
+
+    # Label box parameters
+    label_background_color = (255, 255, 255)  # White
+    label_padding_width = 1500  # pixels
+
+    def save_result(result: vision.FaceLandmarkerResult,
+                    unused_output_image: mp.Image, timestamp_ms: int):
+        global FPS, COUNTER, START_TIME, DETECTION_RESULT
+
+        # Calculate the FPS
+        if COUNTER % fps_avg_frame_count == 0:
+            FPS = fps_avg_frame_count / (time.time() - START_TIME)
+            START_TIME = time.time()
+
+        DETECTION_RESULT = result
+        COUNTER += 1
+
+    # Initialize the face landmarker model
+    base_options = python.BaseOptions(model_asset_path=model)
+    options = vision.FaceLandmarkerOptions(
+        base_options=base_options,
+        running_mode=vision.RunningMode.LIVE_STREAM,
+        num_faces=num_faces,
+        min_face_detection_confidence=min_face_detection_confidence,
+        min_face_presence_confidence=min_face_presence_confidence,
+        min_tracking_confidence=min_tracking_confidence,
+        output_face_blendshapes=True,
+        result_callback=save_result)
+    detector = vision.FaceLandmarker.create_from_options(options)
+
+    # Continuously capture images from the camera and run inference
+    while cap.isOpened():
+        success, image = cap.read()
+        if not success:
+            sys.exit(
+                'ERROR: Unable to read from webcam. Please verify your webcam settings.'
+            )
+
+        image = cv2.flip(image, 1)
+
+        # Convert the image from BGR to RGB as required by the TFLite model.
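+        # mp.Image wraps the RGB frame for the MediaPipe Tasks API. In
+        # LIVE_STREAM mode, detect_async() returns immediately and delivers
+        # results to the save_result callback registered above; the timestamp
+        # must be a monotonically increasing value in milliseconds.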
+        rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)
+
+        # Run face landmarker using the model.
+        detector.detect_async(mp_image, time.time_ns() // 1_000_000)
+
+        # Show the FPS
+        fps_text = 'FPS = {:.1f}'.format(FPS)
+        text_location = (left_margin, row_size)
+        current_frame = image
+        cv2.putText(current_frame, fps_text, text_location,
+                    cv2.FONT_HERSHEY_DUPLEX,
+                    font_size, text_color, font_thickness, cv2.LINE_AA)
+
+        if DETECTION_RESULT:
+            # Draw landmarks.
+            for face_landmarks in DETECTION_RESULT.face_landmarks:
+                face_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
+                face_landmarks_proto.landmark.extend([
+                    landmark_pb2.NormalizedLandmark(
+                        x=landmark.x, y=landmark.y, z=landmark.z)
+                    for landmark in face_landmarks
+                ])
+                mp_drawing.draw_landmarks(
+                    image=current_frame,
+                    landmark_list=face_landmarks_proto,
+                    connections=mp_face_mesh.FACEMESH_TESSELATION,
+                    landmark_drawing_spec=None,
+                    connection_drawing_spec=mp.solutions.drawing_styles
+                    .get_default_face_mesh_tesselation_style())
+                mp_drawing.draw_landmarks(
+                    image=current_frame,
+                    landmark_list=face_landmarks_proto,
+                    connections=mp_face_mesh.FACEMESH_CONTOURS,
+                    landmark_drawing_spec=None,
+                    connection_drawing_spec=mp.solutions.drawing_styles
+                    .get_default_face_mesh_contours_style())
+                mp_drawing.draw_landmarks(
+                    image=current_frame,
+                    landmark_list=face_landmarks_proto,
+                    connections=mp_face_mesh.FACEMESH_IRISES,
+                    landmark_drawing_spec=None,
+                    connection_drawing_spec=mp.solutions.drawing_styles
+                    .get_default_face_mesh_iris_connections_style())
+
+        # Expand the right side frame to show the blendshapes.
+        current_frame = cv2.copyMakeBorder(current_frame, 0, 0, 0,
+                                           label_padding_width,
+                                           cv2.BORDER_CONSTANT, None,
+                                           label_background_color)
+
+        if DETECTION_RESULT:
+            # Define parameters for the bars and text
+            legend_x = current_frame.shape[1] - label_padding_width + 20  # Starting X-coordinate (20 as a margin)
+            legend_y = 30  # Starting Y-coordinate
+            bar_max_width = label_padding_width - 40  # Max width of the bar with some margin
+            bar_height = 8  # Height of the bar
+            gap_between_bars = 5  # Gap between two bars
+            text_gap = 5  # Gap between the end of the text and the start of the bar
+
+            face_blendshapes = DETECTION_RESULT.face_blendshapes
+
+            if face_blendshapes:
+                for idx, category in enumerate(face_blendshapes[0]):
+                    category_name = category.category_name
+                    score = round(category.score, 2)
+
+                    # Prepare text and get its width
+                    text = "{} ({:.2f})".format(category_name, score)
+                    (text_width, _), _ = cv2.getTextSize(
+                        text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
+
+                    # Display the blendshape name and score; the y position is
+                    # adjusted to vertically center the text on the bar.
+                    cv2.putText(current_frame, text,
+                                (legend_x, legend_y + (bar_height // 2) + 5),
+                                cv2.FONT_HERSHEY_SIMPLEX,
+                                0.4,  # Font scale
+                                (0, 0, 0),  # Black color
+                                1,  # Thickness
+                                cv2.LINE_AA)
+
+                    # Calculate bar width based on score
+                    bar_width = int(bar_max_width * score)
+
+                    # Draw the bar to the right of the text
+                    cv2.rectangle(current_frame,
+                                  (legend_x + text_width + text_gap, legend_y),
+                                  (legend_x + text_width + text_gap + bar_width,
+                                   legend_y + bar_height),
+                                  (0, 255, 0),  # Green color
+                                  -1)  # Filled bar
+
+                    # Update the Y-coordinate for the next bar
+                    legend_y += (bar_height + gap_between_bars)
+
+        cv2.imshow('face_landmarker', current_frame)
+
+        # Stop the program if the ESC key is pressed.
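+        # cv2.waitKey(1) polls the keyboard for about 1 ms; 27 is the ASCII
+        # code for the ESC key.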
+        if cv2.waitKey(1) == 27:
+            break
+
+    detector.close()
+    cap.release()
+    cv2.destroyAllWindows()
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    parser.add_argument(
+        '--model',
+        help='Name of face landmarker model.',
+        required=False,
+        default='face_landmarker.task')
+    parser.add_argument(
+        '--numFaces',
+        help='Max number of faces that can be detected by the landmarker.',
+        required=False,
+        default=1)
+    parser.add_argument(
+        '--minFaceDetectionConfidence',
+        help='The minimum confidence score for face detection to be considered '
+             'successful.',
+        required=False,
+        default=0.5)
+    parser.add_argument(
+        '--minFacePresenceConfidence',
+        help='The minimum confidence score of face presence in the face '
+             'landmark detection.',
+        required=False,
+        default=0.5)
+    parser.add_argument(
+        '--minTrackingConfidence',
+        help='The minimum confidence score for the face tracking to be '
+             'considered successful.',
+        required=False,
+        default=0.5)
+    # Finding the camera ID can be very reliant on platform-dependent methods.
+    # One common approach is to use the fact that camera IDs are usually
+    # indexed sequentially by the OS, starting from 0. Here, we use OpenCV and
+    # create a VideoCapture object for each potential ID with
+    # 'cap = cv2.VideoCapture(i)'. If 'cap' is None or not 'cap.isOpened()',
+    # it indicates the camera ID is not available.
+    parser.add_argument(
+        '--cameraId', help='Id of camera.', required=False, default=0)
+    parser.add_argument(
+        '--frameWidth',
+        help='Width of frame to capture from camera.',
+        required=False,
+        default=1280)
+    parser.add_argument(
+        '--frameHeight',
+        help='Height of frame to capture from camera.',
+        required=False,
+        default=960)
+    args = parser.parse_args()
+
+    run(args.model, int(args.numFaces), float(args.minFaceDetectionConfidence),
+        float(args.minFacePresenceConfidence),
+        float(args.minTrackingConfidence), int(args.cameraId),
+        int(args.frameWidth), int(args.frameHeight))
+
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/face_landmarker/raspberry_pi/requirements.txt b/examples/face_landmarker/raspberry_pi/requirements.txt
new file mode 100644
index 00000000..02769821
--- /dev/null
+++ b/examples/face_landmarker/raspberry_pi/requirements.txt
@@ -0,0 +1 @@
+mediapipe
\ No newline at end of file
diff --git a/examples/face_landmarker/raspberry_pi/setup.sh b/examples/face_landmarker/raspberry_pi/setup.sh
new file mode 100644
index 00000000..3be13ec6
--- /dev/null
+++ b/examples/face_landmarker/raspberry_pi/setup.sh
@@ -0,0 +1,5 @@
+# Install Python dependencies.
+python3 -m pip install pip --upgrade
+python3 -m pip install -r requirements.txt
+
+wget -O face_landmarker.task -q https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task
diff --git a/examples/gesture_recognizer/raspberry_pi/README.md b/examples/gesture_recognizer/raspberry_pi/README.md
index 7a540c64..7b1b1833 100644
--- a/examples/gesture_recognizer/raspberry_pi/README.md
+++ b/examples/gesture_recognizer/raspberry_pi/README.md
@@ -44,7 +44,7 @@ python3 recognize.py
   * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#models)
   * Custom models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#custom_models) are supported.
 * You can optionally specify the `numHands` parameter to the maximum
-  number of hands can be detected by the recognizer:
+  number of hands that can be detected by the recognizer:
   * Supported value: A positive integer (1-2)
   * Default value: `1`
 * You can optionally specify the `minHandDetectionConfidence` parameter to adjust the
diff --git a/examples/gesture_recognizer/raspberry_pi/recognize.py b/examples/gesture_recognizer/raspberry_pi/recognize.py
index a2353f38..6da95d78 100644
--- a/examples/gesture_recognizer/raspberry_pi/recognize.py
+++ b/examples/gesture_recognizer/raspberry_pi/recognize.py
@@ -197,7 +197,7 @@ def main():
         default='gesture_recognizer.task')
     parser.add_argument(
         '--numHands',
-        help='Max number of hands can be detected by the recognizer.',
+        help='Max number of hands that can be detected by the recognizer.',
         required=False,
         default=1)
     parser.add_argument(
diff --git a/examples/image_classification/raspberry_pi/README.md b/examples/image_classification/raspberry_pi/README.md
index b96cd393..9c6498bd 100644
--- a/examples/image_classification/raspberry_pi/README.md
+++ b/examples/image_classification/raspberry_pi/README.md
@@ -44,7 +44,7 @@ python3 classify.py
   * TensorFlow Lite image classification models **with metadata**
     * Models from [TensorFlow Hub](https://tfhub.dev/tensorflow/collections/lite/task-library/image-classifier/1)
     * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/image_classifier/index#models)
-    * Models trained with [TensorFlow Lite Model Maker](https://developers.google.com/mediapipe/solutions/customization/image_classifier) are supported.
+    * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/image_classifier) are supported.
 * You can optionally specify the `maxResults` parameter to limit the list of
   classification results:
   * Supported value: A positive integer.
diff --git a/examples/object_detection/raspberry_pi/README.md b/examples/object_detection/raspberry_pi/README.md
index 7e72ce85..f960d64c 100644
--- a/examples/object_detection/raspberry_pi/README.md
+++ b/examples/object_detection/raspberry_pi/README.md
@@ -56,7 +56,7 @@ visualization.
   * The default value is `efficientdet_lite0.tflite`
   * TensorFlow Lite object detection models **with metadata**
   * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/object_detector/index#models)
-  * Models trained with [TensorFlow Lite Model Maker](https://developers.google.com/mediapipe/solutions/customization/object_detector) are supported.
+  * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/object_detector) are supported.
 * You can optionally specify the `maxResults` parameter to limit the list of
   detection results:
   * Supported value: A positive integer.
diff --git a/examples/text_classification/raspberry_pi/README.md b/examples/text_classification/raspberry_pi/README.md
index 68d8ca25..18ef5949 100644
--- a/examples/text_classification/raspberry_pi/README.md
+++ b/examples/text_classification/raspberry_pi/README.md
@@ -39,6 +39,9 @@ python3 classify.py --inputText "Your text goes here"
 * You can optionally specify the `model` parameter to set the TensorFlow Lite
   model to be used:
   * The default value is `classifier.tflite`
+  * TensorFlow Lite text classification models **with metadata**
+  * Models from [MediaPipe Models](https://developers.google.com/mediapipe/solutions/text/text_classifier/index#models)
+  * Models trained with [MediaPipe Model Maker](https://developers.google.com/mediapipe/solutions/customization/text_classifier) are supported.
 * Example usage:
   ```
   python3 classify.py \