Receive customizable, AI-powered notifications when someone arrives in your driveway.
driveway-monitor accepts an RTSP video stream (or, for testing purposes, a video file) and uses the YOLOv8 model to track objects in the video. When an object meets your notification criteria (highly customizable; see "Configuration" below), driveway-monitor will notify you via Ntfy. The notification includes a snapshot of the object that triggered the notification and provides options to mute notifications for a period of time.
The YOLO computer vision model can run on your CPU or on NVIDIA or Apple Silicon GPUs. It would be possible to use a customized model, and in fact I originally planned to refine my own model based on YOLOv8, but it turned out that the pretrained YOLOv8 model seems to work fine.
Optionally, driveway-monitor can also use an instance of Ollama to provide a detailed description of the object that triggered the notification.
This short video gives an overview of the end result. A notification is received; clicking the "Mute" button results in another notification with options to extend the mute period or unmute the system. Tapping on the notification would open an image of me in my driveway; this isn't shown in the video for privacy reasons.
python3 main.py [-h] [--config CONFIG] [--video VIDEO] [--debug]
The main.py program only takes a few options on the CLI. Most configuration is done via a JSON config file (see "Configuration" below).

- --config CONFIG: Path to your JSON config file.
- --debug: Enable debug logging.
- -h, --help: Show help and exit.
- --print: Print notifications to stdout instead of sending them via Ntfy.
- --video VIDEO: Path to the video file or RTSP stream to process. Required.
Due to the Python 3.12 requirement and the annoyance of maintaining a virtualenv, I recommend running this application via Docker. The following images are available:
- cdzombak/driveway-monitor:*-amd64-cuda: NVIDIA image for amd64 hosts
- cdzombak/driveway-monitor:*-amd64-cpu: CPU-only image for amd64 hosts
- cdzombak/driveway-monitor:*-arm64: image for arm64 hosts (e.g. Apple Silicon and Raspberry Pi 4/5)
Note
To run the model on an Apple Silicon GPU, you'll need to set up a Python virtualenv and run driveway-monitor directly, not via Docker. See "Running with Python" below.
Running a one-off driveway-monitor process with Docker might look like:
docker run --rm -v ./config.json:/config.json:ro cdzombak/driveway-monitor:1-amd64-cpu --config /config.json --video "rtsps://192.168.0.77:7441/abcdef?enableSrtp" --debug
This docker-compose.yml file runs driveway-monitor on an amd64 host, with NVIDIA GPU support. Note that your config file is mapped into the container at /config.json.
---
services:
  driveway-monitor:
    image: cdzombak/driveway-monitor:1-amd64-cuda
    volumes:
      - ./config.json:/config.json:ro
      - ./enrichment-prompts:/enrichment-prompts:ro
    command:
      [
        "--debug",
        "--config",
        "/config.json",
        "--video",
        "rtsp://192.168.0.77:7441/ca55e77e",
      ]
    ports:
      - 5550:5550
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    restart: always
See Ultralytics' docs on setting up Docker with NVIDIA support. In case that URL changes, the relevant instructions as of 2024-05-23 are copied here:
First, verify that the NVIDIA drivers are properly installed by running:

nvidia-smi

Now, let's install the NVIDIA Docker runtime to enable GPU support in Docker containers:

Add NVIDIA package repositories:

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(lsb_release -cs)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Install NVIDIA Docker runtime:

sudo apt-get update
sudo apt-get install -y nvidia-docker2

Restart Docker service to apply changes:

sudo systemctl restart docker

Verify NVIDIA runtime with Docker:

Run docker info | grep -i runtime to ensure that nvidia appears in the list of runtimes.
Note
Requires Python 3.12 or later.
Clone the repository, change into its directory, set up a virtualenv with the project's requirements, and run main.py:
git clone https://github.com/cdzombak/driveway-monitor.git
cd driveway-monitor
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python3 ./main.py --config /path/to/config.json --video 'rtsp://example.com/mystream'
If you're running on Apple Silicon, you should see a log message at startup informing you the model is using the mps device.
This section briefly explains the different components of the program; understanding them will help you configure driveway-monitor effectively.
(Configuration key: model.)
The prediction process consumes a video stream frame-by-frame and feeds each frame to the YOLOv8 model. The model produces predictions of objects in the frame, including their classifications (e.g. "car") and rectangular bounding boxes. These predictions are passed to the tracker process.
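To make that concrete, here's a minimal sketch of frame-by-frame prediction using the ultralytics package. This is illustrative only, not driveway-monitor's actual code; the weights file and stream URL are placeholders.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained YOLOv8 weights (placeholder size)

# stream=True yields one Results object per decoded frame of the video source
for result in model("rtsp://example.com/mystream", stream=True):
    for box in result.boxes:
        name = result.names[int(box.cls)]        # e.g. "car", "person"
        conf = float(box.conf)                   # model confidence for this detection
        x1, y1, x2, y2 = box.xyxyn[0].tolist()   # normalized bounding-box corners
        print(f"{name} ({conf:.2f}): ({x1:.2f}, {y1:.2f})-({x2:.2f}, {y2:.2f})")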
(Configuration keys: tracker and notification_criteria.)
The tracker process aggregates the model's predictions over time, building tracks that represent the movement of individual objects in the video stream. Every time a track is updated with a prediction from a new frame, the tracker evaluates the track against the notification criteria. If the track meets the criteria, a notification is triggered.
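For intuition, here's a hypothetical sketch of the overlap test behind the tracker's track_connect_min_overlap setting (the function names and details are mine, not the project's; driveway-monitor's actual logic may differ):

def overlap_fraction(a, b):
    # Fraction of box a's area that lies inside box b; boxes are (x1, y1, x2, y2), normalized.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return inter / area_a if area_a > 0 else 0.0

def average_box(b1, b2):
    return tuple((u + v) / 2 for u, v in zip(b1, b2))

# With track_connect_min_overlap = 0.5, a new prediction joins an existing track only
# if it overlaps the average of the track's last two boxes by at least 50%:
recent = average_box((0.10, 0.10, 0.30, 0.30), (0.12, 0.11, 0.32, 0.31))
prediction = (0.14, 0.12, 0.34, 0.32)
joins_track = overlap_fraction(prediction, recent) >= 0.5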
(Configuration key: enrichment.)
Enrichment is an optional feature that uses an Ollama model to generate a more detailed description of the object that triggered a notification. If the Ollama model succeeds, the resulting description is included in the notification's message.
To use enrichment, you'll need a working Ollama setup with a multimodal model installed. driveway-monitor does not provide this, since it's not necessary for the core feature set, and honestly it provides little additional value.
The best results I've gotten (which still are not stellar) are using the LLaVA 13b model. This usually returns a result in under 3 seconds (when running on a 2080 Ti). On a CPU or less powerful GPU, consider llava:7b, llava-llama3, or just skip enrichment altogether.
You can adjust how long driveway-monitor waits for Ollama to generate a response by setting enrichment.timeout_s in your config. If you want to use enrichment, I highly recommend setting an aggressive timeout to ensure driveway-monitor's responsiveness.
Using enrichment requires providing a prompt file for each YOLO object classification (e.g. car, truck, person) you want to enrich. This allows giving different instructions to your Ollama model for people vs. cars, for example. The enrichment-prompts directory provides a useful set of prompt files to get you started.
When running driveway-monitor in Docker, keep in mind that your enrichment prompt files must be mounted in the container, and the paths in your config file must reflect the paths inside the container.
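If you're curious what an enrichment call looks like, here's a rough sketch of a request to Ollama's /api/generate endpoint. This is not driveway-monitor's internal code; the prompt-file path, image path, and model name are just examples:

import base64
import requests

with open("enrichment-prompts/car.txt") as f:
    prompt = f.read()
with open("snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava:13b",
        "prompt": prompt,
        "images": [image_b64],   # multimodal Ollama models accept base64-encoded images
        "stream": False,
        "keep_alive": "60m",
    },
    timeout=10,  # plays the same role as enrichment.timeout_s
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated description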
(Configuration key: notifier.)
The notifier receives notification triggers from the tracker process and sends notifications via Ntfy to a configured server and topic. Notifications include a snapshot of the object that triggered the notification and provide options to mute notifications for a period of time.
The notifier also debounces notifications, preventing multiple notifications for the same type of object within a short time period, and allows muting all notifications for a period of time.
(Configuration key: web.)
The web server provides a few simple endpoints for viewing notification photos, muting driveway-monitor's notifications, and checking the program's health.
Quite a number of parameters are set by a JSON configuration file. An example is provided in this repo (config.example.json) which demonstrates most, but not all, of these options.
The file is a single JSON object containing the following keys, or a subset thereof. All keys in the JSON file are optional; if a key is not present, the default value will be used. Each key refers to another object configuring a specific part of the driveway-monitor program:
model: Configures video capture and the AI model. (See Predict Settings.)
  - confidence: Sets the minimum confidence threshold for detections. Objects detected with confidence below this threshold will be disregarded. Adjusting this value can help reduce false positives.
  - device: Specifies the device for inference (e.g., cpu, cuda:0, or 0).
  - half: Use half precision (FP16) to speed up inference.
  - healthcheck_ping_url: URL to ping with a GET request when the program starts and every liveness_tick_s seconds.
  - iou: Intersection Over Union (IoU) threshold for Non-Maximum Suppression (NMS). Lower values result in fewer detections by eliminating overlapping boxes, useful for reducing duplicates.
  - liveness_tick_s: Specifies the interval to log a liveness message and ping healthcheck_ping_url (if that field is set).
  - max_det: Maximum number of detections per frame.
tracker: Configures the system that builds tracks from the model's detections over time.
  - inactive_track_prune_s: Specifies the number of seconds after which an inactive track is pruned. This prevents incorrectly adding a new prediction to an old track.
  - track_connect_min_overlap: Minimum overlap percentage of a prediction box with the average of the last 2 boxes in an existing track for the prediction to be added to that track.
enrichment: Configures the subsystem that enriches notifications via the Ollama API.
  - enable: Whether to enable enrichment via Ollama. Defaults to false.
  - endpoint: Complete URL to the Ollama /generate endpoint, e.g. http://localhost:11434/api/generate.
  - keep_alive: Ask Ollama to keep the model in memory for this long after the request. String, formatted like 60m. See the Ollama API docs.
  - model: The name of the Ollama model to use, e.g. llava or llava:13b.
  - prompt_files: Map of YOLO classification name → path. Each path is a file containing the prompt to give Ollama along with an image of that YOLO classification.
  - timeout_s: Timeout for the Ollama request, in seconds. This includes connection/network time and the time Ollama takes to generate a response.
notifier: Configures how notifications are sent.
  - debounce_threshold_s: Specifies the number of seconds to wait after a notification before sending another one for the same type of object.
  - default_priority: Default priority for notifications. (See Ntfy docs on Message Priority.)
  - image_method: Method for adding images to notifications. By default, the image URL is added both as a "click" action and as an attachment. Set this to click or attach to use only one of those methods.
  - images_cc_dir: Directory to which notification images are written. This is optional; if not set, nothing is saved to disk.
  - priorities: Map of classification name → priority. Allows customizing notification priority for specific object types.
  - req_timeout_s: Request timeout for sending notifications.
  - server: The Ntfy server to send notifications to.
  - token: Ntfy auth token (beginning with tk_).
  - topic: Ntfy topic to send to.
notification_criteria: Configures the criteria for sending notifications.
  - classification_allowlist: List of object classifications to allow. If this list is non-empty, only objects with classifications in this list will be considered for notifications.
  - classification_blocklist: List of object classifications to block. Objects with classifications in this list will not be considered for notifications.
  - min_track_length_s: Minimum number of seconds a track must cover before it can trigger a notification.
  - min_track_length_s_per_classification: Map of classification name → seconds. Allows customizing min_track_length_s on a per-classification basis.
  - track_cel: CEL expression to evaluate for each track. If the expression evaluates to true, a notification will be sent. (See "The notification_criteria.track_cel expression" below.)
web: Configures the embedded web server.
  - bind_to: IP address to bind the web server to.
  - external_base_url: External base URL for the web server (e.g. http://me.example-tailnet.ts.net:5550). Used to generate URLs in notifications.
  - port: Port to bind the web server to.
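To tie these together, here's a small illustrative config covering a handful of the keys above. The values are examples only, not recommendations; config.example.json in this repo is the authoritative starting point:

{
  "model": {
    "confidence": 0.5,
    "device": "cuda:0",
    "liveness_tick_s": 30
  },
  "notifier": {
    "server": "https://ntfy.sh",
    "topic": "my-driveway",
    "debounce_threshold_s": 60
  },
  "notification_criteria": {
    "classification_allowlist": ["car", "truck", "person"],
    "min_track_length_s": 1,
    "track_cel": "track.movement_vector.length > 0.4"
  },
  "web": {
    "external_base_url": "http://me.example-tailnet.ts.net:5550",
    "port": 5550
  }
}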
The notification_criteria.track_cel field is a string representing a CEL expression that will be evaluated for each track. If the expression evaluates to true, a notification will be sent.
The expression has access to a single variable, track. This variable is a dictionary (aka map) with the following keys:
- classification (string): classification of the object in the track.
- predictions: list of Predictions from the model included in this track, each of which has:
  - box: the Box for this prediction
  - classification (string): classification of the object
  - t: the timestamp of the prediction
- first_t: timestamp of the first prediction in the track
- last_t: timestamp of the last prediction in the track
- length_t: duration of the track, in seconds (this is just last_t - first_t)
- first_box: the first prediction's Box
- last_box: the most recent prediction's Box
- total_box: the smallest Box that covers all predictions in the track
- average_box: the average of every Box in the track
Each Box has:
- a: top-left Point
- b: bottom-right Point
- center: the center Point
- w: width of the box
- h: height of the box
- area: area of the box
Finally, each Point has:
- x: x-coordinate
- y: y-coordinate
Coordinates are floats between 0 and 1, on both axes.
The origin for box coordinates (0, 0) is the top-left corner of the frame. Coordinate (1, 1) is the lower-right corner of the frame:
■───────────────────────────────────────┐
│(0, 0) │
│ │
│ │
│ │
│ │
│ ■(0.5, 0.5) │
│ │
│ │
│ │
│ │
│ │
│ (1, 1)│
└───────────────────────────────────────■
A track's movement vector is calculated from the center of the prediction box at the start of the track to the center of the most recent prediction box in the track. It has three properties: length, direction, and direction360.
length is the length of the vector, as a float between 0 and ≅1.414.
A length of 1 would cover the entire frame vertically or horizontally; a length of 1.414 would cover the entire frame diagonally, from corner to corner:
(0, 0)
■─┬──────────────────────────┐
│ └──┐ │
│ └──┐ │
│ └──┐ │
│ └─┐ │
│ length │
│ ≅1.414 │
│ └─┐ │
│ └─┐ │
│ └─┐ │
│ └─┐ │
│ └─┐ │
│ └▶│(1, 1)
└────────────────────────────■
(0, 0)
■───────┬────────────────────┐
│ │ │
│ │ │
│ │ │
│ │ length = 1 │
├───────┼────────────────────▶
│ │ │
│ │ │
│ │ │
│ │ │
│ length = 1 │
│ │ │
│ │ │
└───────▼────────────────────■(1, 1)
direction is the direction of the vector, in degrees, in the range [-180, 180).
0° is straight to the right of frame; 90° is straight up; -180° is straight left; -90° is straight down:
direction of a vector from point a to b:
┌───┐
│ b │
┌───┐ └─▲─┘ ┌───┐
│ b │ │ │ b │
└─▲─┘ 90º └─▲─┘
╲ │ ╱
135º╲ │ ╱45º
╳───┴───╳
┌───┐ │ │ ┌───┐
│ b ◀──────┤ a │──0º──▶ b │
└───┘-180º │ │ └───┘
╳───┬───╳
-135º╱ │ ╲-45º
╱ │ ╲
┌─▼─┐ -90º ┌─▼─┐
│ b │ │ │ b │
└───┘ ┌─▼─┐ └───┘
│ b │
└───┘
direction360 is the direction of the vector, in degrees, in the range [0, 360). This is just direction + 180, but depending on the video in your use case, it may be more convenient to work with this figure instead of direction.
0° is straight to the left of frame; 90° is straight down; 180° is straight right; 270° is straight up:
direction360 of a vector from point a to b:
┌───┐
│ b │
┌───┐ └─▲─┘ ┌───┐
│ b │ │ │ b │
└─▲─┘ 270º └─▲─┘
╲ │ ╱
315º╲ │ 225º
╳───┴───╳
┌───┐ │ │ 180º┌───┐
│ b ◀──────┤ a │──────▶ b │
└───┘ 0º │ │ └───┘
╳───┬───╳
45º ╱ │ ╲135º
╱ │ ╲
┌─▼─┐ 90º ┌─▼─┐
│ b │ │ │ b │
└───┘ ┌─▼─┐ └───┘
│ b │
└───┘
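For reference, here's a small sketch (not the project's code) of how these three properties could be computed from the first and most recent box centers, consistent with the conventions above:

import math

def movement_vector(first, last):
    # first and last are (x, y) box centers with the top-left origin shown earlier.
    dx = last[0] - first[0]
    dy = last[1] - first[1]
    length = math.hypot(dx, dy)                # 0 .. ~1.414
    # Screen y grows downward, so negate dy to make 90 degrees point "up" in the frame.
    direction = math.degrees(math.atan2(-dy, dx))
    if direction >= 180.0:                     # fold +180 to -180 to match [-180, 180)
        direction -= 360.0
    direction360 = direction + 180.0           # [0, 360); 0 = straight left
    return {"length": length, "direction": direction, "direction360": direction360}

# An object moving from the upper-left area toward the lower-right corner:
print(movement_vector((0.2, 0.2), (0.8, 0.9)))
# -> length ~0.92, direction ~-49 degrees (down and to the right), direction360 ~131 degrees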
Here's an example expression that I use at home:
track.last_box.b.y > 0.4 && track.movement_vector.length > 0.4 && track.movement_vector.direction < 25 && track.movement_vector.direction > -80
This example limits notifications to tracks that meet all the following criteria:
- The most recent prediction box's bottom-right corner is in the lower 60% of the frame (i.e. that corner's Y coordinate is greater than 0.4).
- The vector from start to end of the track has moved across at least 40% of the frame.
- The direction of the movement vector is between 25° and -80° (down and to the right).
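As another illustration (untested; you'd tune the numbers to your own camera), the documented track fields can also be combined with the object's classification, e.g. to notify only when a person lingers for a couple of seconds and appears reasonably large in the frame:

track.classification == "person" && track.length_t > 2.0 && track.last_box.area > 0.02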
The Ntfy web interface and iPhone app (maybe other clients, too; I'm not sure) will not send POST requests to insecure http:// endpoints. I highly recommend running driveway-monitor behind a reverse proxy with HTTPS.
To make driveway-monitor's API securely accessible wherever you are and provide an HTTPS endpoint, I use and recommend Tailscale. On the machine running driveway-monitor, you can use a single Tailscale command to make the API available over HTTPS. In this example, I have driveway-monitor running on my-machine and its API listening on port 5550 (the default). I'll tell Tailscale to make the API available over HTTPS at https://my-machine.my-tailnet.ts.net:5559:
tailscale serve --bg --tls-terminated-tcp 5559 tcp://localhost:5550
driveway-monitor is designed to facilitate monitoring via, for example, Uptime Kuma. I recommend monitoring two things: the model's liveness and the API's health endpoint.
You could also monitor that the Docker container is running (or that the Python script is running, under systemd or similar), but that's less valuable.
If model.healthcheck_ping_url is set in your config, the model will send a GET request to model.healthcheck_ping_url every model.liveness_tick_s seconds. You can use this in conjunction with an Uptime Kuma "push" monitor to be alerted when the model stops pinging the URL.
GET the /health endpoint on the web server (web.external_base_url/health). This endpoint returns an HTTP 200 with the JSON content {"status": "ok"} as long as the server is running.
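For example, with the Tailscale setup shown earlier, a quick manual check (or an Uptime Kuma HTTP monitor pointed at the same URL) looks like:

curl https://my-machine.my-tailnet.ts.net:5559/health

which should return {"status": "ok"}.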
GNU GPL v3; see LICENSE in this repo.
Chris Dzombak.