Comparative analysis of the models

NOTE: This analysis is pending an update to include model 4, and also to evaluate the quality of the clusters, which was not originally taken into account.

From a previous analysis by Branko Kokanovic we know that there is a rough linearity between the area of the images and the memory consumption of the CNN models, so we take the image area as the reference.

Procedure followed here:

  1. Have an instance of Nextcloud with this application working. Install Netdata to obtain statistics of your installation, jq to parse the JSON answers, and GNU bc to perform some basic calculations (see the install sketch after this list).
  2. Create a new user on the Nextcloud instance. For this example, simply 'user'.
  3. Upload some images with faces to analyze. For this analysis, 274 images of The Big Bang Theory. 😉
  4. Turn off all services that consume PHP (Apache, nginx, php-fpm); sorry, but that makes a difference of 500 MB. 😅 E.g.: sudo systemctl stop httpd php-fpm
  5. Install the model that you want to analyze: sudo -u apache php occ face:setup --model 2
  6. Run the background task to analyze the photos, also measuring the time consumed: time sudo -u apache php occ face:background_job -u user
  7. Meanwhile, observe the memory consumption in Netdata. Don't worry, we automate all of this below. 😅
  8. Get the statistics of the background task: sudo -u apache php occ face:stats -u user
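
The helper tools from step 1 are usually available from the distribution repositories. A minimal sketch, assuming a Fedora-like system (matching the httpd and apache names used above):

sudo dnf install netdata jq bc
sudo systemctl enable --now netdata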

Finally:

  • The number of faces found (this should be the most important data) is obtained from the statistics of the background task.
  • The maximum memory consumption was obtained by eye from Netdata.
  • The average time consumed per image is obtained by dividing the total time consumption by the number of images (see the sketch below).
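
For example, the per-image average can be computed with bc. A minimal sketch, using hypothetical numbers (a total of 798 seconds for the 274 images):

echo "scale=3; 798/274" | bc -l
# 2.912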

Well, since this can take a long time, and I admit that it is not so easy, we do everything automatically. 😅

Automated analysis

[matias@nube nextcloud]$ cat test-facerecognition.sh 
#!/usr/bin/env bash
NETDATAHOST='http://localhost:19999'
NCUSER='user'

MINMODEL='1'
MAXMODEL='4'

MINSIDE='800'
MAXSIDE='2600'
DELTASIDE='200'

for ((i=${MINMODEL} ; i <= ${MAXMODEL} ; i++)); do

	php occ face:setup --model ${i}

	OUTPUTFILE="model-${i}.csv"
	echo "AREA,AVGTIME,MAXMEMORY,FACES,PERSONS" > ${OUTPUTFILE}

	for ((j=${MINSIDE} ; j <= ${MAXSIDE} ; j+=${DELTASIDE})); do
		# The analysis is parameterized by the image area (side squared).
		area=$((j ** 2))

		php occ face:reset --all -u ${NCUSER}

		# Record the analysis time window. The variable names intentionally
		# match Netdata's API, where 'after' is the start of the window and
		# 'before' is the end.
		AFTER=$(date +%s)
		php occ face:background_job -u ${NCUSER} --max_image_area ${area}
		BEFORE=$(date +%s)

		# Query Netdata for the peak PHP memory (MB) within the analysis window.
		MAXMEMORY=$(curl -s "${NETDATAHOST}/api/v1/data?chart=apps.mem&before=${BEFORE}&after=${AFTER}&dimensions=php" | jq '.data[][1]' | sort -Vr | head -n1)

		# Run the background job once more so that any pending work
		# (e.g. clustering) finishes before the statistics are collected.
		php occ face:background_job -u ${NCUSER}

		ALLTIME=$((${BEFORE}-${AFTER}))
		# Collect the statistics of this run and derive the per-image average.
		IMAGES=$(php occ face:stats --json -u ${NCUSER} | jq '.[].images')
		FACES=$(php occ face:stats --json -u ${NCUSER} | jq '.[].faces')
		PERSONS=$(php occ face:stats --json -u ${NCUSER} | jq '.[].persons')
		AVGTIME=$(echo "scale=3; ${ALLTIME}/${IMAGES}" | bc -l)

		echo "${area},${AVGTIME},${MAXMEMORY},${FACES},${PERSONS}" >> ${OUTPUTFILE}

		sleep 1
	done
done

Run as:

sudo -u apache bash test-facerecognition.sh # Replace 'apache' with your service user, e.g. www-data

Result:

It results in one CSV file per model with the main statistics. We share three of them here (the model 4 results are pending; see the note above):

[matias@nube ~]$ cat model-1.csv 
AREA,AVGTIME,MAXMEMORY,FACES,PERSONS
160000,2.905,793.3242,599,338
250000,4.412,1135.3633,779,445
360000,6.135,1536.0508,889,454
490000,8.226,2037.832,955,492
640000,10.992,2622.656,965,497
810000,13.832,3275.984,975,484
1000000,17.018,3979.73,975,473
1210000,20.427,4752.957,988,498
1440000,24.284,5628.828,988,502
1690000,28.518,6522.82,988,503
[matias@nube ~]$ cat model-2.csv 
AREA,AVGTIME,MAXMEMORY,FACES,PERSONS
160000,2.937,805.0469,599,338
250000,4.514,1172.1367,779,445
360000,6.306,1548.707,889,454
490000,8.463,2053.75,955,492
640000,11.040,null,965,497
810000,13.846,3278.324,975,484
1000000,16.948,3986.047,975,473
1210000,20.135,4764.246,988,498
1440000,24.262,5632.656,988,502
1690000,28.167,6575.504,988,499
[matias@nube ~]$ cat model-3.csv 
AREA,AVGTIME,MAXMEMORY,FACES,PERSONS
160000,3.021,112.94141,144,84
250000,4.091,122.96875,273,117
360000,5.543,101.48047,384,166
490000,7.175,130.17188,458,192
640000,9.124,116.25781,537,210
810000,11.288,119.52344,598,235
1000000,13.594,118.92969,647,266
1210000,16.394,116.90625,684,272
1440000,18.799,120.18359,696,286
1690000,16.901,116.23828,710,290
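
As a quick sanity check of the linearity mentioned at the beginning, the incremental slope of memory versus area can be computed from these files. A minimal sketch with awk over model-1.csv:

awk -F, 'NR>2 {printf "%d -> %d: %.4f MB/pixel\n", pa, $1, ($3-pm)/($1-pa)} {pa=$1; pm=$3}' model-1.csv

The slope stays roughly constant at about 0.0037 MB per pixel, which is consistent with the linear trend.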

Contextualizing the results

The idea is to compare the models, so we have to compare the same columns.
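
A comparison like the tables below can be assembled directly from the CSV files. A minimal sketch with cut and paste, here for the faces column (assuming the model-N.csv files from above):

paste -d, <(cut -d, -f1,4 model-1.csv) <(cut -d, -f4 model-2.csv) <(cut -d, -f4 model-3.csv)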

Number of faces detected

Area        Model 1   Model 2   Model 3
640.000     601       599       537
1.000.000   785       779       647
1.440.000   897       889       696
1.960.000   957       955       727
2.560.000   967       965       738
3.240.000   976       975       735
4.000.000   980       975       730
4.840.000   987       988       735
5.760.000   993       988       737
6.760.000   994       988       735

[Chart: Faces Detected]

Maximum memory consumption (MB)

Area        Model 1    Model 2    Model 3
640.000     748,92     805,05     120,95
1.000.000   1.076,11   1.172,14   102,18
1.440.000   1.474,38   1.548,71   113,62
1.960.000   1.972,52   2.053,75   128,54
2.560.000   2.584,01   2.584,92   134,02
3.240.000   3.230,22   3.278,32   142,68
4.000.000   3.943,64   3.986,05   146,13
4.840.000   4.707,66   4.764,25   150,22
5.760.000   5.580,12   5.632,66   162,84
6.760.000   6.480,95   6.575,50   183,54

[Chart: Max. Memory]

Processing time (seconds per image)

Area        Model 1   Model 2   Model 3
640.000     2,91      2,94      9,18
1.000.000   4,43      4,51      13,59
1.440.000   6,33      6,31      16,40
1.960.000   8,34      8,46      25,37
2.560.000   11,20     11,04     32,65
3.240.000   13,73     13,85     34,58
4.000.000   17,09     16,95     50,51
4.840.000   20,86     20,14     60,07
5.760.000   24,76     24,26     54,68
6.760.000   28,96     28,17     63,98

[Chart: Processing time]

First conclusions

  • Model 1 and model 2 give practically identical results. Therefore, between the two, we recommended model 2, which offers more information with the same requirements. EDIT: Some clustering errors of model 2 that were not taken into account in this analysis were recently discovered, and it is therefore now discouraged. Model 1 is recommended instead.

  • It is true that model 3 consumes practically no memory, but with the same image it is much slower and finds only 73% of the faces.

However, the most interesting thing, IMHO, is to compare the point where HOG ties the worst CNN result:

Area      Width   Height   Faces (Model 2)   Memory (Model 2)   Time (Model 2)   Faces (Model 3)   Memory (Model 3)   Time (Model 3)
160.000   462     346      599               805,05             2,94             144               112,94             3,02
810.000   1.039   779      975               3.278,32           13,85            598               119,52             11,29

  • With an image of 462x346, CNN finds practically the same number of faces (599) as HOG does with 1039x779 (598).
  • With an image of 462x346, CNN has an average time of 2.94 seconds, against 11.29 for HOG with 1039x779; HOG is almost 4 times slower to get the same results.
  • With an image of 462x346, CNN uses 805,05 MB of RAM, against roughly 117 MB for HOG at any size.
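
The "almost 4 times" claim can be checked quickly with bc:

echo "scale=2; 11.29/2.94" | bc
# 3.84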

So, if speed matters, I would recommend CNN, which with less area offers similar results in much less time. The memory of the CNN model grows with the size of the image, but it is controllable.
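
In practice the area (and therefore the memory) is capped with the same option used by the script above, for example:

sudo -u apache php occ face:background_job -u user --max_image_area 640000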

We can also do the same analysis comparing the best HOG result against the CNN configuration that offers the same result:

Area        Width   Height   Faces (Model 2)   Memory (Model 2)   Time (Model 2)   Faces (Model 3)   Memory (Model 3)   Time (Model 3)
250.000     577     433      779               1.172,14           4,51             273               122,97             4,09
1.690.000   1.501   1.126    988               6.575,50           28,17            710               116,24             16,90

  • With an image of only 577x433, CNN offers even better results (779 faces) than HOG with 1501x1126 (710 faces).
  • With an image of 577x433, CNN has an average time of 4.51 seconds, against 16.90 for HOG with 1501x1126; again, HOG is almost 4 times slower for comparable results.
  • With an image of 577x433, CNN uses 1.172,14 MB of RAM, against roughly 117 MB for HOG at any size. HOG still has a ridiculously low memory consumption, but the 1.172,14 MB of CNN is more than acceptable. So in terms of memory HOG is obviously much better, but CNN is fully usable.

Finally, in general, I would recommend using CNN68 with small images, and HOG with larger images only if it is absolutely necessary.