
JSON object returns and DBASE operation displays in deeper detail #45

Open
wants to merge 16 commits into
base: master

Conversation


@vriez vriez commented Oct 30, 2018

The main contribution of this work is the new file_match_to_objs function. It is invoked through the -O option and outputs a JSON object, for example:

JSON Returns

python audfprint.py --density 100 -O --skip-existing --find-time-range match --dbase fpdbase.pklz tests/data/query.mp3

Wed Oct 31 00:46:34 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
{'track': 'tests/data/Nine_Lives/05-Full_Circle.mp3', 'query_match_length': 4.481451247165533, 'query_match_start_at': 0.25541950113378686, 'track_match_start_at': 0.25541950113378686, 'coverage': 0.5610389412786497, 'fingerprinting_duration': 0.0, 'query_duration': 5.642448979591837, 'total_fingerprints_analyzed': 54, 'total_tracks_analyzed': 0, 'confidence': 0.9074074074074074}

WHY? When using audfprint to process streamed queries coming from multiple Kafka topics, receiving JSON objects has been a significant improvement.

Silencing the first two lines would require some further decoupling effort, since that logging is located within the hash_table.
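As a sketch of the downstream use case described above (the variable names here are illustrative, not part of the PR): each match line printed under -O is a Python dict literal, so it can be parsed back into an object and re-serialized as JSON for a Kafka producer or any other consumer.

```python
import ast
import json

# One match line as printed by the -O option (a Python dict literal,
# abbreviated here to a few of the keys shown above).
line = ("{'track': 'tests/data/Nine_Lives/05-Full_Circle.mp3', "
        "'coverage': 0.5610389412786497, "
        "'confidence': 0.9074074074074074}")

match = ast.literal_eval(line)   # safe parse of the dict literal
payload = json.dumps(match)      # JSON string ready for a message queue

print(match['track'], match['confidence'])
```

Because the output is a dict repr rather than strict JSON (single quotes), ast.literal_eval is the safe way to read it back; json.dumps then produces a proper JSON payload.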

DBASE

python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3

ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
`...`
ingesting # 12 : track: tests/data/Nine_Lives/13-Fallen_Angels.mp3, duration[sec]: 10.057143, density[hashes/sec]: 109.0 
Added 39724 hashes (88.1 hashes/sec)
Saved fprints for 14 files ( 39724 hashes) to fpdbase.pklz (0.00% dropped)

WHY? When comparing multiple dbases programmatically, if one wants to compare the results against the fingerprinting technique, a plausible and minimalist solution is to keep this information stored within the hash_table.
Explanation: duration[sec] and density[hashes/sec] have been named this way so that each can be accessed as a key without any further parsing. I agree that it would look better without the [*] suffixes, but then the units may not be obvious.
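To illustrate the key-naming point (this parsing snippet is illustrative, not part of the PR): because the log fields are named duration[sec] and density[hashes/sec], a naive split of an ingest line yields a dict whose keys already carry their units, with no extra parsing step.

```python
# Hypothetical parse of one "ingesting" log line: the bracketed names
# become dict keys directly, so no unit parsing is needed afterwards.
line = ("ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, "
        "duration[sec]: 10.057143, density[hashes/sec]: 111.0")

# Drop the "ingesting # 0" prefix, then split the remaining
# "key: value" pairs on commas.
fields = dict(
    part.split(": ", 1) for part in line.split(" : ", 1)[1].split(", ")
)

print(fields["duration[sec]"])         # '10.057143'
print(fields["density[hashes/sec]"])   # '111.0'
```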

python audfprint.py --density 100 --skip-existing --find-time-range list --dbase fpdbase.pklz

Wed Oct 31 00:52:38 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
track: 'tests/data/Nine_Lives/01-Nine_Lives.mp3', hash_count[units]: 1114, duration[s]: 10.057143211364746, real_density: 110.76704155322763, fingerprinted_density: 100.0
`...`
track: 'tests/data/Nine_Lives/13-Fallen_Angels.mp3', hash_count[units]: 1095, duration[s]: 10.057143211364746, real_density: 108.87783707431262, fingerprinted_density: 100.0

WHY? In case one really needs a closer look at the fingerprinting process. As the effective/real fingerprinting density and the requested --density may differ, it seems reasonable to display this information. Not least, the duration may also be an important property to display, since track naming can be automated and the outputs then become hard for a human to interpret.

CREDITS / REFERENCE

The naming for the JSON keys/properties has been inspired by:
https://github.com/AddictedCS/soundfingerprinting

@dpwe (Owner) left a comment

Thanks for all the suggested changes.

I like the idea of JSON output, but maybe have it write to a file, rather than as an alternative STDOUT?

It makes sense to store the analysis density in the hash table, but not separately for each track, I think it should be a global property of the database. I don't think we want databases where the density is different for different reference items.

The match stats report seems very specific to your interests, I'm not sure if we should add them to the main release.

@vriez (Author) commented Nov 7, 2018

Thank you very much for taking the time, Dan.

Yes, I agree that the report is quite specific, it just may interest others either learning or with similar goals.

The purposes of obtaining a JSON are mainly:

  • Trying to bring a convention
  • Convenience in converting the return object into a DML query.

So, writing directly to a file could be a useful option for some, and use cases may vary a lot. I would say that just returning the plain object and allowing the user to decide how it is to be used could be more pleasant/functional.
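A middle ground between the two positions could look like the following sketch (emit_matches is a hypothetical helper, not part of this PR or of audfprint): return the plain objects for programmatic callers, and optionally also write them to a JSON file when a path is given.

```python
import json

def emit_matches(matches, out_path=None):
    """Hypothetical helper: return the match objects as-is, and
    optionally also write them to a JSON file when out_path is given."""
    if out_path is not None:
        with open(out_path, "w") as f:
            json.dump(matches, f, indent=2)
    return matches

# Callers that stream to Kafka just use the return value;
# others pass a path to also get a file on disk.
results = emit_matches([{"track": "query.mp3", "confidence": 0.9}],
                       out_path="matches.json")
```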

The issue with the density being only a global property is that tracks fingerprinted with the same --density into the same --dbase often end up with different effective densities, for example:

(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3
ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
ingesting # 1 : track: tests/data/Nine_Lives/02-Falling_In_Love.mp3, duration[sec]: 10.057143, density[hashes/sec]: 97.0 
ingesting # 2 : track: tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0 
ingesting # 3 : track: tests/data/Nine_Lives/04-Taste_Of_India.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
ingesting # 4 : track: tests/data/Nine_Lives/05-Full_Circle.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0 
ingesting # 5 : track: tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0 
ingesting # 6 : track: tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0 
ingesting # 7 : track: tests/data/Nine_Lives/08-The_Farm.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0 
ingesting # 8 : track: tests/data/Nine_Lives/09-Crash.mp3, duration[sec]: 10.057143, density[hashes/sec]: 108.0 
ingesting # 9 : track: tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0 
ingesting # 10 : track: tests/data/Nine_Lives/11-Pink.mp3, duration[sec]: 10.057143, density[hashes/sec]: 90.0 
ingesting # 11 : track: tests/data/Nine_Lives/12-Attitude_Adjustment.mp3, duration[sec]: 10.057143, density[hashes/sec]: 105.0 
ingesting # 12 : track: tests/data/Nine_Lives/13-Fallen_Angels.mp3, duration[sec]: 10.057143, density[hashes/sec]: 109.0 
ingesting # 13 : track: tests/data/Nine_Lives/californication.mp3, duration[sec]: 321.123265, density[hashes/sec]: 80.0 
Added 39724 hashes (88.1 hashes/sec)
Saved fprints for 14 files ( 39724 hashes) to fpdbase.pklz (0.00% dropped)

and when listing; note that the density is already being stored as a global property of the hash table:

(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --skip-existing list --dbase fpdbase.pklz
Thu Nov  8 00:25:52 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
track: 'tests/data/Nine_Lives/01-Nine_Lives.mp3', hash_count[units]: 1114, duration[s]: 10.057143211364746, real_density: 110.76704155322763, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/02-Falling_In_Love.mp3', hash_count[units]: 978, duration[s]: 10.057143211364746, real_density: 97.24431475678333, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3', hash_count[units]: 994, duration[s]: 10.057143211364746, real_density: 98.83522379165912, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/04-Taste_Of_India.mp3', hash_count[units]: 1113, duration[s]: 10.057143211364746, real_density: 110.6676097385479, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/05-Full_Circle.mp3', hash_count[units]: 991, duration[s]: 10.057143211364746, real_density: 98.5369283476199, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3', hash_count[units]: 1139, duration[s]: 10.057143211364746, real_density: 113.25283692022107, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3', hash_count[units]: 1132, duration[s]: 10.057143211364746, real_density: 112.5568142174629, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/08-The_Farm.mp3', hash_count[units]: 1079, duration[s]: 10.057143211364746, real_density: 107.28692803943682, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/09-Crash.mp3', hash_count[units]: 1081, duration[s]: 10.057143211364746, real_density: 107.4857916687963, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3', hash_count[units]: 1073, duration[s]: 10.057143211364746, real_density: 106.69033715135839, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/11-Pink.mp3', hash_count[units]: 907, duration[s]: 10.057143211364746, real_density: 90.18465591452195, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/12-Attitude_Adjustment.mp3', hash_count[units]: 1050, duration[s]: 10.057143211364746, real_density: 104.40340541372443, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/13-Fallen_Angels.mp3', hash_count[units]: 1095, duration[s]: 10.057143211364746, real_density: 108.87783707431262, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/californication.mp3', hash_count[units]: 25978, duration[s]: 321.1232604980469, real_density: 80.89728523467706, fingerprinted_density: 100.0

The fact that the effective density does not strictly match the --density value is entirely acceptable, as the algorithm relies on the overlap, hop, and window parameters to try to make the resulting density approach the requested one.

@ezavesky

Many parts of this change seemed useful; what was the remaining complication? Was it individual density specification, added stats, or something else?
