
JSON object returns and DBASE operation displays in deeper detail #45

Open
wants to merge 16 commits into
base: master

Conversation


@vriez vriez commented Oct 30, 2018

The main contribution of this work is the new file_match_to_objs function. It is invoked through the -O option and outputs a JSON object, for example:

JSON Returns

python audfprint.py --density 100 -O --skip-existing --find-time-range match --dbase fpdbase.pklz tests/data/query.mp3

Wed Oct 31 00:46:34 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
{'track': 'tests/data/Nine_Lives/05-Full_Circle.mp3', 'query_match_length': 4.481451247165533, 'query_match_start_at': 0.25541950113378686, 'track_match_start_at': 0.25541950113378686, 'coverage': 0.5610389412786497, 'fingerprinting_duration': 0.0, 'query_duration': 5.642448979591837, 'total_fingerprints_analyzed': 54, 'total_tracks_analyzed': 0, 'confidence': 0.9074074074074074}

WHY? When using audfprint to process streamed queries coming from multiple Kafka topics, receiving JSON objects has been a significant improvement.

Silencing the first two lines would require some further decoupling effort, since that logging is located within the hash_table.
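As a sketch of the downstream use case described above (the variable names here are illustrative, not part of the PR): each match line printed under -O is a Python dict literal, so it can be parsed back into an object and re-serialized as JSON for a Kafka producer or any other consumer.

```python
import ast
import json

# One match line as printed by the -O option (a Python dict literal,
# abbreviated here to a few of the keys shown above).
line = ("{'track': 'tests/data/Nine_Lives/05-Full_Circle.mp3', "
        "'coverage': 0.5610389412786497, "
        "'confidence': 0.9074074074074074}")

match = ast.literal_eval(line)   # safe parse of the dict literal
payload = json.dumps(match)      # JSON string ready for a message queue

print(match['track'], match['confidence'])
```

Because the output is a dict repr rather than strict JSON (single quotes), ast.literal_eval is the safe way to read it back; json.dumps then produces a proper JSON payload.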

DBASE

python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3

ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
`...`
ingesting # 12 : track: tests/data/Nine_Lives/13-Fallen_Angels.mp3, duration[sec]: 10.057143, density[hashes/sec]: 109.0 
Added 39724 hashes (88.1 hashes/sec)
Saved fprints for 14 files ( 39724 hashes) to fpdbase.pklz (0.00% dropped)

WHY? When comparing multiple dbases programmatically, if one wants to compare the results against the fingerprinting technique, a plausible and minimalist solution is to keep this information stored within the hash_table.
Explanation: duration[sec] and density[hashes/sec] have been named this way so that each can be accessed as a key without any further parsing. I agree that it would look better without the [*] suffixes, but then the units may not be obvious.
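To illustrate the key-naming point (this parsing snippet is illustrative, not part of the PR): because the log fields are named duration[sec] and density[hashes/sec], a naive split of an ingest line yields a dict whose keys already carry their units, with no extra parsing step.

```python
# Hypothetical parse of one "ingesting" log line: the bracketed names
# become dict keys directly, so no unit parsing is needed afterwards.
line = ("ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, "
        "duration[sec]: 10.057143, density[hashes/sec]: 111.0")

# Drop the "ingesting # 0" prefix, then split the remaining
# "key: value" pairs on commas.
fields = dict(
    part.split(": ", 1) for part in line.split(" : ", 1)[1].split(", ")
)

print(fields["duration[sec]"])         # '10.057143'
print(fields["density[hashes/sec]"])   # '111.0'
```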

python audfprint.py --density 100 --skip-existing --find-time-range list --dbase fpdbase.pklz

Wed Oct 31 00:52:38 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
track: 'tests/data/Nine_Lives/01-Nine_Lives.mp3', hash_count[units]: 1114, duration[s]: 10.057143211364746, real_density: 110.76704155322763, fingerprinted_density: 100.0
`...`
track: 'tests/data/Nine_Lives/13-Fallen_Angels.mp3', hash_count[units]: 1095, duration[s]: 10.057143211364746, real_density: 108.87783707431262, fingerprinted_density: 100.0

WHY? In case one really needs a closer look at the fingerprinting process. As the effective/real fingerprinting density and the requested --density may differ, it seems reasonable to display this information. Not least, the duration may also be an important property to display, since track naming can be automated and the outputs then become hard for a human to interpret.

CREDITS / REFERENCE

The naming for the JSON keys/properties has been inspired by:
https://github.com/AddictedCS/soundfingerprinting

@dpwe (Owner) left a comment

Thanks for all the suggested changes.

I like the idea of JSON output, but maybe have it write to a file, rather than as an alternative STDOUT?

It makes sense to store the analysis density in the hash table, but not separately for each track, I think it should be a global property of the database. I don't think we want databases where the density is different for different reference items.

The match stats report seems very specific to your interests, I'm not sure if we should add them to the main release.

@vriez (Author) commented Nov 7, 2018

Thank you very much for taking the time, Dan.

Yes, I agree that the report is quite specific, it just may interest others either learning or with similar goals.

The purposes of obtaining a JSON are mainly:

  • Trying to bring a convention
  • Convenience in converting the return object into a DML query.

So, writing directly to a file could be a useful option for some, and use cases may vary a lot. I would say that just returning the plain object and allowing the user to decide how it is to be used could be more pleasant/functional.
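A middle ground between the two positions could look like the following sketch (emit_matches is a hypothetical helper, not part of this PR or of audfprint): return the plain objects for programmatic callers, and optionally also write them to a JSON file when a path is given.

```python
import json

def emit_matches(matches, out_path=None):
    """Hypothetical helper: return the match objects as-is, and
    optionally also write them to a JSON file when out_path is given."""
    if out_path is not None:
        with open(out_path, "w") as f:
            json.dump(matches, f, indent=2)
    return matches

# Callers that stream to Kafka just use the return value;
# others pass a path to also get a file on disk.
results = emit_matches([{"track": "query.mp3", "confidence": 0.9}],
                       out_path="matches.json")
```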

The issue with the density being only a global property is that tracks fingerprinted with the same --density into the same --dbase often end up with different effective densities, for example:

(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3
ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
ingesting # 1 : track: tests/data/Nine_Lives/02-Falling_In_Love.mp3, duration[sec]: 10.057143, density[hashes/sec]: 97.0 
ingesting # 2 : track: tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0 
ingesting # 3 : track: tests/data/Nine_Lives/04-Taste_Of_India.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0 
ingesting # 4 : track: tests/data/Nine_Lives/05-Full_Circle.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0 
ingesting # 5 : track: tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0 
ingesting # 6 : track: tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0 
ingesting # 7 : track: tests/data/Nine_Lives/08-The_Farm.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0 
ingesting # 8 : track: tests/data/Nine_Lives/09-Crash.mp3, duration[sec]: 10.057143, density[hashes/sec]: 108.0 
ingesting # 9 : track: tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0 
ingesting # 10 : track: tests/data/Nine_Lives/11-Pink.mp3, duration[sec]: 10.057143, density[hashes/sec]: 90.0 
ingesting # 11 : track: tests/data/Nine_Lives/12-Attitude_Adjustment.mp3, duration[sec]: 10.057143, density[hashes/sec]: 105.0 
ingesting # 12 : track: tests/data/Nine_Lives/13-Fallen_Angels.mp3, duration[sec]: 10.057143, density[hashes/sec]: 109.0 
ingesting # 13 : track: tests/data/Nine_Lives/californication.mp3, duration[sec]: 321.123265, density[hashes/sec]: 80.0 
Added 39724 hashes (88.1 hashes/sec)
Saved fprints for 14 files ( 39724 hashes) to fpdbase.pklz (0.00% dropped)

and when listing; note that the density is already being stored as a global property of the hash table:

(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --skip-existing list --dbase fpdbase.pklz
Thu Nov  8 00:25:52 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
track: 'tests/data/Nine_Lives/01-Nine_Lives.mp3', hash_count[units]: 1114, duration[s]: 10.057143211364746, real_density: 110.76704155322763, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/02-Falling_In_Love.mp3', hash_count[units]: 978, duration[s]: 10.057143211364746, real_density: 97.24431475678333, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3', hash_count[units]: 994, duration[s]: 10.057143211364746, real_density: 98.83522379165912, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/04-Taste_Of_India.mp3', hash_count[units]: 1113, duration[s]: 10.057143211364746, real_density: 110.6676097385479, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/05-Full_Circle.mp3', hash_count[units]: 991, duration[s]: 10.057143211364746, real_density: 98.5369283476199, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3', hash_count[units]: 1139, duration[s]: 10.057143211364746, real_density: 113.25283692022107, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3', hash_count[units]: 1132, duration[s]: 10.057143211364746, real_density: 112.5568142174629, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/08-The_Farm.mp3', hash_count[units]: 1079, duration[s]: 10.057143211364746, real_density: 107.28692803943682, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/09-Crash.mp3', hash_count[units]: 1081, duration[s]: 10.057143211364746, real_density: 107.4857916687963, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3', hash_count[units]: 1073, duration[s]: 10.057143211364746, real_density: 106.69033715135839, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/11-Pink.mp3', hash_count[units]: 907, duration[s]: 10.057143211364746, real_density: 90.18465591452195, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/12-Attitude_Adjustment.mp3', hash_count[units]: 1050, duration[s]: 10.057143211364746, real_density: 104.40340541372443, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/13-Fallen_Angels.mp3', hash_count[units]: 1095, duration[s]: 10.057143211364746, real_density: 108.87783707431262, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/californication.mp3', hash_count[units]: 25978, duration[s]: 321.1232604980469, real_density: 80.89728523467706, fingerprinted_density: 100.0

The fact that the effective density does not strictly match the --density value is entirely acceptable, as the algorithm relies on the overlap, hop, and window parameters to try to make the resulting density approach the requested one.

@ezavesky

Many parts of this change seemed useful; what was the remaining complication? Was it individual density specification, added stats, or something else?
