JSON objects returns and DBASE operations displays in deeper detail. #45
base: master
- matplotlib has been added
- The paths to the songs were misspelled
…ack has been fingerprinted.
…well as added the -O|--json option for the `match` command.
…nformation by default.
…ht fixes have been made on for the hash_table content printing.
…rictly the same as displayed by the `match` command.
Thanks for all the suggested changes.
I like the idea of JSON output, but maybe have it write to a file, rather than as an alternative STDOUT?
It makes sense to store the analysis density in the hash table, but not separately for each track; I think it should be a global property of the database. I don't think we want databases where the density is different for different reference items.
The match stats report seems very specific to your interests; I'm not sure we should add it to the main release.
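The file-versus-STDOUT question raised above could be handled by a small output helper. The following is only a minimal sketch, not code from this pull request: the function name `emit_matches`, its `out_path` parameter, and the record keys used in the usage note are all hypothetical.

```python
import json
import sys


def emit_matches(matches, out_path=None):
    """Serialize match records as JSON (hypothetical helper, not part of audfprint).

    If out_path is None the JSON text goes to STDOUT, otherwise it is
    written to that file. The text is also returned so that a caller can
    use it directly.
    """
    text = json.dumps(matches, indent=2, sort_keys=True)
    if out_path is None:
        sys.stdout.write(text + "\n")
    else:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return text
```

A caller could then pass something like `emit_matches([{"query": "tests/data/query.mp3", "match_track": "some_track.mp3"}], out_path="matches.json")`, where both keys are placeholders rather than the keys this pull request actually emits.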
Thank you very much for taking the time, Dan. Yes, I agree that the report is quite specific; it may simply interest others who are learning or who have similar goals. The purposes of obtaining a JSON are mainly:
So, writing directly to a file could be a useful option for some, and use cases may vary a lot. I would say that just returning the plain object and allowing the user to decide how it is used could be more pleasant and functional. The issue with the density being only a global property is that tracks fingerprinted with the same --density into the same --dbase often end up with different actual densities, for example:
(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3
ingesting # 0 : track: tests/data/Nine_Lives/01-Nine_Lives.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0
ingesting # 1 : track: tests/data/Nine_Lives/02-Falling_In_Love.mp3, duration[sec]: 10.057143, density[hashes/sec]: 97.0
ingesting # 2 : track: tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0
ingesting # 3 : track: tests/data/Nine_Lives/04-Taste_Of_India.mp3, duration[sec]: 10.057143, density[hashes/sec]: 111.0
ingesting # 4 : track: tests/data/Nine_Lives/05-Full_Circle.mp3, duration[sec]: 10.057143, density[hashes/sec]: 99.0
ingesting # 5 : track: tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0
ingesting # 6 : track: tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3, duration[sec]: 10.057143, density[hashes/sec]: 113.0
ingesting # 7 : track: tests/data/Nine_Lives/08-The_Farm.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0
ingesting # 8 : track: tests/data/Nine_Lives/09-Crash.mp3, duration[sec]: 10.057143, density[hashes/sec]: 108.0
ingesting # 9 : track: tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3, duration[sec]: 10.057143, density[hashes/sec]: 107.0
ingesting # 10 : track: tests/data/Nine_Lives/11-Pink.mp3, duration[sec]: 10.057143, density[hashes/sec]: 90.0
ingesting # 11 : track: tests/data/Nine_Lives/12-Attitude_Adjustment.mp3, duration[sec]: 10.057143, density[hashes/sec]: 105.0
ingesting # 12 : track: tests/data/Nine_Lives/13-Fallen_Angels.mp3, duration[sec]: 10.057143, density[hashes/sec]: 109.0
ingesting # 13 : track: tests/data/Nine_Lives/californication.mp3, duration[sec]: 321.123265, density[hashes/sec]: 80.0
Added 39724 hashes (88.1 hashes/sec)
Saved fprints for 14 files ( 39724 hashes) to fpdbase.pklz (0.00% dropped)
And when listing (note that the density is already being stored as a global property of the hash table):
(venv) vitor@hal9000:~/Projects/audfprint$ python audfprint.py --skip-existing list --dbase fpdbase.pklz
Thu Nov 8 00:25:52 2018 Reading hash table fpdbase.pklz
Read fprints for 14 files ( 39724 hashes) from fpdbase.pklz (0.00% dropped)
track: 'tests/data/Nine_Lives/01-Nine_Lives.mp3', hash_count[units]: 1114, duration[s]: 10.057143211364746, real_density: 110.76704155322763, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/02-Falling_In_Love.mp3', hash_count[units]: 978, duration[s]: 10.057143211364746, real_density: 97.24431475678333, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/03-Hole_In_My_Soul.mp3', hash_count[units]: 994, duration[s]: 10.057143211364746, real_density: 98.83522379165912, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/04-Taste_Of_India.mp3', hash_count[units]: 1113, duration[s]: 10.057143211364746, real_density: 110.6676097385479, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/05-Full_Circle.mp3', hash_count[units]: 991, duration[s]: 10.057143211364746, real_density: 98.5369283476199, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/06-Something_s_Gotta_Give.mp3', hash_count[units]: 1139, duration[s]: 10.057143211364746, real_density: 113.25283692022107, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/07-Ain_t_That_A_Bitch.mp3', hash_count[units]: 1132, duration[s]: 10.057143211364746, real_density: 112.5568142174629, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/08-The_Farm.mp3', hash_count[units]: 1079, duration[s]: 10.057143211364746, real_density: 107.28692803943682, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/09-Crash.mp3', hash_count[units]: 1081, duration[s]: 10.057143211364746, real_density: 107.4857916687963, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3', hash_count[units]: 1073, duration[s]: 10.057143211364746, real_density: 106.69033715135839, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/11-Pink.mp3', hash_count[units]: 907, duration[s]: 10.057143211364746, real_density: 90.18465591452195, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/12-Attitude_Adjustment.mp3', hash_count[units]: 1050, duration[s]: 10.057143211364746, real_density: 104.40340541372443, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/13-Fallen_Angels.mp3', hash_count[units]: 1095, duration[s]: 10.057143211364746, real_density: 108.87783707431262, fingerprinted_density: 100.0
track: 'tests/data/Nine_Lives/californication.mp3', hash_count[units]: 25978, duration[s]: 321.1232604980469, real_density: 80.89728523467706, fingerprinted_density: 100.0
The fact that the resulting density does not strictly match the --density value is totally acceptable, as the algorithm relies on the overlap, hop and window parameters to try to make the resulting density equal the requested one.
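The figures in the listing above are consistent with real_density being simply hash_count divided by duration. A minimal sketch of that arithmetic follows; the function names are illustrative and are not taken from the audfprint code base.

```python
def real_density(hash_count, duration_s):
    """Return hashes per second for one track (illustrative helper)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return hash_count / duration_s


def relative_deviation(hash_count, duration_s, target_density):
    """Fractional difference between the achieved and the requested density."""
    return (real_density(hash_count, duration_s) - target_density) / target_density


# Values copied from the first and last lines of the listing above.
print(real_density(1114, 10.057143211364746))   # about 110.77 hashes/sec
print(real_density(25978, 321.1232604980469))   # about 80.90 hashes/sec
print(relative_deviation(1114, 10.057143211364746, 100.0))
```

Storing only hash_count and duration per track would therefore be enough to recover the per-track density, while the requested --density stays a single global value.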
Many parts of this change seemed useful. What was the remaining complication? Was it individual density specification, added stats, or something else?
The main achievement of this work is the creation of the `file_match_to_objs` function. The function is called through the -O option and outputs a JSON file, for example:
JSON Returns
python audfprint.py --density 100 -O --skip-existing --find-time-range match --dbase fpdbase.pklz tests/data/query.mp3
WHY? When using `audfprint` to process queries (streams) coming from multiple Kafka topics, receiving JSON objects has been a significant improvement. Silencing the first two lines of output would require some further decoupling effort, because they are produced within the `hash_table`.
DBASE
python audfprint.py --density 100 --skip-existing new --dbase fpdbase.pklz tests/data/Nine_Lives/*.mp3
WHY? When comparing multiple dbases programmatically, if one wants to compare the results against the fingerprinting technique, a plausible and minimalist solution would be to keep the density stored within the `hash_table`.
Explanation: duration[sec] and density[hashes/sec] have been named this way so that they can be accessed as keys, without any further parsing. I agree that it would look better without the [*], but then the units may not be obvious.
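To illustrate how the labels can double as keys, here is a minimal sketch that turns one ingest line, in the format shown in the log earlier in this conversation, into a dictionary. The helper `parse_ingest_line` is hypothetical and assumes the line layout stays exactly as printed there.

```python
def parse_ingest_line(line):
    """Parse 'ingesting # N : track: ..., duration[sec]: ..., density[hashes/sec]: ...'.

    Hypothetical helper: the labels printed in the log are reused verbatim
    as dictionary keys, units included.
    """
    prefix, _, rest = line.partition(" : ")
    record = {"index": int(prefix.split("#")[1])}
    # Split from the right so that a ', ' inside the track path is preserved.
    for field in rest.rsplit(", ", 2):
        key, _, value = field.partition(": ")
        record[key] = value
    for key in ("duration[sec]", "density[hashes/sec]"):
        record[key] = float(record[key])
    return record


line = ("ingesting # 10 : track: tests/data/Nine_Lives/11-Pink.mp3, "
        "duration[sec]: 10.057143, density[hashes/sec]: 90.0")
record = parse_ingest_line(line)
print(record["track"], record["duration[sec]"], record["density[hashes/sec]"])
```

Keeping the unit inside the key makes the key longer, but a consumer never has to guess whether a duration is in seconds or in frames.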
python audfprint.py --density 100 --skip-existing --find-time-range list --dbase fpdbase.pklz
WHY? In case one really needs a closer look at the fingerprinting process. As the effective (real) fingerprinting density and the desired density `--density` may differ, it seems reasonable to display both. The duration may also be an important property to display, since track naming can be automated and the outputs can then become hard for a human to interpret.
CREDITS / REFERENCE
The naming for the JSON keys/properties has been inspired by:
https://github.com/AddictedCS/soundfingerprinting