fuzzy find-duplicates #4
Comments
Hi @openbrian - Nice to hear from you. Do you have a more concrete example of what you're thinking? I don't think it is fuzzy in a way that I would expect, but it does work on a per-file basis, which means that it will help identify two directories that share a lot of common content. Namely, when you run it, if you have two sub-directories where there is a lot of overlap, you'll see that in the output. When I originally wrote this, I was trying to clean up after ending up with as many as 6 copies of some pictures in my photo library due to some copying and backing up that went weird, so it definitely helped me in that way. Hope that helps answer your question; if it doesn't, please let me know.
Ian,
It sounds like the use cases are different. You know the 6 folders that have similar content. For me, I want an app that finds similar folders. I will then inspect the folders manually, or use diff -r, or whatever.
What I'm looking for is a way to characterize a folder and use this characterization and some sort of distance function to find similar folders. It would be like the opposite of hashing, as a hash dramatically changes for each small difference.
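One way to sketch that idea, offered only as an illustration and not as anything find-duplicates currently does: characterize each folder by the set of content hashes of its files, then use the Jaccard distance between two sets as the distance function. A single changed file alters one element of the set instead of the whole signature, so near-copies stay close together. The names `folder_signature`, `jaccard_distance`, `similar_folders`, and the `max_distance` threshold are all hypothetical.

```python
import hashlib
from itertools import combinations
from pathlib import Path


def folder_signature(folder):
    """Return the set of SHA-256 digests of every regular file under folder."""
    digests = set()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch,
            # but large files would call for chunked hashing.
            digests.add(hashlib.sha256(path.read_bytes()).hexdigest())
    return digests


def jaccard_distance(a, b):
    """0.0 means the same set of file contents, 1.0 means nothing in common."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def similar_folders(root, max_distance=0.5):
    """Return (distance, folder_a, folder_b) tuples, closest pairs first."""
    folders = [p for p in Path(root).rglob("*") if p.is_dir()]
    signatures = {p: folder_signature(p) for p in folders}
    pairs = []
    for a, b in combinations(folders, 2):
        # Skip empty folders, which would otherwise all look identical.
        if not signatures[a] or not signatures[b]:
            continue
        distance = jaccard_distance(signatures[a], signatures[b])
        if distance <= max_distance:
            pairs.append((distance, a, b))
    return sorted(pairs, key=lambda item: item[0])
```

Calling `similar_folders("/path/to/photos")` would list candidate pairs to inspect by hand or with `diff -r`. Note that a parent folder and its own subfolder can show up as a pair, since the parent's signature includes the child's files; this sketch does not filter that case, and it compares every pair of folders, which gets slow on large trees.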
I'm going to check out your code anyway.
Cheers,
Brian
--
Brian DeRocher
On October 25, 2021 2:31:11 PM EDT, Ian Lee ***@***.***> wrote:
Hi @openbrian - Nice to hear from you.
Do you have a more concrete example of what you're thinking?
I don't think it is fuzzy in a way that I would expect, but it does
work on a per file basis, which means that it will help identify two
directories that share a lot of common content. Namely, when you run it,
if you have two sub-directories where there is a lot of overlap,
you'll see that in the output.
When I originally wrote this, I was trying to clean up from having as
many as 6 copies of some pictures in my photo library due to some
copying and backing up that went weird, so it definitely helped me in
that way.
Hope that helps answer your question, if it doesn't, please let me
know.
Ah ok, I think what I'm describing does do that, but in a very manual, "you as the user have to stare at the output and make sense of it via pattern matching" sort of way. If you find a way to do what you're describing, though, this could provide the start of the code to do that. If you want to submit a pull request, I'll definitely take a look at it.
Ian,
Just saw your PyCon lightning talk from 2016 where you mention find-duplicates. I was wondering if it has a fuzzy search, meaning you copied a folder of files and then changed some of those files. For example, you make a "backup" folder and you end up with backups of backups.
I'm looking for a way to identify similar folders. Do you think find-duplicates can do anything like this?
Thanks,
Brian