fuzzy find-duplicates #4

Open
openbrian opened this issue Oct 20, 2021 · 3 comments

Comments

@openbrian

Ian,

I just saw your 2016 PyCon lightning talk where you mention find-duplicates. I was wondering whether it supports a fuzzy search, meaning: you copy a folder of files and then change some of those files. For example, you make a "backup" folder, and over time you end up with backups of backups.

I'm looking for a way to identify similar folders. Do you think find-duplicates can do anything like this?

Thanks,
Brian

@IanLee1521
Owner

Hi @openbrian - Nice to hear from you.

Do you have a more concrete example of what you're thinking?

I don't think it is fuzzy in the sense you mean, but it does work on a per-file basis, which means it will help identify two directories that share a lot of common content. Namely, if you run it on two sub-directories with a lot of overlap, you'll see that overlap in the output.
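To illustrate what "per-file" means here, below is a minimal sketch, not find-duplicates' actual code. It assumes duplicates are detected by hashing each file's contents (SHA-256 here), groups identical files, and then counts how many duplicate groups each pair of directories shares. The function names `file_digest`, `duplicate_groups`, and `shared_counts` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict
from itertools import combinations


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root):
    """Map content digest -> paths under root that have identical content.

    Hypothetical helper: only digests shared by two or more files are kept.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}


def shared_counts(groups):
    """Count how many duplicate groups each pair of directories has in common."""
    counts = defaultdict(int)
    for paths in groups.values():
        dirs = sorted({os.path.dirname(p) for p in paths})
        for a, b in combinations(dirs, 2):
            counts[(a, b)] += 1
    return dict(counts)
```

Sorting `shared_counts(duplicate_groups("photos"))` by count, largest first, puts the directory pairs with the most overlap at the top, which is the pattern you would otherwise spot by reading the raw duplicate list.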

When I originally wrote this, I was trying to clean up from having as many as 6 copies of some pictures in my photo library due to some copying and backing up that went weird, so it definitely helped me in that way.

Hope that helps answer your question, if it doesn't, please let me know.

@openbrian
Author

openbrian commented Oct 26, 2021 via email

@IanLee1521
Owner

Ah ok, I think what I'm describing does do that, but in a very manual, "you as the user have to stare at the output and make sense of it via pattern matching" sort of way.

If you find a way to do what you're describing, though, this could provide the starting point for that code. If you want to submit a pull request, I'll definitely take a look at it.
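One possible starting point for a "fuzzy" folder comparison is sketched below. This is an assumption, not something find-duplicates currently does: each directory is reduced to the set of its files' content hashes, and two directories are scored by Jaccard similarity (shared contents divided by all distinct contents). The names `content_hashes` and `directory_similarity` are hypothetical.

```python
import hashlib
import os


def content_hashes(directory, chunk_size=65536):
    """Return the set of SHA-256 digests of every file under directory."""
    digests = set()
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            h = hashlib.sha256()
            with open(os.path.join(dirpath, name), "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    h.update(chunk)
            digests.add(h.hexdigest())
    return digests


def directory_similarity(dir_a, dir_b):
    """Jaccard similarity of two directories' file contents, 0.0 to 1.0.

    1.0 means both hold exactly the same set of file contents; a backup
    whose files were partly edited afterwards scores somewhere in between.
    """
    a, b = content_hashes(dir_a), content_hashes(dir_b)
    if not a and not b:
        return 1.0  # treat two empty directories as identical
    return len(a & b) / len(a | b)
```

The fuzziness here is only at the folder level: an edited file hashes differently and counts as entirely new, so a backup with 2 files where 1 was later changed scores 1/3 against the original. Matching near-identical individual files would need a different technique, such as comparing text diffs or perceptual hashes for images.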

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants