[pytx][ncmec] NCMEC fetch implementation unable to make progress #1679

Open
Dcallies opened this issue Nov 4, 2024 · 0 comments
Assignees
Labels
ncmec Pertaining to the NCMEC Hash API or cybertips python-threatexchange Items related to the threatexchange python tool / library

Comments


Dcallies commented Nov 4, 2024

Some NCMEC environments contain many records that all share the same second. The current logic keeps fetching until it has enough data to advance the checkpoint, whose smallest granularity is currently one second. Unfortunately, the amount of data that must be fetched in some cases is quite large, and it frequently overwhelms storage solutions (especially on HMA).

The fix is to store the "next" URL when the fetch granularity is one second. It's unclear how the NCMEC database will behave in this circumstance: if pagination is based on an offset, resuming from a stored URL may cause records to be skipped in some cases.

If we want to be doubly defensive, we can invalidate the stored next URL if it was saved more than about a day ago.
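A rough sketch of how the checkpoint could carry the "next" URL, with the staleness guard included. Everything here is hypothetical and for illustration only: `SketchCheckpoint`, `checkpoint_after_page`, the half-open `[window_start, window_end)` convention, and the one-day constant are assumptions, not the existing python-threatexchange classes or the NCMEC API.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Assumption: "about a day" expressed in seconds.
MAX_NEXT_URL_AGE_SEC = 24 * 60 * 60


@dataclass
class SketchCheckpoint:
    """Hypothetical checkpoint; not the actual python-threatexchange class."""

    max_timestamp: int  # everything before this second is fully fetched
    next_url: Optional[str] = None  # where to resume inside a one-second window
    next_url_stored_at: Optional[float] = None  # wall-clock time of the save

    def resume_url(self, now: Optional[float] = None) -> Optional[str]:
        """Return the stored next URL, or None if it is absent or too old."""
        if self.next_url is None or self.next_url_stored_at is None:
            return None
        now = time.time() if now is None else now
        if now - self.next_url_stored_at > MAX_NEXT_URL_AGE_SEC:
            return None  # stale: fall back to refetching from max_timestamp
        return self.next_url


def checkpoint_after_page(
    prev: SketchCheckpoint,
    window_start: int,
    window_end: int,
    next_url: Optional[str],
    now: Optional[float] = None,
) -> SketchCheckpoint:
    """Decide what to persist after fetching one page of [window_start, window_end)."""
    now = time.time() if now is None else now
    if not next_url:
        # The window is exhausted, so the timestamp can advance normally.
        return SketchCheckpoint(max_timestamp=window_end)
    if window_end - window_start <= 1:
        # The window cannot shrink further: remember the position in it
        # instead of buffering every remaining page before saving progress.
        return SketchCheckpoint(prev.max_timestamp, next_url, now)
    # Wider windows keep the old checkpoint; the caller may narrow the window.
    return prev
```

On the next run, the fetcher would ask `resume_url()` first and only fall back to a time-based query from `max_timestamp` when it returns `None`. This does not address the open question about offset-based pagination; a resumed URL could still skip records if the server behaves that way.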

@Dcallies Dcallies added python-threatexchange Items related to the threatexchange python tool / library ncmec Pertaining to the NCMEC Hash API or cybertips labels Nov 4, 2024
@Dcallies Dcallies self-assigned this Nov 4, 2024