Implement a time-based sharding approach to data collection #19
Labels
data
Issue relates to the tooling data collected from data sources
enhancement
New feature or request
User Story
As a tooling developer I want data to be collected consistently and without failing due to rate limits applied at any source code repository platform.
Detailed Requirement
GitHub (obviously) applies rate limits on API calls, which we rely on heavily to collect data. As we expand the number of topics we are collecting we need to be cognisant of the limits and amend our approach to spread the collection period over multiple hours.
There's a few approaches:
Option 3 seems feasible. The most sensible option seems to be:
This approach should scale as we collect more data. The main thing to be aware of is the overall build time limits, although that should be "OK" as we have a fair amount of head room for the time being.
The text was updated successfully, but these errors were encountered: