Description
To facilitate one-off runs on Berkeley's servers and confirm that files are downloading as expected, I've added a runner method and a CLI command to the library that read the asset_url and save the file locally. Rather than creating separate scrape methods, this centralizes the logic in runner.py. Note: This logic satisfies basic one-off scraper needs. Additional platform-specific download logic may be required (cf. Laserfiche, #50).
clean-scraper/clean/runner.py
Lines 86 to 130 in 4f9211d
clean-scraper/clean/cli.py
Lines 110 to 169 in 4f9211d
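For reviewers who want a feel for the pattern, here is a minimal sketch of a centralized "read asset_url and save locally" step. It is not the code in runner.py: the names download_assets and fetch_bytes, and the assumption that each metadata record is a dict with an asset_url key, are hypothetical and chosen only for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher (hypothetical helper): return the raw bytes at url."""
    with urlopen(url) as response:
        return response.read()


def download_assets(records, out_dir, fetch=fetch_bytes):
    """Save the file behind each record's asset_url into out_dir.

    records is assumed to be an iterable of dicts with an "asset_url" key.
    fetch is injectable so the logic can be exercised without network access.
    Returns the list of paths written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for record in records:
        url = record["asset_url"]
        # Use the last path segment of the URL as the local file name.
        name = Path(urlparse(url).path).name or "asset"
        target = out_path / name
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

Keeping the download step in one place like this means an agency-specific scraper only has to produce records with an asset_url. Platforms that need extra handling would still require their own logic on top.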
Summary of Changes
usage.md
contributing.md
Related Issues
This should reduce the number of steps required for pull request approval. Users can SSH into Berkeley's server and initiate downloads.
How to Review
Please review the documentation. A separate PR can contain additional code changes to the download logic. This PR is meant to ensure documentation is up-to-date and contributors are aware of the new pattern.
cc @naumansharifwork