
docs: add download_agency notes
newsroomdev committed Oct 22, 2024
1 parent 1170a90 commit 4f9211d
Showing 2 changed files with 9 additions and 7 deletions.
7 changes: 4 additions & 3 deletions docs/contributing.md
@@ -116,8 +116,8 @@ When coding a new scraper, there are a few important conventions to follow:
 - If it's a new state folder, add an empty `__init__.py` to the folder
 - Create a `Site` class inside the agency's scraper module with the following attributes/methods:
   - `name` - Official name of the agency
-  - `scrape_meta` - generates a CSV with metadata about videos and other available files (file name, URL, and size at minimum)
-  - `scrape` - uses the CSV generated by `scrape_meta` to download videos and other files
+  - `scrape_meta` - generates a JSON with metadata about videos and other available files (file name, URL, and size at minimum)
+  - `download_agency` - uses the JSON generated by `scrape_meta` to download videos and other files

Below is a pared down version of San Diego's [Site](https://github.com/biglocalnews/clean-scraper/blob/main/clean/ca/san_diego_pd.py) class to illustrate these conventions.
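
For orientation, a minimal hypothetical sketch of the same interface is shown here; the constructor, metadata fields, and download logic are illustrative assumptions, not the actual San Diego implementation.

```python
# Hypothetical sketch only -- the attribute and method names follow the
# conventions above; everything else is an assumed illustration.
import json
from pathlib import Path
from urllib.request import urlretrieve


class Site:
    """Scraper for a hypothetical agency."""

    name = "Example Police Department"

    def __init__(self, data_dir: Path = Path("data")):
        self.data_dir = data_dir
        self.metadata_path = data_dir / "example_pd_metadata.json"

    def scrape_meta(self) -> Path:
        # Collect metadata (file name, URL, and size at minimum) for each
        # available asset and write it to a JSON file.
        assets = [
            {
                "name": "bodycam_2024-01-01.mp4",
                "url": "https://example.com/files/bodycam_2024-01-01.mp4",
                "size": 123456789,
            }
        ]
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.metadata_path.write_text(json.dumps(assets, indent=2))
        return self.metadata_path

    def download_agency(self) -> list[Path]:
        # Read the JSON produced by scrape_meta and download each asset.
        assets = json.loads(self.metadata_path.read_text())
        downloaded = []
        for asset in assets:
            target = self.data_dir / asset["name"]
            urlretrieve(asset["url"], target)
            downloaded.append(target)
        return downloaded
```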

@@ -278,6 +278,7 @@ Options:
 Commands:
   list             List all available agencies and their slugs.
   scrape-meta      Command-line interface for generating metadata CSV about...
+  download_agency  Downloads assets retrieved in scrape-meta
```

Running a state is as simple as passing arguments to the appropriate subcommand.
@@ -292,7 +293,7 @@ pipenv run python -m clean.cli list
pipenv run python -m clean.cli scrape-meta ca_san_diego_pd

# Trigger file downloads using agency slug
-pipenv run python -m clean.cli scrape ca_san_diego_pd
+pipenv run python -m clean.cli download_agency ca_san_diego_pd
```

For more verbose logging, you can ask the system to show debugging information.
9 changes: 5 additions & 4 deletions docs/usage.md
@@ -31,14 +31,14 @@ You can then run a scraper for an agency using its slug:
clean-scraper scrape-meta ca_san_diego_pd
```

-> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `scrape` subcommand.
+> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `download_agency` subcommand.
To use the `clean` library in Python, import an agency's scraper and run it directly.

```python
from clean.ca import san_diego_pd

-san_diego_pd.scrape()
+san_diego_pd.download_agency()
```
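
Because `download_agency` reads the metadata file produced by `scrape_meta`, a full run in Python calls both in order. This sketch assumes the agency module also exposes `scrape_meta()` at module level, mirroring the CLI subcommand; check the agency module for the exact entry point.

```python
from clean.ca import san_diego_pd

# Generate the metadata JSON first, then download the assets it lists.
# Assumes scrape_meta() is exposed alongside download_agency().
san_diego_pd.scrape_meta()
san_diego_pd.download_agency()
```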

## Configuration
@@ -56,6 +56,7 @@ Options:
  --help  Show this message and exit.

Commands:
-  list             List all available agencies and their slugs.
-  scrape-meta      Command-line interface for downloading CLEAN files.
+  list             List all available agencies and their slugs.
+  scrape-meta      Command-line interface for generating metadata CSV about...
+  download_agency  Downloads assets retrieved in scrape-meta
```
