Write a Scrapy spider that fetches a tag (#vacancy, others?) and extracts toots/updates found for that tag. #41

berkes · 2021-03-03T15:51:26Z

Write a Scrapy spider that fetches a tag (#vacancy, others?) and extracts toots/updates found for that tag.

Details

The spider should start from a list of seed instances and follow links across instances to fetch toots/updates for a given hashtag (e.g. #vacancy, #job).
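A minimal sketch of the seeding and instance-following logic, written as plain helpers that a spider's request and parse callbacks could call. It assumes the public Mastodon tag timeline endpoint (`/api/v1/timelines/tag/:hashtag`) is reachable without authentication on the seed instances, and that new instances are discovered from the `uri` field of each returned status. The function names `tag_timeline_url` and `discovered_instances` are hypothetical, not part of this project.

```python
from urllib.parse import quote, urlencode, urlsplit


def tag_timeline_url(instance, tag, limit=40, max_id=None):
    """Build the tag-timeline URL for one instance (assumed public endpoint)."""
    params = {"limit": limit}
    if max_id is not None:
        # max_id pages backwards through older statuses.
        params["max_id"] = max_id
    name = quote(tag.lstrip("#"))
    return f"https://{instance}/api/v1/timelines/tag/{name}?{urlencode(params)}"


def discovered_instances(statuses, known):
    """Return hosts found in status URIs that are not in `known` yet."""
    found = set()
    for status in statuses:
        host = urlsplit(status.get("uri", "")).hostname
        if host and host not in known:
            found.add(host)
    return found
```

A spider would request `tag_timeline_url(seed, "#vacancy")` for each seed, parse the JSON response into a list of statuses, and queue a new request for every host returned by `discovered_instances`.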

Deliverable

It should try to deduplicate toots. When instance "example.com" has a toot by "@[email protected]" and "example.org" has this toot too, it should appear only once in the datafile.
If an update is manually re-tooted (i.e. its text is copied into a new update) it may appear multiple times. Deduplicating based on the content of an update is not important.
Boosts and replies should be ignored (for now).
If tooling is required to set up the environment (pipenv etc.), a command should be provided that gets this running for devs and CI.
It should be one command, so that integration is easy. Preferably a command that runs and then stops, rather than a daemon.
Scrapy is preferred, as other parts of this project already use it.
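The filtering and deduplication rules above could be sketched as follows. This assumes statuses arrive as dictionaries shaped like Mastodon's Status entity, where `reblog` is set for boosts, `in_reply_to_id` is set for replies, and `uri` is the same on every instance that holds a copy of the toot (the local `id` differs per instance). The helper names `is_original` and `deduplicate` are hypothetical.

```python
def is_original(status):
    """True for a toot that is neither a boost nor a reply."""
    return status.get("reblog") is None and status.get("in_reply_to_id") is None


def deduplicate(statuses):
    """Keep the first copy of each original toot, keyed on its `uri`."""
    seen = set()
    unique = []
    for status in statuses:
        if not is_original(status):
            continue
        uri = status.get("uri")
        # Statuses without a uri cannot be matched across instances; skip them.
        if uri is None or uri in seen:
            continue
        seen.add(uri)
        unique.append(status)
    return unique
```

Manually re-tooted copies get a new `uri`, so they pass through as separate entries, which matches the requirement that content-based deduplication is not needed. In a Scrapy project the same check would probably live in an item pipeline that drops items whose `uri` has already been seen.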

berkes · 2021-03-17T19:25:43Z

I've experimented with the Mastodon API through elefren.

The preliminary result is a project called hunter2.

Usage: target/debug/hunter2 [options]

Options:
    -h, --help          print this help menu
    -r, --register      register hunter2 with your instance.
    -f, --follow        follow live updates.
    -p, --past          fetch past updates.

Using this, I've filled an initial MeiliSearch index. It currently runs on 178.62.220.231 (this address will change: the server will go down and be replaced by a proper, HTTPS-backed instance with a domain name).

berkes added the task and fedifind (issues related to the intermediate "Fedi Find" project) labels Mar 3, 2021

berkes added this to the Search milestone Mar 3, 2021

berkes added the scrapy label Mar 3, 2021
