
scrapjobs

Scrapes tech job websites and provides a nice search interface over the results.

You can try https://scrapjobs.xyz/ live!

Usage

scrapjobs demo

The search is triggered on each keypress and throttled to spare the backend. The backend simply executes the search and returns the results as JSON. The frontend then generates HTML from the results and swaps it in via .innerHTML.

Keybinds

  • / to jump to the search bar
  • <tab> to cycle between links and tag inputs

Search

  • rust backend: plain words are used as search terms
  • golang backend -java: prefix a word with - to exclude it from the search
  • golang #new: prefix a word with # to search for a tag (see the parsing sketch below)
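
A minimal Go sketch of how that syntax could be split up and turned into a Postgres tsquery-style expression. The parseQuery and toTsquery helpers are my own illustration, not necessarily how the backend actually parses queries:

package main

import (
	"fmt"
	"strings"
)

// parseQuery splits a raw search string into plain terms, excluded
// terms (prefixed with -) and tags (prefixed with #).
func parseQuery(raw string) (terms, excluded, tags []string) {
	for _, tok := range strings.Fields(raw) {
		switch {
		case strings.HasPrefix(tok, "-") && len(tok) > 1:
			excluded = append(excluded, tok[1:])
		case strings.HasPrefix(tok, "#") && len(tok) > 1:
			tags = append(tags, tok[1:])
		default:
			terms = append(terms, tok)
		}
	}
	return terms, excluded, tags
}

// toTsquery builds a to_tsquery-style expression such as
// "golang & backend & !java" from the parsed terms.
func toTsquery(terms, excluded []string) string {
	parts := append([]string{}, terms...)
	for _, t := range excluded {
		parts = append(parts, "!"+t)
	}
	return strings.Join(parts, " & ")
}

func main() {
	terms, excluded, tags := parseQuery("golang backend -java #new")
	fmt.Println(toTsquery(terms, excluded)) // golang & backend & !java
	fmt.Println(tags)                       // [new]
}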

Local tags

  • Use !sometag in the >___ input beside each vacancy to add a local tag.
  • Use -sometag in the same input to remove the local tag sometag again.
    • Local tags are saved in the browser; use them to keep track of your applications.

The data ETL

To populate the database I downloaded the jobs from publicly available sources. I didn't use any authenticated session to download the data, and the data is not shared in the repository for obvious reasons anyway.

Downloading the data involves 4 steps:

  1. Run the getLinks scraper: node scrap.mjs getLinks. This scraper outputs a list of links in JSON format. It opens the browser, collects the links and prints them as JSON.
  2. Run the downloadData scrapers: .. links json .. | node scrap downloadLinks rustjobs rust rustjobs othertag. These scrapers take the links from the previous step, download the data from each link and save it in the output/ folder. They print the saved files as a JSON string array.
  3. Run the metadata.ipynb notebook to populate the metadata. This notebook analyses the vacancy descriptions and generates metadata that is saved in the database, for example whether the position is remote. To run the notebook use ipython3 '%run metadata.ipynb'.
  4. Run the importer: .. output json files .. | (cd tools/; go run .). This is not a scraper. It reads the list of files generated in the previous step and loads them into the database. The new tag is cleared for old entries, and new entries are inserted with the new tag set (see the importer sketch after the command listing below).

Running all the steps:

# 1. Get the links
node scrap.mjs getLinks > links.json

# 2. Get the jobs
node scrap.mjs downloadJobs < links.json > jobs.json

# 3. Enrich the metadata
.venv/bin/python3 metadata.py < jobs.json > jsonAnnotated.json

# 4. Load in the DB
tools/tools < jsonAnnotated.json

This will scrape and insert new entries into the database, updating the new tag accordingly.
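
A rough Go sketch of what the importer step does with the new tag. The jobs table and its tags text[] column are assumptions here (the real schema is in db.sql), and the lib/pq driver is just one possible choice:

package main

import (
	"database/sql"
	"log"

	"github.com/lib/pq" // Postgres driver; one possible choice, the real importer may differ
)

// importJobs clears the "new" tag from existing rows, then inserts the
// freshly scraped entries with the "new" tag set, mirroring step 4 above.
func importJobs(db *sql.DB, jobs []struct{ Title, Descr string }) error {
	// Old entries lose the "new" tag...
	if _, err := db.Exec(`UPDATE jobs SET tags = array_remove(tags, 'new')`); err != nil {
		return err
	}
	// ...and every freshly scraped entry gets it.
	for _, j := range jobs {
		_, err := db.Exec(
			`INSERT INTO jobs (title, descr, tags) VALUES ($1, $2, $3)`,
			j.Title, j.Descr, pq.Array([]string{"new"}),
		)
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/scrapjobs?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	jobs := []struct{ Title, Descr string }{{"Go developer", "Backend role"}}
	if err := importJobs(db, jobs); err != nil {
		log.Fatal(err)
	}
}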

The Backend

The backend lives in the backend/web-service-gin folder and uses the gin framework. It has only one endpoint where the user can send a query, plus a websocket at /ws/server. The / route serves index.html, a pure HTML client that renders searches interactively.
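
A minimal gin sketch of that layout, assuming a /search route name and a stand-in searchJobs helper (the websocket handler at /ws/server is omitted); the real code in backend/web-service-gin may be organised differently:

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// Job is a minimal result shape; the real project likely returns more fields.
type Job struct {
	Title string `json:"title"`
	URL   string `json:"url"`
}

// searchJobs is a stand-in for the database lookup (see the full-text
// search sketch in the Database section below).
func searchJobs(query string) []Job {
	return []Job{{Title: "Go developer", URL: "https://example.com/job/1"}}
}

func main() {
	r := gin.Default()

	// Serve the pure-HTML client on /.
	r.StaticFile("/", "./index.html")

	// Single query endpoint: takes the search string, returns JSON results.
	r.GET("/search", func(c *gin.Context) {
		q := c.Query("q")
		c.JSON(http.StatusOK, searchJobs(q))
	})

	r.Run(":8080")
}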

The Database

I'm using Postgres with full-text search. At each keystroke in the frontend (after throttling), the search is sent to the backend. The backend then builds the proper query, runs it and returns the results as JSON.

The database schema is in db.sql for now; you need it in order to create the database.
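
For reference, the usual Postgres full-text search pattern looks like the sketch below. The column names (title, descr) and the use of websearch_to_tsquery (which also understands the -word exclusion syntax, and needs Postgres 11+) are assumptions; the real query is built in the backend:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver, assumed
)

// search ranks jobs against the user's query with Postgres full-text search.
func search(db *sql.DB, query string) error {
	rows, err := db.Query(`
		SELECT title
		FROM jobs
		WHERE to_tsvector('english', title || ' ' || descr)
		      @@ websearch_to_tsquery('english', $1)
		ORDER BY ts_rank(to_tsvector('english', title || ' ' || descr),
		                 websearch_to_tsquery('english', $1)) DESC
		LIMIT 20`, query)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var title string
		if err := rows.Scan(&title); err != nil {
			return err
		}
		fmt.Println(title)
	}
	return rows.Err()
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/scrapjobs?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := search(db, "golang backend -java"); err != nil {
		log.Fatal(err)
	}
}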

TODO

  • Improve the first-time setup
  • Deploy automation
