Scrapes tech job websites and provides a nice search interface over the results.
You can try it live at https://scrapjobs.xyz/!
The search is triggered on each key press and throttled to protect the backend.
On the backend we simply execute the search and return the results as JSON. In
the frontend we generate HTML dynamically and swap it in via `.innerHTML`.
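As a rough sketch of that round trip, here is what the backend half could look like with gin. This is a minimal illustration, not the repository's actual code: the `/search` route, the `q` parameter, the `Vacancy` shape and `runSearch` are all assumptions.

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// Vacancy is a hypothetical result shape; the real one comes from db.sql.
type Vacancy struct {
	Title string   `json:"title"`
	URL   string   `json:"url"`
	Tags  []string `json:"tags"`
}

// runSearch is a stub standing in for the Postgres full-text search.
func runSearch(q string) []Vacancy {
	if q == "" {
		return nil
	}
	return []Vacancy{{Title: "Example vacancy for: " + q, URL: "https://example.com/job", Tags: []string{"new"}}}
}

func main() {
	r := gin.Default()

	// The frontend sends the throttled query here and swaps the
	// rendered results into .innerHTML.
	r.GET("/search", func(c *gin.Context) {
		c.JSON(http.StatusOK, runSearch(c.Query("q")))
	})

	r.Run(":8080")
}
```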
Keybindings:

- `/` jumps to the search bar.
- `<tab>` cycles between the links and the tag inputs.

Search syntax:

- Use simple words as terms: `rust backend`
- Use `-` before a word to remove it from the search: `golang backend -java`
- Use `#` before a word to search for a tag: `golang #new`

Local tags:

- Use `!sometag` in the `>___` input beside each vacancy to add a local tag.
- Use `-sometag` in the `>___` input beside each vacancy to remove a local tag.
- Local tags are saved in the browser; use them to keep track of your applications.
To populate the database I downloaded the jobs from publicly available sources. I didn't use any authenticated session to download the data, and the data is not shared in the repository for obvious reasons.

Downloading the data takes four steps:
- Run the getLinks scraper: `node scrap.mjs getLinks`. This scraper opens the browser, collects the links and outputs them as JSON.
- Run the downloadData scrapers: `.. links json .. | node scrap downloadLinks rustjobs rust rustjobs othertag`. These scrapers take the links from the previous step, download the data from each link and save it in the `output/` folder. They output the list of saved files as a JSON string array.
- Run the `metadata.ipynb` notebook to populate the metadata. This notebook analyses the descriptions of the vacancies and generates metadata that is saved in the database, for example the remote status of the position. To run the notebook use `ipython3 '%run metadata.ipynb'`.
- Run the importer: `.. output json files .. | (cd tools/; go run .)`. This is not a scraper: it reads the list of files generated in the previous step and loads them into the database. The `new` tag is cleared for old entries, and new entries are inserted with the `new` tag set (see the sketch after the command listing below).
Running all the steps:
```sh
# 1. Get the links
node scrap.mjs getLinks > links.json

# 2. Get the jobs
node scrap.mjs downloadJobs < links.json > jobs.json

# 3. Enrich the metadata
.venv/bin/python3 metadata.py < jobs.json > jsonAnnotated.json

# 4. Load in the DB
tools/tools < jsonAnnotated.json
```
This will scrape and insert new entries into the database, updating the `new` tags accordingly.
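A minimal sketch of that clear-and-insert logic, assuming a hypothetical `vacancies` table whose `tags` column is a Postgres text array; the real importer lives in `tools/` and the real schema in `db.sql`:

```go
package main

import (
	"database/sql"
	"log"

	"github.com/lib/pq" // registers the "postgres" driver and provides pq.Array
)

func main() {
	db, err := sql.Open("postgres", "dbname=scrapjobs sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Drop the "new" tag from entries imported on previous runs...
	if _, err := db.Exec(`UPDATE vacancies SET tags = array_remove(tags, 'new')`); err != nil {
		log.Fatal(err)
	}

	// ...then insert the freshly scraped entries with the "new" tag set.
	_, err = db.Exec(
		`INSERT INTO vacancies (title, url, tags) VALUES ($1, $2, $3)`,
		"Example title", "https://example.com/job", pq.Array([]string{"rust", "new"}),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```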
The backend lives in the `backend/web-service-gin` folder and uses the gin framework. It exposes a single endpoint where the user can send a query, plus a websocket at `/ws/server`. The root path `/` serves `index.html`, a pure HTML client that renders searches interactively.
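A minimal sketch of how that wiring could look, assuming gorilla/websocket on top of gin (the library choice and handler details are assumptions, not the repository's code): each message from the client is treated as a query and answered with a JSON payload of results.

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// Hypothetical: accept any origin; tighten this in production.
	CheckOrigin: func(r *http.Request) bool { return true },
}

// runSearch is a stub standing in for the Postgres full-text search.
func runSearch(q string) []string {
	return []string{"example result for: " + q}
}

func main() {
	r := gin.Default()
	r.StaticFile("/", "./index.html") // the pure HTML client

	r.GET("/ws/server", func(c *gin.Context) {
		conn, err := upgrader.Upgrade(c.Writer, c.Request, nil)
		if err != nil {
			return
		}
		defer conn.Close()

		// One query in, one JSON result set out, for every (throttled)
		// keystroke the frontend sends.
		for {
			_, query, err := conn.ReadMessage()
			if err != nil {
				return
			}
			if err := conn.WriteJSON(runSearch(string(query))); err != nil {
				return
			}
		}
	})

	r.Run(":8080")
}
```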
I'm using Postgres with full-text search. Each keystroke in the frontend (after throttling) sends the search to the backend, which builds the proper query, runs it and returns the results as JSON.
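A sketch of what such a query could look like from Go, assuming a hypothetical `vacancies` table with a precomputed `tsvector` column named `fts` (the real schema is in `db.sql`). Postgres's `websearch_to_tsquery` is a convenient entry point here because it already understands `-word` negation, matching the search syntax above:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	db, err := sql.Open("postgres", "dbname=scrapjobs sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Rank matches against the precomputed tsvector column; the column
	// and table names are assumptions for illustration.
	rows, err := db.Query(`
		SELECT title, ts_rank(fts, q) AS rank
		FROM vacancies, websearch_to_tsquery('english', $1) AS q
		WHERE fts @@ q
		ORDER BY rank DESC
		LIMIT 20`, "golang backend -java")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var title string
		var rank float64
		if err := rows.Scan(&title, &rank); err != nil {
			log.Fatal(err)
		}
		fmt.Println(rank, title)
	}
}
```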
The database schema is in `db.sql` for now; you need it in order to create the database.
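For a first-time setup, one way to apply it from Go, assuming the database itself already exists and `db.sql` contains plain DDL (with lib/pq, a parameter-less `Exec` goes through the simple query protocol, so the file may contain several semicolon-separated statements):

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	schema, err := os.ReadFile("db.sql")
	if err != nil {
		log.Fatal(err)
	}

	db, err := sql.Open("postgres", "dbname=scrapjobs sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Run the whole schema file in one round trip.
	if _, err := db.Exec(string(schema)); err != nil {
		log.Fatal(err)
	}
}
```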
- Improve the first-time setup
- Deploy automation