A crawler engine that ingests WBW (WargaBantuWarga) Google Sheets into a Typesense server for real-time search.
You need a Typesense API key with write access to ingest data into the server.
```shell
cp .env.example .env
yarn install
```
Afterwards, modify `TYPESENSE_HOST`, `TYPESENSE_PORT`, `TYPESENSE_PROTOCOL`, and `TYPESENSE_KEY`. Then you're good to go.
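For reference, a filled-in `.env` might look like the sketch below. The values are placeholders and assumptions (8108 and `http` are common Typesense defaults), not values prescribed by this project:

```shell
# Hypothetical .env values -- point these at your own Typesense server.
TYPESENSE_HOST=localhost
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http
TYPESENSE_KEY=xyz-write-key
```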
**IMPORTANT:** Google Sheets must be published to the web in order to be crawled.
The crawler will read all scripts in the `metadata` directory to interpret the sheet structure. Each script represents an index and must contain:

- `schema` : Typesense schema object. See here for reference.
- `sheetId` : a public Google Sheets ID, i.e. `https://docs.google.com/spreadsheets/u/1/d/<SHEET_ID>/view`
- `indexId` : Typesense's index name. Must have the `wbw-` prefix.
- `worksheet` : list of worksheets in the given Google Sheet
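A metadata script covering the keys above might look like the following sketch. The file name, field names inside `schema.fields`, and the CommonJS export shape are assumptions for illustration, not the project's actual code:

```javascript
// Hypothetical metadata script, e.g. metadata/example.js.
const exampleMetadata = {
  indexId: "wbw-example", // must carry the "wbw-" prefix
  sheetId: "<SHEET_ID>", // placeholder for a real public sheet ID
  worksheet: ["Sheet1", "Sheet2"], // worksheets to crawl in this sheet
  schema: {
    // Typesense collection schema for this index.
    name: "wbw-example",
    fields: [
      { name: "title", type: "string" },
      { name: "sheet", type: "string" }, // added per row by the crawler
      { name: "order", type: "int32" }, // required sortable field
    ],
    default_sorting_field: "order",
  },
};

module.exports = exampleMetadata;
```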
A field named `order` must be defined manually in the metadata with the `int32` data type as a sortable field.
On every data row, `id` and `sheet` fields will be added to mark which worksheet it originated from.
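An illustrative sketch of that per-row augmentation (the function name and the `worksheet-rowIndex` id scheme are assumptions, not the crawler's actual implementation):

```javascript
// Adds the `id` and `sheet` fields to a crawled row before ingestion.
function augmentRow(row, worksheetName, rowIndex) {
  return {
    ...row,
    id: `${worksheetName}-${rowIndex}`, // hypothetical id scheme
    sheet: worksheetName, // marks which worksheet the row came from
  };
}
```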
The index will be created automatically when it's not present. To prevent server rejection, the crawling process is executed sequentially (not in parallel).
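That flow can be sketched roughly as below. The `client` is assumed to follow the typesense-js API shape (`collections().create()`, `collections(name).retrieve()`, `collections(name).documents().import()`); the function names and the `rows` field (the already-parsed sheet rows) are illustrative, not the project's code:

```javascript
// Create-if-missing, then import -- one script at a time.
async function ingestOne(client, { indexId, schema, rows }) {
  try {
    // Does the index already exist?
    await client.collections(indexId).retrieve();
  } catch (err) {
    // Not present: create it from the script's schema.
    await client.collections().create(schema);
  }
  await client.collections(indexId).documents().import(rows);
}

async function ingestAll(client, scripts) {
  // Sequential on purpose: parallel imports could be rejected by the server.
  for (const script of scripts) {
    await ingestOne(client, script);
  }
}
```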
We have two flags to make development easier:

- `--test_script=SCRIPTNAME` will only execute the given script in the `metadata/` directory
- `--dry_run` runs in dry-run mode, without inserting data into Typesense
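A minimal sketch of how the two flags could be interpreted from the command line; the actual crawler's argument handling may well differ:

```javascript
// Parse --dry_run and --test_script=SCRIPTNAME from an argv array.
function parseFlags(argv) {
  const flags = { dryRun: false, testScript: null };
  for (const arg of argv) {
    if (arg === "--dry_run") flags.dryRun = true;
    const match = arg.match(/^--test_script=(.+)$/);
    if (match) flags.testScript = match[1]; // script name under metadata/
  }
  return flags;
}
```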
Please refer to the wargabantuwarga contributing guidelines.