Demo showing how to index AstraDB data into Glean
You can follow this tutorial entirely in Google Colab, or follow the instructions below to run it locally.
ℹ️ Astra Reference documentation
✅ 1.1.a: Create an Astra ACCOUNT
Go to https://astra.datastax.com and register with a Google or GitHub account.
✅ 1.1.b: Create an Astra Database
Go to the databases dashboard (click Databases in the left-hand navigation bar, expanding it if necessary), then click the [Create Database] button on the right.
- ℹ️ Fields Description
Field | Description |
---|---|
Vector Database vs Serverless Database | Choose Vector Database. In June 2023, Cassandra introduced support for vector search to enable generative AI use cases. |
Database name | It does not need to be unique, is not used to initialize a connection, and is only a label (keep it between 2 and 50 characters). It is recommended to have a database for each of your applications. The free tier is limited to 5 databases. |
Cloud Provider | Choose whichever you like. Click a cloud provider logo, pick an area in the list, and finally pick a region. We recommend choosing the region closest to you to reduce latency. On the free tier, there is very little difference. |
Cloud Region | Pick a region close to you that is available for the selected cloud provider and your plan. |
If all fields are filled properly, clicking the "Create Database" button will start the process.
It should take a couple of minutes for your database to become Active.
✅ 1.1.c: Create an Astra TOKEN
To connect to your database, you need the API endpoint and a token. The API endpoint is shown on the database screen, next to a small icon that copies the URL to your clipboard. It should look like https://<db-id>-<db-region>.apps.astra.datastax.com.
To get a token, click the [Generate Token] button on the right. It generates a token that you can copy to your clipboard.
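Before putting the endpoint into your configuration, it can help to confirm that it has the shape described above. The following is a minimal sketch using only the standard library; the helper name `parse_astra_endpoint` is hypothetical, and the pattern assumes that the database ID is a UUID and that the region is a lowercase slug such as `us-east1`.

```python
import re

# Assumption: the database ID is a UUID and the region is a lowercase
# cloud-region slug such as "us-east1"; adjust if your endpoint differs.
_ENDPOINT_RE = re.compile(
    r"^https://"
    r"(?P<db_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<region>[a-z0-9-]+)"
    r"\.apps\.astra\.datastax\.com/?$"
)


def parse_astra_endpoint(endpoint: str) -> tuple[str, str]:
    """Return (db_id, region), or raise ValueError if the shape is unexpected."""
    match = _ENDPOINT_RE.match(endpoint.strip())
    if match is None:
        raise ValueError(f"Unexpected Astra API endpoint: {endpoint!r}")
    return match.group("db_id"), match.group("region")
```

A `ValueError` here usually means the URL was copied incompletely or with extra characters around it.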
✅ 2.1.a: Create and activate a virtual environment
python3 -m venv venv
macOS / Linux
source venv/bin/activate
Windows
venv\Scripts\activate
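To confirm that the activation worked, you can ask the interpreter whether it is running inside a virtual environment. This is a small sketch, not part of the project; the function name `in_virtualenv` is hypothetical, and it relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```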
✅ 2.1.b: Install the dependencies
pip install astrapy==1.4.1 --no-deps
pip install -r requirements.txt
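Because `astrapy` is pinned to 1.4.1 and installed with `--no-deps`, it may be worth checking which version actually ended up in the environment after `requirements.txt` was applied. A minimal sketch using `importlib.metadata` from the standard library follows; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 1.4.1 is the version pinned by the install command above.
    found = installed_version("astrapy")
    print("astrapy:", found, "(expected 1.4.1)")
```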
✅ 2.1.c: Edit .env
Copy .env.example to .env and replace the placeholder values:
# Astra Configuration
export ASTRA_DB_APPLICATION_TOKEN=<change_me>
export ASTRA_DB_API_ENDPOINT=<change_me>
export ASTRA_DB_COLLECTION_NAME="plain_collection"
# Glean Configuration
export GLEAN_CUSTOMER=<you>
export GLEAN_DATASOURCE_NAME=<change_me>
export GLEAN_API_TOKEN=<change_me>
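A forgotten placeholder is an easy mistake to make, so a quick check before running the job can save a failed run. The sketch below assumes the variables have already been exported into the environment (for example by sourcing `.env` in your shell); how the import script itself reads them is not shown here, and the helper name `load_config` is hypothetical.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION_NAME",
    "GLEAN_CUSTOMER",
    "GLEAN_DATASOURCE_NAME",
    "GLEAN_API_TOKEN",
)


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the required settings, rejecting unset or placeholder values."""
    config, problems = {}, []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Values still wrapped in <...> are the untouched template placeholders.
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(name)
        else:
            config[name] = value
    if problems:
        raise ValueError("Missing or placeholder values: " + ", ".join(problems))
    return config
```

Calling `load_config()` with no arguments reads from the current process environment; passing a plain dictionary makes the check easy to exercise without touching real credentials.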
✅ 2.1.d: Run the script
python3 astra-glean-import-job.py