- 01. Create Astra Account
- 02. Create Astra Token
- 03. Copy the token
- 04. Open Gitpod
- 05. Setup CLI
- 06. Create Database
- 07. Setup env variables
- 08. Register to OpenAI
- 09. Setup Project
- 10. Vector Search
- 11. Retrieval Augmented Generation
ℹ️ Account creation tutorial is available in awesome astra
Go to https://astra.datastax.com
ℹ️ Token creation tutorial is available in awesome astra
- Locate `Settings` (#1) in the menu on the left, then `Token Management` (#2)
- Select the role `Organization Administrator` before clicking [Generate Token]
The Token is in fact three separate strings: a Client ID, a Client Secret and the token proper. You will need some of these strings to access the database, depending on the type of access you plan. Although the Client ID, strictly speaking, is not a secret, you should regard this whole object as a secret and make sure not to share it inadvertently (e.g. committing it to a Git repository) as it grants access to your databases.
{
"ClientId": "ROkiiDZdvPOvHRSgoZtyAapp",
"ClientSecret": "fakedfaked",
"Token":"AstraCS:fake"
}
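Since the token grants access to your databases, it helps to avoid printing it in full in logs or terminal output. Here is a minimal sketch, assuming a hypothetical helper class `TokenMask` (not part of the Astra SDK) that relies only on the `AstraCS:` prefix visible in the sample above:

```java
public class TokenMask {

    // Prefix seen on the sample token above; used here only as a sanity check.
    static final String PREFIX = "AstraCS:";

    /** Returns true when the value starts with the expected prefix and has a body after it. */
    static boolean looksLikeAstraToken(String value) {
        return value != null
                && value.startsWith(PREFIX)
                && value.length() > PREFIX.length();
    }

    /** Keeps the prefix and hides everything after it, so the value is safe to log. */
    static String mask(String token) {
        if (!looksLikeAstraToken(token)) {
            return "<not an Astra token>";
        }
        return PREFIX + "****";
    }

    public static void main(String[] args) {
        // Prints the masked form of the placeholder token used in this workshop.
        System.out.println(mask("AstraCS:fake"));
    }
}
```

The prefix check is only a quick guard against pasting the wrong string (for example the Client ID); it does not validate the token against Astra.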
You can also leave the window open to copy the value in a second.
Right-click and select open as a new tab...
In Gitpod, in a terminal window:
- Login
astra login --token AstraCS:fake
- Validate your setup
astra org
Output
gitpod /workspace/workshop-beam (main) $ astra org
+----------------+-----------------------------------------+
| Attribute      | Value                                   |
+----------------+-----------------------------------------+
| Name           | [email protected]                       |
| id             | f9460f14-9879-4ebe-83f2-48d3f3dce13c    |
+----------------+-----------------------------------------+
ℹ️ Note that the `--vector` flag enables the Vector Search capability
- Create the db `demo-genai` and wait for it to become active
astra db create demo-genai -k genai --vector --if-not-exists
💻 Output
[INFO] Database 'demo-genai' does not exist. Creating database 'demo-genai' with keyspace 'genai'
[INFO] Enabling vector search for database demo-genai
[INFO] Database 'demo-genai' and keyspace 'genai' are being created.
[INFO] Database 'demo-genai' has status 'PENDING' waiting to be 'ACTIVE' ...
[INFO] Database 'demo-genai' has status 'ACTIVE' (took 112341 millis)
[OK] Database 'demo-genai' is ready.
- List databases
astra db list
💻 Output
+--------------------------+--------------------------------------+-----------+-------+---+-----------+
| Name                     | id                                   | Regions   | Cloud | V | Status    |
+--------------------------+--------------------------------------+-----------+-------+---+-----------+
| demo-genai               | 9e54ff00-57e2-47ed-8699-f94d5dd11b6f | us-east1  | gcp   | ■ | ACTIVE    |
+--------------------------+--------------------------------------+-----------+-------+---+-----------+
- Describe your db
astra db describe demo-genai
💻 Output
+------------------+-----------------------------------------+
| Attribute        | Value                                   |
+------------------+-----------------------------------------+
| Name             | demo-genai                              |
| id               | 9e54ff00-57e2-47ed-8699-f94d5dd11b6f    |
| Status           | ACTIVE                                  |
| Cloud            | GCP                                     |
| Regions          | us-east1                                |
| Default Keyspace | genai                                   |
| Creation Time    | 2023-09-12T08:55:36Z                    |
|                  |                                         |
| Keyspaces        | [0] genai                               |
|                  |                                         |
|                  |                                         |
| Regions          | [0] us-east1                            |
|                  |                                         |
+------------------+-----------------------------------------+
- Create the `.env` file with variables
astra db create-dotenv demo-genai
- Display the file
cat .env
- Load env variables
set -a
source .env
set +a
env | grep ASTRA
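The `set -a` / `source .env` / `set +a` sequence exports every `KEY=VALUE` pair of the file into the shell. If you would rather read the file from Java, the parsing can be sketched as follows. This is a minimal illustration, not the Astra SDK's own loader: `DotenvSketch` is a hypothetical class, and the variable names in the sample are placeholders, so check your generated `.env` for the real ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DotenvSketch {

    /** Parses KEY=VALUE lines, skipping blanks and '#' comments and stripping optional double quotes. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE line
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            vars.put(key, value);
        }
        return vars;
    }

    public static void main(String[] args) {
        // Sample content with placeholder values; a real file would be read with Files.readAllLines.
        List<String> sample = List.of(
                "# generated file (sample)",
                "ASTRA_DB_APPLICATION_TOKEN=\"AstraCS:fake\"",
                "ASTRA_DB_KEYSPACE=genai");
        // Print only the key and the value length so that secrets never reach the console.
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v.length() + " chars"));
    }
}
```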
- Go to the OpenAI platform and register.
- In your profile, go to `View API KEYS`, create a new key and copy the value to your clipboard. You have a free trial for a month or so.
export OPENAI_API_KEY=<key>
The connection test below validates that Java, Maven and Lombok are working as expected and that you can connect.
Note: To create the project I simply went with the Astra SDK archetype, as follows:
mvn archetype:generate \
  -DarchetypeGroupId=com.datastax.astra \
  -DarchetypeArtifactId=spring-boot-3x-archetype \
  -DarchetypeVersion=0.6.9 \
  -DinteractiveMode=false \
  -DgroupId=com.datastax.demo \
  -DartifactId=genai-demo \
  -Dversion=1.0-SNAPSHOT
and added the vector dependency:
<dependency>
  <groupId>com.datastax.astra</groupId>
  <artifactId>astra-sdk-vector</artifactId>
  <version>${astra-sdk-starter.version}</version>
</dependency>
- Run connection test:
mvn test -Dtest=ConnectionTest#shouldBeConnectedTest
- Run OpenAI Test:
mvn test -Dtest=OpenAiTest#shouldTestOpenAICreateEmbeddings
- Ingest data
mvn test -Dtest=GenerativeAITest#shouldIngestDocuments
- Open a cqlsh session (in a new terminal)
astra db cqlsh demo-genai -k genai
select row_id, metadata_s, blob_text, vector from philosophers;
- Similarity Search
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotes
- Similarity Search + MetaData (by Author)
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesFilteredByAuthor
- Similarity Search + MetaData (by Tags)
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesFilteredByTags
- Similarity Search with a threshold
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesWithThreshold
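The four tests above exercise the same idea: rank stored vectors by similarity to a query vector, optionally restrict the candidates by metadata, and drop results under a score threshold. The following in-memory sketch illustrates that logic under stated assumptions: it uses cosine similarity, and `SimilaritySketch`, `Quote` and `Hit` are hypothetical names. In the workshop the search is delegated to the database, so this is not the code the tests run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SimilaritySketch {

    /** Hypothetical in-memory row: quote text, author metadata and embedding. */
    record Quote(String text, String author, double[] vector) {}

    /** A matching row together with its similarity score. */
    record Hit(Quote quote, double score) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Returns up to topK hits ordered by descending score.
     * A null author disables the metadata filter; hits scoring below minScore are dropped.
     */
    static List<Hit> search(List<Quote> rows, double[] query, String author, double minScore, int topK) {
        List<Hit> hits = new ArrayList<>();
        for (Quote q : rows) {
            if (author != null && !author.equals(q.author())) {
                continue;
            }
            double score = cosine(query, q.vector());
            if (score >= minScore) {
                hits.add(new Hit(q, score));
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(topK, hits.size()));
    }

    public static void main(String[] args) {
        // Tiny two-dimensional vectors stand in for real embeddings.
        List<Quote> rows = List.of(
                new Quote("quote A", "aristotle", new double[] {1, 0}),
                new Quote("quote B", "plato", new double[] {0, 1}),
                new Quote("quote C", "aristotle", new double[] {1, 1}));
        for (Hit h : search(rows, new double[] {1, 0}, "aristotle", 0.5, 2)) {
            System.out.println(h.quote().text() + " " + h.score());
        }
    }
}
```

A linear scan like this only works for a handful of rows; the point of a vector-enabled database is to run the same ranking over an index instead.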
The Full Monty.....
mvn test -Dtest=GenerativeAITest#shouldGenerateQuotesWithRag
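Retrieval augmented generation combines the previous steps: the rows returned by the similarity search are placed into the prompt sent to the chat model. Here is a minimal sketch of the prompt assembly only, assuming a hypothetical `RagPromptSketch` class and an illustrative prompt wording. The retrieval and the OpenAI call are left out, and the workshop test may build its prompt differently.

```java
import java.util.List;

public class RagPromptSketch {

    /** Builds a prompt that asks the model to answer using only the retrieved context. */
    static String buildPrompt(String question, List<String> retrieved) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer the question using only the context below.\n\n");
        sb.append("Context:\n");
        for (int i = 0; i < retrieved.size(); i++) {
            // Number each retrieved passage so the model can tell them apart.
            sb.append(i + 1).append(". ").append(retrieved.get(i)).append('\n');
        }
        sb.append("\nQuestion: ").append(question).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder strings stand in for the rows returned by the similarity search.
        System.out.println(buildPrompt(
                "What is wisdom?",
                List.of("first retrieved quote", "second retrieved quote")));
    }
}
```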
- Check list of running db
astra db list
- Resume the DB if needed (or create a new one)
astra db resume langchain4j
astra db create langchain4j --if-not-exists
- Make sure you set up the env variables (`$ASTRA_DB_APPLICATION_TOKEN`, the variable checked by the tests)
astra db create-dotenv langchain4j
set -a
source .env
set +a
env | grep ASTRA
Go to the `application.yaml` and check that the values are correct for your database:
astra:
  database:
    name: langchain4j
    keyspace: langchain4j
    table: langchain4j
@Test
@DisplayName("02. Should Ingest a document")
@EnabledIfEnvironmentVariable(named = "ASTRA_DB_APPLICATION_TOKEN", matches = "Astra.*")
@EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = "sk.*")
void should_Ingest_Document() {
    // Load the text file; 'path' is defined elsewhere in the test class
    Document document = FileSystemDocumentLoader.loadDocument(path, DocumentType.TXT);
    // Split into segments of at most 100 tokens with at most 10 tokens of overlap
    DocumentSplitter splitter = DocumentSplitters
            .recursive(100, 10, new OpenAiTokenizer(GPT_3_5_TURBO));
    // Embed each segment and write it to the embedding store
    EmbeddingStoreIngestor.builder()
            .documentSplitter(splitter)
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build()
            .ingest(document);
}
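The splitter in the test above is configured with a maximum segment size of 100 tokens and an overlap of 10 tokens. To illustrate what the overlap does, here is a simplified sketch that works on whitespace-separated words rather than OpenAI tokens and does not attempt the recursive paragraph and sentence splitting. `OverlapSplitSketch` is a hypothetical class, not part of LangChain4j.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapSplitSketch {

    /** Splits text into windows of at most maxSize words; consecutive windows share 'overlap' words. */
    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("need 0 <= overlap < maxSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = maxSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Windows of 4 words with 1 shared word between neighbours.
        split("a b c d e f g", 4, 1).forEach(System.out::println);
    }
}
```

Repeating a few words at the boundary keeps a sentence that straddles two segments retrievable from either one, at the cost of storing slightly more text.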
@Test
@DisplayName("03. Should Chat Completion")
@EnabledIfEnvironmentVariable(named = "ASTRA_DB_APPLICATION_TOKEN", matches = "Astra.*")
@EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = "sk.*")
void should_chat_completion() {
    // ... check the code in the class
}