Update vector-search-tutorial.adoc #1241

Merged 3 commits on Oct 7, 2024
@@ -105,7 +105,7 @@ hazelcast:

```
+
-* `hazelcast.partition.count: Vector search performs better with fewer partitions. On the other hand, fewer partitions means larger partitions, which can cause problems during migration. A discussion of the tradeoffs can be found here:
+* `hazelcast.partition.count`: Vector search performs better with fewer partitions. On the other hand, fewer partitions means larger partitions, which can cause problems during migration. A discussion of the tradeoffs can be found here:
(https://docs.hazelcast.com/hazelcast/latest/data-structures/vector-search-overview#partition-count-impact).
* `jet`: This is the Hazelcast stream processing engine. Hazelcast pipelines are a scalable way to rapidly ingest or process large amounts of data. This example uses a pipeline to compute embeddings and load them into a vector collection, so stream processing must be enabled.
* `vector-collection`: If you are using a vector collection, you must configure the index settings. There are no defaults. In this case, the name of the collection is `images` and it has one index, which is called `semantic-search`. The dimension and distance metric are dependent on the embedding being used. The `dimension` must match the size of the vectors produced by the embedding. The `metric` defines the algorithm used to compute the distance between 2 vectors and it must match the one used to train the embedding. This tutorial uses the CLIP sentence transformer for embeddings. CLIP uses a dimension of 512 and cosine distance metric (literally the cosine of the angle between 2 vectors, adjusted to be non-negative). For more detail on supported options, see xref:data-structures:vector-collections.adoc[].
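Putting these settings together, the relevant parts of the configuration might look like the following YAML sketch. This is an illustration based on the values named above (collection `images`, index `semantic-search`, 512 dimensions, cosine metric); the partition count of 16 is an assumed example value, not a recommendation.

```yaml
hazelcast:
  properties:
    # Illustrative value only; tune per the partition-count tradeoff
    # discussion linked above
    hazelcast.partition.count: 16
  jet:
    # Required so the embedding-loading pipeline can run
    enabled: true
  vector-collection:
    images:
      indexes:
        - name: semantic-search
          # Must match the CLIP embedding: 512-dimensional vectors
          dimension: 512
          # Cosine distance, matching the metric CLIP was trained with
          metric: COSINE
```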
@@ -156,7 +156,7 @@ A solution pipeline is available in the

You need to use a Jupyter notebook for the remaining steps.

-. Start the Jupyter process inside Docker.
+. Retrieve the login URL for Jupyter from the logs.
+
```sh
docker compose logs jupyter
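# The login URL typically includes an access token (a line containing
# "?token=" near the end of the output); open that URL in a browser
# to reach the notebook.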
```