This guide assumes you have already set up a SageMaker Studio environment as described in the blog post. Follow these steps to set up your environment and run the code:
- Clone the repository:
  git clone https://github.com/aws-samples/finetune-bge-embeddings-blog.git
  cd finetune-bge-embeddings-blog
- Create the Conda environment:
  conda env create -f environment.yml
  This step may take several minutes to complete.
- Activate the environment:
  conda init
  source ~/.bashrc
  conda activate ft-embedding-blog
- Add the new Conda environment to Jupyter:
  python -m ipykernel install --user --name=ft-embedding-blog
- Open the Jupyter notebook:
  - From the SageMaker Studio Launcher, open the repository folder named finetune-bge-embeddings-blog.
  - Open the file finetune-bge-embeddings.ipynb.
- Select the correct kernel:
  - From the "Kernel" dropdown menu in the notebook, select "Change Kernel...".
  - Choose "ft-embedding-blog".
  - If you don't see the kernel, try refreshing your browser.
- You're now ready to run the code in the notebook. Follow the instructions in each cell to generate synthetic data, fine-tune the BGE model, evaluate its performance, and deploy it using Amazon SageMaker.
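
Before running the notebook, it can help to confirm that the steps above took effect. The following is a minimal sketch, not part of the repository: it checks that the Conda environment exists, that the kernel is registered with Jupyter, and that the running interpreter belongs to the environment. The helper names (`env_names`, `kernel_names`, `interpreter_in_env`, `check_setup`) are hypothetical, and the last check assumes the environment lives in a directory named after it, which is the usual layout for named Conda environments.

```python
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path

ENV_NAME = "ft-embedding-blog"  # environment and kernel name used in the steps above


def env_names(conda_json: str) -> set[str]:
    """Return environment directory names from `conda env list --json` output."""
    envs = json.loads(conda_json).get("envs", [])
    return {Path(p).name for p in envs}


def kernel_names(kernelspec_json: str) -> set[str]:
    """Return kernel names from `jupyter kernelspec list --json` output."""
    return set(json.loads(kernelspec_json).get("kernelspecs", {}))


def interpreter_in_env(prefix: str, env_name: str = ENV_NAME) -> bool:
    """True if the interpreter prefix directory is named after the environment.

    Assumption: the environment was created by name, so its directory
    is .../envs/<env_name>.
    """
    return Path(prefix).name == env_name


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def check_setup() -> dict[str, bool]:
    """Run all three checks; requires `conda` and `jupyter` on PATH."""
    return {
        "conda env exists": ENV_NAME
        in env_names(run(["conda", "env", "list", "--json"])),
        "kernel registered": ENV_NAME
        in kernel_names(run(["jupyter", "kernelspec", "list", "--json"])),
        "notebook uses env": interpreter_in_env(sys.prefix),
    }
```

Calling `check_setup()` in the first notebook cell returns a dictionary of booleans. If "notebook uses env" is `False`, the notebook is most likely still attached to a different kernel.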
Note: Make sure you have the necessary permissions and quotas set up in your AWS account to use Amazon Bedrock and SageMaker services as described in the blog post.
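
A quick way to catch missing credentials or permissions early is to issue one read-only call per service before starting the notebook. The sketch below is an illustration rather than the repository's own method: `probe` and `check_aws_access` are hypothetical helpers, and it assumes `boto3` is installed with a default region configured. A successful call only shows that the API is reachable with your credentials. It does not confirm access to a specific Bedrock model or that your service quotas are sufficient.

```python
from __future__ import annotations

from typing import Callable


def probe(call: Callable[[], object]) -> str:
    """Run one read-only API call and report 'ok' or the error type and message."""
    try:
        call()
        return "ok"
    except Exception as exc:  # broad on purpose: report the problem, do not crash
        return f"failed: {type(exc).__name__}: {exc}"


def check_aws_access() -> dict[str, str]:
    """Probe STS, Bedrock, and SageMaker with read-only calls.

    boto3 is imported here so that `probe` stays usable without it.
    Clients are created inside the lambdas so that configuration errors
    (for example, a missing region) are reported instead of raised.
    """
    import boto3

    return {
        "credentials (sts)": probe(
            lambda: boto3.client("sts").get_caller_identity()
        ),
        "bedrock": probe(
            lambda: boto3.client("bedrock").list_foundation_models()
        ),
        "sagemaker": probe(
            lambda: boto3.client("sagemaker").list_endpoints(MaxResults=1)
        ),
    }
```

Running `check_aws_access()` returns one status string per service. Any entry starting with "failed:" points at the credential, region, or IAM permission to review first.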
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.