Finetuning updates to the QnA RAG Demo #154
base: develop
Conversation
Point to the Superset tutorial in the README file. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Make minor fixes in the README files of the Bike Sharing and Superset demos and tutorials: * Bike Sharing: Fix the path pointing to the location of the demo. * Superset: Fix the last instruction in the procedure section. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Fix various grammar and syntax errors in the README file. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Fix grammar and syntax errors in the first Notebook of the Superset tutorial. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Fix grammar and syntax errors in the second Notebook. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Fix grammar and syntax errors in the third Notebook of the Superset tutorial. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Signed-off-by: Dimitris Poulopoulos <[email protected]>
Remove the instructions that create a separate conda Python environment for this tutorial. The Wind Turbine tutorial runs as is in the default data-science Jupyter image. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Add a troubleshooting guide with instructions on how to use a proxy within a terminal environment, a Notebook, and KServe. This is needed when the cluster is deployed behind a proxy. Signed-off-by: Dimitris Poulopoulos <[email protected]>
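For reference, the Notebook case boils down to exporting the standard proxy variables before any outbound request is made. Below is a minimal sketch, assuming a hypothetical proxy endpoint and no-proxy list; replace both with the values for your own environment:

```python
import os

# Hypothetical proxy endpoint and no-proxy list -- placeholders, not the
# guide's actual values. Adjust them for your cluster.
PROXY_URL = "http://proxy.example.com:8080"
NO_PROXY = "localhost,127.0.0.1,.svc,.cluster.local"

# Export the conventional proxy variables so that libraries such as
# `requests` and `urllib` pick them up inside the Notebook kernel.
os.environ["HTTP_PROXY"] = PROXY_URL
os.environ["HTTPS_PROXY"] = PROXY_URL
os.environ["NO_PROXY"] = NO_PROXY

import requests

# Outbound traffic now goes through the proxy, while in-cluster hosts
# (e.g., KServe ISVC endpoints under *.svc) are contacted directly.
response = requests.get("https://huggingface.co")
print(response.status_code)
```

The same three variables can be exported in a terminal session or injected into a KServe ISVC container spec; the guide covers each of these cases.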
Add a troubleshooting section to the README file to account for the case where EzUA is deployed behind a proxy. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Document the branching model of the repository in the main README file. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Fix the section that defines the environment variables of the vector store ISVC in the second Notebook. Closes #123 Signed-off-by: Dimitris Poulopoulos <[email protected]>
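As a companion to that fix, here is a minimal sketch of how environment variables can be attached to an InferenceService predictor with the KServe Python SDK. The service name, namespace, image, and variable names below are hypothetical placeholders, not the demo's actual values:

```python
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
)

# Hypothetical names and values -- the Notebook defines its own.
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="vectorstore", namespace="qna-rag"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            containers=[
                client.V1Container(
                    name="kserve-container",
                    image="example/vectorstore:0.2.1",
                    env=[
                        # Environment variables the vector store container reads.
                        client.V1EnvVar(
                            name="EMBEDDINGS_MODEL_URI",
                            value="http://embeddings.qna-rag.svc.cluster.local",
                        ),
                        client.V1EnvVar(name="VECTOR_STORE_PATH", value="/data/index"),
                    ],
                )
            ]
        )
    ),
)

# Create the InferenceService in the namespace given in its metadata.
KServeClient().create(isvc)
```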
Remove the instruction that mentions the `wind-turbine` conda kernel from the README file, as it is no longer valid. Instead, use the default Python 3 environment. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Change the application image version to `0.2.1` and increase the memory resource to 1 Gi. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Connect the repository with the internal CI/CD pipeline to automate the process of building and pushing the docker images to the public registry. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Add two base images for building a Jupyter Server Docker image: * A base image that installs any basic utilities and packages. * A Jupyter image that provides a basic JupyterLab installation. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Extend the Makefile to build and push the base Jupyter images for the demos and tutorials. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Move the Question-Answering demo to a new `rag-demos` directory. This change marks the beginning of a new demo section based on LLMs and vector databases. Signed-off-by: Dimitris Poulopoulos <[email protected]>
Introduce a new GPU variant of the Question-Answering demo, mirroring its CPU counterpart but leveraging GPUs for both the Embeddings model and the LLM model Inference Services (ISVCs). Using KServe with the Triton Inference Server backend significantly boosts performance. Triton itself supports a range of backends, from simple Python scripts to TensorRT engines. The LLM ISVC runs the Llama 2 7B variant on the TensorRT-LLM backend. The Embeddings model ISVC runs BGE-M3, fine-tuned with data from the EzUA, EzDF, MLDE, and MLDM docs, ensuring optimized response accuracy and speed. Signed-off-by: Dimitris Poulopoulos <[email protected]>
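For context, Triton behind KServe exposes the v2 inference protocol, so a client can query the LLM ISVC with a plain HTTP request. The sketch below is illustrative only; the endpoint URL, model name, and input tensor layout are assumptions and will differ in the actual demo:

```python
import requests

# Illustrative endpoint and model name -- the demo's actual ISVC URL,
# model name, and input tensor layout may differ.
ISVC_URL = "http://llm.qna-rag.example.com"
MODEL_NAME = "llama2-7b"

payload = {
    "inputs": [
        {
            "name": "text_input",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is EzUA?"],
        }
    ]
}

# The v2 inference protocol serves models under /v2/models/<name>/infer.
response = requests.post(
    f"{ISVC_URL}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    timeout=60,
)
response.raise_for_status()

# The generated text comes back in the "outputs" list of the v2 response.
print(response.json()["outputs"])
```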
Extend the Makefile to build and push the Docker images for the Question-Answering GPU demo. The images for this demo are: * An image for the Vector Store component. * An image for the LLM transformer component. * An image for the front-end (i.e., Gradio). * An image for the customized Triton Inference Server. Signed-off-by: Dimitris Poulopoulos <[email protected]>
@AlexanderOllman any changes to the notebook for the finetuning?
Hey @tybritten, thank you for submitting this PR; it's concise and to the point, well done. Here's my feedback; please make the necessary amendments so we can proceed with merging:
I'm not sure I'm following the approach to the commit content/messages. Normally the PR is squashed and merged, resulting in all the changes ending up in a single commit.
A PR may consist of a varying number of commits, depending on the scope of the feature it addresses. For minor bug fixes, a single commit might suffice, while the introduction of a significant new feature could involve dozens of commits. That said, this is orthogonal to the commit message: regardless of whether your PR comprises a single commit or multiple commits, the commit message(s) should always be descriptive, clearly outlining the changes made. Thus, it's entirely feasible to consolidate your changes into a single, well-documented commit this time. My suggestion is to squash everything into a single commit, document in the commit message what this demo is about, and force push it. Then, we can merge it.
Provide a clear and concise description of the content changes you're proposing. List all the changes you are making to the content.
Checklist:
Reviewer's Tasks (for maintainers reviewing this PR):