
Real-Time-3-pipeline-LLM-Financial-Advisor 🔋🔋🔋

Introduction

A production-ready LLMOps system built on live financial data, composed of multiple MLOps and RAG pipelines. Following the 3-pipeline architecture, it consists of:

  • Training Pipeline: Loads a pretrained model, finetunes it on a curated dataset (a synthetic data generation pipeline will be added soon) on a serverless GPU provider, and uses an experiment tracker to log training curves and checkpoints to the model registry.
  • Streaming Pipeline: Collects data from a live source API in batches, processes it, and populates a vectorDB with the contextual data. The streaming pipeline can then be deployed to any virtual machine provider.
  • Inference Pipeline: Downloads the best model from the registry, builds a prompt from the user question, chat history, and vectorDB context (see the sketch after this list), feeds it to the model through a RAG framework, and logs each prompt/response pair to the experiment tracker. A RESTful endpoint for the inference pipeline is deployed on the serverless GPU provider.
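
To make the inference flow concrete, here is a minimal sketch of how the three prompt inputs could be assembled; the template wording and variable names are illustrative, not the repo's exact ones.

```python
# Illustrative prompt assembly: user question + chat history + retrieved context.
PROMPT_TEMPLATE = """You are a financial advisor. Use the context and chat history to answer.

### Context (retrieved from the vectorDB):
{context}

### Chat history:
{chat_history}

### Question:
{question}

### Answer:"""

def build_prompt(question: str, chat_history: list[str], context_chunks: list[str]) -> str:
    return PROMPT_TEMPLATE.format(
        context="\n".join(context_chunks),
        chat_history="\n".join(chat_history),
        question=question,
    )
```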

Dependencies 🛠️

  • HuggingFace-TRL for QLoRA SFT training.
  • WandB for experiment tracking and model registry.
  • Beam for serverless GPU compute.
  • Alpaca API for historical and real-time access to stock and crypto market data.
  • ByteWax for document processing and embeddings.
  • Qdrant Cloud for storing the embeddings in the cloud vectorDB.
  • AWS for deploying the streaming pipeline on EC2, and storing the container image in ECR.
  • LangChain for creating sequential context-retrieval and response-generation chains.

Architecture 📐

[Architecture diagram]

Training Pipeline

Setup instructions are given in pipelines/training_pipeline.

The dataset is uploaded to a Beam volume, and the training script runs on a single A10G GPU to finetune [NousResearch/Nous-Hermes-2-Mistral-7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO). [Screenshot: Beam training run]
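
A condensed sketch of the QLoRA SFT setup is below, assuming recent versions of transformers, peft, and trl. The dataset file, LoRA hyperparameters, and the presence of a "text" column are illustrative assumptions, and exact SFTTrainer arguments vary across trl versions.

```python
# Minimal QLoRA SFT sketch with HuggingFace TRL. Hyperparameters and the
# dataset path are illustrative, not the repo's exact config.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters on the attention projections; r/alpha are illustrative.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Assumed: a JSON dataset with a "text" column (SFTTrainer's default field).
dataset = load_dataset("json", data_files="finance_qa.json", split="train")

trainer = SFTTrainer(model=model, train_dataset=dataset, peft_config=peft_config)
trainer.train()
```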

The training curves are logged to WandB. [Screenshot: WandB training logs]

The best model is stored in the model registry via a callback at the end of the training loop. [Screenshot: WandB model registry]
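
One way to wire that up is a TrainerCallback that logs the final checkpoint as a WandB artifact and links it into a registry collection; the artifact and collection names here are hypothetical.

```python
# Sketch of a callback that pushes the final checkpoint to the WandB model
# registry when training finishes. Names are hypothetical stand-ins.
import wandb
from transformers import TrainerCallback

class RegistryCallback(TrainerCallback):
    def on_train_end(self, args, state, control, **kwargs):
        artifact = wandb.Artifact("finance-advisor-qlora", type="model")
        artifact.add_dir(args.output_dir)  # final adapter checkpoint directory
        logged = wandb.run.log_artifact(artifact)
        # Linking the artifact promotes it into the registry collection.
        wandb.run.link_artifact(logged, "model-registry/finance-advisor")

# trainer.add_callback(RegistryCallback())  # attach before trainer.train()
```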

Streaming Pipeline

Setup instructions are given in pipelines/streaming_pipeline.

The Alpaca API provides 24/7 data access; the documents are processed and embedded with ByteWax, then written to the Qdrant Cloud DB in batches. This pipeline is then Dockerized and deployed to AWS EC2 via a GitHub Actions CI/CD pipeline; see cd_streaming_pipeline.yaml for details. [Screenshot: CI/CD workflow]
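
The dataflow itself is ingest, embed, upsert. Below is a minimal sketch using ByteWax's operator-style API (bytewax ≥ 0.19) with a stand-in input; the embedding model, collection name, and cluster URL are assumptions, and the real pipeline would use a custom Alpaca news source rather than TestingSource.

```python
# Minimal ByteWax dataflow sketch: embed incoming documents and upsert them
# into Qdrant. Model name, collection, and URL are illustrative assumptions.
import bytewax.operators as op
from bytewax.connectors.stdio import StdOutSink
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")

def embed(doc: dict) -> PointStruct:
    vector = encoder.encode(doc["text"]).tolist()
    return PointStruct(id=doc["id"], vector=vector, payload=doc)

def upsert(point: PointStruct) -> PointStruct:
    client.upsert(collection_name="financial_news", points=[point])
    return point

# Stand-in input; production would use a custom Alpaca news source.
docs = [{"id": 1, "text": "Fed holds rates steady; tech stocks rally."}]

flow = Dataflow("alpaca_to_qdrant")
stream = op.input("news", flow, TestingSource(docs))
stream = op.map("embed", stream, embed)
stream = op.map("upsert", stream, upsert)
op.output("out", stream, StdOutSink())  # run with: python -m bytewax.run module:flow
```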

Inference Pipeline

Setup instructions are given in pipelines/inference_pipeline. The LangChain chains for context retrieval and response generation are deployed on Beam serverless as a RESTful API. [Screenshot: inference endpoint] The model is then prompted via a cURL request. [Screenshot: example prompt and response]
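
A minimal sketch of such a sequential chain in LangChain's runnable style is below; the retrieval step and the LLM are stubbed so the snippet runs standalone, whereas the real pipeline would plug in a Qdrant similarity search and the finetuned model.

```python
# Sequential chain sketch: retrieve context -> fill prompt -> call model.
# The retriever and LLM are stubs; the real pipeline uses Qdrant + the finetuned model.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

def retrieve(inputs: dict) -> dict:
    # Stand-in for a Qdrant similarity search over the live collection.
    inputs["context"] = "Fed holds rates steady; tech stocks rally."
    return inputs

prompt = ChatPromptTemplate.from_template(
    "Context:\n{context}\n\nHistory:\n{chat_history}\n\nQuestion: {question}"
)
llm = RunnableLambda(lambda messages: "Stubbed model response.")  # stand-in LLM

chain = RunnableLambda(retrieve) | prompt | llm | StrOutputParser()
print(chain.invoke({"question": "Outlook for NVDA?", "chat_history": ""}))
```

Once deployed, prompting the endpoint amounts to a single POST; the Python equivalent of the cURL request might look like the following, with the URL, token, and payload schema all assumed.

```python
# Hypothetical client call to the deployed Beam endpoint.
import requests

resp = requests.post(
    "https://api.beam.cloud/YOUR_APP_ID",                 # assumed endpoint URL
    headers={"Authorization": "Bearer YOUR_BEAM_TOKEN"},  # assumed auth scheme
    json={"question": "Should I rebalance into bonds this quarter?"},
    timeout=120,
)
print(resp.json())  # response schema depends on the deployed handler
```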

Upcoming 🔜

The Synthetic Data generation pipeline (via Distilabel) for training the model will be uploaded soon!

📫 Get in Touch

LinkedIn Hugging Face Medium X Substack
