
Real-Time-3-pipeline-LLM-Financial-Advisor 🔋🔋🔋

Introduction

A production-ready LLMOps system built on live financial data, composed of multiple MLOps and RAG pipelines. Following the 3-pipeline architecture, it consists of:

  • Training Pipeline: Loads a pretrained model, finetunes it on a curated dataset (a synthetic data generation pipeline will be added soon) on a serverless GPU provider, and uses an experiment tracker to log training curves and checkpoints to the model registry.
  • Streaming Pipeline: Collects data from a live source API in batches, processes it, and populates a vectorDB with the contextual data. The streaming pipeline can then be deployed to any virtual machine provider.
  • Inference Pipeline: Downloads the best model from the registry, builds a prompt from the user question, chat history, and vectorDB context (see the sketch after this list), feeds it to the model through a RAG framework, and logs each prompt/response pair to the experiment tracker. A RESTful endpoint for the inference pipeline is deployed on the serverless GPU provider.
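
To make the inference flow concrete, here is a minimal sketch of how the three prompt inputs could be assembled; the template wording and variable names are illustrative, not the repo's exact ones.

```python
# Illustrative prompt assembly: user question + chat history + retrieved context.
PROMPT_TEMPLATE = """You are a financial advisor. Use the context and chat history to answer.

### Context (retrieved from the vectorDB):
{context}

### Chat history:
{chat_history}

### Question:
{question}

### Answer:"""

def build_prompt(question: str, chat_history: list[str], context_chunks: list[str]) -> str:
    return PROMPT_TEMPLATE.format(
        context="\n".join(context_chunks),
        chat_history="\n".join(chat_history),
        question=question,
    )
```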

Dependencies 🛠️

  • HuggingFace-TRL for QLoRA SFT training.
  • WandB for experiment tracking and model registry.
  • Beam for serverless GPU compute.
  • Alpaca API for historical and real-time access to stock and crypto market data.
  • ByteWax for document processing and embeddings.
  • Qdrant Cloud for storing the embeddings in the cloud vectorDB.
  • AWS for deploying the streaming pipeline on EC2, and storing the container image in ECR.
  • LangChain for creating sequential context-retrieval and response-generation chains.

Architecture 📐

[Architecture diagram]

Training Pipeline

Setup instructions are given in pipelines/training_pipeline.

The dataset is uploaded to a Beam volume, and the training script runs on a single A10G GPU to finetune [NousResearch/Nous-Hermes-2-Mistral-7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO). [Screenshot: Beam training run]
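
A condensed sketch of the QLoRA SFT setup is below, assuming recent versions of transformers, peft, and trl. The dataset file, LoRA hyperparameters, and the presence of a "text" column are illustrative assumptions, and exact SFTTrainer arguments vary across trl versions.

```python
# Minimal QLoRA SFT sketch with HuggingFace TRL. Hyperparameters and the
# dataset path are illustrative, not the repo's exact config.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters on the attention projections; r/alpha are illustrative.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Assumed: a JSON dataset with a "text" column (SFTTrainer's default field).
dataset = load_dataset("json", data_files="finance_qa.json", split="train")

trainer = SFTTrainer(model=model, train_dataset=dataset, peft_config=peft_config)
trainer.train()
```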

The training curves are logged to WandB. [Screenshot: WandB training logs]

The best model is stored in the model registry via a callback at the end of the training loop. [Screenshot: WandB model registry]
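
One way to wire that up is a TrainerCallback that logs the final checkpoint as a WandB artifact and links it into a registry collection; the artifact and collection names here are hypothetical.

```python
# Sketch of a callback that pushes the final checkpoint to the WandB model
# registry when training finishes. Names are hypothetical stand-ins.
import wandb
from transformers import TrainerCallback

class RegistryCallback(TrainerCallback):
    def on_train_end(self, args, state, control, **kwargs):
        artifact = wandb.Artifact("finance-advisor-qlora", type="model")
        artifact.add_dir(args.output_dir)  # final adapter checkpoint directory
        logged = wandb.run.log_artifact(artifact)
        # Linking the artifact promotes it into the registry collection.
        wandb.run.link_artifact(logged, "model-registry/finance-advisor")

# trainer.add_callback(RegistryCallback())  # attach before trainer.train()
```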

Streaming Pipeline

Setup instructions are given in pipelines/streaming_pipeline.

The Alpaca API provides 24/7 data access; the documents are processed and embedded with ByteWax, then written to the Qdrant Cloud DB in batches. This pipeline is then Dockerized and deployed to AWS EC2 via a GitHub Actions CI/CD pipeline; see cd_streaming_pipeline.yaml for details. [Screenshot: CI/CD workflow]
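
The dataflow itself is ingest, embed, upsert. Below is a minimal sketch using ByteWax's operator-style API (bytewax ≥ 0.19) with a stand-in input; the embedding model, collection name, and cluster URL are assumptions, and the real pipeline would use a custom Alpaca news source rather than TestingSource.

```python
# Minimal ByteWax dataflow sketch: embed incoming documents and upsert them
# into Qdrant. Model name, collection, and URL are illustrative assumptions.
import bytewax.operators as op
from bytewax.connectors.stdio import StdOutSink
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")

def embed(doc: dict) -> PointStruct:
    vector = encoder.encode(doc["text"]).tolist()
    return PointStruct(id=doc["id"], vector=vector, payload=doc)

def upsert(point: PointStruct) -> PointStruct:
    client.upsert(collection_name="financial_news", points=[point])
    return point

# Stand-in input; production would use a custom Alpaca news source.
docs = [{"id": 1, "text": "Fed holds rates steady; tech stocks rally."}]

flow = Dataflow("alpaca_to_qdrant")
stream = op.input("news", flow, TestingSource(docs))
stream = op.map("embed", stream, embed)
stream = op.map("upsert", stream, upsert)
op.output("out", stream, StdOutSink())  # run with: python -m bytewax.run module:flow
```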

Inference Pipeline

Setup instructions are given in pipelines/inference_pipeline. The LangChain chains for context retrieval and response generation are deployed on Beam serverless as a RESTful API. [Screenshot: inference endpoint] The model is then prompted via a cURL request. [Screenshot: example prompt and response]
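
A minimal sketch of such a sequential chain in LangChain's runnable style is below; the retrieval step and the LLM are stubbed so the snippet runs standalone, whereas the real pipeline would plug in a Qdrant similarity search and the finetuned model.

```python
# Sequential chain sketch: retrieve context -> fill prompt -> call model.
# The retriever and LLM are stubs; the real pipeline uses Qdrant + the finetuned model.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

def retrieve(inputs: dict) -> dict:
    # Stand-in for a Qdrant similarity search over the live collection.
    inputs["context"] = "Fed holds rates steady; tech stocks rally."
    return inputs

prompt = ChatPromptTemplate.from_template(
    "Context:\n{context}\n\nHistory:\n{chat_history}\n\nQuestion: {question}"
)
llm = RunnableLambda(lambda messages: "Stubbed model response.")  # stand-in LLM

chain = RunnableLambda(retrieve) | prompt | llm | StrOutputParser()
print(chain.invoke({"question": "Outlook for NVDA?", "chat_history": ""}))
```

Once deployed, prompting the endpoint amounts to a single POST; the Python equivalent of the cURL request might look like the following, with the URL, token, and payload schema all assumed.

```python
# Hypothetical client call to the deployed Beam endpoint.
import requests

resp = requests.post(
    "https://api.beam.cloud/YOUR_APP_ID",                 # assumed endpoint URL
    headers={"Authorization": "Bearer YOUR_BEAM_TOKEN"},  # assumed auth scheme
    json={"question": "Should I rebalance into bonds this quarter?"},
    timeout=120,
)
print(resp.json())  # response schema depends on the deployed handler
```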

Upcoming 🔜

The Synthetic Data generation pipeline (via Distilabel) for training the model will be uploaded soon!

📫 Get in Touch

LinkedIn Hugging Face Medium X Substack
