Develop data transformation pipeline for optimal RAG and fine-tuning performance #24

branhoff · 2024-02-01T05:35:46Z

Description

The paper "RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture" outlines a rough structure for transforming data into a usable state for RAG and fine-tuning enhancements of an LLM.

This ticket seeks to develop an initial process for transforming data that we scrape (car diagnostic manuals for instance) into a Q&A format.

Data should be in JSONL format.
Data should be structured as Q&A pairs.
Q&A pairs should be reviewed and filtered by LLMs according to the criteria laid out in the paper.
The implementation should be generic enough that the process is easily repeatable for new data sources.
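
As a starting point for discussion, here is a minimal sketch of what the output and filtering stages could look like. Everything in it is an assumption made for illustration: the `QAPair` fields, the `filter_pairs` and `to_jsonl` helpers, the 0.5 threshold, and especially `length_score`, which is only a placeholder for an LLM-based reviewer that would apply the paper's criteria. The sample records are made up. The scorer is passed in as a plain callable so the same pipeline can be reused across data sources.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class QAPair:
    """One Q&A record. Field names are hypothetical, not taken from the paper."""
    question: str
    answer: str
    source: str  # e.g. the scraped document the pair was generated from


def filter_pairs(
    pairs: Iterable[QAPair],
    score: Callable[[QAPair], float],
    threshold: float = 0.5,  # assumed cutoff, to be tuned
) -> Iterator[QAPair]:
    """Yield only the pairs whose score meets the threshold."""
    for pair in pairs:
        if score(pair) >= threshold:
            yield pair


def to_jsonl(pairs: Iterable[QAPair]) -> str:
    """Serialize pairs as JSONL: one JSON object per line."""
    return "".join(
        json.dumps(asdict(pair), ensure_ascii=False) + "\n" for pair in pairs
    )


def length_score(pair: QAPair) -> float:
    """Placeholder scorer: stands in for an LLM review step.

    It only checks that the answer has at least three words.
    """
    return 1.0 if len(pair.answer.split()) >= 3 else 0.0


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        QAPair(
            "What does code P0300 indicate?",
            "Random or multiple cylinder misfire detected.",
            "manual.pdf",
        ),
        QAPair("What does code P0420 indicate?", "Unknown", "manual.pdf"),
    ]
    print(to_jsonl(filter_pairs(sample, length_score)), end="")
```

Swapping `length_score` for a function that calls an LLM and returns a score would leave the rest of the sketch unchanged, which is one way to keep the process repeatable.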

branhoff added the enhancement New feature or request label Feb 1, 2024