Algea-VE: A Tiny Multimodal Language Model with Only 0.8B Parameters

中文 (Chinese)

This repository is a modification of llava-phi; inference is based on the mipha model.

Installation

1. Create an environment and install the required packages

conda create -n algea-ve python==3.10 -y
conda activate algea-ve
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

2. Download the model weights

Download the algea-ve weights from Hugging Face.

3. Use the command line for inference (a scripted wrapper sketch follows these steps)

python -m mipha.serve.cli \
--model-path /path/to/your/model \
--image-file "path/to/your/img" \
--conv-mode phi
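
The CLI above opens an interactive chat about the supplied image. If you want to drive it from a script, a minimal sketch is shown below; it only wraps the documented command, and the model path and image directory are placeholders you need to adjust.

# batch_cli.py -- minimal sketch that wraps the documented mipha CLI.
# MODEL_PATH and IMAGE_DIR are placeholders; point them at your setup.
import subprocess
import sys
from pathlib import Path

MODEL_PATH = "/path/to/your/model"        # downloaded algea-ve weights
IMAGE_DIR = Path("path/to/your/images")   # directory of images to chat about

def run_cli(image_file: Path) -> int:
    """Launch `python -m mipha.serve.cli` for one image and return its exit code."""
    cmd = [
        sys.executable, "-m", "mipha.serve.cli",
        "--model-path", MODEL_PATH,
        "--image-file", str(image_file),
        "--conv-mode", "phi",
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    for image in sorted(IMAGE_DIR.glob("*.jpg")):
        print(f"=== {image.name} ===")
        run_cli(image)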

Training

Algea-VE is trained on the LAION-CC-SBU dataset with algea-550M-base as the base model and fine-tuned on llava_v1_5_mix665k. It uses CLIP ViT-L/14-336 as the visual encoder. The model is very small: fine-tuning requires only about 32 GB of VRAM and inference about 3 GB.
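
As a rough sanity check on those VRAM figures, here is a back-of-envelope estimate, assuming fp16/bf16 weights and standard mixed-precision Adam; the exact overheads depend on batch size, sequence length, and the vision encoder.

# Rough VRAM arithmetic for a 0.8B-parameter model (fp16/bf16 assumed).
PARAMS = 0.8e9                              # total parameters reported for Algea-VE
weights_gb = PARAMS * 2 / 1024**3           # 2 bytes per fp16 parameter
print(f"fp16 weights alone: ~{weights_gb:.1f} GB")   # ~1.5 GB; activations and KV cache
                                                     # come on top, consistent with ~3 GB inference

# Full fine-tuning with mixed-precision Adam keeps roughly 16 bytes per parameter
# (fp16 weights + grads, fp32 master weights + two optimizer moments), before activations.
train_state_gb = PARAMS * 16 / 1024**3
print(f"training state (mixed-precision Adam): ~{train_state_gb:.1f} GB")  # ~12 GB; activations
                                                                           # push this toward ~32 GB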

If you run into difficulties training your own model with mipha, this repository retains the code modifications made to the original mipha project during training, which may serve as a useful reference.

Due to insufficient training of the base model, the current model has some issues with hallucinations and repetition. To address this, I am training a new model that will maintain the same size but offer better performance. Please star and follow this project for updates.

Acknowledgements

llava-phi

llava
