Algea-VE: A Tiny Multimodal Language Model with Only 0.8B Parameters

中文 (Chinese)

This repository is a modification of llava-phi; inference is based on the mipha model.

Installation

1. Create an environment and install the required packages

conda create -n algea-ve python==3.10 -y
conda activate algea-ve
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

2. Download the model weights

Download the algea-ve weights from Hugging Face.

3. Use the command line for inference (a scripted wrapper sketch follows these steps)

python -m mipha.serve.cli \
--model-path /path/to/your/model \
--image-file "path/to/your/img" \
--conv-mode phi
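
The CLI above opens an interactive chat about the supplied image. If you want to drive it from a script, a minimal sketch is shown below; it only wraps the documented command, and the model path and image directory are placeholders you need to adjust.

# batch_cli.py -- minimal sketch that wraps the documented mipha CLI.
# MODEL_PATH and IMAGE_DIR are placeholders; point them at your setup.
import subprocess
import sys
from pathlib import Path

MODEL_PATH = "/path/to/your/model"        # downloaded algea-ve weights
IMAGE_DIR = Path("path/to/your/images")   # directory of images to chat about

def run_cli(image_file: Path) -> int:
    """Launch `python -m mipha.serve.cli` for one image and return its exit code."""
    cmd = [
        sys.executable, "-m", "mipha.serve.cli",
        "--model-path", MODEL_PATH,
        "--image-file", str(image_file),
        "--conv-mode", "phi",
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    for image in sorted(IMAGE_DIR.glob("*.jpg")):
        print(f"=== {image.name} ===")
        run_cli(image)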

Training

Algea-VE is trained on the LAION-CC-SBU dataset with algea-550M-base as the base model and fine-tuned on llava_v1_5_mix665k. It uses CLIP ViT-L/14-336 as the visual encoder. The model is very small: fine-tuning requires only about 32 GB of VRAM and inference about 3 GB.
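
As a rough sanity check on those VRAM figures, here is a back-of-envelope estimate, assuming fp16/bf16 weights and standard mixed-precision Adam; the exact overheads depend on batch size, sequence length, and the vision encoder.

# Rough VRAM arithmetic for a 0.8B-parameter model (fp16/bf16 assumed).
PARAMS = 0.8e9                              # total parameters reported for Algea-VE
weights_gb = PARAMS * 2 / 1024**3           # 2 bytes per fp16 parameter
print(f"fp16 weights alone: ~{weights_gb:.1f} GB")   # ~1.5 GB; activations and KV cache
                                                     # come on top, consistent with ~3 GB inference

# Full fine-tuning with mixed-precision Adam keeps roughly 16 bytes per parameter
# (fp16 weights + grads, fp32 master weights + two optimizer moments), before activations.
train_state_gb = PARAMS * 16 / 1024**3
print(f"training state (mixed-precision Adam): ~{train_state_gb:.1f} GB")  # ~12 GB; activations
                                                                           # push this toward ~32 GB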

If you run into difficulties training your own model with mipha, this repository retains the code modifications made to the original mipha project during training, which may serve as a useful reference.

Due to insufficient training of the base model, the current model has some issues with hallucinations and repetition. To address this, I am training a new model that will maintain the same size but offer better performance. Please star and follow this project for updates.

Acknowledgements

llava-phi

llava
