Releases: FlagOpen/FlagEmbedding
1.3.2
We have completely updated the BGE code repository, including the following key improvements:
Inference Code
- Added `FlagAutoModel` and `FlagAutoReranker`, making it easier to use the models.
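A minimal usage sketch of the two auto classes (assumes the `FlagEmbedding` package is installed; the checkpoint names below are illustrative choices, and downloading them requires network access):

```python
# Sketch only: requires `pip install -U FlagEmbedding` and downloads model weights.
from FlagEmbedding import FlagAutoModel, FlagAutoReranker

# FlagAutoModel selects the appropriate embedding class from the checkpoint name.
model = FlagAutoModel.from_finetuned(
    "BAAI/bge-base-en-v1.5",
    use_fp16=True,  # halves memory use at a small precision cost
)
embeddings = model.encode(["hello world", "goodbye world"])

# FlagAutoReranker does the same for cross-encoder rerankers.
reranker = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-base", use_fp16=True)
scores = reranker.compute_score([["query", "passage A"], ["query", "passage B"]])
```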
Inference Optimization
- Implemented multi-GPU support.
- Introduced dynamic batch sizing to prevent out-of-memory (OOM) issues.
- Optimized padding to improve efficiency.
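The last two optimizations can be sketched in plain Python (a toy illustration of the ideas, not the library's actual implementation): sorting inputs by length keeps per-batch padding small, and the batch size shrinks whenever a batch runs out of memory (`MemoryError` stands in for a CUDA OOM here).

```python
def sort_for_minimal_padding(texts):
    """Group similar-length inputs together so each batch needs less padding."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    return [texts[i] for i in order], order

def encode_with_dynamic_batching(texts, encode_batch, batch_size=32):
    """Halve the batch size whenever a batch raises MemoryError (toy stand-in for OOM)."""
    results = []
    i = 0
    while i < len(texts):
        try:
            results.extend(encode_batch(texts[i:i + batch_size]))
            i += batch_size
        except MemoryError:
            if batch_size == 1:
                raise  # cannot shrink any further
            batch_size //= 2
    return results
```

The original input order can be restored afterwards by inverting `order`.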
Evaluation Code
- Integrated support for common evaluation datasets to enhance user convenience.
- Provided a custom evaluation interface: datasets organized according to the specified format can be evaluated directly, simplifying the evaluation process.
Project Structure Organization
- Reorganized the project to streamline processes related to inference, fine-tuning, and evaluation.
Release BGE-M3 and Activation Beacon
BGE-M3
A new member of the BGE model series! BGE-M3 stands for Multi-Linguality, Multi-Granularity (input lengths up to 8192 tokens), and Multi-Functionality (unifying dense, lexical, and multi-vector retrieval). It is the first embedding model to support all three retrieval methods.
For more details, please refer to the Technical Report and Code.
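The three retrieval modes can be fused into one relevance score via a weighted sum; a toy sketch of that fusion (the weights and the simplified scoring functions here are illustrative, not the model's actual ones):

```python
import math

def dense_score(q_vec, d_vec):
    """Cosine similarity between single dense vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def lexical_score(q_weights, d_weights):
    """Sparse overlap: sum the products of shared token weights."""
    return sum(w * d_weights.get(tok, 0.0) for tok, w in q_weights.items())

def multi_vector_score(q_vecs, d_vecs):
    """Late interaction: each query vector matches its best document vector."""
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs) / len(q_vecs)

def hybrid_score(q, d, w=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three retrieval signals (weights are illustrative)."""
    return (w[0] * dense_score(q["dense"], d["dense"])
            + w[1] * lexical_score(q["lexical"], d["lexical"])
            + w[2] * multi_vector_score(q["multi"], d["multi"]))
```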
Activation Beacon
An effective, efficient, compatible, and low-cost (in training) method to extend the context length of LLMs by up to 100x. We extended the context length of Llama-2-chat-7b from 4K to 400K tokens.
For more details, please refer to the paper and code.
Feedback is welcome
Release LM-Cocktail
LM-Cocktail
Merge language models (e.g., Llama, BGE) to improve their general capabilities.
This method can be used to:
- Mitigate the problem of catastrophic forgetting
- Improve performance on new tasks without fine-tuning
- Approximate multitask learning or model ensembling
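At its core, this kind of merging is a weighted average of parameters from models that share one architecture. A minimal sketch, with plain dicts of float lists standing in for real state dicts (the weights below are illustrative):

```python
def merge_models(state_dicts, weights):
    """Weighted average of same-shaped parameters (here: lists of floats) across models."""
    assert abs(sum(weights) - 1.0) < 1e-9, "merging weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged
```

For example, merging a fine-tuned model with its base model at weights 0.7/0.3 keeps most of the new task's behavior while pulling parameters back toward the base, which is how the forgetting problem is mitigated.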
FlagEmbedding 1.1.2
Create the first release #131
FlagEmbedding
- Updated embedding models `bge-*-v1.5`:
  - Alleviated the issue of the similarity distribution.
  - The new models can perform retrieval without an instruction, but using an instruction is still recommended for better performance.
- New models `bge-reranker-*`: cross-encoders that can rerank the top-k retrieved results.
- Specified normalization in the configuration for sentence-transformers, thanks to skirres. Users no longer need to set `normalize_embeddings=True` manually when using sentence-transformers.
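The normalization in question is plain L2 normalization of each embedding, after which a dot product between two embeddings equals their cosine similarity. A minimal stdlib sketch:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; unit vectors make dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```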
C-MTEB
- Add two cross-lingual retrieval tasks: T2RerankingZh2En and T2RerankingEn2Zh.