Releases: FlagOpen/FlagEmbedding
1.3.2
We have completely updated the BGE code repository, including the following key improvements:
Inference Code
- Added `FlagAutoModel` and `FlagAutoReranker`, making it easier to use the models.
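A minimal usage sketch of the two auto classes (assumes the `FlagEmbedding` package is installed; the checkpoint names below are illustrative choices, and downloading them requires network access):

```python
# Sketch only: requires `pip install -U FlagEmbedding` and downloads model weights.
from FlagEmbedding import FlagAutoModel, FlagAutoReranker

# FlagAutoModel selects the appropriate embedding class from the checkpoint name.
model = FlagAutoModel.from_finetuned(
    "BAAI/bge-base-en-v1.5",
    use_fp16=True,  # halves memory use at a small precision cost
)
embeddings = model.encode(["hello world", "goodbye world"])

# FlagAutoReranker does the same for cross-encoder rerankers.
reranker = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-base", use_fp16=True)
scores = reranker.compute_score([["query", "passage A"], ["query", "passage B"]])
```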
Inference Optimization
- Implemented multi-GPU support.
- Introduced dynamic batch sizing to prevent out-of-memory (OOM) issues.
- Optimized padding to improve efficiency.
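The last two optimizations can be sketched in plain Python (a toy illustration of the ideas, not the library's actual implementation): sorting inputs by length keeps per-batch padding small, and the batch size shrinks whenever a batch runs out of memory (`MemoryError` stands in for a CUDA OOM here).

```python
def sort_for_minimal_padding(texts):
    """Group similar-length inputs together so each batch needs less padding."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    return [texts[i] for i in order], order

def encode_with_dynamic_batching(texts, encode_batch, batch_size=32):
    """Halve the batch size whenever a batch raises MemoryError (toy stand-in for OOM)."""
    results = []
    i = 0
    while i < len(texts):
        try:
            results.extend(encode_batch(texts[i:i + batch_size]))
            i += batch_size
        except MemoryError:
            if batch_size == 1:
                raise  # cannot shrink any further
            batch_size //= 2
    return results
```

The original input order can be restored afterwards by inverting `order`.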
Evaluation Code
- Integrated support for common evaluation datasets to enhance user convenience.
- Provided a custom evaluation interface: datasets organized according to the specified format can be evaluated directly, simplifying the evaluation process.
Project Structure Organization
- Reorganized the project to streamline processes related to inference, fine-tuning, and evaluation.
Release BGE-M3 and Activation Beacon
BGE-M3
A new member of the BGE model series! BGE-M3 stands for Multi-Linguality, Multi-Granularity (input lengths up to 8192 tokens), and Multi-Functionality (unifying dense, lexical, and multi-vector retrieval). It is the first embedding model to support all three retrieval methods.
For more details, please refer to the Technical Report and Code.
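The three retrieval modes can be fused into one relevance score via a weighted sum; a toy sketch of that fusion (the weights and the simplified scoring functions here are illustrative, not the model's actual ones):

```python
import math

def dense_score(q_vec, d_vec):
    """Cosine similarity between single dense vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def lexical_score(q_weights, d_weights):
    """Sparse overlap: sum the products of shared token weights."""
    return sum(w * d_weights.get(tok, 0.0) for tok, w in q_weights.items())

def multi_vector_score(q_vecs, d_vecs):
    """Late interaction: each query vector matches its best document vector."""
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs) / len(q_vecs)

def hybrid_score(q, d, w=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three retrieval signals (weights are illustrative)."""
    return (w[0] * dense_score(q["dense"], d["dense"])
            + w[1] * lexical_score(q["lexical"], d["lexical"])
            + w[2] * multi_vector_score(q["multi"], d["multi"]))
```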
Activation Beacon
An effective, efficient, compatible, and low-cost (in training) method to extend the context length of LLMs by up to 100x. We extended the context length of Llama-2-chat-7b from 4K to 400K tokens.
For more details, please refer to the paper and code.
Feedback is welcome
Release LM-Cocktail
LM-Cocktail
Merge language models (e.g., Llama, BGE) to improve their general capabilities.
This method can be used to:
- Mitigate the problem of catastrophic forgetting
- Improve performance on new tasks without fine-tuning
- Approximate multitask learning or model ensembling
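At its core, this kind of merging is a weighted average of parameters from models that share one architecture. A minimal sketch, with plain dicts of float lists standing in for real state dicts (the weights below are illustrative):

```python
def merge_models(state_dicts, weights):
    """Weighted average of same-shaped parameters (here: lists of floats) across models."""
    assert abs(sum(weights) - 1.0) < 1e-9, "merging weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged
```

For example, merging a fine-tuned model with its base model at weights 0.7/0.3 keeps most of the new task's behavior while pulling parameters back toward the base, which is how the forgetting problem is mitigated.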
FlagEmbedding 1.1.2
Create the first release #131
FlagEmbedding
- Updated embedding models `bge-*-v1.5`:
  - Alleviated the issue of the similarity distribution.
  - The new models can perform retrieval without an instruction, but using an instruction is still recommended for better performance.
- New models `bge-reranker-*`: cross-encoders that can rerank the top-k retrieved results.
- Specified normalization in the configuration for sentence-transformers, thanks to skirres. Users no longer need to set `normalize_embeddings=True` manually when using sentence-transformers.
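The normalization in question is plain L2 normalization of each embedding, after which a dot product between two embeddings equals their cosine similarity. A minimal stdlib sketch:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; unit vectors make dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```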
C-MTEB
- Add two cross-lingual retrieval tasks: T2RerankingZh2En and T2RerankingEn2Zh.