Releases: FlagOpen/FlagEmbedding

1.3.2

31 Oct 16:23
d76e51c

We have completely updated the BGE code repository, including the following key improvements:

Inference Code

  • Added FlagAutoModel and FlagAutoReranker, making it easier to load and use the models.

Inference Optimization

  • Implemented multi-GPU support.
  • Introduced dynamic batch sizing to prevent out-of-memory (OOM) issues.
  • Optimized padding to improve efficiency.

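The release notes don't spell out the mechanism, but a common way to combine dynamic batch sizing with padding efficiency is to sort inputs by length and pack each batch under a token budget, since memory after padding is roughly batch size times the longest sequence. A minimal sketch under that assumption (function and parameter names are hypothetical, not the repository's API):

```python
def dynamic_batches(texts, max_tokens_per_batch=4096, length_fn=len):
    """Group texts so each batch's padded size stays under a token budget.

    Batch cost is approximated as batch_size * longest_sequence, which is
    what determines memory once shorter texts are padded to the longest.
    """
    # Sort by length so similar-length texts share a batch (less padding waste).
    order = sorted(range(len(texts)), key=lambda i: length_fn(texts[i]))
    batches, current, current_max = [], [], 0
    for i in order:
        n = length_fn(texts[i])
        new_max = max(current_max, n)
        # Adding this text would push the padded batch over the budget.
        if current and new_max * (len(current) + 1) > max_tokens_per_batch:
            batches.append(current)
            current, current_max = [], 0
            new_max = n
        current.append(i)
        current_max = new_max
    if current:
        batches.append(current)
    return batches  # lists of indices into `texts`
```

Capping the padded size, rather than the number of items, is what prevents a single long input from causing an OOM in an otherwise full batch.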
Evaluation Code

  • Integrated support for common evaluation datasets to enhance user convenience.
  • Provided a custom evaluation interface: data organized according to the specified format can be evaluated directly, simplifying the process.

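Whatever the interface, a custom retrieval evaluation ultimately reduces to standard metrics over ranked results. A minimal recall@k sketch (a hypothetical helper for illustration, not the repository's API):

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k results.

    ranked_ids: document IDs ordered by decreasing retrieval score.
    relevant_ids: the ground-truth relevant document IDs for the query.
    """
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Example: one relevant doc out of two appears in the top 2.
score = recall_at_k(["d1", "d2", "d3"], ["d2", "d9"], k=2)
```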
Project Structure Organization

  • Reorganized the project to streamline processes related to inference, fine-tuning, and evaluation.

Release BGE-M3 and Activation Beacon

02 Feb 05:57

BGE-M3

A new member of the BGE model series! BGE-M3 stands for Multi-Linguality, Multi-Granularity (input length up to 8192 tokens), and Multi-Functionality (unification of dense, lexical, and multi-vector retrieval). It is the first embedding model that supports all three retrieval methods.

For more details, please refer to the Technical Report and Code.
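As a rough illustration of the three retrieval modes being unified, here is a toy sketch of dense, lexical, and multi-vector (ColBERT-style) scoring and a weighted combination. The functions and the equal weights are simplified stand-ins for the idea, not BGE-M3's actual implementation:

```python
def dense_score(q_vec, d_vec):
    # Dot product of (assumed L2-normalized) sentence embeddings.
    return sum(a * b for a, b in zip(q_vec, d_vec))

def lexical_score(q_weights, d_weights):
    # Sum of products of per-token weights over tokens shared by query and doc.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def multi_vector_score(q_vecs, d_vecs):
    # Late interaction: each query token vector takes its best match among
    # document token vectors; the per-token maxima are then averaged.
    return sum(max(dense_score(q, d) for d in d_vecs) for q in q_vecs) / len(q_vecs)

def hybrid_score(dense, lexical, multi, w=(1 / 3, 1 / 3, 1 / 3)):
    # Weighted combination of the three scores (weights here are illustrative).
    return w[0] * dense + w[1] * lexical + w[2] * multi
```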

Activation Beacon

An effective, efficient, compatible, and low-cost (in training) method to extend the context length of LLMs by up to 100x. We extend the context length of Llama-2-chat-7b from 4K to 400K.

For more details, please refer to the paper and code.

Feedback is welcome.

Release LM-Cocktail

24 Nov 09:22

LM-Cocktail

Merge language models (e.g., Llama, BGE) to improve their general capabilities.
This method can be used to:

  • Mitigate catastrophic forgetting
  • Improve performance on new tasks without fine-tuning
  • Approximate multi-task learning or model ensembling

For more details, please refer to the paper and code.
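At its core, this kind of model merging is a weighted average of parameters from models that share one architecture. A toy sketch with scalar "parameters" standing in for tensors (the actual method can also derive the weights from performance on a few task examples):

```python
def merge_models(state_dicts, weights=None):
    """Weighted average of parameters from models with identical structure.

    state_dicts: list of {parameter_name: value} dicts with the same keys.
    weights: mixing coefficients that sum to 1 (uniform if omitted).
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    assert abs(sum(weights) - 1.0) < 1e-6, "mixing weights must sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Element-wise weighted sum; with real models each value is a tensor.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged
```

Uniform weights approximate a model ensemble; skewing the weights toward the original model is one way to retain its general ability while absorbing a fine-tuned one.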

FlagEmbedding 1.1.2

28 Sep 07:43

Create the first release #131

FlagEmbedding

  • Updated embedding models bge-*-v1.5:
    • alleviated the issue of the overly concentrated similarity distribution
    • the new models can perform retrieval tasks without an instruction; using an instruction is still recommended, as it can yield better performance.
  • New models bge-reranker-*: cross-encoders that can rerank the top-k retrieved results.
  • Normalization is now specified in the sentence-transformers configuration (thanks to skirres),
    so users no longer need to set normalize_embeddings=True manually when using sentence-transformers.
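The normalization default matters because BGE similarity scores are intended as cosine similarities, and with unit-norm embeddings the cosine reduces to a plain dot product. A small illustration of what normalizing embeddings does:

```python
import math

def normalize(vec):
    # Scale a vector to unit L2 norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    # With unit-norm vectors, cosine similarity is just the dot product,
    # so normalizing once lets downstream code use fast dot-product search.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))
```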

C-MTEB

  • Add two cross-lingual retrieval tasks: T2RerankingZh2En and T2RerankingEn2Zh.