A list of awesome open source projects in the machine learning field whose developers are mainly based in Germany

johko/awesome-german-open-source-ml

Awesome German Open Source Machine Learning

This repo contains an overview of open source machine learning projects, and the companies behind them, that are based in Germany.

The criterion for getting listed here is that the roots of the project are in Germany, or that at least a large share of the developers working on the project or for the company are located in Germany.

Contributions to this list are very welcome 🤗 Be it corrections, additions or suggestions - feel free to open an Issue or Pull Request.

Categories

The following are the different categories of machine learning and the corresponding projects, including links to their social media profiles.

📖 Natural Language Processing (NLP)

The projects listed here all provide frameworks to perform Natural Language Processing tasks.

Name                 Description Links

Explosion
Best known for the spaCy library, one of the most popular Python packages for everything NLP. It pays off to check their profile for other great repos like spacy-llm and curated-transformers.
Links: Explosion on GitHub, website of Explosion, Explosion on Hugging Face, Explosion on LinkedIn, Explosion on X

deepset
Another young Berlin-based company, best known for its LLM framework Haystack. Its first step into the limelight was training a BERT-based German language model.
Links: deepset on GitHub, website of deepset, deepset on Hugging Face, deepset on LinkedIn, deepset on X, deepset community on Discord

flair
Developed at Humboldt University of Berlin, flair is a simple and powerful framework for state-of-the-art NLP.
Links: flair on GitHub, website of flair, flair on Hugging Face

DeepL
Based in Cologne, DeepL has offered high-quality machine translation, especially for German, since 2017. With their open source library, their technology is easily integrated into any Python project.
Links: DeepL on GitHub, website of DeepL, DeepL on X, DeepL on LinkedIn

small-text
Originating from a research project at Leipzig University, small-text offers a modular and comprehensive Python library for building experiments and applications focused on active learning for text classification.
Links: small-text on GitHub, website of small-text

👁️👁️ Computer Vision

Here you can find projects that mainly focus on solving Computer Vision problems, such as image classification, object detection and object segmentation.

Name                  Description Links

Mobius Labs
A relatively small and unknown company, but their repos are definitely worth checking out - especially Half-Quadratic Quantization and Aana.
Links: Mobius Labs on GitHub, website of Mobius Labs, Mobius Labs on Hugging Face, Mobius Labs on LinkedIn, Mobius Labs on X

⚗️ Generative AI

The projects here are focused on Generative AI tasks like LLMs, text-to-image, text-to-video, text-to-audio or similar. Some of the projects/companies listed here might not have popular repositories on GitHub, but instead are releasing ML models with freely accessible weights (mostly on Hugging Face).

Name                  Description Links

OpenGPT-X
This project is tightly connected to Occiglot and backed by some big companies and institutions (e.g. Fraunhofer, DFKI, IONOS). It is dedicated to creating multilingual LLMs with a focus on open source.
Links: OpenGPT-X on GitHub, website of OpenGPT-X, OpenGPT-X on Hugging Face, OpenGPT-X on LinkedIn, OpenGPT-X on X

Black Forest Labs
Announced at the beginning of August 2024, this company has already stirred up the AI community with its text-to-image model family called FLUX.
Links: Black Forest Labs on GitHub, website of Black Forest Labs, Black Forest Labs on Hugging Face, Black Forest Labs on LinkedIn, Black Forest Labs on X

Vago Solutions
Vago Solutions mainly focuses on creating German LLMs (called SauerkrautLM) and has already made more than 20 of those LLMs accessible in its Hugging Face repository.
Links: website of Vago Solutions, Vago Solutions on Hugging Face, Vago Solutions on LinkedIn, Vago Solutions on X

💾 Data Collection and Preprocessing

Name                  Description Links

dltHub
dltHub are the creators of data load tool (dlt). While dlt might not strictly be a Machine Learning library, I still decided to include it here, as it eases the pain of data collection, which is an integral part of the ML lifecycle.
Links: dltHub on GitHub, website of dltHub, dltHub on LinkedIn, dltHub on X, dlt Slack server

Trafilatura
Originally released to collect data for linguistic research and lexicography at the Berlin-Brandenburg Academy of Sciences, Trafilatura is now widely used in AI, NLP and LLMs.
Links: Trafilatura on GitHub, documentation page of Trafilatura

🛠️ MLOps

Building, training and deploying machine learning models can be a real struggle in today's overflowing ML landscape. These projects try to take much of the effort and frustration out of the process.

Name                Description Links

ZenML
This Munich-based company developed a framework that lets you build, train and deploy ML pipelines in a simple and reproducible way.
Links: ZenML on GitHub, website of ZenML, ZenML on Hugging Face, ZenML on LinkedIn, ZenML on X, ZenML Slack server

dstack
Another MLOps-centred company originating from Munich, dstack specializes in making it easy to build, train and deploy your ML models on different cloud providers.
Links: dstack on GitHub, website of dstack, dstack on Hugging Face, dstack on LinkedIn, dstack on X, dstack Discord server

Flower Labs
Flower Labs offers an open source framework for federated learning, which can be especially helpful when working with distributed and sensitive data.
Links: Flower Labs on GitHub, website of Flower Labs, Flower Labs on Hugging Face, Flower Labs on LinkedIn, Flower Labs on X

AIME
While the core business of AIME is selling HPC servers, workstations and GPU cloud space, they have also open-sourced a series of projects for hosting and serving ML models, e.g. aime-ml-containers and aime-api-server.
Links: AIME on GitHub, website of AIME, AIME on Hugging Face, AIME on LinkedIn, AIME on X

🔍 Search and Embed

These companies and projects mainly focus on Neural Search applications and connected topics like Multimodal embeddings.

Name Description Links

Jina AI
Jina AI has a big output of open source libraries for a lot of use cases, but is best known for its library, simply called jina, that lets you build and deploy multimodal ML applications.
Links: Jina AI on GitHub, website of Jina AI, Jina AI on Hugging Face, Jina AI on LinkedIn, Jina AI on X, Jina AI Discord server

Qdrant
Straight from the vibrant Berlin-based start-up scene, Qdrant specializes in neural search applications and multimodal embeddings. They also have a lively Discord community.
Links: Qdrant on GitHub, website of Qdrant, Qdrant on Hugging Face, Qdrant on LinkedIn, Qdrant on X, Qdrant Discord server

mixedbread.ai
Still very new to the scene, but they have already released an amazing Sentence Embedding model.
Links: mixedbread.ai on GitHub, website of mixedbread.ai, mixedbread.ai on Hugging Face, mixedbread.ai on LinkedIn, mixedbread.ai on X, mixedbread.ai Discord server

🤖 General Machine Learning

Here are all the projects that don't fit into one of the other categories (or that fit into more than one).

Name Description Links

Superduper
Freshly rebranded (formerly SuperDuperDB), the team from Superduper aims to make every database and storage capable of AI, without needing specialized vector databases or the like.
Links: Superduper on GitHub, website of Superduper, Superduper on Hugging Face, Superduper on LinkedIn, Superduper on X, Superduper Slack server

LAION e.V.
LAION is a non-profit organization that aims to create free and open-source models and datasets. They have a big community and have already released many interesting projects, like Open Assistant and CLAP.
Links: LAION on GitHub, website of LAION, LAION on Hugging Face, LAION Discord server

💡 Research Projects

Name                  Description Links

Occiglot
Occiglot is a collective of researchers who want to develop open-source language models for and by Europe. Although not entirely rooted in Germany, it is heavily funded by German institutions and many active researchers are from Germany.
Links: Occiglot on GitHub, website of Occiglot, Occiglot on Hugging Face, Occiglot on X, Occiglot Discord server

And last but not least, a little shout-out to Johannes Rieke and his great (albeit a little outdated) collection of Berlin-based Machine Learning start-ups 😉
