🤖 Our goal is to build and maintain a comprehensive collection of projects that demonstrates the versatility and potential of LLM applications.
- 🦄LLMs
- 🏆 Benchmarks Leaderboard
- 💬ChatBot
- 🗣️Voice
- 🎵Music
- 🌄Image
- 🧸3D Model
- 🎥Video
- 🕸️Search Engine
- 👩🏽💻Develop Assistant
- 🧠AI Agent
- 🤼Multi-Agent Collaboration
- 💻Terminal
- 📰Web Sites
- 🗜️Hardware
- ⌨️Prompt Engineering
- 🤯LLMs Inference And Serving
- 📋Others
An asterisk (*) before a project name means the project is neither open source nor available as a released application yet.
- Command-R: Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.
- Grok-1: Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
- Mistral: Mistral AI releases open-source LLMs, including Mistral 7B, Mixtral 8x7B, and Codestral.
- DBRX: DBRX is an open, general-purpose LLM created by Databricks.
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
- OpenChat: Advancing Open-source Language Models with Imperfect Data
- WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions
- CodeGemma-7b: An official Google release for code LLMs.
- Awesome-Chinese-LLM: Includes many Open Source Chinese LLMs.
- llama3: LLMs newly released by Meta.
- Snowflake Arctic: Arctic is a dense-MoE hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. It is evaluated on an average of coding (HumanEval+ and MBPP+), SQL generation (Spider), and instruction-following (IFEval) benchmarks.
- DeepSeek-V2-Chat: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- Qwen 1.8B, 7B, 14B, 72B: Chat and pretrained large language models proposed by Alibaba Cloud.
- Granite Code Models 3B, 8B, 20B, 34B: IBM's open-source family of foundation models for code intelligence.
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
- MiniCPM-V 2.0: An Efficient End-side MLLM with Strong OCR and Understanding Capabilities
- Stable Audio Open 1.0: Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts.
- Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B: Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.
- GLM-4-9B: GLM-4 series: Open Multilingual Multimodal Chat LMs
- AutoCoder: A new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
- Nemotron 4 340B: NVIDIA's open models for synthetic data generation (SDG). Includes Base, Instruct, and Reward models.
- Fish Speech V1.2: Fish Speech V1.2 is a leading text-to-speech (TTS) model trained on 300k hours of English, Chinese, and Japanese audio data.
- Phi-3 family: Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths.
- Gemma 2: Gemma 2 offers best-in-class performance, runs at incredible speed across different hardware and easily integrates with other AI tools.
- open_llm_leaderboard: The Hugging Face hub organization that maintains the Open LLM Leaderboard.
- LMSys Chatbot Arena Leaderboard: A crowdsourced, randomized battle platform that uses user votes to compute Elo ratings.
- MTEB Leaderboard: Massive Text Embedding Benchmark (MTEB) Leaderboard.
- LLM-Perf Leaderboard: Benchmarks the performance (latency, throughput, and memory) of LLMs across different hardware, backends, and optimizations using Optimum-Benchmark and Optimum flavors.
- Big Code Models Leaderboard: Compares the performance of base multilingual code generation models on the HumanEval and MultiPL-E benchmarks.
- Open ASR Leaderboard: Ranks and evaluates speech recognition models on the Hugging Face Hub.
- Toolbench Leaderboard: An evaluation for LLM tool manipulation capabilities.
- OpenCompass 2.0 LLM Leaderboard: Provides comprehensive, objective, and neutral scores and rankings for top-tier large language models and multimodal models.
- Open Ko-LLM Leaderboard: Evaluates the performance of Korean large language models (LLMs).
- ChatGPT: ChatGPT is a free-to-use AI system. Use it for engaging conversations, gain insights, automate tasks, and witness the future of AI, all in one place.
- Gemini: Bard is now Gemini. Get help with writing, planning, learning, and more from Google AI.
- character.ai: Where intelligent agents live!
- Claude: Talk with Claude, an AI assistant from Anthropic.
- Mistral AI: Mistral aims to make frontier AI ubiquitous and to provide tailor-made AI to all builders.
Includes text-to-speech, speech-to-text, speech-to-speech, and voice generation:
- *Vall-E: A neural codec language model for speech synthesis.
- ElevenLabs: AI Voice Generator & Text to Speech
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- Krisp: Krisp cancels background noise and reduces echo during your calls.
- Voicemod: Voicemod is a free real-time voice changer and soundboard available on both Windows and macOS.
- *NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
- VoiceCraft: VoiceCraft is Zero-Shot Speech Editing and Text-to-Speech in the Wild.
- Parler-TTS: Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).
- Sounds: Sounds for creators, game developers, artists, and video makers. Experience the best AI Sound FX generator.
- VIVA: VIVA is an AI-powered creative visual design platform.
- ChatTTS: ChatTTS is a generative speech model for daily dialogue.
- StreamSpeech: StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
- Dream Machine: Dream Machine is an AI model that makes high quality, realistic videos fast from text and images.
- CosyVoice: Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
- Suno: Suno is an innovative tool designed for music creation, leveraging artificial intelligence to transform text input into original songs
- Udio: Make your music. Discover, create, and share music with the world.
- Haimian Music: An AI music generation product by ByteDance that delivers superior vocal quality in both Chinese and English.
- Jamboss: Jamboss is a super simple AI music generator app that empowers you to turn your ideas and lyrics into amazing full-length songs.
Includes text-to-image, image-to-image, and animation:
- DALL-E: Creating images from text.
- Stable Diffusion: Stable Diffusion is a deep learning, text-to-image model.
- Midjourney: Midjourney is a generative artificial intelligence program and service that creates images from natural language descriptions, similar to other AI technologies like OpenAI's DALL-E and Stability AI's Stable Diffusion.
- StickerBaker: StickerBaker is an open-source tool that allows users to create stickers using AI technology.
- *PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.
- ResAdapter: ResAdapter is a plug-and-play resolution adapter for enabling diffusion models of arbitrary style domains to generate resolution-free images: no additional training, no additional inference and no style transfer.
- FaceChain: FaceChain is a deep-learning toolchain for generating your Digital-Twin.
- APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
- OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models: OMG is a framework for multi-concept image generation
- BasicPBC: Learning Inclusion Matching for Animation Paint Bucket Colorization.
- DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing.
- VAR: A new visual generation method that elevates GPT-style models beyond diffusion, with scaling laws observed.
- Ideogram: Ideogram is a free-to-use AI tool that generates realistic images, posters, logos and more.
- MagicClothing: Focus on controllable garment-driven image synthesis.
- *IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination.
- HeyBeauty: Discover Beauty with AI, Make Fashion redefined.
- IC-Light: IC-Light is a project to manipulate the illumination of images.
- Logo Diffusion: Create Logos in Seconds With Generative A.I.
- MistoLine: A Versatile and Robust SDXL-ControlNet Model for Adaptable Line Art Conditioning
- InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
- Omost: Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability.
- ToonCrafter: ToonCrafter can interpolate two cartoon images by leveraging the pre-trained image-to-video diffusion priors.
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation.
- Krea: Generate and enhance images and videos using powerful AI for free.
- Leonardo AI: Leonardo AI is a generative AI tool that lets you craft top-tier visual assets.
- MimicBrush: Zero-shot Image Editing with Reference Imitation
- SketchDeco: Decorating B&W Sketches with Colour.
- Tensor.Art: An AI model sharing platform where you can run models online to generate images and train models for free.
- AutoStudio: AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
- IMAGDressing: Interactive Modular Apparel Generation for Virtual Dressing
- PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings
Includes text-to-3D model generation:
- TripoSR: TripoSR is a fast and feed-forward 3D generative model developed in collaboration between Stability AI and Tripo AI.
- PantoMatrix: PantoMatrix: Talking Face and Body Animation Generation
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians.
- *Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text.
- *CAT3D: CAT3D: Create Anything in 3D with Multi-View Diffusion Models
- DiffTF: Large-Vocabulary 3D Diffusion Model with Transformer
- DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models
- Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image.
- Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
- *OccFusion: Rendering Occluded Humans with Generative Diffusion Priors
- AIUNI: AI-generated unique assets, avatars, and animation.
- MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model
Includes text-to-video, image-to-video, and video-to-video:
- *Sora: Creating video from text. Sora is an AI model that can create realistic and imaginative scenes from text instructions.
- *Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
- Runway: Runway is an applied AI research company shaping the next era of art, entertainment and human creativity.
- HeyGen: HeyGen is an innovative video platform that harnesses the power of generative AI to streamline your video creation process.
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising.
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation.
- Pika: Pika is the idea-to-video platform that sets your creativity in motion.
- *VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time.
- OpenVoice: Instant voice cloning by MyShell.
- Veo: Veo is Google's most capable video generation model to date.
- AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
- Pandora: Towards General World Model with Natural Language Actions and Video States
- EasyAnimate: An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion.
- V-Express: V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.
- MusePose: A Pose-Driven Image-to-Video Framework for Virtual Human Generation
- Hedra: Hedra is a video content generation platform and social media platform that allows individuals to edit, export and share AI-generated videos and video components.
- MASA: Matching Anything by Segmenting Anything
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- Video-Infinity: Video-Infinity generates long videos quickly using multiple GPUs without extra training.
- DiffSynth Studio: DiffSynth Studio is a Diffusion engine.
- SAM 2: Segment Anything Model 2 (SAM 2) is a foundation model towards solving promptable visual segmentation in images and videos.
Includes search engines and web browsers:
- Phind: An AI search engine that generates answers from web search results and LLMs, with customizable weighting of search result sources.
- Devv: The next generation AI search engine for developers. Solve your programming problems in seconds.
- Perplexity: Perplexity AI unlocks the power of knowledge with information discovery and sharing.
- Arc: Effortlessly organize everything you do online — work, study, hobbies — all in one window with Spaces and Profiles.
- Perplexica: Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.
- Reor: Private & offline AI personal knowledge management app.
- GitHub Copilot: Get AI-based suggestions in real time.
- Codeium: Codeium offers best-in-class AI code completion, search, and chat, all for free. It supports 70+ languages and integrates with your favorite IDEs, with lightning-fast speeds and state-of-the-art suggestion quality.
- Amazon CodeWhisperer: Amazon CodeWhisperer is an AI-powered productivity tool for the IDE and command line that generates code suggestions based on comments and existing code.
- Transformer Debugger: Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.
- CopilotKit: A framework for building custom AI Copilots 🤖: in-app AI chatbots, in-app AI agents, and AI-powered textareas.
- Codium: CodiumAI’s first tool is an IDE extension that interacts with the developer to generate meaningful tests and code explanations for busy devs.
- Tabby: Self-hosted AI coding assistant
- CodeRabbit: CodeRabbit is an innovative AI code review platform that streamlines and enhances the development process.
- Cursor: The AI Code Editor.
- Melty: Melty is the first AI code editor that's aware of what you're doing from the terminal to GitHub, and collaborates with you to write production-ready code.
- AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser.
- *Devin: Introducing Devin, the first AI software engineer and setting a new state of the art on the SWE-bench coding benchmark.
- OpenDevin: An autonomous AI software engineer who is capable of executing complex engineering tasks and collaborating actively with users on software development projects.
- Plandex: An AI coding engine for complex tasks.
- Devika: An agentic AI software engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective.
- Aider: Aider is AI pair programming in your terminal.
- Agent Protocol: A single common interface for communicating with agents
- Devon: An open-source pair programmer
- PR-Agent: CodiumAI PR-Agent: An AI-Powered 🤖 Tool for Automated Pull Request Analysis, Feedback, Suggestions and More!
- FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs
- AgentQL: Build AI agents using a query language for precise web and app automation
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
- Translation Agent: Agentic translation using reflection workflow
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
- MetaGPT: MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.
- ChatDev: The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.
- TransAgents: Multi-Agent for Translating Ultra-Long Literary Texts
- Warp: Warp is a tool designed to enhance the terminal experience by providing AI-powered assistance for command lookups and letting users state their objectives in plain English.
- Gorilla: Gorilla CLI powers your command-line interactions with a user-centric tool.
- CodeWhisperer Cli: CodeWhisperer for command line adds IDE-style completions for hundreds of popular CLIs such as Git, npm, Docker, MongoDB Atlas, and the AWS CLI. Previously known as Fig.
- Open Interpreter: A natural language interface for computers.
- Dora: Design and publish stunning 3D & animated websites effortlessly, without the need for coding.
- Design2Code: How Far Are We From Automating Front-End Engineering
- Tempo: Tempo generates and edits high-quality react code directly in your codebase so you can ship UIs in minutes.
- OpenUI: OpenUI lets you describe UI using your imagination, then see it rendered live.
- v0: Generate UI with shadcn/ui from simple text prompts and images.
- Groq: Groq is on a mission to set the standard for GenAI inference speed, helping real-time AI applications come to life today.
- *LOOI Root: Turn Your Smartphone into a Desktop Robot
- Friend: Open-Source AI Wearable with 24h+ on single charge
- insight: An AI wearable called insight, built from a spare Raspberry Pi.
- Limitless: Personalized AI powered by what you’ve seen, said, and heard.
- Frame AI glasses: Open-source eyewear.
- Rabbit R1: Your pocket companion.
- *Haptic Source-effector: Full-body Haptics via Non-invasive Brain Stimulation
- OpenGlass: Turn any glasses into AI-powered smart glasses
- Octo: Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
- HumanPlus: Humanoid Shadowing and Imitation from Humans
- LeRobot: LeRobot: End-to-end Learning for Real-World Robotics in Pytorch
- Ray-Ban Meta Smart Glasses: The Ray-Ban Meta collection combines the latest in wearable tech with authentic Ray-Ban design, to keep you connected wherever you go.
- Solos AirGo Vision: Audio Smartglasses powered by ChatGPT
- Prompt-Engineering-Guide: Guides, papers, lecture, notebooks and resources for prompt engineering.
- Prompt Library: A prompt library by Dr. Ethan Mollick and Dr. Lilach Mollick of the Wharton School of the University of Pennsylvania.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Text Generation Inference: Large Language Model Text Generation Inference
- Ollama: Get up and running with large language models locally.
- LM Studio: Discover, download, and run local LLMs.
- Cradle: The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
- LLMPerf: A tool for evaluating the performance of LLM APIs. It also provides a leaderboard for LLMs.
- WebLINX: Real-world website navigation with multi-turn dialogue.
- Latent Box: A collection of awesome-lists for AI, creativity and art.
- LLM Transparency Tool: LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing internal workings of Transformer-based language models.
- LLM Visualization: A visualization and walkthrough of the LLM algorithm that backs OpenAI's ChatGPT. Explore the algorithm down to every add & multiply, seeing the whole process in action.
- HippoRAG: HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents.
- Vanna: Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
- Rewind: Rewind is a personalized AI powered by everything you’ve seen, said, or heard. Your colleagues will wonder how you do it all.
- Wordware: A web-hosted IDE where non-technical domain experts work with AI Engineers to build task-specific AI agents. It approaches prompting as a new programming language rather than low/no-code blocks.
- Raycast: Raycast is a blazingly fast, totally extendable launcher. It lets you complete tasks, calculate, share common links, and much more.
- Gamma: A new medium for presenting ideas, powered by AI. Create beautiful, engaging content with none of the formatting and design work.
- Deep-tempest: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations
- Great Tables: Make awesome display tables using Python.
- ComfyUI: The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
- Gauth: Your AI Homework Helper.