InfiniteAICreations/awesome-llm-projects

🤖 Our goal is to establish and cultivate a comprehensive collection of projects that demonstrates the remarkable versatility and potential of LLM applications.

Projects

‼️ Attention: A project name that starts with * means the project is not open source and has not released any application yet.

🦄 LLMs

  • Command-R: Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.
  • Grok-1: Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
  • Mistral: Mistral AI releases open-source LLMs, including Mistral 7B, Mixtral 8x7B and Codestral.
  • DBRX: DBRX is an open, general-purpose LLM created by Databricks.
  • mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
  • OpenChat: Advancing Open-source Language Models with Imperfect Data
  • WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions
  • CodeGemma-7b: An official Google release for code LLMs.
  • Awesome-Chinese-LLM: Includes many Open Source Chinese LLMs.
  • llama3: LLMs newly released by Meta.
  • Snowflake Arctic: Arctic is a dense-MoE hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. It is evaluated on an average of coding (HumanEval+ and MBPP+), SQL generation (Spider), and instruction following (IFEval).
  • DeepSeek-V2-Chat: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  • Qwen 1.8B, 7B, 14B, 72B: Chat and pretrained large language models proposed by Alibaba Cloud.
  • Granite Code Models 3B, 8B, 20B, 34B: IBM's open-source code models, a family of open foundation models for code intelligence.
  • Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  • MiniCPM-V 2.0: An Efficient End-side MLLM with Strong OCR and Understanding Capabilities
  • Stable Audio Open 1.0: Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts.
  • Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B: Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
  • GLM-4-9B: GLM-4 series: Open Multilingual Multimodal Chat LMs
  • AutoCoder: A new model designed for code generation. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
  • Nemotron 4 340B: NVIDIA's open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models.
  • Fish Speech V1.2: Fish Speech V1.2 is a leading text-to-speech (TTS) model trained on 300k hours of English, Chinese, and Japanese audio data.
  • Phi-3 family: Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths.
  • Gemma 2: Gemma 2 offers best-in-class performance, runs at incredible speed across different hardware and easily integrates with other AI tools.

🏆 Benchmarks Leaderboard

💬 ChatBot

  • ChatGPT: ChatGPT is a free-to-use AI system. Use it for engaging conversations, gain insights, automate tasks, and witness the future of AI, all in one place.
  • Gemini: Bard is now Gemini. Get help with writing, planning, learning, and more from Google AI.
  • character.ai: Where intelligent agents live!
  • Claude: Talk with Claude, an AI assistant from Anthropic.
  • Mistral AI: Mistral aims to make frontier AI ubiquitous and to provide tailor-made AI to all builders.

🗣️ Voice

Including text to speech, speech to text, speech to speech, and voice generation:

  • *Vall-E: A neural codec language model for speech synthesis.
  • ElevenLabs: AI Voice Generator & Text to Speech
  • Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  • Krisp: Krisp cancels background noise and reduces echo during your calls.
  • Voicemod: Voicemod is a free real-time voice changer and soundboard available on both Windows and macOS.
  • *NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
  • VoiceCraft: VoiceCraft is Zero-Shot Speech Editing and Text-to-Speech in the Wild.
  • Parler-TTS: Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).
  • Sounds: Sounds for creators, game developers, artists, video makers. Experience the best AI Sound FX generator
  • VIVA: VIVA is an AI-powered creative visual design platform.
  • ChatTTS: ChatTTS is a generative speech model for daily dialogue.
  • StreamSpeech: StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
  • Dream Machine: Dream Machine is an AI model that makes high quality, realistic videos fast from text and images.
  • CosyVoice: Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

🎵 Music

  • Suno: Suno is an innovative tool designed for music creation, leveraging artificial intelligence to transform text input into original songs
  • Udio: Make your music. Discover, create, and share music with the world.
  • Haimian Music: An AI music generation product by ByteDance that delivers superior vocal quality in both Chinese and English.
  • Jamboss: Jamboss is a super simple AI music generator app that empowers you to turn your ideas and lyrics into amazing full-length songs.

🌄 Image

Including text to image, image to image, and animation:

  • DALL-E: Creating images from text.
  • Stable Diffusion: Stable Diffusion is a deep learning, text-to-image model.
  • Midjourney: Midjourney is a generative artificial intelligence program and service that creates images from natural language descriptions, similar to other AI technologies like OpenAI's DALL-E and Stability AI's Stable Diffusion.
  • StickerBaker: StickerBaker is an open-source tool that allows users to create stickers using AI technology.
  • *PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.
  • ResAdapter: ResAdapter is a plug-and-play resolution adapter for enabling diffusion models of arbitrary style domains to generate resolution-free images: no additional training, no additional inference and no style transfer.
  • FaceChain: FaceChain is a deep-learning toolchain for generating your Digital-Twin.
  • APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
  • OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models: OMG is a framework for multi-concept image generation
  • BasicPBC: Learning Inclusion Matching for Animation Paint Bucket Colorization.
  • DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing.
  • VAR: A new visual generation method that elevates GPT-style models beyond diffusion, with scaling laws observed.
  • Ideogram: Ideogram is a free-to-use AI tool that generates realistic images, posters, logos and more.
  • MagicClothing: Focus on controllable garment-driven image synthesis.
  • *IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination.
  • HeyBeauty: Discover Beauty with AI, Make Fashion redefined.
  • IC-Light: IC-Light is a project to manipulate the illumination of images.
  • Logo Diffusion: Create Logos in Seconds With Generative A.I.
  • MistoLine: A Versatile and Robust SDXL-ControlNet Model for Adaptable Line Art Conditioning
  • InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
  • Omost: Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability.
  • ToonCrafter: ToonCrafter can interpolate two cartoon images by leveraging the pre-trained image-to-video diffusion priors.
  • Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
  • UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation.
  • Krea: Generate and enhance images and videos using powerful AI for free.
  • Leonardo AI: Leonardo AI is a generative AI tool that lets you craft top-tier visual assets.
  • MimicBrush: Zero-shot Image Editing with Reference Imitation
  • SketchDeco: Decorating B&W Sketches with Colour.
  • Tensor.Art: An AI model sharing platform where you can run models online to generate images and train models for free.
  • AutoStudio: AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
  • LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
  • IMAGDressing: Interactive Modular Apparel Generation for Virtual Dressing
  • PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings

🧸 3D Model

Including text to 3D model:

  • TripoSR: TripoSR is a fast and feed-forward 3D generative model developed in collaboration between Stability AI and Tripo AI.
  • PantoMatrix: PantoMatrix: Talking Face and Body Animation Generation
  • Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians.
  • *Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text.
  • *CAT3D: CAT3D: Create Anything in 3D with Multi-View Diffusion Models
  • DiffTF: Large-Vocabulary 3D Diffusion Model with Transformer
  • DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models
  • Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image.
  • Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
  • *OccFusion: Rendering Occluded Humans with Generative Diffusion Priors
  • AIUNI: AI-generated unique assets, avatars, and animation.
  • MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

🎥 Video

Including text to video, image to video, video to video:

  • *Sora: Creating video from text. Sora is an AI model that can create realistic and imaginative scenes from text instructions.
  • *Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  • Runway: Runway is an applied AI research company shaping the next era of art, entertainment and human creativity.
  • HeyGen: HeyGen is an innovative video platform that harnesses the power of generative AI to streamline your video creation process.
  • AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations
  • MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising.
  • CameraCtrl: Enabling Camera Control for Text-to-Video Generation.
  • Pika: Pika is the idea-to-video platform that sets your creativity in motion.
  • *VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time.
  • OpenVoice: Instant voice cloning by MyShell.
  • Veo: Veo is Google's most capable video generation model to date.
  • AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
  • Pandora: Towards General World Model with Natural Language Actions and Video States
  • EasyAnimate: An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion.
  • V-Express: V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
  • MusePose: A Pose-Driven Image-to-Video Framework for Virtual Human Generation
  • Hedra: Hedra is a video content generation platform and social media platform that allows individuals to edit, export and share AI-generated videos and video components.
  • MASA: Matching Anything by Segmenting Anything
  • MotionClone: Training-Free Motion Cloning for Controllable Video Generation
  • MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
  • Video-Infinity: Video-Infinity generates long videos quickly using multiple GPUs without extra training.
  • DiffSynth Studio: DiffSynth Studio is a Diffusion engine.
  • SAM 2: Segment Anything Model 2 (SAM 2) is a foundation model towards solving promptable visual segmentation in images and videos.

🕸️ Search Engine

Including search engine, web browser:

  • Phind: An AI search engine that generates answers from web search results and LLMs, with customizable weighting of search result sources.
  • Devv: The next generation AI search engine for developers. Solve your programming problems in seconds.
  • Perplexity: Perplexity AI unlocks the power of knowledge with information discovery and sharing.
  • Arc: Effortlessly organize everything you do online — work, study, hobbies — all in one window with Spaces and Profiles.
  • Perplexica: Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.
  • Reor: Private & offline AI personal knowledge management app.

👩🏽‍💻 Develop Assistant

  • GitHub Copilot: Get AI-based suggestions in real time.
  • Codeium: Codeium offers best-in-class AI code completion, search, and chat, all for free. It supports 70+ languages and integrates with your favorite IDEs, with lightning-fast speeds and state-of-the-art suggestion quality.
  • Amazon CodeWhisperer: Amazon CodeWhisperer is an AI-powered productivity tool for the IDE and command line that generates code suggestions based on comments and existing code.
  • Transformer Debugger: Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.
  • CopilotKit: A framework for building custom AI Copilots 🤖: in-app AI chatbots, in-app AI agents, and AI-powered textareas.
  • Codium: CodiumAI’s first tool is an IDE extension that interacts with the developer to generate meaningful tests and code explanations for busy devs.
  • Tabby: Self-hosted AI coding assistant
  • CodeRabbit: CodeRabbit is an innovative AI code review platform that streamlines and enhances the development process.
  • Cursor: The AI Code Editor.
  • Melty: Melty is the first AI code editor that's aware of what you're doing from the terminal to GitHub, and collaborates with you to write production-ready code.

🧠 AI Agent

  • AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser.
  • *Devin: Introduced as the first AI software engineer, setting a new state of the art on the SWE-bench coding benchmark.
  • OpenDevin: An autonomous AI software engineer that is capable of executing complex engineering tasks and collaborating actively with users on software development projects.
  • Plandex: An AI coding engine for complex tasks.
  • Devika: An agentic AI software engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective.
  • Aider: Aider is AI pair programming in your terminal.
  • Agent Protocol: A single common interface for communicating with agents
  • Devon: An open-source pair programmer
  • PR-Agent: CodiumAI PR-Agent: An AI-Powered 🤖 Tool for Automated Pull Request Analysis, Feedback, Suggestions and More!
  • FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs
  • AgentQL: Build AI agents using a query language for precise web and app automation
  • Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  • Translation Agent: Agentic translation using reflection workflow
  • DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

🤼 Multi-Agent Collaboration

  • MetaGPT: MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.
  • ChatDev: The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.
  • TransAgents: Multi-Agent for Translating Ultra-Long Literary Texts

💻 Terminal

  • Warp: Warp is a tool designed to enhance the terminal experience by providing AI-powered assistance for command lookups and allowing users to state their objectives in plain English.
  • Gorilla: Gorilla CLI powers your command-line interactions with a user-centric tool.
  • CodeWhisperer CLI: CodeWhisperer for command line adds IDE-style completions for hundreds of popular CLIs such as Git, npm, Docker, MongoDB Atlas, and the AWS CLI. Previously known as Fig.
  • Open Interpreter: A natural language interface for computers.

📰 Web Sites

  • Dora: Design and publish stunning 3D & animated websites effortlessly, without the need for coding.
  • Design2Code: How Far Are We From Automating Front-End Engineering
  • Tempo: Tempo generates and edits high-quality React code directly in your codebase so you can ship UIs in minutes.
  • OpenUI: OpenUI lets you describe UI using your imagination, then see it rendered live.
  • v0: Generate UI with shadcn/ui from simple text prompts and images.

🗜️ Hardware

  • Groq: Groq is on a mission to set the standard for GenAI inference speed, helping real-time AI applications come to life today.
  • *LOOI Root: Turn Your Smartphone into a Desktop Robot
  • Friend: Open-Source AI Wearable with 24h+ on single charge
  • insight: An AI wearable called insight, built from a spare Raspberry Pi.
  • Limitless: Personalized AI powered by what you’ve seen, said, and heard.
  • Frame AI glasses: Open-source eyewear.
  • Rabbit R1: Your pocket companion.
  • *Haptic Source-effector: Full-body Haptics via Non-invasive Brain Stimulation
  • OpenGlass: Turn any glasses into AI-powered smart glasses
  • Octo: Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
  • HumanPlus: Humanoid Shadowing and Imitation from Humans
  • LeRobot: LeRobot: End-to-end Learning for Real-World Robotics in PyTorch
  • Ray-Ban Meta Smart Glasses: The Ray-Ban Meta collection combines the latest in wearable tech with authentic Ray-Ban design, to keep you connected wherever you go.
  • Solos AirGo Vision: Audio Smartglasses powered by ChatGPT

⌨️ Prompt Engineering

  • Prompt-Engineering-Guide: Guides, papers, lectures, notebooks and resources for prompt engineering.
  • Prompt Library: A prompt library by Dr. Ethan Mollick and Dr. Lilach Mollick of the Wharton School of the University of Pennsylvania.

🤯 LLMs Inference and Serving

  • vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
  • Text Generation Inference: Large Language Model Text Generation Inference
  • Ollama: Get up and running with large language models locally.
  • LM Studio: Discover, download, and run local LLMs.

📋 Others

  • Cradle: The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
  • LLMPerf: A tool for evaluating the performance of LLM APIs. It also provides a leaderboard for LLMs.
  • WebLINX: Real-world website navigation with multi-turn dialogue.
  • Latent Box: A collection of awesome-lists for AI, creativity and art.
  • LLM Transparency Tool: LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing internal workings of Transformer-based language models.
  • LLM Visualization: A visualization and walkthrough of the LLM algorithm that backs OpenAI's ChatGPT. Explore the algorithm down to every add & multiply, seeing the whole process in action.
  • HippoRAG: HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents.
  • Vanna: Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
  • Rewind: Rewind is a personalized AI powered by everything you’ve seen, said, or heard. Your colleagues will wonder how you do it all.
  • Wordware: A web-hosted IDE where non-technical domain experts work with AI Engineers to build task-specific AI agents. It approaches prompting as a new programming language rather than low/no-code blocks.
  • Raycast: Raycast is a blazingly fast, totally extendable launcher. It lets you complete tasks, calculate, share common links, and much more.
  • Gamma: A new medium for presenting ideas, powered by AI. Create beautiful, engaging content with none of the formatting and design work.
  • Deep-tempest: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations
  • Great Tables: Make awesome display tables using Python.
  • ComfyUI: The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
  • Gauth: Your AI Homework Helper.