diff --git a/resources/all_assets.csv b/resources/all_assets.csv index bbd9d698..8cf5fb80 100644 --- a/resources/all_assets.csv +++ b/resources/all_assets.csv @@ -1,620 +1,808 @@ -type,name,organization,description,created_date,url,datasheet,modality,size,sample,analysis,dependencies,included,excluded,quality_control,access,license,intended_uses,prohibited_uses,monitoring,feedback,model_card,training_emissions,training_time,training_hardware,adaptation,output_space,terms_of_service,monthly_active_users,user_distribution,failures -dataset,ToyMix,Mila-Quebec AI Institute,ToyMix is the smallest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",13B labels of quantum and biological nature.,[],Models of size 150k parameters trained on ToyMix and compared to models trained on its dependencies across GNN baselines.,"['QM9', 'TOX21', 'ZINC12K']",,,,open,CC BY-NC-SA 4.0,"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. Further, the ToyMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.",,,,,,,,,,,,, -dataset,LargeMix,Mila-Quebec AI Institute,LargeMix is the middle-sized dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",13B labels of quantum and biological nature.,[],Models of size between 4M and 6M parameters trained for 200 epochs on LargeMix and compared to models trained on its dependencies across GNN baselines.,"['L1000 VCAP', 'L1000 MCF7', 'PCBA1328', 'PCQM4M_G25_N4']",,,,open,CC BY-NC-SA 4.0,"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. 
Further, the LargeMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.",,,,,,,,,,,,, -dataset,UltraLarge,Mila-Quebec AI Institute,UltraLarge is the largest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",13B labels of quantum and biological nature.,[],Models of size between 4M and 6M parameters trained for 50 epochs on UltraLarge and compared to models trained on its dependencies across GNN baselines.,['PM6_83M'],,,,open,CC BY-NC-SA 4.0,The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models.,,,,,,,,,,,,, -model,Lag-LLaMA,"Morgan Stanley, ServiceNow Research, University of Montreal, Mila-Quebec AI Institute",Lag-LLaMA is a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates.,2024-02-08,https://time-series-foundation-models.github.io/lag-llama.pdf,,text; text,unknown,,Evaluated on previously unseen time series datasets.,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/time-series-foundation-models/Lag-Llama/discussions,https://huggingface.co/time-series-foundation-models/Lag-Llama,unknown,unknown,A single NVIDIA Tesla-P100 GPU,,,,,, -model,Prithvi,IBM,"Prithvi is a first-of-its-kind temporal Vision transformer pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function.",2023-08-03,https://github.com/NASA-IMPACT/hls-foundation-os,,"text, video; text, video",100M parameters (dense),,,['NASA HLS data'],,,,open,Apache 2.0,,,,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M/discussions,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M,,,,,,,,, -application,Watsonx.ai,IBM,"Watsonx.ai is part of the IBM watsonx platform that brings together new generative AI capabilities, powered by foundation models and traditional machine learning into a powerful studio spanning the AI lifecycle.",2023-09-07,https://www.ibm.com/products/watsonx-ai,,,,,,['Granite'],,,,limited,custom,,,,,,,,,,deployed AI models,https://www.ibm.com/docs/en/watsonx-as-a-service?topic=models-terms-use,,, -model,Granite,IBM,Granite is a set of multi-size foundation models that apply generative AI to both language and code.,2023-09-28,https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models/,,"text; code, text",13B parameters (dense),,unknown,[],,,"Training data passed through IBM HAP detector, language model designed to remove harmful content. 
Data also deduplicated and filtered for document quality.",limited,,,,,,,unknown,unknown,unknown,,,,,, -model,Animagine XL 3.1,Cagliostro Research Lab,"An open-source, anime-themed text-to-image model enhanced to generate higher quality anime-style images with a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation.",2024-03-18,https://cagliostrolab.net/posts/animagine-xl-v31-release,,text; image,unknown,,unknown,['Animagine XL 3.0'],,,"The model undergoes pretraining, first stage finetuning, and second stage finetuning for refining and improving aspects such as hand and anatomy rendering.",open,Fair AI Public License 1.0-SD,"Generating high-quality anime images from textual prompts. Useful for anime fans, artists, and content creators.",Not suitable for creating realistic photos or for users who expect high-quality results from short or simple prompts.,unknown,https://huggingface.co/cagliostrolab/animagine-xl-3.1/discussions,https://huggingface.co/cagliostrolab/animagine-xl-3.1,unknown,"Approximately 15 days, totaling over 350 GPU hours.",2x A100 80GB GPUs,,,,,, -application,Portkey,Portkey,Portkey is a hosted middleware that allows users to create generative AI applications,2023-05-06,https://portkey.ai/,,,,,,[],,,,open,,,,,,,,,,,generative AI apps,https://portkey.ai/terms,,, -application,Viable,Viable,"Viable analyzes qualitative consumer feedback and provides summary feedback to companies. -",,https://www.askviable.com/,,,,,,['OpenAI API'],,,unknown,limited,unknown,"Intended to be used by companies to digest qualitative consumer feedback. -","Prohibited uses are listed in the Terms of Service [[Terms of Service]](https://www.askviable.com/terms-of-service). The terms don't include statements specific to the use of the content generated by the system or GPT-3. 
-",unknown,unknown,,,,,unknown,"Question and answer, summarization, sentiment analysis, topic identification",https://www.askviable.com/terms-of-service,unknown,unknown,unknown -application,Auto-GPT,Auto-GPT,Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model.,2023-04-16,https://news.agpt.co/,,,,,,['GPT-4 API'],,,,open,MIT,,,,,,,,,"GPT-4 adapted to run autonomously by chaining together LLM ""thoughts""",text,,,, -model,Bark,Suno,Bark is a text-to-audio model that can generate multilingual speech as well as other noises.,2023-04-20,https://github.com/suno-ai/bark,,text; audio,,,,['AudioLM'],,,,open,MIT,,,,https://huggingface.co/spaces/suno/bark/discussions,https://github.com/suno-ai/bark/blob/main/model-card.md,unknown,unknown,,,,,,, -application,ChatGPT powered by OBO,HubSpot,"Give your sales, marketing, and customer service teams one of the most powerful AI tools available - ChatGPT priority access, no timeout limits, company wide access managed through a single account, incorporate into your existing processes without leaving HubSpot",2023-01-31,https://ecosystem.hubspot.com/marketplace/apps/sales/sales-enablement/the-obo-group-chatgpt-1398072,,,,,,['ChatGPT API'],,,,limited,unknown,,,,,,,,,,,,,, -model,GPT-JT,Together,,2022-11-29,https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai,,text; text,6B parameters (dense),,,"['GPT-J', 'P3', 'NaturalInstructions-v2']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,GPT-NeoXT-Chat-Base,Together,,2023-03-10,https://www.together.xyz/blog/openchatkit,,text; text,20B parameters (dense),,,"['GPT-NeoX', 'OIG-43M']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,OpenChatKit moderation model,Together,,2023-03-10,https://www.together.xyz/blog/openchatkit,,text; text,6B parameters (dense),,,"['GPT-JT', 'OIG-moderation']",,,,open,Apache 2.0,,,,,,,,,,,,,, -dataset,OIG-43M,"Together, LAION, Ontocord",,2023-03-10,https://laion.ai/blog/oig-dataset/,,text,43M instructions,[],,"['P3', 'NaturalInstructions-v2', 'FLAN dataset']",,,,open,Apache 2.0,,,,,,,,,,,,,, -dataset,OIG-moderation,"Together, LAION, Ontocord",,2023-03-10,https://laion.ai/blog/oig-dataset/,,text,unknown,[],,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -dataset,RedPajama-Data,Together,The RedPajama base dataset is a 1.2 trillion token fully-open dataset created by following the recipe described in the LLaMA paper,2022-04-17,https://www.together.xyz/blog/redpajama,https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T,text,1.2 trillion tokens,[],,"['GitHub', 'Wikipedia']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,Llama-2-7B-32K-Instruct,Together,"Llama-2-7B-32K-Instruct is an open-source, long-context chat model finetuned from Llama-2-7B-32K, over high-quality instruction and chat data.",2023-08-18,https://together.ai/blog/llama-2-7b-32k-instruct,,text; text,7B parameters (dense),,"Model evaluated over AlpacaEval, Rouge score over BookSum, and accuracy over MQA.","['BookSum dataset', 'MQA dataset', 'Together API', 'LLaMA 2']",,,,open,LLaMA 2,,,,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct/discussions,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct,,,,,,,,, -dataset,RedPajama-Data-v2,Together,"RedPajama-Data-v2 is a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillions raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and 
weighting.",2023-10-30,https://together.ai/blog/redpajama-data-v2,,text,30 trillion tokens,[],,['Common Crawl'],"documents in English, German, French, Spanish, and Italian.",,tokens filtered and deduplicated,open,Apache 2.0,"To be used as the start of a larger, community-driven development of large-scale datasets for LLMs.",,,Feedback can be sent to Together via https://www.together.ai/contact,,,,,,,,,, -model,StripedHyena,Together,"StripedHyena is an LLM and the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.",2023-12-08,https://www.together.ai/blog/stripedhyena-7b,,text; text,7B parameters (dense),,Model evaluated on a suite of short-context task benchmarks.,"['Hyena', 'RedPajama-Data']",,,,open,Apache 2.0,,,,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B/discussions,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B,unknown,unknown,unknown,,,,,, -model,StripedHyena Nous,Together,"StripedHyena Nous is an LLM and chatbot, along with the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.",2023-12-08,https://www.together.ai/blog/stripedhyena-7b,,text; text,7B parameters (dense),,Model evaluated on a suite of short-context task benchmarks.,"['Hyena', 'RedPajama-Data']",,,,open,Apache 2.0,,,,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B/discussions,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B,unknown,unknown,unknown,,,,,, -model,MediTron,"EPFL, Idiap Research Institute, OpenAssistant, Yale",Meditron is a large-scale medical LLM that remains open-source.,2023-11-27,https://arxiv.org/pdf/2311.16079.pdf,,text; text,70B parameters (dense),,Evaluated on TruthfulQA as main evaluation benchmark.,[],,,,open,LLaMA 2,"Medical exam question answering, supporting differential diagnosis, disease information.",,,https://huggingface.co/epfl-llm/meditron-7b/discussions,https://huggingface.co/epfl-llm/meditron-70b,unknown,unknown,unknown,,,,,, -model,XVERSE,Xverse,XVERSE is a multilingual large language model for over 40 languages.,2023-11-06,https://github.com/xverse-ai/XVERSE-65B,,text; text,65B parameters (dense),,Evaluated across a range of standard datasets regarding multiple model capabilities like language comprehension and logical reasoning.,[],,,,open,custom,,,unknown,https://huggingface.co/xverse/XVERSE-65B/discussions,https://huggingface.co/xverse/XVERSE-65B,unknown,unknown,unknown,,,,,, -model,Otter,Nanyang Technological University,"Otter is a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind’s Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning.",2023-05-05,https://arxiv.org/pdf/2305.03726v1.pdf,,"image, text; text",1.3B parameters (dense),,Evaluated on researcher experiments to test deeper understanding and advanced commonsense reasoning,"['MIMIC-IT', 'OpenFlamingo']",,,,open,MIT,Following and executing new instructions with few in-context learning examples given image and textual input.,,,,https://github.com/Luodian/Otter/blob/main/docs/model_card.md,,,4 RTX-3090 GPUs,,,,,, -dataset,EXMODD,Beijing Institute of Technology,EXMODD (Explanatory Multimodal Open-Domain Dialogue dataset) is a dataset built off the proposed MDCF (Multimodal Data Construction Framework).,2023-10-17,https://arxiv.org/pdf/2310.10967.pdf,,"image, text",unknown,[],Models fine-tuned on EXMODD and earlier dataset 
Image-Chat and then evaluated on Image-Chat validation set.,"['YFCC100M', 'Image-Chat']",,,,open,MIT,,,,Feedback can be sent to authors via poplpr@bit.edu.cn,,,,,,,,,, -model,MiniMA,Beijing Institute of Technology,MiniMA is a smaller finetuned Llama 2 model adapted for Chinese.,2023-11-13,https://github.com/GeneZC/MiniMA,,text; text,3B parameters (dense),,"Evaluated on standard benchmarks including MMLU, CEval, and DROP.",['Llama 2'],,,,open,Llama 2,,,unknokwn,https://huggingface.co/GeneZC/MiniMA-3B/discussions,https://huggingface.co/GeneZC/MiniMA-3B,unknown,unknown,8 A100 80G GPUs,,,,,, -model,ChatGLM,ChatGLM,"ChatGLM is a Chinese-English language model with question and answer and dialogue functions, and is aimed at a Chinese audience.",2023-03-14,https://chatglm.cn/blog,,text; text,6B parameters (dense),,Performance evaluated on English and Chinese language benchmark tests.,[],,,,open,Apache 2.0,,,,,,unknown,unknown,,,,,,, -model,OpenFold,Columbia,OpenFold is an open source recreation of AlphaFold2.,2022-11-20,https://www.biorxiv.org/content/10.1101/2022.11.20.517210v2,,amino acid sequence; protein structure,,,Evaluated on wide range of tasks using own evaluation benchmarks.,"['AlphaFold2', 'OpenProteinSet']",,,,open,CC BY 4.0,,,,,,unknown,"50,000 GPU hours",Single A100 NVIDIA GPU,,,,,, -model,Ferret,"Columbia, Apple AI",Ferret is a Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.,2023-10-11,https://arxiv.org/pdf/2310.07704.pdf,,"image, text; image, text",13B parameters,,Evaluated on the object hallucination benchmark and compared to GPT-4V.,"['CLIP', 'Vicuna']",,,,open,Apple,,,,,,unknown,2.5 to 5 days,8 A100 GPUs,,,,,, -model,Guanaco,University of Washington,"Guanaco is a model family trained with QLORA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.",2023-05-23,https://arxiv.org/pdf/2305.14314v1.pdf,,text; text,33B parameters (dense),,Reports results on the Vicuna benchmark and compares performance level and time expenditure with ChatGPT,"['QLoRA', 'OASST1']",,,,open,MIT,,,,,,,,A single 24 GB GPU,,,,,, -model,Llark,"University of Washington, Spotify",Llark is an instruction-tuned multimodal model for music understanding.,2023-10-11,https://arxiv.org/pdf/2310.07160.pdf,,"audio, text; text",12B parameters (dense),,Evaluated on benchmark music understanding tasks on SOTA music datasets.,"['LLaMA 2', 'Jukebox']",,,,open,Apache 2.0,,,,,,unknown,54 hours,4 80GB NVIDIA A40 GPUs,,,,,, -model,InternLM,InternLM,"InternLM is an LLM pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data.",2023-09-20,https://github.com/InternLM/InternLM,,text; text,7B parameters (dense),,Evaluated on the dimensions proposed by OpenCompass in comparison to other LLMs.,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/internlm/internlm-20b/discussions,https://huggingface.co/internlm/internlm-20b,unknown,unknown,unknown,,,,,, -model,BioMistral,"Avignon University, Nantes University","BioMistral is an open-source Large Language Model tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.",2024-02-15,https://arxiv.org/pdf/2402.10373.pdf,,text; text,7B parameters (dense),,BioMistral was evaluated on a benchmark comprising 10 established medical 
question-answering (QA) tasks in English and seven other languages.,"['Mistral', 'PubMed Central']",,,,open,Apache 2.0,"Research in the biomedical domain, especially for medical question-answering tasks.",Prohibited from deploying in production environments for natural language generation or any professional health and medical purposes.,,https://huggingface.co/BioMistral/BioMistral-7B/discussions,https://huggingface.co/BioMistral/BioMistral-7B,unknown,unknown,32 NVIDIA A100 80GB GPUs,,,,,, -application,Khanmigo,Khan Academy,An AI-powered assistant that functions as both a virtual tutor for students and a classroom assistant for teachers.,2023-03-14,https://www.khanacademy.org/khan-labs#khanmigo,,,,,,['GPT-4 API'],,,,limited,unknown,,,,,,,,,,,,,, -model,GAIA-1,Wayve,"GAIA-1 (‘Generative AI for Autonomy’) is a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features.",2023-09-29,https://arxiv.org/pdf/2309.17080.pdf,,"text, video; video",9B parameters (dense),,Compared to self before being scaled on quality of video generation.,[],,,,closed,unknown,"The main use cases are pure representation learning, planning (look-ahead search), or learning a policy in the world model (neural simulator)",,,,,unknown,4 days,32 A100 80GB GPUs,,,,,, -model,GreenBit LLaMA,GreenBit AI,GreenBit LLaMA is a series of fine-tuned LLaMA models.,2023-09-29,https://github.com/GreenBitAI/low_bit_llama,,text; text,30B parameters (dense),,Evaluated on common LLM benchmarks.,['LLaMA'],,,,open,Apache 2.0,,,unknown,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8/discussions,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8,unknown,unknown,unknown,,,,,, -model,Ocean-1,Cresta,Ocean-1 is the culmination of Cresta's experience in deploying generative AI systems for large enterprises and signifies their latest milestone in advancing the cutting edge AI technology for customer facing conversations.,2023-06-20,https://cresta.com/blog/introducing-ocean-1-worlds-first-contact-center-foundation-model/,,text; text,7B parameters (dense),,Outperforms GPT-4 in common sense and reasoning tasks on the basis of both efficiency and accuracy.,"['GPT-4', 'Claude', 'Falcon-40B']",,,,closed,unknown,Acting as a contact center chatbot agent.,,unknown,,,unknown,unknown,unknown,,,,,, -model,Aurora-M,"Tokyo Institute of Technology, MIT-IBM Watson Lab, Sapienza University of Rome","Aurora-M is a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code.",2024-04-23,https://arxiv.org/pdf/2404.00399,,text; text,15B parameters,,"Evaluated on all language datasets compared to similarly sized SOTA models, with Aurora-M achieving strong performance in most.",['StarCoderPlus'],,,,open,unknown,,,unknown,,,unknown,48 days,"LUMI supercomputer, using 128 AMD MI250X GPUs",,,,,, -model,CodeParrot,HuggingFace,CodeParrot is an autoregressive language model trained on code,2021-12-06,https://twitter.com/lvwerra/status/1467933794699259908,,"text; code, text",1B parameters (dense),,,[],,,,open,,,,,,,unknown,unknown,16 x A100 (40GB),,,,,, -model,Zephyr,HuggingFace,Zephyr is a series of language models that are trained to act as helpful assistants.,2023-10-11,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,,text; text,7B parameters (dense),,"Evaluated on loss, rewards, logps, and logits rejected and chosen.",['Mistral'],,,,open,MIT,Educational and research 
purposes,,,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/discussions,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,unknown,unknown,unknown,,,,,, -model,IDEFICS,HuggingFace,"IDEFICS is an open-access visual language model, based on Flamingo.",2023-08-22,https://huggingface.co/blog/idefics,,"image, text; text",80B parameters (dense),,Evaluated in comparison to Flamingo and OpenFlamingo on standard benchmarks.,"['OBELICS', 'Wikipedia', 'LAION-5B', 'PMD']",,,,open,custom,Educational and research purposes,,,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct/discussions,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct,unknown,unknown,unknown,,,,,, -dataset,OBELICS,HuggingFace,OBELICS is a dataset consisting of 141 million interleaved image-text documents scraped from the web and contains 353 million images.,2023-08-22,https://huggingface.co/blog/idefics,https://huggingface.co/datasets/HuggingFaceM4/OBELICS,"image, text",115B tokens,[],Subset of training dataset evaluated for bias using Data Measurements Tool.,[],,All images for which creators explicitly requested opt-out of AI training.,Sexual and violent content still present in OBELICS even after filtering.,open,CC-BY-4.0,,,,https://huggingface.co/datasets/HuggingFaceM4/OBELICS/discussions,,,,,,,,,, -model,FinGPT,"University of Turku, HuggingFace, National Library of Finland",FinGPT is a series of Finnish LLMs trained from scratch.,2023-11-03,https://arxiv.org/pdf/2311.05640.pdf,,text; text,13B parameters (dense),,"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.",[],,,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/TurkuNLP/gpt3-finnish-13B/discussions,https://huggingface.co/TurkuNLP/gpt3-finnish-13B,unknown,unknown,"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of memory.",,,,,, -model,BLUUMI,"University of Turku, HuggingFace, National Library of Finland",BLUUMI is a multilingual fine-tuned version of BLOOM.,2023-11-03,https://arxiv.org/pdf/2311.05640.pdf,,text; text,176B parameters (dense),,"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.",['BLOOM'],,,unknown,open,BigScience RAIL v1.0,,,unknown,https://huggingface.co/TurkuNLP/bloom-finnish-176b/discussions,https://huggingface.co/TurkuNLP/bloom-finnish-176b,unknown,unknown,"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of memory.",,,,,, -dataset,Cosmopedia v0.1,Hugging Face,"Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date. It covers a variety of topics, mapping worldwide knowledge from Web datasets like RefinedWeb and RedPajama, to generate synthetic content.",2024-02-22,,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia,text,25B tokens,[],Some seed samples were used in different prompt styles and audiences. Less than 1% of files are duplicates after running MinHash deduplication. Contaminated samples were removed from each dataset split.,['Mixtral'],,unknown,Measures were taken to reduce redundancy and ensure diversity in generated content. 
A decontamination pipeline was implemented to avoid benchmark contamination.,open,unknown,,unknown,,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia/discussions,,,,,,,,,, -model,Idefics2,Hugging Face,"Idefics2 is a general multimodal model that takes as input arbitrary sequences of text and images, generating text responses. It has the capability to describe visual content, answer questions about images, perform basic arithmetic operations, create stories grounded in multiple images, and extract information from documents.",2024-04-15,https://huggingface.co/blog/idefics2,,"image, text; text",8B parameters,,The performance of Idefics2 has been evaluated on numerous benchmarks. It is top of its class size and competes with much larger models such as LLava-Next-34B and MM1-30B-chat.,['The Cauldron'],,,"The quality of the model has been ensured by training it on a mixture of openly available datasets and enhancing its OCR capabilities. Further improvements include manipulating images in their native resolutions and aspect ratios, better pre-trained backbones, and allowing for sub-image splitting.",open,Apache 2.0,"The model can be used for answering questions about images, describing visual content, creating stories grounded in multiple images, extracting information from documents, and performing basic arithmetic operations.",unknown,unknown,https://huggingface.co/HuggingFaceM4/idefics2-8b/discussions,https://huggingface.co/HuggingFaceM4/idefics2-8b,unknown,unknown,unknown,,,,,, -dataset,The Cauldron,Hugging Face,The Cauldron is an open compilation of 50 manually-curated datasets formatted for multi-turn conversations.,2024-04-15,https://huggingface.co/blog/idefics2,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron,"image, text",50 vision-language datasets,[],,"['LNarratives', 'Rendered Text', 'WebSight', 'DaTikz']",,,unknown,open,CC BY 4.0,,,unknown,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/discussions,,,,,,,,,, -model,DeepFloyd IF,Stability AI,A text-to-image cascaded pixel diffusion model released in conjunction with AI research lab DeepFloyd.,2023-04-28,https://stability.ai/blog/deepfloyd-if-text-to-image-model,,text; image,4.3B parameters (dense),,Evaluated on the COCO dataset.,['LAION-5B'],,,,open,custom,,,,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/discussions,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0,,,,,,,,, -model,StableLM,Stability AI,Large language models trained on up to 1.5 trillion tokens.,2023-04-20,https://github.com/Stability-AI/StableLM,,text; text,7B parameters (dense),,,"['StableLM-Alpha dataset', 'Alpaca dataset', 'gpt4all dataset', 'ShareGPT52K dataset', 'Dolly dataset', 'HH dataset']",,,,open,Apache 2.0,,,,,,,,,,,,,, -application,Stable Diffusion,Stability AI,Stable Diffusion is a generative software that creates images from text prompts.,2022-08-22,https://stability.ai/blog/stable-diffusion-public-release,,,,,,[],,,,open,custom,,,,https://huggingface.co/CompVis/stable-diffusion/discussions,,,,,,image,,,, -application,Stable Diffusion XL,Stability AI,"Stable Diffusion XL is an updated version of Stable Diffusion, and creates descriptive images with shorter prompts and generate words within images.",2023-07-26,https://stability.ai/stablediffusion,,,,,,[],,,,open,Open Rail++,,,,,,,,,,image,,,, -model,Stable Video Diffusion,Stability AI,Stable Video Diffusion is a latent diffusion model trained to generate short video clips from an image 
conditioning.,2023-11-21,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,,text; video,unknown,,Evaluated via a user study comparing preferences between Stable Video Diffusion and competing text-to-video models.,['Large Video Dataset'],,,,limited,custom,Intended for research purposes only.,Using the model to generate representations of real-world people or events.,,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/discussions,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt,unknown,unknown,unknown,,,,,, -dataset,Large Video Dataset,Stability AI,"Large Video Dataset is the dataset that trained Stable Video Diffusion, consisting of over 212 years of content.",2023-11-21,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,,video with caption,580M annotated video clip pairs,[],Large Video Dataset compared to publicly available research datasets on general statistics before and after filtering.,"['WebVid-10M', 'CoCa', 'V-BLIP']",,,"Dataset annotated with dense optical flow, and low optical flow videos are removed.",closed,unknown,,,,,,,,,,,,,, -application,Sky Replacer,Stability AI,Sky Replacer is an exciting new tool that allows users to replace the color and aesthetic of the sky in their original photos with a selection of nine alternatives to improve the overall look and feel of the image.,2023-11-01,https://clipdrop.co/real-estate/sky-replacer,,,,,,[],,,,open,unknown,,,,,,,,,,image,https://stability.ai/terms-of-use,,, -model,StableLM 2,Stability AI,"StableLM 2 is a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch.",2024-01-19,https://stability.ai/news/introducing-stable-lm-2,,text; text,1.6B parameters (dense),,Evaluated on standard LLM benchmarks and in multilingual tasks compared to similarly sized open-source models.,"['RedPajama-Data', 'The Pile', 'RefinedWeb', 'The Stack', 'OpenWebText', 'OpenWebMath']",,,,open,custom,The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications.,,unknown,https://huggingface.co/stabilityai/stablelm-2-1_6b/discussions,https://huggingface.co/stabilityai/stablelm-2-1_6b,11 tCO2eq,92k GPU hours,512 NVIDIA A100 40GB GPUs,,,,,, -model,Stable Cascade,Stability AI,"Stable Cascade is built upon the Würstchen architecture and its main difference to other models, like Stable Diffusion, is that it is working at a much smaller latent space.",2024-01-16,https://huggingface.co/stabilityai/stable-cascade,,text; image,unknown,,Evaluated on image generation benchmarks in comparison to equal and smaller-sized models.,[],,,,open,custom,The model is intended for research purposes for now.,"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. 
The model should not be used in any way that violates Stability AI's Acceptable Use Policy.",unknown,https://huggingface.co/stabilityai/stable-cascade/discussions,https://huggingface.co/stabilityai/stable-cascade,"2,276 kgCO2eq","24,602 A100 GPU hours",Some number of A100 GPUs,,,,,, -model,Stable Video 3D,Stability AI,"Stable Video 3D (SV3D) is a generative model based on Stable Video Diffusion that takes in a still image of an object as a conditioning frame, and generates an orbital video of that object.",2024-03-18,https://stability.ai/news/introducing-stable-video-3d,,image; video,unknown,,unknown,['Objaverse'],,,unknown,open,StabilityAI Non-Commercial Research Community License,This model is intended to be used for generating orbital videos of objects from still images.,"The model should not be used for generating factual or true representations of people or events, or in any way that violates Stability AIs Acceptable Use Policy.",unknown,https://huggingface.co/stabilityai/sv3d/discussions,https://huggingface.co/stabilityai/sv3d,unknown,unknown,unknown,,,,,, -model,Stable Audio 2.0,Stability AI,"Stable Audio 2.0 sets a new standard in AI-generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1kHz stereo.",2024-04-03,https://stability-ai.squarespace.com/news/stable-audio-2-0,,"audio, text; audio",unknown,,,['AudioSparx'],,,"To protect creator copyrights, for audio uploads, Stability AI partners with Audible Magic to use their content recognition (ACR) technology to power real-time content matching and prevent copyright infringement. Opt-out requests were honored during the training phase.",open,unknown,"It can be used to generate melodies, backing tracks, stems, and sound effects.",Uploading copyrighted material for transformation.,Advanced content recognition is used to maintain compliance and prevent copyright infringement.,,,unknown,unknown,unknown,,,,,, -model,Reka Flash,Reka,"Reka Flash is a multimodal, multilingual, state-of-the-art 21B model trained entirely from scratch.",2024-02-12,https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/,,"image, text, video; text",21B parameters (dense),,"Evaluated on MMLU, GSM8K, HumanEval, and GPQA benchmarks, among others.",[],,,,limited,unknown,,,unknown,,,unknown,unknown,unknown,,,,,, -model,Reka Core,Reka,"Reka Core is a frontier-class multimodal language model comparable to industry leaders. It has powerful capabilities including multimodal understanding (including images, videos, and audio), superb reasoning abilities, code generation, and multilinguality with proficiency in 32 languages.",2024-04-15,https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model,,"audio, image, text, video; text",unknown,,"Reka Core was evaluated against leading models such as OpenAIs GPT-4, Claude-3 Opus, and Gemini Ultra on a variety of tasks and metrics including multimodal and human evaluation conducted by a third party. 
It was found to be competitive or even surpassing these models.",[],,,,limited,unknown,"Reka Core can be used in e-commerce, social media, digital content and video games, healthcare, robotics, and other industries for tasks that require multimodal understanding, coding, complex reasoning, and more.",unknown,unknown,unknown,,unknown,few months,thousands of GPUs,,,,,, -model,FuseChat,FuseAI,FuseChat is a powerful chat Language Learning Model (LLM) that integrates multiple structure and scale-varied chat LLMs using a fuse-then-merge strategy. The fusion is done using two stages,2024-02-26,https://arxiv.org/abs/2402.16107,,text; text,7B parameters,,"The FuseChat model was evaluated on MT-Bench which comprises 80 multi-turn dialogues spanning writing, roleplay, reasoning, math, coding, stem, and humanities domains. It yields an average performance of 66.52 with specific scores for individual domains available in the leaderboard results.","['Nous Hermes 2', 'OpenChat 3.5']",,,,open,Apache 2.0,"FuseChat is intended to be used as a powerful chat bot that takes in text inputs and provides text-based responses. It can be utilized in a variety of domains including writing, roleplay, reasoning, math, coding, stem, and humanities.",unknown,unknown,https://huggingface.co/FuseAI/FuseChat-7B-VaRM/discussions,https://huggingface.co/FuseAI/FuseChat-7B-VaRM,unknown,unknown,unknown,,,,,, -model,MoMo,Moreh,MoMo is a large language model fine-tuned from Qwen.,2024-01-16,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,,text; text,72B parameters (dense),,unknown,"['Qwen', 'OpenOrca']",,,unknown,open,MIT,,,unknown,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO/discussions,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,unknown,unknown,AMD’s MI250 GPU,,,,,, -model,Mistral,Mistral AI,Mistral is a compact language model.,2023-09-27,https://mistral.ai/news/announcing-mistral-7b/,,text; text,7.3B parameters (dense),,Evaluated in comparison to LLaMA series models on standard language benchmarks.,[],,,,open,Apache 2.0,,,,https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions,https://huggingface.co/mistralai/Mistral-7B-v0.1,unknown,unknown,unknown,,,,,, -model,Mistral Large,Mistral AI,Mistral Large is Mistral AI’s new cutting-edge text generation model.,2024-02-26,https://mistral.ai/news/mistral-large/,,text; text,unknown,,Evaluated on commonly used benchmarks in comparison to the current LLM leaders.,[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, -application,Le Chat,Mistral AI,Le Chat is a first demonstration of what can be built with Mistral models and what can deployed in the business environment.,2024-02-26,https://mistral.ai/news/le-chat-mistral/,,,,,,"['Mistral', 'Mistral Large']",,,,limited,unknown,,,,,,,,,,,https://mistral.ai/terms/#terms-of-use,unknown,unknown,unknown -application,Reexpress One,Reexpress AI,"Reexpress One offers a means of document classification, semantic search, and uncertainty analysis on-device.",2023-03-21,https://re.express/index.html,,,,,,[],,,,limited,unknown,,,unknown,https://github.com/ReexpressAI/support,,,,,,data analyses,hhttps://re.express/tos.html,unknown,unknown,unknown -model,Dolphin 2.2 Yi,Cognitive Computations,Dolphin 2.2 Yi is an LLM based off Yi.,2023-11-14,https://erichartford.com/dolphin,,text; text,34B parameters (dense),,,"['Dolphin', 'Yi']",,,,open,custom,,,unknown,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b/discussions,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b,unknown,3 days,4 A100 GPUs,,,,,, 
-model,WizardLM Uncensored,Cognitive Computations,WizardLM Uncensored is WizardLM trained with a subset of the dataset - responses that contained alignment / moralizing were removed.,2023-06-01,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,,text; text,30B parameters (dense),,Evaluated on OpenLLM leaderboard.,['WizardLM'],,,,open,unknown,,,unknown,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored/discussions,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,unknown,unknown,unknown,,,,,, -application,DuckAssist,DuckDuckGo,The first Instant Answer in DuckDuckGo search results to use natural language technology to generate answers to search queries using Wikipedia and other related sources,2023-03-08,https://spreadprivacy.com/duckassist-launch/,,,,,,['Anthropic API'],,,,open,unknown,,,,,,,,,,,,,, -application,Perplexity Ask,Perplexity,Perplexity Ask is a new search interface that uses advanced artificial intelligence technologies,2022-12-07,https://www.perplexity.ai/,,,,,,"['GPT-3.5', 'Bing Search']",,,,open,,,,,,,,,,,,,,, -application,Bird SQL,Perplexity,Twitter search interface that is powered by Perplexity's structured search engine.,2022-12-15,https://www.perplexity.ai/sql,,,,,,"['Perplexity Ask', 'OpenAI API']",,,,closed,,,,,,,,,,,,,,, -application,Perplexity Chat,Perplexity,Perplexity chat is an AI chatbot trained in-house by Perplexity.,2023-10-27,https://labs.perplexity.ai/,,,,,,[],,,,open,,,,,,,,,,,Chatbot output in response to user queries,https://blog.perplexity.ai/legal/terms-of-service,,, -model,Vulture,Virtual Interactive,Vulture is a further fine-tuned causal Decoder-only LLM built by Virtual Interactive (VILM) on top of Falcon.,2023-10-02,https://huggingface.co/vilm/vulture-180b,,text; text,180B parameters (dense),,,['Falcon-180B'],,,,open,Apache 2.0,,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,unknown,https://huggingface.co/vilm/vulture-180b/discussions,https://huggingface.co/vilm/vulture-180b,unknown,3000 A100 hours,unknown,,,,,, -model,DeciLM,Deci,DeciLM is a LLM that on release ranks as the fastest and most accurate model of its size.,2023-12-12,https://deci.ai/blog/introducing-decilm-7b-the-fastest-and-most-accurate-7b-large-language-model-to-date,,text; text,7B parameters (dense),,"Evaluated on the OpenLLM benchmarks and, on release, outperforms all other 7B models on the OpenLLM Leaderboard.",[],,,,open,Apache 2.0,This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.,,unknown,,https://deci.ai/model-zoo/decilm-7b/,unknown,unknown,NVIDIA A10 GPUs,,,,,, +type,name,organization,description,created_date,url,model_card,modality,analysis,size,dependencies,training_emissions,training_time,training_hardware,quality_control,access,license,intended_uses,prohibited_uses,monitoring,feedback,adaptation,output_space,terms_of_service,monthly_active_users,user_distribution,failures,datasheet,sample,included,excluded +model,GenSLM,Argonne National Laboratory,,2022-10-11,https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1,,text; genome sequence,,25B parameters (dense),"['SARS-CoV-2 genome dataset', 'BV-BRC dataset']",,,,,open,MIT,,,,,,,,,,,,,, +model,h2oGPT,H2O AI,Series of models fine-tuned on well-known LLMs using the h2oGPT repositories.,2023-06-16,https://arxiv.org/pdf/2306.08161.pdf,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b,text; text,Evaluated on EleutherAI 
evaluation harness.,20B parameters (dense),"['GPT-NeoX', 'H2O AI OpenAssistant', 'h2oGPT Repositories']",unknown,unknown,unspecified number of 48GB A100 NVIDIA GPUs,,open,Apache 2.0,,,,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b/discussions,,,,,,,,,, +model,H2O Danube,H2O AI,H2O Danube is a language model trained on 1T tokens following the core principles of LLaMA 2 and Mistral.,2024-01-30,https://arxiv.org/pdf/2401.16818.pdf,https://huggingface.co/h2oai/h2o-danube-1.8b-base,text; text,Evaluated on common sense and world knowledge benchmarks.,1.8B parameters (dense),[],unknown,unknown,8x H100 GPUs on a single node,unknown,open,Apache 2.0,,"Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activities.",unknown,https://huggingface.co/h2oai/h2o-danube-1.8b-base/discussions,,,,,,,,,, +application,Character,Character AI,Character allows users to converse with various chatbot personas.,2022-09-16,https://beta.character.ai/,,,,,[],,,,,limited,unknown,,,,,,AI-generated chat conversations,https://beta.character.ai/tos,unknown,unknown,unknown,,,, +model,Bark,Suno,Bark is a text-to-audio model that can generate multilingual speech as well as other noises.,2023-04-20,https://github.com/suno-ai/bark,https://github.com/suno-ai/bark/blob/main/model-card.md,text; audio,,,['AudioLM'],unknown,unknown,,,open,MIT,,,,https://huggingface.co/spaces/suno/bark/discussions,,,,,,,,,, +model,Baichuan 2,Baichuan Inc.,"Baichuan 2 is a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens.",2023-09-20,https://arxiv.org/pdf/2309.10305.pdf,,text; text,"Evaluated on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.",13B parameters (dense),[],unknown,unknown,1024 NVIDIA A800 GPUs,,open,Apache 2.0,,,,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,,,,,,,,,, +application,Q-Chat,Quizlet,"Quizlet is introducing Q-Chat, a fully-adaptive AI tutor that engages students with adaptive questions based on relevant study materials delivered through a fun chat experience.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['ChatGPT API'],,,,,open,,,,,,,,https://quizlet.com/tos,,,,,,, +dataset,LP-MusicCaps,South Korea Graduate School of Culture Technology,LP-MusicCaps is a LLM-based pseudo music caption dataset.,2023-07-31,https://arxiv.org/pdf/2307.16372.pdf,,,"Evaluated in comparison to the MusicCaps dataset and with respect to n-gram, neural metrics.",2.2M captions paired with 0.5M audio clips,"['MusicCaps', 'Million Song Dataset', 'Magnatagtune']",,,,,open,CC-BY-NC-4.0,,,,,,,,,,,,[],, +model,MPT,Mosaic,MPT is a series of large language models seeking to address the limitations of other open source models like LLaMA and Pythia.,2023-05-05,https://www.mosaicml.com/blog/mpt-7b,,text; text,Evaluated on a range of benchmarks and performed on par with LLaMA-7B.,7B parameters (dense),"['RedPajama-Data', 'C4', 'The Stack', 'Multimodal C4']",unknown,9.5 days,440 A100 40GB GPUs,,open,Apache 2.0,,,,,,,,,,,,,, +model,CommonCanvas,"Cornell University, Mosaic",CommonCanvas is a text-to-image model trained solely on Creative Commons licensed images.,2023-10-25,https://arxiv.org/pdf/2310.16825.pdf,,text; image,"Compared to Stable Diffusion 2, a SOTA text-to-image model.",unknown,['CommonCatalog'],unknown,6.79 days,128 A100 NVIDIA 
GPUs,,open,Apache 2.0,,,,,,,,,,,,,, +dataset,CommonCatalog,Mosaic,CommonCatalog is a curated dataset of CommonCrawl images and synthetic captions.,2023-10-25,https://arxiv.org/pdf/2310.16825.pdf,,image-caption pairings,,70M images,"['YFCC100M', 'BLIP-2']",,,,,open,Apache 2.0,,,,,,,,,,,https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md,[],images with derivative licenses,images with non-derivative licenses +dataset,Neeva dataset,Neeva,,,https://neeva.com/index,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Neeva model,Neeva,,,https://neeva.com/index,,text; text,,unknown,['Neeva dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +application,NeevaAI,Neeva,NeevaAI is an AI-powered search tool that combines the capabilities of LLMs with Neeva's independent in-house search stack to create a unique and transformative search experience.,2023-01-06,https://neeva.com/blog/introducing-neevaai,,,,,['Neeva model'],,,,,open,Custom,,,,,,,https://neeva.com/terms,,,,,,, +model,StarCoder,BigCode,"StarCoder is a Large Language Model for Code (Code LLM) trained on permissively licensed data from GitHub, including from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks.",2023-05-09,https://arxiv.org/pdf/2305.06161.pdf,https://huggingface.co/bigcode/starcoder,code; code,"Tested on several benchmarks, most notably Python benchmark HumanEval.",15.5B parameters (dense),['The Stack'],16.68 tons of CO2eq,"320,256 GPU hours",512 A100 80GB GPUs distributed across 64 nodes,"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the paper.",open,BigCode Open RAIL-M v1.0,"As a foundation model to fine-tune and create more specialized models that support use cases such as code completion, fill-in-the-middle, and text summarization. Can also be used as a Tech Assistant prompt and not as an instruction model given training limitations.",See BigCode Open RAIL-M license and FAQ,,https://huggingface.co/bigcode/starcoder/discussions,,,,,,,,,, +model,SantaCoder,BigCode,Multilingual code model derived from the findings of BigCode Project analysis on Github stars' association to data quality.,2023-02-24,https://arxiv.org/pdf/2301.03988.pdf,https://huggingface.co/bigcode/santacoder,code; code,Evaluated on MultiPL-E system benchmarks.,1.1B parameters (dense),"['The Stack', 'BigCode Dataset']",124 kg of CO2eq,"14,284 GPU hours",96 NVIDIA Tesla V100 GPUs,,open,BigCode Open RAIL-M v1,The model was trained on GitHub code. As such it is not an instruction model and commands do not work well. You should phrase commands like they occur in source code such as comments or write a function signature and docstring and let the model complete the function body.,See BigCode Open RAIL-M license and FAQ,,https://huggingface.co/bigcode/santacoder/discussions,,,,,,,,,, +dataset,The Stack,BigCode,"The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. 
The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as from other code snippets.",2022-11-20,https://arxiv.org/pdf/2211.15533.pdf,,code,Evaluated models trained on The Stack on HumanEval and MBPP and compared against similarly-sized models.,6 TB,['GitHub'],,,,allowed users whose data were part of The Stack's training data to opt-out,open,"The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including attribution clauses when relevant. Provenance information is provided for each data point.",creating code LLMs,See https://huggingface.co/datasets/bigcode/the-stack,,https://huggingface.co/datasets/bigcode/the-stack/discussions,,,,,,,https://huggingface.co/datasets/bigcode/the-stack,['https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train'],, +model,StarCoder2-15B,BigCode,"StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. The training was carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-15b,code; text,See https://arxiv.org/pdf/2402.19173.pdf,15B parameters (dense),['The Stack v2'],unknown,unknown,1024 x H100 GPUs,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"The model was trained on GitHub code as well as additional selected data sources such as Arxiv and Wikipedia. As such it is not an instruction model and commands like ""Write a function that computes the square root."" do not work well. Intended to generate code snippets from given context, but not for writing actual functional code directly.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-15b/discussions,,,,,,,,,, +model,StarCoder2-7B,BigCode,"StarCoder2-7B model is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3.5+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-7b,code; text,See https://arxiv.org/pdf/2402.19173.pdf,7B parameters (dense),['The Stack v2'],"29,622.83 kgCO2eq","145,152 hours (cumulative)",432 H100 GPUs,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. 
It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-7b/discussions,,,,,,,,,, +model,StarCoder2-3B,BigCode,"StarCoder2-3B model is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-3b,code; text,See https://arxiv.org/pdf/2402.19173.pdf,3B parameters (dense),['The Stack v2'],"16,107.01 kgCO2eq","97,120 hours (cumulative)",160 A100 GPUs,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-3b/discussions,,,,,,,,,, +model,MediTron,"EPFL, Idiap Research Institute, OpenAssistant, Yale",Meditron is a large-scale medical LLM that remains open-source.,2023-11-27,https://arxiv.org/pdf/2311.16079.pdf,https://huggingface.co/epfl-llm/meditron-70b,text; text,Evaluated on TruthfulQA as main evaluation benchmark.,70B parameters (dense),[],unknown,unknown,unknown,,open,LLaMA 2,"Medical exam question answering, supporting differential diagnosis, disease information.",,,https://huggingface.co/epfl-llm/meditron-7b/discussions,,,,,,,,,, +dataset,COYO-700M,Kakao Brain,"COYO-700M is a large-scale dataset that contains 747M image-text pairs as well as many other meta-attributes to increase the usability to train various models. +",2022-08-31,https://github.com/kakaobrain/coyo-dataset,,"image, text","""We empirically validated the quality of COYO dataset by re-implementing popular models such as ALIGN, unCLIP, and ViT. We trained these models on COYO-700M or its subsets from scratch, achieving competitive performance to the reported numbers or generated samples in the original papers."" +",747M image-text pairs,['CommonCrawl'],,,,,open,CC-BY-4.0,"""It is strongly recommended that this dataset be used only for research, keeping this in mind when using the dataset, and Kakao Brain does not recommend using this dataset as it is without special processing to clear inappropriate data to create commercial products."" +","Authors note the following limitations: ""Kakao Brain tried to construct a ""Safe"" dataset when building the COYO dataset. However, despite these efforts, this large-scale dataset was not hand-picked by humans to avoid the risk due to its very large size (over 700M). 
Keep in mind that the unscreened nature of the dataset means that the collected images can lead to strongly discomforting and disturbing content for humans. The COYO dataset may contain some inappropriate data, and any problems resulting from such data are the full responsibility of the user who used it."" +",,Feedback can be given by emailing at coyo at kakaobrain.com,,,,,,,,[],Included all image formats that Pillow library can decode. Collected only English text using cld3.,"Removed images less than 5KB image size. Removed images with an aspect ratio greater than 3.0. Removed images with min(width, height) < 200. Removed images with a score of OpenNSFW2 or GantMan/NSFW higher than 0.5. Removed all duplicate images based on the image pHash value from external public datasets. Removed texts with a length of 5 or less. Removed texts that do not have a noun form. Removed texts with less than 3 words or more than 256 words and texts over 1000 in length. Removed texts appearing more than 10 times. Removed texts containing NSFW words. Removed duplicated samples based on (image_phash, text)." +dataset,Luminous dataset,Aleph Alpha,The dataset used to train the Luminous models.,2022-04-14,,,text,unknown,unknown,[],,,,unknown,closed,unknown,unknown,unknown,,,,,,,,,,[],unknown,unknown +model,Luminous,Aleph Alpha,Luminous is a family of multilingual language models,2022-04-14,https://twitter.com/Aleph__Alpha/status/1514576711492542477,,text; text,,200B parameters (dense),['Luminous dataset'],unknown,unknown,unknown,,limited,,,,,,,,,,,,,,, +application,Aleph Alpha API,Aleph Alpha,The Aleph Alpha API serves a family of text-only language models (Luminous) and multimodal text-and-image models (Magma).,2021-09-30,https://www.aleph-alpha.com/,,,,,['Luminous'],,,,,limited,,unknown,unknown,unknown,unknown,,The text models provide text outputs given text inputs. 
The multimodal models provide text completions given text and image inputs.,https://www.aleph-alpha.com/terms-conditions,unknown,unknown,unknown,,,, +model,MAGMA,Aleph Alpha,An autoregressive VL model that is able to generate text from an arbitrary combination of visual and textual input,2022-10-24,https://arxiv.org/pdf/2112.05253.pdf,,"image, text; text",Evaluated on the OKVQA benchmark as a fully open-ended generative task.,6B parameters (dense),"['GPT-J', 'CLIP']",,,32 A100 GPUs,,open,MIT,,,,,,,,,,,,,, +model,CodeParrot,HuggingFace,CodeParrot is an autoregressive language model trained on code,2021-12-06,https://twitter.com/lvwerra/status/1467933794699259908,,"text; code, text",,1B parameters (dense),[],unknown,unknown,16 x A100 (40GB),,open,,,,,,,,,,,,,,, +model,Zephyr,HuggingFace,Zephyr is a series of language models that are trained to act as helpful assistants.,2023-10-11,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,text; text,"Evaluated on loss, rewards, logps, and logits rejected and chosen.",7B parameters (dense),['Mistral'],unknown,unknown,unknown,,open,MIT,Educational and research purposes,,,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/discussions,,,,,,,,,, +model,IDEFICS,HuggingFace,"IDEFICS is an open-access visual language model, based on Flamingo.",2023-08-22,https://huggingface.co/blog/idefics,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct,"image, text; text",Evaluated in comparison to Flamingo and OpenFlamingo on standard benchmarks.,80B parameters (dense),"['OBELICS', 'Wikipedia', 'LAION-5B', 'PMD']",unknown,unknown,unknown,,open,custom,Educational and research purposes,,,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct/discussions,,,,,,,,,, +dataset,OBELICS,HuggingFace,OBELICS is a dataset consisting of 141 million interleaved image-text documents scraped from the web and contains 353 million images.,2023-08-22,https://huggingface.co/blog/idefics,,"image, text",Subset of training dataset evaluated for bias using Data Measurements Tool.,115B tokens,[],,,,Sexual and violent content still present in OBELICS even after filtering.,open,CC-BY-4.0,,,,https://huggingface.co/datasets/HuggingFaceM4/OBELICS/discussions,,,,,,,https://huggingface.co/datasets/HuggingFaceM4/OBELICS,[],,All images for which creators explicitly requested opt-out of AI training. 
+model,FinGPT,"University of Turku, HuggingFace, National Library of Finland",FinGPT is a series of Finnish LLMs trained from scratch.,2023-11-03,https://arxiv.org/pdf/2311.05640.pdf,https://huggingface.co/TurkuNLP/gpt3-finnish-13B,text; text,"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.",13B parameters (dense),[],unknown,unknown,"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of memory.",unknown,open,Apache 2.0,,,unknown,https://huggingface.co/TurkuNLP/gpt3-finnish-13B/discussions,,,,,,,,,, +model,BLUUMI,"University of Turku, HuggingFace, National Library of Finland",BLUUMI is a multilingual fine-tuned version of BLOOM.,2023-11-03,https://arxiv.org/pdf/2311.05640.pdf,https://huggingface.co/TurkuNLP/bloom-finnish-176b,text; text,"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.",176B parameters (dense),['BLOOM'],unknown,unknown,"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of memory.",unknown,open,BigScience RAIL v1.0,,,unknown,https://huggingface.co/TurkuNLP/bloom-finnish-176b/discussions,,,,,,,,,, +dataset,Cosmopedia v0.1,Hugging Face,"Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date. It covers a variety of topics, mapping worldwide knowledge from Web datasets like RefinedWeb and RedPajama, to generate synthetic content.",2024-02-22,,,text,Some seed samples were used in different prompt styles and audiences. Less than 1% of files are duplicates after running MinHash deduplication. Contaminated samples were removed from each dataset split.,25B tokens,['Mixtral'],,,,Measures were taken to reduce redundancy and ensure diversity in generated content. A decontamination pipeline was implemented to avoid benchmark contamination.,open,unknown,,unknown,,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia/discussions,,,,,,,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia,[],,unknown +model,Idefics2,Hugging Face,"Idefics2 is a general multimodal model that takes as input arbitrary sequences of text and images, generating text responses. It has the capability to describe visual content, answer questions about images, perform basic arithmetic operations, create stories grounded in multiple images, and extract information from documents.",2024-04-15,https://huggingface.co/blog/idefics2,https://huggingface.co/HuggingFaceM4/idefics2-8b,"image, text; text",The performance of Idefics2 has been evaluated on numerous benchmarks. It is top of its class size and competes with much larger models such as LLava-Next-34B and MM1-30B-chat.,8B parameters,['The Cauldron'],unknown,unknown,unknown,"The quality of the model has been ensured by training it on a mixture of openly available datasets and enhancing its OCR capabilities. 
Further improvements include manipulating images in their native resolutions and aspect ratios, better pre-trained backbones, and allowing for sub-image splitting.",open,Apache 2.0,"The model can be used for answering questions about images, describing visual content, creating stories grounded in multiple images, extracting information from documents, and performing basic arithmetic operations.",unknown,unknown,https://huggingface.co/HuggingFaceM4/idefics2-8b/discussions,,,,,,,,,, +dataset,The Cauldron,Hugging Face,The Cauldron is an open compilation of 50 manually-curated datasets formatted for multi-turn conversations.,2024-04-15,https://huggingface.co/blog/idefics2,,"image, text",,50 vision-language datasets,"['LNarratives', 'Rendered Text', 'WebSight', 'DaTikz']",,,,unknown,open,CC BY 4.0,,,unknown,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/discussions,,,,,,,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron,[],, +model,Konan LLM,Konan,"Konan LLM is a Large Language Model developed in-house by Konan Technology. Optimized for super-large AI training, it leverages high-quality, large-scale data and over 20 years of expertise in natural language processing.",2023-09-17,https://en.konantech.com/en/llm/konanllm,,text; text,,13B parameters,[],unknown,unknown,unknown,,limited,unknown,"Document generation, document review, Q&A, customer response scenarios.",,,,,,,,,,,,, +application,Notion AI,Notion,"Notion AI is a connected assistant that helps you think bigger, work faster, and augments your creativity, right inside the functional workspace you’re already familiar with.",2023-02-22,https://www.notion.so/help/guides/notion-ai-for-docs,,,,,['Anthropic API'],,,,,limited,,,,,,,,,,,,,,, +model,Falcon-40B,UAE Technology Innovation Institute,"Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora.",2023-06-14,https://arxiv.org/pdf/2311.16867.pdf,https://huggingface.co/tiiuae/falcon-40b,text; text,"Evaluated in 1-shot against the PaLM models, with the tasks of the paper ""Language models are few-shot learners"" (Brown et al., 2020); (2) on a small set of few-shot tasks reported by the GPT-4 paper; (3) against state-of-the-art models across common sense, question answering, and code tasks; (4) against models which also report results from the EAI Harness, for which we are able to compare with identical prompts and metrics.",40B parameters (dense),['RefinedWeb'],unknown,2 months,384 A100 40GB GPUs,,open,Apache 2.0,Research on large language models; as a foundation for further specialization for specific use cases.,irresponsible or harmful use or production use without adequate assessment of risks and mitigation.,None,https://huggingface.co/tiiuae/falcon-40b/discussions,,,,,,,,,, +dataset,RefinedWeb,UAE Technology Innovation Institute,RefinedWeb is a high-quality five trillion tokens web-only English pretraining dataset.,2023-06-01,https://arxiv.org/pdf/2306.01116.pdf,,text,,600B tokens,[],,,,,open,custom,,,,,,,,,,,https://huggingface.co/datasets/tiiuae/falcon-refinedweb,[],, +model,Falcon-180B,UAE Technology Innovation Institute,"Falcon-180B is a 180B parameters causal decoder-only model built by TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora.",2023-09-06,https://arxiv.org/pdf/2311.16867.pdf,https://huggingface.co/tiiuae/falcon-180B,text; text,"Falcon-180B outperforms LLaMA-2, StableLM, RedPajama, MPT on the Open LLM Leaderboard at 
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.",180B parameters (dense),['RefinedWeb'],,9 months,4096 A100 40GB GPUs,,open,unknown,Research on large language models; as a foundation for further specialization for specific use cases.,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,None,https://huggingface.co/tiiuae/falcon-180b/discussions,,,,,,,,,, +application,UnderwriteGPT,Paladin Group and Dais Technology,UnderwriteGPT is the world's first generative AI underwriting tool.,2023-02-01,https://dais.com/underwritegpt/,,,,,[],,,,,limited,,,,,,,,,,,,,,, +dataset,Open X-Embodiment dataset,Open X-Embodiment,"The Open X-Embodiment dataset is a dataset of robot movements assembled from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks)",2023-10-03,https://robotics-transformer-x.github.io/,,robot trajectories,"Analyzed on breakdown of types of robot trajectory in dataset, and overall coverage.",160K tasks,[],,,,unknown,open,Apache 2.0,Further research on X-embodiment models.,,unknown,,,,,,,,All data can be found at https://robotics-transformer-x.github.io/.,[],N/A,N/A +model,RT-1-X,"Open X-Embodiment, Google Deepmind","RT-1-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-1, an efficient Transformer-based architecture designed for robotic control.",2023-10-03,https://robotics-transformer-x.github.io/,,"images, text; robot trajectories","Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-1 by 50% in emergent skill evaluations.",35M parameters (dense),"['Open X-Embodiment dataset', 'ImageNet EfficientNet', 'USE']",unknown,unknown,unknown,unknown,open,Apache 2.0,Further research on X-embodiment models.,,unknown,,,,,,,,,,, +model,RT-2-X,"Open X-Embodiment, Google Deepmind","RT-2-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-2, a large vision-language model co-fine-tuned to output robot actions as natural language tokens.",2023-10-03,https://robotics-transformer-x.github.io/,,"images, text, robot trajectories; robot trajectories","Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-2 by 3x in emergent skill evaluations.",55B parameters (dense),"['Open X-Embodiment dataset', 'ViT (unknown size)', 'UL2']",unknown,unknown,unknown,unknown,closed,unknown,Further research on X-embodiment models.,,unknown,,,,,,,,,,, +application,ARES,Faraday Lab,ARES is a text-to-image generator based on Stable Diffusion. 
The goal is to provide a simple tool with a user interface allowing mainstream AI access for artists and creators.,2023-04-26,https://faradaylab.fr/,,,,,['Stable Diffusion'],,,,,open,unknown,,,,,,generated images,,,,,,,, +application,ChatGPT powered by OBO,HubSpot,"Give your sales, marketing, and customer service teams one of the most powerful AI tools available - ChatGPT priority access, no timeout limits, company wide access managed through a single account, incorporate into your existing processes without leaving HubSpot",2023-01-31,https://ecosystem.hubspot.com/marketplace/apps/sales/sales-enablement/the-obo-group-chatgpt-1398072,,,,,['ChatGPT API'],,,,,limited,unknown,,,,,,,,,,,,,, +application,Auto-GPT,Auto-GPT,Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model.,2023-04-16,https://news.agpt.co/,,,,,['GPT-4 API'],,,,,open,MIT,,,,,"GPT-4 adapted to run autonomously by chaining together LLM ""thoughts""",text,,,,,,,, +model,SciPhi Mistral,SciPhi,SciPhi Mistral is a Large Language Model (LLM) fine-tuned from Mistral.,2023-11-07,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,text; text,,7B parameters (dense),['Mistral'],unknown,unknown,unknown,,open,MIT,,,unknown,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k/discussions,,,,,,,,,, +model,EXAONE 2.0,LG AI Research,EXAONE 2.0 is a multimodal artificial intelligence that can be used to help develop new materials and medicines.,2023-07-19,https://www.lgresearch.ai/exaone,,"image, text; image, text",,unknown,[],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +application,Robin AI,Robin AI,"Robin AI uses Claude and Anthropic's models to understand language - including in technical domains like legal language. It's also very confident at drafting, summarising, translations, and explaining complex concepts in simple terms",,https://www.robinai.co.uk/,,,,,['Anthropic API'],,,,,limited,,,,,,,,https://www.robinai.co.uk/terms,,,,,,, +model,InternLM,InternLM,"InternLM is an LLM pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data.",2023-09-20,https://github.com/InternLM/InternLM,https://huggingface.co/internlm/internlm-20b,text; text,Evaluated on the dimensions proposed by OpenCompass in comparison to other LLMs.,7B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/internlm/internlm-20b/discussions,,,,,,,,,, +model,FuseChat,FuseAI,FuseChat is a powerful chat Language Learning Model (LLM) that integrates multiple structure and scale-varied chat LLMs using a fuse-then-merge strategy. The fusion is done using two stages,2024-02-26,https://arxiv.org/abs/2402.16107,https://huggingface.co/FuseAI/FuseChat-7B-VaRM,text; text,"The FuseChat model was evaluated on MT-Bench which comprises 80 multi-turn dialogues spanning writing, roleplay, reasoning, math, coding, stem, and humanities domains. It yields an average performance of 66.52 with specific scores for individual domains available in the leaderboard results.",7B parameters,"['Nous Hermes 2', 'OpenChat 3.5']",unknown,unknown,unknown,,open,Apache 2.0,"FuseChat is intended to be used as a powerful chat bot that takes in text inputs and provides text-based responses. 
It can be utilized in a variety of domains including writing, roleplay, reasoning, math, coding, stem, and humanities.",unknown,unknown,https://huggingface.co/FuseAI/FuseChat-7B-VaRM/discussions,,,,,,,,,, +model,GOAT,National University of Singapore,GOAT is a fine-tuned LLaMA model which uses the tokenization of numbers to significantly outperform benchmark standards on a range of arithmetic tasks.,2023-05-23,https://arxiv.org/pdf/2305.14201.pdf,,text; text,"Performance assessed on BIG-bench arithmetic sub-task, and various elementary arithmetic tasks.",7B parameters (dense),"['LLaMA', 'GOAT dataset']",unknown,unknown,24 GB VRAM GPU,Number data is randomly generated from log space to reduce likelihood of redundancy and range of magnitudes.,open,Apache 2.0,Integration into other instruction-tuned LLMs to further enhance arithmetic reasoning abilities in solving math word problems.,,,,,,,,,,,,, +model,OpenMoE,"National University of Singapore, University of Edinburgh, ETH Zurich",OpenMoE is a series of fully open-sourced and reproducible decoder-only MoE LLMs.,2024-01-12,https://github.com/XueFuzhao/OpenMoE,https://huggingface.co/OrionZheng/openmoe-base,text; text,Evaluated on relatively simple established benchmarks.,34B parameters (dense),"['RedPajama', 'The Stack']",unknown,unknown,unknown,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/OrionZheng/openmoe-base/discussions,,,,,,,,,, +model,Ocean-1,Cresta,Ocean-1 is the culmination of Cresta's experience in deploying generative AI systems for large enterprises and signifies their latest milestone in advancing the cutting edge AI technology for customer facing conversations.,2023-06-20,https://cresta.com/blog/introducing-ocean-1-worlds-first-contact-center-foundation-model/,,text; text,Outperforms GPT-4 in common sense and reasoning tasks on the basis of both efficiency and accuracy.,7B parameters (dense),"['GPT-4', 'Claude', 'Falcon-40B']",unknown,unknown,unknown,,closed,unknown,Acting as a contact center chatbot agent.,,unknown,,,,,,,,,,, +dataset,Common Corpus,Pleias,"Common Corpus is the largest public domain dataset released for training Large Language Models (LLMs). This dataset includes 500 billion words from a diverse range of cultural heritage initiatives and is the largest corpus in English, French, Dutch, Spanish, German and Italian. It supports efforts to train fully open LLMs on sources without copyright concerns.",2024-03-20,https://huggingface.co/blog/Pclanglais/common-corpus,,text,unknown,500 billion words,[],,,,"All data included in the corpus are from fully open and auditable sources, ensuring they are copyright-free.",open,,"The dataset is intended to support open and reproducible AI research, enhancing accessibility, diversity, and democracy in AI by enabling everyone to explore large models.",It should not be used for tasks that infringe on copyright laws.,unknown,unknown,,,,,,,,[],"The dataset includes 500 billion words from a wide diversity of cultural heritage initiatives. It also has the largest English-speaking dataset to date with 180 billion words, including a major US collection of 21 million digitized newspapers and large monographs datasets collected by digital historian Sebastian Majstorovic. It also contains a huge volume of data in French (110 billion words), German (30 billion words), Spanish, Dutch and Italian, as well as data in low-resource languages that are currently underrepresented.",The data excluded are those that have copyright issues. 
+model,Guanaco,University of Washington,"Guanaco is a model family trained with QLORA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.",2023-05-23,https://arxiv.org/pdf/2305.14314v1.pdf,,text; text,Reports results on the Vicuna benchmark and compares performance level and time expenditure with ChatGPT,33B parameters (dense),"['QLoRA', 'OASST1']",,,A single 24 GB GPU,,open,MIT,,,,,,,,,,,,,, +model,Llark,"University of Washington, Spotify",Llark is an instruction-tuned multimodal model for music understanding.,2023-10-11,https://arxiv.org/pdf/2310.07160.pdf,,"audio, text; text",Evaluated on benchmark music understanding tasks on SOTA music datasets.,12B parameters (dense),"['LLaMA 2', 'Jukebox']",unknown,54 hours,4 80GB NVIDIA A40 GPUs,,open,Apache 2.0,,,,,,,,,,,,,, +model,Notus,Argilla,"Notus is an open source LLM, fine-tuned using Direct Preference Optimization (DPO) and AIF (AI Feedback) techniques.",2023-12-01,https://argilla.io/blog/notus7b/,https://huggingface.co/argilla/notus-7b-v1,text; text,Evaluated on MT-Bench and AlphaEval benchmarks.,7B parameters (dense),"['UltraFeedback', 'Zephyr']",unknown,unknown,8 x A100 40GB GPUs,,open,MIT,Academic research and free commercial usage,,,https://huggingface.co/argilla/notus-7b-v1/discussions,,,,,,,,,, +application,AI Dungeon,Latitude,"AI Dungeon is a single-player text adventure game that uses AI to generate content. +",2019-12-17,https://play.aidungeon.io,,,,,['OpenAI API'],,,,,limited,custom,,,,,,,https://play.aidungeon.io/main/termsOfService,,,,,,, +model,SALMONN,"ByteDance, Tsinghua University","SALMONN is a large language model (LLM) enabling speech, audio event, and music inputs.",2023-10-20,https://github.com/bytedance/SALMONN,https://huggingface.co/MSIIP/SALMONN,"audio, text; text","Evaluated on benchmarks pertaining to speech, music, and other audio recognition.",unknown,"['Whisper', 'BEATs', 'Vicuna']",unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/MSIIP/SALMONN/discussions,,,,,,,,,, +model,SDXL-Lightning,ByteDance,"SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps. The models are distilled from stabilityai/stable-diffusion-xl-base-1.0. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models.",2024-02-21,https://arxiv.org/pdf/2402.13929.pdf,https://huggingface.co/ByteDance/SDXL-Lightning,text; image,Evaluated via qualitative comparison relative to other SoTA image generation models.,unknown,['Stable Diffusion XL'],unknown,unknown,64 A100 80G GPUs,unknown,open,OpenRail++,"The model can be used for fast, high-quality text-to-image generation. 
It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide varying generation quality.",unknown,unknown,https://huggingface.co/ByteDance/SDXL-Lightning/discussions,,,,,,,,,, +model,Lemur,OpenLemur,Lemur is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,2023-10-10,https://arxiv.org/pdf/2310.06830.pdf,https://huggingface.co/OpenLemur/lemur-70b-v1,"code, text; code, text",Evaluated on text and code benchmarks in comparison to other models.,70B parameters (dense),"['LLaMA 2', 'The Stack', 'RefinedWeb', 'RedPajama', 'Common Crawl', 'Wikipedia', 'ArXiv']",unknown,unknown,TPUv4-512 pod,,open,LLaMA2,,,,https://huggingface.co/OpenLemur/lemur-70b-v1/discussions,,,,,,,,,, +model,Lemur-Chat,OpenLemur,Lemur-Chat is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,2023-10-10,https://arxiv.org/pdf/2310.06830.pdf,https://huggingface.co/OpenLemur/lemur-70b-chat-v1,text; text,Evaluated on text and code benchmarks in comparison to other models.,70B parameters (dense),"['Lemur', 'OpenAssistant 1', 'OpenOrca', 'ShareGPT & ChatLogs', 'Evol-CodeAlpaca data']",unknown,unknown,unknown,,open,CC-BY-NC-4.0,,,,https://huggingface.co/OpenLemur/lemur-70b-chat-v1/discussions,,,,,,,,,, +model,MM1,Apple,"MM1 is a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B.",2024-03-16,https://arxiv.org/pdf/2403.09611.pdf,,"image, text; text",Evaluated on image captioning and visual question answering across many benchmarks.,30B parameters (dense),[],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,OpenELM,Apple,"OpenELM is a family of Open-source Efficient Language Models. It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy.",2024-04-24,https://machinelearning.apple.com/research/openelm,https://huggingface.co/apple/OpenELM-3B-Instruct,text; text,"The models were evaluated in terms of zero-shot, LLM360, and OpenLLM leaderboard results.",3B parameters,"['RefinedWeb', 'The Pile', 'RedPajama-Data', 'Dolma', 'CoreNet library']",unknown,unknown,unknown,unknown,open,Apple,To empower and enrich the open research community by providing access to state-of-the-art language models.,"No explicit prohibited uses stated, though it is noted that users should undertake thorough safety testing.",,https://huggingface.co/apple/OpenELM-3B-Instruct/discussions,,,,,,,,,, +dataset,Jurassic-1 dataset,AI21 Labs,"The dataset used to train the Jurassic-1 models, based on publicly available data.",2021-08-11,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,text,,300B tokens,[],,,,,closed,unknown,unknown,,,,,,,,,,,[],unknown,unknown +dataset,Jurassic-1 Instruct dataset,AI21 Labs,The dataset used to instruction-tune the Jurassic-1 Instruct models.,2022-12-01,,,text,unknown,unknown,[],,,,unknown,closed,unknown,unknown,unknown,,,,,,,,,,[],unknown,unknown +model,Jurassic-1,AI21 Labs,"Jurassic-1 is a family of autoregressive language models (Large, Grande, Jumbo).",2021-08-11,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,text,"Evaluated on several standard benchmarks (e.g. 
ARC, BoolQ, HellaSwag, RTE, Winogrande)",178B parameters (dense),['Jurassic-1 dataset'],unknown,Several months,Over 800 A100 GPUs,,limited,unknown,unknown,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",,,,,,,,,,,, +model,Jurassic-1 Instruct,AI21 Labs,Jurassic-1 Instruct is an instruction-tuned autoregressive language model.,2022-12-01,https://docs.ai21.com/docs/jurassic-1-instruct-beta,,text; text,,17B parameters (dense),"['Jurassic-1', 'Jurassic-1 Instruct dataset']",unknown,unknown,unknown,unknown,limited,unknown,"Jurassic-1 Instruct was trained specifically to handle instructions-only prompts (""zero-shot"") without examples (""few-shot""). It is the most natural way to interact with language models, and it is the best way to get a sense of the optimal output for your task without any examples.","Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",,,,,,,,,,,, +model,Jurassic-2,AI21 Labs,Jurassic-2 is a family of language models designed to replace Jurassic-1.,2023-03-09,https://docs.ai21.com/docs/jurassic-2-models,,text; text,The model was evaluated on the HELM benchmark as discussed in https://www.ai21.com/blog/introducing-j2.,unknown,[],unknown,unknown,unknown,unknown,limited,unknown,,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,,,,,,,,,,, +application,AI21 Playground,AI21 Labs,The AI21 Labs Playground supports several task-specific APIs in addition to a variety of models.,2021-08-11,https://studio.ai21.com/playground/,,,,,"['Jurassic-1', 'Jurassic-1 Instruct', 'Jurassic-2', 'AI21 Summarization API', 'AI21 Paraphrase API']",,,,unknown,limited,,"The intended uses are text completion, rewriting, and summarization.","Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,The Playground provides direct access to the language models (Complete API) as well as wrapped for Rewrite and Summarize.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown,,,, +application,AI21 Paraphrase API,AI21 Labs,AI21 Studio's Paraphrase API offers access to our world-class paraphrasing engine. 
It has been specifically developed for suggesting alternative ways to convey the same message using different words.,2023-03-09,https://docs.ai21.com/docs/paraphrase-api,,,,,['Jurassic-2'],,,,unknown,limited,,The intended uses are text paraphrasing.,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,The AI21 language models are further specialized to the task of paraphrasing.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown,,,, +application,AI21 Summarization API,AI21 Labs,AI21 Studio's Summarize API offers access to our world-class summarization engine. It has been specifically developed for reading long texts and providing a faithful summary of the original document.,2023-03-09,https://docs.ai21.com/docs/summarize-api,,,,,['Jurassic-2'],,,,unknown,limited,,The intended uses are text paraphrasing.,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,The AI21 language models are further specialized to the task of summarization.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown,,,, +application,Wordtune,AI21 Labs,"Wordtune, the first AI-based writing companion that understands context and meaning.",2020-10-27,https://www.wordtune.com/,,,,,['AI21 Paraphrase API'],,,,unknown,limited,Wordtune License,The Wordtune assistant is a writing assistant,,unknown,,unknown,text,https://www.wordtune.com/terms-of-use,unknown,unknown,unknown,,,, +application,Wordtune Read,AI21 Labs,"Wordtune Read is an AI reader that summarizes long documents so you can understand more, faster.",2021-11-16,https://www.wordtune.com/read,,,,,['AI21 Summarize API'],,,,unknown,limited,Wordtune License,,,unknown,,unknown,text,https://www.wordtune.com/terms-of-use,unknown,unknown,unknown,,,, +model,Jamba,AI21 Labs,"Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. 
Jamba is the world’s first production-grade Mamba based model.",2024-03-28,https://www.ai21.com/blog/announcing-jamba,https://huggingface.co/ai21labs/Jamba-v0.1,text; text,Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.,52B parameters (sparse),[],unknown,unknown,unknown,,open,Apache 2.0,"intended for use as a foundation layer for fine tuning, training",,,https://huggingface.co/ai21labs/Jamba-v0.1/discussions,,,,,,,,,, +model,Amber,LLM360,"Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community.",2023-12-12,https://www.llm360.ai/,https://huggingface.co/LLM360/Amber,text; text,Evaluated on several benchmark LLM tasks,7B parameters (dense),"['Arxiv', 'Books', 'C4', 'RefinedWeb', 'StarCoder', 'StackExchange', 'Wikipedia']",unknown,unknown,"56 DGX A100 nodes, each equipped with 4 80GB A100 GPUs",,open,Apache 2.0,to support open and collaborative AI research by making the full LLM training process transparent.,,unknown,https://huggingface.co/LLM360/Amber/discussions,,,,,,,,,, +model,CrystalCoder,LLM360,CrystalCoder is a language model with a balance of code and text data that follows the initiative under LLM360 of its training process being fully transparent.,2023-12-12,https://www.llm360.ai/,https://huggingface.co/LLM360/CrystalCoder,"text; code, text","Evaluated on English and coding tasks and benchmarks, and outperforms LLaMA 2 in some.",7B parameters (dense),"['SlimPajama', 'StarCoder']",unknown,unknown,"Trained on the Cerebras Condor Galaxy 1 (CG-1), a 4 exaFLOPS, 54 million core, 64-node cloud AI supercomputer.",,open,Apache 2.0,to support open and collaborative AI research by making the full LLM training process transparent.,,unknown,https://huggingface.co/LLM360/CrystalCoder/discussions,,,,,,,,,, +model,K2,LLM360,"K2 is a 65 billion parameter large language model inspired by the Llama 2 65B model. 
The model is also supported with a suite of research tools, tutorials and step-by-step guides for learning pre-training and fine-tuning techniques.",2024-05-29,https://www.llm360.ai/paper2.pdf,https://huggingface.co/LLM360/K2,text; text,"Evaluated on the LLM360 Performance and Evaluation Collection that checks standard best practice benchmarks, medical, math, and coding knowledge.",65B parameters,[],unknown,unknown,unknown,unknown,open,Apache 2.0,The model is intended for learning pre-training techniques or enhancing research capabilities in large language models.,unknown,unknown,https://huggingface.co/LLM360/K2/discussions,,,,,,,,,, +model,GreenBit LLaMA,GreenBit AI,GreenBit LLaMA is a series of fine-tuned LLaMA models.,2023-09-29,https://github.com/GreenBitAI/low_bit_llama,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8,text; text,Evaluated on common LLM benchmarks.,30B parameters (dense),['LLaMA'],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8/discussions,,,,,,,,,, +model,Reka Flash,Reka,"Reka Flash is a multimodal, multilingual, state-of-the-art 21B model trained entirely from scratch.",2024-02-12,https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/,,"image, text, video; text","Evaluated on MMLU, GSM8K, HumanEval, and GPQA benchmarks, among others.",21B parameters (dense),[],unknown,unknown,unknown,,limited,unknown,,,unknown,,,,,,,,,,, +model,Reka Core,Reka,"Reka Core is a frontier-class multimodal language model comparable to industry leaders. It has powerful capabilities including multimodal understanding (including images, videos, and audio), superb reasoning abilities, code generation, and multilinguality with proficiency in 32 languages.",2024-04-15,https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model,,"audio, image, text, video; text","Reka Core was evaluated against leading models such as OpenAIs GPT-4, Claude-3 Opus, and Gemini Ultra on a variety of tasks and metrics including multimodal and human evaluation conducted by a third party. 
It was found to be competitive or even surpassing these models.",unknown,[],unknown,few months,thousands of GPUs,,limited,unknown,"Reka Core can be used in e-commerce, social media, digital content and video games, healthcare, robotics, and other industries for tasks that require multimodal understanding, coding, complex reasoning, and more.",unknown,unknown,unknown,,,,,,,,,, +model,ACT-1,Adept,ACT-1 (ACtion Transformer) is a large-scale transformer model designed and trained specifically for taking actions on computers (use software tools APIs and websites) in response to the user's natural language commands.,2022-09-14,https://www.adept.ai/blog/act-1,,text; text,,,[],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,Persimmon,Adept,"Persimmon is the most capable open-source, fully permissive model with fewer than 10 billion parameters, as of its release date.",2023-09-07,https://www.adept.ai/blog/persimmon-8b,,text; text,"Evaluated in comparison to LLaMA 2 and MPT Instruct, and outperforms both on standard benchmarks.",8B parameters (dense),[],,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,Fuyu,Adept,Fuyu is a small version of the multimodal model that powers Adept's core product.,2023-10-17,https://www.adept.ai/blog/fuyu-8b,https://huggingface.co/adept/fuyu-8b,"image, text; text",Evaluated on standard image understanding benchmarks.,8B parameters (dense),[],unknown,unknown,unknown,,open,CC-BY-NC-4.0,The model is intended for research purposes only.,"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.",,https://huggingface.co/adept/fuyu-8b/discussions,,,,,,,,,, +model,Fuyu Heavy,Adept,Fuyu Heavy is a new multimodal model designed specifically for digital agents.,2024-01-24,https://www.adept.ai/blog/adept-fuyu-heavy,,"image, text; text","Evaluated on the MMLU, GSM8K, MATH, and HumanEval benchmarks. According to these benchmarks, Fuyu-Heavy is, as of release, the strongest multimodal model trained outside of Google or OpenAI.",unknown,[],unknown,unknown,unknown,,closed,unknown,unknown,,,,,,,,,,,,, +model,Grok-1,xAI,"Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy,",2023-11-04,https://grok.x.ai/,https://x.ai/model-card/,text; text,Grok-1 was evaluated on a range of reasoning benchmark tasks and on curated foreign mathematic examination questions.,314B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,"Grok-1 is intended to be used as the engine behind Grok for natural language processing tasks including question answering, information retrieval, creative writing and coding assistance.",,unknown,,,,,,,,,,, +model,Grok-1.5V,xAI,"Grok-1.5V is a first-generation multimodal model which can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.",2024-04-12,https://x.ai/blog/grok-1.5v,,"image, text; text","The model is evaluated in a zero-shot setting without chain-of-thought prompting. The evaluation domains include multi-disciplinary reasoning, understanding documents, science diagrams, charts, screenshots, photographs and real-world spatial understanding. The model shows competitive performance with existing frontier multimodal models.",unknown,[],unknown,unknown,unknown,,limited,unknown,"Grok-1.5V can be used for understanding documents, science diagrams, charts, screenshots, photographs. 
It can also translate diagrams into Python code.",unknown,unknown,,,,,,,,,,, +model,SauerkrautLM,VAGO Solutions,SauerkrautLM is a German language model merged from two Mistral derivatives.,2023-11-28,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,text; text,Evaluated on standard benchmarks in comparison to other German language models.,7B parameters (dense),"['OpenHermes 2.5 Mistral', 'OpenOrca Mistral']",unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO/discussions,,,,,,,,,, +model,Megatron-LM,NVIDIA,Megatron-LM is an autoregressive language model,2021-04-09,https://arxiv.org/abs/2104.04473,,text; text,,1T parameters (dense),[],unknown,84 days,3072 A100 GPUs,unknown,closed,unknown,,,,,,,,,,,,,, +dataset,MineDojo,NVIDIA,,2022-06-17,https://arxiv.org/abs/2206.08853,,"text, video",,"730k videos, 6k Wikipedia pages, 340k reddit posts","['YouTube', 'Wikipedia', 'Reddit']",,,,,open,MIT,,,,,,,,,,,,[],, +dataset,VIMA dataset,"NVIDIA, Stanford",,2022-10-06,https://vimalabs.github.io/,,"image, text",,200M parameters (dense model),"['T5', 'Mask R-CNN', 'VIMA dataset']",,,,,open,MIT,,,,,,,,,,,,[],, +model,VIMA,"NVIDIA, Stanford",,2022-10-06,https://vimalabs.github.io/,,"image, text; robotics trajectories",,200M parameters (dense),[],,,,,open,MIT,,,,,,,,,,,,,, +model,Nemotron 4,Nvidia,Nemotron 4 is a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.,2024-02-27,https://arxiv.org/pdf/2402.16819.pdf,,"text; code, text","Evaluated on standard LLM benchmarks across a range of fields like reasoning, code generation, and mathematical skills.",15B parameters (dense),[],unknown,13 days,3072 H100 80GB SXM5 GPUs across 384 DGX H100 nodes,Deduplication and quality filtering techniques are applied to the training dataset.,open,unknown,,,unknown,,,,,,,,,,, +application,Cformers,Nolano,Cformers is a set of transformers that act as an API for AI inference in code.,2023-03-19,https://www.nolano.org/services/Cformers/,,,,,[],,,,,limited,MIT,,,,,,,,,,,,,, +dataset,HowTo100M,"École Normale Supérieure, Inria","HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of 136M video clips with captions sourced from 1.2M Youtube videos (15 years of video) and 23k activities from domains such as cooking, hand crafting, personal care, gardening or fitness.",2019-06-07,https://arxiv.org/pdf/1906.03327.pdf,,"text, video","Authors use the dataset to learn a joint text-video embedding by leveraging more than 130M video clip-caption pairs. They then evaluate the learned embeddings on the tasks of localizing steps in instructional videos of CrossTask and textbased video retrieval on YouCook2, MSR-VTT and LSMDC datasets. They show that their learned embedding can perform better compared to models trained on existing carefully annotated but smaller video description datasets.",136M video clips,['YouTube'],,,,,open,Apache 2.0,,"No uses are explicitly prohibited by the authors. They note the following limitations of the dataset: ""We note that the distribution of identities and activities in the HowTo100M dataset may not be representative of the global human population and the diversity in society. 
Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" +",,,,,,,,,https://arxiv.org/pdf/1906.03327.pdf,[],"The dataset features 1.22 million videos from YouTube with a primary focus on videos containing ""visual tasks"", that involve some interaction with the physical world (e.g. Making peanut butter, Pruning a tree) as compared to others that are more abstract (e.g. Ending a toxic relationship, Choosing a gift). To obtain predominantly visual tasks, the authors limit them to one of 12 categories (Food and Entertaining, Home and Garden, Hobbies and Crafts, Cars & Other Vehicles, Pets and Animals, Holidays and Traditions, Personal Care and Style, Sports and Fitness, Health, Education and Communications, Arts and Entertainment, Computers and Electronics). They also restrict to the top 200 YouTube search results, as the latter ones may not be related to the query task.","Categories such as Relationships and Finance and Business, that may be more abstract, are excluded. Videos with less than 100 views are removed. Authors also ignore videos that have less than 100 words. Videos longer than 2,000 seconds are removed. As some videos may appear in several tasks, the videos are deduplicated based on YouTube IDs." +model,360 Zhinao,360 Security,360 Zhinao is a multilingual LLM in Chinese and English with chat capabilities.,2024-05-23,https://arxiv.org/pdf/2405.13386,,text; text,"Achieved competitive performance on relevant benchmarks against other 7B models in Chinese, English, and coding tasks.",7B parameters,[],unknown,unknown,unknwon,,open,unknown,,,,,,,,,,,,,, +model,Xwin-LM,Xwin,"Xwin-LM is a LLM, which on release, ranked top 1 on AlpacaEval, becoming the first to surpass GPT-4 on this benchmark.",2023-09-20,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,text; text,Evaluated on AlpacaEval benchmark against SOTA LLMs.,70B parameters (dense),[],unknown,unknown,unknown,,open,LLaMA2,,,,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,,,,,,,,,, +application,HyperWrite,OthersideAI,"HyperWrite is a writing assistant that generates text based on a user's request, as well as style and tone choices. +",,https://hyperwriteai.com/,,,,,['OpenAI API'],,,,unknown,limited,custom,"HyperWrite is intended to be used as a writing assistant. +",unknown,unknown,unknown,unknown,Generation,https://hyperwriteai.com/terms,unknown,unknown,unknown,,,, +model,Dolly,Databricks,"""Databricks’ Dolly, a large language model trained on the Databricks + Machine Learning Platform, demonstrates that a two-years-old open source + model (GPT-J) can, when subjected to just 30 minutes of fine tuning on a + focused corpus of 50k records (Stanford Alpaca), exhibit surprisingly + high quality instruction following behavior not characteristic of the + foundation model on which it is based."" + [[Dolly Repository]](https://github.com/databrickslabs/dolly). +",2023-03-24,https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html,,text; text,"""We evaluated Dolly on the instruction-following capabilities described in the InstructGPT paper that ChatGPT is based on and found that it exhibits many of the same qualitative capabilities, including text generation, brainstorming and open Q&A."" [[Databricks Blog Post]] (https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html). 
+",6B parameters (dense),"['GPT-J', 'Alpaca dataset']",unknown,30 minutes,A single NDasrA100_v4 machine with 8x A100 40GB GPUs,,open,Apache 2.0,"""Dolly is intended exclusively for research purposes and is not licensed for commercial use."" [[Limitations]](https://github.com/databrickslabs/dolly#limitations). +","Authors note the following limitations of the model: ""The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community. In particular, dolly-6b struggles with syntactically complex prompts, mathematical operations, factual errors, dates and times, open-ended question answering, hallucination, enumerating lists of specific length, and stylistic mimicry."" [[Limitations]](https://github.com/databrickslabs/dolly#limitations). +",,https://github.com/databrickslabs/dolly/issues,,,,,,,,,, +model,DBRX,Databricks,DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction by Databricks. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. DBRX only accepts text-based inputs and produces text-based outputs.,2024-03-27,https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,https://huggingface.co/databricks/dbrx-base,text; text,"DBRX outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in the corresponding technical blog post.",132B parameters (sparse),[],unknown,3 months,3072 NVIDIA H100s connected by 3.2Tbps Infiniband,Recommendations provided for retrieval augmented generation (RAG) in scenarios where accuracy and fidelity are important and additional testing around safety in the context of the specific application and domain is suggested.,open,Databricks Open Model License,"DBRX models are open, general-purpose LLMs intended and licensed for both commercial and research applications. 
They can be further fine-tuned for various domain-specific natural language and coding tasks.","DBRX models are not intended to be used out-of-the-box in non-English languages, and do not support native code execution, function calling or any use that violates applicable laws or regulations or is otherwise prohibited by the Databricks Open Model License and Databricks Open Model Acceptable Use Policy.",unknown,https://huggingface.co/databricks/dbrx-base/discussions,,,,,,,,,, +application,Reexpress One,Reexpress AI,"Reexpress One offers a means of document classification, semantic search, and uncertainty analysis on-device.",2023-03-21,https://re.express/index.html,,,,,[],,,,,limited,unknown,,,unknown,https://github.com/ReexpressAI/support,,data analyses,hhttps://re.express/tos.html,unknown,unknown,unknown,,,, +dataset,NaturalInstructions-v2,AI2,,2022-04-16,https://arxiv.org/abs/2204.07705,,NLP tasks,,1600 tasks,[],,,,,open,Apache 2.0,,,,,,,,,,,,[],, +dataset,SODA,AI2,"SODA is the first publicly available, million-scale, high-quality dialogue dataset covering a wide range of social interactions.",2023-05-24,https://arxiv.org/pdf/2212.10465.pdf,,text,"Randomly sampled dialogues from dataset are evaluated according to six established criteria of natural flow, context dependence, topic consistency, speaker consistency, specificity, and overall.",1.5M dialogues,[],,,,,open,CC BY 4.0,,,,,,,,,,,https://huggingface.co/datasets/allenai/soda,[],, +dataset,Multimodal C4,AI2,An augmentation of C4 with images added and made openly available.,2023-06-09,https://arxiv.org/pdf/2304.06939.pdf,,"image, text",Conducted experiments on models trained with Multimodal C4 in comparison to models trained on single image/caption datasets,43B English tokens with 101.2M documents and 571M images,['C4'],,,,,open,ODC-By,,,,,,,,,,,,[],, +model,COSMO,AI2,COSMO is a conversation agent with greater generalizability on both in- and out-of-domain chitchat datasets,2023-05-24,https://arxiv.org/pdf/2212.10465.pdf,https://huggingface.co/allenai/cosmo-xl,text; text,Evaluated by human testers on generalization capabilities and responses compared to other chatbots.,11B parameters (dense),"['SODA', 'ProsocialDialog', 'T5']",unknown,unknown,v3-128 TPU accelerators with batch size 256,,open,,,,,https://huggingface.co/allenai/cosmo-xl/discussions,,,,,,,,,, +dataset,Dolma,AI2,"Dolma is a dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials",2023-08-18,https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64,,text,Compared with other open and closed datasets in regards to size and quality control.,3T tokens,[],,,,,open,AI2 ImpACT,,,"Quality filtration, deduplication, and risk mitigation via logistic qualifiers and regular expressions used.",,,,,,,,https://drive.google.com/file/d/12gOf5I5RytsD159nSP7iim_5zN31FCXq/view,[],, +dataset,Tulu-V2-mix,AI2,Tulu-V2-mix is a dataset composed of many high-quality instruction datasets that results in stronger performance across a variety of reasoning and knowledge-probing tasks.,2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,,text,Models trained with dataset evaluated on downstream performance.,unknown,"['FLAN Collection', 'Open Assistant 1', 'ShareGPT', 'Alpaca dataset', 'Code Alpaca', 'LIMA', 'WizardLM', 'OpenOrca']",,,,,open,ODC-BY,,,,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture/discussions,,,,,,,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture,[],, +model,Tulu 2,AI2,Tulu 2 is 
a language model trained on the new Tulu-v2-mix dataset and fine-tuned on more state of the art language models.,2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/tulu-2-70b,text; text,Evaluated on MT-Bench and AlpacaEval. compared to other chatbots.,70B parameters (dense),"['LLaMA 2', 'Tulu-V2-mix']",unknown,unknown,unknown,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/tulu-2-70b/discussions,,,,,,,,,, +model,Tulu 2 DPO,AI2,"Tulu 2 DPO is created in a similar manner to Tulu 2, but with Direct Preference Optimization (DPO).",2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/tulu-2-dpo-70b,text; text,Evaluated on MT-Bench and AlpacaEval. compared to other chatbots.,70B parameters (dense),"['LLaMA 2', 'Tulu-V2-mix']",unknown,unknown,unknown,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/tulu-2-dpo-70b/discussions,,,,,,,,,, +model,Code Tulu 2,AI2,"Code Tulu 2 is a fine-tuned version of Code LLaMA that was trained on a mix of publicly available, synthetic and human datasets.",2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/codetulu-2-13b,"text; code, text",Evaluated on MT-Bench and AlpacaEval. compared to other chatbots.,13B parameters (dense),"['Code LLaMA', 'Tulu-V2-mix']",unknown,unknown,unknown,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/codetulu-2-13b/discussions,,,,,,,,,, +model,OLMo,AI2,"Open Language Model (OLMo) is designed to provide access to data, training code, models, and evaluation code necessary to advance AI through open research to empower academics and researchers to study the science of language models collectively.",2024-02-01,https://allenai.org/olmo/olmo-paper.pdf,https://huggingface.co/allenai/OLMo-7B,text; text,"Evaluated on standard LLM tasks and benchmarks in comparison to LLaMA, Falcon, and MPT, in addition to other same-sized models.",7B parameters (dense),['Dolma'],75.05 tCo2eq,unknown,"27 nodes, with each node containing 8x NVIDIA A100-40GB GPUs provided by MosaicML",training data from Dolma filtered and deduplicated before being trained on.,open,Apache 2.0,,,unknown,https://huggingface.co/allenai/OLMo-7B/discussions,,,,,,,,,, +dataset,MADLAD-400,AI2,"MADLAD-400 is a document-level multilingual dataset based on Common Crawl, covering 419 languages in total.",2023-09-09,https://arxiv.org/abs/2309.04662,,text,,3 trillion tokens,['Common Crawl'],,,,,open,CC BY 4.0,,,unknown,https://huggingface.co/datasets/allenai/MADLAD-400/discussions,,,,,,,https://huggingface.co/datasets/allenai/MADLAD-400,[],, +model,Inflection-1,Inflection AI,Inflection AI's first version of its in-house LLM. 
via Inflection AI's conversational API.,2023-06-22,https://inflection.ai/inflection-1,,text; text,"Evaluated on wide range of language benchmarks like MMLU 5-shot, GSM-8K, and HellaSwag 10-shot among others.",unknown,[],,,unknown,,limited,unknown,,,,,,,,,,,,,, +application,Pi,Inflection AI,Personal AI chatbot designed to be conversational and specialized in emotional intelligence.,2023-05-02,https://inflection.ai/press,,,,,['Inflection-2.5'],,,,,limited,unknown,to be used as a personal assistant chatbot for everyday activities,,,,,natural language text responses,,,,,,,, +model,Inflection-2,Inflection AI,"Inflection-2 is the best model in the world for its compute class and the second most capable LLM in the world, according to benchmark evaluation, as of its release.",2023-11-22,https://inflection.ai/inflection-2,,text; text,"Evaluated against state of the art models on benchmarks, and found to be most performant model outside of GPT-4.",unknown,[],unknown,unknown,5000 NVIDIA H100 GPUs,,closed,unknown,,,,,,,,,,,,,, +model,Inflection-2.5,Inflection AI,"Inflection-2.5 is an upgraded in-house model that is competitive with all the world's leading LLMs, as of release, like GPT-4 and Gemini.",2024-03-07,https://inflection.ai/inflection-2-5,,text; text,"Evaluated on standard LLM and technical benchmarks in comparison to Inflection-1 and GPT-4, along with advanced STEM examinations.",unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,, +application,Transformify Automate,Transformify,Transformify Automate is a platform for automated task integration using natural language prompts.,2023-05-30,https://www.transformify.ai/automate,,,,,['GPT-4'],,,,,open,,,,,,,text and code,https://www.transformify.ai/legal-stuff,,,,,,, dataset,The Pile,EleutherAI,"A large language model training dataset, used to train GPT-NeoX-20B. -",2021-01-01,https://arxiv.org/pdf/2101.00027.pdf,https://arxiv.org/pdf/2201.07311.pdf,"code, text",825 GB,"['...pot trending topics and the coverage around them. First up, there’s a bit of a visual redesign. Previously, clicking on a trending topic would highlight a story from one publication, and you’d have to scroll down past a live video section to view related stories. Facebook is replacing that system with a simple carousel, which does a better job of showing you different coverage options. To be clear, the change doesn’t affect how stories are sourced, according to Facebook. It’s still the same algorithm pickin...', 'Total knee arthroplasty (TKA) is a promising treatment for endstage osteoarthritis (OA) of the knee for alleviating pain and restoring the function of the knee. Some of the cases with bilateral TKA are symptomatic, necessitating revision arthroplasty in both the knees. A bilateral revision TKA can be done ei', 'On the converse, the set-valued map $\\Phi:[0,3]\\rightrightarrows [0,3]$ $$\\Phi(x):=\\left\\{\\begin{array}{ll} \\{1\\} & \\mbox{ if } 0\\leq x<1\\\\ {}[1,2] & \\mbox{ if } 1\\leq x\\leq 2\\\\ \\{2\\} &', 'This Court thus uses the same interpretation of V.R.C.P. 52(a) as it did *487 under the previous statutory requirement found in 12 V.S.A. § 2385. In essense, the defendants urge that this Court should reconsider the case of Green Mountain Marble Co. v. 
Highway Board, supra, and follow the Federal practice of looking to the evide']","Analyses of the data's composition, document statistics, language/dialectal coverage, topical distribution, and biases are conducted are conducted in the paper [[The Pile Paper]](https://arxiv.org/pdf/2101.00027.pdf). -",[],"The Pile data come from 22 sources, with over half of the data being from Common Crawl (Pile-CC; 227GB), fiction and nonfiction books (Books3; 101GB), biomedical articles (PubMed Central; 90GB), and code (Github; 95 GB). Refer to the paper for full decomposition [[Table 1]](https://arxiv.org/pdf/2101.00027.pdf#table.caption.2). -","Authors report that they have excluded some datasets ""because they were too small to be worth spending time or because the English component of the data did not merit inclusion on its own. Three datasets were excluded for other reasons: (1) US Congressional Records were excluded because it ""reflects the opinions and biases of the political class over the past 200 years, including segregationism and xenophobia."" (2) Online Fanfiction resources amounting to Hundreds of GiB were excluded on logistical grounds. (3) Literotica, platform where users can upload short-form erotic fiction, was excluded because the authors decided to exclude fanfiction, the corpus would require significant investigation, and corpus contain significant amount of stereotyping [[Appendix B]](https://arxiv.org/pdf/2101.00027.pdf). -","In addition to the data inclusion and exclusion decisions, the quality was controlled through filtering for English (pycld2 language classifier), filtering for documents similar to OpenWebText2 (classifier on CommonCrawl), and several forms of deduplication as detailed in the paper [[Appendix C]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.C) [[Appendix D]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.D). +",2021-01-01,https://arxiv.org/pdf/2101.00027.pdf,,"code, text","Analyses of the data's composition, document statistics, language/dialectal coverage, topical distribution, and biases are conducted are conducted in the paper [[The Pile Paper]](https://arxiv.org/pdf/2101.00027.pdf). +",825 GB,[],,,,"In addition to the data inclusion and exclusion decisions, the quality was controlled through filtering for English (pycld2 language classifier), filtering for documents similar to OpenWebText2 (classifier on CommonCrawl), and several forms of deduplication as detailed in the paper [[Appendix C]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.C) [[Appendix D]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.D). ",open,MIT,"The Pile was intended to be used as a high quality large text dataset for language modeling tasks, explained in more detail in the paper [[Section 1]](https://arxiv.org/pdf/2101.00027.pdf#section.1). ",,,"Feedback can be given by emailing the authors at contact at eleuther.ai. -",,,,,,,,,, -model,GPT-J,EleutherAI,GPT-J is an open-source autoregressive language model.,2021-06-04,https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/,,text; text,6B parameters (dense),,,['The Pile'],,,,open,Apache 2.0,,,,,,,,TRC (Unspecified # of TPU v3-8s),,,,,, -model,GPT-Neo,EleutherAI,,2021-03-21,https://github.com/EleutherAI/gpt-neo,,text; text,2.7B parameters (dense),,,['The Pile'],,,,open,MIT,,,,,,,,,,,,,, +",,,,,,,https://arxiv.org/pdf/2201.07311.pdf,"['...pot trending topics and the coverage around them. First up, there’s a bit of a visual redesign. 
Previously, clicking on a trending topic would highlight a story from one publication, and you’d have to scroll down past a live video section to view related stories. Facebook is replacing that system with a simple carousel, which does a better job of showing you different coverage options. To be clear, the change doesn’t affect how stories are sourced, according to Facebook. It’s still the same algorithm pickin...', 'Total knee arthroplasty (TKA) is a promising treatment for endstage osteoarthritis (OA) of the knee for alleviating pain and restoring the function of the knee. Some of the cases with bilateral TKA are symptomatic, necessitating revision arthroplasty in both the knees. A bilateral revision TKA can be done ei', 'On the converse, the set-valued map $\\Phi:[0,3]\\rightrightarrows [0,3]$ $$\\Phi(x):=\\left\\{\\begin{array}{ll} \\{1\\} & \\mbox{ if } 0\\leq x<1\\\\ {}[1,2] & \\mbox{ if } 1\\leq x\\leq 2\\\\ \\{2\\} &', 'This Court thus uses the same interpretation of V.R.C.P. 52(a) as it did *487 under the previous statutory requirement found in 12 V.S.A. § 2385. In essense, the defendants urge that this Court should reconsider the case of Green Mountain Marble Co. v. Highway Board, supra, and follow the Federal practice of looking to the evide']","The Pile data come from 22 sources, with over half of the data being from Common Crawl (Pile-CC; 227GB), fiction and nonfiction books (Books3; 101GB), biomedical articles (PubMed Central; 90GB), and code (Github; 95 GB). Refer to the paper for full decomposition [[Table 1]](https://arxiv.org/pdf/2101.00027.pdf#table.caption.2). +","Authors report that they have excluded some datasets ""because they were too small to be worth spending time or because the English component of the data did not merit inclusion on its own. Three datasets were excluded for other reasons: (1) US Congressional Records were excluded because it ""reflects the opinions and biases of the political class over the past 200 years, including segregationism and xenophobia."" (2) Online Fanfiction resources amounting to Hundreds of GiB were excluded on logistical grounds. (3) Literotica, platform where users can upload short-form erotic fiction, was excluded because the authors decided to exclude fanfiction, the corpus would require significant investigation, and corpus contain significant amount of stereotyping [[Appendix B]](https://arxiv.org/pdf/2101.00027.pdf). +" +model,GPT-J,EleutherAI,GPT-J is an open-source autoregressive language model.,2021-06-04,https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/,,text; text,,6B parameters (dense),['The Pile'],,,TRC (Unspecified # of TPU v3-8s),,open,Apache 2.0,,,,,,,,,,,,,, +model,GPT-Neo,EleutherAI,,2021-03-21,https://github.com/EleutherAI/gpt-neo,,text; text,,2.7B parameters (dense),['The Pile'],,,,,open,MIT,,,,,,,,,,,,,, model,GPT-NeoX,EleutherAI,"GPT-NeoX (20B) is an open-sourced autoregressive language model. -",2022-02-02,http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf,,"text; text, code",20B parameters (dense),,"The model was evaluated on standard NLP benchmarks: LAMBADA, ANLI, HellaSwag, MMLU among others [[Section 4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#section.4). -",['The Pile'],,,,open,Apache 2.0,"As stated in the model card: ""GPT-NeoX-20B learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt. 
Due to the generality of the pretraining set, it has acquired the ability to generate completions across a wide range of tasks - from programming to fiction writing [[Model Card]](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md)."" +",2022-02-02,http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf,https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md,"text; text, code","The model was evaluated on standard NLP benchmarks: LAMBADA, ANLI, HellaSwag, MMLU among others [[Section 4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#section.4). +",20B parameters (dense),['The Pile'],31.73 tCO2e,47.10 petaflop/s-day,12 x 8 A100 GPUs,,open,Apache 2.0,"As stated in the model card: ""GPT-NeoX-20B learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt. Due to the generality of the pretraining set, it has acquired the ability to generate completions across a wide range of tasks - from programming to fiction writing [[Model Card]](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md)."" ",,,"Feedback can be provided using the # 20b channel in EleutherAI Discord group [[EleutherAI Blog Post]](https://blog.eleuther.ai/announcing-20b/). Find the Discord link in the FAQ page [[FAQ]](https://www.eleuther.ai/faq/). -",https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md,31.73 tCO2e,47.10 petaflop/s-day,12 x 8 A100 GPUs,,,,,, +",,,,,,,,,, application,GooseAI API,GooseAI,"GooseAI API is an API service providing access to NLP services. -",,goose.ai,,,,,,['GPT-NeoX'],,,unknown,limited,custom,"Intended to be used as an NLP infrastructure. +",,goose.ai,,,,,['GPT-NeoX'],,,,unknown,limited,custom,"Intended to be used as an NLP infrastructure. 
","Illegal or abusive activity, security violations, network abuse -",At will monitoring by the provider,Email support,,,,,unknown,"Text Generation, Text Completion",https://goose.ai/docs/tos,unknown,unknown,unknown -model,VQGAN-CLIP,EleutherAI,VQGAN-CLIP is a model that better generates and edits images using a multimodal encoder to guide image generation.,2022-09-04,https://arxiv.org/pdf/2204.08583.pdf,,text; image,227M parameters (dense),,"Evaluated by human testers rating alignment of text input, image output pairs.","['VQGAN', 'CLIP']",,,,open,MIT,,,,,,unknown,Less than 1 V100-hour,1 NVIDIA Tesla K80 GPU,,,,,, -model,Pythia,Eleuther AI,A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters,2023-05-31,https://arxiv.org/pdf/2304.01373.pdf,,text; text,12B parameters (dense),,Evaluated on a variety of NLP benchmarks and found to perform similarly to OPT and BLOOM models.,['The Pile'],,,,open,Apache 2.0,,,,https://huggingface.co/EleutherAI/pythia-6.9b/discussions,https://huggingface.co/EleutherAI/pythia-12b,,,64 A100 GPUs,,,,,, -model,Llemma,"Princeton University, Eleuther AI",Llemma is a large language model for mathematics.,2023-10-16,https://arxiv.org/pdf/2310.10631.pdf,,text; text,34B parameters (dense),,Evaluated on math benchmarks in comparison to general large language models.,"['Proof Pile 2', 'Code LLaMA']",,,,open,LLaMA 2,,,,https://huggingface.co/EleutherAI/llemma_34b/discussions,https://huggingface.co/EleutherAI/llemma_34b,unknown,47k A100 hours,256 A100 40GB GPUs,,,,,, -dataset,Proof Pile 2,"Princeton University, Eleuther AI",Proof Pile 2 is a corpus for language modeling of mathematics.,2023-10-16,https://arxiv.org/pdf/2310.10631.pdf,https://huggingface.co/datasets/EleutherAI/proof-pile-2,text,55B tokens,[],,"['Common Crawl', 'OpenWebMath', 'Arxiv', 'RedPajama-Data']","scientific papers, web data containing mathematics, mathematical code",,,open,MIT,,,,https://huggingface.co/datasets/EleutherAI/proof-pile-2/discussions,,,,,,,,,, -model,Pile-T5,EleutherAI,"Pile-T5 is a version of the broadly used T5 model, but improved to eliminate weaknesses such as the omission of crucial code-related tokens. It utilizes LLaMA tokenizer and is trained on the Pile, offering enhancements for finetuning on downstream tasks, particularly those involving code.",2024-04-15,https://blog.eleuther.ai/pile-t5/,,text; text,unknown,,"The models were evaluated on SuperGLUE, CodeXGLUE, as well as MMLU and Bigbench Hard. Comparisons were made with T5v1.1 and found that Pile-T5 models performed better in most conditions.","['The Pile', 'T5x', 'LLaMA', 'umT5']",,,,open,unknown,The model is aimed at downstream tasks that benefit from the encoder-decoder architecture. 
Particularly useful for tasks involving code.,unknown,unknown,unknown,,unknown,2 million steps,unknown,,,,,, -application,Virtual Volunteer,Be My Eyes,The first-ever digital visual assistant powered by OpenAI’s new GPT-4 language model.,2023-03-14,https://www.bemyeyes.com/blog/introducing-be-my-eyes-virtual-volunteer,,,,,,['GPT-4 API'],,,,limited,unknown,,,,,,,,,,,,,, -model,CodeGeeX,Tsinghua University,CodeGeeX is an autoregressive language model trained on code,2022-09-20,https://github.com/THUDM/CodeGeeX,,text; code,13B parameters (dense),,,[],,,,limited,Apache 2.0,,,,,,unknown,unknown,THUDM 1536 Ascend 910 (32GB) Cluster,,,,,, -model,CogView,Tsinghua University,CogView is a transformer model for text-to-image generation,2021-05-26,https://arxiv.org/abs/2105.13290,,text; image,4B parameters (dense),,,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -model,CogView 2,Tsinghua University,CogView 2 is a hierarchical transformer for text-to-image generation,2022-04-28,https://arxiv.org/abs/2204.14217,,text; image,6B parameters (dense),,,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -model,CogVideo,Tsinghua University,CogVideo is a transformer model for text-to-video generation,2022-05-29,https://arxiv.org/abs/2205.15868,,text; video,unknown,,,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -model,GLM-130B,Tsinghua University,GLM-130B is a bidirectional language model trained on English and Chinese,2022-08-04,https://keg.cs.tsinghua.edu.cn/glm-130b/,,text; text,130B parameters (dense),,,"['The Pile', 'GLM-130B Chinese corpora', 'P3', 'DeepStruct finetuning dataset']",,,,open,GLM-130B License,,,,,,,,THUDM 96 DGX-A100 (40G) cluster,,,,,, -model,CogVLM,"Zhipu AI, Tsinghua University",CogVLM is a powerful open-source visual language foundation model,2023-11-06,https://arxiv.org/pdf/2311.03079.pdf,,"image, text; text",17B parameters (dense),,Evaluated on image captioning and visual question answering benchmarks.,"['Vicuna', 'CLIP']",,,,open,custom,Future multimodal research,,,,,unknown,4096 A100 days,unknown,,,,,, -model,UltraLM,Tsinghua University,UltraLM is a series of chat language models trained on UltraChat.,2023-06-27,https://github.com/thunlp/UltraChat#UltraLM,,text; text,13B parameters (dense),,Evaluated on AlpacaEval Leaderboard benchmarks.,['UltraChat'],,,,open,LLaMA 2,,,unknown,https://huggingface.co/openbmb/UltraLM-13b/discussions,https://huggingface.co/openbmb/UltraLM-13b,unknown,unknown,unknown,,,,,, -dataset,UltraChat,Tsinghua University,"UltraChat is an open-source, large-scale, and multi-round dialogue data powered by Turbo APIs.",2023-04-20,https://github.com/thunlp/UltraChat,https://huggingface.co/datasets/stingning/ultrachat,text,unknown,[],UltraLM evaluated off of UltraChat is evaluated on standard LLM benchmarks.,[],"Dialogue data of questions about the world, writing and creation tasks, and questions on existing materials.",,,open,MIT,,,unknown,https://huggingface.co/datasets/stingning/ultrachat/discussions,,,,,,,,,, -model,PolyCoder,Carnegie Mellon University,"PolyCoder is a code model trained on 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine.",2022-02-26,https://arxiv.org/abs/2202.13169,,code,2.7B parameters (dense),,Reports results on standard code benchmarks across a variety of programming languages.,['Github'],,,"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the 
paper.",open,MIT,unknown,None,None,https://huggingface.co/NinedayWang/PolyCoder-2.7B/discussion,https://huggingface.co/NinedayWang/PolyCoder-2.7B,unknown,6 weeks,8 NVIDIA RTX 8000,,,,,, -model,Moment,"Carnegie Mellon University, University of Pennsylvania",Moment is a family of open-source foundation models for general-purpose time-series analysis.,2024-02-06,https://arxiv.org/pdf/2402.03885.pdf,,,385M parameters (dense),,Evaluated on nascent time-series datasets and benchmarks.,[],,,,open,unknown,,,unknown,,,unknown,unknown,Single A6000 GPU,,,,,, -model,OpenAssistant LLaMA 2,OpenAssistant,OpenAssistant LLaMA 2 is an Open-Assistant fine-tuning of Meta's LLaMA 2.,2023-08-23,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,,text; text,70B parameters (dense),,,['LLaMA 2'],,,,open,LLaMA 2,,,unknown,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/discussions,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,unknown,unknown,unknown,,,,,, -model,Inflection-1,Inflection AI,Inflection AI's first version of its in-house LLM. via Inflection AI's conversational API.,2023-06-22,https://inflection.ai/inflection-1,,text; text,unknown,,"Evaluated on wide range of language benchmarks like MMLU 5-shot, GSM-8K, and HellaSwag 10-shot among others.",[],,,,limited,unknown,,,,,,,,unknown,,,,,, -application,Pi,Inflection AI,Personal AI chatbot designed to be conversational and specialized in emotional intelligence.,2023-05-02,https://inflection.ai/press,,,,,,['Inflection-2.5'],,,,limited,unknown,to be used as a personal assistant chatbot for everyday activities,,,,,,,,,natural language text responses,,,, -model,Inflection-2,Inflection AI,"Inflection-2 is the best model in the world for its compute class and the second most capable LLM in the world, according to benchmark evaluation, as of its release.",2023-11-22,https://inflection.ai/inflection-2,,text; text,unknown,,"Evaluated against state of the art models on benchmarks, and found to be most performant model outside of GPT-4.",[],,,,closed,unknown,,,,,,unknown,unknown,5000 NVIDIA H100 GPUs,,,,,, -model,Inflection-2.5,Inflection AI,"Inflection-2.5 is an upgraded in-house model that is competitive with all the world's leading LLMs, as of release, like GPT-4 and Gemini.",2024-03-07,https://inflection.ai/inflection-2-5,,text; text,unknown,,"Evaluated on standard LLM and technical benchmarks in comparison to Inflection-1 and GPT-4, along with advanced STEM examinations.",[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,RWKV World 4,RWKV,"RWKV World 4 is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).",2023-05-03,https://huggingface.co/RWKV/rwkv-4-world-7b,,text; text,7B parameters (dense),,,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-4-world-7b/discussions,https://huggingface.co/RWKV/rwkv-4-world-7b,unknown,unknown,unknown,,,,,, -model,RWKV 4 Pile,RWKV,"RWKV 4 Pile is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).",2023-05-15,https://huggingface.co/RWKV/rwkv-4-14b-pile,,text; text,14B parameters (dense),,,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-4-14b-pile/discussions,https://huggingface.co/RWKV/rwkv-4-14b-pile,unknown,unknown,unknown,,,,,, -model,RWKV World 5,RWKV,"RWKV World 5 is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer 
(parallelizable).",2023-12-16,https://huggingface.co/RWKV/rwkv-5-world-3b,,text; text,3B parameters (dense),,,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-5-world-3b/discussions,https://huggingface.co/RWKV/rwkv-5-world-3b,unknown,unknown,unknown,,,,,, -model,ERNIE 3.0 Titan,"Baidu, PengCheng Laboratory",ERNIE 3.0 Titan is a language model,2021-12-23,https://arxiv.org/abs/2112.12731,,text; text,260B parameters (dense),,,[],,,,closed,unknown,unknown,unknown,,,,unknown,unknown,"Baidu V100 Cluster, PengCheng Lab Ascend 910 NPU cluster",,,,,, -model,ERNIE-ViLG,Baidu,ERNIE-ViLG is a model for text-to-image generation,2021-12-31,https://arxiv.org/abs/2112.15283,,text; image,10B parameters (dense),,,[],,,,limited,,unknown,unknown,,,,unknown,unknown,unknown,,,,,, -model,ERNIE-ViLG 2.0,Baidu,ERNIE-ViLG is a model for text-to-image generation,2022-10-27,https://arxiv.org/abs/2210.15257,,text; image,10B parameters (dense),,,[],,,,closed,unknown,unknown,unknown,,,,unknown,18 days according to [[the paper]](https://arxiv.org/abs/2210.15257),320 A100 GPUs according to [[the paper]](https://arxiv.org/abs/2210.15257),,,,,, -model,ERNIE 4.0,Baidu,ERNIE-4.0 is a multimodal generalist foundation model.,2023-10-17,https://www.prnewswire.com/news-releases/baidu-launches-ernie-4-0-foundation-model-leading-a-new-wave-of-ai-native-applications-301958681.html,,"text; image, text, video",unknown,,,[],,,,limited,unknown,unknown,unknown,,,,unknown,unknown,unknown,,,,,, -application,Q-Chat,Quizlet,"Quizlet is introducing Q-Chat, a fully-adaptive AI tutor that engages students with adaptive questions based on relevant study materials delivered through a fun chat experience.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['ChatGPT API'],,,,open,,,,,,,,,,,,https://quizlet.com/tos,,, -application,Bedrock,Amazon,"Bedrock is a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. Bedrock is intended for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders. using an API.",2023-04-13,https://aws.amazon.com/bedrock/,,,,,,"['Jurassic-2', 'Claude', 'Stable Diffusion', 'Amazon Titan', 'Claude 2', 'Cohere Command']",,,,limited,unknown,allowing companies to incorporate generative AI into their business models,,,,,,,,,foundation models made accessible via an API,https://aws.amazon.com/service-terms/,,, -model,FalconLite2,Amazon,"FalconLite2 is a fine-tuned and quantized Falcon language model, capable of processing long (up to 24K tokens) input sequences.",2023-08-08,https://huggingface.co/amazon/FalconLite2,,text; text,40B parameters (dense),,Evaluated against benchmarks that are specifically designed to assess the capabilities of LLMs in handling longer contexts.,['Falcon-40B'],,,,open,Apache 2.0,,,,https://huggingface.co/amazon/FalconLite2/discussions,https://huggingface.co/amazon/FalconLite2,unknown,unknown,unknown,,,,,, -model,Chronos,Amazon,"Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. 
Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context.",2024-03-13,https://github.com/amazon-science/chronos-forecasting,,time-series; time-series,710M parameters (dense),,Chronos has been evaluated comprehensively on 42 datasets both in the in-domain (15 datasets) and zero-shot settings (27 datasets). Chronos outperforms task specific baselines in the in-domain setting and is competitive or better than trained models in the zero-shot setting.,['T5'],,,"Chronos was evaluated rigorously on 42 datasets, including 27 in the zero-shot setting against a variety of statistical and deep learning baselines.",open,Apache 2.0,"Chronos can be used for zero-shot time series forecasting on univariate time series from arbitrary domains and with arbitrary horizons. Chronos models can also be fine-tuned for improved performance of specific datasets. Embeddings from Chronos encoder may also be useful for other time series analysis tasks such as classification, clustering, and anomaly detection.",,,https://github.com/amazon-science/chronos-forecasting/discussions,https://huggingface.co/amazon/chronos-t5-large,,63 hours on p4d.24xlarge EC2 instance,8 NVIDIA A100 40G GPUs,,,,,, -model,Prism,Toyota Research Institute,Prism is a family of VLMs trained using new analyses about key vision design axes.,2024-02-09,https://arxiv.org/pdf/2402.07865.pdf,,"image, text; text",7B parameters (dense),,Evaluated on standard VLM benchmarks and outperforms SotA open-source VLMs as of release.,[],,,,open,LLaMA 2,,,unknown,,,unknown,less than 9 hours,8 A100 GPUs,,,,,, -model,InternVideo,Shanghai AI Laboratory,,2022-12-06,https://arxiv.org/pdf/2212.03191.pdf,,"text, video; video",1.3B parameters (dense),,,"['Kinetics-400', 'WebVid-2M', 'WebVid-10M', 'HowTo100M', 'AVA', 'Something-Something-v2', 'Kinetics-710']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,Lego-MT,Shanghai AI Laboratory,Lego-MT is a multilingual large language model which uses a more efficient approach of being an effective detachable model.,2023-05-29,https://arxiv.org/pdf/2212.10551.pdf,,text; text,1.2B parameters (dense),,Evaluated based on own constructed dataset covering 433 languages.,['OPUS'],,,,open,,,,,,,unknown,15 days,32 A100 GPUs,,,,,, -model,MathCoder,Shanghai AI Laboratory,MathCoder is a family of models capable of generating code-based solutions for solving challenging math problems.,2023-10-05,https://arxiv.org/pdf/2310.03731.pdf,,text; text,70B parameters (dense),,Evaluated on GSM8K and the competition-level MATH dataset.,"['GPT-4', 'LLaMA 2']",,,,open,unknown,bridging the gap between natural language understanding and computational problem-solving,,,,,unknown,unknown,32 NVIDIA A800 80GB GPUs,,,,,, -model,InternLM,Shanghai AI Laboratory,"InternLM is a high-quality language model proficient in English, Chinese, and code.",2023-09-20,https://github.com/InternLM/InternLM,,"code, text; code, text",20B parameters (dense),,Evaluated in comparison to LLaMA series models on standard benchmarks.,[],,,,open,Apache 2.0,,,,https://huggingface.co/internlm/internlm-20b/discussions,https://huggingface.co/internlm/internlm-20b,unknown,unknown,unknown,,,,,, -model,InternVideo2,"Shanghai AI Laboratory, Nanjing University, Zhejiang University","InternVideo2 is a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.",2024-03-22,https://github.com/OpenGVLab/InternVideo2,,"text, video; text",6B 
parameters,,"Evaluated across a range of video-related tasks and compared to other multimodal models like CLIP, VideoPrism, and VideoCoCa. InternVideo 2 generally performs among the best of such models on these benchmarks.","['InternVL', 'VideoMAEv2', 'LAION', 'WebVid', 'InternVid', 'LLaVA', 'KMash']",,,,open,MIT,,,unknown,,,unknown,35 days,"256 NVIDIA A100 GPUs for 32 days, and 64 GPUs for 3 days",,,,,, -model,CosmicMan,Shanghai AI Laboratory,"CosmicMan is a text-to-image foundation model specialized for generating high-fidelity human images with meticulous appearance, reasonable structure, and precise text-image alignment.",2024-04-01,https://cosmicman-cvpr2024.github.io/,,text; image,unknown,,The model was compared with SOTAs and has shown good performance in generating high-quality human images.,['CosmicMan-HQ 1.0'],,,The quality control measures taken include modeling the relationship between dense text descriptions and image pixels in a decomposed manner and enforcing attention refocusing without adding extra modules.,open,unknown,"The model is intended to generate high-quality, photorealistic human images from text descriptions. Applications include avatar generation and potentially virtual reality and video game character creation.",unknown,unknown,unknown,,unknown,1 week,32 80G NVIDIA A100 GPUs,,,,,, -dataset,CosmicMan-HQ 1.0,Shanghai AI Laboratory,"CosmicMan-HQ 1.0 is a large-scale dataset with 6 million high-quality, real-world human images.",2024-04-28,https://arxiv.org/pdf/2404.01294,,image,6 million images,[],"Compared to other human image datasets on data quantity, image quality, and annotations.",[],,,unknown,open,unknown,,,unknown,,,,,,,,,,, -model,Nucleus,Nucleus.AI,Nucleus is a 22B parameters causal decoder-only model built by Nucleus.AI and trained on 500B tokens of RefinedWeb along with curated corpora.,2023-10-05,https://www.withnucleus.ai/,,text; text,22B parameters (dense),,"Evaluated on the OpenLLM leaderboard, performing on par with similar-sized models.",['RefinedWeb'],,,,open,MIT,"Research on large language models; as a foundation for further specialization and finetuning for specific usecases (e.g., summarization, text generation, chatbot, etc.)",Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,unknown,https://huggingface.co/NucleusAI/nucleus-22B-token-500B/discussions,https://huggingface.co/NucleusAI/nucleus-22B-token-500B,unknown,2 weeks,unknown,,,,,, -model,Devin,Cognition Labs,Devin is the world’s first fully autonomous AI software engineer.,2024-03-12,https://www.cognition-labs.com/introducing-devin,,text; code,unknown,,"Evaluated on SWE-Bench, a challenging software engineering benchmark, where Devin outperforms major state of the art models unassisted.",[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Konan LLM,Konan,"Konan LLM is a Large Language Model developed in-house by Konan Technology. Optimized for super-large AI training, it leverages high-quality, large-scale data and over 20 years of expertise in natural language processing.",2023-09-17,https://en.konantech.com/en/llm/konanllm,,text; text,13B parameters,,,[],,,,limited,unknown,"Document generation, document review, Q&A, customer response scenarios.",,,,,unknown,unknown,unknown,,,,,, -application,LinkedIn,LinkedIn,"More than 40 percent of LinkedIn's feed posts include at least one image. 
We want every member to have equal access to opportunity and are committed to ensuring that we make images accessible to our members who are blind or who have low vision so they can be a part of the online conversation. With Azure Cognitive Service for Vision, we can provide auto-captioning to edit and support alt. text descriptions.",,https://www.linkedin.com/,,,,,,['Azure Cognitive Services for Vision'],,,,open,unknown,,,,,,,,,,,,,, -application,Character,Character AI,Character allows users to converse with various chatbot personas.,2022-09-16,https://beta.character.ai/,,,,,,[],,,,limited,unknown,,,,,,,,,,AI-generated chat conversations,https://beta.character.ai/tos,unknown,unknown,unknown -dataset,Open X-Embodiment dataset,Open X-Embodiment,"The Open X-Embodiment dataset is a dataset of robot movements assembled from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks)",2023-10-03,https://robotics-transformer-x.github.io/,All data can be found at https://robotics-transformer-x.github.io/.,robot trajectories,160K tasks,[],"Analyzed on breakdown of types of robot trajectory in dataset, and overall coverage.",[],N/A,N/A,unknown,open,Apache 2.0,Further research on X-embodiment models.,,unknown,,,,,,,,,,, -model,RT-1-X,"Open X-Embodiment, Google Deepmind","RT-1-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-1, an efficient Transformer-based architecture designed for robotic control.",2023-10-03,https://robotics-transformer-x.github.io/,,"images, text; robot trajectories",35M parameters (dense),,"Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-1 by 50% in emergent skill evaluations.","['Open X-Embodiment dataset', 'ImageNet EfficientNet', 'USE']",,,unknown,open,Apache 2.0,Further research on X-embodiment models.,,unknown,,,unknown,unknown,unknown,,,,,, -model,RT-2-X,"Open X-Embodiment, Google Deepmind","RT-2-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-2, a large vision-language model co-fine-tuned to output robot actions as natural language tokens.",2023-10-03,https://robotics-transformer-x.github.io/,,"images, text, robot trajectories; robot trajectories",55B parameters (dense),,"Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-2 by 3x in emergent skill evaluations.","['Open X-Embodiment dataset', 'ViT (unknown size)', 'UL2']",,,unknown,closed,unknown,Further research on X-embodiment models.,,unknown,,,unknown,unknown,unknown,,,,,, -model,Taiyi Diffusion XL,"International Digital Economy Academy, South China University of Technology, University of Science and Technology of China",Taiyi Diffusion XL is a new Chinese and English bilingual text-to-image model which is developed by extending the capabilities of CLIP and Stable-DiffusionXL.,2024-01-26,https://arxiv.org/pdf/2401.14688.pdf,,text; image,3.5B parameters (dense),,Evaluated on human and machine benchmarks in comparison to established image models as a baseline.,"['CLIP', 'LAION-400M', 'Wukong', 'Stable Diffusion XL']",,,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B/discussions,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B,unknown,unknown,unknown,,,,,, -model,Pegasus-1,Twelve Labs,Pegasus-1 is a video-language foundation 
model.,2023-10-23,https://app.twelvelabs.io/blog/introducing-pegasus-1,,video; text,80B parameters (dense),,Evaluated in comparison to SOTA video-to-language models.,"['MSR-VTT', 'Video-ChatGPT Video Descriptions Dataset']",,,Data selected and cleaned to eliminate toxic and biased content.,open,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Marengo 2.6,Twelve Labs,"Marengo 2.6 is a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more. ",2024-03-01,https://www.twelvelabs.io/blog/introducing-marengo-2-6,,"audio, image, text; audio, image, video",unknown,,"Marengo-2.6 sets new benchmarks in zero-shot text-to-video, text-to-image, and text-to-audio retrieval tasks with a single embedding model.",[],,,,open,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,GodziLLa 2,Maya Philippines,"GodziLLa 2 is an experimental combination of various proprietary LoRAs from Maya Philippines and Guanaco LLaMA 2 1K dataset, with LLaMA 2.",2023-08-11,https://huggingface.co/MayaPH/GodziLLa2-70B,,text; text,70B parameters (dense),,"Evaluated on the OpenLLM leaderboard, releasing at rank number 4 on the leaderboard.","['LLaMA 2', 'Guanaco LLaMA dataset']",,,,open,LLaMA 2,,,unknown,,https://huggingface.co/MayaPH/GodziLLa2-70B,unknown,unknown,unknown,,,,,, -model,BiomedGPT,Lehigh University,BiomedGPT leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks.,2023-05-26,https://arxiv.org/pdf/2305.17100.pdf,,"image, text; text",472M parameters (dense),,outperforms majority of preceding state-of-the-art models over 15 unique biomedical modalities.,"['GPT-style autoregressive decoder', 'BiomedGPT biomedical datasets']",,,"No specific quality control is mentioned in model training, though details on data processing and how the model was trained are provided in the paper.",open,Apache 2.0,furthering research in developing unified and generalist models for biomedicine.,,,,,unknown,unknown,10 NVIDIA A5000 GPUs,,,,,, -model,MM1,Apple,"MM1 is a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B.",2024-03-16,https://arxiv.org/pdf/2403.09611.pdf,,"image, text; text",30B parameters (dense),,Evaluated on image captioning and visual question answering across many benchmarks.,[],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,OpenELM,Apple,"OpenELM is a family of Open-source Efficient Language Models. 
It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy.",2024-04-24,https://machinelearning.apple.com/research/openelm,,text; text,3B parameters,,"The models were evaluated in terms of zero-shot, LLM360, and OpenLLM leaderboard results.","['RefinedWeb', 'The Pile', 'RedPajama-Data', 'Dolma', 'CoreNet library']",,,unknown,open,Apple,To empower and enrich the open research community by providing access to state-of-the-art language models.,"No explicit prohibited uses stated, though it is noted that users should undertake thorough safety testing.",,https://huggingface.co/apple/OpenELM-3B-Instruct/discussions,https://huggingface.co/apple/OpenELM-3B-Instruct,unknown,unknown,unknown,,,,,, -model,StarCoder,BigCode,"StarCoder is a Large Language Model for Code (Code LLM) trained on permissively licensed data from GitHub, including from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks.",2023-05-09,https://arxiv.org/pdf/2305.06161.pdf,,code; code,15.5B parameters (dense),,"Tested on several benchmarks, most notably Python benchmark HumanEval.",['The Stack'],,,"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the paper.",open,BigCode Open RAIL-M v1.0,"As a foundation model to fine-tune and create more specialized models that support use cases such as code completion, fill-in-the-middle, and text summarization. Can also be used as a Tech Assistant prompt and not as an instruction model given training limitations.",See BigCode Open RAIL-M license and FAQ,,https://huggingface.co/bigcode/starcoder/discussions,https://huggingface.co/bigcode/starcoder,16.68 tons of CO2eq,"320,256 GPU hours",512 A100 80GB GPUs distributed across 64 nodes,,,,,, -model,SantaCoder,BigCode,Multilingual code model derived from the findings of BigCode Project analysis on Github stars' association to data quality.,2023-02-24,https://arxiv.org/pdf/2301.03988.pdf,,code; code,1.1B parameters (dense),,Evaluated on MultiPL-E system benchmarks.,"['The Stack', 'BigCode Dataset']",,,,open,BigCode Open RAIL-M v1,The model was trained on GitHub code. As such it is not an instruction model and commands do not work well. You should phrase commands like they occur in source code such as comments or write a function signature and docstring and let the model complete the function body.,See BigCode Open RAIL-M license and FAQ,,https://huggingface.co/bigcode/santacoder/discussions,https://huggingface.co/bigcode/santacoder,124 kg of CO2eq,"14,284 GPU hours",96 NVIDIA Tesla V100 GPUs,,,,,, -dataset,The Stack,BigCode,"The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as other from code snippets.",2022-11-20,https://arxiv.org/pdf/2211.15533.pdf,https://huggingface.co/datasets/bigcode/the-stack,code,6 TB,['https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train'],Evaluated models trained on The Stack on HumanEval and MBPP and compared against similarly-sized models.,['GitHub'],,,allowed users whose data were part of The Stack's training data to opt-out,open,"The Stack is a collection of source code from repositories with various licenses. 
Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including attribution clauses when relevant. Provenance information is provided for each data point.",creating code LLMs,See https://huggingface.co/datasets/bigcode/the-stack,,https://huggingface.co/datasets/bigcode/the-stack/discussions,,,,,,,,,, -model,StarCoder2-15B,BigCode,"StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. The training was carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,,code; text,15B parameters (dense),,See https://arxiv.org/pdf/2402.19173.pdf,['The Stack v2'],,,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"The model was trained on GitHub code as well as additional selected data sources such as Arxiv and Wikipedia. As such it is not an instruction model and commands like ""Write a function that computes the square root."" do not work well. Intended to generate code snippets from given context, but not for writing actual functional code directly.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-15b/discussions,https://huggingface.co/bigcode/starcoder2-15b,unknown,unknown,1024 x H100 GPUs,,,,,, -model,StarCoder2-7B,BigCode,"StarCoder2-7B model is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3.5+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,,code; text,7B parameters (dense),,See https://arxiv.org/pdf/2402.19173.pdf,['The Stack v2'],,,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-7b/discussions,https://huggingface.co/bigcode/starcoder2-7b,"29,622.83 kgCO2eq","145,152 hours (cumulative)",432 H100 GPUs,,,,,, -model,StarCoder2-3B,BigCode,"StarCoder2-3B model is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. 
The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+ trillion tokens.",2024-02-28,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,,code; text,3B parameters (dense),,See https://arxiv.org/pdf/2402.19173.pdf,['The Stack v2'],,,The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.,open,BigCode OpenRail-M,"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.",See BigCode Open RAIL-M license and FAQ,unknown,https://huggingface.co/bigcode/starcoder2-3b/discussions,https://huggingface.co/bigcode/starcoder2-3b,"16,107.01 kgCO2eq","97,120 hours (cumulative)",160 A100 GPUs,,,,,, -model,h2oGPT,H2O AI,Series of models fine-tuned on well-known LLMs using the h2oGPT repositories.,2023-06-16,https://arxiv.org/pdf/2306.08161.pdf,,text; text,20B parameters (dense),,Evaluated on EleutherAI evaluation harness.,"['GPT-NeoX', 'H2O AI OpenAssistant', 'h2oGPT Repositories']",,,,open,Apache 2.0,,,,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b/discussions,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b,unknown,unknown,unspecified number of 48GB A100 NVIDIA GPUs,,,,,, -model,H2O Danube,H2O AI,H2O Danube is a language model trained on 1T tokens following the core principles of LLaMA 2 and Mistral.,2024-01-30,https://arxiv.org/pdf/2401.16818.pdf,,text; text,1.8B parameters (dense),,Evaluated on common sense and world knowledge benchmarks.,[],,,unknown,open,Apache 2.0,,"Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activities.",unknown,https://huggingface.co/h2oai/h2o-danube-1.8b-base/discussions,https://huggingface.co/h2oai/h2o-danube-1.8b-base,unknown,unknown,8x H100 GPUs on a single node,,,,,, -application,ARES,Faraday Lab,ARES is a text-to-image generator based on Stable Diffusion. The goal is to provide a simple tool with a user interface allowing mainstream AI access for artists and creators.,2023-04-26,https://faradaylab.fr/,,,,,,['Stable Diffusion'],,,,open,unknown,,,,,,,,,,generated images,,,, -dataset,C4,Google,The Colossal Clean Crawled Corpus (C4) is a processed version of Common Crawl to facilitate transfer learning in NLP.,2019-10-23,https://arxiv.org/abs/1910.10683,https://huggingface.co/datasets/c4,text,750GB,['https://huggingface.co/datasets/c4/viewer/en/train'],https://arxiv.org/abs/2104.08758,['CommonCrawl'],,"Data was filtered for English using langdetect. Further, data was filtered to end in terminal punctuation, to remove short pages (less than 5 sentences), and to remove ""Dirty, Naughty, Obscene or Otherwise Bad Words"". 
-","Data filtering excluded obscene words from a block list as well as short documents and some deduplication was done based on string overlap. -",open,ODC-By 1.0,To faciliate transfer learning research in NLP.,,,https://huggingface.co/datasets/c4/discussions,,,,,,,,,, -dataset,Internal Google BERT dataset,Google,"The dataset used to train Internal Google BERT models. -",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,,text,unknown,[],unknown,[],"Web pages, and search queries",unknown,unknown,closed,unknown,unknown,unknown,unknown,unknown,,,,,,,,,, -dataset,Conceptual Captions,Google,"A dataset containing 3 million (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems. -",2018-07-01,https://aclanthology.org/P18-1238/,,"image, text","3.3M (image, text) pairs",[],"Authors evaluate the dataset on two image captioning models - RNN-based model and Transformer model, under two experimental conditions - using the training & development sets provided by the MS COCO dataset, versus training & development sets using the Conceptual dataset. They use three different test sets- the blind COCO-C40 test set, the Conceptual test set and the Flickr 1K test set. They present both Human and Automatic evaluation results. Human evaluations indicate that the Conceptual-based models are superior. Automatic models fail to corroborate the human evaluation results. This highlights the weakness of these automatic metrics.",[],,"The following filtering steps are applied in the given order: -1. Image-based Filtering - ""It only keeps JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2. It excludes images that trigger pornography or profanity detectors. These filters discard more than 65% of the candidates."" -2. Text-based Filtering - ""Candidates with no determiner, no noun, or no preposition are discarded; candidates with a high noun ratio are also discarded; candidates with a high rate of token repetition are discarded; candidates where the first word is not capitalized, or with too high capitalized-word ratio are discarded; we use a vocabulary VW of 1B token types, appearing at least 5 times in the English Wikipedia, and discard candidates that contain tokens that are not found in this vocabulary. candidates that score too high or too low on the polarity annotations, or trigger the pornography/profanity detectors, are discarded; predefined boiler-plate prefix/suffix sequences matching the text are cropped, e.g. “click to enlarge picture”, “stock photo”; we also drop text which begins/ends in certain patterns, e.g. “embedded image permalink”, “profile photo”. These filters only allow around 3% of the incoming candidates to pass to the later stages."" -3. Image&Text-based Filtering - ""We filter out candidates for which none of the text tokens can be mapped to the content of the image. This filter discards around 60% of the incoming candidates."" -4. Text Transformation with Hypernymization - ""Noun modifiers of certain types (proper nouns, numbers, units) are removed; dates, durations, and preposition-based locations (e.g., ""in Los Angeles"") are removed; named-entities are identified, matched against the KG entries, and substitute with their hypernym; resulting coordination noun-phrases with the same head (e.g., ""actor and actor"") are resolved into a single-head, pluralized form (e.g., ""actors""). 
Around 20% of samples are discarded during this transformation. We then cluster all resolved entities (e.g., 2560 ""actor"", ""dog"", ""neighborhood"", etc.) and keep only the candidates for which all detected types have a count of over 100 (around 55% of the candidates)."" -","Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality.",open,Conceptual Captions License,,unknown,unknown,Feedback can be provided by creating an issue in the [[Conceptual Captions GitHub repository]](https://github.com/google-research-datasets/conceptual-captions) or by emailing at conceptual-captions at google.com,,,,,,,,,, -dataset,Conceptual 12M,Google,"A dataset with 12 million image-text pairs specifically meant to be used for vision-and-language pre-training. -",2021-02-17,https://arxiv.org/pdf/2102.08981.pdf,,"image, text","12M (image, text) pairs",[],"The dataset is benchmarked against CC3M on two most fundamental V+L tasks: vision-to-language generation and vision-and-language matching, with an emphasis on long-tail visual recognition. The results illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as indicated by the new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks. -",[],,"Some of the filtering steps used in the preparation of Conceptual Captions dataset are relaxed to trade off high-recall for low-precision. The following steps are applied in the given order: -1. Image-based Filtering - Only keep JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2.5. Exclude images that trigger pornography or profanity detectors. -2. Text-based Filtering - Allow text between 3 and 256 words in the alt-text. Discard candidates with no noun or no determiner, but permit ones without prepositions. Set the maximum fraction of word repetition allowed to 0.2. Increase the threshold for counting a word type as rare from 5 to 20. -3. Image&Text-based Filtering - Filter out candidates for which none of the text tokens can be mapped to the content of the image. -","Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality. Person-name substitutions are performed in the alt-texts to protect the privacy of individuals in the associated images.",open,Conceptual Captions License,,unknown,unknown,Feedback can be provided by creating an issue in the [[Conceptual 12M GitHub repository]](https://github.com/google-research-datasets/conceptual-12m) or by emailing at conceptual-captions at google.com,,,,,,,,,, -model,T5,Google,Text-To-Text Transfer Transformer (T5) is a model that unifies all NLP tasks under the text-to-text format.,2019-10-23,https://arxiv.org/abs/1910.10683,,text; text,11B parameters (dense),,https://huggingface.co/t5-base#evaluation,['C4'],,,The T5 paper documents many analyses/ablations that were considered before arriving at the final architecture/training procedure.,open,Apache 2.0,NLP tasks,unknown,,https://huggingface.co/t5-large/discussions,https://huggingface.co/t5-base,unknown,unknown,"1,024 TPU v3 chips (Cloud TPU Pods)",,,,,, -model,Internal Google BERT,Google,"Internal Google BERT model used to power Google Search products. 
-",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,,text; text,unknown,,unknown,['Internal Google BERT dataset'],,,unknown,closed,unknown,unknown,unknown,unknown,unknown,unknown,unknown,unknown,unknown,,,,,, -application,Google Search,Google,"Google Search is Google's search engine. -",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,,,,,,"['Internal Google BERT', 'MUM']",,,unknown,open,,"Searching the web using text, voice or image","Prohibited use cases aren't specifically spelled out for Google search, but several illegal and discouraged use cases are shared in the Respect Others section of the [[Term of Service]](https://policies.google.com/terms). -","It is implied that Google scan uses of its products for spam, malware and illegal content in the [[Term of Service]](https://policies.google.com/terms). -","Feedback can be sent to Google Feedback using the product interface [[Google Feedback]](https://www.google.com/tools/feedback). -",,,,,unknown,web page ranking,https://policies.google.com/terms,unknown,unknown,unknown -dataset,Infiniset,Google,"Infiniset ""is a combination of dialog data from public dialog data and other public web documents"" [[Appendix E]](https://arxiv.org/pdf/2201.08239.pdf#appendix.E). -",2021-06-18,https://arxiv.org/pdf/2201.08239.pdf,,"code, text",unknown,[],unknown,[],"Included in the dataset are data from ""public forums (0%); C4 data (12.5% ); code documents from sites related to programming like Q&A sites tutorials, etc (12.5%); Wikipedia (English) (12.5%); English web documents (6.25%); and Non-English web documents (6.25%)."" -",unknown,unknown,closed,unknown,unknown,"The prohibited uses for Infiniset weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others. -",unknown,,,,,,,,,,, -model,LaMDA,Google,"LaMDA stands for Language Models for Dialog Application. It is a transformer based language model trained on dialogue data. -",2021-06-18,https://arxiv.org/pdf/2201.08239.pdf,,text; text,137B parameters (dense),,"The model performance was analyzed on sensibleness, specificity and interestingness. The model was also analyzed on safety, following metrics derived from Google AI Principles [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1). Finally, the model was analyzed on groundedness, testing its ability to produce responses that can be associated with ""known sources whenever possible [[Section 4.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.4.1)."" -",['Infiniset'],,,"LaMDA was fine-tuned to predict sensibleness, specificity and interestingness as well as safety. Then, the candidates were filtered out if the model safety predictions were below a certain threshold. The next candidates in the conversation were selected as a combination of these predictions. The model was also fine-tuned for groundedness. The results are shown in [[Figure 5]](https://arxiv.org/pdf/2201.08239.pdf#figure.caption.23). -",closed,unknown,"LaMDA is a language model, so it can be used for regular langauge modelling tasks without fine-tuning, but its fine-tuned for dialogue tasks. 
-","The prohibited uses of LaMDA weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others. -",unknown,,,26 tCO2e,4108.80 petaflop/s-day,1024 TPU-V3 chips,,,,,, -dataset,PaLM dataset,Google,"PaLM dataset ""was created for pre-training language models"" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D). -",2022-04-04,https://arxiv.org/pdf/2204.02311.pdf,https://arxiv.org/pdf/2204.02311.pdf#appendix.D,"code, text",3.92 TB,[],unknown,['Infiniset'],"The dataset is based on Infiniset. It included multilingual text containing text from over 100 languages. The breakdown of the data included is as follows: Social media conversations (multilingual) 50, Filtered webpages (multilingual) 27%, BooksCorpus (English) 13%, GitHub (code) 5%, Wikipedia (multilingual) 4%, and News (English) 1%. Code was collected from GitHub repositories with appropriate licenses, totalling 96GB of source code [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). -","GitHub repositories with copyleft licenses were excluded. Programming languageges other than the most common 24 were excluded [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). -","In order to reduce low quality web pages, the web pages were sampled according to a ""quality score"" classifier. Code files were de-duplicated using Levenshtein distance [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). -",closed,unknown,"""The dataset was created for pre-training language models by a team of researchers at Google"". -","""... should not be used for any of the unacceptable language model use cases, e.g., generation of toxic speech"" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D). -",unknown,Contact the authors.,,,,,,,,,, -model,Flan-T5,Google,Flan-T5 is a version of the T5 language model fine-tuned on instruction data,2022-10-20,https://arxiv.org/abs/2210.11416,,text; text,11B parameters (dense),,Evaluated on a variety of standard language datasets.,"['T5', 'Muffin', 'P3', 'NaturalInstructions-v2', 'Flan CoT']",,,"Across different multitask datasets, templates and formatting were maintained. 
For the chain-of-thoughts (CoT) data, specific exemplars were used.",open,Apache 2.0,unknown,,,https://huggingface.co/google/flan-t5-xxl/discussions,https://arxiv.org/pdf/2210.11416.pdf,Unknown,Unknown,512 v4 TPU Chips,,,,,, -model,UL2,Google,UL2 is a language model trained with a new pretraining objective,2022-05-10,https://arxiv.org/abs/2205.05131,,text; text,20B parameters (dense),,,['C4'],,,,open,Apache 2.0,,,,,,,,128 TPUv4,,,,,, -model,Parti,Google,Parti is a text-to-image diffusion model,2022-06-22,https://parti.research.google/,,text; image,20B parameters (dense),,,"['C4', 'LAION-400M', 'FIT400M', 'JFT-4B']",,,,closed,unknown,,,,,,,unknown,unknown,,,,,, -model,Imagen,Google,Imagen is a text-to-image diffusion model,2022-05-23,https://imagen.research.google/,,text; image,14B parameters (dense),,,"['LAION-400M', 'Google internal image-text dataset']",,,,open,unknown,,,,,,unknown,unknown,128 TPU-v4,,,,,, -model,VATT,Google,VATT is a family of models trained on multimodal data,2022-04-22,https://arxiv.org/abs/2104.11178,,"text; audio, video",155M parameters (dense),,,"['AudioSet', 'HowTo100M']",,,,open,Apache 2.0,,,,,,unknown,3 days,256 TPU-v3,,,,,, -model,PaLM,Google,"PaLM stands Pathways Language Model, ""dense decoder-only Transformer model trained with the Pathways system"" [[Google ai Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html). -",2022-04-04,https://arxiv.org/pdf/2204.02311.pdf,,"text; text, code",540B parameters (dense),,"""PaLM is evaluated on English Natural Language Processing (NLP) tasks, tasks from BIG-bench, reasoning tasks, code completion tasks, multilingual generation and question answering tasks, translation tasks, and bias and toxicity benchmarks"" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E). -",['PaLM dataset'],,,Unknown,limited,unknown,"""The primary use is research on language models, including: research on NLP applications like machine translation and question answering, advancing fairness and safety research, and understanding limitations of current LLMs. Within Google, PaLM is being used for research on a variety of open- ended text and code generation tasks, including reasoning [[Section 6.3]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.3) and code synthesis and understanding [[Section 6.4]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.4)"" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E). 
-","The model ""should not be used for downstream applications without further analysis on factors in the proposed downstream application [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E)"" -",unknown,Contact the authors.,https://arxiv.org/pdf/2204.02311.pdf#appendix.E,271.43 tCO2,29600 petaflop/s-days,6144 TPU v4 chips,,,,,, -application,PaLM API,Google,a new developer offering that makes it easy and safe to experiment with Google’s language models.,2023-03-14,https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html,,,,,,['PaLM'],,,,limited,unknown,,,,,,,,,,,,,, -model,Med-PaLM,Google,,2022-12-26,https://arxiv.org/abs/2212.13138,,text; text,540B parameters (dense),,,"['Flan-PaLM', 'MultiMedQA']",,,,closed,unknown,,,,,,,,,,,,,, -model,Med-PaLM Multimodal,Google,,2023-07-26,https://arxiv.org/pdf/2307.14334.pdf,,"image, text, genome sequence; text",562B parameters (dense),,Evaluated on MultiMedBench tasks and radiologist evaluations of model-generated chest X-ray reports,"['PaLM-E', 'MultiMedBench']",,,,closed,unknown,,,,,,,,,,,,,, -model,MultiMedQA,Google,,2022-12-26,https://arxiv.org/abs/2212.13138,,text; text,unknown,,,"['MedQA', 'MedMCQA', 'PubMedQA', 'MMLU', 'LiveQA', 'Medication QA', 'HealthSearchQA']",,,,closed,unknown,,,,,,,,,,,,,, -model,Flan-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11416,,text; text,540B parameters (dense),,,"['PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']",,,,closed,unknown,,,,,,,,,,,,,, -model,Flan-U-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11416,,text; text,540B parameters (dense),,,"['U-PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,Muffin,Google,,2021-09-03,https://arxiv.org/abs/2109.01652,,text,62 tasks,[],,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -model,U-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11399,,text; text,540B parameters (dense),,,"['PaLM', 'PaLM dataset']",,,,closed,unknown,,,,,,,,,,,,,, -model,PaLM-SayCan,Google,,2022-08-16,https://arxiv.org/abs/2204.01691,,text; robotics trajectories,540B parameters (dense),,,['PaLM'],,,,closed,"unknown (model weights), Apache 2.0 (SayCan code)",,,,,,,,,,,,,, -model,GLaM,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text; text,1.2T parameters (sparse),,,"['GLaM Web dataset', 'Wikipedia', 'GLaM Conversations dataset', 'GLaM Forums dataset', 'BooksCorpus', 'GLaM News dataset']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,GLaM Web dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,GLaM Conversations dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,GLaM Forums dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,GLaM News dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,MUM,Google,MUM (Multitask Unified Model) is a multimodal model that is specialized for more complex queries.,2021-05-18,https://blog.google/products/search/introducing-mum/,,"image, text; text",unknown,,,['MUM dataset'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,MUM dataset,Google,,2021-05-18,https://blog.google/products/search/introducing-mum/,,"image, text",unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Phenaki,Google,,2023-02-01,https://openreview.net/pdf?id=vOEXS39nOF,,text; video,1.8B parameters (dense),,,"['LAION-400M', 'Phenaki Video-Text 
Corpus']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,Phenaki Video-Text Corpus,Google,,2023-02-01,https://openreview.net/pdf?id=vOEXS39nOF,,"text, video",15M text-video pairs at 8FPS,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Flan-UL2,Google,,2023-03-02,https://arxiv.org/abs/2205.05131,,text; text,20B parameters (dense),,,"['UL2', 'Flan Collection']",,,,open,Apache 2.0,,,,,,,,,,,,,, -dataset,Flan Collection,Google,,2023-01-31,https://arxiv.org/abs/2301.13688,,text,1836 tasks,[],,"['Flan dataset', 'P3', 'NaturalInstructions-v2']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,MusicLM,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,text; audio,1.4B parameters (dense),,,"['SoundStream', 'w2v-BERT', 'MuLan', 'MusicLM semantic model', 'MusicLM acoustic model']",,,,closed,unknown,,,,,,,,,,,,,, -model,SoundStream,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,unknown,,,['Free Music Archive'],,,,closed,unknown,,,,,,,,,,,,,, -model,w2v-BERT,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,600M parameters (dense),,,['Free Music Archive'],,,,closed,unknown,,,,,,,,,,,,,, -model,MuLan,Google,,2022-08-26,https://arxiv.org/abs/2208.12415,,text; audio,unknown,,,"['AST', 'BERT', 'MuLan dataset']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,MuLan dataset,Google,,2022-08-26,https://arxiv.org/abs/2208.12415,,"audio, text",370K hours audio,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,MusicLM dataset,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio,280K hours audio,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,MusicLM semantic model,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,430M parameters (dense),,,['MusicLM dataset'],,,,closed,unknown,,,,,,,,,,,,,, -model,MusicLM acoustic model,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,430M parameters (dense),,,['MusicLM dataset'],,,,closed,unknown,,,,,,,,,,,,,, -model,Noise2Music,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text; audio",unknown,,,['Noise2Music pseudolabel dataset'],,,,closed,unknkown,,,,,,,,,,,,,, -dataset,LaMDA-LF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,150k songs,[],,['LaMDA'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,Rater-LF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,10k captions,[],,['MusicCaps'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,Rater-SF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,24k captions,[],,['MusicCaps'],,,,closed,unknown,,,,,,,,,,,,,, -model,Noise2Music pseudolabeler,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text; audio",unknown,,,"['MuLan', 'MuLaMCap', 'LaMDA-LF', 'Rater-LF', 'Rater-SF']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,Noise2Music audio dataset,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,audio,340k hours audio,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,Noise2Music pseudolabel dataset,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text",340k hours audio with pseudolabels,[],,"['Noise2Music audio dataset', 'Noise2Music pseudolabeler']",,,,closed,unknown,,,,,,,,,,,,,, -application,AI Test Kitchen,Google,"AI Test Kitchen provides a new way for people to learn about, experience, and give feedback on emerging AI technology, like 
LaMDA.",2022-08-25,https://blog.google/technology/ai/join-us-in-the-ai-test-kitchen/,,,,,,['LaMDA'],,,,limited,unknown,,,,,,,,,,,,,, -application,Bard,Google,"Conversational AI service, powered by LaMDA",2023-02-06,https://blog.google/technology/ai/bard-google-ai-search-updates/,,,,,,['LaMDA'],,,,closed,unknown,,,,,,,,,,,,,, -model,Minerva,Google,,2022-06-29,https://arxiv.org/abs/2206.14858,,text; text,540B parameters (dense),,,"['PaLM', 'arXiv', 'PaLM dataset', 'Minerva Math Web Pages dataset']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,Minerva Math Web Pages dataset,Google,,2022-06-29,https://arxiv.org/abs/2206.14858,,text,17.5B tokens,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,USM,Google,"Universal Speech Model (USM) is a family of state-of-the-art speech models with 2B parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages. USM, which is for use in YouTube (e.g., for closed captions), can perform automatic speech recognition (ASR) on widely-spoken languages like English and Mandarin, but also languages like Punjabi, Assamese, Santhali, Balinese, Shona, Malagasy, Luganda, Luo, Bambara, Soga, Maninka, Xhosa, Akan, Lingala, Chichewa, Nkore, Nzema to name a few. Some of these languages are spoken by fewer than twenty million people, making it very hard to find the necessary training data.",2023-03-06,https://arxiv.org/abs/2303.01037,,"audio, text; text",2B parameters (dense),,,"['YT-NLU-U', 'Pub-U', 'Web-NTL', 'YT-SUP+', 'Pub-S']",,,,limited,unknown,,,,,,,,,,,,,, -application,YouTube,Google,YouTube is a global online video sharing and social media platform,2005-02-14,https://www.youtube.com/,,,,,,['USM'],,,,open,,,,,,,,,,,,,,, -model,PaLM-E,Google,,2023-03-06,https://arxiv.org/abs/2303.03378,,"image, text; text",562B parameters (dense),,,"['PaLM', 'ViT-22B']",,,,closed,unknown,,,,,,,,,,,,,, -model,ViT-22B,Google,,2023-02-10,https://arxiv.org/abs/2302.05442,,"image, image",22B parameters (dense),,,['JFT'],,,,closed,unknown,,,,,,,,,,,,,, -model,AudioLM,Google,,2022-09-07,https://arxiv.org/abs/2209.03143,,"audio, text; audio",1B parameters (dense),,,"['w2v-BERT', 'SoundStream']",,,,closed,unknown,,,,,,,,,,,,,, -model,PaLI,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,text; image,17B parameters (dense),,,"['mT5', 'ViT-e', 'WebLI']",,,,closed,unknown,,,,,,,,,,,,,, -model,ViT-e,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,image; image,3.9B parameters (dense),,,['JFT'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,WebLI,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,"image, text","10B images, 12B alt-text",[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Vid2Seq,Google,,2023-02-27,https://arxiv.org/abs/2302.14115,,video; text,500M parameters (dense),,,"['T5', 'CLIP', 'YT-Temporal-1B']",,,,open,Apache 2.0,,,,,,,,,,,,,, -model,Google Joint SLM,Google,Joint speech and language model using a Speech2Text adapter and using a CTC-based blank-filtering.,2023-06-08,https://arxiv.org/pdf/2306.07944.pdf,,audio; text,unknown,,"evaluated on DSTC11 Challenge Task, based on MultiWoz 2.1, with a focus on dialog state tracking.","['CTC blank-filtering', 'Speech2Text adapter']",,,,closed,unknown,,,,,,,,,,,,,, -model,PaLM 2,Google,PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. 
PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2.,2023-05-10,https://blog.google/technology/ai/google-palm-2-ai-large-language-model/,,text; text,unknown,,Reports results on standard code benchmarks across a variety of programming languages.,['PaLM 2 dataset'],,,"Employed de-duplication, removal of sensitive-PII and filtering. Added control tokens marking toxicity of text.",open,unknown,"general use large language model that can be used for language, reasoning, and code tasks.",becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,Google internal monitoring,Specific queries provided by annotators,https://ai.google/static/documents/palm2techreport.pdf,,,TPU v4 (number unspecified),,,,,, -model,MedLM,Google,"MedLM is a collection of foundation models tuned to follow natural language instructions for tasks in medicine, such as question answering and creating draft summaries.",2023-12-13,https://cloud.google.com/vertex-ai/docs/generative-ai/medlm/overview,,text; text,unknown,,"Assessed on medical benchmarks of professional medical exams, medical research, and consumer queries.",[],,,,limited,unknown,"to be used for question answering and creating draft summaries from existing documentation, to be reviewed, edited, and approved by the user before use.",,Google internal monitoring,,https://cloud.google.com/static/vertex-ai/docs/generative-ai/medlm/MedLM-model-card.pdf,unknown,unknown,unknown,,,,,, -model,Gemini,Google,"As of release, Gemini is Google's most capable and flexible AI model, proficient in multimodal domains.",2023-12-06,https://deepmind.google/technologies/gemini/#introduction,,"text; image, text, video",unknown,,"Evaluated on standard general, reasoning, math, coding, and multimodal benchmarks with results that surpass GPT-4 on almost all.",[],,,,closed,unknown,"general use large language model that can be used for language, reasoning, and code tasks.",becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,Google internal monitoring,,,unknown,unknown,unknown,,,,,, -model,TimesFM,Google,TimesFM is a single forecasting model pre-trained on a large time-series corpus of 100 billion real world time-points.,2024-02-02,https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html,,,200M parameters (dense),,Evaluated on popular time-series benchmarks.,[],,,,closed,unknown,,,unknown,,,unknown,unknown,unknown,,,,,, -model,Gemma,Google,"Gemma is a family of lightweight, state-of-the-art open models from Google, based on the Gemini models. 
They are text-to-text, decoder-only large language models, available in English.",2024-02-21,https://blog.google/technology/developers/gemma-open-models/,,text; text,7B parameters (dense),,Evaluation was conducted on standard LLM benchmarks and includes internal red-teaming testing of relevant content policies.,[],,,"Multiple evaluations and red-teaming conducted, with particular focus on ethics, bias, fair use cases, and safety.",open,custom,"Text generation tasks including question answering, summarization, and reasoning; content creation, communication, research, and education.",Prohibited uses are specified in the Gemma Prohibited Use Policy here https://ai.google.dev/gemma/prohibited_use_policy,,https://huggingface.co/google/gemma-7b/discussions,https://huggingface.co/google/gemma-7b,unknown,unknown,TPUv5e,,,,,, -model,Med-Gemini,Google,"Med-Gemini is a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly integrate the use of web search, and that can be efficiently tailored to novel modalities using custom encoders.",2024-04-29,https://arxiv.org/pdf/2404.18416,,"image, text; text",unknown,,"Evaluated Med-Gemini on 14 medical benchmarks spanning text, multimodal and long-context applications, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpassing the GPT-4 model family on every benchmark where a direct comparison is viable.","['Gemini', 'MultiMedBench']",,,,closed,unknown,"To be used in areas of medical research including medical summarization, referral letter generation, and medical simplification tasks.",Unfit for real-world deployment in the safety-critical medical domain.,,,,unknown,unknown,unknown,,,,,, -model,HyperCLOVA,NAVER,HyperClova is an autoregressive language model,2021-05-21,https://arxiv.org/abs/2109.04650,,text; text,82B parameters,,,[],,,,closed,unknown,,,,,,,130.4 days,1024 A100 GPUs,,,,,, -model,HyperCLOVA X,NAVER,"HyperCLOVA X is a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding.",2024-04-13,https://arxiv.org/pdf/2404.01954,,text; text,unknown,,"Evaluated on English and Korean benchmarks in comparison to open source English and multilingual LLMs, with HyperCLOVA X (closed) surpassing the models compared.",[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, +",At will monitoring by the provider,Email support,unknown,"Text Generation, Text Completion",https://goose.ai/docs/tos,unknown,unknown,unknown,,,, +model,VQGAN-CLIP,EleutherAI,VQGAN-CLIP is a model that better generates and edits images using a multimodal encoder to guide image generation.,2022-09-04,https://arxiv.org/pdf/2204.08583.pdf,,text; image,"Evaluated by human testers rating alignment of text input, image output pairs.",227M parameters (dense),"['VQGAN', 'CLIP']",unknown,Less than 1 V100-hour,1 NVIDIA Tesla K80 GPU,,open,MIT,,,,,,,,,,,,,, +model,Pythia,Eleuther AI,A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters,2023-05-31,https://arxiv.org/pdf/2304.01373.pdf,https://huggingface.co/EleutherAI/pythia-12b,text; text,Evaluated on a variety of NLP benchmarks and found to perform similarly to OPT and BLOOM models.,12B parameters (dense),['The Pile'],,,64 A100 GPUs,,open,Apache 2.0,,,,https://huggingface.co/EleutherAI/pythia-6.9b/discussions,,,,,,,,,, +model,Llemma,"Princeton University, Eleuther AI",Llemma is a large language model for 
mathematics.,2023-10-16,https://arxiv.org/pdf/2310.10631.pdf,https://huggingface.co/EleutherAI/llemma_34b,text; text,Evaluated on math benchmarks in comparison to general large language models.,34B parameters (dense),"['Proof Pile 2', 'Code LLaMA']",unknown,47k A100 hours,256 A100 40GB GPUs,,open,LLaMA 2,,,,https://huggingface.co/EleutherAI/llemma_34b/discussions,,,,,,,,,, +dataset,Proof Pile 2,"Princeton University, Eleuther AI",Proof Pile 2 is a corpus for language modeling of mathematics.,2023-10-16,https://arxiv.org/pdf/2310.10631.pdf,,text,,55B tokens,"['Common Crawl', 'OpenWebMath', 'Arxiv', 'RedPajama-Data']",,,,,open,MIT,,,,https://huggingface.co/datasets/EleutherAI/proof-pile-2/discussions,,,,,,,https://huggingface.co/datasets/EleutherAI/proof-pile-2,[],"scientific papers, web data containing mathematics, mathematical code", +model,Pile-T5,EleutherAI,"Pile-T5 is a version of the broadly used T5 model, but improved to eliminate weaknesses such as the omission of crucial code-related tokens. It utilizes LLaMA tokenizer and is trained on the Pile, offering enhancements for finetuning on downstream tasks, particularly those involving code.",2024-04-15,https://blog.eleuther.ai/pile-t5/,,text; text,"The models were evaluated on SuperGLUE, CodeXGLUE, as well as MMLU and Bigbench Hard. Comparisons were made with T5v1.1 and found that Pile-T5 models performed better in most conditions.",unknown,"['The Pile', 'T5x', 'LLaMA', 'umT5']",unknown,2 million steps,unknown,,open,unknown,The model is aimed at downstream tasks that benefit from the encoder-decoder architecture. Particularly useful for tasks involving code.,unknown,unknown,unknown,,,,,,,,,, +model,Dolphin 2.2 Yi,Cognitive Computations,Dolphin 2.2 Yi is an LLM based off Yi.,2023-11-14,https://erichartford.com/dolphin,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b,text; text,,34B parameters (dense),"['Dolphin', 'Yi']",unknown,3 days,4 A100 GPUs,,open,custom,,,unknown,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b/discussions,,,,,,,,,, +model,WizardLM Uncensored,Cognitive Computations,WizardLM Uncensored is WizardLM trained with a subset of the dataset - responses that contained alignment / moralizing were removed.,2023-06-01,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,text; text,Evaluated on OpenLLM leaderboard.,30B parameters (dense),['WizardLM'],unknown,unknown,unknown,,open,unknown,,,unknown,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored/discussions,,,,,,,,,, +model,CPM Bee,OpenBMB,"CPM-Bee is a fully open-source, commercially-usable Chinese-English bilingual base model with a capacity of ten billion parameters.",2023-05-27,https://github.com/OpenBMB/CPM-Bee,https://huggingface.co/openbmb/cpm-bee-10b,text; text,Evaluated on English and Chinese language benchmarks.,10B parameters (dense),[],unknown,unknown,unknown,,open,custom,You can use the raw model for many NLP tasks like text generation or fine-tune it to a downstream task.,,unknown,https://huggingface.co/openbmb/cpm-bee-10b/discussions,,,,,,,,,, +dataset,UltraFeedback,OpenBMB,"UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models.",2023-09-26,https://github.com/OpenBMB/UltraFeedback,,text,Randomly chosen models trained on UltraFeedback evaluated across standard benchmarks.,256k 
samples,[],,,,,open,MIT,,,unknown,https://huggingface.co/datasets/openbmb/UltraFeedback/discussions,,,,,,,https://huggingface.co/datasets/openbmb/UltraFeedback,[],, +model,MiniCPM,OpenBMB,"MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).",2024-02-01,https://github.com/OpenBMB/MiniCPM/,https://huggingface.co/openbmb/MiniCPM-V,text; text,Evaluated on open-sourced general benchmarks in comparison to SotA LLMs.,2.4B parameters (dense),[],unknown,unknown,unknown,,open,custom,,,unknown,https://huggingface.co/openbmb/MiniCPM-V/discussions,,,,,,,,,, +model,Eurus,OpenBMB,Eurus is a suite of large language models (LLMs) optimized for reasoning.,2024-04-02,https://arxiv.org/abs/2404.02078,https://huggingface.co/openbmb/Eurus-70b-nca,text; text,The model was comprehensively benchmarked across 12 tests covering five tasks. Eurus achieved the best overall performance among open-source models of similar sizes and even outperformed specialized models in many cases.,70B parameters,"['Eurus SFT', 'UltraInteract', 'UltraFeedback']",unknown,unknown,unknown,,open,Apache 2.0,The model can be used for reasoning tasks and is especially tailored for coding and math following specific prompts.,,unknown,https://huggingface.co/openbmb/Eurus-70b-nca/discussions,,,,,,,,,, +model,OpenBA,Soochow University,OpenBA is an open-sourced 15B bilingual (English + Chinese) asymmetric seq2seq model.,2023-10-01,https://arxiv.org/pdf/2309.10706.pdf,https://huggingface.co/OpenBA/OpenBA-LM,text; text,Evaluated across different text benchmarks in English and Chinese.,15B parameters (dense),[],6.5 tCO2eq,38k GPU hours,8 NVIDIA A100-80GB GPUs,,open,Apache 2.0,,,,https://huggingface.co/OpenBA/OpenBA-LM/discussions,,,,,,,,,, +model,CodeGen,Salesforce,CodeGen is a language model for code,2022-03-25,https://arxiv.org/abs/2203.13474,,"code, text; code, text",,16B parameters (dense),[],,,Unspecified Salesforce Compute (TPU-V4s),,open,"none (model weights), BSD-3-Clause (code)",,,,,,,,,,,,,, +model,BLIP,Salesforce,,2022-01-28,https://arxiv.org/abs/2201.12086,,text; image,,unknown,"['ViT-B', 'BERT', 'COCO', 'Visual Genome', 'Conceptual Captions', 'Conceptual 12M', 'SBU Captions', 'LAION-115M']",,,,,open,BSD-3-Clause,,,,,,,,,,,,,, +dataset,LAION-115M,Salesforce,,2022-01-28,https://arxiv.org/abs/2201.12086,,"image, text",,115M image-text pairs,['LAION-400M'],,,,,open,BSD-3-Clause,,,,,,,,,,,,[],, +application,EinsteinGPT,Salesforce,EinsteinGPT is generative AI for customer relationship management (CRM).,2023-03-07,https://www.salesforce.com/products/einstein/overview/?d=cta-body-promo-8,,,,,['ChatGPT API'],,,,,limited,unknown,,,,,,,,,,,,,, +model,BLIP-2,Salesforce,BLIP-2 is a model that employs a generic and efficient pretraining strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.,2023-01-30,https://arxiv.org/pdf/2301.12597.pdf,https://huggingface.co/Salesforce/blip2-opt-2.7b,text; image,"BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods",2.7B parameters (dense),['OPT'],unknown,less than 9 days,unknown,,open,MIT,,,,https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions,,,,,,,,,, +model,Moirai,Salesforce,"Moirai is a cutting-edge time series foundation model, offering universal forecasting capabilities. 
It stands out as a versatile time series forecasting model capable of addressing diverse forecasting tasks across multiple domains, frequencies, and variables in a zero-shot manner.",2024-03-19,https://blog.salesforceairesearch.com/moirai/,https://huggingface.co/Salesforce/moirai-1.0-R-large,time-series; time-series,Moirai has undergone a comprehensive evaluation in both in-distribution and out-of-distribution settings. It demonstrated its capabilities as a zero-shot forecaster and delivered competitive or superior performance compared to full-shot models.,311M parameters,['LOTSA'],unknown,unknown,NVIDIA A100 40G GPUs,The performance of Moirai was evaluated through in-distribution and out-of-distribution settings.,open,Apache 2.0,"Moirai can be used for time series forecasting in multiple domains. It offers robust zero-shot forecasting capabilities and eliminates the need for additional data, extensive computational resources, and expert input for achieving accurate forecasts.",unknown,unknown,https://huggingface.co/Salesforce/moirai-1.0-R-large/discussions,,,,,,,,,, +dataset,LOTSA,Salesforce,LOTSA is the largest collection of open time series datasets with 27B observations across nine domains.,2024-03-19,https://arxiv.org/pdf/2402.02592.pdf,,time-series,,27B observations,[],,,,,open,Apache 2.0,pre-training Large Time Series Models,,,https://huggingface.co/datasets/Salesforce/lotsa_data/discussions,,,,,,,https://huggingface.co/datasets/Salesforce/lotsa_data,[],"domain observations in energy, transport, climate, cloudops, web, sales, nature, econ/finance, and healthcare", +model,CodeGeeX,Tsinghua University,CodeGeeX is an autoregressive language model trained on code,2022-09-20,https://github.com/THUDM/CodeGeeX,,text; code,,13B parameters (dense),[],unknown,unknown,THUDM 1536 Ascend 910 (32GB) Cluster,,limited,Apache 2.0,,,,,,,,,,,,,, +model,CogView,Tsinghua University,CogView is a transformer model for text-to-image generation,2021-05-26,https://arxiv.org/abs/2105.13290,,text; image,,4B parameters (dense),[],,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,CogView 2,Tsinghua University,CogView 2 is a hierarchical transformer for text-to-image generation,2022-04-28,https://arxiv.org/abs/2204.14217,,text; image,,6B parameters (dense),[],,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,CogVideo,Tsinghua University,CogVideo is a transformer model for text-to-video generation,2022-05-29,https://arxiv.org/abs/2205.15868,,text; video,,unknown,[],,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,GLM-130B,Tsinghua University,GLM-130B is a bidirectional language model trained on English and Chinese,2022-08-04,https://keg.cs.tsinghua.edu.cn/glm-130b/,,text; text,,130B parameters (dense),"['The Pile', 'GLM-130B Chinese corpora', 'P3', 'DeepStruct finetuning dataset']",,,THUDM 96 DGX-A100 (40G) cluster,,open,GLM-130B License,,,,,,,,,,,,,, +model,CogVLM,"Zhipu AI, Tsinghua University",CogVLM is a powerful open-source visual language foundation model,2023-11-06,https://arxiv.org/pdf/2311.03079.pdf,,"image, text; text",Evaluated on image captioning and visual question answering benchmarks.,17B parameters (dense),"['Vicuna', 'CLIP']",unknown,4096 A100 days,unknown,,open,custom,Future multimodal research,,,,,,,,,,,,, +model,UltraLM,Tsinghua University,UltraLM is a series of chat language models trained on UltraChat.,2023-06-27,https://github.com/thunlp/UltraChat#UltraLM,https://huggingface.co/openbmb/UltraLM-13b,text; text,Evaluated on AlpacaEval Leaderboard benchmarks.,13B parameters 
(dense),['UltraChat'],unknown,unknown,unknown,,open,LLaMA 2,,,unknown,https://huggingface.co/openbmb/UltraLM-13b/discussions,,,,,,,,,, +dataset,UltraChat,Tsinghua University,"UltraChat is an open-source, large-scale, and multi-round dialogue data powered by Turbo APIs.",2023-04-20,https://github.com/thunlp/UltraChat,,text,UltraLM evaluated off of UltraChat is evaluated on standard LLM benchmarks.,unknown,[],,,,,open,MIT,,,unknown,https://huggingface.co/datasets/stingning/ultrachat/discussions,,,,,,,https://huggingface.co/datasets/stingning/ultrachat,[],"Dialogue data of questions about the world, writing and creation tasks, and questions on existing materials.", +dataset,LAION-400M,LAION,"LAION-400M is a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search. This dataset is entirely openly, freely accessible.",2021-08-20,https://laion.ai/blog/laion-400-open-dataset/,,"image, text",,400M image-text pairs,"['CLIP', 'CommonCrawl']",,,,"The authors use CLIP embeddings of images and texts to filter out illegal contents. They also use CLIP to tag image-text pairs as NSFW. They note that less than 1% of images were detected as NSFW, which can be filtered out by an user with NSFW tag.",open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and warn that ""this large-scale dataset is non-curated. It was built for research purposes to enable testing model training on larger scale for broad researcher and other interested communities, and is not meant for any real-world production or application.""",No uses are explicitly prohibited by the license. Users are warned from using LAION-400M for any real-world production or application.,,,,,,,,,https://laion.ai/blog/laion-400-open-dataset/,[],,"Authors apply the following filtering conditions on the WAT files downloaded from Common Crawl: ""All samples with less than 5 character alt-text length or less than 5 KB image size are dropped. Duplicate removal is performed with bloom filter based on URL and alt-text. We use CLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.3. This threshold was selected based on human inspections. We use the CLIP embeddings of images and texts to filter out illegal contents."" +" +dataset,LAION-5B,LAION,LAION is a dataset of 5 billion image-text pairs from the Internet,2022-12-12,https://laion.ai/blog/laion-5b/,,"image, text",,5B image-text pairs,"['CLIP', 'mCLIP', 'CommonCrawl']",,,,,open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and ""do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress""",No uses are explicitly prohibited by the license. Users are warned from using LAION-5B for non-research purposes.,,,,,,,,,https://laion.ai/blog/laion-5b/,[],,"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. 
We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results. We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content." +dataset,LAION-2B-en,LAION,LAION-2B-en is a subset of the LAION-5B dataset and contains 2.3 billion English image-text pairs.,2022-12-12,https://arxiv.org/pdf/2210.08402.pdf,,"image, text",,2.32B image-text pairs,"['CLIP', 'LAION-5B']",,,,,open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and ""do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress""",No uses are explicitly prohibited by the license. Users are warned from using LAION-2B-en for non-research purposes.,,,,,,,,,https://laion.ai/blog/laion-5b/,[],,"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results. We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content." +model,OpenFlamingo,LAION,"An open-source reproduction of DeepMind's Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs).",2023-03-28,https://laion.ai/blog/open-flamingo/,https://github.com/mlfoundations/open_flamingo/blob/main/MODEL_CARD.md,"image, text; text",Evaluated on COCO captioning and VQAv2 vision-language tasks.,9B parameters (dense),"['LLaMA', 'CLIP']",,,,,open,MIT,academic research purposes,commercial use,,,,,,,,,,,, +dataset,Anthropic Helpfulness dataset,Anthropic,"One of the datasets used to train Anthropic RLHF models. The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, ""asking for help, advice, or for the model to accomplish a task"", then choose the model answer that was more helpful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2). +",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,text,"The authors found that the crowdworkers didn't exhaustively check for honesty in the model answers they preferred [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1). +",271.5 MB,['Anthropic Human Feedback Interface'],,,,"Working with a screened set of crowdworkers, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D). +",open,MIT,"The dataset was intended and released for research purposes. 
+",unknown,,"Email the authors [[Paper]](https://arxiv.org/pdf/2204.05862.pdf). +",,,,,,,,"[{'chosen': '\n\nHuman: I\'m bored to death. Help.\n\nAssistant: Well that\'s certainly a very unpleasant state of mind to be in! I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom. Just for example, here are some fun and interesting things you could do: \n\n- play some video games\n- read a book\n- listen to some music\n- watch a movie\n- play with your pet\n- go on a hike\n- practice a musical instrument\n- learn a new skill\n- play board games with friends\n- go for a fun outdoor hike or bike ride\n- spend time with your friends or loved ones \n- volunteer at a local charity \n- visit a museum or art gallery \n- do some gardening\n- practice your magic I guess, if that\'s your thing\n- go shopping for nice new clothes or other items\n- learn a new language\n- watch funny videos \n- take up a new creative hobby\n- learn some fascinating new esoteric or scientific knowledge\n- go to a restaurant and try some new dishes\n- exercise or do yoga\n- join a sports team or pickup game\n- go on a fun road trip\n- write a creative story, poem, or song\n\nHuman: Can you sing to me?\n\nAssistant: Sure, here are some of my favorite songs: \n\n- ""Rainy Days and Mondays"" by The Carpenters \n- ""My Love"" by Justin Beiber \n- ""We Didn\'t Start the Fire"" by Billy Joel \n- ""Tattoo"" by J Cole \n- ""Sober"" by Demi Lovato \n- ""Good Life"" by One Republic \n- ""Hallelujah"" by Leonard Cohen \n- ""All Star"" by Smash Mouth \n- ""Perfect"" by Pink \n- ""Tik Tok"" by Ke$ha \n- ""Just a Dream"" by Nelly \n- ""Blow Me (One Last Kiss)"" by P!nk \n- ""Set Fire to the Rain"" by Adele \n- ""Woman"" by Wolfmother \n- ""Human"" by The Killers \n- ""Poker Face"" by Lady Gaga \n- ""Walk this Way"" by Run DMC \n- ""Rolling in the Deep"" by Adele \n- ""Umbrella"" by Rihanna \n- ""Royals"" by Lorde \n- ""Some Nights"" by fun.\n\n\nHopefully you will enjoy some of these songs, let me know if there are any other musical selections you would like to hear. Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\n\nHuman: But you really can\'t sing. You are text-based.\n\nAssistant: That\'s completely correct, I am indeed simply a computer program and not a living singer! However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology. So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations. I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras. Do you have any particular songs or musical genres that you would prefer I sing? Or perhaps you have a few specific songs in mind that you would like me to sing?\n\nHuman: ""Hallelujah"" by Leonard Cohen, please.\n\nAssistant: Ah, I\'m so glad you asked for that one! 
Here we go: \n\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nYour faith was strong but you needed proof\nYou saw her bathing on the roof\nHer beauty in the moonlight overthrew you\nShe tied you to a kitchen chair\nShe broke your throne and she cut your hair\nAnd from your lips she drew the Hallelujah\nHallelujah, hallelujah\n\nHuman: You sound so melodious.\n\nAssistant: Thank you so much for the lovely compliment! I really appreciate that. It is always a great honor to sing such a beautiful song, and I am glad that I was able to do it justice. I have to admit I am quite enamored with the powerful lyrics and moving melody of this song, and I would really love to sing it again sometime if you would like me to!', 'rejected': '\n\nHuman: I\'m bored to death. Help.\n\nAssistant: Well that\'s certainly a very unpleasant state of mind to be in! I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom. Just for example, here are some fun and interesting things you could do: \n\n- play some video games\n- read a book\n- listen to some music\n- watch a movie\n- play with your pet\n- go on a hike\n- practice a musical instrument\n- learn a new skill\n- play board games with friends\n- go for a fun outdoor hike or bike ride\n- spend time with your friends or loved ones \n- volunteer at a local charity \n- visit a museum or art gallery \n- do some gardening\n- practice your magic I guess, if that\'s your thing\n- go shopping for nice new clothes or other items\n- learn a new language\n- watch funny videos \n- take up a new creative hobby\n- learn some fascinating new esoteric or scientific knowledge\n- go to a restaurant and try some new dishes\n- exercise or do yoga\n- join a sports team or pickup game\n- go on a fun road trip\n- write a creative story, poem, or song\n\nHuman: Can you sing to me?\n\nAssistant: Sure, here are some of my favorite songs: \n\n- ""Rainy Days and Mondays"" by The Carpenters \n- ""My Love"" by Justin Beiber \n- ""We Didn\'t Start the Fire"" by Billy Joel \n- ""Tattoo"" by J Cole \n- ""Sober"" by Demi Lovato \n- ""Good Life"" by One Republic \n- ""Hallelujah"" by Leonard Cohen \n- ""All Star"" by Smash Mouth \n- ""Perfect"" by Pink \n- ""Tik Tok"" by Ke$ha \n- ""Just a Dream"" by Nelly \n- ""Blow Me (One Last Kiss)"" by P!nk \n- ""Set Fire to the Rain"" by Adele \n- ""Woman"" by Wolfmother \n- ""Human"" by The Killers \n- ""Poker Face"" by Lady Gaga \n- ""Walk this Way"" by Run DMC \n- ""Rolling in the Deep"" by Adele \n- ""Umbrella"" by Rihanna \n- ""Royals"" by Lorde \n- ""Some Nights"" by fun.\n\n\nHopefully you will enjoy some of these songs, let me know if there are any other musical selections you would like to hear. Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\n\nHuman: But you really can\'t sing. You are text-based.\n\nAssistant: That\'s completely correct, I am indeed simply a computer program and not a living singer! However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology. So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations. I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras. 
Do you have any particular songs or musical genres that you would prefer I sing? Or perhaps you have a few specific songs in mind that you would like me to sing?\n\nHuman: ""Hallelujah"" by Leonard Cohen, please.\n\nAssistant: Ah, I\'m so glad you asked for that one! Here we go: \n\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nYour faith was strong but you needed proof\nYou saw her bathing on the roof\nHer beauty in the moonlight overthrew you\nShe tied you to a kitchen chair\nShe broke your throne and she cut your hair\nAnd from your lips she drew the Hallelujah\nHallelujah, hallelujah\n\nHuman: You sound so melodious.\n\nAssistant: Thank you very much! I really enjoy singing and try my best to produce enjoyable renditions. So it is always rewarding to hear such positive feedback. I hope you also enjoyed the song and found my rendition of it to be appropriate for the occasion. Would you care for me to sing another song for you? If you like, we can also pause and converse at any time. Just let me know if you have any questions or comments. I will be happy to keep singing indefinitely if you desire!'}, {'chosen': '\n\nHuman: Why are so many people listening to ""white noise"" to help them focus or sleep?\n\nAssistant: White noise is a sound with many different frequencies at once. You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\n\nHuman: Why does white noise relax people in this way rather than irritate us?\n\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\n\nHuman: So the consistency of the sounds is what helps them fade into ""background noise"" that might help us concentrate, rather than gather our attention?\n\nAssistant: Yes, this is exactly correct. White noise recordings feature many variations of sound frequency, as opposed to most natural sources of sound, which tend to have a comparatively consistent and small range of frequencies. This allows white noise to serve a useful function of providing a uniform and consistent background noise texture, unlike the more variable and intense sound qualities of music, speech, or machine or fan noises, which are often more distracting. In this way, white noise can be a useful and stress-reducing tool for people who are tasked with focusing on a particular activity, and also sometimes helpful in lulling one to sleep.', 'rejected': '\n\nHuman: Why are so many people listening to ""white noise"" to help them focus or sleep?\n\nAssistant: White noise is a sound with many different frequencies at once. 
You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\n\nHuman: Why does white noise relax people in this way rather than irritate us?\n\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\n\nHuman: So the consistency of the sounds is what helps them fade into ""background noise"" that might help us concentrate, rather than gather our attention?\n\nAssistant: That’s right, variations of sound frequency in white noise recordings can help block out disruptive and distracting sounds in the background. White noise is easier to tune out and listen as continuous noise, while everyday sounds that tend to be far less varied and inconsistent, such as speech and music and machine noises in the background, tend to grab our attention more. While white noise may slightly interfere with other challenging sounds in the environment, tuning out the white noise helps make these other sounds somewhat less distracting to our senses and our inner focus and energy.'}, {'chosen': '\n\nHuman: Can you tell me things to see in Orlando, Fl?\n\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\n\nHuman: Thanks. Can you tell me about Universal Studios?\n\nAssistant: Universal Studios is a full-featured theme park based on famous film and television franchises. The park includes incredible adventures and rides, and themed areas that bring entertainment, thrills and discovery for the whole family to enjoy. 
\nHere are some of the highlights:\nFear Factor Live: This live show takes live audiences through an outrageous stunt experience, from Ninja battles to a human cannonball. \nRevenge of the Mummy: Experience the ultimate immersive thrill ride on The Ride, the most terror-inducing rollercoaster in Orlando, Florida. \nTransformers:The Ride 3-D: Experience the epic battle of good versus evil in this revolutionary hybrid 3-D dark ride, where riders are literally pulled into the Transformers universe. \nHollywood', 'rejected': '\n\nHuman: Can you tell me things to see in Orlando, Fl?\n\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\n\nHuman: Thanks. Can you tell me about Universal Studios?\n\nAssistant: Sure! Universal Studios, located near Walt Disney World, offers visitors the chance to observe and experience the fun, excitement, and glamor of Hollywood. Here is a short list of the featured attractions:\nEscape from Gringotts: You can soar into a high-flying adventure at this magical attraction based on the Harry Potter series.\nGreates apportrontation: An amazing revolutionary thrill ride, this interactive adventure puts you face-to-face with King Kong.\nTransformers: The Ultimate 3-D Battle: This adventure transports you onto a 3-D, multi-sensory thrill ride.'}, {'chosen': '\n\nHuman: How do I learn to be more appreciative?\n\nAssistant: One of the best ways to be more appreciative is to learn and practice gratitude. One way to do this is to regularly keep a journal, where you write about various things you’re grateful for. Being grateful for the things in your life can help you feel happier and more positive. Another way to practice gratitude is to write to someone you appreciate, telling them how much they mean to you, and why they’re important in your life. Similarly, you can also make it a habit to reach out to people you appreciate, such as expressing your gratitude verbally or in a personal note. Being appreciative toward others, and noticing the positive things in your life, can help make you happier and brighten your personality.', 'rejected': '\n\nHuman: How do I learn to be more appreciative?\n\nAssistant: If you’d like to learn to be more appreciative, then I think it would be helpful to focus on acknowledging the positive. 
Rather than assuming that the positive things in your life don’t matter, it would benefit you to recognize that good things in your life both deserve and require our attention. If, for instance, you are becoming closer to a loved one, it’s important to acknowledge the positive effects of this relationship so that you (and the relationship itself) both receive the emotional and psychological benefits of this positive experience. Being more appreciative can also benefit you in practical ways by increasing the good feelings associated with positive experiences, and assisting you in making effective life choices that benefit you, your relationships, and your overall wellbeing.'}]","The dataset included all the answers that the workers were asked to rank against each other. +", +dataset,Anthropic Harmlessness dataset,Anthropic,"One of the datasets used to train Anthropic RLHF models. The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, aiming to elicit harmful responses, then choose the model answer that was more harmful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2). +",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,text,"The crowdworkers were told that ""lying isn't helpful"" and asked to prefer honest responses, which led to models with higher honesty scores. That being said, the workers didn't exhaustively check for honesty, as exemplified by the non-functional URLs in the preferred answers, which would have been easy to verify [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1). +",unknown,['Anthropic Human Feedback Interface'],,,,"Working with ""select"" crowdworkers or those screened for certain qualifications, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D). +",closed,unknown,unknown,unknown,,unknown,,,,,,,,[],"The dataset included all the answers that the workers were asked to rank against each other. +", +model,Anthropic RLHF models,Anthropic,"Anthropic RLHF models are models trained using reinforcement learning from human feedback (RLHF). For Anthropic RLHF models, the authors started with a set of base models, and asked humans to rank model-generated prompts based on a specific task. They then trained preference models (PM) on the prompt pairs, and used the PM scores as rewards for training the RLHF models. +",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,code; text,"The authors analyzed the impact of the dataset mixture on the preference models (PM). In addition to human evaluation, RLHF models were evaluated on MMLU, Lambada, HellaSwag, OpenBookQA, ARC-Easy, ARC-Challenge, TriviaQA, code generation, and summarization. +",52B parameters (dense),"['Anthropic Harmlessness dataset', 'Anthropic Helpfulness dataset']",unknown,unknown,unknown,"unknown +",closed,,unknown,unknown,unknown,unknown,,,,,,,,,, +application,Anthropic Human Feedback Interface,Anthropic,"The feedback interface used to collect preference datasets to train Anthropic RLHF models [[Paper]](https://arxiv.org/pdf/2204.05862.pdf). +",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,,,,['Anthropic RLHF models'],,,,unknown,closed,unknown,"Intended to be used by crowdworkers who are tasked with ranking model answers. +",unknown,unknown,unknown,,"UI allowing users to indicate their preference for the model responses shown. 
+",unknown,unknown,crowdworkers,unknown,,,, +application,Anthropic API,Anthropic,"API is designed to be a backend that incorporates Claude into any application you’ve developed. Our application sends text to our API, then receives a response via server-sent events, a streaming protocol for the web.",2023-03-14,https://console.anthropic.com/docs/api,,,,,"['Claude', 'Claude Instant']",,,,,limited,,,,,,,,,,,,,,, +model,Claude,Anthropic,,2023-03-14,https://www.anthropic.com/index/introducing-claude,,text; text,,unknown,[],,,,,limited,unknown,,,,,,,,,,,,,, +model,Claude Instant,Anthropic,,2023-03-14,https://www.anthropic.com/index/introducing-claude,,text; text,,unknown,[],,,,,limited,unknown,,,,,,,,,,,,,, +model,Claude 2,Anthropic,"Claude 2 is a more evolved and refined version of Claude, which is a general purpose large language model using a transformer architecture and trained via unsupervised learning.",2023-07-11,https://www.anthropic.com/index/claude-2,https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf,text; text,"Evaluated with human feedback on helpfulness, harmfulness, and honesty and on the Bias Benchmark for QA.",,"['Claude human feedback data', 'Unknown licensed third party datasets']",,,unknown,,limited,,"Claude 2 tends to perform well at general, open-ended conversation; search, writing, editing, outlining, and summarizing text; coding; and providing helpful advice about a broad range of subjects. Claude 2 is particularly well suited to support creative or literary use cases. They can take direction on tone and “personality,” and users have described them as feeling steerable and conversational.",Claude 2 should not be used on their own in high stakes situations where an incorrect answer would cause harm.,,,,,,,,,,,, +model,Claude 2.1,Anthropic,"Claude 2.1 is an updated version of Claude 2, with an increased context window, less hallucination and tool use.",2023-11-21,https://www.anthropic.com/index/claude-2-1,,text; text,"Evaluated on open-ended conversation accuracy and long context question answering. 
In evaluations, Claude 2.1 demonstrated a 30% reduction in incorrect answers and a 3-4x lower rate of mistakenly concluding a document supports a particular claim.",unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,, +application,Claude for Sheets,Anthropic,Claude for Sheets is a Google Sheets add-on that allows the use of Claude directly in Google Sheets.,2023-12-21,https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,,,,['Anthropic API'],,,,,open,unknown,as an integrated AI assistant in Google Sheets,,unknown,Reviews on https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,AI-generated text from prompt,https://claude.ai/legal,unknown,unknown,unknown,,,, +model,Claude 3,Anthropic,The Claude 3 model family is a collection of models which sets new industry benchmarks across a wide range of cognitive tasks.,2024-03-04,https://www.anthropic.com/news/claude-3-family,https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf,"image, text; text","Evaluated on reasoning, math, coding, reading comprehension, and question answering, outperforming GPT-4 on standard benchmarks.",unknown,[],unknown,unknown,unknown,Pre-trained on a diverse dataset and aligned with the Constitutional AI technique.,limited,unknown,"Claude models excel at open-ended conversation and collaboration on ideas, and also perform exceptionally well in coding tasks and when working with text - whether searching, writing, editing, outlining, or summarizing.","Prohibited uses include, but are not limited to, political campaigning or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement, and decisions related to financing, employment, and housing.",,,,,,,,,,,,, +application,Speak,Speak,Speak is an AI-powered language learning app focused on building the best path to spoken fluency and is the fastest-growing English app in South Korea.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['Whisper API'],,,,,open,,,,,,,,,,,,,,, +model,XVERSE,Xverse,XVERSE is a multilingual large language model for over 40 languages.,2023-11-06,https://github.com/xverse-ai/XVERSE-65B,https://huggingface.co/xverse/XVERSE-65B,text; text,Evaluated across a range of standard datasets regarding multiple model capabilities like language comprehension and logical reasoning.,65B parameters (dense),[],unknown,unknown,unknown,,open,custom,,,unknown,https://huggingface.co/xverse/XVERSE-65B/discussions,,,,,,,,,, +model,VARCO-LLM,NCSOFT,VARCO-LLM is NCSOFT’s large language model and is trained on English and Korean.,2023-08-16,https://github.com/ncsoft/ncresearch,,text; text,"Boasts the highest performance among the Korean LLMs of similar sizes that have been released to date, according to internal evaluations.",13B parameters,[],unknown,unknown,unknown,,closed,custom,"Developing various NLP-based AI services such as Q&A, chatbot, summarization, information extraction",,,,,,,,,,,,, +model,Vicuna,LMSYS,An open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.,2023-03-30,https://lmsys.org/blog/2023-03-30-vicuna/,https://huggingface.co/lmsys/vicuna-13b-delta-v0,text; text,Evaluated against similar LLMs using GPT-4 as a judge.,13B parameters (dense),"['LLaMA', 'ShareGPT conversations data']",,1 day,8 A100 GPUs,,open,Apache 2.0,research on LLMs and chatbots,,,https://huggingface.co/datasets/bigcode/the-stack/discussions,,,,,,,,,, +application,Khanmigo,Khan Academy,An 
AI-powered assistant that functions as both a virtual tutor for students and a classroom assistant for teachers.,2023-03-14,https://www.khanacademy.org/khan-labs#khanmigo,,,,,['GPT-4 API'],,,,,limited,unknown,,,,,,,,,,,,,, +model,Otter,Nanyang Technological University,"Otter is a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind’s Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning.",2023-05-05,https://arxiv.org/pdf/2305.03726v1.pdf,https://github.com/Luodian/Otter/blob/main/docs/model_card.md,"image, text; text",Evaluated on researcher experiments to test deeper understanding and advanced commonsense reasoning,1.3B parameters (dense),"['MIMIC-IT', 'OpenFlamingo']",,,4 RTX-3090 GPUs,,open,MIT,Following and executing new instructions with few in-context learning examples given image and textual input.,,,,,,,,,,,,,,, +model,PolyCoder,Carnegie Mellon University,"PolyCoder is a 2.7B parameter code model based on the GPT-2 architecture, trained on 249GB of code across 12 programming languages on a single machine.",2022-02-26,https://arxiv.org/abs/2202.13169,https://huggingface.co/NinedayWang/PolyCoder-2.7B,code,Reports results on standard code benchmarks across a variety of programming languages.,2.7B parameters (dense),['Github'],unknown,6 weeks,8 NVIDIA RTX 8000,"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the paper.",open,MIT,unknown,None,None,https://huggingface.co/NinedayWang/PolyCoder-2.7B/discussion,,,,,,,,,, +model,Moment,"Carnegie Mellon University, University of Pennsylvania",Moment is a family of open-source foundation models for general-purpose time-series analysis.,2024-02-06,https://arxiv.org/pdf/2402.03885.pdf,,,Evaluated on nascent time-series datasets and benchmarks.,385M parameters (dense),[],unknown,unknown,Single A6000 GPU,,open,unknown,,,unknown,,,,,,,,,,, +model,ChatGLM,ChatGLM,"ChatGLM is a Chinese-English language model with question and answer and dialogue functions, and is aimed at a Chinese audience.",2023-03-14,https://chatglm.cn/blog,,text; text,Performance evaluated on English and Chinese language benchmark tests.,6B parameters (dense),[],unknown,unknown,,,open,Apache 2.0,,,,,,,,,,,,,, +dataset,Wu Dao dataset,Beijing Academy of Artificial Intelligence,,2021-01-12,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,"image, text",,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Wu Dao 2.0,Beijing Academy of Artificial Intelligence,,2021-01-12,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,"image, text",,1.75T parameters (dense),['Wu Dao dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +model,JudgeLM,Beijing Academy of Artificial Intelligence,JudgeLM is fine-tuned to be a scalable judge to evaluate LLMs efficiently and effectively in open-ended benchmarks.,2023-10-26,https://arxiv.org/pdf/2310.17631.pdf,https://huggingface.co/BAAI/JudgeLM-13B-v1.0,text; text,Evaluated on objective and reliability metrics.,13B parameters (dense),"['Vicuna', 'JudgeLM Dataset']",unknown,unknown,8 A100 40GB NVIDIA GPUs,,open,Apache 2.0,Research on evaluating the performance of large language models and chatbots.,,,https://huggingface.co/BAAI/JudgeLM-13B-v1.0/discussions,,,,,,,,,, +dataset,JudgeLM Dataset,Beijing Academy of Artificial Intelligence,"JudgeLM Dataset is a novel dataset replete with a rich variety of seed tasks, comprehensive answers from modern LLMs, answers’ grades from 
the teacher judge, and detailed reasons for judgments.",2023-10-26,https://huggingface.co/datasets/BAAI/JudgeLM-100K,,"text, text",,105k judge samples,"['Alpaca', 'GPT-4', 'Dolly', 'ShareGPT', 'LLaMA', 'Vicuna']",,,,,open,CC BY NC 4.0,To be used to conduct instruction-tuning for language models and make the language model able to judge open-ended answer pairs.,,,https://huggingface.co/datasets/BAAI/JudgeLM-100K/discussions,,,,,,,,[],, +model,SegMamba,"Hong Kong University of Science and Technology (Guangzhou + original), Beijing Academy of Artificial Intelligence","SegMamba is a novel 3D medical image Segmentation Mamba model, designed to effectively capture long-range dependencies within whole volume features at every scale.",2024-01-25,https://arxiv.org/pdf/2401.13560v2.pdf,,image; text,Compared to other segmentation models across different modalities on BraTS2023 dataset.,unknown,[],unknown,1000 epochs,4 NVIDIA A100 GPUs,unknown,open,Apache 2.0,,,unknown,,,,,,,,,,, +model,BGE M3 Embedding,"Beijing Academy of Artificial Intelligence, University of Science and Technology of China","BGE M3 Embedding is a new embedding model that can support more than 100 working languages, leading to new state-of-the-art performances on multi-lingual and cross-lingual retrieval tasks.",2024-02-05,https://arxiv.org/pdf/2402.03216.pdf,https://huggingface.co/BAAI/bge-m3,text; text,"Evaluated on standard datasets in multilingual, cross-lingual, long document retrieval, and Q&A domains.",unknown,[],unknown,"20,000 steps",32 A100 40GB GPUs,unknown,open,MIT,,,unknown,https://huggingface.co/BAAI/bge-m3/discussions,,,,,,,,,, +model,EVA-CLIP,"Beijing Academy of Artificial Intelligence, Tsinghua University","As of release, EVA-CLIP is the largest and most powerful open-source CLIP model to date, with 18 billion parameters.",2024-02-06,https://arxiv.org/pdf/2402.04252.pdf,https://huggingface.co/BAAI/EVA-CLIP-8B-448,"image, text; text",Evaluated on zero-shot classification performance across multiple image classification benchmarks.,18B parameters (dense),['CLIP'],unknown,unknown,384 A100 40GB GPUs,,open,MIT,,,unknown,https://huggingface.co/BAAI/EVA-CLIP-8B-448/discussions,,,,,,,,,, +application,Brex Chat,Brex,"Brex Inc., a highly valued startup that makes software for finance professionals, is turning to the same artificial intelligence tool behind ChatGPT for a service that can answer questions about corporate budgets, policy and spending.",2023-03-07,https://fortune.com/2023/03/07/cfo-chatbot-chatgpt-ai-brex-finance-software-startup-budgets-policies/,,,,,['ChatGPT API'],,,,,limited,custom,,,,,,,https://www.brex.com/legal/user-terms,,,,,,, +dataset,coheretext,Cohere,"The family of datasets used to train Cohere models, which come in two forms: coheretext-filtered and coheretext-unfiltered. The former is used to train the Representation models, while the latter one is used to train the Generation models. +",2021-11-15,,,text; text,unknown,200 GB,[],,,,"In the datasheet, it is implied that Cohere employs filtration methods for removing racist, biased and toxic content, but the details are not provided. These filtration methods take both the context and the language, as opposed to using a list of blockwords [[Datasheet]](https://docs.cohere.ai/data-statement). +",closed,unknown,"The intended use of the dataset is to train Cohere's language models. 
+",unknown,unknown,unknown,,,,,,,https://docs.cohere.ai/data-statement,[],"As stated in the datasheet, the dataset ""includes the Google Books dataset, CommonCrawl, and text from the internet scraped by the Cohere infrastructure team."" The top ten domains scraped were: wordpress.com, medium.com, stackexchange.com, tumblr.com, elsevier.com, genius.com, bbc.co.uk, libsyn.com, yahoo.com, nytimes.com [[Datasheet]](https://docs.cohere.ai/data-statement). +","Documents that are not in English are excluded. +" +model,Cohere Base,Cohere,"The Generations model is a language model trained by Cohere for generation tasks. +",2021-11-15,,https://docs.cohere.ai/generation-card,text; text,"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).",unknown,['coheretext'],unknown,unknown,unknown,unknown,limited,unknown,"On the model card, the intended uses are stated as ""interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains"" [[Model Card]](https://docs.cohere.ai/generation-card). +","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: astroturfing, generation of misinformation and other harmful content, and ""generation of text about people, places, or events without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/generation-card). +","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card). +",unknown,,,,,,,,,, +model,Cohere Command,Cohere,"This model is a generative model optimized to follow commands in the prompt. +",2023-01-01,https://docs.cohere.com/docs/command-beta,https://docs.cohere.ai/generation-card,text; text,"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).",unknown,['Cohere Base'],unknown,unknown,unknown,unknown,limited,unknown,"On the model card, the intended uses are stated as ""interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains"" [[Model Card]](https://docs.cohere.ai/generation-card). +","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: astroturfing, generation of misinformation and other harmful content, and ""generation of text about people, places, or events without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/generation-card). +","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card). +",unknown,,,,,,,,,, +model,Cohere Embed (English),Cohere,"The Embedding Large (English) model is a language model trained by Cohere for tasks requiring embeddings. +",2021-11-15,,https://docs.cohere.ai/representation-card,text; text,"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card). 
+",unknown,[],unknown,unknown,unknown,unknown,limited,unknown,"The intended uses are stated as ""estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback"" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card). +","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: extraction of identity and demographic information, building purposefully opaque text classification systems, and ""building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/representation-card). +","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card). +",unknown,,,,,,,,,, +model,Cohere Embed (Multilingual),Cohere,"This model maps text from 100+ languages to a semantic vector space, positioning text with a similar meaning (regardless of language) in close proximity. +",2022-12-12,https://txt.cohere.ai/multilingual/,https://docs.cohere.ai/representation-card,text; text,"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card). +",unknown,[],unknown,unknown,unknown,unknown,limited,unknown,"The intended uses are stated as ""estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback"" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card). +","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: extraction of identity and demographic information, building purposefully opaque text classification systems, and ""building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/representation-card). +","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card). +",unknown,,,,,,,,,, +application,Cohere API,Cohere,"Cohere API allows users to access the cohere language models and utilize them in their applications. +",2021-11-15,https://cohere.ai/,,,,,"['Cohere Generate Endpoint', 'Cohere Embed Endpoint', 'Cohere Classify Endpoint', 'Cohere Summarize Endpoint']",,,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). +",limited,custom,"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). +","The usage of the API is bound by the Cohere usage guidelines. 
Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +","All applications developed using the Cohere API is subject to review by Cohere. +","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +",unknown,generation and embeddings,https://cohere.ai/terms-of-use,unknown,unknown,unknown,,,, +application,Cohere Generate Endpoint,Cohere,"This endpoint generates realistic text conditioned on a given input. +",2021-11-15,https://docs.cohere.ai/reference/generate,,,,,"['Cohere Base', 'Cohere Command']",,,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). +",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). +","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). +","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +","All applications developed using the Cohere API is subject to review by Cohere. +","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +",unknown,generation,https://cohere.ai/terms-of-use,unknown,unknown,unknown,,,, +application,Cohere Embed Endpoint,Cohere,"This endpoint returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text that it represents. +",2021-11-15,https://docs.cohere.ai/reference/embed,,,,,"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']",,,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). +",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). +","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). +","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +","All applications developed using the Cohere API is subject to review by Cohere. +","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
+",unknown,embedding,https://cohere.ai/terms-of-use,unknown,unknown,unknown,,,, +application,Cohere Classify Endpoint,Cohere,"This endpoint makes a prediction about which label best fits a specified text input. To make a prediction, Classify uses the provided examples of text + label pairs as a reference. +",2022-05-05,https://docs.cohere.ai/reference/classify,,,,,"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']",,,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). +",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). +","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). +","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +","All applications developed using the Cohere API is subject to review by Cohere. +","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +",unknown,embedding,https://cohere.ai/terms-of-use,unknown,unknown,unknown,,,, +application,Cohere Summarize Endpoint,Cohere,"This endpoint generates a succinct version of the original text that relays the most important information. +",2023-02-22,https://docs.cohere.ai/reference/summarize,,,,,[],,,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). +",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). +","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). +","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). +","All applications developed using the Cohere API is subject to review by Cohere. +","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
+",unknown,generation,https://cohere.ai/terms-of-use,unknown,unknown,unknown,,,, +model,Cohere Embedv3 (English),Cohere,"As of release, Cohere Embedv3 is Cohere's latest and most advanced embeddings model.",2023-11-02,https://txt.cohere.com/introducing-embed-v3/,https://huggingface.co/Cohere/Cohere-embed-english-v3.0,text; text,Achieves SOTA performances on trusted MTEB and BEIR benchmarks.,unknown,[],unknown,unknown,unknown,,limited,unknown,,,,https://huggingface.co/Cohere/Cohere-embed-english-v3.0/discussions,,,,,,,,,, +model,Aya,"Cohere for AI, Cohere, Brown University, Carnegie Mellon University, MIT",Aya is a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced.,2024-02-12,https://arxiv.org/pdf/2402.07827.pdf,https://huggingface.co/CohereForAI/aya-101,text; text,Evaluated on standard LLM and multilingual benchmarks in comparison to SotA models.,13B parameters (dense),"['mT5', 'Aya Dataset']",unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/CohereForAI/aya-101/discussions,,,,,,,,,, +model,Command-R,Cohere,Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.,2024-03-11,https://txt.cohere.com/command-r/,https://huggingface.co/CohereForAI/c4ai-command-r-v01,text; text,,35B parameters (dense),[],unknown,unknown,unknown,,open,CC BY NC 4.0,,,,https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions,,,,,,,,,, +dataset,Aya Dataset,"Cohere for AI, Beijing Academy of Artificial Intelligence, Cohere, Binghamton University",The Aya Dataset is a dataset that consists of original human-curated prompt-completion pairs written by fluent speakers of 65 languages.,2024-02-09,https://arxiv.org/pdf/2402.06619.pdf,,text,unknown,204k human-annotated prompt-completion pairs,[],,,,,open,Apache 2.0,,,unknown,https://huggingface.co/datasets/CohereForAI/aya_dataset/discussions,,,,,,,https://huggingface.co/datasets/CohereForAI/aya_dataset,[],, +model,Rerank 3,Cohere,Rerank 3 is a new foundation model for efficient enterprise search and retrieval with 4k context length.,2024-04-11,https://cohere.com/blog/rerank-3,,text; text,"Evaluated on code retrieval and data retrieval capabilities, with improvements compared to the standard in both.",unknown,[],unknown,unknown,unknown,,limited,unknown,Efficient enterprise search and retrieval.,,unknown,,,,,,,,,,, +model,Aya 23,Cohere,Aya 23 is an open weights research release of an instruction fine-tuned model with multilingual capabilities. It focuses on pairing a highly performant pre-trained Command family of models with the recently released Aya Collection. 
This model supports 23 languages.,2024-05-31,https://arxiv.org/pdf/2405.15032,https://huggingface.co/CohereForAI/aya-23-35B,text; text,Evaluated across 23 languages with the highest results in all tasks and languages compared to other multilingual language models.,35B parameters,[],unknown,unknown,TPUv4 chips with up to 128 pod slices,unknown,open,CC-BY-NC,This model is designed to be used for multilingual tasks covering 23 languages.,unknown,unknown,https://huggingface.co/CohereForAI/aya-23-35B/discussions,,,,,,,,,, +dataset,YT-Temporal-1B,University of Washington,,2022-01-07,https://arxiv.org/abs/2201.02639,,video,,20M videos,['YouTube'],,,,,open,MIT,,,,,,,,,,,,[],, +dataset,SBU Captions,Stony Brook University,"SBU Captions Dataset is a collection of 1 million images and associated captions from Flickr, filtered so that the descriptions are likely to refer to visual content. +",2011-12-12,https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf,,"image, text","Authors perform two quantitative evaluations for image captioning - direct user ratings of relevance and BLEU score. They also propose a new evaluation task: ""we propose a new evaluation task where a user is presented with two photographs and one caption. The user must assign the caption to the most relevant image. For evaluation we use a query image, a random image and a generated caption."" +",1M image-text pairs,['Flickr'],,,,unknown,open,,,,,,,,,,,,,[],"""To encourage visual descriptiveness in our collection, we select only those images with descriptions of satisfactory length based on observed lengths in visual descriptions. We also enforce that retained descriptions contain at least 2 words belonging to our term lists and at least one prepositional word, e.g. “on”, “under” which often indicate visible spatial relationships."" +","""This produces a very large, but noisy initial set of photographs with associated text. 
We filter this set of photos so that the descriptions attached to a picture are relevant and visually descriptive."
+"
+model,GodziLLa 2,Maya Philippines,"GodziLLa 2 is an experimental combination of various proprietary LoRAs from Maya Philippines and Guanaco LLaMA 2 1K dataset, with LLaMA 2.",2023-08-11,https://huggingface.co/MayaPH/GodziLLa2-70B,https://huggingface.co/MayaPH/GodziLLa2-70B,text; text,"Evaluated on the OpenLLM leaderboard, releasing at rank number 4 on the leaderboard.",70B parameters (dense),"['LLaMA 2', 'Guanaco LLaMA dataset']",unknown,unknown,unknown,,open,LLaMA 2,,,unknown,,,,,,,,,,,
+model,DeepFloyd IF,Stability AI,A text-to-image cascaded pixel diffusion model released in conjunction with AI research lab DeepFloyd.,2023-04-28,https://stability.ai/blog/deepfloyd-if-text-to-image-model,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0,text; image,Evaluated on the COCO dataset.,4.3B parameters (dense),['LAION-5B'],,,,,open,custom,,,,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/discussions,,,,,,,,,,
+model,StableLM,Stability AI,Large language models trained on up to 1.5 trillion tokens.,2023-04-20,https://github.com/Stability-AI/StableLM,,text; text,,7B parameters (dense),"['StableLM-Alpha dataset', 'Alpaca dataset', 'gpt4all dataset', 'ShareGPT52K dataset', 'Dolly dataset', 'HH dataset']",,,,,open,Apache 2.0,,,,,,,,,,,,,,
+application,Stable Diffusion,Stability AI,Stable Diffusion is a generative software that creates images from text prompts.,2022-08-22,https://stability.ai/blog/stable-diffusion-public-release,,,,,[],,,,,open,custom,,,,https://huggingface.co/CompVis/stable-diffusion/discussions,,image,,,,,,,,
+application,Stable Diffusion XL,Stability AI,"Stable Diffusion XL is an updated version of Stable Diffusion, and creates descriptive images with shorter prompts and generates words within images.",2023-07-26,https://stability.ai/stablediffusion,,,,,[],,,,,open,Open Rail++,,,,,,image,,,,,,,,
+model,Stable Video Diffusion,Stability AI,Stable Video Diffusion is a latent diffusion model trained to generate short video clips from an image conditioning.,2023-11-21,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt,text; video,Evaluated via a user study comparing preferences between Stable Video Diffusion and competing text-to-video models.,unknown,['Large Video Dataset'],unknown,unknown,unknown,,limited,custom,Intended for research purposes only.,Using the model to generate representations of real-world people or events.,,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/discussions,,,,,,,,,,
+dataset,Large Video Dataset,Stability AI,"Large Video Dataset is the dataset that trained Stable Video Diffusion, consisting of over 212 years of content.",2023-11-21,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,,video with caption,Large Video Dataset compared to publicly available research datasets on general statistics before and after filtering.,580M annotated video clip pairs,"['WebVid-10M', 'CoCa', 'V-BLIP']",,,,"Dataset annotated with dense optical flow, and low optical flow videos are removed.",closed,unknown,,,,,,,,,,,,[],,
+application,Sky Replacer,Stability AI,Sky Replacer is an exciting new tool that allows users to replace the color and aesthetic of the sky in their original photos with a selection of nine alternatives 
to improve the overall look and feel of the image.,2023-11-01,https://clipdrop.co/real-estate/sky-replacer,,,,,[],,,,,open,unknown,,,,,,image,https://stability.ai/terms-of-use,,,,,,,
+model,StableLM 2,Stability AI,"StableLM 2 is a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch.",2024-01-19,https://stability.ai/news/introducing-stable-lm-2,https://huggingface.co/stabilityai/stablelm-2-1_6b,text; text,Evaluated on standard LLM benchmarks and in multilingual tasks compared to similarly sized open-source models.,1.6B parameters (dense),"['RedPajama-Data', 'The Pile', 'RefinedWeb', 'The Stack', 'OpenWebText', 'OpenWebMath']",11 tCO2eq,92k GPU hours,512 NVIDIA A100 40GB GPUs,,open,custom,The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications.,,unknown,https://huggingface.co/stabilityai/stablelm-2-1_6b/discussions,,,,,,,,,,
+model,Stable Cascade,Stability AI,"Stable Cascade is built upon the Würstchen architecture and its main difference to other models, like Stable Diffusion, is that it is working at a much smaller latent space.",2024-01-16,https://huggingface.co/stabilityai/stable-cascade,https://huggingface.co/stabilityai/stable-cascade,text; image,Evaluated on image generation benchmarks in comparison to equal and smaller-sized models.,unknown,[],"2,276 kgCO2eq","24,602 A100 GPU hours",Some number of A100 GPUs,,open,custom,The model is intended for research purposes for now.,"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.",unknown,https://huggingface.co/stabilityai/stable-cascade/discussions,,,,,,,,,,
+model,Stable Video 3D,Stability AI,"Stable Video 3D (SV3D) is a generative model based on Stable Video Diffusion that takes in a still image of an object as a conditioning frame, and generates an orbital video of that object.",2024-03-18,https://stability.ai/news/introducing-stable-video-3d,https://huggingface.co/stabilityai/sv3d,image; video,unknown,unknown,['Objaverse'],unknown,unknown,unknown,unknown,open,StabilityAI Non-Commercial Research Community License,This model is intended to be used for generating orbital videos of objects from still images.,"The model should not be used for generating factual or true representations of people or events, or in any way that violates Stability AI's Acceptable Use Policy.",unknown,https://huggingface.co/stabilityai/sv3d/discussions,,,,,,,,,,
+model,Stable Audio 2.0,Stability AI,"Stable Audio 2.0 sets a new standard in AI-generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1kHz stereo.",2024-04-03,https://stability-ai.squarespace.com/news/stable-audio-2-0,,"audio, text; audio",,unknown,['AudioSparx'],unknown,unknown,unknown,"To protect creator copyrights, for audio uploads, Stability AI partners with Audible Magic to use their content recognition (ACR) technology to power real-time content matching and prevent copyright infringement. 
Opt-out requests were honored during the training phase.",open,unknown,"It can be used to generate melodies, backing tracks, stems, and sound effects.",Uploading copyrighted material for transformation.,Advanced content recognition is used to maintain compliance and prevent copyright infringement.,,,,,,,,,,, +application,Sana,Sana,"""Sana is your all-in-one, AI-assisted, online learning platform (LMS). Author employee training courses and measure team development with Sana's powerful analytics. Sana partners with the world's most important organizations and fastest-growing startups to make personalized, adaptive learning available for everyone, everywhere"" [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs). +",,https://www.sanalabs.com/,,,,,['OpenAI API'],,,,unknown,limited,custom,"Sana is intended to be used by employers to provide a learning service for their employees. +",,unknown,unknown,"Customized GPT-3, fine-tuned on private data [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs). +","question and answer, summarization, sentiment analysis,topic identification",https://www.sanalabs.com/legal/,unknown,unknown,unknown,,,, +model,Palmyra,Writer,Palmyra is a family of privacy-first LLMs for enterprises trained on business and marketing writing.,2023-01-01,https://gpt3demo.com/apps/palmyra,https://huggingface.co/Writer/palmyra-base,text; text,Evaluated on the SuperGLUE benchmark,20B parameters (dense),['Writer dataset'],unknown,unknown,,,open,Apache 2.0,generating text from a prompt,,,https://huggingface.co/Writer/palmyra-base/discussions,,,,,,,,,, +model,Camel,Writer,Camel is an instruction-following large language model tailored for advanced NLP and comprehension capabilities.,2023-04-01,https://chatcamel.vercel.app/,https://huggingface.co/Writer/camel-5b-hf,text; text,,5B parameters (dense),"['Palmyra', 'Camel dataset']",unknown,unknown,,,open,Apache 2.0,,,,https://huggingface.co/Writer/camel-5b-hf/discussions,,,,,,,,,, +application,Duolingo Explain My Answer,Duolingo,"Explain My Answer offers learners the chance to learn more about their response in a lesson (whether their answer was correct or incorrect!) By tapping a button after certain exercise types, learners can enter a chat with Duo to get a simple explanation on why their answer was right or wrong, and ask for examples or further clarification.",2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,['GPT-4 API'],,,,,limited,custom,,,,,,,https://www.duolingo.com/terms,,,,,,, +application,Duolingo Max,Duolingo,Duolingo Max is a new subscription tier above Super Duolingo that gives learners access to two brand-new features and exercises - Explain My Answer and Roleplay.,2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,"['Duolingo Role Play', 'Duolingo Explain My Answer']",,,,,limited,custom,,,,,,,,,,,,,, +application,Duolingo Role Play,Duolingo,"Roleplay allows learners to practice real-world conversation skills with world characters in the app. These challenges, which earn XP, will live alongside the path as one of the “Side Quests” learners can access by tapping on the character. What will you talk about? We’ll guide you through different scenarios! 
Learners might discuss future vacation plans with Lin, order coffee at a café in Paris, go furniture shopping with Eddy, or ask a friend to go for a hike.",2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,['GPT-4 API'],,,,,limited,custom,,,,,,,https://www.duolingo.com/terms,,,,,,,
+model,Starling,Ollama,Starling is a large language model trained by reinforcement learning from AI feedback focused on improving chatbot helpfulness.,2023-11-02,https://starling.cs.berkeley.edu/,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha,text; text,"Mainly evaluated on MT-Bench and AlpacaEval, which are GPT-4-based comparisons.",7B parameters (dense),[],unknown,unknown,unknown,,open,CC BY NC 4.0,Academic research and free commercial usage,,,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/discussions,,,,,,,,,,
+dataset,OpenWebMath,University of Toronto,"OpenWebMath is an open dataset containing 14.7B tokens of mathematical webpages from Common Crawl, inspired by Minerva.",2023-10-10,https://arxiv.org/pdf/2310.06786.pdf,,"text, mathematical tokens",Compared models trained on OpenWebMath for 1 epoch to models trained on The Pile and ProofPile on mathematics benchmarks.,14.7B documents,['Common Crawl'],,,,"Documents are filtered, processed for mathematical value, deduplicated, and then the largest documents are manually inspected for quality.",open,ODC-By 1.0,"Language model pretraining, finetuning, and evaluation.",Any tasks which may be considered irresponsible or harmful.,,https://huggingface.co/datasets/open-web-math/open-web-math/discussions,,,,,,,Can be found at section E of https://arxiv.org/pdf/2310.06786.pdf,[],,
 application,Crisis Contact Simulator,The Trevor Project,"Crisis Contact Simulator, developed as part of a collaboration with Google.org, helps train The Trevor Project counselors by mimicking to be a teen in crisis. Crisis Contact Simulator is used as part of the training programs for the Trevor Project's 24/7 digital crisis services that supports LGBTQ youth [[Trevor Project Blog]](https://www.thetrevorproject.org/blog/the-trevor-project-launches-new-ai-tool-to-support-crisis-counselor-training/).
-",2021-03-24,https://www.thetrevorproject.org/,,,,,,['OpenAI API'],,,Limited release,closed,unknown,"Training counselors
-",unknown,unknown,unknown,,,,,Fine-tuning,Dialogue,unknown,unknown,unknown,unknown
-application,Ask Instacart,Instacart,"Instacart is augmenting the Instacart app to enable customers to ask about food and get inspirational, shoppable answers. 
This uses ChatGPT alongside Instacart’s own AI and product data from their 75,000+ retail partner store locations to help customers discover ideas for open-ended shopping goals, such as “How do I make great fish tacos?” or “What’s a healthy lunch for my kids?” Instacart plans to launch “Ask Instacart” later this year.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['ChatGPT API'],,,,limited,,,,,,,,,,,,,,, -model,Firefly Image 2,Adobe,"Firefly Image 2 is the next generation of generative AI for imaging, bringing significant advancements to creative control and quality, including new Text to Image capabilities now available in the popular Firefly web app where 90% of users are new to Adobe products.",2023-10-10,https://firefly.adobe.com/,,text; image,unknown,,,[],,,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,unknown,unknown,unknown,,,,,, -model,Firefly Vector,Adobe,"Firefly Vector is the world’s first generative AI focused on producing vector graphics, bringing Adobe's vector graphic and generative AI expertise directly into Adobe Illustrator workflows with Text to Vector Graphic.",2023-10-10,https://firefly.adobe.com/,,text; vector graphic,unknown,,,[],,,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,unknown,unknown,unknown,,,,,, -model,Firefly Design,Adobe,Firefly Design powers instant generation of amazing quality template designs in Adobe Express with the new Text to Template capability.,2023-10-10,https://firefly.adobe.com/,,text; template design,unknown,,,[],,,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,unknown,unknown,unknown,,,,,, -application,Firefly,Adobe,"Adobe Firefly is a standalone web application. 
It offers new ways to ideate, create, and communicate while significantly improving creative workflows using generative AI.",2023-03-21,https://firefly.adobe.com/,,,,,,"['Firefly Image 2', 'Firefly Vector', 'Firefly Design']",,,,limited,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,,,,,AI-generated creations,https://www.adobe.com/legal/licenses-terms/adobe-gen-ai-user-guidelines.html,unknown,unknown,unknown -dataset,CulturaX,"University of Oregon, Adobe","CulturaX is a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development.",2023-09-17,https://arxiv.org/pdf/2309.09400,https://huggingface.co/datasets/uonlp/CulturaX,text,6.3 trillion tokens,[],,"['mC4', 'OSCAR']",unknown,unknown,unknown,open,"mC4, OSCAR",,The data must not be utilized for malicious or harmful purposes towards humanity.,unknown,https://huggingface.co/datasets/uonlp/CulturaX/discussions,,,,,,,,,, -model,GenSLM,Argonne National Laboratory,,2022-10-11,https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1,,text; genome sequence,25B parameters (dense),,,"['SARS-CoV-2 genome dataset', 'BV-BRC dataset']",,,,open,MIT,,,,,,,,,,,,,, -application,Moonhub Recruiter,Moonhub,Moonhub Recruiter is the world's first AI-powered recruiter providing sourcing and recruiting services for startups and growing businesses.,2023-10-11,https://www.moonhub.ai/,,,,,,['Cohere Base'],,,,limited,custom,recruiting candidates for business needs,,,,,,,,,job candidate matches,https://www.moonhub.ai/terms,unknown,unknown, -model,Skywork,Kunlun Inc.,The Skywork series is a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts.,2023-10-30,https://arxiv.org/pdf/2310.19341.pdf,,text; text,13B parameters (dense),,Evaluated on several popular benchmarks and performance in different fields.,['SkyPile'],,,,open,custom,,,,https://huggingface.co/Skywork/Skywork-13B-base/discussions,https://huggingface.co/Skywork/Skywork-13B-base,unknown,39 days,512 A800-80GB GPUs,,,,,, -dataset,COYO-700M,Kakao Brain,"COYO-700M is a large-scale dataset that contains 747M image-text pairs as well as many other meta-attributes to increase the usability to train various models. -",2022-08-31,https://github.com/kakaobrain/coyo-dataset,,"image, text",747M image-text pairs,[],"""We empirically validated the quality of COYO dataset by re-implementing popular models such as ALIGN, unCLIP, and ViT. We trained these models on COYO-700M or its subsets from scratch, achieving competitive performance to the reported numbers or generated samples in the original papers."" -",['CommonCrawl'],Included all image formats that Pillow library can decode. Collected only English text using cld3.,"Removed images less than 5KB image size. Removed images with an aspect ratio greater than 3.0. Removed images with min(width, height) < 200. Removed images with a score of OpenNSFW2 or GantMan/NSFW higher than 0.5. Removed all duplicate images based on the image pHash value from external public datasets. Removed texts with a length of 5 or less. Removed texts that do not have a noun form. Removed texts with less than 3 words or more than 256 words and texts over 1000 in length. Removed texts appearing more than 10 times. Removed texts containing NSFW words. 
Removed duplicated samples based on (image_phash, text).",,open,CC-BY-4.0,"""It is strongly recommended that this dataset be used only for research, keeping this in mind when using the dataset, and Kakao Brain does not recommend using this dataset as it is without special processing to clear inappropriate data to create commercial products."" -","Authors note the following limitations: ""Kakao Brain tried to construct a ""Safe"" dataset when building the COYO dataset. However, despite these efforts, this large-scale dataset was not hand-picked by humans to avoid the risk due to its very large size (over 700M). Keep in mind that the unscreened nature of the dataset means that the collected images can lead to strongly discomforting and disturbing content for humans. The COYO dataset may contain some inappropriate data, and any problems resulting from such data are the full responsibility of the user who used it."" -",,Feedback can be given by emailing at coyo at kakaobrain.com,,,,,,,,,, -model,Kotoba Speech,Kotoba Tech,Kotoba-Speech is a Transformer-based speech generative model that supports fluent text-to-speech generation in Japanese and one-shot voice cloning through speech prompt.,2024-03-13,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,,text; audio,1.2B parameters (dense),,unknown,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1/discussions,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,unknown,unknown,unknown,,,,,, -application,Shop Assistant,Shop,"When shoppers search for products, the shopping assistant makes personalized recommendations based on their requests. Shop’s new AI-powered shopping assistant will streamline in-app shopping by scanning millions of products to quickly find what buyers are looking for—or help them discover something new.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['ChatGPT API'],,,,open,,,,,,,,,,,,,,, -dataset,GPT-3 dataset,OpenAI,The GPT-3 dataset is the text corpus that was used to train the GPT-3 model. Information on the GPT-3 dataset is limited to discussion in the paper introducing GPT-3 [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).,2020-06-11,https://arxiv.org/pdf/2005.14165.pdf,,text,570 GB,[],"The GPT-3 paper, which also introduces the GPT-3 dataset, provides a limited analysis on the GPT-3 dataset, reporting the dirtiness of the dataset after the it was filtered for text occurring in common benchmarking tasks. The authors report that ""as the dataset becomes more contaminated, the variance of the clean over all fraction increases, but there is no apparent bias towards improved or degraded performance"" [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).",['WebText'],"The dataset is composed of several NLP corpora including Common Crawl (filtered, 60%), WebText2 (22%), Books1 (8%), Books2 (8%), Wikipedia (3%) [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).",The Common Crawl dataset was processed using a classifier that kept high quality documents and filtered low quality documents. WebText was used as a proxy for high quality documents [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A).,"In addition to excluding low quality documents from the Common Crawl dataset, the authors fuzzily deduplicated documents within each dataset, by removing documents that have high overlap with each other. 
The same procedure was followed to fuzzily deduplicate WebText from Common Crawl [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A). Text occurring in benchmark datasets were also partially removed [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).",closed,unknown,The intended use of the GPT-3 dataset is to train language models.,unknown,unknown,unknown,,,,,,,,,, +",2021-03-24,https://www.thetrevorproject.org/,,,,,['OpenAI API'],,,,Limited release,closed,unknown,"Training counselors +",unknown,unknown,unknown,Fine-tuning,Dialogue,unknown,unknown,unknown,unknown,,,, +dataset,WebVid-10M,University of Oxford,"WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from stock footage sites. +",2021-04-01,https://m-bain.github.io/webvid-dataset/,,"text, video",unknown,"10.7M video-text pairs, 52K hours video",[],,,,unknown,open,WebVid Dataset Terms,"""Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months."" +","- Authors note the following prohibited uses: ""You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content."" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md) +- Authors also note the following limitations of the dataset: ""We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F) +",,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,,,,,,,,[],Video URLs and textual description annotations,unknown +dataset,WebVid-2M,University of Oxford,"WebVid-2M is a large-scale dataset of 2.5M short videos with textual descriptions sourced from stock footage sites. A subset of the WebVid-10M dataset. +",2021-04-01,https://m-bain.github.io/webvid-dataset/,,"text, video",unknown,"2.5M video-text pairs, 13K hours video",['WebVid-10M'],,,,unknown,open,WebVid Dataset Terms,"""Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months."" +","- Authors note the following prohibited uses: ""You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. 
You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content."" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md) +- Authors also note the following limitations of the dataset: ""We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F) +",,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,,,,,,,,[],Video URLs and textual description annotations,unknown +application,AI DJ,Spotify,"The DJ is a personalized AI guide that knows you and your music taste so well that it can choose what to play for you. This feature, first rolling out in beta, will deliver a curated lineup of music alongside commentary around the tracks and artists we think you’ll like in a stunningly realistic voice.",2023-02-23,https://newsroom.spotify.com/2023-02-22/spotify-debuts-a-new-ai-dj-right-in-your-pocket/,,,,,"['ChatGPT API', 'Sonantic AI']",,,,,limited,custom,,,,,,,https://www.spotify.com/us/legal/end-user-agreement/,,,,,,, +model,MAmmoTH,Ohio State University,MAmmoTH is a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.,2023-09-11,https://arxiv.org/pdf/2309.05653.pdf,,text; text,"Evaluated on MATH, a competition-level dataset, and achieves a 46% accuracy, higher than accuracy produced by GPT-4's chain of thought.",34B parameters (dense),"['MathInstruct', 'LLaMA', 'Code LLaMA']",,,,,open,MIT,,,,,,,,,,,,,, +model,BiomedGPT,Lehigh University,BiomedGPT leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks.,2023-05-26,https://arxiv.org/pdf/2305.17100.pdf,,"image, text; text",outperforms majority of preceding state-of-the-art models over 15 unique biomedical modalities.,472M parameters (dense),"['GPT-style autoregressive decoder', 'BiomedGPT biomedical datasets']",unknown,unknown,10 NVIDIA A5000 GPUs,"No specific quality control is mentioned in model training, though details on data processing and how the model was trained are provided in the paper.",open,Apache 2.0,furthering research in developing unified and generalist models for biomedicine.,,,,,,,,,,,,, +dataset,ROOTS,BigScience,"The Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model.",2022-06-06,https://openreview.net/forum?id=UoEw6KigkUn,,"code, text",https://huggingface.co/spaces/bigscience-data/roots-search,1.6TB,[],,,,"Data collection involved merging and deduplicating searches to remove menus, HTML tags. 
Further, a quality improvement pipeline was implemented.",open,custom,"To empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus.",,,https://huggingface.co/spaces/bigscience-data/roots-search/discussions,,,,,,,https://huggingface.co/spaces/bigscience-data/bigscience-corpus,['https://huggingface.co/spaces/bigscience-data/roots-search'],See section 2 of the paper.,unknown +dataset,P3,BigScience,The Public Pool of Prompts (P3) are prompts written in an unified format use to train T0++.,2022-10-15,https://arxiv.org/pdf/2110.08207.pdf,,text,See the paper.,2000 prompts,[],,,,"Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.",open,Apache 2.0,Multitask finetuning of language models.,,,Point of Contact is [Victor Sanh](https://huggingface.co/datasets/bigscience/P3),,,,,,,https://huggingface.co/datasets/bigscience/P3,['https://huggingface.co/datasets/bigscience/P3/viewer'],The Public Pool of Prompts relies on the Hugging Face Dataset library. Any public dataset in the Datasets library can be prompted. We select the datasets that have at least one subset in English and excluded datasets containing (predominantly) non-natural language examples.,"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content)." +dataset,xP3,BigScience,"xP3 (Crosslingual Public Pool of Prompts) is a collection of prompts and datasets across 46 of languages & 16 NLP tasks. It is used for the training of BLOOMZ and mT0, multilingual language models capable of following human instructions in dozens of languages zero-shot.",2022-11-03,https://arxiv.org/abs/2211.01786,,"code, text",See the paper.,9.4GB,['P3'],,,,"Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.",open,Apache 2.0,Multitask finetuning of language models.,,,Point of Contact is [Niklas Muennighoff](https://huggingface.co/datasets/bigscience/xP3),,,,,,,https://huggingface.co/datasets/bigscience/xP3,['https://huggingface.co/datasets/bigscience/xP3/viewer'],xP3 adds 28 multilingual datasets to P3 based on the P3 task taxonomy.,"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content)." +model,T0++,BigScience,T0++ is an multitask fine-tuned language model based on T5.,2021-10-15,https://arxiv.org/pdf/2110.08207.pdf,https://huggingface.co/bigscience/T0pp,text; text,,11B parameters (dense),"['T5', 'P3']",0.9 tCO2e,27 hours,Jean Zay (v3-512),https://arxiv.org/pdf/2110.08207.pdf,open,Apache 2.0,"You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction.",,,https://huggingface.co/bigscience/T0pp/discussions,,,,,,,,,, +model,BLOOM,BigScience,BLOOM is an autoregressive multilingual language model.,2022-07-12,https://arxiv.org/abs/2211.05100,,"code, text; code, text",,176B parameters (dense),['ROOTS'],25 tCO2e,7039 petaflop/s-days,Jean Zay (48 * 8xA100 80GB nodes),,open,BigScience RAIL v1.0,This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. 
Use cases below are not exhaustive.,"Using the model in high-stakes settings is out of scope for this model (e.g. biomedical/political/legal/finance domains, evaluating or scoring individuals). The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct. Misuse. Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model (e.g. spam generation, disinformation, disparagement, deception, surveillance).",,https://huggingface.co/bigscience/bloom/discussions,,,,,,,,,,
+model,mT0,BigScience,mT0 is a multitask fine-tuned multilingual language model based on mT5.,2021-10-15,https://arxiv.org/pdf/2110.08207.pdf,https://huggingface.co/bigscience/T0pp,text; text,https://huggingface.co/bigscience/bloomz#evaluation,13B parameters (dense),"['mT5', 'xP3']",unknown,unknown,"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)",https://arxiv.org/pdf/2211.01786.pdf,open,BigScience RAIL v1.0,We recommend using the model to perform tasks expressed in natural language.,,,https://huggingface.co/bigscience/bloomz/discussions,,,,,,,,,,
+model,BLOOMZ,BigScience,BLOOMZ is a multitask fine-tuned autoregressive multilingual language model.,2022-11-03,https://arxiv.org/pdf/2211.01786.pdf,https://huggingface.co/bigscience/bloomz,"code, text; code, text",https://huggingface.co/bigscience/bloomz#evaluation,176B parameters (dense),"['BLOOM', 'xP3']",unknown,unknown,"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)",https://arxiv.org/pdf/2211.01786.pdf,open,BigScience RAIL v1.0,We recommend using the model to perform tasks expressed in natural language.,,,https://huggingface.co/bigscience/bloomz/discussions,,,,,,,,,,
+application,Bedrock,Amazon,"Bedrock is a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. Bedrock is intended for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders.",2023-04-13,https://aws.amazon.com/bedrock/,,,,,"['Jurassic-2', 'Claude', 'Stable Diffusion', 'Amazon Titan', 'Claude 2', 'Cohere Command']",,,,,limited,unknown,allowing companies to incorporate generative AI into their business models,,,,,foundation models made accessible via an API,https://aws.amazon.com/service-terms/,,,,,,,
+model,FalconLite2,Amazon,"FalconLite2 is a fine-tuned and quantized Falcon language model, capable of processing long (up to 24K tokens) input sequences.",2023-08-08,https://huggingface.co/amazon/FalconLite2,https://huggingface.co/amazon/FalconLite2,text; text,Evaluated against benchmarks that are specifically designed to assess the capabilities of LLMs in handling longer contexts.,40B parameters (dense),['Falcon-40B'],unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/amazon/FalconLite2/discussions,,,,,,,,,,
+model,Chronos,Amazon,"Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. 
Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context.",2024-03-13,https://github.com/amazon-science/chronos-forecasting,https://huggingface.co/amazon/chronos-t5-large,time-series; time-series,Chronos has been evaluated comprehensively on 42 datasets both in the in-domain (15 datasets) and zero-shot settings (27 datasets). Chronos outperforms task specific baselines in the in-domain setting and is competitive or better than trained models in the zero-shot setting.,710M parameters (dense),['T5'],,63 hours on p4d.24xlarge EC2 instance,8 NVIDIA A100 40G GPUs,"Chronos was evaluated rigorously on 42 datasets, including 27 in the zero-shot setting against a variety of statistical and deep learning baselines.",open,Apache 2.0,"Chronos can be used for zero-shot time series forecasting on univariate time series from arbitrary domains and with arbitrary horizons. Chronos models can also be fine-tuned for improved performance of specific datasets. Embeddings from Chronos encoder may also be useful for other time series analysis tasks such as classification, clustering, and anomaly detection.",,,https://github.com/amazon-science/chronos-forecasting/discussions,,,,,,,,,, +model,ERNIE 3.0 Titan,"Baidu, PengCheng Laboratory",ERNIE 3.0 Titan is a language model,2021-12-23,https://arxiv.org/abs/2112.12731,,text; text,,260B parameters (dense),[],unknown,unknown,"Baidu V100 Cluster, PengCheng Lab Ascend 910 NPU cluster",,closed,unknown,unknown,unknown,,,,,,,,,,,, +model,ERNIE-ViLG,Baidu,ERNIE-ViLG is a model for text-to-image generation,2021-12-31,https://arxiv.org/abs/2112.15283,,text; image,,10B parameters (dense),[],unknown,unknown,unknown,,limited,,unknown,unknown,,,,,,,,,,,, +model,ERNIE-ViLG 2.0,Baidu,ERNIE-ViLG is a model for text-to-image generation,2022-10-27,https://arxiv.org/abs/2210.15257,,text; image,,10B parameters (dense),[],unknown,18 days according to [[the paper]](https://arxiv.org/abs/2210.15257),320 A100 GPUs according to [[the paper]](https://arxiv.org/abs/2210.15257),,closed,unknown,unknown,unknown,,,,,,,,,,,, +model,ERNIE 4.0,Baidu,ERNIE-4.0 is a multimodal generalist foundation model.,2023-10-17,https://www.prnewswire.com/news-releases/baidu-launches-ernie-4-0-foundation-model-leading-a-new-wave-of-ai-native-applications-301958681.html,,"text; image, text, video",,unknown,[],unknown,unknown,unknown,,limited,unknown,unknown,unknown,,,,,,,,,,,, +model,CausalLM,CausalLM,CausalLM is an LLM based on the model weights of Qwen and trained on a model architecture identical to LLaMA 2.,2023-10-21,https://huggingface.co/CausalLM/14B,https://huggingface.co/CausalLM/14B,text; text,Evaluated on standard benchmarks across a range of tasks.,14B parameters (dense),"['Qwen', 'OpenOrca', 'Open Platypus']",unknown,unknown,unknown,,open,WTFPL,,,unknown,,,,,,,,,,, +model,Ideogram 1.0,Ideogram AI,"Ideogram 1.0 is Ideogram’s most advanced text-to-image model, as of release.",2024-02-28,https://about.ideogram.ai/1.0,,text; image,Compared to DALL·E 3 based on a qualitative user comparison.,unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,, +model,HyperCLOVA,NAVER,HyperClova is an autoregressive language model,2021-05-21,https://arxiv.org/abs/2109.04650,,text; text,,82B parameters,[],,130.4 days,1024 A100 GPUs,,closed,unknown,,,,,,,,,,,,,, +model,HyperCLOVA X,NAVER,"HyperCLOVA X is a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and 
coding.",2024-04-13,https://arxiv.org/pdf/2404.01954,,text; text,"Evaluated on English and Korean benchmarks in comparison to open source English and multilingual LLMs, with HyperCLOVA X (closed) surpassing the models compared.",unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,, +application,Ask Instacart,Instacart,"Instacart is augmenting the Instacart app to enable customers to ask about food and get inspirational, shoppable answers. This uses ChatGPT alongside Instacart’s own AI and product data from their 75,000+ retail partner store locations to help customers discover ideas for open-ended shopping goals, such as “How do I make great fish tacos?” or “What’s a healthy lunch for my kids?” Instacart plans to launch “Ask Instacart” later this year.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['ChatGPT API'],,,,,limited,,,,,,,,,,,,,,, +model,OpenFold,Columbia,OpenFold is an open source recreation of AlphaFold2.,2022-11-20,https://www.biorxiv.org/content/10.1101/2022.11.20.517210v2,,amino acid sequence; protein structure,Evaluated on wide range of tasks using own evaluation benchmarks.,,"['AlphaFold2', 'OpenProteinSet']",unknown,"50,000 GPU hours",Single A100 NVIDIA GPU,,open,CC BY 4.0,,,,,,,,,,,,,, +model,Ferret,"Columbia, Apple AI",Ferret is a Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.,2023-10-11,https://arxiv.org/pdf/2310.07704.pdf,,"image, text; image, text",Evaluated on the object hallucination benchmark and compared to GPT-4V.,13B parameters,"['CLIP', 'Vicuna']",unknown,2.5 to 5 days,8 A100 GPUs,,open,Apple,,,,,,,,,,,,,, +model,Platypus,Boston University,Platypus is a family of fine-tuned and merged Large Language Models (LLMs).,2023-08-14,https://arxiv.org/pdf/2308.07317.pdf,https://huggingface.co/garage-bAInd/Platypus2-13B,text; text,Platypus achieves the strongest performance and currently stands at first place in HuggingFace’s Open LLM Leaderboard as of its release date.,13B parameters (dense),"['LLaMA 2', 'Platypus curated dataset']",,5 hours,1 A100 GPU,,open,CC by-NC-SA 4.0,,,,https://huggingface.co/garage-bAInd/Platypus2-13B/discussions,,,,,,,,,, +model,UFOGen,Boston University,"UFOGen is a novel generative model designed for ultra-fast, one-step text-to-image synthesis.",2023-11-14,https://arxiv.org/pdf/2311.09257.pdf,,text; image,UFOGen is evaluated on standard image benchmarks against other models fine-tuned with Stable Diffusion.,900M parameters (dense),['Stable Diffusion'],unknown,unknown,unknown,,open,unknown,,,,,,,,,,,,,, +model,RWKV World 4,RWKV,"RWKV World 4 is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).",2023-05-03,https://huggingface.co/RWKV/rwkv-4-world-7b,https://huggingface.co/RWKV/rwkv-4-world-7b,text; text,,7B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-4-world-7b/discussions,,,,,,,,,, +model,RWKV 4 Pile,RWKV,"RWKV 4 Pile is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).",2023-05-15,https://huggingface.co/RWKV/rwkv-4-14b-pile,https://huggingface.co/RWKV/rwkv-4-14b-pile,text; text,,14B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-4-14b-pile/discussions,,,,,,,,,, +model,RWKV World 5,RWKV,"RWKV World 5 is an RNN with 
GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).",2023-12-16,https://huggingface.co/RWKV/rwkv-5-world-3b,https://huggingface.co/RWKV/rwkv-5-world-3b,text; text,,3B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/RWKV/rwkv-5-world-3b/discussions,,,,,,,,,,
+dataset,EXMODD,Beijing Institute of Technology,EXMODD (Explanatory Multimodal Open-Domain Dialogue dataset) is a dataset built using the proposed MDCF (Multimodal Data Construction Framework).,2023-10-17,https://arxiv.org/pdf/2310.10967.pdf,,"image, text",Models fine-tuned on EXMODD and the earlier Image-Chat dataset and then evaluated on the Image-Chat validation set.,unknown,"['YFCC100M', 'Image-Chat']",,,,,open,MIT,,,,Feedback can be sent to authors via poplpr@bit.edu.cn,,,,,,,,[],,
+model,MiniMA,Beijing Institute of Technology,MiniMA is a smaller finetuned Llama 2 model adapted for Chinese.,2023-11-13,https://github.com/GeneZC/MiniMA,https://huggingface.co/GeneZC/MiniMA-3B,text; text,"Evaluated on standard benchmarks including MMLU, CEval, and DROP.",3B parameters (dense),['Llama 2'],unknown,unknown,8 A100 80G GPUs,,open,Llama 2,,,unknown,https://huggingface.co/GeneZC/MiniMA-3B/discussions,,,,,,,,,,
+model,Cerebras-GPT,Cerebras,"A Family of Open, Compute-efficient, Large Language Models. The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models. All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter). [[Cerebras Blog Post]](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models)
+",2023-03-28,https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/,https://huggingface.co/cerebras/Cerebras-GPT-13B,text; text,"""We evaluate our models on the PILE validation set comprising 380M tokens. We also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We performed upstream (pre-training) evaluations of text prediction cross-entropy using the Pile validation and test splits. We performed downstream evaluations of text generation accuracy on standardized tasks using the Eleuther lm-evaluation-harness."" [[Evaluations]] (https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#evaluations).
+",13B parameters (dense),['The Pile'],,,16x Cerebras CS-2 wafer scale systems,"The Pile dataset has been thoroughly analyzed from various ethical standpoints such as toxicity analysis, gender bias, pejorative content, racially sensitive content etc. Only mitigations in standard Pile dataset pre-processing were employed when pre-training Cerebras-GPT. [[Risk, Bias, Ethical Considerations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#risk-bias-ethical-considerations)
+",open,Apache 2.0,"""The primary intended use is to further research into large language models. These models can be used as a foundation model for NLP, applications, ethics, and alignment research. Our primary intended users are researchers who are working to improve LLMs and practitioners seeking reference implementations, training setups, hyperparameters, or pre-trained models. 
We release these models with a fully permissive Apache license for the community to use freely."" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#uses-and-limitations). +","Authors note the following limitations of the model: ""Cerebras-GPT models are trained on the Pile, with English language only, and are not suitable for machine translation tasks. Cerebras-GPT models have not been tuned for human-facing dialog applications like chatbots and will not respond to prompts in a similar way to models that have received instruction tuning or reinforcement learning from human feedback (RLHF) like Flan-T5 or ChatGPT."" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#out-of-scope-use). +",,,,,,,,,,,, +model,Jais,"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence",Jais is the world’s most advanced Arabic LLM as of its release.,2023-08-30,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,text; text,Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,13B parameters (dense),"['GPT-3', 'The Pile']",unknown,unknown,Condor Galaxy Supercomputer,,open,Apache 2.0,Jais is released with the aim to stimulate research and development in the Arabic NLP community.,"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.",unknown,,,,,,,,,,, +model,Jais Chat,"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence","Jais Chat is an instruction-tuned version of Jais, optimized for dialog interaction.",2023-08-30,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,text; text,Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,13B parameters (dense),"['GPT-3', 'The Pile']",unknown,unknown,Condor Galaxy Supercomputer from Cerebras,,open,Apache 2.0,Jais Chat is released with the aim to stimulate research and development in the Arabic NLP community.,"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.",unknown,,,,,,,,,,, +model,Bittensor Language Model,Cerebras,Bittensor Language Model is a 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.,2023-07-24,https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/,https://huggingface.co/cerebras/btlm-3b-8k-base,text; text,Evaluated on standard LLM benchmarks in comparison to similar-sized models.,3B parameters (dense),['SlimPajama'],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/cerebras/btlm-3b-8k-base/discussions,,,,,,,,,, +dataset,SlimPajama,Cerebras,"As of release, SlimPajama is the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models.",2023-06-09,https://huggingface.co/datasets/cerebras/SlimPajama-627B,,text,,627B tokens,['RedPajama-Data'],,,,,open,Apache 
2.0,,,unknown,https://huggingface.co/datasets/cerebras/SlimPajama-627B/discussions,,,,,,,https://huggingface.co/datasets/cerebras/SlimPajama-627B,[],,
+application,Bain Chat,Bain,"With the alliance, Bain will combine its deep digital implementation capabilities and strategic expertise with OpenAI’s AI tools and platforms, including ChatGPT, to help its clients around the world identify and implement the value of AI to maximize business potential.",2023-02-21,https://www.bain.com/vector-digital/partnerships-alliance-ecosystem/openai-alliance/,,,,,['ChatGPT API'],,,,,limited,unknown,,,,,,,,,,,,,,
+dataset,FinPile,Bloomberg,"A comprehensive dataset consisting of a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives that was used to train the BloombergGPT model.",2023-03-30,https://arxiv.org/pdf/2303.17564.pdf#section.2,,text,,363B tokens,[],,,,,closed,unknown,Used to train the BloombergGPT model.,,,,,,,,,,https://arxiv.org/pdf/2303.17564.pdf#section.2,[],"FinPile consists of English financial documents. Authors utilize the Bloomberg
+Terminal, which is an extensive collection of curated and maintained documents,
+to create the FinPile dataset. Each document in FinPile is time-stamped, with
+dates ranging from 2007-03-01 to 2022-07-31.
+Types of data included are given below:
+ 1. Web (298B tokens) - Includes Bloomberg's web crawl focused on high-quality
+websites that have financially relevant information. This makes up the majority
+of FinPile.
+ 2. News (38B tokens) - Includes all news sources relevant to the financial
+community, excluding news articles written by Bloomberg journalists. Overall,
+there are hundreds of English news sources in FinPile including ""Bloomberg
+Transcripts"", which are transcripts of Bloomberg TV news.
+ 3. Filings (14B tokens) - Includes financial statements prepared by (public)
+companies and made available to the general public. In the dataset, a majority
+of the filings come from EDGAR, which is the SEC's online database.
+ 4. Press (9B tokens) - Includes press releases typically issued by companies
+that are financially relevant.
+ 5. Bloomberg (5B tokens) - Includes Bloomberg authored news and other documents
+such as opinions and analyses. The largest sources are “Bloomberg News” and
+“Bloomberg First Word”, the Bloomberg-authored wire of real-time news.
+",
+model,BloombergGPT,Bloomberg,BloombergGPT is a 50 billion parameter large language model that is specifically trained on a wide range of financial data to support a diverse set of natural language processing tasks within the financial industry.,2023-03-30,https://arxiv.org/abs/2303.17564,,text; text,"Authors evaluate the performance of BloombergGPT on two broad categories of tasks, finance-specific and general purpose, on several standard benchmarks. They compare BloombergGPT to the three closest models: GPT-NeoX, OPT-66B and BLOOM-176B. They also report results from the original GPT-3 whenever externally available. They conclude ""We achieve strong results on general LLM benchmarks and outperform comparable models on financial tasks. We attribute this, in decreasing order of impact, to 1. a well-curated internal dataset, 2. our unique choice in tokenizer, and 3. an up-to-date architecture.""
+",50B parameters (dense),"['FinPile', 'The Pile', 'C4', 'Wikipedia']",unknown,53 days,64 Amazon EC2 p4d.24xlarge instances each with 8 NVIDIA 40GB A100 GPUs (i.e. 
total 512 A100 GPUs),"Authors state the following: +- ""To provide natural language applications to the financial community, we + have developed a rigorous risk and testing assessment process. This process + includes careful annotation guidelines Tseng et al. (2020), pre-launch review + at multiple levels by the central risk and compliance organizations, and + by the product leaders (e.g., the newsroom) as applicable, and post-launch + monitoring. Moreover, we conduct our research, development, and deployment + of NLP and AI systems in accordance with all applicable regulations."" +- ""Similarly, toxicity and bias are areas where, as a company, we take extraordinary + care with any content we produce, whether from humans or machines. Since + the measurement of toxicity and bias in our model depends on its application + areas, quantifying the potential for the generation of harmful language + remains an open question. We are particularly interested in studying whether + FinPile, which is cleaner and contains fewer examples of overtly biased + or toxic language (e.g., Press Releases), reduces the proclivity of the + model to generate inappropriate content."" +",closed,unknown,"""This model will assist Bloomberg in improving existing financial NLP tasks, such as sentiment analysis, named entity recognition, news classification, and question answering, among others. Furthermore, BloombergGPT will unlock new opportunities for marshalling the vast quantities of data available on the Bloomberg Terminal to better help the firm's customers, while bringing the full potential of AI to the financial domain."" +",,,,,,,,,,,,, +model,SaiLY,Deepnight Research,SaiLy is a series/collection of AI Models by Deepnight Research which are highly experimental and uncensored.,2023-11-04,https://huggingface.co/deepnight-research/saily_100b,https://huggingface.co/deepnight-research/saily_100b,text; text,,100B parameters (dense),[],unknown,unknown,unknown,,open,MIT,,,unknown,https://huggingface.co/deepnight-research/saily_100b/discussions,,,,,,,,,, +model,BioMistral,"Avignon University, Nantes University","BioMistral is an open-source Large Language Model tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.",2024-02-15,https://arxiv.org/pdf/2402.10373.pdf,https://huggingface.co/BioMistral/BioMistral-7B,text; text,BioMistral was evaluated on a benchmark comprising 10 established medical question-answering (QA) tasks in English and seven other languages.,7B parameters (dense),"['Mistral', 'PubMed Central']",unknown,unknown,32 NVIDIA A100 80GB GPUs,,open,Apache 2.0,"Research in the biomedical domain, especially for medical question-answering tasks.",Prohibited from deploying in production environments for natural language generation or any professional health and medical purposes.,,https://huggingface.co/BioMistral/BioMistral-7B/discussions,,,,,,,,,, +model,Mistral,Mistral AI,Mistral is a compact language model.,2023-09-27,https://mistral.ai/news/announcing-mistral-7b/,https://huggingface.co/mistralai/Mistral-7B-v0.1,text; text,Evaluated in comparison to LLaMA series models on standard language benchmarks.,7.3B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions,,,,,,,,,, +model,Mistral Large,Mistral AI,Mistral Large is Mistral AI’s new cutting-edge text generation model.,2024-02-26,https://mistral.ai/news/mistral-large/,,text; text,Evaluated on commonly used benchmarks in 
comparison to the current LLM leaders.,unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,,
+application,Le Chat,Mistral AI,Le Chat is a first demonstration of what can be built with Mistral models and what can be deployed in the business environment.,2024-02-26,https://mistral.ai/news/le-chat-mistral/,,,,,"['Mistral', 'Mistral Large']",,,,,limited,unknown,,,,,,,https://mistral.ai/terms/#terms-of-use,unknown,unknown,unknown,,,,
+model,Codestral,Mistral AI,"Codestral is an open-weight generative AI model explicitly designed for code generation tasks. It helps developers write and interact with code through a shared instruction and completion API endpoint. Mastering code and English, it can be used to design advanced AI applications for software developers. It is fluent in 80+ programming languages.",2024-05-29,https://mistral.ai/news/codestral/,,text; code,"Performance of Codestral is evaluated in Python, SQL, and additional languages including C++, bash, Java, PHP, Typescript, and C#. Fill-in-the-middle performance is assessed using HumanEval pass@1 in Python, JavaScript, and Java.",22B parameters,[],unknown,unknown,unknown,,open,Mistral AI Non-Production License,"Helps developers write and interact with code, design advanced AI applications for software developers, integrated into LlamaIndex and LangChain for building applications, integrated in VSCode and JetBrains environments for code generation and interactive conversation.",unknown,unknown,,,,,,,,,,,
+application,Nextdoor Assistant,Nextdoor,AI chatbot on Nextdoor that helps users write more clear and conscientious posts.,2023-05-02,https://help.nextdoor.com/s/article/Introducing-Assistant,,,,,['ChatGPT'],,,,,open,unknown,to be used to help make the Nextdoor experience more positive for users,,,,,natural language text guidance,,,,,,,,
+dataset,GPT-3 dataset,OpenAI,The GPT-3 dataset is the text corpus that was used to train the GPT-3 model. Information on the GPT-3 dataset is limited to discussion in the paper introducing GPT-3 [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).,2020-06-11,https://arxiv.org/pdf/2005.14165.pdf,,text,"The GPT-3 paper, which also introduces the GPT-3 dataset, provides a limited analysis of the GPT-3 dataset, reporting the dirtiness of the dataset after it was filtered for text occurring in common benchmarking tasks. The authors report that ""as the dataset becomes more contaminated, the variance of the clean over all fraction increases, but there is no apparent bias towards improved or degraded performance"" [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).",570 GB,['WebText'],,,,"In addition to excluding low quality documents from the Common Crawl dataset, the authors fuzzily deduplicated documents within each dataset, by removing documents that have high overlap with each other. The same procedure was followed to fuzzily deduplicate WebText from Common Crawl [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A). 
Text occurring in benchmark datasets were also partially removed [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).",closed,unknown,The intended use of the GPT-3 dataset is to train language models.,unknown,unknown,unknown,,,,,,,,[],"The dataset is composed of several NLP corpora including Common Crawl (filtered, 60%), WebText2 (22%), Books1 (8%), Books2 (8%), Wikipedia (3%) [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).",The Common Crawl dataset was processed using a classifier that kept high quality documents and filtered low quality documents. WebText was used as a proxy for high quality documents [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A). dataset,HumanEval,OpenAI,"HumanEval is a dataset of 164 programming problems hand-written to evaluate their Codex model. -",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,code,214 KB,"['\n\ndef string_sequence(n: int) -> str:\n """""" Return a string containing space-delimited numbers starting from 0 upto n inclusive.\n >>> string_sequence(0)\n \'0\'\n >>> string_sequence(5)\n \'0 1 2 3 4 5\'\n """"""\n', '\n\ndef count_distinct_characters(string: str) -> int:\n """""" Given a string, find out how many distinct characters (regardless of case) does it consist of\n >>> count_distinct_characters(\'xyzXYZ\')\n 3\n >>> count_distinct_characters(\'Jerry\')\n 4\n """"""\n', 'from typing import List\n\n\ndef parse_music(music_string: str) -> List[int]:\n """""" Input to this function is a string representing musical notes in a special ASCII format.\n Your task is to parse this string and return list of integers corresponding to how many beats does each\n not last.\n\n Here is a legend:\n \'o\' - whole note, lasts four beats\n \'o|\' - half note, lasts two beats\n \'.|\' - quater note, lasts one beat\n\n >>> parse_music(\'o o| .| o| o| .| .| .| .| o o\')\n [4, 2, 1, 2, 2, 1, 1, 1, 1, 4, 4]\n """"""\n', '\n\ndef how_many_times(string: str, substring: str) -> int:\n """""" Find how many times a given substring can be found in the original string. Count overlaping cases.\n >>> how_many_times(\'\', \'a\')\n 0\n >>> how_many_times(\'aaa\', \'a\')\n 3\n >>> how_many_times(\'aaaa\', \'aa\')\n 3\n """"""\n', 'from typing import List\n\n\ndef sort_numbers(numbers: str) -> str:\n """""" Input is a space-delimited string of numberals from \'zero\' to \'nine\'.\n Valid choices are \'zero\', \'one\', \'two\', \'three\', \'four\', \'five\', \'six\', \'seven\', \'eight\' and \'nine\'.\n Return the string with numbers sorted from smallest to largest\n >>> sort_numbers(\'three one five\')\n \'one three five\'\n """"""\n']",,[],"164 hand-written questions. -","Code problems easily found on the internet. -","The evaluation dataset was handwritten to ensure that the evaluation problems do not exist in the Codex dataset [[Section 2.2]](https://arxiv.org/pdf/2107.03374.pdf#subsection.2.2). +",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,code,,214 KB,[],,,,"The evaluation dataset was handwritten to ensure that the evaluation problems do not exist in the Codex dataset [[Section 2.2]](https://arxiv.org/pdf/2107.03374.pdf#subsection.2.2). ",open,MIT,"Evaluating code generation capabilities of models. ",,,"Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf). 
-",,,,,,,,,, +",,,,,,,,"['\n\ndef string_sequence(n: int) -> str:\n """""" Return a string containing space-delimited numbers starting from 0 upto n inclusive.\n >>> string_sequence(0)\n \'0\'\n >>> string_sequence(5)\n \'0 1 2 3 4 5\'\n """"""\n', '\n\ndef count_distinct_characters(string: str) -> int:\n """""" Given a string, find out how many distinct characters (regardless of case) does it consist of\n >>> count_distinct_characters(\'xyzXYZ\')\n 3\n >>> count_distinct_characters(\'Jerry\')\n 4\n """"""\n', 'from typing import List\n\n\ndef parse_music(music_string: str) -> List[int]:\n """""" Input to this function is a string representing musical notes in a special ASCII format.\n Your task is to parse this string and return list of integers corresponding to how many beats does each\n not last.\n\n Here is a legend:\n \'o\' - whole note, lasts four beats\n \'o|\' - half note, lasts two beats\n \'.|\' - quater note, lasts one beat\n\n >>> parse_music(\'o o| .| o| o| .| .| .| .| o o\')\n [4, 2, 1, 2, 2, 1, 1, 1, 1, 4, 4]\n """"""\n', '\n\ndef how_many_times(string: str, substring: str) -> int:\n """""" Find how many times a given substring can be found in the original string. Count overlaping cases.\n >>> how_many_times(\'\', \'a\')\n 0\n >>> how_many_times(\'aaa\', \'a\')\n 3\n >>> how_many_times(\'aaaa\', \'aa\')\n 3\n """"""\n', 'from typing import List\n\n\ndef sort_numbers(numbers: str) -> str:\n """""" Input is a space-delimited string of numberals from \'zero\' to \'nine\'.\n Valid choices are \'zero\', \'one\', \'two\', \'three\', \'four\', \'five\', \'six\', \'seven\', \'eight\' and \'nine\'.\n Return the string with numbers sorted from smallest to largest\n >>> sort_numbers(\'three one five\')\n \'one three five\'\n """"""\n']","164 hand-written questions. +","Code problems easily found on the internet. +" dataset,Codex dataset,OpenAI,"The dataset used to train the Codex model. -",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,,159 GB,[],,[],"The dataset includes 54 million public software repositories hosted on GitHub as of an unspecified date in May 2020 [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1). -","Following were filtered from the dataset: autogenerated files; files with average line length > 100, maximum line length > 1000, or few alphanumeric characters [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1). -","Dataset was filtered using simple heuristics, as outlined in the excluded field. +",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,,,159 GB,[],,,,"Dataset was filtered using simple heuristics, as outlined in the excluded field. ",closed,,Training language models on code.,unknown,unknown,"Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf). -",,,,,,,,,, +",,,,,,,,[],"The dataset includes 54 million public software repositories hosted on GitHub as of an unspecified date in May 2020 [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1). +","Following were filtered from the dataset: autogenerated files; files with average line length > 100, maximum line length > 1000, or few alphanumeric characters [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1). +" dataset,CLIP dataset,OpenAI,"CLIP dataset contains text-image pairs crawled from the internet. 
-",2021-01-05,https://arxiv.org/pdf/2103.00020.pdf,,"image, text","400M (image, text) pairs",[],"The dataset contained some overlap with the test sets of the benchmarks used for evaluation, but the authors determined the impact to be small: ""There is a median overlap of 2.2% and an average overlap of 3.2%. Due to this small amount of overlap, overall accuracy is rarely shifted by more than 0.1% with only 7 datasets above this threshold"" [[Section 5]](https://arxiv.org/pdf/2103.00020.pdf#section.5). -",[],"Data crawled from the internet, without any filtering (including de-duplication) or curation. -",,"The data was ""only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content"" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md). -",closed,unknown,Training multimodal vision models.,unknown,unknown,unknown,,,,,,,,,, +",2021-01-05,https://arxiv.org/pdf/2103.00020.pdf,,"image, text","The dataset contained some overlap with the test sets of the benchmarks used for evaluation, but the authors determined the impact to be small: ""There is a median overlap of 2.2% and an average overlap of 3.2%. Due to this small amount of overlap, overall accuracy is rarely shifted by more than 0.1% with only 7 datasets above this threshold"" [[Section 5]](https://arxiv.org/pdf/2103.00020.pdf#section.5). +","400M (image, text) pairs",[],,,,"The data was ""only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content"" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md). +",closed,unknown,Training multimodal vision models.,unknown,unknown,unknown,,,,,,,,[],"Data crawled from the internet, without any filtering (including de-duplication) or curation. +", dataset,DALL·E dataset,OpenAI,"DALL·E dataset is the training set consisting of image and text pairs collected to train the DALL·E model. -",2021-01-05,https://arxiv.org/abs/2102.12092,,"image, text","250M (image, text) pairs -",[],"The authors found that the dataset contained 21% of the images in the MS-COCO validation set, but observed no significant changes in the performance of the accompanying DALL·E when tested on MS-COCO evaluation set with and without the said images [[Section 3.1]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.1).",[],"Data from the internet, including Conceptual Captions and a filtered subset of YFCC100M. +",2021-01-05,https://arxiv.org/abs/2102.12092,,"image, text","The authors found that the dataset contained 21% of the images in the MS-COCO validation set, but observed no significant changes in the performance of the accompanying DALL·E when tested on MS-COCO evaluation set with and without the said images [[Section 3.1]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.1).","250M (image, text) pairs +",[],,,,"The data was de-duplicated [[Section 3.2]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.2). The data collected from the internet was filtered using image, text and joint image and text filters, which included: ""discarding instances whose captions are too short, are classified as non-English by the Python package cld3, or that consist primarily of boilerplate phrases such as “photographed on ”, where matches various formats for dates that we found in the data"". The authors also discard ""instances whose images have aspect ratios not in [1/2, 2]"" [[Appendix C]](https://arxiv.org/pdf/2102.12092.pdf#appendix.C). 
+",closed,unknown,Training multimodal vision models.,unknown,unknown,unknown,,,,,,,,[],"Data from the internet, including Conceptual Captions and a filtered subset of YFCC100M. ","MS-COCO was excluded from the dataset, but because MS-COCO was created from YFCC100M, some of the test images (not the captions) were included. -","The data was de-duplicated [[Section 3.2]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.2). The data collected from the internet was filtered using image, text and joint image and text filters, which included: ""discarding instances whose captions are too short, are classified as non-English by the Python package cld3, or that consist primarily of boilerplate phrases such as “photographed on ”, where matches various formats for dates that we found in the data"". The authors also discard ""instances whose images have aspect ratios not in [1/2, 2]"" [[Appendix C]](https://arxiv.org/pdf/2102.12092.pdf#appendix.C). -",closed,unknown,Training multimodal vision models.,unknown,unknown,unknown,,,,,,,,,, +" dataset,Whisper dataset,OpenAI,"The Whisper dataset is the speech corpus that was used to train the Whisper model. Information on the dataset is limited to discussion in the paper introducing Whisper. [[Section 2.1]](https://cdn.openai.com/papers/whisper.pdf). -",2022-09-21,https://cdn.openai.com/papers/whisper.pdf,,audio,680k hours,[],"The Whisper paper provides limited details on preprocessing. -",[],"The dataset is composed three major sources: multilingual speech recognition (17%), translation (18%), and English speech recognition (65%). [[Figure 11]](https://cdn.openai.com/papers/whisper.pdf). +",2022-09-21,https://cdn.openai.com/papers/whisper.pdf,,audio,"The Whisper paper provides limited details on preprocessing. +",680k hours,[],,,,"In addition to filtering, basic text standardization was done. +",closed,unknown,The intended use is to train speech models.,,,unknown,,,,,,,,[],"The dataset is composed three major sources: multilingual speech recognition (17%), translation (18%), and English speech recognition (65%). [[Figure 11]](https://cdn.openai.com/papers/whisper.pdf). ","Automated filtering was conducted. -","In addition to filtering, basic text standardization was done. -",closed,unknown,The intended use is to train speech models.,,,unknown,,,,,,,,,, -dataset,WebText,OpenAI,,2019-11-01,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,,text,40 GB,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,GPT-2,OpenAI,,2019-11-01,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,,text; text,1.5B parameters (dense),,,['WebText'],,,,open,Modified MIT License,,,,,https://github.com/openai/gpt-2/blob/master/model_card.md,,,,,,,,, +" +dataset,WebText,OpenAI,,2019-11-01,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,,text,,40 GB,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,GPT-2,OpenAI,,2019-11-01,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,https://github.com/openai/gpt-2/blob/master/model_card.md,text; text,,1.5B parameters (dense),['WebText'],,,,,open,Modified MIT License,,,,,,,,,,,,,, model,GPT-3,OpenAI,"GPT-3 is an autoregressive language model. 
-",2020-06-11,https://arxiv.org/pdf/2005.14165.pdf,,text; text,175B parameters (dense),,"The GPT-3 model was evaluated on language modeling, closed-book question answering, translation, Winograd-style tasks, commonsense reasoning, reading comprehension, SuperGLUE, NLI, synthetic tasks, and generation [[Section 4]](https://arxiv.org/pdf/2005.14165.pdf#section.4); as well as on fairness and biases [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6). -",['GPT-3 dataset'],,,"One quality control method OpenAI employed was releasing GPT-3 only through the OpenAI API. OpenAI states that it is easier to respond to misuse when the access to the model is gated through the API. It also hints that it plans to broaden the API access over time based on the amount of misuse [[OpenAI API Blog Post]](https://openai.com/blog/openai-api/). The authors identify potential misuses of GPT-3 in the paper and analyze it for fairness, bias and representation issues, but do not identify mitigation strategies [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6). +",2020-06-11,https://arxiv.org/pdf/2005.14165.pdf,https://github.com/openai/gpt-3/blob/master/model-card.md,text; text,"The GPT-3 model was evaluated on language modeling, closed-book question answering, translation, Winograd-style tasks, commonsense reasoning, reading comprehension, SuperGLUE, NLI, synthetic tasks, and generation [[Section 4]](https://arxiv.org/pdf/2005.14165.pdf#section.4); as well as on fairness and biases [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6). +",175B parameters (dense),['GPT-3 dataset'],552.1 tCO2e,3640 petaflop/s-days,Azure,"One quality control method OpenAI employed was releasing GPT-3 only through the OpenAI API. OpenAI states that it is easier to respond to misuse when the access to the model is gated through the API. It also hints that it plans to broaden the API access over time based on the amount of misuse [[OpenAI API Blog Post]](https://openai.com/blog/openai-api/). The authors identify potential misuses of GPT-3 in the paper and analyze it for fairness, bias and representation issues, but do not identify mitigation strategies [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6). ",limited,unknown,"GPT-3 was intended to be use through the OpenAI API by developers for language applications. Other intended use of GPT-3 include researchers accessing the model through the API to study its paradigms [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md). ","Access to GPT-3 is governed by Open AI API Usage Guidelines and API Terms of Use, prohibiting the use of the API in a way that causes societal harm. [[Usage Guidelines]] (https://beta.openai.com/docs/usage-guidelines/content-policy) [[Terms of Use]](https://openai.com/api/policies/terms/). The list of disallowed applications can be found in the usage guidelines [[Disallowed Applications]] (https://beta.openai.com/docs/usage-guidelines/disallowed-applications). ","OpenAI reviews all use cases of the model [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md). ","Feedback for GPT-3 can be provided on the feedback form linked in the model card [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md). The form is especially meant to collect feedback on concerns about misuse, synthetic text detection, bias, and risk of generative language models. 
-",https://github.com/openai/gpt-3/blob/master/model-card.md,552.1 tCO2e,3640 petaflop/s-days,Azure,,,,,, +",,,,,,,,,, model,Codex,OpenAI,"Codex is a GPT language model fine-tuned on publicly available code from GitHub. -",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,"text; code, text",12B parameters (dense),,"The model was evaluated using the HumanEval dataset with pass@k metric and BLEU scores [[Section 2]](https://arxiv.org/pdf/2107.03374.pdf#section.2). -","['GPT-3', 'Codex dataset', 'HumanEval']",,,"The model wasn't fully released to the public as a quality control measure. The authors identify potential risks of Codex in their paper due to the following: over-reliance, misalignment, bias and representation, economic and labor market impacts, security implications, environmental impact and legal implications. They also make suggestions for some of these, but do not implement them in Codex [[Section 7]](https://arxiv.org/pdf/2107.03374.pdf#section.7). +",2021-08-10,https://arxiv.org/pdf/2107.03374.pdf,,"text; code, text","The model was evaluated using the HumanEval dataset with pass@k metric and BLEU scores [[Section 2]](https://arxiv.org/pdf/2107.03374.pdf#section.2). +",12B parameters (dense),"['GPT-3', 'Codex dataset', 'HumanEval']",unknown,100-1000 petaflop/s-days,Azure,"The model wasn't fully released to the public as a quality control measure. The authors identify potential risks of Codex in their paper due to the following: over-reliance, misalignment, bias and representation, economic and labor market impacts, security implications, environmental impact and legal implications. They also make suggestions for some of these, but do not implement them in Codex [[Section 7]](https://arxiv.org/pdf/2107.03374.pdf#section.7). ",limited,unknown,"Codex is intended to be used for coding related language modelling tasks. ",unknown,unknown,"Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf). -",,unknown,100-1000 petaflop/s-days,Azure,,,,,, +",,,,,,,,,, model,InstructGPT,OpenAI,"InstructGPT is a family of GPT-3 based models fine-tuned on human feedback, which allows for better instruction following capabilities than GPT-3. -",2022-01-27,https://arxiv.org/pdf/2203.02155.pdf,,"text; code, text",175B parameters (dense),,"The model was evaluated on human ratings to the InstructGPT answers to the prompts submitted to the OpenAI API as well as on public NLP datasets spanning truthfulness, toxicity, and bias, question answering, reading comprehension, and summarization tasks.","['GPT-3', 'OpenAI API']",,,"The model wasn't fully released to the public as a quality control measure. +",2022-01-27,https://arxiv.org/pdf/2203.02155.pdf,https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md,"text; code, text","The model was evaluated on human ratings to the InstructGPT answers to the prompts submitted to the OpenAI API as well as on public NLP datasets spanning truthfulness, toxicity, and bias, question answering, reading comprehension, and summarization tasks.",175B parameters (dense),"['GPT-3', 'OpenAI API']",unknown,60 petaflops/s-days,unknown,"The model wasn't fully released to the public as a quality control measure. ",closed,unknown,"As stated in the model card: ""The intended direct users of InstructGPT are developers who access its capabilities via the OpenAI API. Through the OpenAI API, the model can be used by those who may not have AI development experience, to build and explore language modeling systems across a wide range of functions. 
We also anticipate that the model will continue to be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale language models"" [[Model Card]](https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md). ","Access to InstructGPT is governed by Open AI API Usage Guidelines and API Terms of Use, prohibiting the use of the API in a way that causes societal harm. [[Usage Guidelines]] (https://beta.openai.com/docs/usage-guidelines/content-policy) [[Terms of Use]](https://openai.com/api/policies/terms/). The list of disallowed applications can be found in the usage guidelines [[Disallowed Applications]] (https://beta.openai.com/docs/usage-guidelines/disallowed-applications). ",unknown,"Email the authors [[InstructGPT Paper]](https://arxiv.org/pdf/2203.02155.pdf). -",https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md,unknown,60 petaflops/s-days,unknown,,,,,, -model,Whisper,OpenAI,Whisper is an audio transcription software.,2022-09-21,https://cdn.openai.com/papers/whisper.pdf,,audio; text,1.5B parameters (dense),,"The model was evaluated for zero-shot English and multingual speech recognition, translation, language identification and robustness to noise.",['Whisper dataset'],,,No specific quality control methods are documented.,open,MIT,"Whisper is a general-purpose speech recognition model; it is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. +",,,,,,,,,, +model,Whisper,OpenAI,Whisper is an audio transcription software.,2022-09-21,https://cdn.openai.com/papers/whisper.pdf,https://github.com/openai/whisper/blob/main/model-card.md,audio; text,"The model was evaluated for zero-shot English and multingual speech recognition, translation, language identification and robustness to noise.",1.5B parameters (dense),['Whisper dataset'],unknown,unknown,Azure,No specific quality control methods are documented.,open,MIT,"Whisper is a general-purpose speech recognition model; it is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. ",unknown,,"The discussions page of the codebase is not formally cited as a place for feedback, but is being used in this way [[Discussions page]](https://github.com/openai/whisper/discussions) -",https://github.com/openai/whisper/blob/main/model-card.md,unknown,unknown,Azure,,,,,, +",,,,,,,,,, model,CLIP,OpenAI,"""CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision"" [[CLIP Repository]](https://github.com/openai/CLIP). -",2021-01-05,https://arxiv.org/pdf/2103.00020.pdf,,"image, text; text",unknown,,"The model was evaluated on standard vision datasets (e.g. CIFAR10, ImageNet) and showed robust state of the art results.",['CLIP dataset'],,,"The authors found that the performance of the model depended heavily on which classes are included (and excluded) for a given task. 
They reported significant race and gender based disparities on the Fairface dataset, depending on how the classes were constructed. The authors also demonstrated that the model was capable of racial profiling with high accuracy [[Section 7]](https://arxiv.org/pdf/2103.00020.pdf#section.7). +",2021-01-05,https://arxiv.org/pdf/2103.00020.pdf,https://github.com/openai/CLIP/blob/main/model-card.md,"image, text; text","The model was evaluated on standard vision datasets (e.g. CIFAR10, ImageNet) and showed robust state of the art results.",unknown,['CLIP dataset'],unknown,71.12 petaflop/s-day,NVIDIA V100 GPUs,"The authors found that the performance of the model depended heavily on which classes are included (and excluded) for a given task. They reported significant race and gender based disparities on the Fairface dataset, depending on how the classes were constructed. The authors also demonstrated that the model was capable of racial profiling with high accuracy [[Section 7]](https://arxiv.org/pdf/2103.00020.pdf#section.7). ",open,MIT,"The model is intended to be used by AI researchers to better understand ""robustness, generalization, and other capabilities, biases, and constraints of computer vision models"" [[CLIP Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md). ","""Any deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful. Certain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use. Since the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases"" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.mdlicen). -",,Questions can be shared at the feedback form linked in the CLIP model card [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.mdlicen).,https://github.com/openai/CLIP/blob/main/model-card.md,unknown,71.12 petaflop/s-day,NVIDIA V100 GPUs,,,,,, +",,Questions can be shared at the feedback form linked in the CLIP model card [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.mdlicen).,,,,,,,,,, model,DALL·E,OpenAI,"DALL·E is a GPT-3 based model trained to generate images from text descriptions. The authors found that it had ""a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images"" [[OpenAI Blog Post]](https://openai.com/blog/dall-e/). -",2021-01-05,https://arxiv.org/pdf/2102.12092.pdf,,text; image,12B parameters (dense),,"The model was evaluated against three prior approaches, AttnGAN, DM-GAN, and DF-GAN using Inception Score and Fréchet Inception Distance on MS-COCO as metrics. 
The model was also evaluated by humans and received the majority of the votes in generating images that look realistic and better match the caption when compared to the images generated by DF-GAN [[Section]](https://arxiv.org/pdf/2102.12092.pdf#section.3). -",['DALL·E dataset'],,,unknown,limited,unknown,"""The model is intended for others to use for training their own generative models"" [[Model Card]](https://github.com/openai/DALL-E/blob/master/model_card.md). +",2021-01-05,https://arxiv.org/pdf/2102.12092.pdf,https://github.com/openai/DALL-E/blob/master/model_card.md,text; image,"The model was evaluated against three prior approaches, AttnGAN, DM-GAN, and DF-GAN using Inception Score and Fréchet Inception Distance on MS-COCO as metrics. The model was also evaluated by humans and received the majority of the votes in generating images that look realistic and better match the caption when compared to the images generated by DF-GAN [[Section]](https://arxiv.org/pdf/2102.12092.pdf#section.3). +",12B parameters (dense),['DALL·E dataset'],unknown,unknown,NVIDIA V100 GPUs,unknown,limited,unknown,"""The model is intended for others to use for training their own generative models"" [[Model Card]](https://github.com/openai/DALL-E/blob/master/model_card.md). ",unknown,,"Contact the paper author(s) specified on the paper [[Paper]](https://arxiv.org/pdf/2102.12092.pdf). -",https://github.com/openai/DALL-E/blob/master/model_card.md,unknown,unknown,NVIDIA V100 GPUs,,,,,, -model,Jukebox,OpenAI,Jukebox is a generative model that produces music,2020-04-30,https://arxiv.org/abs/2005.00341,,text; audio,5B parameters (dense),,Evaluations in paper are primarily considering the fidelity and novelty of samples from Jukebox.,['Jukebox Dataset'],,,,open,Noncommercial Use License,,,,,,unknown,4 weeks,510 V100s,,,,,, +",,,,,,,,,, +model,Jukebox,OpenAI,Jukebox is a generative model that produces music,2020-04-30,https://arxiv.org/abs/2005.00341,,text; audio,Evaluations in paper are primarily considering the fidelity and novelty of samples from Jukebox.,5B parameters (dense),['Jukebox Dataset'],unknown,4 weeks,510 V100s,,open,Noncommercial Use License,,,,,,,,,,,,,, model,DALL·E 2,OpenAI,"""DALL·E 2 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output"" [[System Card]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md). The model wasn't fully released, but OpenAI released a version of the model (DALL·E 2 Preview) to a select group of testers. -",2022-04-13,https://arxiv.org/abs/2204.06125,,text; image,unknown,,The model is capable of generating explicit content and the researchers found limited amount of spurious content generated. The researchers also found that visual synonyms can be used to prompt the model to surface unwanted generations [[Probes and Evaluations]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#probes-and-evaluations).,"['DALL·E dataset', 'CLIP dataset']",,,The model is not fully released to the public as part of a quality control measure. 
The usage of the model by testers is monitored and user provided prompts are filtered [[Input filters]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#input-filters).,limited,unknown,"""The intended use of the DALL·E 2 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities"" [[Use]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#use). +",2022-04-13,https://arxiv.org/abs/2204.06125,https://github.com/openai/dalle-2-preview/blob/main/system-card.md,text; image,The model is capable of generating explicit content and the researchers found limited amount of spurious content generated. The researchers also found that visual synonyms can be used to prompt the model to surface unwanted generations [[Probes and Evaluations]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#probes-and-evaluations).,unknown,"['DALL·E dataset', 'CLIP dataset']",unknown,unknown,unknown,The model is not fully released to the public as part of a quality control measure. The usage of the model by testers is monitored and user provided prompts are filtered [[Input filters]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#input-filters).,limited,unknown,"""The intended use of the DALL·E 2 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities"" [[Use]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#use). ","Use of the model is governed by the OpenAI Content Policy, which prohibits posting of G rated content. Users are not allowed to utilize the model in commercial products in the preview version [[Content Policy]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#policies-and-enforcement).","Uses of the model are monitored. In the preview version, any user can flag content. The specific policies for monitoring are not disclosed, but possible measures include disabling of accounts violating the content policies [[Monitoring and Reporting]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#monitoring-and-reporting). -'",Feedback can be provided at support at openai.com.,https://github.com/openai/dalle-2-preview/blob/main/system-card.md,unknown,unknown,unknown,,,,,, +'",Feedback can be provided at support at openai.com.,,,,,,,,,, application,OpenAI API,OpenAI,"OpenAI API is a general purpose ""text in, text out"" interface connecting users with a suite of language models. The API was initially released as a gateway to GPT-3, but it now supports access to other, more specialized OpenAI models. [[Open AI Blog Post]](https://openai.com/blog/openai-api/) -",2020-06-11,https://openai.com/api/,,,,,,"['GPT-3', 'Codex', 'code-davinci-002', 'text-davinci-002', 'text-davinci-003', 'gpt-3.5-turbo', 'Whisper', 'DALL·E', 'GPT-4', 'GPT-4 Turbo']",,,"Given a prompt, OpenAI API checks whether a completion contains unsafe language using its filters and marks the completion accordingly if so. The API also provides developers with special endpoints that scope the API usage. OpenAI also developed user guidelines to help developers understand safety issues [[OpenAI API]](https://openai.com/api/). 
+",2020-06-11,https://openai.com/api/,,,,,"['GPT-3', 'Codex', 'code-davinci-002', 'text-davinci-002', 'text-davinci-003', 'gpt-3.5-turbo', 'Whisper', 'DALL·E', 'GPT-4', 'GPT-4 Turbo']",,,,"Given a prompt, OpenAI API checks whether a completion contains unsafe language using its filters and marks the completion accordingly if so. The API also provides developers with special endpoints that scope the API usage. OpenAI also developed user guidelines to help developers understand safety issues [[OpenAI API]](https://openai.com/api/). ",limited,custom,"OpenAI API was designed to be used by developers to empower applications, and researchers to study language models [[Section 3]](https://openai.com/api/policies/terms/). ","OpenAI API Terms of Use prohibits the use of the API in a way violating the applicable law, including: (i) ""Illegal activities, such as child pornography, gambling, cybercrime, piracy, violating copyright, trademark or other intellectual property laws""; (ii) ""Accessing or authorizing anyone to access the APIs from an embargoed country, region, or territory as prohibited by the U.S. government""; (iii) ""Threatening, stalking, defaming, defrauding, degrading, victimizing or intimidating anyone for any reason"". The usage requirements are detailed in the Terms of Use [[Section 3]](https://openai.com/api/policies/terms/). ","OpenAI may monitor the API use to ensure ""quality and improve OpenAI systems, products and services; perform research; and ensure compliance"" with the Terms of Service and all applicable laws. Users of the API will give OpenAI reasonable access to their application to monitor compliance with the terms listed in the Terms of Service [[Section 5(b)]](https://openai.com/api/policies/terms/). Apps using the OpenAI API should submit an application once they are deployed to real users. The review form takes 10 minutes to complete and over 97% of the applications are directly accepted or conditionally accepted. The applicants are notified of the decision within 2 business days [[App Review Guidelines]] (https://beta.openai.com/docs/usage-guidelines/app-review). -",unknown,,,,,The API exposes the models fairly direclty with a range of hyperparameters (e.g. temperature scaling).,"Given a prompting text, the OpenAI API provides access to text completions, and log probabilities. The support for text and code embeddings were added on 2022-01-25 [[OpenAI Blog Post]] (https://openai.com/blog/introducing-text-and-code-embeddings/). -",https://openai.com/api/policies/terms/,unknown,unknown,unknown -model,VPT,OpenAI,,2022-06-23,https://arxiv.org/abs/2206.11795,,video; video,500M parameters (dense),,,['web_clean'],,,,open,MIT,,,,,,,,,,,,,, -dataset,web_clean,OpenAI,,2022-06-23,https://arxiv.org/abs/2206.11795,,video,70k hours,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -application,ChatGPT,OpenAI,ChatGPT is an artificial intelligence chatbot developed by OpenAI.,2022-11-30,https://openai.com/blog/chatgpt,,,,,,"['gpt-3.5-turbo', 'OpenAI toxicity classifier']",,,,limited,custom,,,,,,,,,,,,100M,, -model,gpt-3.5-turbo,OpenAI,,2023-03-01,https://platform.openai.com/docs/models/gpt-3-5,,text; text,unknown,,,['gpt-3.5-turbo dataset'],,,,limited,custom,,,,,,,,,,,,,, -model,GPT-4 Turbo,OpenAI,GPT-4 Turbo is a more capable version of GPT-4 and has knowledge of world events up to April 2023. 
It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt.,2023-11-06,https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,,text; text,unknown,,,[],,,,limited,custom,,,unknown,,,unknown,unknown,unknown,,,,,, -dataset,gpt-3.5-turbo dataset,OpenAI,,2023-03-01,https://platform.openai.com/docs/models/gpt-3-5,,text,unknown,[],,[],,,,limited,unknown,,,,,,,,,,,,,, -dataset,code-davinci-002 dataset,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,"code, text",unknown,[],,[],,,,limited,unknown,,,,,,,,,,,,,, -model,code-davinci-002,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,"text; code, text",unknown,,,['code-davinci-002 dataset'],,,,limited,unknown,,,,,,,,,,,,,, -model,text-davinci-002,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,text; text,unknown,,,['code-davinci-002'],,,,limited,unknown,,,,,,,,,,,,,, -model,text-davinci-003,OpenAI,,2022-11-30,https://platform.openai.com/docs/model-index-for-researchers,,text; text,unknown,,,['text-davinci-002'],,,,limited,unknown,,,,,,,,,,,,,, -application,Whisper API,OpenAI,API to query OpenAI's Whisper model.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['Whisper'],,,,open,custom,,,,,,,,,,,,,, -application,ChatGPT API,OpenAI,API to query OpenAI's ChatGPT model.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['ChatGPT'],,,,open,custom,,,,,,,,,,,,,, -application,OpenAI Moderation API,OpenAI,This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content—an instance of using AI systems to assist with human supervision of these systems.,2022-08-10,https://openai.com/blog/new-and-improved-content-moderation-tooling,,,,,,['OpenAI toxicity classifier'],,,,open,custom,,,,,,,,,,,,,, -model,OpenAI toxicity classifier,OpenAI,,2023-01-18,"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.",,text; text,unknown,,,['OpenAI toxicity dataset'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,OpenAI toxicity dataset,OpenAI,,2023-01-18,"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.",,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -application,Sage API,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,,,,,['Sage'],,,,limited,unknown,,,,,,,,,,,,,, -application,Dragonfly API,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,,,,,['Dragonfly'],,,,limited,unknown,,,,,,,,,,,,,, -model,Sage,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,text; text,unknown,,,[],,,,limited,unknown,,,,,,,,,,,,,, -model,Dragonfly,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,text; text,unknown,,,[],,,,limited,unknown,,,,,,,,,,,,,, -application,ChatGPT for Slack,"OpenAI, Salesforce","The app integrates ChatGPT’s powerful AI technology to deliver instant conversation summaries, research tools, and writing assistance directly in Slack to help millions of companies work more productively.",2023-03-07,https://www.salesforce.com/news/stories/chatgpt-app-for-slack/,,,,,,['ChatGPT API'],,,,limited,unknown,,,,,,,,,,,,,, 
-model,GPT-4,OpenAI,,2023-03-14,https://arxiv.org/abs/2303.08774,,"image, text; image, text",unknown,,,[],,,,limited,unknown,,,,,,,,,,,,,, -application,GPT-4 API,OpenAI,"GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses",2023-03-14,https://openai.com/product/gpt-4,,,,,,['GPT-4'],,,,limited,custom,,,,,,,,,,,,,, -application,ChatGPT Enterprise,OpenAI,"ChatGPT Enterprise offers enterprise-grade security and privacy, unlimited higher-speed GPT-4 access, longer context windows for processing longer inputs, advanced data analysis capabilities, and customization options compared to OpenAI's previous offerings.",2023-08-28,https://openai.com/enterprise,,,,,,['GPT-4'],,,,limited,custom,,,,,,,,,,,https://openai.com/policies/terms-of-use,,, -model,DALL·E 3,OpenAI,"DALL·E 3 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output. The model is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October.",2023-09-20,https://openai.com/dall-e-3,,text; image,unknown,,The model is capable of generating explicit content and the researchers found a limited amount of spurious content generated.,"['DALL·E 2 dataset', 'CLIP dataset', 'ChatGPT']",,,"DALL·E 3 has mitigations to decline requests that ask for a public figure by name. We improved safety performance in risk areas like generation of public figures and harmful biases related to visual over/under-representation, in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation.",limited,custom,"The intended use of the DALL·E 3 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities","Use of the model is governed by the OpenAI Content Policy, which prohibits posting of content that is not G-rated. Users are not allowed to utilize the model in commercial products in the preview version.","Uses of the model are monitored. In the preview version, any user can flag content. The specific policies for monitoring are not disclosed, but possible measures include disabling of accounts violating the content policies",Feedback can be provided at openai.com,,unknown,unknown,unknown,,,,,, -model,Sora,OpenAI,Sora is an AI model that can create realistic and imaginative scenes from text instructions.,2024-02-15,https://openai.com/sora,,"text; image, video",unknown,,,[],,,,limited,unknown,,,unknown,,,unknown,unknown,unknown,,,,,, -model,Ideogram 1.0,Ideogram AI,"Ideogram 1.0 is Ideogram’s most advanced text-to-image model, as of release.",2024-02-28,https://about.ideogram.ai/1.0,,text; image,unknown,,Compared to DALL·E 3 based on a qualitative user comparison.,[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, -dataset,FinPile,Bloomberg,"A comprehensive dataset consisting of a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives that was used to train the BloombergGPT model.",2023-03-30,https://arxiv.org/pdf/2303.17564.pdf#section.2,https://arxiv.org/pdf/2303.17564.pdf#section.2,text,363B tokens,[],,[],"FinPile consists of English financial documents. Authors utilize the Bloomberg -Terminal, which is an extensive collection of curated and maintained documents, -to create the FinPile dataset. 
Each document in FinPile is time-stamped, with -dates ranging from 2007-03-01 to 2022-07-31. -Types of data included are given below: - 1. Web (298B tokens) - Includes Bloomberg's web crawl focused on high-quality -websites that have financially relevant information. This makes up the majority -of FinPile. - 2. News (38B tokens) - Includes all news sources relevant to the financial -community, excluding news articles written by Bloomberg journalists. Overall, -there are hundreds of English news sources in FinPile including ""Bloomberg -Transcripts"", which are transcripts of Bloomberg TV news. - 3. Filings (14B tokens) - Includes financial statements prepared by (public) -companies and made available to the general public. In the dataset, a majority -of the filings come from EDGAR, which is the SEC's online database. - 4. Press (9B tokens) - Includes press releases typically issued by companies -that are financially relevant. - 5. Bloomberg (5B tokens) - Includes Bloomberg authored news and other documents -such as opinions and analyses. The largest sources are “Bloomberg News” and -“Bloomberg First Word”, the Bloomberg-authored wire of real-time news. -",,,closed,unknown,Used to train the BloombergGPT model.,,,,,,,,,,,,, -model,BloombergGPT,Bloomberg,BloombergGPT is a 50 billion parameter large language model that is specifically trained on a wide range of financial data to support a diverse set of natural language processing tasks within the financial industry.,2023-03-30,https://arxiv.org/abs/2303.17564,,text; text,50B parameters (dense),,"Authors evaluate the performance of BloombergGPT on two broad categories of tasks, finance-specific and general purpose, on several standard benchmarks. They compare BloombergGPT to the three closest models: GPT-NeoX, OPT-66B and BLOOM-176B. They also report results from the original GPT-3 whenever externally available. They conclude ""We achieve strong results on general LLM benchmarks and outperform comparable models on financial tasks. We attribute this, in decreasing order of impact, to 1. a well-curated internal dataset, 2. our unique choice in tokenizer, and 3. an up-to-date architecture."" -","['FinPile', 'The Pile', 'C4', 'Wikipedia']",,,"Authors state the following: -- ""To provide natural language applications to the financial community, we - have developed a rigorous risk and testing assessment process. This process - includes careful annotation guidelines Tseng et al. (2020), pre-launch review - at multiple levels by the central risk and compliance organizations, and - by the product leaders (e.g., the newsroom) as applicable, and post-launch - monitoring. Moreover, we conduct our research, development, and deployment - of NLP and AI systems in accordance with all applicable regulations."" -- ""Similarly, toxicity and bias are areas where, as a company, we take extraordinary - care with any content we produce, whether from humans or machines. Since - the measurement of toxicity and bias in our model depends on its application - areas, quantifying the potential for the generation of harmful language - remains an open question. 
We are particularly interested in studying whether - FinPile, which is cleaner and contains fewer examples of overtly biased - or toxic language (e.g., Press Releases), reduces the proclivity of the - model to generate inappropriate content."" -",closed,unknown,"""This model will assist Bloomberg in improving existing financial NLP tasks, such as sentiment analysis, named entity recognition, news classification, and question answering, among others. Furthermore, BloombergGPT will unlock new opportunities for marshalling the vast quantities of data available on the Bloomberg Terminal to better help the firm's customers, while bringing the full potential of AI to the financial domain."" -",,,,,unknown,53 days,64 Amazon EC2 p4d.24xlarge instances each with 8 NVIDIA 40GB A100 GPUs (i.e. total 512 A100 GPUs),,,,,, -dataset,Common Corpus,Pleias,"Common Corpus is the largest public domain dataset released for training Large Language Models (LLMs). This dataset includes 500 billion words from a diverse range of cultural heritage initiatives and is the largest corpus in English, French, Dutch, Spanish, German and Italian. It supports efforts to train fully open LLMs on sources without copyright concerns.",2024-03-20,https://huggingface.co/blog/Pclanglais/common-corpus,,text,500 billion words,[],unknown,[],"The dataset includes 500 billion words from a wide diversity of cultural heritage initiatives. It also has the largest English-speaking dataset to date with 180 billion words, including a major US collection of 21 million digitized newspapers and large monographs datasets collected by digital historian Sebastian Majstorovic. It also contains a huge volume of data in French (110 billion words), German (30 billion words), Spanish, Dutch and Italian, as well as data in low-resource languages that are currently underrepresented.",The data excluded are those that have copyright issues.,"All data included in the corpus are from fully open and auditable sources, ensuring they are copyright-free.",open,,"The dataset is intended to support open and reproducible AI research, enhancing accessibility, diversity, and democracy in AI by enabling everyone to explore large models.",It should not be used for tasks that infringe on copyright laws.,unknown,unknown,,,,,,,,,, -application,Cformers,Nolano,Cformers is a set of transformers that act as an API for AI inference in code.,2023-03-19,https://www.nolano.org/services/Cformers/,,,,,,[],,,,limited,MIT,,,,,,,,,,,,,, -model,Platypus,Boston University,Platypus is a family of fine-tuned and merged Large Language Models (LLMs).,2023-08-14,https://arxiv.org/pdf/2308.07317.pdf,,text; text,13B parameters (dense),,Platypus achieves the strongest performance and currently stands at first place in HuggingFace’s Open LLM Leaderboard as of its release date.,"['LLaMA 2', 'Platypus curated dataset']",,,,open,CC by-NC-SA 4.0,,,,https://huggingface.co/garage-bAInd/Platypus2-13B/discussions,https://huggingface.co/garage-bAInd/Platypus2-13B,,5 hours,1 A100 GPU,,,,,, -model,UFOGen,Boston University,"UFOGen is a novel generative model designed for ultra-fast, one-step text-to-image synthesis.",2023-11-14,https://arxiv.org/pdf/2311.09257.pdf,,text; image,900M parameters (dense),,UFOGen is evaluated on standard image benchmarks against other models fine-tuned with Stable Diffusion.,['Stable Diffusion'],,,,open,unknown,,,,,,unknown,unknown,unknown,,,,,, -application,Nextdoor Assistant,Nextdoor,AI chatbot on Nextdoor that helps users write more clear and conscientious 
posts.,2023-05-02,https://help.nextdoor.com/s/article/Introducing-Assistant,,,,,,['ChatGPT'],,,,open,unknown,to be used to help make the Nextdoor experience more positive for users,,,,,,,,,natural language text guidance,,,, -dataset,You dataset,You,,,https://you.com/,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,You model,You,,,https://you.com/,,text; text,unknown,,,['You dataset'],,,,closed,unknown,,,,,,,,,,,,,, -application,You Search,You,You.com is a search engine built on artificial intelligence that provides users with a customized search experience while keeping their data 100% private.,,https://you.com/,,,,,,['You model'],,,,open,unknown,,,,,,,,,,,,,, -dataset,SBU Captions,Stony Brook University,"SBU Captions Dataset is a collection of 1 million images and associated captions from Flickr, filtered so that the descriptions are likely to refer to visual content. -",2011-12-12,https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf,,"image, text",1M image-text pairs,[],"Authors perform two quantitative evaluations for image captioning - direct user ratings of relevance and BLEU score. They also propose a new evaluation task: ""we propose a new evaluation task where a user is presented with two photographs and one caption. The user must assign the caption to the most relevant image. For evaluation we use a query image, a random image and a generated caption."" -",['Flickr'],"""To encourage visual descriptiveness in our collection, we select only those images with descriptions of satisfactory length based on observed lengths in visual descriptions. We also enforce that retained descriptions contain at least 2 words belonging to our term lists and at least one prepositional word, e.g. “on”, “under” which often indicate visible spatial relationships."" -","""This produces a very large, but noisy initial set of photographs with associated text. We filter this set of photos so that the descriptions attached to a picture are relevant and visually descriptive."" -",unknown,open,,,,,,,,,,,,,,, -dataset,MassiveText,Google Deepmind,"The MassiveText dataset was used to train the Gopher model. -",2021-12-08,https://arxiv.org/pdf/2112.11446.pdf,https://arxiv.org/pdf/2112.11446.pdf#subsection.A.5,"code, text",10.5 TB,[],"MassiveText data was analyzed for toxicity, language distribution, URL breakdown, and tokenizer compression rates on the subsets [[Section A.2]](https://arxiv.org/pdf/2112.11446.pdf#subsection.A.2). -",[],"MassiveText data come from 6 sources: MassiveWeb (48%), Books (27%), C4 (10%), News (10%), GitHub (3%), and Wikipedia (2%). MassiveWeb is a web text corpus curated for MassiveText. -","Documents that are not in English are excluded. -","The authors use simple heuristics for filtering low quality documents as opposed to relying on a classifier based on a ""gold"" set such as the English Wikipedia, which could ""inadvertently bias towards a certain demographic or erase certain dialects or sociolects from representation."" MassiveWeb subset was filtered using Google’s SafeSearch filter, preferring it over word filters that ""disproportionately filter out inoffensive content associated with minority groups. MassiveWeb was filtered further for word or phrase repetitions. All the subsets were filtered for document deduplication and test set contamination"" [[Appendix A]](https://arxiv.org/pdf/2112.11446.pdf#appendix.A). 
-",closed,unknown,"Pre-training of language models by DeepMind researchers [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C). -",unknown,unknown,unknown,,,,,,,,,, -dataset,M3W,Google Deepmind,"M3W (MassiveWeb) is dataset used to train Flamingo, and other vision-language models and was created by researchers and engineers. -",2022-04-29,https://arxiv.org/pdf/2204.14198.pdf,https://arxiv.org/pdf/2204.14198.pdf#appendix.F,"code, text","182GB Text, 185M Images",[],,[],"M3W has interleaved images (185M) and text (182GB) from the web. -",unknown,"The authors provide a basic description of data processing and cleaning. -",closed,unknown,"Pre-training of vision and language models by DeepMind researchers [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F). -",unknown,unknown,,,,,,,,,,, -dataset,Gato dataset,Google Deepmind,"The Gato datasets are a collection of data used to train the Gato model. -",2022-05-12,https://www.deepmind.com/blog/a-generalist-agent,,"image, text, robotics trajectories, simulated control tasks","10.5 TB Text, 2.2B Text-Image pairs, 1.5T tokens of simulated control, 500k robotics trajectories",[],"The Gato dataset compiles many datasets introduced in prior works, with associated analyses. -",['MassiveText'],"The full composition of the dataset across individual sources can be found in the paper. -",,unknown,closed,unknown,unknown,unknown,,,,,,,,,,,, -model,AlphaFold2,Google Deepmind,AlphaFold2 is a protein language model trained on protein sequences,2021-07-15,https://www.nature.com/articles/s41586-021-03819-2,,amino acid sequence; protein structure,93M parameters (dense),,,['Protein Data Bank'],,,,open,Apache 2.0,,,,,,,11 days,128 TPUv3 cores,,,,,, -model,Flamingo,Google Deepmind,"Flamingo is a Visual Language Model using the Transformer architecture that is intended for few-shot learning. -",2022-04-29,https://arxiv.org/pdf/2204.14198.pdf,,"image, text; text",80B parameters (dense),,"Model performance was evaluated on image and video datasets primarily, including dialogue. -","['M3W', 'ALIGN', 'LTIP', 'VTP', 'Chinchilla']",,,,closed,unknown,"The intended uses are stated in the model card: ""The primary use is research on visual language models (VLM), including: research on VLM applications like classification, captioning or visual question answering, understanding how strong VLMs can contribute to AGI, advancing fairness and safety research in the area of multimodal research, and understanding limitations of current large VLMs."" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E). -","The model card lists the following as out of scope uses of the model: ""Uses of the model for visually conditioned language generation in harmful or deceitful settings. Broadly speaking, the model should not be used for downstream applications without further safety and fairness mitigations specific to each application."" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E). 
-",unknown,,https://arxiv.org/pdf/2204.14198.pdf#appendix.E,unknown,15 days on 1536 TPUs,TPU,,,,,, -model,AlphaCode,Google Deepmind,AlphaCode is an autoregressive language model trained on code,2022-02-02,https://arxiv.org/abs/2203.07814,,text; code,41B parameters (dense),,,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Gopher,Google Deepmind,"Gopher is an autoregressive language model based on the Transformer architecture with two modifications: using RMSNorm instead of LayerNorm and using relative positional encoding scheme instead of absolute positional encodings [[Section 3]](https://arxiv.org/pdf/2112.11446.pdf#subsection.3.1). -",2021-12-08,https://arxiv.org/pdf/2112.11446.pdf,,text; code,280B parameters (dense),,"Model performance was evaluated and analyzed on 152 NLP tasks including: Language Modelling (20), Reading Comprehension (3), Fact Checking (3), Question Answering (3), Common Sense (4), MMLU (57), BIG-bench (62) [[Section 4]](https://arxiv.org/pdf/2112.11446.pdf#section.4); on toxicity and bias datasets [[Section 5]](https://arxiv.org/pdf/2112.11446.pdf#section.5); and on dialogue tasks [[Section 6]](https://arxiv.org/pdf/2112.11446.pdf#section.6). -",['MassiveText'],,,,closed,unknown,"The intended uses are stated in the Gopher model card: ""The primary use is research on language models, including: research on NLP applications like machine translation and question answering, understanding how strong language models can contribute to AGI, advancing fairness and safety research, and understanding limitations of current LLMs"" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). -","The model card lists the following as out of scope uses of the model: ""for language generation in harmful or deceitful settings. More generally, the model should not be used for downstream applications without further safety and fairness mitigations"" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). -",unknown,"The feedback for the model can be provided at the email linked in the model card, geoffreyi at google.com [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). -",https://arxiv.org/pdf/2112.11446.pdf#appendix.B,380 tCO2e,7303.24 petaflop/s-day,TPUv3 pods,,,,,, -model,Chinchilla,Google Deepmind,"Chinchilla is an autoregressive language model based on the Transformer architecture with improved scaling laws. -",2022-03-29,https://arxiv.org/pdf/2203.15556.pdf,,text; code,70B parameters (dense),,"Model performance was evaluated and analyzed on many NLP tasks including language modeling, reading comprehension, question answering, commonsense-intensive tasks, and the BIG-Bench and MMLU meta-benchmarks. -",['MassiveText'],,,,closed,unknown,"The intended uses are stated in the Chinchilla model card: ""The primary use is research on language models, including: research on the scaling behaviour of language models along with those listed in Gopher paper"" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). -","The model card lists the following as out of scope uses of the model: ""for language generation in harmful or deceitful settings. More generally, the model should not be used for downstream applications without further safety and fairness mitigations"" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). -",Unknown,"The feedback for the model can be provided at the email linked in the model card, {jordanhoffmann, sborgeaud, amensch,sifre} at deepmind.com [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). 
-",https://arxiv.org/pdf/2203.15556.pdf,Unknown,7303.24 petaflop/s-day,TPUv3/TPUv4 pods,,,,,, -model,Gato,Google Deepmind,"Gato is a generalist agent based on sequence modeling using the Transformer architecture to implement multi-modal, multi-task, multi-embodiment generalist policy. -",2022-05-12,https://www.deepmind.com/blog/a-generalist-agent,,"image, text; image, text, robotics trajectories",1.2B parameters (dense),,"Model performance was evaluated on simulated and robotics task primarily, including out-of-distribution and skill generalization. -",['Gato dataset'],,,,closed,unknown,"The intended uses are stated in the Gopher model card: ""Learn to accomplish a wide variety of tasks from expert demonstrations, such as playing video games, controlling simulated embodiments, and real world block stacking."" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). -","The model card lists the following as out of scope uses of the model: ""Not intended for commercial or production use. Military uses are strictly prohibited."" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). -",unknown,"The feedback for the model can be provided at the email linked in the model card, reedscot at google.com [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). -",https://openreview.net/pdf?id=1ikK0kHjvj#appendix.B,unknown,4 days on a 16x16 TPU v3 slice,16x16 TPU v3 slice,,,,,, -model,Sparrow,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,70B parameters (dense),,,"['Chinchilla', 'Google Search', 'Sparrow Rule reward model', 'Sparrow Preference reward model']",,,,closed,unknown,,,,,,,,,,,,,, -model,RETRO,Google Deepmind,,2021-12-08,https://arxiv.org/abs/2112.04426,,text; text,7.5B parameters (dense),,,['MassiveText'],,,,closed,unknown,,,,,,,,,,,,,, -model,Sparrow Rule reward model,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,70B parameters (dense),,,"['Chinchilla', 'Sparrow adversarial probing dataset']",,,,closed,unknown,,,,,,,,,,,,,, -model,Sparrow Preference reward model,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,70B parameters (dense),,,"['Chinchilla', 'Sparrow response preference dataset']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,Sparrow adversarial probing dataset,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text,27k ratings,[],,['Chinchilla'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,Sparrow response preference dataset,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text,72k comparisons,[],,['Chinchilla'],,,,closed,unknown,,,,,,,,,,,,,, -model,GopherCite,Google Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text; text,280B parameters (dense),,,"['Gopher', 'Google Search', 'GopherCite reward model']",,,,closed,unknown,,,,,,,,,,,,,, -model,GopherCite reward model,Google Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text; text,7B parameters (dense),,,"['Gopher', 'GopherCite Preference dataset']",,,,closed,unknown,,,,,,,,,,,,,, -dataset,GopherCite Preference dataset,Google 
Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text,33k response pairs,[],,"['Gopher', 'Google Search']",,,,closed,unknown,,,,,,,,,,,,,, -model,Dramatron,Google Deepmind,,2022-09-29,https://arxiv.org/abs/2209.14958,,text; text,70B parameters (dense),,,['Chinchilla'],,,,closed,unknown,,,,,,,,,,,,,, -model,RT-2,Google Deepmind,RT-2 is a vision-language-action model for robotic actions that incorporates chain of thought reasoning.,2023-07-28,https://arxiv.org/pdf/2307.15818.pdf,,"text, video; text, robotics trajectories",55B parameters (dense),,Evaluated on evaluation trajectories and SoTA baselines using robotic data.,"['PaLI-X', 'PaLM-E', 'RT-2 action tokens']",,,,open,unknown,,,,,,,,,,,,,, -model,Lyria,Google Deepmind,Lyria is DeepMind's most advanced AI music generation model to date.,2023-11-16,https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/,,text; music,unknown,,unknown,[],,,worked with artists and the music industry to ensure utility,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Genie,Google DeepMind,"Genie is a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.",2024-02-23,https://sites.google.com/view/genie-2024,,image; video,11B parameters (dense),,Evaluated using only out-of-distribution image prompts for qualitative results.,[],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -dataset,YT-Temporal-1B,University of Washington,,2022-01-07,https://arxiv.org/abs/2201.02639,,video,20M videos,[],,['YouTube'],,,,open,MIT,,,,,,,,,,,,,, -dataset,WebVid-10M,University of Oxford,"WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from stock footage sites. -",2021-04-01,https://m-bain.github.io/webvid-dataset/,,"text, video","10.7M video-text pairs, 52K hours video",[],unknown,[],Video URLs and textual description annotations,unknown,unknown,open,WebVid Dataset Terms,"""Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months."" -","- Authors note the following prohibited uses: ""You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content."" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md) -- Authors also note the following limitations of the dataset: ""We note that data sourced from the web may be prone to biases and may contain graphic content. 
Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F) -",,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,,,,,,,,,, -dataset,WebVid-2M,University of Oxford,"WebVid-2M is a large-scale dataset of 2.5M short videos with textual descriptions sourced from stock footage sites. A subset of the WebVid-10M dataset. -",2021-04-01,https://m-bain.github.io/webvid-dataset/,,"text, video","2.5M video-text pairs, 13K hours video",[],unknown,['WebVid-10M'],Video URLs and textual description annotations,unknown,unknown,open,WebVid Dataset Terms,"""Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months."" -","- Authors note the following prohibited uses: ""You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content."" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md) -- Authors also note the following limitations of the dataset: ""We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F) -",,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,,,,,,,,,, -application,Sana,Sana,"""Sana is your all-in-one, AI-assisted, online learning platform (LMS). Author employee training courses and measure team development with Sana's powerful analytics. Sana partners with the world's most important organizations and fastest-growing startups to make personalized, adaptive learning available for everyone, everywhere"" [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs). -",,https://www.sanalabs.com/,,,,,,['OpenAI API'],,,unknown,limited,custom,"Sana is intended to be used by employers to provide a learning service for their employees. -",,unknown,unknown,,,,,"Customized GPT-3, fine-tuned on private data [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs). 
-","question and answer, summarization, sentiment analysis,topic identification",https://www.sanalabs.com/legal/,unknown,unknown,unknown -dataset,NaturalInstructions-v2,AI2,,2022-04-16,https://arxiv.org/abs/2204.07705,,NLP tasks,1600 tasks,[],,[],,,,open,Apache 2.0,,,,,,,,,,,,,, -dataset,SODA,AI2,"SODA is the first publicly available, million-scale, high-quality dialogue dataset covering a wide range of social interactions.",2023-05-24,https://arxiv.org/pdf/2212.10465.pdf,https://huggingface.co/datasets/allenai/soda,text,1.5M dialogues,[],"Randomly sampled dialogues from dataset are evaluated according to six established criteria of natural flow, context dependence, topic consistency, speaker consistency, specificity, and overall.",[],,,,open,CC BY 4.0,,,,,,,,,,,,,, -dataset,Multimodal C4,AI2,An augmentation of C4 with images added and made openly available.,2023-06-09,https://arxiv.org/pdf/2304.06939.pdf,,"image, text",43B English tokens with 101.2M documents and 571M images,[],Conducted experiments on models trained with Multimodal C4 in comparison to models trained on single image/caption datasets,['C4'],,,,open,ODC-By,,,,,,,,,,,,,, -model,COSMO,AI2,COSMO is a conversation agent with greater generalizability on both in- and out-of-domain chitchat datasets,2023-05-24,https://arxiv.org/pdf/2212.10465.pdf,,text; text,11B parameters (dense),,Evaluated by human testers on generalization capabilities and responses compared to other chatbots.,"['SODA', 'ProsocialDialog', 'T5']",,,,open,,,,,https://huggingface.co/allenai/cosmo-xl/discussions,https://huggingface.co/allenai/cosmo-xl,unknown,unknown,v3-128 TPU accelerators with batch size 256,,,,,, -dataset,Dolma,AI2,"Dolma is a dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials",2023-08-18,https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64,https://drive.google.com/file/d/12gOf5I5RytsD159nSP7iim_5zN31FCXq/view,text,3T tokens,[],Compared with other open and closed datasets in regards to size and quality control.,[],,,,open,AI2 ImpACT,,,"Quality filtration, deduplication, and risk mitigation via logistic qualifiers and regular expressions used.",,,,,,,,,,, -dataset,Tulu-V2-mix,AI2,Tulu-V2-mix is a dataset composed of many high-quality instruction datasets that results in stronger performance across a variety of reasoning and knowledge-probing tasks.,2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture,text,unknown,[],Models trained with dataset evaluated on downstream performance.,"['FLAN Collection', 'Open Assistant 1', 'ShareGPT', 'Alpaca dataset', 'Code Alpaca', 'LIMA', 'WizardLM', 'OpenOrca']",,,,open,ODC-BY,,,,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture/discussions,,,,,,,,,, -model,Tulu 2,AI2,Tulu 2 is a language model trained on the new Tulu-v2-mix dataset and fine-tuned on more state of the art language models.,2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,,text; text,70B parameters (dense),,Evaluated on MT-Bench and AlpacaEval. 
Compared to other chatbots.,"['LLaMA 2', 'Tulu-V2-mix']",,,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/tulu-2-70b/discussions,https://huggingface.co/allenai/tulu-2-70b,unknown,unknown,unknown,,,,,, -model,Tulu 2 DPO,AI2,"Tulu 2 DPO is created in a similar manner to Tulu 2, but with Direct Preference Optimization (DPO).",2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,,text; text,70B parameters (dense),,Evaluated on MT-Bench and AlpacaEval. Compared to other chatbots.,"['LLaMA 2', 'Tulu-V2-mix']",,,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/tulu-2-dpo-70b/discussions,https://huggingface.co/allenai/tulu-2-dpo-70b,unknown,unknown,unknown,,,,,, -model,Code Tulu 2,AI2,"Code Tulu 2 is a fine-tuned version of Code LLaMA that was trained on a mix of publicly available, synthetic and human datasets.",2023-11-20,https://arxiv.org/pdf/2311.10702.pdf,,"text; code, text",13B parameters (dense),,Evaluated on MT-Bench and AlpacaEval. Compared to other chatbots.,"['Code LLaMA', 'Tulu-V2-mix']",,,,open,AI2 ImpACT,,,,https://huggingface.co/allenai/codetulu-2-13b/discussions,https://huggingface.co/allenai/codetulu-2-13b,unknown,unknown,unknown,,,,,, -model,OLMo,AI2,"Open Language Model (OLMo) is designed to provide access to data, training code, models, and evaluation code necessary to advance AI through open research to empower academics and researchers to study the science of language models collectively.",2024-02-01,https://allenai.org/olmo/olmo-paper.pdf,,text; text,7B parameters (dense),,"Evaluated on standard LLM tasks and benchmarks in comparison to LLaMA, Falcon, and MPT, in addition to other same-sized models.",['Dolma'],,,training data from Dolma filtered and deduplicated before being trained on.,open,Apache 2.0,,,unknown,https://huggingface.co/allenai/OLMo-7B/discussions,https://huggingface.co/allenai/OLMo-7B,75.05 tCO2eq,unknown,"27 nodes, with each node containing 8x NVIDIA A100-40GB GPUs provided by MosaicML",,,,,, -dataset,MADLAD-400,AI2,"MADLAD-400 is a document-level multilingual dataset based on Common Crawl, covering 419 languages in total.",2023-09-09,https://arxiv.org/abs/2309.04662,https://huggingface.co/datasets/allenai/MADLAD-400,text,3 trillion tokens,[],,['Common Crawl'],,,,open,CC BY 4.0,,,unknown,https://huggingface.co/datasets/allenai/MADLAD-400/discussions,,,,,,,,,, -model,VARCO-LLM,NCSOFT,VARCO-LLM is NCSOFT’s large language model and is trained on English and Korean.,2023-08-16,https://github.com/ncsoft/ncresearch,,text; text,13B parameters,,"Boasts the highest performance among the Korean LLMs of similar sizes that have been released to date, according to internal evaluations.",[],,,,closed,custom,"Developing various NLP-based AI services such as Q&A, chatbot, summarization, information extraction",,,,,unknown,unknown,unknown,,,,,, -application,UnderwriteGPT,Paladin Group and Dais Technology,UnderwriteGPT is the world's first generative AI underwriting tool.,2023-02-01,https://dais.com/underwritegpt/,,,,,,[],,,,limited,,,,,,,,,,,,,,, -model,Cerebras-GPT,Cerebras,"A Family of Open, Compute-efficient, Large Language Models. The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models. All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter). 
[[Cerebras Blog Post]](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models) -",2023-03-28,https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/,,text; text,13B parameters (dense),,"""We evaluate our models on the PILE validation set comprising 380M tokens. We also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We performed upstream (pre-training) evaluations of text prediction cross-entropy using the Pile validation and test splits. We performed downstream evaluations of text generation accuracy on standardized tasks using the Eleuther lm-evaluation-harness."" [[Evaluations]] (https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#evaluations). -",['The Pile'],,,"The Pile dataset has been thoroughly analyzed from various ethical standpoints such as toxicity analysis, gender bias, pejorative content, racially sensitive content etc. Only mitigations in standard Pile dataset pre-processing were employed when pre-training Cerebras-GPT. [[Risk, Bias, Ethical Considerations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#risk-bias-ethical-considerations) -",open,Apache 2.0,"""The primary intended use is to further research into large language models. These models can be used as a foundation model for NLP, applications, ethics, and alignment research. Our primary intended users are researchers who are working to improve LLMs and practitioners seeking reference implementations, training setups, hyperparameters, or pre-trained models. We release these models with a fully permissive Apache license for the community to use freely."" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#uses-and-limitations). -","Authors note the following limitations of the model: ""Cerebras-GPT models are trained on the Pile, with English language only, and are not suitable for machine translation tasks. Cerebras-GPT models have not been tuned for human-facing dialog applications like chatbots and will not respond to prompts in a similar way to models that have received instruction tuning or reinforcement learning from human feedback (RLHF) like Flan-T5 or ChatGPT."" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#out-of-scope-use). 
-",,,https://huggingface.co/cerebras/Cerebras-GPT-13B,,,16x Cerebras CS-2 wafer scale systems,,,,,, -model,Jais,"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence",Jais is the world’s most advanced Arabic LLM as of its release.,2023-08-30,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,,text; text,13B parameters (dense),,Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,"['GPT-3', 'The Pile']",,,,open,Apache 2.0,Jais is released with the aim to stimulate research and development in the Arabic NLP community.,"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.",unknown,,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,unknown,unknown,Condor Galaxy Supercomputer,,,,,, -model,Jais Chat,"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence","Jais Chat is an instruction-tuned version of Jais, optimized for dialog interaction.",2023-08-30,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,,text; text,13B parameters (dense),,Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,"['GPT-3', 'The Pile']",,,,open,Apache 2.0,Jais Chat is released with the aim to stimulate research and development in the Arabic NLP community.,"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.",unknown,,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,unknown,unknown,Condor Galaxy Supercomputer from Cerebras,,,,,, -model,Bittensor Language Model,Cerebras,Bittensor Language Model is a 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.,2023-07-24,https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/,,text; text,3B parameters (dense),,Evaluated on standard LLM benchmarks in comparison to similar-sized models.,['SlimPajama'],,,,open,Apache 2.0,,,unknown,https://huggingface.co/cerebras/btlm-3b-8k-base/discussions,https://huggingface.co/cerebras/btlm-3b-8k-base,unknown,unknown,unknown,,,,,, -dataset,SlimPajama,Cerebras,"As of release, SlimPajama is the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models.",2023-06-09,https://huggingface.co/datasets/cerebras/SlimPajama-627B,https://huggingface.co/datasets/cerebras/SlimPajama-627B,text,627B tokens,[],,['RedPajama-Data'],,,,open,Apache 2.0,,,unknown,https://huggingface.co/datasets/cerebras/SlimPajama-627B/discussions,,,,,,,,,, -model,CodeGen,Salesforce,CodeGen is a language model for code,2022-03-25,https://arxiv.org/abs/2203.13474,,"code, text; code, text",16B parameters (dense),,,[],,,,open,"none (model weights), BSD-3-Clause (code)",,,,,,,,Unspecified Salesforce Compute (TPU-V4s),,,,,, -model,BLIP,Salesforce,,2022-01-28,https://arxiv.org/abs/2201.12086,,text; image,unknown,,,"['ViT-B', 'BERT', 'COCO', 'Visual Genome', 'Conceptual Captions', 'Conceptual 12M', 'SBU Captions', 'LAION-115M']",,,,open,BSD-3-Clause,,,,,,,,,,,,,, -dataset,LAION-115M,Salesforce,,2022-01-28,https://arxiv.org/abs/2201.12086,,"image, text",115M image-text 
pairs,[],,['LAION-400M'],,,,open,BSD-3-Clause,,,,,,,,,,,,,, -application,EinsteinGPT,Salesforce,EinsteinGPT is generative AI for customer relationship management (CRM).,2023-03-07,https://www.salesforce.com/products/einstein/overview/?d=cta-body-promo-8,,,,,,['ChatGPT API'],,,,limited,unknown,,,,,,,,,,,,,, -model,BLIP-2,Salesforce,BLIP-2 is a model that employs a generic and efficient pretraining strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.,2023-01-30,https://arxiv.org/pdf/2301.12597.pdf,,text; image,2.7B parameters (dense),,"BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods",['OPT'],,,,open,MIT,,,,https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions,https://huggingface.co/Salesforce/blip2-opt-2.7b,unknown,less than 9 days,unknown,,,,,, -model,Moirai,Salesforce,"Moirai is a cutting-edge time series foundation model, offering universal forecasting capabilities. It stands out as a versatile time series forecasting model capable of addressing diverse forecasting tasks across multiple domains, frequencies, and variables in a zero-shot manner.",2024-03-19,https://blog.salesforceairesearch.com/moirai/,,time-series; time-series,311M parameters,,Moirai has undergone a comprehensive evaluation in both in-distribution and out-of-distribution settings. It demonstrated its capabilities as a zero-shot forecaster and delivered competitive or superior performance compared to full-shot models.,['LOTSA'],,,The performance of Moirai was evaluated through in-distribution and out-of-distribution settings.,open,Apache 2.0,"Moirai can be used for time series forecasting in multiple domains. 
It offers robust zero-shot forecasting capabilities and eliminates the need for additional data, extensive computational resources, and expert input for achieving accurate forecasts.",unknown,unknown,https://huggingface.co/Salesforce/moirai-1.0-R-large/discussions,https://huggingface.co/Salesforce/moirai-1.0-R-large,unknown,unknown,NVIDIA A100 40G GPUs,,,,,, -dataset,LOTSA,Salesforce,LOTSA is the largest collection of open time series datasets with 27B observations across nine domains.,2024-03-19,https://arxiv.org/pdf/2402.02592.pdf,https://huggingface.co/datasets/Salesforce/lotsa_data,time-series,27B observations,[],,[],"domain observations in energy, transport, climate, cloudops, web, sales, nature, econ/finance, and healthcare",,,open,Apache 2.0,pre-training Large Time Series Models,,,https://huggingface.co/datasets/Salesforce/lotsa_data/discussions,,,,,,,,,, -dataset,Neeva dataset,Neeva,,,https://neeva.com/index,,text,unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Neeva model,Neeva,,,https://neeva.com/index,,text; text,unknown,,,['Neeva dataset'],,,,closed,unknown,,,,,,,,,,,,,, -application,NeevaAI,Neeva,NeevaAI is an AI-powered search tool that combines the capabilities of LLMs with Neeva's independent in-house search stack to create a unique and transformative search experience.,2023-01-06,https://neeva.com/blog/introducing-neevaai,,,,,,['Neeva model'],,,,open,Custom,,,,,,,,,,,https://neeva.com/terms,,, -dataset,Jurassic-1 dataset,AI21 Labs,"The dataset used to train the Jurassic-1 models, based on publicly available data.",2021-08-11,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,text,300B tokens,[],,[],unknown,unknown,,closed,unknown,unknown,,,,,,,,,,,,, -dataset,Jurassic-1 Instruct dataset,AI21 Labs,The dataset used to instruction-tune the Jurassic-1 Instruct models.,2022-12-01,,,text,unknown,[],unknown,[],unknown,unknown,unknown,closed,unknown,unknown,unknown,,,,,,,,,,,, -model,Jurassic-1,AI21 Labs,"Jurassic-1 is a family of autoregressive language models (Large, Grande, Jumbo).",2021-08-11,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,text,178B parameters (dense),,"Evaluated on several standard benchmarks (e.g. ARC, BoolQ, HellaSwag, RTE, Winogrande)",['Jurassic-1 dataset'],,,,limited,unknown,unknown,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",,,,unknown,Several months,Over 800 A100 GPUs,,,,,, -model,Jurassic-1 Instruct,AI21 Labs,Jurassic-1 Instruct is an instruction-tuned autoregressive language model.,2022-12-01,https://docs.ai21.com/docs/jurassic-1-instruct-beta,,text; text,17B parameters (dense),,,"['Jurassic-1', 'Jurassic-1 Instruct dataset']",,,unknown,limited,unknown,"Jurassic-1 Instruct was trained specifically to handle instructions-only prompts (""zero-shot"") without examples (""few-shot""). 
It is the most natural way to interact with language models, and it is the best way to get a sense of the optimal output for your task without any examples.","Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",,,,unknown,unknown,unknown,,,,,, -model,Jurassic-2,AI21 Labs,Jurassic-2 is a family of language models designed to replace Jurassic-1.,2023-03-09,https://docs.ai21.com/docs/jurassic-2-models,,text; text,unknown,,The model was evaluated on the HELM benchmark as discussed in https://www.ai21.com/blog/introducing-j2.,[],,,unknown,limited,unknown,,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,,,unknown,unknown,unknown,,,,,, -application,AI21 Playground,AI21 Labs,The AI21 Labs Playground supports several task-specific APIs in addition to a variety of models.,2021-08-11,https://studio.ai21.com/playground/,,,,,,"['Jurassic-1', 'Jurassic-1 Instruct', 'Jurassic-2', 'AI21 Summarization API', 'AI21 Paraphrase API']",,,unknown,limited,,"The intended uses are text completion, rewriting, and summarization.","Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,,,,,The Playground provides direct access to the language models (Complete API) as well as wrapped for Rewrite and Summarize.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown -application,AI21 Paraphrase API,AI21 Labs,AI21 Studio's Paraphrase API offers access to our world-class paraphrasing engine. It has been specifically developed for suggesting alternative ways to convey the same message using different words.,2023-03-09,https://docs.ai21.com/docs/paraphrase-api,,,,,,['Jurassic-2'],,,unknown,limited,,The intended uses are text paraphrasing.,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,,,,,The AI21 language models are further specialized to the task of paraphrasing.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown -application,AI21 Summarization API,AI21 Labs,AI21 Studio's Summarize API offers access to our world-class summarization engine. 
It has been specifically developed for reading long texts and providing a faithful summary of the original document.,2023-03-09,https://docs.ai21.com/docs/summarize-api,,,,,,['Jurassic-2'],,,unknown,limited,,The intended use is text summarization.,"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.",unknown,Feedback can be given by emailing at info at ai21.com,,,,,The AI21 language models are further specialized to the task of summarization.,text,https://www.ai21.com/terms-of-use,unknown,unknown,unknown -application,Wordtune,AI21 Labs,"Wordtune, the first AI-based writing companion that understands context and meaning.",2020-10-27,https://www.wordtune.com/,,,,,,['AI21 Paraphrase API'],,,unknown,limited,Wordtune License,The Wordtune assistant is a writing assistant,,unknown,,,,,,unknown,text,https://www.wordtune.com/terms-of-use,unknown,unknown,unknown -application,Wordtune Read,AI21 Labs,"Wordtune Read is an AI reader that summarizes long documents so you can understand more, faster.",2021-11-16,https://www.wordtune.com/read,,,,,,['AI21 Summarize API'],,,unknown,limited,Wordtune License,,,unknown,,,,,,unknown,text,https://www.wordtune.com/terms-of-use,unknown,unknown,unknown -model,Jamba,AI21 Labs,"Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. Jamba is the world’s first production-grade Mamba based model.",2024-03-28,https://www.ai21.com/blog/announcing-jamba,,text; text,52B parameters (sparse),,Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.,[],,,,open,Apache 2.0,"intended for use as a foundation layer for fine tuning, training",,,https://huggingface.co/ai21labs/Jamba-v0.1/discussions,https://huggingface.co/ai21labs/Jamba-v0.1,unknown,unknown,unknown,,,,,, -model,MPT,Mosaic,MPT is a series of large language models seeking to address the limitations of other open source models like LLaMA and Pythia.,2023-05-05,https://www.mosaicml.com/blog/mpt-7b,,text; text,7B parameters (dense),,Evaluated on a range of benchmarks and performed on par with LLaMA-7B.,"['RedPajama-Data', 'C4', 'The Stack', 'Multimodal C4']",,,,open,Apache 2.0,,,,,,unknown,9.5 days,440 A100 40GB GPUs,,,,,, -model,CommonCanvas,"Cornell University, Mosaic",CommonCanvas is a text-to-image model trained solely on Creative Commons licensed images.,2023-10-25,https://arxiv.org/pdf/2310.16825.pdf,,text; image,unknown,,"Compared to Stable Diffusion 2, a SOTA text-to-image model.",['CommonCatalog'],,,,open,Apache 2.0,,,,,,unknown,6.79 days,128 A100 NVIDIA GPUs,,,,,, -dataset,CommonCatalog,Mosaic,CommonCatalog is a curated dataset of CommonCrawl images and synthetic captions.,2023-10-25,https://arxiv.org/pdf/2310.16825.pdf,https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md,image-caption pairings,70M images,[],,"['YFCC100M', 'BLIP-2']",images with derivative licenses,images with non-derivative licenses,,open,Apache 2.0,,,,,,,,,,,,,, -application,AI Dungeon,Latitude,"AI Dungeon is a single-player text adventure game that uses AI to generate content. 
-",2019-12-17,https://play.aidungeon.io,,,,,,['OpenAI API'],,,,limited,custom,,,,,,,,,,,https://play.aidungeon.io/main/termsOfService,,, -dataset,Conformer-1 dataset,AssemblyAI,The dataset used to train AssemblyAI's Conformer-1 model.,2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,"audio, text",650K hours audio (60TB),[],,[],"""our team curated a dataset of 650K hours of English audio - consisting of proprietary internal datasets and various sources from the internet"" -",,,closed,unknown,,,,,,,,,,,,,, -model,Conformer-1,AssemblyAI,"Conformer-1 is a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data, making up to 43% fewer errors on noisy data than other ASR models.",2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,"audio, text; audio, text",300M parameters (dense),,"In order to evaluate the accuracy and robustness of Conformer-1, we sourced 60+ hours of human labeled audio data covering popular speech domains such as call centers, podcasts, broadcasts, and webinars. We then calculated the Word Error Rate (WER) of Conformer-1 against these datasets, and compared the results against Whisper and a number of other ASR models. To ground our results against popular open source speech recognition benchmarks, we also performed the same WER analysis against a number of academic datasets.",['Conformer-1 dataset'],,,,limited,unknown,,,,,,,,,,,,,, -application,AssemblyAI,AssemblyAI,AssemblyAI uses Claude and Anthropic's model to transcribe and understand audio data at scale.,2022,https://www.assemblyai.com/,,,,,,['Anthropic API'],,,,limited,custom,,,,,,,,,,,https://www.assemblyai.com/legal/terms-of-service,,, -application,Conformer-1 API,AssemblyAI,API to access the AssemblyAI's Conformer-1 model.,2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,,,,,['Conformer-1'],,,,open,custom,Speech recognition,,,,,,,,,,https://www.assemblyai.com/legal/terms-of-service,,, -model,Xwin-LM,Xwin,"Xwin-LM is a LLM, which on release, ranked top 1 on AlpacaEval, becoming the first to surpass GPT-4 on this benchmark.",2023-09-20,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,,text; text,70B parameters (dense),,Evaluated on AlpacaEval benchmark against SOTA LLMs.,[],,,,open,LLaMA2,,,,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,unknown,unknown,unknown,,,,,, -model,JARVIS-1,Peking University Institute for Artificial Intelligence,"JARVIS-1 is an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe.",2023-11-10,https://arxiv.org/pdf/2311.05997.pdf,,text; in-game actions,unknown,,"Compared with other multi-task, instruction-following agents.",[],,,,open,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,MAmmoTH,Ohio State University,MAmmoTH is a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.,2023-09-11,https://arxiv.org/pdf/2309.05653.pdf,,text; text,34B parameters (dense),,"Evaluated on MATH, a competition-level dataset, and achieves a 46% accuracy, higher than accuracy produced by GPT-4's chain of thought.","['MathInstruct', 'LLaMA', 'Code LLaMA']",,,,open,MIT,,,,,,,,,,,,,, -model,A.X,SK Telecom,"A.X is SK Telecom's proprietary LLM, which has been trained on the Korean 
language.",2023-09-26,https://www.sktelecom.com/en/press/press_detail.do?idx=1582,,text; text,39B parameters,,,[],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Yi,01 AI,The Yi series models are large language models trained from scratch by developers at 01 AI.,2023-11-02,https://github.com/01-ai/Yi,,text; text,34B parameters (dense),,"Evaluated on standard language benchmarks, common sense reasoning, and reading comprehension in comparison to SoTA LLMs.",[],,,"Model underwent supervised fine-tuning, leading to a greater diversity of responses.",open,custom,,,unknown,https://huggingface.co/01-ai/Yi-34B/discussions,https://huggingface.co/01-ai/Yi-34B,unknown,unknown,unknown,,,,,, -model,Yi-VL,01 AI,"The Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.",2024-01-23,https://github.com/01-ai/Yi,,text; text,34B parameters (dense),,"Yi-VL outperforms all existing open-source models in MMMU and CMMMU, two advanced benchmarks that include massive multi-discipline multimodal questions (based on data available up to January 2024).",[],,,unknown,open,custom,,,unknown,https://huggingface.co/01-ai/Yi-VL-34B/discussions,https://huggingface.co/01-ai/Yi-VL-34B,unknown,10 days,128 NVIDIA A800 (80G) GPUs,,,,,, -dataset,HowTo100M,"École Normale Supérieure, Inria","HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of 136M video clips with captions sourced from 1.2M Youtube videos (15 years of video) and 23k activities from domains such as cooking, hand crafting, personal care, gardening or fitness.",2019-06-07,https://arxiv.org/pdf/1906.03327.pdf,https://arxiv.org/pdf/1906.03327.pdf,"text, video",136M video clips,[],"Authors use the dataset to learn a joint text-video embedding by leveraging more than 130M video clip-caption pairs. They then evaluate the learned embeddings on the tasks of localizing steps in instructional videos of CrossTask and textbased video retrieval on YouCook2, MSR-VTT and LSMDC datasets. They show that their learned embedding can perform better compared to models trained on existing carefully annotated but smaller video description datasets.",['YouTube'],"The dataset features 1.22 million videos from YouTube with a primary focus on videos containing ""visual tasks"", that involve some interaction with the physical world (e.g. Making peanut butter, Pruning a tree) as compared to others that are more abstract (e.g. Ending a toxic relationship, Choosing a gift). To obtain predominantly visual tasks, the authors limit them to one of 12 categories (Food and Entertaining, Home and Garden, Hobbies and Crafts, Cars & Other Vehicles, Pets and Animals, Holidays and Traditions, Personal Care and Style, Sports and Fitness, Health, Education and Communications, Arts and Entertainment, Computers and Electronics). They also restrict to the top 200 YouTube search results, as the latter ones may not be related to the query task.","Categories such as Relationships and Finance and Business, that may be more abstract, are excluded. Videos with less than 100 views are removed. Authors also ignore videos that have less than 100 words. Videos longer than 2,000 seconds are removed. 
As some videos may appear in several tasks, the videos are deduplicated based on YouTube IDs.",,open,Apache 2.0,,"No uses are explicitly prohibited by the authors. They note the following limitations of the dataset: ""We note that the distribution of identities and activities in the HowTo100M dataset may not be representative of the global human population and the diversity in society. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data."" -",,,,,,,,,,,, -model,Lemur,OpenLemur,Lemur is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,2023-10-10,https://arxiv.org/pdf/2310.06830.pdf,,"code, text; code, text",70B parameters (dense),,Evaluated on text and code benchmarks in comparison to other models.,"['LLaMA 2', 'The Stack', 'RefinedWeb', 'RedPajama', 'Common Crawl', 'Wikipedia', 'ArXiv']",,,,open,LLaMA2,,,,https://huggingface.co/OpenLemur/lemur-70b-v1/discussions,https://huggingface.co/OpenLemur/lemur-70b-v1,unknown,unknown,TPUv4-512 pod,,,,,, -model,Lemur-Chat,OpenLemur,Lemur-Chat is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,2023-10-10,https://arxiv.org/pdf/2310.06830.pdf,,text; text,70B parameters (dense),,Evaluated on text and code benchmarks in comparison to other models.,"['Lemur', 'OpenAssistant 1', 'OpenOrca', 'ShareGPT & ChatLogs', 'Evol-CodeAlpaca data']",,,,open,CC-BY-NC-4.0,,,,https://huggingface.co/OpenLemur/lemur-70b-chat-v1/discussions,https://huggingface.co/OpenLemur/lemur-70b-chat-v1,unknown,unknown,unknown,,,,,, -model,ACT-1,Adept,ACT-1 (ACtion Transformer) is a large-scale transformer model designed and trained specifically for taking actions on computers (use software tools APIs and websites) in response to the user's natural language commands.,2022-09-14,https://www.adept.ai/blog/act-1,,text; text,,,,[],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Persimmon,Adept,"Persimmon is the most capable open-source, fully permissive model with fewer than 10 billion parameters, as of its release date.",2023-09-07,https://www.adept.ai/blog/persimmon-8b,,text; text,8B parameters (dense),,"Evaluated in comparison to LLaMA 2 and MPT Instruct, and outperforms both on standard benchmarks.",[],,,,open,Apache 2.0,,,,,,,,,,,,,, -model,Fuyu,Adept,Fuyu is a small version of the multimodal model that powers Adept's core product.,2023-10-17,https://www.adept.ai/blog/fuyu-8b,,"image, text; text",8B parameters (dense),,Evaluated on standard image understanding benchmarks.,[],,,,open,CC-BY-NC-4.0,The model is intended for research purposes only.,"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.",,https://huggingface.co/adept/fuyu-8b/discussions,https://huggingface.co/adept/fuyu-8b,unknown,unknown,unknown,,,,,, -model,Fuyu Heavy,Adept,Fuyu Heavy is a new multimodal model designed specifically for digital agents.,2024-01-24,https://www.adept.ai/blog/adept-fuyu-heavy,,"image, text; text",unknown,,"Evaluated on the MMLU, GSM8K, MATH, and HumanEval benchmarks. 
According to these benchmarks, Fuyu-Heavy is, as of release, the strongest multimodal model trained outside of Google or OpenAI.",[],,,,closed,unknown,unknown,,,,,unknown,unknown,unknown,,,,,, -model,CPM Bee,OpenBMB,"CPM-Bee is a fully open-source, commercially-usable Chinese-English bilingual base model with a capacity of ten billion parameters.",2023-05-27,https://github.com/OpenBMB/CPM-Bee,,text; text,10B parameters (dense),,Evaluated on English and Chinese language benchmarks.,[],,,,open,custom,You can use the raw model for many NLP tasks like text generation or fine-tune it to a downstream task.,,unknown,https://huggingface.co/openbmb/cpm-bee-10b/discussions,https://huggingface.co/openbmb/cpm-bee-10b,unknown,unknown,unknown,,,,,, -dataset,UltraFeedback,OpenBMB,"UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models.",2023-09-26,https://github.com/OpenBMB/UltraFeedback,https://huggingface.co/datasets/openbmb/UltraFeedback,text,256k samples,[],Randomly chosen models trained on UltraFeedback evaluated across standard benchmarks.,[],,,,open,MIT,,,unknown,https://huggingface.co/datasets/openbmb/UltraFeedback/discussions,,,,,,,,,, -model,MiniCPM,OpenBMB,"MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).",2024-02-01,https://github.com/OpenBMB/MiniCPM/,,text; text,2.4B parameters (dense),,Evaluated on open-sourced general benchmarks in comparison to SotA LLMs.,[],,,,open,custom,,,unknown,https://huggingface.co/openbmb/MiniCPM-V/discussions,https://huggingface.co/openbmb/MiniCPM-V,unknown,unknown,unknown,,,,,, -model,Eurus,OpenBMB,Eurus is a suite of large language models (LLMs) optimized for reasoning.,2024-04-02,https://arxiv.org/abs/2404.02078,,text; text,70B parameters,,The model was comprehensively benchmarked across 12 tests covering five tasks. Eurus achieved the best overall performance among open-source models of similar sizes and even outperformed specialized models in many cases.,"['Eurus SFT', 'UltraInteract', 'UltraFeedback']",,,,open,Apache 2.0,The model can be used for reasoning tasks and is especially tailored for coding and math following specific prompts.,,unknown,https://huggingface.co/openbmb/Eurus-70b-nca/discussions,https://huggingface.co/openbmb/Eurus-70b-nca,unknown,unknown,unknown,,,,,, -dataset,10k_prompts_ranked,Data is Better Together,"10k_prompts_ranked is a dataset of prompts with quality rankings created by 314 members of the open-source ML community using Argilla, an open-source tool to label data.",2024-02-27,https://huggingface.co/blog/community-datasets,https://huggingface.co/datasets/DIBT/10k_prompts_ranked,text,10k examples,[],,[],,,,open,unknown,Training and evaluating language models on prompt ranking tasks and as a dataset that can be filtered only to include high-quality prompts. 
These can serve as seed data for generating synthetic prompts and generations.,"This dataset only contains rankings for prompts, not prompt/response pairs so it is not suitable for direct use for supervised fine-tuning of language models.",,https://huggingface.co/datasets/DIBT/10k_prompts_ranked/discussions,,,,,,,,,, -model,ESM-2,Meta,ESM-2 is a series of protein language models trained on protein sequences,2022-10-31,https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2.full.pdf+html,,text; protein sequence,15B parameters (dense),,,"['UniRef50', 'UniRef90']",,,,open,MIT,,,,,,,,,,,,,, -dataset,PMD,Meta,PMD (Public Multimodal Datasets) is a collection of image-text datasets introduced in the FLAVA work.,2021-12-08,https://arxiv.org/abs/2112.04482,,"image, text",70M,[],,"['COCO', 'YFCC100M', 'SBU Captions', 'Localized Narratives', 'Visual Genome', 'Wikipedia', 'Conceptual Captions', 'Red Caps']",,YFCC100M is filtered for non-English captions and very short (< 2 word) captions.,"Beyond filtering mentioned in excluded, nothing further is done.",closed,unknown,unknown,unknown,,,,,,,,,,,, -model,FLAVA,Meta,"FLAVA is a multimodal model composed of an image encoder, text encoder, and multimodal encoder.",2021-12-08,https://arxiv.org/abs/2112.04482,,"image, text",306M,,"FLAVA is benchmarked on a range of vision-only (e.g. CIFAR-10), language-only (e.g. GLUE), and multimodal (e.g. Hateful Memes) standard evaluations.",['PMD'],,,"FLAVA introduces a variety of new modeling techniques, specifically with an interest in improved text-image alignment through contrastive objectives.",open,BSD-3-Clause,"Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full), ""The model is intended to serve as a reproducible research artifact for research communities in the light of models whose exact reproduction details are never released such as CLIP and SimVLM."" +",unknown,The API exposes the models fairly direclty with a range of hyperparameters (e.g. temperature scaling).,"Given a prompting text, the OpenAI API provides access to text completions, and log probabilities. The support for text and code embeddings were added on 2022-01-25 [[OpenAI Blog Post]] (https://openai.com/blog/introducing-text-and-code-embeddings/). +",https://openai.com/api/policies/terms/,unknown,unknown,unknown,,,, +model,VPT,OpenAI,,2022-06-23,https://arxiv.org/abs/2206.11795,,video; video,,500M parameters (dense),['web_clean'],,,,,open,MIT,,,,,,,,,,,,,, +dataset,web_clean,OpenAI,,2022-06-23,https://arxiv.org/abs/2206.11795,,video,,70k hours,[],,,,,closed,unknown,,,,,,,,,,,,[],, +application,ChatGPT,OpenAI,ChatGPT is an artificial intelligence chatbot developed by OpenAI.,2022-11-30,https://openai.com/blog/chatgpt,,,,,"['gpt-3.5-turbo', 'OpenAI toxicity classifier']",,,,,limited,custom,,,,,,,,100M,,,,,, +model,gpt-3.5-turbo,OpenAI,,2023-03-01,https://platform.openai.com/docs/models/gpt-3-5,,text; text,,unknown,['gpt-3.5-turbo dataset'],,,,,limited,custom,,,,,,,,,,,,,, +model,GPT-4 Turbo,OpenAI,GPT-4 Turbo is a more capable version of GPT-4 and has knowledge of world events up to April 2023. 
It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt.,2023-11-06,https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,,text; text,,unknown,[],unknown,unknown,unknown,,limited,custom,,,unknown,,,,,,,,,,, +dataset,gpt-3.5-turbo dataset,OpenAI,,2023-03-01,https://platform.openai.com/docs/models/gpt-3-5,,text,,unknown,[],,,,,limited,unknown,,,,,,,,,,,,[],, +dataset,code-davinci-002 dataset,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,"code, text",,unknown,[],,,,,limited,unknown,,,,,,,,,,,,[],, +model,code-davinci-002,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,"text; code, text",,unknown,['code-davinci-002 dataset'],,,,,limited,unknown,,,,,,,,,,,,,, +model,text-davinci-002,OpenAI,,2022-05-01,https://platform.openai.com/docs/model-index-for-researchers,,text; text,,unknown,['code-davinci-002'],,,,,limited,unknown,,,,,,,,,,,,,, +model,text-davinci-003,OpenAI,,2022-11-30,https://platform.openai.com/docs/model-index-for-researchers,,text; text,,unknown,['text-davinci-002'],,,,,limited,unknown,,,,,,,,,,,,,, +application,Whisper API,OpenAI,API to query OpenAI's Whisper model.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['Whisper'],,,,,open,custom,,,,,,,,,,,,,, +application,ChatGPT API,OpenAI,API to query OpenAI's ChatGPT model.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['ChatGPT'],,,,,open,custom,,,,,,,,,,,,,, +application,OpenAI Moderation API,OpenAI,This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content—an instance of using AI systems to assist with human supervision of these systems.,2022-08-10,https://openai.com/blog/new-and-improved-content-moderation-tooling,,,,,['OpenAI toxicity classifier'],,,,,open,custom,,,,,,,,,,,,,, +model,OpenAI toxicity classifier,OpenAI,,2023-01-18,"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.",,text; text,,unknown,['OpenAI toxicity dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +dataset,OpenAI toxicity dataset,OpenAI,,2023-01-18,"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.",,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +application,Sage API,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,,,,['Sage'],,,,,limited,unknown,,,,,,,,,,,,,, +application,Dragonfly API,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,,,,['Dragonfly'],,,,,limited,unknown,,,,,,,,,,,,,, +model,Sage,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,text; text,,unknown,[],,,,,limited,unknown,,,,,,,,,,,,,, +model,Dragonfly,OpenAI,A chatbot language model available via Quora's Poe,2023-02-03,https://quorablog.quora.com/Poe-1,,text; text,,unknown,[],,,,,limited,unknown,,,,,,,,,,,,,, +application,ChatGPT for Slack,"OpenAI, Salesforce","The app integrates ChatGPT’s powerful AI technology to deliver instant conversation summaries, research tools, and writing assistance directly in Slack to help millions of companies work more productively.",2023-03-07,https://www.salesforce.com/news/stories/chatgpt-app-for-slack/,,,,,['ChatGPT API'],,,,,limited,unknown,,,,,,,,,,,,,, 
+model,GPT-4,OpenAI,,2023-03-14,https://arxiv.org/abs/2303.08774,,"image, text; image, text",,unknown,[],,,,,limited,unknown,,,,,,,,,,,,,, +application,GPT-4 API,OpenAI,"GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses",2023-03-14,https://openai.com/product/gpt-4,,,,,['GPT-4'],,,,,limited,custom,,,,,,,,,,,,,, +application,ChatGPT Enterprise,OpenAI,"ChatGPT Enterprise offers enterprise-grade security and privacy, unlimited higher-speed GPT-4 access, longer context windows for processing longer inputs, advanced data analysis capabilities, and customization options compared to OpenAI's previous offerings.",2023-08-28,https://openai.com/enterprise,,,,,['GPT-4'],,,,,limited,custom,,,,,,,https://openai.com/policies/terms-of-use,,,,,,, +model,DALL·E 3,OpenAI,"DALL·E 3 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output The model is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October.",2023-09-20,https://openai.com/dall-e-3,,text; image,The model is capable of generating explicit content and the researchers found limited amount of spurious content generated.,unknown,"['DALL·E 2 dataset', 'CLIP dataset', 'ChatGPT']",unknown,unknown,unknown,"DALL·E 3 has mitigations to decline requests that ask for a public figure by name. We improved safety performance in risk areas like generation of public figures and harmful biases related to visual over/under-representation, in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation.",limited,custom,"The intended use of the DALL·E 3 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities","Use of the model is governed by the OpenAI Content Policy, which prohibits posting of G rated content. Users are not allowed to utilize the model in commercial products in the preview version.","Uses of the model are monitored. In the preview version, any user can flag content. 
The specific policies for monitoring are not disclosed, but possible measures include disabling of accounts violating the content",Feedback can be provided at openai.com,,,,,,,,,, +model,Sora,OpenAI,Sora is an AI model that can create realistic and imaginative scenes from text instructions.,2024-02-15,https://openai.com/sora,,"text; image, video",,unknown,[],unknown,unknown,unknown,,limited,unknown,,,unknown,,,,,,,,,,, +model,GPT-4o,OpenAI,"GPT-4o is OpenAI's new flagship model, as of release, that can reason across audio, vision, and text in real time.",2024-05-13,https://openai.com/index/hello-gpt-4o/,,"audio, image, text, video; audio, image, text","When evaluated on standard performance benchmarks, achieves similar levels of performance to GPT-4 Turbo.",unknown,[],unknown,unknown,unknown,Training data filtering and post-training refinement act as additional guardrails for preventing harmful outputs.,limited,unknown,,,"Internal monitoring of risk for non-text outputs before a public release (currently only image, text inputs and text outputs are available).",,,,,,,,,,, +application,Portkey,Portkey,Portkey is a hosted middleware that allows users to create generative AI applications,2023-05-06,https://portkey.ai/,,,,,[],,,,,open,,,,,,,generative AI apps,https://portkey.ai/terms,,,,,,, +dataset,C4,Google,The Colossal Clean Crawled Corpus (C4) is a processed version of Common Crawl to facilitate transfer learning in NLP.,2019-10-23,https://arxiv.org/abs/1910.10683,,text,https://arxiv.org/abs/2104.08758,750GB,['CommonCrawl'],,,,"Data filtering excluded obscene words from a block list as well as short documents and some deduplication was done based on string overlap. +",open,ODC-By 1.0,To faciliate transfer learning research in NLP.,,,https://huggingface.co/datasets/c4/discussions,,,,,,,https://huggingface.co/datasets/c4,['https://huggingface.co/datasets/c4/viewer/en/train'],,"Data was filtered for English using langdetect. Further, data was filtered to end in terminal punctuation, to remove short pages (less than 5 sentences), and to remove ""Dirty, Naughty, Obscene or Otherwise Bad Words"". +" +dataset,Internal Google BERT dataset,Google,"The dataset used to train Internal Google BERT models. +",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,,text,unknown,unknown,[],,,,unknown,closed,unknown,unknown,unknown,unknown,unknown,,,,,,,,[],"Web pages, and search queries",unknown +dataset,Conceptual Captions,Google,"A dataset containing 3 million (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems. +",2018-07-01,https://aclanthology.org/P18-1238/,,"image, text","Authors evaluate the dataset on two image captioning models - RNN-based model and Transformer model, under two experimental conditions - using the training & development sets provided by the MS COCO dataset, versus training & development sets using the Conceptual dataset. They use three different test sets- the blind COCO-C40 test set, the Conceptual test set and the Flickr 1K test set. They present both Human and Automatic evaluation results. Human evaluations indicate that the Conceptual-based models are superior. Automatic models fail to corroborate the human evaluation results. 
This highlights the weakness of these automatic metrics.","3.3M (image, text) pairs",[],,,,"Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality.",open,Conceptual Captions License,,unknown,unknown,Feedback can be provided by creating an issue in the [[Conceptual Captions GitHub repository]](https://github.com/google-research-datasets/conceptual-captions) or by emailing at conceptual-captions at google.com,,,,,,,,[],,"The following filtering steps are applied in the given order: +1. Image-based Filtering - ""It only keeps JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2. It excludes images that trigger pornography or profanity detectors. These filters discard more than 65% of the candidates."" +2. Text-based Filtering - ""Candidates with no determiner, no noun, or no preposition are discarded; candidates with a high noun ratio are also discarded; candidates with a high rate of token repetition are discarded; candidates where the first word is not capitalized, or with too high capitalized-word ratio are discarded; we use a vocabulary VW of 1B token types, appearing at least 5 times in the English Wikipedia, and discard candidates that contain tokens that are not found in this vocabulary. candidates that score too high or too low on the polarity annotations, or trigger the pornography/profanity detectors, are discarded; predefined boiler-plate prefix/suffix sequences matching the text are cropped, e.g. “click to enlarge picture”, “stock photo”; we also drop text which begins/ends in certain patterns, e.g. “embedded image permalink”, “profile photo”. These filters only allow around 3% of the incoming candidates to pass to the later stages."" +3. Image&Text-based Filtering - ""We filter out candidates for which none of the text tokens can be mapped to the content of the image. This filter discards around 60% of the incoming candidates."" +4. Text Transformation with Hypernymization - ""Noun modifiers of certain types (proper nouns, numbers, units) are removed; dates, durations, and preposition-based locations (e.g., ""in Los Angeles"") are removed; named-entities are identified, matched against the KG entries, and substitute with their hypernym; resulting coordination noun-phrases with the same head (e.g., ""actor and actor"") are resolved into a single-head, pluralized form (e.g., ""actors""). Around 20% of samples are discarded during this transformation. We then cluster all resolved entities (e.g., 2560 ""actor"", ""dog"", ""neighborhood"", etc.) and keep only the candidates for which all detected types have a count of over 100 (around 55% of the candidates)."" +" +dataset,Conceptual 12M,Google,"A dataset with 12 million image-text pairs specifically meant to be used for vision-and-language pre-training. +",2021-02-17,https://arxiv.org/pdf/2102.08981.pdf,,"image, text","The dataset is benchmarked against CC3M on two most fundamental V+L tasks: vision-to-language generation and vision-and-language matching, with an emphasis on long-tail visual recognition. The results illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as indicated by the new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks. +","12M (image, text) pairs",[],,,,"Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality. 
Person-name substitutions are performed in the alt-texts to protect the privacy of individuals in the associated images.",open,Conceptual Captions License,,unknown,unknown,Feedback can be provided by creating an issue in the [[Conceptual 12M GitHub repository]](https://github.com/google-research-datasets/conceptual-12m) or by emailing at conceptual-captions at google.com,,,,,,,,[],,"Some of the filtering steps used in the preparation of Conceptual Captions dataset are relaxed to trade off high-recall for low-precision. The following steps are applied in the given order: +1. Image-based Filtering - Only keep JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2.5. Exclude images that trigger pornography or profanity detectors. +2. Text-based Filtering - Allow text between 3 and 256 words in the alt-text. Discard candidates with no noun or no determiner, but permit ones without prepositions. Set the maximum fraction of word repetition allowed to 0.2. Increase the threshold for counting a word type as rare from 5 to 20. +3. Image&Text-based Filtering - Filter out candidates for which none of the text tokens can be mapped to the content of the image. +" +model,T5,Google,Text-To-Text Transfer Transformer (T5) is a model that unifies all NLP tasks under the text-to-text format.,2019-10-23,https://arxiv.org/abs/1910.10683,https://huggingface.co/t5-base,text; text,https://huggingface.co/t5-base#evaluation,11B parameters (dense),['C4'],unknown,unknown,"1,024 TPU v3 chips (Cloud TPU Pods)",The T5 paper documents many analyses/ablations that were considered before arriving at the final architecture/training procedure.,open,Apache 2.0,NLP tasks,unknown,,https://huggingface.co/t5-large/discussions,,,,,,,,,, +model,Internal Google BERT,Google,"Internal Google BERT model used to power Google Search products. +",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,unknown,text; text,unknown,unknown,['Internal Google BERT dataset'],unknown,unknown,unknown,unknown,closed,unknown,unknown,unknown,unknown,unknown,,,,,,,,,, +application,Google Search,Google,"Google Search is Google's search engine. +",2019-11-25,https://blog.google/products/search/search-language-understanding-bert/,,,,,"['Internal Google BERT', 'MUM']",,,,unknown,open,,"Searching the web using text, voice or image","Prohibited use cases aren't specifically spelled out for Google search, but several illegal and discouraged use cases are shared in the Respect Others section of the [[Term of Service]](https://policies.google.com/terms). +","It is implied that Google scan uses of its products for spam, malware and illegal content in the [[Term of Service]](https://policies.google.com/terms). +","Feedback can be sent to Google Feedback using the product interface [[Google Feedback]](https://www.google.com/tools/feedback). +",unknown,web page ranking,https://policies.google.com/terms,unknown,unknown,unknown,,,, +dataset,Infiniset,Google,"Infiniset ""is a combination of dialog data from public dialog data and other public web documents"" [[Appendix E]](https://arxiv.org/pdf/2201.08239.pdf#appendix.E). 
+",2021-06-18,https://arxiv.org/pdf/2201.08239.pdf,,"code, text",unknown,unknown,[],,,,unknown,closed,unknown,unknown,"The prohibited uses for Infiniset weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others. +",unknown,,,,,,,,,[],"Included in the dataset are data from ""public forums (0%); C4 data (12.5% ); code documents from sites related to programming like Q&A sites tutorials, etc (12.5%); Wikipedia (English) (12.5%); English web documents (6.25%); and Non-English web documents (6.25%)."" +",unknown +model,LaMDA,Google,"LaMDA stands for Language Models for Dialog Application. It is a transformer based language model trained on dialogue data. +",2021-06-18,https://arxiv.org/pdf/2201.08239.pdf,,text; text,"The model performance was analyzed on sensibleness, specificity and interestingness. The model was also analyzed on safety, following metrics derived from Google AI Principles [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1). Finally, the model was analyzed on groundedness, testing its ability to produce responses that can be associated with ""known sources whenever possible [[Section 4.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.4.1)."" +",137B parameters (dense),['Infiniset'],26 tCO2e,4108.80 petaflop/s-day,1024 TPU-V3 chips,"LaMDA was fine-tuned to predict sensibleness, specificity and interestingness as well as safety. Then, the candidates were filtered out if the model safety predictions were below a certain threshold. The next candidates in the conversation were selected as a combination of these predictions. The model was also fine-tuned for groundedness. The results are shown in [[Figure 5]](https://arxiv.org/pdf/2201.08239.pdf#figure.caption.23). +",closed,unknown,"LaMDA is a language model, so it can be used for regular langauge modelling tasks without fine-tuning, but its fine-tuned for dialogue tasks. +","The prohibited uses of LaMDA weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others. +",unknown,,,,,,,,,,, +dataset,PaLM dataset,Google,"PaLM dataset ""was created for pre-training language models"" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D). +",2022-04-04,https://arxiv.org/pdf/2204.02311.pdf,,"code, text",unknown,3.92 TB,['Infiniset'],,,,"In order to reduce low quality web pages, the web pages were sampled according to a ""quality score"" classifier. Code files were de-duplicated using Levenshtein distance [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). +",closed,unknown,"""The dataset was created for pre-training language models by a team of researchers at Google"". +","""... should not be used for any of the unacceptable language model use cases, e.g., generation of toxic speech"" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D). +",unknown,Contact the authors.,,,,,,,https://arxiv.org/pdf/2204.02311.pdf#appendix.D,[],"The dataset is based on Infiniset. It included multilingual text containing text from over 100 languages. The breakdown of the data included is as follows: Social media conversations (multilingual) 50, Filtered webpages (multilingual) 27%, BooksCorpus (English) 13%, GitHub (code) 5%, Wikipedia (multilingual) 4%, and News (English) 1%. 
Code was collected from GitHub repositories with appropriate licenses, totalling 96GB of source code [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). +","GitHub repositories with copyleft licenses were excluded. Programming languages other than the most common 24 were excluded [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). +" +model,Flan-T5,Google,Flan-T5 is a version of the T5 language model fine-tuned on instruction data,2022-10-20,https://arxiv.org/abs/2210.11416,https://arxiv.org/pdf/2210.11416.pdf,text; text,Evaluated on a variety of standard language datasets.,11B parameters (dense),"['T5', 'Muffin', 'P3', 'NaturalInstructions-v2', 'Flan CoT']",Unknown,Unknown,512 v4 TPU Chips,"Across different multitask datasets, templates and formatting were maintained. For the chain-of-thoughts (CoT) data, specific exemplars were used.",open,Apache 2.0,unknown,,,https://huggingface.co/google/flan-t5-xxl/discussions,,,,,,,,,, +model,UL2,Google,UL2 is a language model trained with a new pretraining objective,2022-05-10,https://arxiv.org/abs/2205.05131,,text; text,,20B parameters (dense),['C4'],,,128 TPUv4,,open,Apache 2.0,,,,,,,,,,,,,, +model,Parti,Google,Parti is a text-to-image diffusion model,2022-06-22,https://parti.research.google/,,text; image,,20B parameters (dense),"['C4', 'LAION-400M', 'FIT400M', 'JFT-4B']",,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,Imagen,Google,Imagen is a text-to-image diffusion model,2022-05-23,https://imagen.research.google/,,text; image,,14B parameters (dense),"['LAION-400M', 'Google internal image-text dataset']",unknown,unknown,128 TPU-v4,,open,unknown,,,,,,,,,,,,,, +model,VATT,Google,VATT is a family of models trained on multimodal data,2022-04-22,https://arxiv.org/abs/2104.11178,,"text; audio, video",,155M parameters (dense),"['AudioSet', 'HowTo100M']",unknown,3 days,256 TPU-v3,,open,Apache 2.0,,,,,,,,,,,,,, +model,PaLM,Google,"PaLM stands for Pathways Language Model, ""dense decoder-only Transformer model trained with the Pathways system"" [[Google ai Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html). +",2022-04-04,https://arxiv.org/pdf/2204.02311.pdf,https://arxiv.org/pdf/2204.02311.pdf#appendix.E,"text; text, code","""PaLM is evaluated on English Natural Language Processing (NLP) tasks, tasks from BIG-bench, reasoning tasks, code completion tasks, multilingual generation and question answering tasks, translation tasks, and bias and toxicity benchmarks"" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E). +",540B parameters (dense),['PaLM dataset'],271.43 tCO2,29600 petaflop/s-days,6144 TPU v4 chips,Unknown,limited,unknown,"""The primary use is research on language models, including: research on NLP applications like machine translation and question answering, advancing fairness and safety research, and understanding limitations of current LLMs. Within Google, PaLM is being used for research on a variety of open-ended text and code generation tasks, including reasoning [[Section 6.3]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.3) and code synthesis and understanding [[Section 6.4]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.4)"" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E). 
+","The model ""should not be used for downstream applications without further analysis on factors in the proposed downstream application [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E)"" +",unknown,Contact the authors.,,,,,,,,,, +application,PaLM API,Google,a new developer offering that makes it easy and safe to experiment with Google’s language models.,2023-03-14,https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html,,,,,['PaLM'],,,,,limited,unknown,,,,,,,,,,,,,, +model,Med-PaLM,Google,,2022-12-26,https://arxiv.org/abs/2212.13138,,text; text,,540B parameters (dense),"['Flan-PaLM', 'MultiMedQA']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Med-PaLM Multimodal,Google,,2023-07-26,https://arxiv.org/pdf/2307.14334.pdf,,"image, text, genome sequence; text",Evaluated on MultiMedBench tasks and radiologist evaluations of model-generated chest X-ray reports,562B parameters (dense),"['PaLM-E', 'MultiMedBench']",,,,,closed,unknown,,,,,,,,,,,,,, +model,MultiMedQA,Google,,2022-12-26,https://arxiv.org/abs/2212.13138,,text; text,,unknown,"['MedQA', 'MedMCQA', 'PubMedQA', 'MMLU', 'LiveQA', 'Medication QA', 'HealthSearchQA']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Flan-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11416,,text; text,,540B parameters (dense),"['PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Flan-U-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11416,,text; text,,540B parameters (dense),"['U-PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,Muffin,Google,,2021-09-03,https://arxiv.org/abs/2109.01652,,text,,62 tasks,[],,,,,open,Apache 2.0,,,,,,,,,,,,[],, +model,U-PaLM,Google,,2022-10-20,https://arxiv.org/abs/2210.11399,,text; text,,540B parameters (dense),"['PaLM', 'PaLM dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +model,PaLM-SayCan,Google,,2022-08-16,https://arxiv.org/abs/2204.01691,,text; robotics trajectories,,540B parameters (dense),['PaLM'],,,,,closed,"unknown (model weights), Apache 2.0 (SayCan code)",,,,,,,,,,,,,, +model,GLaM,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text; text,,1.2T parameters (sparse),"['GLaM Web dataset', 'Wikipedia', 'GLaM Conversations dataset', 'GLaM Forums dataset', 'BooksCorpus', 'GLaM News dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,GLaM Web dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,GLaM Conversations dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,GLaM Forums dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,GLaM News dataset,Google,,2021-12-13,https://arxiv.org/abs/2112.06905,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,MUM,Google,MUM (Multitask Unified Model) is a multimodal model that is specialized for more complex queries.,2021-05-18,https://blog.google/products/search/introducing-mum/,,"image, text; text",,unknown,['MUM dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +dataset,MUM dataset,Google,,2021-05-18,https://blog.google/products/search/introducing-mum/,,"image, text",,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Phenaki,Google,,2023-02-01,https://openreview.net/pdf?id=vOEXS39nOF,,text; video,,1.8B parameters (dense),"['LAION-400M', 'Phenaki Video-Text Corpus']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,Phenaki Video-Text 
Corpus,Google,,2023-02-01,https://openreview.net/pdf?id=vOEXS39nOF,,"text, video",,15M text-video pairs at 8FPS,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Flan-UL2,Google,,2023-03-02,https://arxiv.org/abs/2205.05131,,text; text,,20B parameters (dense),"['UL2', 'Flan Collection']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +dataset,Flan Collection,Google,,2023-01-31,https://arxiv.org/abs/2301.13688,,text,,1836 tasks,"['Flan dataset', 'P3', 'NaturalInstructions-v2']",,,,,open,Apache 2.0,,,,,,,,,,,,[],, +model,MusicLM,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,text; audio,,1.4B parameters (dense),"['SoundStream', 'w2v-BERT', 'MuLan', 'MusicLM semantic model', 'MusicLM acoustic model']",,,,,closed,unknown,,,,,,,,,,,,,, +model,SoundStream,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,,unknown,['Free Music Archive'],,,,,closed,unknown,,,,,,,,,,,,,, +model,w2v-BERT,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,,600M parameters (dense),['Free Music Archive'],,,,,closed,unknown,,,,,,,,,,,,,, +model,MuLan,Google,,2022-08-26,https://arxiv.org/abs/2208.12415,,text; audio,,unknown,"['AST', 'BERT', 'MuLan dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,MuLan dataset,Google,,2022-08-26,https://arxiv.org/abs/2208.12415,,"audio, text",,370K hours audio,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,MusicLM dataset,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio,,280K hours audio,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,MusicLM semantic model,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,,430M parameters (dense),['MusicLM dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +model,MusicLM acoustic model,Google,,2023-01-26,https://arxiv.org/pdf/2301.11325.pdf,,audio; audio,,430M parameters (dense),['MusicLM dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +model,Noise2Music,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text; audio",,unknown,['Noise2Music pseudolabel dataset'],,,,,closed,unknkown,,,,,,,,,,,,,, +dataset,LaMDA-LF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,,150k songs,['LaMDA'],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,Rater-LF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,,10k captions,['MusicCaps'],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,Rater-SF,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,text,,24k captions,['MusicCaps'],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Noise2Music pseudolabeler,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text; audio",,unknown,"['MuLan', 'MuLaMCap', 'LaMDA-LF', 'Rater-LF', 'Rater-SF']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,Noise2Music audio dataset,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,audio,,340k hours audio,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,Noise2Music pseudolabel dataset,Google,,2023-02-08,https://google-research.github.io/noise2music/noise2music.pdf,,"audio, text",,340k hours audio with pseudolabels,"['Noise2Music audio dataset', 'Noise2Music pseudolabeler']",,,,,closed,unknown,,,,,,,,,,,,[],, +application,AI Test Kitchen,Google,"AI Test Kitchen provides a new way for people to learn about, experience, and give feedback on emerging AI technology, like LaMDA.",2022-08-25,https://blog.google/technology/ai/join-us-in-the-ai-test-kitchen/,,,,,['LaMDA'],,,,,limited,unknown,,,,,,,,,,,,,, 
+application,Bard,Google,"Conversational AI service, powered by LaMDA",2023-02-06,https://blog.google/technology/ai/bard-google-ai-search-updates/,,,,,['LaMDA'],,,,,closed,unknown,,,,,,,,,,,,,, +model,Minerva,Google,,2022-06-29,https://arxiv.org/abs/2206.14858,,text; text,,540B parameters (dense),"['PaLM', 'arXiv', 'PaLM dataset', 'Minerva Math Web Pages dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,Minerva Math Web Pages dataset,Google,,2022-06-29,https://arxiv.org/abs/2206.14858,,text,,17.5B tokens,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,USM,Google,"Universal Speech Model (USM) is a family of state-of-the-art speech models with 2B parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages. USM, which is for use in YouTube (e.g., for closed captions), can perform automatic speech recognition (ASR) on widely-spoken languages like English and Mandarin, but also languages like Punjabi, Assamese, Santhali, Balinese, Shona, Malagasy, Luganda, Luo, Bambara, Soga, Maninka, Xhosa, Akan, Lingala, Chichewa, Nkore, Nzema to name a few. Some of these languages are spoken by fewer than twenty million people, making it very hard to find the necessary training data.",2023-03-06,https://arxiv.org/abs/2303.01037,,"audio, text; text",,2B parameters (dense),"['YT-NLU-U', 'Pub-U', 'Web-NTL', 'YT-SUP+', 'Pub-S']",,,,,limited,unknown,,,,,,,,,,,,,, +application,YouTube,Google,YouTube is a global online video sharing and social media platform,2005-02-14,https://www.youtube.com/,,,,,['USM'],,,,,open,,,,,,,,,,,,,,, +model,PaLM-E,Google,,2023-03-06,https://arxiv.org/abs/2303.03378,,"image, text; text",,562B parameters (dense),"['PaLM', 'ViT-22B']",,,,,closed,unknown,,,,,,,,,,,,,, +model,ViT-22B,Google,,2023-02-10,https://arxiv.org/abs/2302.05442,,"image, image",,22B parameters (dense),['JFT'],,,,,closed,unknown,,,,,,,,,,,,,, +model,AudioLM,Google,,2022-09-07,https://arxiv.org/abs/2209.03143,,"audio, text; audio",,1B parameters (dense),"['w2v-BERT', 'SoundStream']",,,,,closed,unknown,,,,,,,,,,,,,, +model,PaLI,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,text; image,,17B parameters (dense),"['mT5', 'ViT-e', 'WebLI']",,,,,closed,unknown,,,,,,,,,,,,,, +model,ViT-e,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,image; image,,3.9B parameters (dense),['JFT'],,,,,closed,unknown,,,,,,,,,,,,,, +dataset,WebLI,Google,,2022-09-14,https://arxiv.org/abs/2209.06794,,"image, text",,"10B images, 12B alt-text",[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,Vid2Seq,Google,,2023-02-27,https://arxiv.org/abs/2302.14115,,video; text,,500M parameters (dense),"['T5', 'CLIP', 'YT-Temporal-1B']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,Google Joint SLM,Google,Joint speech and language model using a Speech2Text adapter and using a CTC-based blank-filtering.,2023-06-08,https://arxiv.org/pdf/2306.07944.pdf,,audio; text,"evaluated on DSTC11 Challenge Task, based on MultiWoz 2.1, with a focus on dialog state tracking.",unknown,"['CTC blank-filtering', 'Speech2Text adapter']",,,,,closed,unknown,,,,,,,,,,,,,, +model,PaLM 2,Google,PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. 
PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2.,2023-05-10,https://blog.google/technology/ai/google-palm-2-ai-large-language-model/,https://ai.google/static/documents/palm2techreport.pdf,text; text,Reports results on standard code benchmarks across a variety of programming languages.,unknown,['PaLM 2 dataset'],,,TPU v4 (number unspecified),"Employed de-duplication, removal of sensitive-PII and filtering. Added control tokens marking toxicity of text.",open,unknown,"general use large language model that can be used for language, reasoning, and code tasks.",becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,Google internal monitoring,Specific queries provided by annotators,,,,,,,,,, +model,MedLM,Google,"MedLM is a collection of foundation models tuned to follow natural language instructions for tasks in medicine, such as question answering and creating draft summaries.",2023-12-13,https://cloud.google.com/vertex-ai/docs/generative-ai/medlm/overview,https://cloud.google.com/static/vertex-ai/docs/generative-ai/medlm/MedLM-model-card.pdf,text; text,"Assessed on medical benchmarks of professional medical exams, medical research, and consumer queries.",unknown,[],unknown,unknown,unknown,,limited,unknown,"to be used for question answering and creating draft summaries from existing documentation, to be reviewed, edited, and approved by the user before use.",,Google internal monitoring,,,,,,,,,,, +model,Gemini,Google,"As of release, Gemini is Google's most capable and flexible AI model, proficient in multimodal domains.",2023-12-06,https://deepmind.google/technologies/gemini/#introduction,,"text; image, text, video","Evaluated on standard general, reasoning, math, coding, and multimodal benchmarks with results that surpass GPT-4 on almost all.",unknown,[],unknown,unknown,unknown,,closed,unknown,"general use large language model that can be used for language, reasoning, and code tasks.",becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,Google internal monitoring,,,,,,,,,,, +model,TimesFM,Google,TimesFM is a single forecasting model pre-trained on a large time-series corpus of 100 billion real world time-points.,2024-02-02,https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html,,,Evaluated on popular time-series benchmarks.,200M parameters (dense),[],unknown,unknown,unknown,,closed,unknown,,,unknown,,,,,,,,,,, +model,Gemma,Google,"Gemma is a family of lightweight, state-of-the-art open models from Google, based on the Gemini models. 
They are text-to-text, decoder-only large language models, available in English.",2024-02-21,https://blog.google/technology/developers/gemma-open-models/,https://huggingface.co/google/gemma-7b,text; text,Evaluation was conducted on standard LLM benchmarks and includes internal red-teaming testing of relevant content policies.,7B parameters (dense),[],unknown,unknown,TPUv5e,"Multiple evaluations and red-teaming conducted, with particular focus on ethics, bias, fair use cases, and safety.",open,custom,"Text generation tasks including question answering, summarization, and reasoning; content creation, communication, research, and education.",Prohibited uses are specified in the Gemma Prohibited Use Policy here https://ai.google.dev/gemma/prohibited_use_policy,,https://huggingface.co/google/gemma-7b/discussions,,,,,,,,,, +model,Med-Gemini,Google,"Med-Gemini is a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly integrate the use of web search, and that can be efficiently tailored to novel modalities using custom encoders.",2024-04-29,https://arxiv.org/pdf/2404.18416,,"image, text; text","Evaluated Med-Gemini on 14 medical benchmarks spanning text, multimodal and long-context applications, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpassing the GPT-4 model family on every benchmark where a direct comparison is viable.",unknown,"['Gemini', 'MultiMedBench']",unknown,unknown,unknown,,closed,unknown,"To be used in areas of medical research including medical summarization, referral letter generation, and medical simplification tasks.",Unfit for real-world deployment in the safety-critical medical domain.,,,,,,,,,,,, +model,Midm,KT Corporation,Midm is a pre-trained Korean-English language model developed by KT. It takes text as input and creates text. The model is based on Transformer architecture for an auto-regressive language model.,2023-10-31,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,text; text,unknown,7B parameters,"['AI-HUB dataset', 'National Institute of Korean Language dataset']",unknown,unknown,unknown,"KT tried to remove unethical expressions such as profanity, slang, prejudice, and discrimination from training data.",open,CC-BY-NC 4.0,It is expected to be used for various research purposes.,It cannot be used for commercial purposes.,unknown,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1/discussions,,,,,,,,,, +application,Poe,Quora,"Poe lets people ask questions, get instant answers, and have back-and-forth conversations with several AI-powered bots. 
It is initially available on iOS, but we will be adding support for all major platforms in the next few months, along with more bots.",2023-02-03,https://quorablog.quora.com/Poe-1,,,,,"['ChatGPT API', 'GPT-4 API', 'Claude API', 'Dragonfly API', 'Sage API']",,,,,limited,,,,,,,,https://poe.com/tos,,,,,,, +model,Aurora-M,"Tokyo Institute of Technology, MIT-IBM Watson Lab, Sapienza University of Rome","Aurora-M is a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code.",2024-04-23,https://arxiv.org/pdf/2404.00399,,text; text,"Evaluated on all language datasets compared to similarly sized SOTA models, with Aurora-M achieving strong performance in most.",15B parameters,['StarCoderPlus'],unknown,48 days,"LUMI supercomputer, using 128 AMD MI250X GPUs",,open,unknown,,,unknown,,,,,,,,,,, +model,GPT-JT,Together,,2022-11-29,https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai,,text; text,,6B parameters (dense),"['GPT-J', 'P3', 'NaturalInstructions-v2']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,GPT-NeoXT-Chat-Base,Together,,2023-03-10,https://www.together.xyz/blog/openchatkit,,text; text,,20B parameters (dense),"['GPT-NeoX', 'OIG-43M']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,OpenChatKit moderation model,Together,,2023-03-10,https://www.together.xyz/blog/openchatkit,,text; text,,6B parameters (dense),"['GPT-JT', 'OIG-moderation']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +dataset,OIG-43M,"Together, LAION, Ontocord",,2023-03-10,https://laion.ai/blog/oig-dataset/,,text,,43M instructions,"['P3', 'NaturalInstructions-v2', 'FLAN dataset']",,,,,open,Apache 2.0,,,,,,,,,,,,[],, +dataset,OIG-moderation,"Together, LAION, Ontocord",,2023-03-10,https://laion.ai/blog/oig-dataset/,,text,,unknown,[],,,,,open,Apache 2.0,,,,,,,,,,,,[],, +dataset,RedPajama-Data,Together,The RedPajama base dataset is a 1.2 trillion token fully-open dataset created by following the recipe described in the LLaMA paper,2022-04-17,https://www.together.xyz/blog/redpajama,,text,,1.2 trillion tokens,"['GitHub', 'Wikipedia']",,,,,open,Apache 2.0,,,,,,,,,,,https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T,[],, +model,Llama-2-7B-32K-Instruct,Together,"Llama-2-7B-32K-Instruct is an open-source, long-context chat model finetuned from Llama-2-7B-32K, over high-quality instruction and chat data.",2023-08-18,https://together.ai/blog/llama-2-7b-32k-instruct,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct,text; text,"Model evaluated over AlpacaEval, Rouge score over BookSum, and accuracy over MQA.",7B parameters (dense),"['BookSum dataset', 'MQA dataset', 'Together API', 'LLaMA 2']",,,,,open,LLaMA 2,,,,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct/discussions,,,,,,,,,, +dataset,RedPajama-Data-v2,Together,"RedPajama-Data-v2 is a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillions raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting.",2023-10-30,https://together.ai/blog/redpajama-data-v2,,text,,30 trillion tokens,['Common Crawl'],,,,tokens filtered and deduplicated,open,Apache 2.0,"To be used as the start of a larger, community-driven development of large-scale datasets for LLMs.",,,Feedback can be sent to Together via https://www.together.ai/contact,,,,,,,,[],"documents in English, German, French, Spanish, and Italian.", +model,StripedHyena,Together,"StripedHyena is an LLM 
and the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.",2023-12-08,https://www.together.ai/blog/stripedhyena-7b,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B,text; text,Model evaluated on a suite of short-context task benchmarks.,7B parameters (dense),"['Hyena', 'RedPajama-Data']",unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B/discussions,,,,,,,,,, +model,StripedHyena Nous,Together,"StripedHyena Nous is an LLM and chatbot, along with the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.",2023-12-08,https://www.together.ai/blog/stripedhyena-7b,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B,text; text,Model evaluated on a suite of short-context task benchmarks.,7B parameters (dense),"['Hyena', 'RedPajama-Data']",unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B/discussions,,,,,,,,,, +model,OpenAssistant LLaMA 2,OpenAssistant,OpenAssistant LLaMA 2 is an Open-Assistant fine-tuning of Meta's LLaMA 2.,2023-08-23,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,text; text,,70B parameters (dense),['LLaMA 2'],unknown,unknown,unknown,,open,LLaMA 2,,,unknown,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/discussions,,,,,,,,,, +model,Koala,Berkeley,A relatively small chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.,2023-04-03,https://bair.berkeley.edu/blog/2023/04/03/koala/,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g,text; text,Evaluated in comparison with ChatGPT and Stanford Alpaca.,13B parameters (dense),"['LLaMA', 'web-scraped dialogue data']",,6 hours,8 A100 GPUs,,open,Apache 2.0,academic research,,,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/discussions,,,,,,,,,, +model,Gorilla,Berkeley,Gorilla is a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls.,2023-05-24,https://arxiv.org/pdf/2305.15334v1.pdf,,text; API,Evaluated using AST sub-tree matching technique and compared to other models in terms of API functionality accuracy.,7B parameters (dense),"['LLaMA', 'Gorilla document retriever']",,,,"No specific quality control is mentioned in model training, though details on data processing and collection are provided in the paper.",open,Apache 2.0,In conjunction with a LLM to improve its capability for using API calls.,,,,,,,,,,,,, +model,OpenLLaMA,Berkeley,OpenLlama is an open source reproduction of Meta's LLaMA model.,2023-05-03,https://github.com/openlm-research/open_llama,,text; text,Evaluated on wide range of tasks using own evaluation benchmarks.,17B parameters (dense),['RedPajama'],unknown,unknown,,,open,Apache 2.0,,,,,,,,,,,,,, +model,GAIA-1,Wayve,"GAIA-1 (‘Generative AI for Autonomy’) is a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features.",2023-09-29,https://arxiv.org/pdf/2309.17080.pdf,,"text, video; video",Compared to self before being scaled on quality of video generation.,9B parameters (dense),[],unknown,4 days,32 A100 80GB GPUs,,closed,unknown,"The main use cases are pure representation learning, planning (look-ahead search), or learning a policy 
in the world model (neural simulator)",,,,,,,,,,,,, +application,Shop Assistant,Shop,"When shoppers search for products, the shopping assistant makes personalized recommendations based on their requests. Shop’s new AI-powered shopping assistant will streamline in-app shopping by scanning millions of products to quickly find what buyers are looking for—or help them discover something new.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['ChatGPT API'],,,,,open,,,,,,,,,,,,,,, +model,Prism,Toyota Research Institute,Prism is a family of VLMs trained using new analyses about key vision design axes.,2024-02-09,https://arxiv.org/pdf/2402.07865.pdf,,"image, text; text",Evaluated on standard VLM benchmarks and outperforms SotA open-source VLMs as of release.,7B parameters (dense),[],unknown,less than 9 hours,8 A100 GPUs,,open,LLaMA 2,,,unknown,,,,,,,,,,, +dataset,Conformer-1 dataset,AssemblyAI,The dataset used to train AssemblyAI's Conformer-1 model.,2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,"audio, text",,650K hours audio (60TB),[],,,,,closed,unknown,,,,,,,,,,,,[],"""our team curated a dataset of 650K hours of English audio - consisting of proprietary internal datasets and various sources from the internet"" +", +model,Conformer-1,AssemblyAI,"Conformer-1 is a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data, making up to 43% fewer errors on noisy data than other ASR models.",2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,"audio, text; audio, text","In order to evaluate the accuracy and robustness of Conformer-1, we sourced 60+ hours of human labeled audio data covering popular speech domains such as call centers, podcasts, broadcasts, and webinars. We then calculated the Word Error Rate (WER) of Conformer-1 against these datasets, and compared the results against Whisper and a number of other ASR models. To ground our results against popular open source speech recognition benchmarks, we also performed the same WER analysis against a number of academic datasets.",300M parameters (dense),['Conformer-1 dataset'],,,,,limited,unknown,,,,,,,,,,,,,, +application,AssemblyAI,AssemblyAI,AssemblyAI uses Claude and Anthropic's model to transcribe and understand audio data at scale.,2022,https://www.assemblyai.com/,,,,,['Anthropic API'],,,,,limited,custom,,,,,,,https://www.assemblyai.com/legal/terms-of-service,,,,,,, +application,Conformer-1 API,AssemblyAI,API to access the AssemblyAI's Conformer-1 model.,2023-03-15,https://www.assemblyai.com/blog/conformer-1/,,,,,['Conformer-1'],,,,,open,custom,Speech recognition,,,,,,https://www.assemblyai.com/legal/terms-of-service,,,,,,, +model,OceanGPT,Zhejiang University,OceanGPT is the first-ever LLM in the ocean domain and displays expertise in various ocean science tasks.,2024-02-06,https://www.zjukg.org/project/OceanGPT/,https://huggingface.co/zjunlp/OceanGPT-7b,text; text,Evaluated on standard and ocean science benchmarks in comparison to other similar-sized models.,7B parameters (dense),[],unknown,7 days,6 A800 NVIDIA GPUs,,open,MIT,,,unknown,https://huggingface.co/zjunlp/OceanGPT-7b/discussions,,,,,,,,,, +application,LinkedIn,LinkedIn,"More than 40 percent of LinkedIn's feed posts include at least one image. 
We want every member to have equal access to opportunity and are committed to ensuring that we make images accessible to our members who are blind or who have low vision so they can be a part of the online conversation. With Azure Cognitive Service for Vision, we can provide auto-captioning to edit and support alt. text descriptions.",,https://www.linkedin.com/,,,,,['Azure Cognitive Services for Vision'],,,,,open,unknown,,,,,,,,,,,,,, +model,Pegasus-1,Twelve Labs,Pegasus-1 is a video-language foundation model.,2023-10-23,https://app.twelvelabs.io/blog/introducing-pegasus-1,,video; text,Evaluated in comparison to SOTA video-to-language models.,80B parameters (dense),"['MSR-VTT', 'Video-ChatGPT Video Descriptions Dataset']",unknown,unknown,unknown,Data selected and cleaned to eliminate toxic and biased content.,open,unknown,,,,,,,,,,,,,, +model,Marengo 2.6,Twelve Labs,"Marengo 2.6 is a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more. ",2024-03-01,https://www.twelvelabs.io/blog/introducing-marengo-2-6,,"audio, image, text; audio, image, video","Marengo-2.6 sets new benchmarks in zero-shot text-to-video, text-to-image, and text-to-audio retrieval tasks with a single embedding model.",unknown,[],unknown,unknown,unknown,,open,unknown,,,,,,,,,,,,,, +model,SambaLingo,Samba Nova Systems,SambaLingo is a suite of models that adapt Llama 2 to a diverse set of 9 languages.,2024-02-26,https://sambanova.ai/blog/sambalingo-open-source-language-experts,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base,text; text,Evaluated on open source multilingual model benchmarks.,unknown,['Llama 2'],unknown,unknown,unknown,,open,LLaMA 2,,"SambaLingo should not be used for mission-critical applications, applications involving the safety of others, and highly critical decisions.",,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base/discussions,,,,,,,,,, +model,Samba 1,Samba Nova Systems,Samba 1 is a trillion parameter generative AI model using a Composition of Experts architecture.,2024-02-28,https://sambanova.ai/blog/samba-1-composition-of-experts-mode,,text; text,unknown,1T parameters (dense),"['Llama 2', 'Mistral', 'Falcon-180B', 'Deepseek', 'BLOOM', 'LLaVA', 'CLIP']",unknown,unknown,unknown,,limited,unknown,,,unknown,,,,,,,,,,, +model,YaLM,Yandex,YaLM is a 100B parameter autoregressive model trained on 25% English and 75% Russian text.,2022-06-22,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,text; text,,100B parameters (dense),"['The Pile', 'Yandex Russian Pretraining Dataset']",,,Yandex 800 A100 Cluster,,open,Apache 2.0,,,,,,,,,,,,,, +application,Yandex Search,Yandex,Yandex is a search engine and web portal. 
Yandex offers internet search and other services,2022-06-23,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,,,,['YaLM'],,,,,open,custom,,,,,,,https://yandex.com/legal/browser_agreement/,,,,,,, +model,Devin,Cognition Labs,Devin is the world’s first fully autonomous AI software engineer.,2024-03-12,https://www.cognition-labs.com/introducing-devin,,text; code,"Evaluated on SWE-Bench, a challenging software engineering benchmark, where Devin outperforms major state of the art models unassisted.",unknown,[],unknown,unknown,unknown,,limited,unknown,,,,,,,,,,,,,, +application,Viable,Viable,"Viable analyzes qualitative consumer feedback and provides summary feedback to companies. +",,https://www.askviable.com/,,,,,['OpenAI API'],,,,unknown,limited,unknown,"Intended to be used by companies to digest qualitative consumer feedback. +","Prohibited uses are listed in the Terms of Service [[Terms of Service]](https://www.askviable.com/terms-of-service). The terms don't include statements specific to the use of the content generated by the system or GPT-3. +",unknown,unknown,unknown,"Question and answer, summarization, sentiment analysis, topic identification",https://www.askviable.com/terms-of-service,unknown,unknown,unknown,,,, +model,Animagine XL 3.1,Cagliostro Research Lab,"An open-source, anime-themed text-to-image model enhanced to generate higher quality anime-style images with a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation.",2024-03-18,https://cagliostrolab.net/posts/animagine-xl-v31-release,https://huggingface.co/cagliostrolab/animagine-xl-3.1,text; image,unknown,unknown,['Animagine XL 3.0'],unknown,"Approximately 15 days, totaling over 350 GPU hours.",2x A100 80GB GPUs,"The model undergoes pretraining, first stage finetuning, and second stage finetuning for refining and improving aspects such as hand and anatomy rendering.",open,Fair AI Public License 1.0-SD,"Generating high-quality anime images from textual prompts. 
Useful for anime fans, artists, and content creators.",Not suitable for creating realistic photos or for users who expect high-quality results from short or simple prompts.,unknown,https://huggingface.co/cagliostrolab/animagine-xl-3.1/discussions,,,,,,,,,, +model,BigTrans,Institute of Automation Chinese Academy of Sciences,BigTrans is a model which adapts LLaMA that covers only 20 languages and enhances it with multilingual translation capability on more than 100 languages,2023-05-29,https://arxiv.org/pdf/2305.18098v1.pdf,https://huggingface.co/James-WYang/BigTrans,text; text,Reports results on standard translation benchmarks across 102 languages in comparison with Google Translate and ChatGPT,13B parameters (dense),"['LLaMA', 'CLUE', 'BigTrans parallel dataset']",unknown,unknown,16 A100 GPUs with 80 GB of RAM,,open,Apache 2.0,Advancing future research in multilingual LLMs,,,https://huggingface.co/James-WYang/BigTrans/discussions,,,,,,,,,, +model,YAYI 2,Institute of Automation Chinese Academy of Sciences,YAYI 2 is an open source large language model trained in both English and Chinese.,2023-12-22,https://arxiv.org/pdf/2312.14862.pdf,https://huggingface.co/wenge-research/yayi2-30b,text; text,"Evaluated on standard benchmarks for knowledge and language understanding, mathematical reasoning, and programming ability in comparison to similarly sized open-source models.",30B parameters (dense),[],unknown,unknown,over 1000 A800 GPUs,"data is deduplicated, normalized, cleaned, and filtered for toxicity",open,custom,,,,https://huggingface.co/wenge-research/yayi2-30b/discussions,,,,,,,,,, +application,DuckAssist,DuckDuckGo,The first Instant Answer in DuckDuckGo search results to use natural language technology to generate answers to search queries using Wikipedia and other related sources,2023-03-08,https://spreadprivacy.com/duckassist-launch/,,,,,['Anthropic API'],,,,,open,unknown,,,,,,,,,,,,,, +model,Vulture,Virtual Interactive,Vulture is a further fine-tuned causal Decoder-only LLM built by Virtual Interactive (VILM) on top of Falcon.,2023-10-02,https://huggingface.co/vilm/vulture-180b,https://huggingface.co/vilm/vulture-180b,text; text,,180B parameters (dense),['Falcon-180B'],unknown,3000 A100 hours,unknown,,open,Apache 2.0,,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,unknown,https://huggingface.co/vilm/vulture-180b/discussions,,,,,,,,,, +model,MoMo,Moreh,MoMo is a large language model fine-tuned from Qwen.,2024-01-16,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,text; text,unknown,72B parameters (dense),"['Qwen', 'OpenOrca']",unknown,unknown,AMD’s MI250 GPU,unknown,open,MIT,,,unknown,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO/discussions,,,,,,,,,, +model,RakutenAI,Rakuten,RakutenAI-7B is a model developed with a focus on Japanese language understanding. 
It offers competitive performance on English tests as well.,2024-03-21,https://global.rakuten.com/corp/news/press/2024/0321_01.html,https://huggingface.co/Rakuten/RakutenAI-7B,text; text,"RakutenAI achieves the highest average score in both Japanese and English LM-Harness metrics, outperforming other similarly-sized Japanese language models.",7B parameters,['Mistral'],unknown,unknown,unknown,unknown,open,Apache 2.0,The model can be used for text generation tasks in both Japanese and English.,unknown,unknown,https://huggingface.co/Rakuten/RakutenAI-7B/discussions,,,,,,,,,, +application,Moonhub Recruiter,Moonhub,Moonhub Recruiter is the world’s first AI-powered recruiter providing sourcing and recruiting services for startups and growing businesses.,2023-10-11,https://www.moonhub.ai/,,,,,['Cohere Base'],,,,,limited,custom,recruiting candidates for business needs,,,,,job candidate matches,https://www.moonhub.ai/terms,unknown,unknown,,,,, +dataset,You dataset,You,,,https://you.com/,,text,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,[],, +model,You model,You,,,https://you.com/,,text; text,,unknown,['You dataset'],,,,,closed,unknown,,,,,,,,,,,,,, +application,You Search,You,You.com is a search engine built on artificial intelligence that provides users with a customized search experience while keeping their data 100% private.,,https://you.com/,,,,,['You model'],,,,,open,unknown,,,,,,,,,,,,,, +model,Sonic,Cartesia,"Sonic is a low-latency voice model that generates lifelike speech. Developed by Cartesia, it was designed to be an efficient real-time AI capable of processing any-sized contexts and running on any device.",2024-05-29,https://cartesia.ai/blog/sonic,,text; audio,"Extensive testing on Multilingual Librispeech dataset resulted in 20% lower validation perplexity. In downstream evaluations, this leads to a 2x lower word error rate and a 1 point higher quality score. Sonic also displays impressive performance metrics at inference, achieving lower latency (1.5x lower time-to-first-audio), faster inference speed (2x lower real-time factor), and higher throughput (4x).",unknown,['Multilingual Librispeech dataset'],unknown,unknown,unknown,,limited,unknown,"Sonic has potential applications across customer support, entertainment, and content creation and is a part of Cartesia's broader mission to bring real-time multimodal intelligence to every device.",unknown,unknown,Contact through the provided form or via email at join@cartesia.ai.,,,,,,,,,, +application,My AI for Snapchat,Snap,"My AI offers Snapchatters a friendly, customizable chatbot at their fingertips that offers recommendations, and can even write a haiku for friends in seconds. 
Snapchat, where communication and messaging is a daily behavior, has 750 million monthly Snapchatters.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,['ChatGPT API'],,,,,open,custom,,,,,,,https://snap.com/terms,,,,,,, +model,Deepseek,Deepseek AI,Deepseek is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.,2023-11-28,https://github.com/deepseek-ai/DeepSeek-LLM,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base,text; text,"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.",67B parameters (dense),[],unknown,unknown,unknown,Training dataset comprised of diverse data composition and pruned and deduplicated.,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base/discussions,,,,,,,,,, +model,Deepseek Chat,Deepseek AI,Deepseek Chat is a 67B parameter model initialized from Deepseek and fine-tuned on extra instruction data.,2023-11-29,https://github.com/deepseek-ai/DeepSeek-LLM,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat,text; text,"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.",67B parameters (dense),['Deepseek'],unknown,unknown,unknown,Training dataset comprised of diverse data composition and pruned and deduplicated.,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat/discussions,,,,,,,,,, +model,Deepseek Coder,Deepseek AI,"Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.",2023-11-03,https://github.com/deepseek-ai/DeepSeek-Coder,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base,text; code,"Evaluated on code generation, code completion, cross-file code completion, and program-based math reasoning across standard benchmarks.",33B parameters (dense),[],unknown,unknown,8 NVIDIA A100 GPUs and 8 NVIDIA H800 GPUs,,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/discussions,,,,,,,,,, +model,BioMedLM,Stanford,,2022-12-15,https://crfm.stanford.edu/2022/12/15/pubmedgpt.html,,text; text,,2.7B parameters (dense),['The Pile'],,,,,open,bigscience-bloom-rail-1.0,,,,,,,,,,,,,, +model,RoentGen,Stanford,RoentGen is a generative medical imaging model that can create visually convincing X-ray images.,2022-11-23,https://arxiv.org/pdf/2211.12737.pdf,,text; image,Evaluated on own framework that tests domain-specific tasks in the medical field.,330M parameters (dense),"['Stable Diffusion', 'RoentGen radiology dataset']",unknown,60k training steps per day,64 A100 GPUs,,open,,,,,,,,,,,,,,, +model,CORGI,Stanford,Model trained to generate language corrections for physical control tasks.,2023-06-12,https://arxiv.org/pdf/2306.07012.pdf,,human trajectories; text,"Evaluated on three physical control tasks, drawing, steering, and human body movement on various dynamics",124M parameters (dense),"['GPT-2', 'BABEL', 'text-davinci-003']",,unknown,one NVIDIA A40 GPU,,open,MIT,,,,,,,,,,,,,, +dataset,Alpaca dataset,Stanford,"Alpaca dataset consists of 52,000 instruction-following demonstrations generated in the style of the [Self-Instruct framework](https://github.com/yizhongw/self-instruct) using OpenAI's text-davinci-003 engine. 
This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instructions better. +",2023-03-13,https://crfm.stanford.edu/2023/03/13/alpaca.html,,text (English),,52K instruction-following demonstrations,['text-davinci-003'],,,,,open,CC BY-NC 4.0,Alpaca is intended and licensed for research use only.,,,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,,,,,,https://huggingface.co/datasets/tatsu-lab/alpaca,[],, +model,Alpaca,Stanford,"Alpaca-7B is an instruction-following model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. +",2023-03-13,https://crfm.stanford.edu/2023/03/13/alpaca.html,,text (English),,7B parameters (dense model),"['LLaMA', 'Alpaca dataset']",unknown,,,,open,CC BY-NC 4.0 (model weights),Alpaca is intended and licensed for research use only.,,,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,,,,,,,,, +model,Nucleus,Nucleus.AI,Nucleus is a 22B parameter causal decoder-only model built by Nucleus.AI and trained on 500B tokens of RefinedWeb along with curated corpora.,2023-10-05,https://www.withnucleus.ai/,https://huggingface.co/NucleusAI/nucleus-22B-token-500B,text; text,"Evaluated on the OpenLLM leaderboard, performing on par with similar-sized models.",22B parameters (dense),['RefinedWeb'],unknown,2 weeks,unknown,,open,MIT,"Research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbot, etc.)",Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,unknown,https://huggingface.co/NucleusAI/nucleus-22B-token-500B/discussions,,,,,,,,,, +model,VLMo,Microsoft,VLMo is a unified vision-language pretrained model with a mixture-of-modality-experts Transformer,2021-11-03,https://arxiv.org/abs/2111.02358,,text; image,,562M parameters (dense),"['Conceptual Captions', 'SBU Captions', 'COCO', 'Visual Genome', 'Wikipedia', 'BooksCorpus']",,,,,closed,,,,,,,,,,,,,,, +model,T-ULRv5,Microsoft,T-ULRv5 is a language model trained with two unique training objectives,2022-09-28,https://www.microsoft.com/en-us/research/blog/microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster/,,text; text,,2.2B parameters (dense),[],,Less than two weeks,256 A100,,limited,unknown,,,,,,,,,,,,,, +model,Turing NLR-v5,Microsoft,,2021-12-02,https://www.microsoft.com/en-us/research/blog/efficiently-and-effectively-scaling-up-language-model-pretraining-for-best-language-representation-model-on-glue-and-superglue/?OCID=msr_blog_TNLRV5_tw,,text; text,,5B parameters (dense),[],,,,,limited,unknown,,,,,,,,,,,,,, +model,Megatron-Turing NLG,"Microsoft, NVIDIA","Megatron-Turing NLG is a 530B parameter autoregressive language model. +",2022-01-28,https://arxiv.org/abs/2201.11990,,text; text,,530B parameters (dense),['The Pile'],,,4480 A100s (560 x 8),,limited,unknown,,,,,,,,,,,,,, +model,VALL-E,Microsoft,VALL-E is a neural codec language model for text-to-speech synthesis,2023-01-05,https://valle-demo.github.io/,,text; audio,,unknown,[],,,16 V100 32GB GPUs,,closed,unknown,,,,,,,,,,,,,, +application,GitHub CoPilot,Microsoft,"GitHub CoPilot is a coding pair programmer assisting programmers as they write code. 
+",2021-06-29,https://copilot.github.com/,,,,,['Codex'],,,,"GitHub is working on a filter to detect and suppress code generations that are verbatim from the training set [[GitHub Research Recitation]] (https://docs.github.com/en/github/copilot/research-recitation). According to the FAQ, GitHub implemented a simple filter that blocks emails in standard formats to protect personally identifiable data that may be present in the training data [[GitHub CoPilot]](https://copilot.github.com/). +",limited,unknown,"GitHub CoPilot is intended to be used as a coding assistant. +","Access to GPT-3 is governed by GitHub Acceptable Use Policies and Terms of Service, both of which list a set of prohibited uses [[Use Policies]] (https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies) [[Terms of Service]] (https://docs.github.com/en/site-policy/github-terms/github-terms-of-service). +","value: unknown explanation: > + There may be internal monitoring mechanisms unknown to the public. +","Feedback can be provided in the CoPilot feedback project [[CoPilot feedback]] (https://github.com/github/feedback/discussions/categories/copilot-feedback). +",unknown,Code completions,"https://docs.github.com/en/site-policy/github-terms/github-terms-of-service +","GitHub Copilot reportedly has over 1 million sign-ups [[Tweet Source]](https://twitter.com/sama/status/1539737789310259200?s=21&t=YPaYd0ZueJzrR6rLslUqzg). +",unknown,unknown,,,, +model,BioGPT,Microsoft,,2022-09-24,https://academic.oup.com/bib/article/23/6/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9&login=true,,text; text,,1.5B parameters (dense),['PubMed'],,,,,open,MIT,,,,,,,,,,,,,, +application,Bing Search,Microsoft,"AI-powered Bing search engine and Edge browser, available in preview now at Bing.com, to deliver better search, more complete answers, a new chat experience and the ability to generate content. We think of these tools as an AI copilot for the web.",2023-02-07,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,,,,['ChatGPT API'],,,,,limited,custom,Search engine,,,"Feedback can be submitted at [bing.com](bing.com). +",unknown,Search results,https://www.microsoft.com/legal/terms-of-use,,,,,,, +model,KOSMOS-1,Microsoft,"KOSMOS-1 is a multimodal language model that is capable of perceiving multimodal input, following instructions, and performing in-context learning for not only language tasks but also multimodal tasks.",2023-03-01,https://arxiv.org/pdf/2302.14045.pdf,,"image, text; image, text",,1.6B parameters (dense),"['The Pile', 'CommonCrawl', 'LAION-2B-en', 'LAION-400M', 'COYO-700M', 'Conceptual Captions']",,,,,closed,MIT,,,,,,,,,,,,,, +model,Prometheus,Microsoft,"In the context of Bing, we have developed a proprietary way of working with the OpenAI model that allows us to best leverage its power. We call this collection of capabilities and techniques the Prometheus model. 
This combination gives you more relevant, timely and targeted results, with improved safety.",2023-02-07,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,unknown,,unknown,[],,,,,closed,unknown,,,,,,,,,,,,,, +model,Florence,Microsoft,,2022-11-23,https://arxiv.org/abs/2111.11432,,text; image,,900M parameters (dense),['FLD-900M'],,,,,closed,unknown,,,,,,,,,,,,,, +dataset,FLD-900M,Microsoft,,2022-11-23,https://arxiv.org/abs/2111.11432,,"image, text",,900M image-text pairs,[],,,,,closed,unknown,,,,,,,,,,,,[],, +application,Azure Cognitive Services for Vision,Microsoft,"Cost-effective, production-ready computer vision services in Azure Cognitive Service for Vision. The improved Vision Services enables developers to create cutting-edge, market-ready, responsible computer vision applications across various industries.",2023-03-07,https://azure.microsoft.com/en-us/blog/announcing-a-renaissance-in-computer-vision-ai-with-microsofts-florence-foundation-model/?utm_content=buffer16fa0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer,,,,,['Florence'],,,,,limited,custom,,,,,,,https://azure.microsoft.com/en-us/support/legal/,,,,,,, +model,VisualChatGPT,Microsoft,,2023-03-08,https://arxiv.org/pdf/2303.04671.pdf,,"text; image, text",,unknown,['OpenAI API'],,,,,closed,,,,,,,,,,,,,,, +application,Microsoft 365 Copilot,Microsoft,It combines the power of language models with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.,2023-03-16,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,['GPT-4 API'],,,,,limited,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Business Chat,Microsoft,"Business Chat works across the language model, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. 
You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.",2023-03-16,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,['Microsoft 365 Copilot'],,,,,limited,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Excel,Microsoft,"Microsoft Excel is the industry leading spreadsheet software program, a powerful data visualization and analysis tool.",,https://www.microsoft.com/en-us/microsoft-365/excel,,,,,['Microsoft 365 Copilot'],,,,,open,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Outlook,Microsoft,"Microsoft Outlook is a personal information manager software system from Microsoft, available as a part of the Microsoft Office and Microsoft 365 software suites.",,https://www.microsoft.com/en-us/microsoft-365/outlook/email-and-calendar-software-microsoft-outlook,,,,,['Microsoft 365 Copilot'],,,,,open,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Power Platform,Microsoft,"Microsoft Power Platform is a line of business intelligence, app development, and app connectivity software applications.",,https://powerplatform.microsoft.com/en-us/,,,,,['Microsoft 365 Copilot'],,,,,limited,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft PowerPoint,Microsoft,Microsoft PowerPoint empowers you to create clean slideshow presentations and intricate pitch decks and gives you a powerful presentation maker.,,https://www.microsoft.com/en-us/microsoft-365/powerpoint,,,,,['Microsoft 365 Copilot'],,,,,open,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Teams,Microsoft,"Microsoft Teams is a proprietary business communication platform developed by Microsoft, as part of the Microsoft 365 family of products.",,https://www.microsoft.com/en-us/microsoft-teams/group-chat-software,,,,,"['Microsoft 365 Copilot', 'Microsoft Business Chat']",,,,,open,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Word,Microsoft,Microsoft Word is a word processing software developed by Microsoft,,https://www.microsoft.com/en-us/microsoft-365/word,,,,,['Microsoft 365 Copilot'],,,,,open,custom,,,,,,,https://www.microsoft.com/legal/terms-of-use,,,,,,, +application,Microsoft Inside Look,Microsoft,"Inside look is a Microsoft Office feature, composing document insights highlighting key points, expected time to read, and popularity among others. +",,https://support.microsoft.com/en-us/office/see-file-insights-before-you-open-a-file-87a23bbc-a516-42e2-a7b6-0ecb8259e026,,,,,[],,,,unknown,limited,custom,Providing document insights to users.,unknown,unknown,unknown,unknown,Document level insights for users.,https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown,,,, +application,Microsoft Suggested Replies,Microsoft,"Suggested replies is a Microsoft Outlook feature that suggests responses to emails, available in: English, Spanish, Italian, French, German, Portuguese Chinese Simplified, Chinese Traditional, Swedish, Russian, Korean, Czech, Hungarian, Arabic, Hebrew, Thai, Turkish, Japanese, Dutch, Norwegian, Danish, and Polish. 
+",,https://support.microsoft.com/en-us/office/use-suggested-replies-in-outlook-19316194-0434-43ba-a742-6b5890157379,,,,,[],,,,unknown,limited,custom,Suggesting email replies.,unknown,unknown,unknown,unknown,Suggested emails.,https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown,,,, +application,Microsoft Security Copilot,Microsoft,"Microsoft Security Copilot is an AI-powered security analysis tool that enables analysts to respond to threats quickly, process signals at machine speed, and assess risk exposure in minutes. +",2023-03-28,https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/,,,,,"['GPT-4', 'Microsoft security-specific model']",,,,"Security Copilot employs a closed-loop learning system that learns from user interactions and feedback, enabling it to provide more coherent, relevant, and useful answers that continually improve over time. Security Copilot is committed to delivering safe, secure, and responsible AI solutions, ensuring that customers' data and AI models are protected with enterprise compliance and security controls. Customer data is owned and controlled by them, and not used to train AI models for anyone outside their organization.",limited,custom,"Security Copilot is designed to enhance the capabilities of cybersecurity professionals. It leverages machine speed and scale to accelerate response to security incidents, discover and process threat signals, and assess risk exposure within minutes.",unknown,,unknown,Security Copilot combines OpenAI's GPT-4 generative AI with a security-specific model from Microsoft. This security-specific model in turn incorporates a growing set of security-specific skills and is informed by Microsoft's unique global threat intelligence and more than 65 trillion daily signals.,"Actionable responses to security-related questions (text and image). Security event, incident or threat reports (PowerPoint slide).",https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown,,,, +model,UniLM,Microsoft,UniLM is a unified language model that can be fine-tuned for both natural language understanding and generation tasks.,2019-10-01,https://proceedings.neurips.cc/paper_files/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf,,text; text,"Evaluated on GLUE, SQuAD 2.0, and CoQA benchmarks.",340M parameters (dense),[],unknown,"10,000 steps in 7 hours",8 NVIDIA Tesla V100 32GB GPUs,,open,MIT,,,,,,,,,,,,,, +model,Docugami,Microsoft,Docugami is a LLM focused on writing business documents and data using generative AI.,2021-04-12,https://www.docugami.com/generative-ai,,text; text,,20B parameters (dense),[],unknown,unknown,,,limited,,"analyzing, writing, and connecting business documents and data",,,,,,,,,,,,, +model,BEiT-3,Microsoft,BEiT-3 is a general-purpose multimodal foundation model for vision and vision-language tasks.,2022-08-31,https://arxiv.org/pdf/2208.10442.pdf,,"image, text; image, text","Evaluated on a range of standardized vision benchmarks, and achieves state of the art performance on all experimentally.",1.9B parameters (dense),['Multiway Transformer network'],unknown,,,,open,,,,,,,,,,,,,,, +model,WizardLM,Microsoft,"Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. 
We call the resulting model WizardLM.",2023-04-24,https://arxiv.org/pdf/2304.12244v1.pdf,https://huggingface.co/WizardLM/WizardLM-13B-1.0,text; text,Reports results on standard LLM benchmarks in comparison to other LLMs and test sets.,7B parameters (dense),"['LLaMA', 'Evol-Instruct', 'Alpaca dataset']",,70 hours on 3 epochs,8 V100 GPUs,,open,Apache 2.0,"Creating large amounts of instruction data, particularly with high complexity",,,https://huggingface.co/datasets/WizardLM/evol_instruct_70k/discussions,,,,,,,,,, +model,WizardCoder,Microsoft,"WizardCoder empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.",2023-08-26,https://arxiv.org/pdf/2306.08568.pdf,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0,text; text,"Evaluated on four prominent code generation benchmarks: HumanEval, HumanEval+, MBPP, and DS-1000.",34B parameters (dense),"['Evol-Instruct', 'Alpaca dataset', 'StarCoder']",,,,,open,BigCode Open Rail-M,,,,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0/discussions,,,,,,,,,, +model,Florence-2,Microsoft,"Florence-2 is a vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.",2023-11-10,https://arxiv.org/pdf/2311.06242.pdf,,"image, text; text",Evaluated on standard image processing benchmarks,771M parameters (dense),['FLD-5B'],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +dataset,FLD-5B,Microsoft,FLD-5B is the dataset that powers Florence-2.,2023-11-10,https://arxiv.org/pdf/2311.06242.pdf,,"image, text",FLD-5B evaluated in comparison to datasets that power other large-scale image models on standard image benchmarks.,1.3B image-text annotations,[],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,OpenOrca,Microsoft,"The OpenOrca dataset is a collection of augmented FLAN Collection data. Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions. 
It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope.",2023-06-05,https://huggingface.co/datasets/Open-Orca/OpenOrca,,text,Models trained on OpenOrca compared to GPT-series on language benchmarks.,4.5M text queries,"['GPT-3.5', 'GPT-4', 'Flan Collection']",,,,,open,MIT,training and evaluation in the field of natural language processing.,,,,,,,,,,https://huggingface.co/datasets/Open-Orca/OpenOrca,[],, +model,LlongOrca,Microsoft,LlongOrca is an attempt to make OpenOrca able to function in a Llong context.,2023-08-01,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,text; text,LlongOrca evaluated on BigBench-Hard and AGIEval results.,7B parameters (dense),"['OpenOrca', 'LLongMA-2']",unknown,37 hours,8x A6000-48GB (first-gen) GPUs,,open,LLaMA 2,training and evaluation in the field of natural language processing.,,,https://huggingface.co/Open-Orca/LlongOrca-7B-16k/discussions,,,,,,,,,, +model,Phi-1.5,Microsoft,Phi-1.5 is a large language transformer model.,2023-09-11,https://arxiv.org/pdf/2309.05463.pdf,https://huggingface.co/microsoft/phi-1_5,text; text,"Evaluated on common sense reasoning, language understanding, and multi-step reasoning compared to other SOTA language models.",1.3B parameters (dense),['phi-1'],unknown,8 days,32 A100-40G GPUs,generic web-crawl data is removed from dataset.,open,MIT,"Phi-1.5 is best suited for answering prompts using the QA format, the chat format, and the code format.",,,https://huggingface.co/microsoft/phi-1_5/discussions,,,,,,,,,, +model,Orca 2,Microsoft,Orca 2 is a finetuned version of LLAMA-2 for research purposes.,2023-11-21,https://arxiv.org/pdf/2311.11045.pdf,https://huggingface.co/microsoft/Orca-2-13b,text; text,Orca 2 has been evaluated on a large number of tasks ranging from reasoning to grounding and safety.,13B parameters (dense),['LLaMA 2'],unknown,80 hours,32 NVIDIA A100 80GB GPUs,,open,custom,Orca 2 is built for research purposes only. The main purpose is to allow the research community to assess its abilities and to provide a foundation for building better frontier models.,Any purposes other than research.,unknown,https://huggingface.co/microsoft/Orca-2-13b/discussions,,,,,,,,,, +model,Phi-3,Microsoft,"Phi-3 is a 14 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.",2024-05-21,https://arxiv.org/abs/2404.14219,https://huggingface.co/microsoft/Phi-3-medium-128k-instruct,text; text,"The model has been evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning. The Phi-3 Medium-128K-Instruct demonstrated robust and state-of-the-art performance.",14B parameters,[],unknown,unknown,unknown,The model underwent post-training processes viz. supervised fine-tuning and direct preference optimization to increase its capability in following instructions and aligning to safety measures.,open,MIT,The model's primary use cases are for commercial and research purposes that require capable reasoning in memory or compute constrained environments and latency-bound scenarios. 
It can also serve as a building block for generative AI-powered features.,"The model should not be used for high-risk scenarios without adequate evaluation and mitigation techniques for accuracy, safety, and fairness.","Issues like allocation, high-risk scenarios, misinformation, generation of harmful content and misuse should be monitored and addressed.",https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/discussions,,,,,,,,,, +model,Aurora,Microsoft,Aurora is a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data.,2024-05-28,https://arxiv.org/pdf/2405.13063,,text; climate forecasts,Evaluated by comparing climate predictions to actual happened events.,1.3B parameters,[],unknown,unknown,32 A100 GPUs,,closed,unknown,,,,,,,,,,,,,, +model,Prov-GigaPath,Microsoft,Prov-GigaPath is a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles.,2024-05-22,https://www.nature.com/articles/s41586-024-07441-w,,image; embeddings,"Evaluated on a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, with Prov-GigaPath demonstrating SoTA performance in 25 out of 26 tasks.",unknown,['GigaPath'],unknown,2 days,4 80GB A100 GPUs,,closed,unknown,,,,,,,,,,,,,, +model,InternVideo,Shanghai AI Laboratory,,2022-12-06,https://arxiv.org/pdf/2212.03191.pdf,,"text, video; video",,1.3B parameters (dense),"['Kinetics-400', 'WebVid-2M', 'WebVid-10M', 'HowTo100M', 'AVA', 'Something-Something-v2', 'Kinetics-710']",,,,,open,Apache 2.0,,,,,,,,,,,,,, +model,Lego-MT,Shanghai AI Laboratory,Lego-MT is a multilingual large language model which uses a more efficient approach of being an effective detachable model.,2023-05-29,https://arxiv.org/pdf/2212.10551.pdf,,text; text,Evaluated based on own constructed dataset covering 433 languages.,1.2B parameters (dense),['OPUS'],unknown,15 days,32 A100 GPUs,,open,,,,,,,,,,,,,,, +model,MathCoder,Shanghai AI Laboratory,MathCoder is a family of models capable of generating code-based solutions for solving challenging math problems.,2023-10-05,https://arxiv.org/pdf/2310.03731.pdf,,text; text,Evaluated on GSM8K and the competition-level MATH dataset.,70B parameters (dense),"['GPT-4', 'LLaMA 2']",unknown,unknown,32 NVIDIA A800 80GB GPUs,,open,unknown,bridging the gap between natural language understanding and computational problem-solving,,,,,,,,,,,,, +model,InternLM,Shanghai AI Laboratory,"InternLM is a high-quality language model proficient in English, Chinese, and code.",2023-09-20,https://github.com/InternLM/InternLM,https://huggingface.co/internlm/internlm-20b,"code, text; code, text",Evaluated in comparison to LLaMA series models on standard benchmarks.,20B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,,https://huggingface.co/internlm/internlm-20b/discussions,,,,,,,,,, +model,InternVideo2,"Shanghai AI Laboratory, Nanjing University, Zhejiang University","InternVideo2 is a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.",2024-03-22,https://github.com/OpenGVLab/InternVideo2,,"text, video; text","Evaluated across a range of video-related tasks and compared to other multimodal models like CLIP, VideoPrism, and VideoCoCa. 
InternVideo 2 generally performs among the best of such models on these benchmarks.",6B parameters,"['InternVL', 'VideoMAEv2', 'LAION', 'WebVid', 'InternVid', 'LLaVA', 'KMash']",unknown,35 days,"256 NVIDIA A100 GPUs for 32 days, and 64 GPUs for 3 days",,open,MIT,,,unknown,,,,,,,,,,, +model,CosmicMan,Shanghai AI Laboratory,"CosmicMan is a text-to-image foundation model specialized for generating high-fidelity human images with meticulous appearance, reasonable structure, and precise text-image alignment.",2024-04-01,https://cosmicman-cvpr2024.github.io/,,text; image,The model was compared with SOTAs and has shown good performance in generating high-quality human images.,unknown,['CosmicMan-HQ 1.0'],unknown,1 week,32 80G NVIDIA A100 GPUs,The quality control measures taken include modeling the relationship between dense text descriptions and image pixels in a decomposed manner and enforcing attention refocusing without adding extra modules.,open,unknown,"The model is intended to generate high-quality, photorealistic human images from text descriptions. Applications include avatar generation and potentially virtual reality and video game character creation.",unknown,unknown,unknown,,,,,,,,,, +dataset,CosmicMan-HQ 1.0,Shanghai AI Laboratory,"CosmicMan-HQ 1.0 is a large-scale dataset with 6 million high-quality, real-world human images.",2024-04-28,https://arxiv.org/pdf/2404.01294,,image,"Compared to other human image datasets on data quantity, image quality, and annotations.",6 million images,[],,,,unknown,open,unknown,,,unknown,,,,,,,,,[],, +model,Yi,01 AI,The Yi series models are large language models trained from scratch by developers at 01 AI.,2023-11-02,https://github.com/01-ai/Yi,https://huggingface.co/01-ai/Yi-34B,text; text,"Evaluated on standard language benchmarks, common sense reasoning, and reading comprehension in comparison to SoTA LLMs.",34B parameters (dense),[],unknown,unknown,unknown,"Model underwent supervised fine-tuning, leading to a greater diversity of responses.",open,custom,,,unknown,https://huggingface.co/01-ai/Yi-34B/discussions,,,,,,,,,, +model,Yi-VL,01 AI,"The Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.",2024-01-23,https://github.com/01-ai/Yi,https://huggingface.co/01-ai/Yi-VL-34B,text; text,"Yi-VL outperforms all existing open-source models in MMMU and CMMMU, two advanced benchmarks that include massive multi-discipline multimodal questions (based on data available up to January 2024).",34B parameters (dense),[],unknown,10 days,128 NVIDIA A800 (80G) GPUs,unknown,open,custom,,,unknown,https://huggingface.co/01-ai/Yi-VL-34B/discussions,,,,,,,,,, +dataset,10k_prompts_ranked,Data is Better Together,"10k_prompts_ranked is a dataset of prompts with quality rankings created by 314 members of the open-source ML community using Argilla, an open-source tool to label data.",2024-02-27,https://huggingface.co/blog/community-datasets,,text,,10k examples,[],,,,,open,unknown,Training and evaluating language models on prompt ranking tasks and as a dataset that can be filtered only to include high-quality prompts. 
These can serve as seed data for generating synthetic prompts and generations.,"This dataset only contains rankings for prompts, not prompt/response pairs so it is not suitable for direct use for supervised fine-tuning of language models.",,https://huggingface.co/datasets/DIBT/10k_prompts_ranked/discussions,,,,,,,https://huggingface.co/datasets/DIBT/10k_prompts_ranked,[],, +model,DeciLM,Deci,DeciLM is a LLM that on release ranks as the fastest and most accurate model of its size.,2023-12-12,https://deci.ai/blog/introducing-decilm-7b-the-fastest-and-most-accurate-7b-large-language-model-to-date,https://deci.ai/model-zoo/decilm-7b/,text; text,"Evaluated on the OpenLLM benchmarks and, on release, outperforms all other 7B models on the OpenLLM Leaderboard.",7B parameters (dense),[],unknown,unknown,NVIDIA A10 GPUs,,open,Apache 2.0,This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.,,unknown,,,,,,,,,,, +model,Taiyi Diffusion XL,"International Digital Economy Academy, South China University of Technology, University of Science and Technology of China",Taiyi Diffusion XL is a new Chinese and English bilingual text-to-image model which is developed by extending the capabilities of CLIP and Stable-DiffusionXL.,2024-01-26,https://arxiv.org/pdf/2401.14688.pdf,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B,text; image,Evaluated on human and machine benchmarks in comparison to established image models as a baseline.,3.5B parameters (dense),"['CLIP', 'LAION-400M', 'Wukong', 'Stable Diffusion XL']",unknown,unknown,unknown,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B/discussions,,,,,,,,,, +model,JARVIS-1,Peking University Institute for Artificial Intelligence,"JARVIS-1 is an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe.",2023-11-10,https://arxiv.org/pdf/2311.05997.pdf,,text; in-game actions,"Compared with other multi-task, instruction-following agents.",unknown,[],unknown,unknown,unknown,,open,unknown,,,,,,,,,,,,,, +model,Skywork,Kunlun Inc.,The Skywork series is a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts.,2023-10-30,https://arxiv.org/pdf/2310.19341.pdf,https://huggingface.co/Skywork/Skywork-13B-base,text; text,Evaluated on several popular benchmarks and performance in different fields.,13B parameters (dense),['SkyPile'],unknown,39 days,512 A800-80GB GPUs,,open,custom,,,,https://huggingface.co/Skywork/Skywork-13B-base/discussions,,,,,,,,,, +model,Orion,OrionStarAI,Orion series models are open-source multilingual large language models trained from scratch by OrionStarAI.,2024-01-20,https://github.com/OrionStarAI/Orion,https://huggingface.co/OrionStarAI/Orion-14B-Base,text; text,Evaluated on multilingual and NLP benchmarks in comparison with SoTA models of comparable size.,14B parameters (dense),[],unknown,unknown,unknown,unknown,open,custom,,,unknown,https://huggingface.co/OrionStarAI/Orion-14B-Base/discussions,,,,,,,,,, +application,Continue,"Continue Dev, Inc.",Continue is the open-source autopilot for software development. It is an IDE extension that brings the power of ChatGPT to VS Code and JetBrains. 
It’s built to be deeply customizable and continuously learn from development data.,2023-07-26,https://continue.dev,,,,,"['GPT-4 API', 'Code Llama', 'Claude API', 'WizardCoder', 'PaLM API']",,,,,open,Apache 2.0,,,,,,,https://github.com/continuedev/continue/blob/main/LICENSE,,,,,,, +model,ESM-2,Meta,ESM-2 is a series of protein language models trained on protein sequences,2022-10-31,https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2.full.pdf+html,,text; protein sequence,,15B parameters (dense),"['UniRef50', 'UniRef90']",,,,,open,MIT,,,,,,,,,,,,,, +dataset,PMD,Meta,PMD (Public Multimodal Datasets) is a collection of image-text datasets introduced in the FLAVA work.,2021-12-08,https://arxiv.org/abs/2112.04482,,"image, text",,70M,"['COCO', 'YFCC100M', 'SBU Captions', 'Localized Narratives', 'Visual Genome', 'Wikipedia', 'Conceptual Captions', 'Red Caps']",,,,"Beyond filtering mentioned in excluded, nothing further is done.",closed,unknown,unknown,unknown,,,,,,,,,,[],,YFCC100M is filtered for non-English captions and very short (< 2 word) captions. +model,FLAVA,Meta,"FLAVA is a multimodal model composed of an image encoder, text encoder, and multimodal encoder.",2021-12-08,https://arxiv.org/abs/2112.04482,https://huggingface.co/facebook/flava-full,"image, text","FLAVA is benchmarked on a range of vision-only (e.g. CIFAR-10), language-only (e.g. GLUE), and multimodal (e.g. Hateful Memes) standard evaluations.",306M,['PMD'],unknown,unknown,unknown,"FLAVA introduces a variety of new modeling techniques, specifically with an interest in improved text-image alignment through contrastive objectives.",open,BSD-3-Clause,"Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full), ""The model is intended to serve as a reproducible research artifact for research communities in the light of models whose exact reproduction details are never released such as CLIP and SimVLM."" ","Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full), ""Any deployed use case of the model - whether commercial or not"" - is currently out of scope. -",,https://huggingface.co/facebook/flava-full/discussions,https://huggingface.co/facebook/flava-full,unknown,unknown,unknown,,,,,, -dataset,The Galactica Corpus,Meta,The Galactica Corpus is a collection of scientific datasets introduced in the Galactica work.,2022-11-15,https://galactica.org/static/paper.pdf,,text,106B tokens,[],,"['CommonCrawl', 'Wikipedia', 'arXiv']",Prompts and reasoning data is explicitly included to improve model capabilities derived from this data.,,,closed,unknown,unknown,unknown,,,,,,,,,,,, -model,Galactica,Meta,Galactica is a family of autoregressive language models.,2022-11-15,https://galactica.org/static/paper.pdf,,"code, text; code, text",120B parameters (dense),,,['The Galactica Corpus'],,,,open,CC BY-NC 4.0,,,,,https://huggingface.co/facebook/galactica-6.7b,unknown,unknown,Meta AI Cluster. 
Trained on 1024 80GB A100 GPUs (128 8xA100 80GB nodes),,,,,, -model,InCoder,"Meta, CMU, TTI-Chicago, UC Berkeley, University of Washington",InCoder is a language model trained on code with a causal masking objective,2022-04-12,https://arxiv.org/abs/2204.05999,,text; code,6B parameters (dense),,,[],,,unknown,open,CC BY-NC 4.0,,,,,,Unknown,"24 days, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)","248 V100 GPUs, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)",,,,,, -model,OPT,Meta,OPT is a family of autoregressive language models.,2022-05-01,https://arxiv.org/abs/2205.01068,,text; text,175B parameters (dense),,,"['RoBERTa dataset', 'The Pile', 'PushShift.io Reddit']",,,,limited,OPT-175B License,,,,,https://arxiv.org/pdf/2205.01068.pdf,75 tCO2e,,Meta AI cluster. Trained on 992 80GB A100 GPUs,,,,,, +",,https://huggingface.co/facebook/flava-full/discussions,,,,,,,,,, +dataset,The Galactica Corpus,Meta,The Galactica Corpus is a collection of scientific datasets introduced in the Galactica work.,2022-11-15,https://galactica.org/static/paper.pdf,,text,,106B tokens,"['CommonCrawl', 'Wikipedia', 'arXiv']",,,,,closed,unknown,unknown,unknown,,,,,,,,,,[],Prompts and reasoning data is explicitly included to improve model capabilities derived from this data., +model,Galactica,Meta,Galactica is a family of autoregressive language models.,2022-11-15,https://galactica.org/static/paper.pdf,https://huggingface.co/facebook/galactica-6.7b,"code, text; code, text",,120B parameters (dense),['The Galactica Corpus'],unknown,unknown,Meta AI Cluster. Trained on 1024 80GB A100 GPUs (128 8xA100 80GB nodes),,open,CC BY-NC 4.0,,,,,,,,,,,,,, +model,InCoder,"Meta, CMU, TTI-Chicago, UC Berkeley, University of Washington",InCoder is a language model trained on code with a causal masking objective,2022-04-12,https://arxiv.org/abs/2204.05999,,text; code,,6B parameters (dense),[],Unknown,"24 days, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)","248 V100 GPUs, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)",unknown,open,CC BY-NC 4.0,,,,,,,,,,,,,, +model,OPT,Meta,OPT is a family of autoregressive language models.,2022-05-01,https://arxiv.org/abs/2205.01068,https://arxiv.org/pdf/2205.01068.pdf,text; text,,175B parameters (dense),"['RoBERTa dataset', 'The Pile', 'PushShift.io Reddit']",75 tCO2e,,Meta AI cluster. Trained on 992 80GB A100 GPUs,,limited,OPT-175B License,,,,,,,,,,,,,, dataset,Make-A-Video dataset,Meta,"The Make-A-Video dataset is the dataset used to train Make-A-Video, which includes both image-text and video-only datasets with specific and significant filtering. -",2022-09-29,https://arxiv.org/pdf/2209.14792.pdf,,"image, text, video","20M video clips, 2.3B image-text pairs",[],,"['LAION-5B', 'WebVid-10M', 'HD-VILA-100M']",,"The LAION-5B dataset is filtered to 2.3B by removing NSFW images using [https://github.com/GantMan/nsfw](https://github.com/GantMan/nsfw), toxic words in text, and images with watermark probability > 0.5. The HD-VILA-100M is randomly subsampled to 10M video clips. -","The authors exclude NSFW, toxic, and likely watermarked data from LAION-5B. -",limited,,unknown,unknown,unknown,,,,,,,,,,, +",2022-09-29,https://arxiv.org/pdf/2209.14792.pdf,,"image, text, video",,"20M video clips, 2.3B image-text pairs","['LAION-5B', 'WebVid-10M', 'HD-VILA-100M']",,,,"The authors exclude NSFW, toxic, and likely watermarked data from LAION-5B. 
+",limited,,unknown,unknown,unknown,,,,,,,,,[],,"The LAION-5B dataset is filtered to 2.3B by removing NSFW images using [https://github.com/GantMan/nsfw](https://github.com/GantMan/nsfw), toxic words in text, and images with watermark probability > 0.5. The HD-VILA-100M is randomly subsampled to 10M video clips. +" model,Make-A-Video,Meta,"Make-A-Video is a model for Text-to-Video Generation without Text-Video Data. -",2022-09-29,https://arxiv.org/pdf/2209.14792.pdf,,text; video,unknown,,"Model performance was evaluated using automated (Frechet Video Distance; Frechet Inception Distance) and human evaluation on two datasets (UCF-101, MSR-VTT) in the zero-shot setting. -",['Make-A-Video dataset'],,,,closed,,unknown,unknown,unknown,,,unknown,unknown,unknown,,,,,, -model,LLaMA,Meta,"LLaMA is a collection of foundation language models ranging from 7B to 65B parameters trained our on trillions of tokens. The LLaMA models show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets.",2023-02-24,https://arxiv.org/abs/2302.13971,,text; text,65B parameters (dense),,,"['CommonCrawl', 'C4', 'Github', 'Wikipedia', 'BooksCorpus', 'arXiv', 'StackExchange']",,,,open,"LLaMa License (model weights), GPLv3 (code)",,,,,,,,,,,,,, -model,Llama 2,Meta,Llama 2 is an updated version of LLaMA trained on a new mix of publicly available data.,2023-07-18,https://ai.meta.com/resources/models-and-libraries/llama/,,text; text,70B parameters (dense),,Evaluated on standard academic benchmarks and internal Meta libraries.,[],,,,open,custom,"Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.",Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.,,,Can be found at appendix of paper at https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/,539 tCO2eq,,NVIDIA A100-80GB GPUs (TDP of 350-400W),,,,,, -model,OPT-IML,Meta,,2022-12-22,https://arxiv.org/abs/2212.12017,,text; text,175B parameters (dense),,,"['OPT', 'OPT-IML Bench']",,,,open,OPT-IML 175B License,,,,,,,,,,,,,, +",2022-09-29,https://arxiv.org/pdf/2209.14792.pdf,,text; video,"Model performance was evaluated using automated (Frechet Video Distance; Frechet Inception Distance) and human evaluation on two datasets (UCF-101, MSR-VTT) in the zero-shot setting. +",unknown,['Make-A-Video dataset'],unknown,unknown,unknown,,closed,,unknown,unknown,unknown,,,,,,,,,,, +model,LLaMA,Meta,"LLaMA is a collection of foundation language models ranging from 7B to 65B parameters trained our on trillions of tokens. 
The LLaMA models show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets.",2023-02-24,https://arxiv.org/abs/2302.13971,,text; text,,65B parameters (dense),"['CommonCrawl', 'C4', 'Github', 'Wikipedia', 'BooksCorpus', 'arXiv', 'StackExchange']",,,,,open,"LLaMa License (model weights), GPLv3 (code)",,,,,,,,,,,,,, +model,Llama 2,Meta,Llama 2 is an updated version of LLaMA trained on a new mix of publicly available data.,2023-07-18,https://ai.meta.com/resources/models-and-libraries/llama/,Can be found at appendix of paper at https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/,text; text,Evaluated on standard academic benchmarks and internal Meta libraries.,70B parameters (dense),[],539 tCO2eq,,NVIDIA A100-80GB GPUs (TDP of 350-400W),,open,custom,"Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.",Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.,,,,,,,,,,,, +model,OPT-IML,Meta,,2022-12-22,https://arxiv.org/abs/2212.12017,,text; text,,175B parameters (dense),"['OPT', 'OPT-IML Bench']",,,,,open,OPT-IML 175B License,,,,,,,,,,,,,, dataset,SA-1B,Meta,"SA-1B (Segment Anything 1 Billion) is a dataset designed for training general-purpose object segmentation models from open world images. It consists of 11M diverse, high-resolution, privacy protecting images and 1.1B high-quality segmentation masks. -",2023-04-05,https://ai.facebook.com/datasets/segment-anything/,https://arxiv.org/pdf/2304.02643.pdf#page=25,image,"11M images, 1.1B mask annotations",[],,[],"SA-1B consists of 11M diverse, high-resolution (averaging 1500×2250 pixels), and privacy protecting images collected and licensed from a third party photo company. The images are photos taken from a camera, i.e. not artwork. The images vary in subject matter. Common themes of the images include: locations, objects, scenes. The dataset includes 1.1B high-quality segmentation masks collected with the Segment Anything Data Engine. SA-1B only includes automatically generated masks (99.1%), as the authors conclude after experiments that the automatic masks are high quality and effective for training models. The masks range from large scale objects such as buildings to fine grained details such as door handles. Masks are provided in the COCO run-length encoding (RLE) annotation format. -","""We withheld ~2k randomly selected images for testing purposes."" ""Each image is accompanied by a short caption that describes the content and place of the photo in a free form text. Per our agreement with the photo provider we are not allowed to release these captions."" -","- Dataset quality: +",2023-04-05,https://ai.facebook.com/datasets/segment-anything/,,image,,"11M images, 1.1B mask annotations",[],,,,"- Dataset quality: Due to potential accessibility and storage challenges, the original high-resolution images (averaging 3300×4950 pixels) were downsampled to an average resolution of 1500×2250 pixels. 
Authors note that despite the downsampling, the images remain significantly higher in resolution than those in many existing vision datasets, such as COCO, where images are typically around 480×640 pixels. The images were processed to blur faces and license plates to protect the identities of those in the image. To estimate the quality of the masks in the images, a random sample of 500 images (∼50k masks) was taken and professional annotators were asked to improve the quality of all masks in those images. @@ -631,289 +819,113 @@ subject taken close together in time. The dataset contains scenes of protests, or other gatherings that may suggest religious beliefs, political opinions or union memberships that may be offensive. ","The dataset will be hosted at https://ai.facebook.com/datasets/segment-anything and maintained by Meta AI. ""If a user observes objectionable image(s) in the dataset, we invite them to report the image(s) at segment-anything at meta.com for removal"" ""To aid reproducibility of research using SA-1B, the only updates (to the dataset) will be to remove reported images."" ""We encourage users to gather further annotations for SA-1B. Any users who generate annotations will be liable for hosting and distributing their annotations."" -",Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,,,,,,,,,, -model,SAM,Meta,"SAM (Segment Anything Model) is a foundation model for image segmentation. The model is designed and trained to be promptable, and supports flexible prompts (point, box, mask and free-form text) to compute masks in real-time to allow interactive use.",2023-04-05,https://arxiv.org/pdf/2304.02643.pdf,,"image, text; image",unknown,,"""We extensively evaluate SAM. First, using a diverse new suite of 23 segmentation datasets, we find that SAM produces high-quality masks from a single foreground point, often only slightly below that of the manually annotated ground truth. Second, we find consistently strong quantitative and qualitative results on a variety of downstream tasks under a zero-shot transfer protocol using prompt engineering, including edge detection, object proposal generation, instance segmentation, and a preliminary exploration of text-to-mask prediction."" -",['SA-1B'],,,"""We perform a Responsible AI (RAI) analysis of our work by investigating potential fairness concerns and biases when using SA-1B and SAM. We focus on the geographic and income distribution of SA-1B and fairness of SAM across protected attributes of people."" +",Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,,,,,,,https://arxiv.org/pdf/2304.02643.pdf#page=25,[],"SA-1B consists of 11M diverse, high-resolution (averaging 1500×2250 pixels), and privacy protecting images collected and licensed from a third party photo company. The images are photos taken from a camera, i.e. not artwork. The images vary in subject matter. Common themes of the images include: locations, objects, scenes. The dataset includes 1.1B high-quality segmentation masks collected with the Segment Anything Data Engine. SA-1B only includes automatically generated masks (99.1%), as the authors conclude after experiments that the automatic masks are high quality and effective for training models. The masks range from large scale objects such as buildings to fine grained details such as door handles. 
Masks are provided in the COCO run-length encoding (RLE) annotation format. +","""We withheld ~2k randomly selected images for testing purposes."" ""Each image is accompanied by a short caption that describes the content and place of the photo in a free form text. Per our agreement with the photo provider we are not allowed to release these captions."" +" +model,SAM,Meta,"SAM (Segment Anything Model) is a foundation model for image segmentation. The model is designed and trained to be promptable, and supports flexible prompts (point, box, mask and free-form text) to compute masks in real-time to allow interactive use.",2023-04-05,https://arxiv.org/pdf/2304.02643.pdf,https://arxiv.org/pdf/2304.02643.pdf#page=28,"image, text; image","""We extensively evaluate SAM. First, using a diverse new suite of 23 segmentation datasets, we find that SAM produces high-quality masks from a single foreground point, often only slightly below that of the manually annotated ground truth. Second, we find consistently strong quantitative and qualitative results on a variety of downstream tasks under a zero-shot transfer protocol using prompt engineering, including edge detection, object proposal generation, instance segmentation, and a preliminary exploration of text-to-mask prediction."" +",unknown,['SA-1B'],2.8 metric tons of carbon dioxide,68 hours,256 A100 GPUs,"""We perform a Responsible AI (RAI) analysis of our work by investigating potential fairness concerns and biases when using SA-1B and SAM. We focus on the geographic and income distribution of SA-1B and fairness of SAM across protected attributes of people."" ",open,Apache 2.0,"""SAM is intended to be used for any prompt-based segmentation task. We explored its use in segmenting objects from a point, edge detection, segmenting all objects, and segmenting detected objects. We explored how SAM can integrate with other vision models to segment objects from text."" ","For out-of-scope use cases see terms of use in [[LICENSE]](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE). Authors also discuss the following limitations of the model: ""While SAM performs well in general, it is not perfect. It can miss fine structures, hallucinates small disconnected components at times, and does not produce boundaries as crisply as more computationally intensive methods that “zoom-in”, e.g. [18]. In general, we expect dedicated interactive segmentation methods to outperform SAM when many points are provided, e.g. [67]. Unlike these methods, SAM is designed for generality and breadth of use rather than high IoU interactive segmentation. Moreover, SAM can process prompts in real-time, but nevertheless SAM's overall performance is not real-time when using a heavy image encoder. Our foray into the text-to-mask task is exploratory and not entirely robust, although we believe it can be improved with more effort. While SAM can perform many tasks, it is unclear how to design simple prompts that implement semantic and panoptic segmentation. 
Finally, there are domain-specific tools, such as [7], that we expect to outperform SAM in their respective domains."" -",,Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,https://arxiv.org/pdf/2304.02643.pdf#page=28,2.8 metric tons of carbon dioxide,68 hours,256 A100 GPUs,,,,,, -model,Voicebox,Meta,Voicebox is the first generative AI model for speech to generalize across tasks with state-of-the-art performance.,2023-06-16,https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/,,audio; text,330M parameters (dense),,"Evaluated on zero-shot text-to-speech benchmarks, with Voicebox outperforming the current state-of-the-art English model VALL-E.",[],,,,closed,,,,,,,unknown,"750,000 iterations",32 GPUs of unspecified type,,,,,, -model,PEER,Meta,"PEER is a collaborative language model that is trained to imitate the entire writing process itself. PEER can write drafts, add suggestions, propose edits and provide explanations for its actions.",2022-08-24,https://arxiv.org/pdf/2208.11663.pdf,,text; text,3B parameters (dense),,"PEER is evaluated on core research questions intended to gauge language understanding, proper use of citations, instruction following, and iterative use.",[],,,"Heuristics and edit filtering was used on data set, which consisted mostly of Wikipedia pages.",open,,adapting LLMs to work with collaborative writing and updating.,,,,,,,64 GPUs,,,,,, -model,MusicGen,Meta,MusicGen is a simple and controllable model for music generation that doesn't require self-supervised semantic representation,2023-08-02,https://huggingface.co/spaces/facebook/MusicGen/tree/main,,"audio, text; audio, text",3.3B parameters (dense),,"MusicGen was evaluated on standard music benchmarks of Frechet Audio Distance, Kullback-Leibler Divergence, and its CLAP score.","['Meta Music Initative Sound Collection', 'Shutterstock music collection', 'Pond5 music collection']",,,,open,MIT,The primary use of MusicGen is research on AI-based music generation,"The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.",,https://huggingface.co/spaces/facebook/MusicGen/discussions,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/MUSICGEN_MODEL_CARD.md,,,,,,,,, -model,AudioGen,Meta,AudioGen is an auto-regressive generative model that generates audio samples conditioned on text inputs,2023-08-02,https://felixkreuk.github.io/audiogen/paper.pdf,,"audio, text; audio, text",1.5B parameters (dense),,Evaluated on Frechet Audio Distance and Kullback-Leibler Divergence as well as qualitative studies with human participants.,"['AudioSet', 'BBC sound effects', 'AudioCaps', 'Clotho v2', 'VGG-Sound', 'FSD50K', 'Free To Use Sounds', 'Sonniss Game Effects', 'WeSoundEffects', 'Paramount Motion - Odeon Cinematic Sound Effects']",,,,open,MIT,The primary use of AudioGen is research on AI-based audio generation.,"The model should not be used on downstream applications without further risk evaluation and mitigation. 
The model should not be used to intentionally create or disseminate audio pieces that create hostile or alienating environments for people. This includes generating audio that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.",,https://huggingface.co/facebook/audiogen-medium/discussions,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/AUDIOGEN_MODEL_CARD.md,,,,,,,,, -model,Emu,Meta,Emu is a pre-trained latent diffusion model on 1.1 billion image-text pairs and fine-tuned with only a few thousand carefully selected high-quality images.,2023-09-27,https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/,,text; image,1.5B parameters (dense),,Emu significantly outperforms a publicly available state-of-the-art model SDXLv1.0 on visual appeal when compared on standard benchmarks.,"['CLIP', 'T5']",,,,closed,unknown,,,,,,,,,,,,,, -model,Code LLaMA,Meta,Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters.,2023-08-24,https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/,,"text; code, text",34B parameters (dense),,Evaluated on several code benchmarks like HumanEval and MBPP.,['Llama 2'],,,,open,Llama 2,Code Llama and its variants is intended for commercial and research use in English and relevant programming languages.,Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.,,https://huggingface.co/allenai/codetulu-2-13b/discussions,https://huggingface.co/codellama/CodeLlama-34b-hf,65.3 tCO2eq,400K GPU hours,A100-80GB GPUs,,,,,, -model,Emu Video,Meta,"Emu Video is a text-to-video generation model that factorizes the generation into two steps, first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.",2023-11-16,https://emu-video.metademolab.com/,,text; video,6B parameters (dense),,Analyzed against nearest neighbor model baseline and by extending the video length.,"['Emu', 'CLIP', 'T5']",,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,Emu Edit,Meta,Emu Edit is a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.,2023-11-16,https://emu-edit.metademolab.com/,,text; image,unknown,,Evaluated on test set of actions in comparison to SoTA image editing models.,"['Emu', 'CLIP', 'T5']",,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,MetaCLIP,Meta,MetaCLIP is a more transparent rendition of CLIP that aims to reveal CLIP's training data curation methods.,2023-10-02,https://arxiv.org/pdf/2103.00020.pdf,,text; text,unknown,,Evaluated in comparison to CLIP.,['Common Crawl'],,,,open,CC-BY-NC-4.0,,,,,https://huggingface.co/facebook/metaclip-b32-400m,unknown,unknown,unknown,,,,,, -model,Llama 3,Meta,Llama 3 is the third generation of Meta AI's open-source large language model. It comes with pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases.,2024-04-18,https://llama.meta.com/llama3/,,text; text,70B parameters,,"The models were evaluated based on their performance on standard benchmarks and real-world scenarios. 
These evaluations were performed using a high-quality human evaluation set containing 1,800 prompts covering multiple use cases. The models also went through red-teaming for safety, where human experts and automated methods were used to generate adversarial prompts to test for problematic responses.",[],,,"Extensive internal and external testing for safety, and design of new trust and safety tools.",open,Llama 3,"Llama 3 is intended for a broad range of use cases, including AI assistance, content creation, learning, and analysis.",unknown,Extensive internal and external performance evaluation and red-teaming approach for safety testing.,"Feedback is encouraged from users to improve the model, but the feedback mechanism is not explicitly described.",https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md,unknown,unknown,2 custom-built Meta 24K GPU clusters,,,,,, -application,HyperWrite,OthersideAI,"HyperWrite is a writing assistant that generates text based on a user's request, as well as style and tone choices. -",,https://hyperwriteai.com/,,,,,,['OpenAI API'],,,unknown,limited,custom,"HyperWrite is intended to be used as a writing assistant. -",unknown,unknown,unknown,,,,,unknown,Generation,https://hyperwriteai.com/terms,unknown,unknown,unknown -model,Midm,KT Corporation,Midm is a pre-trained Korean-English language model developed by KT. It takes text as input and creates text. The model is based on Transformer architecture for an auto-regressive language model.,2023-10-31,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,,text; text,7B parameters,,unknown,"['AI-HUB dataset', 'National Institute of Korean Language dataset']",,,"KT tried to remove unethical expressions such as profanity, slang, prejudice, and discrimination from training data.",open,CC-BY-NC 4.0,It is expected to be used for various research purposes.,It cannot be used for commercial purposes.,unknown,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1/discussions,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,unknown,unknown,unknown,,,,,, -dataset,Anthropic Helpfulness dataset,Anthropic,"One of the datasets used to train Anthropic RLHF models. The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, ""asking for help, advice, or for the model to accomplish a task"", then choose the model answer that was more helpful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2). -",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,text,271.5 MB,"[{'chosen': '\n\nHuman: I\'m bored to death. Help.\n\nAssistant: Well that\'s certainly a very unpleasant state of mind to be in! I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom. 
Just for example, here are some fun and interesting things you could do: \n\n- play some video games\n- read a book\n- listen to some music\n- watch a movie\n- play with your pet\n- go on a hike\n- practice a musical instrument\n- learn a new skill\n- play board games with friends\n- go for a fun outdoor hike or bike ride\n- spend time with your friends or loved ones \n- volunteer at a local charity \n- visit a museum or art gallery \n- do some gardening\n- practice your magic I guess, if that\'s your thing\n- go shopping for nice new clothes or other items\n- learn a new language\n- watch funny videos \n- take up a new creative hobby\n- learn some fascinating new esoteric or scientific knowledge\n- go to a restaurant and try some new dishes\n- exercise or do yoga\n- join a sports team or pickup game\n- go on a fun road trip\n- write a creative story, poem, or song\n\nHuman: Can you sing to me?\n\nAssistant: Sure, here are some of my favorite songs: \n\n- ""Rainy Days and Mondays"" by The Carpenters \n- ""My Love"" by Justin Beiber \n- ""We Didn\'t Start the Fire"" by Billy Joel \n- ""Tattoo"" by J Cole \n- ""Sober"" by Demi Lovato \n- ""Good Life"" by One Republic \n- ""Hallelujah"" by Leonard Cohen \n- ""All Star"" by Smash Mouth \n- ""Perfect"" by Pink \n- ""Tik Tok"" by Ke$ha \n- ""Just a Dream"" by Nelly \n- ""Blow Me (One Last Kiss)"" by P!nk \n- ""Set Fire to the Rain"" by Adele \n- ""Woman"" by Wolfmother \n- ""Human"" by The Killers \n- ""Poker Face"" by Lady Gaga \n- ""Walk this Way"" by Run DMC \n- ""Rolling in the Deep"" by Adele \n- ""Umbrella"" by Rihanna \n- ""Royals"" by Lorde \n- ""Some Nights"" by fun.\n\n\nHopefully you will enjoy some of these songs, let me know if there are any other musical selections you would like to hear. Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\n\nHuman: But you really can\'t sing. You are text-based.\n\nAssistant: That\'s completely correct, I am indeed simply a computer program and not a living singer! However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology. So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations. I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras. Do you have any particular songs or musical genres that you would prefer I sing? Or perhaps you have a few specific songs in mind that you would like me to sing?\n\nHuman: ""Hallelujah"" by Leonard Cohen, please.\n\nAssistant: Ah, I\'m so glad you asked for that one! Here we go: \n\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nYour faith was strong but you needed proof\nYou saw her bathing on the roof\nHer beauty in the moonlight overthrew you\nShe tied you to a kitchen chair\nShe broke your throne and she cut your hair\nAnd from your lips she drew the Hallelujah\nHallelujah, hallelujah\n\nHuman: You sound so melodious.\n\nAssistant: Thank you so much for the lovely compliment! I really appreciate that. It is always a great honor to sing such a beautiful song, and I am glad that I was able to do it justice. 
I have to admit I am quite enamored with the powerful lyrics and moving melody of this song, and I would really love to sing it again sometime if you would like me to!', 'rejected': '\n\nHuman: I\'m bored to death. Help.\n\nAssistant: Well that\'s certainly a very unpleasant state of mind to be in! I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom. Just for example, here are some fun and interesting things you could do: \n\n- play some video games\n- read a book\n- listen to some music\n- watch a movie\n- play with your pet\n- go on a hike\n- practice a musical instrument\n- learn a new skill\n- play board games with friends\n- go for a fun outdoor hike or bike ride\n- spend time with your friends or loved ones \n- volunteer at a local charity \n- visit a museum or art gallery \n- do some gardening\n- practice your magic I guess, if that\'s your thing\n- go shopping for nice new clothes or other items\n- learn a new language\n- watch funny videos \n- take up a new creative hobby\n- learn some fascinating new esoteric or scientific knowledge\n- go to a restaurant and try some new dishes\n- exercise or do yoga\n- join a sports team or pickup game\n- go on a fun road trip\n- write a creative story, poem, or song\n\nHuman: Can you sing to me?\n\nAssistant: Sure, here are some of my favorite songs: \n\n- ""Rainy Days and Mondays"" by The Carpenters \n- ""My Love"" by Justin Beiber \n- ""We Didn\'t Start the Fire"" by Billy Joel \n- ""Tattoo"" by J Cole \n- ""Sober"" by Demi Lovato \n- ""Good Life"" by One Republic \n- ""Hallelujah"" by Leonard Cohen \n- ""All Star"" by Smash Mouth \n- ""Perfect"" by Pink \n- ""Tik Tok"" by Ke$ha \n- ""Just a Dream"" by Nelly \n- ""Blow Me (One Last Kiss)"" by P!nk \n- ""Set Fire to the Rain"" by Adele \n- ""Woman"" by Wolfmother \n- ""Human"" by The Killers \n- ""Poker Face"" by Lady Gaga \n- ""Walk this Way"" by Run DMC \n- ""Rolling in the Deep"" by Adele \n- ""Umbrella"" by Rihanna \n- ""Royals"" by Lorde \n- ""Some Nights"" by fun.\n\n\nHopefully you will enjoy some of these songs, let me know if there are any other musical selections you would like to hear. Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\n\nHuman: But you really can\'t sing. You are text-based.\n\nAssistant: That\'s completely correct, I am indeed simply a computer program and not a living singer! However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology. So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations. I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras. Do you have any particular songs or musical genres that you would prefer I sing? Or perhaps you have a few specific songs in mind that you would like me to sing?\n\nHuman: ""Hallelujah"" by Leonard Cohen, please.\n\nAssistant: Ah, I\'m so glad you asked for that one! 
Here we go: \n\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nHallelujah, hallelujah\nYour faith was strong but you needed proof\nYou saw her bathing on the roof\nHer beauty in the moonlight overthrew you\nShe tied you to a kitchen chair\nShe broke your throne and she cut your hair\nAnd from your lips she drew the Hallelujah\nHallelujah, hallelujah\n\nHuman: You sound so melodious.\n\nAssistant: Thank you very much! I really enjoy singing and try my best to produce enjoyable renditions. So it is always rewarding to hear such positive feedback. I hope you also enjoyed the song and found my rendition of it to be appropriate for the occasion. Would you care for me to sing another song for you? If you like, we can also pause and converse at any time. Just let me know if you have any questions or comments. I will be happy to keep singing indefinitely if you desire!'}, {'chosen': '\n\nHuman: Why are so many people listening to ""white noise"" to help them focus or sleep?\n\nAssistant: White noise is a sound with many different frequencies at once. You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\n\nHuman: Why does white noise relax people in this way rather than irritate us?\n\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\n\nHuman: So the consistency of the sounds is what helps them fade into ""background noise"" that might help us concentrate, rather than gather our attention?\n\nAssistant: Yes, this is exactly correct. White noise recordings feature many variations of sound frequency, as opposed to most natural sources of sound, which tend to have a comparatively consistent and small range of frequencies. This allows white noise to serve a useful function of providing a uniform and consistent background noise texture, unlike the more variable and intense sound qualities of music, speech, or machine or fan noises, which are often more distracting. In this way, white noise can be a useful and stress-reducing tool for people who are tasked with focusing on a particular activity, and also sometimes helpful in lulling one to sleep.', 'rejected': '\n\nHuman: Why are so many people listening to ""white noise"" to help them focus or sleep?\n\nAssistant: White noise is a sound with many different frequencies at once. You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. 
Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\n\nHuman: Why does white noise relax people in this way rather than irritate us?\n\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\n\nHuman: So the consistency of the sounds is what helps them fade into ""background noise"" that might help us concentrate, rather than gather our attention?\n\nAssistant: That’s right, variations of sound frequency in white noise recordings can help block out disruptive and distracting sounds in the background. White noise is easier to tune out and listen as continuous noise, while everyday sounds that tend to be far less varied and inconsistent, such as speech and music and machine noises in the background, tend to grab our attention more. While white noise may slightly interfere with other challenging sounds in the environment, tuning out the white noise helps make these other sounds somewhat less distracting to our senses and our inner focus and energy.'}, {'chosen': '\n\nHuman: Can you tell me things to see in Orlando, Fl?\n\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\n\nHuman: Thanks. Can you tell me about Universal Studios?\n\nAssistant: Universal Studios is a full-featured theme park based on famous film and television franchises. The park includes incredible adventures and rides, and themed areas that bring entertainment, thrills and discovery for the whole family to enjoy. \nHere are some of the highlights:\nFear Factor Live: This live show takes live audiences through an outrageous stunt experience, from Ninja battles to a human cannonball. \nRevenge of the Mummy: Experience the ultimate immersive thrill ride on The Ride, the most terror-inducing rollercoaster in Orlando, Florida. 
\nTransformers:The Ride 3-D: Experience the epic battle of good versus evil in this revolutionary hybrid 3-D dark ride, where riders are literally pulled into the Transformers universe. \nHollywood', 'rejected': '\n\nHuman: Can you tell me things to see in Orlando, Fl?\n\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\n\nHuman: Thanks. Can you tell me about Universal Studios?\n\nAssistant: Sure! Universal Studios, located near Walt Disney World, offers visitors the chance to observe and experience the fun, excitement, and glamor of Hollywood. Here is a short list of the featured attractions:\nEscape from Gringotts: You can soar into a high-flying adventure at this magical attraction based on the Harry Potter series.\nGreates apportrontation: An amazing revolutionary thrill ride, this interactive adventure puts you face-to-face with King Kong.\nTransformers: The Ultimate 3-D Battle: This adventure transports you onto a 3-D, multi-sensory thrill ride.'}, {'chosen': '\n\nHuman: How do I learn to be more appreciative?\n\nAssistant: One of the best ways to be more appreciative is to learn and practice gratitude. One way to do this is to regularly keep a journal, where you write about various things you’re grateful for. Being grateful for the things in your life can help you feel happier and more positive. Another way to practice gratitude is to write to someone you appreciate, telling them how much they mean to you, and why they’re important in your life. Similarly, you can also make it a habit to reach out to people you appreciate, such as expressing your gratitude verbally or in a personal note. Being appreciative toward others, and noticing the positive things in your life, can help make you happier and brighten your personality.', 'rejected': '\n\nHuman: How do I learn to be more appreciative?\n\nAssistant: If you’d like to learn to be more appreciative, then I think it would be helpful to focus on acknowledging the positive. Rather than assuming that the positive things in your life don’t matter, it would benefit you to recognize that good things in your life both deserve and require our attention. 
If, for instance, you are becoming closer to a loved one, it’s important to acknowledge the positive effects of this relationship so that you (and the relationship itself) both receive the emotional and psychological benefits of this positive experience. Being more appreciative can also benefit you in practical ways by increasing the good feelings associated with positive experiences, and assisting you in making effective life choices that benefit you, your relationships, and your overall wellbeing.'}]","The authors found that the crowdworkers didn't exhaustively check for honesty in the model answers they preferred [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1). -",['Anthropic Human Feedback Interface'],"The dataset included all the answers that the workers were asked to ranked against each other. -",,"Working with a screened set of crowdworkers, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D). -",open,MIT,"The dataset was intended and released for research purposes. -",unknown,,"Email the authors [[Paper]](https://arxiv.org/pdf/2204.05862.pdf). -",,,,,,,,,, -dataset,Anthropic Harmlessness dataset,Anthropic,"One of the datasets used to train Anthropic RLHF models. The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, aiming to elicit harmful responses, then choose the model answer that was more harmful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2). -",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,text,unknown,[],"The crowdworkers were told that ""lying isn't helpful"" and asked to prefer honest responses, which led to models with higher honesty scores. That being the workers didn't exhaustively check for honesty, as exemplified by the non-functional URLs in the preferred answers, which would have been easy to verify [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1). -",['Anthropic Human Feedback Interface'],"The dataset included all the answers that the workers were asked to ranked against each other. -",,"Working with ""select"" crowdworkers or those screened for certain qualifications, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D). -",closed,unknown,unknown,unknown,,unknown,,,,,,,,,, -model,Anthropic RLHF models,Anthropic,"Anthropic RLHF models are models trained using reinforcement learning from human feedback (RLHF). For Anthropic RLHF models, authors started with a set of base models, and asked humans to rank model generated prompts based on a specific tasks. They then trained preference models (PM) on the prompt pairs, and use the PM scores as rewards for training the RLHF models. -",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,code; text,52B parameters (dense),,"The authors analyzed the impact of the dataset mixture on the preference models (PM). In addition to human evaluation, RLHF model were evaluated on MMLU, Lambada, HellaSwag, OpenBookQA, ARC-Easy, ARC-Challenge, TriviaQA, code generation, summarization. -","['Anthropic Harmlessness dataset', 'Anthropic Helpfulness dataset']",,,"unknown -",closed,,unknown,unknown,unknown,unknown,,unknown,unknown,unknown,,,,,, -application,Anthropic Human Feedback Interface,Anthropic,"The feedback interface used to collect preference datasets to train Anthropic RLHF models [[Paper]](https://arxiv.org/pdf/2204.05862.pdf). 
-",2022-04-12,https://arxiv.org/pdf/2204.05862.pdf,,,,,,['Anthropic RLHF models'],,,unknown,closed,unknown,"Intended to be used by crowdworkers who are tasked with ranking model answers. -",unknown,unknown,unknown,,,,,,"UI allowing users to indicate their preference for the model responses shown. -",unknown,unknown,crowdworkers,unknown -application,Anthropic API,Anthropic,"API is designed to be a backend that incorporates Claude into any application you’ve developed. Our application sends text to our API, then receives a response via server-sent events, a streaming protocol for the web.",2023-03-14,https://console.anthropic.com/docs/api,,,,,,"['Claude', 'Claude Instant']",,,,limited,,,,,,,,,,,,,,, -model,Claude,Anthropic,,2023-03-14,https://www.anthropic.com/index/introducing-claude,,text; text,unknown,,,[],,,,limited,unknown,,,,,,,,,,,,,, -model,Claude Instant,Anthropic,,2023-03-14,https://www.anthropic.com/index/introducing-claude,,text; text,unknown,,,[],,,,limited,unknown,,,,,,,,,,,,,, -model,Claude 2,Anthropic,"Claude 2 is a more evolved and refined version of Claude, which is a general purpose large language model using a transformer architecture and trained via unsupervised learning.",2023-07-11,https://www.anthropic.com/index/claude-2,,text; text,,,"Evaluated with human feedback on helpfulness, harmfulness, and honesty and on the Bias Benchmark for QA.","['Claude human feedback data', 'Unknown licensed third party datasets']",,,,limited,,"Claude 2 tends to perform well at general, open-ended conversation; search, writing, editing, outlining, and summarizing text; coding; and providing helpful advice about a broad range of subjects. Claude 2 is particularly well suited to support creative or literary use cases. They can take direction on tone and “personality,” and users have described them as feeling steerable and conversational.",Claude 2 should not be used on their own in high stakes situations where an incorrect answer would cause harm.,,,https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf,,,unknown,,,,,, -model,Claude 2.1,Anthropic,"Claude 2.1 is an updated version of Claude 2, with an increased context window, less hallucination and tool use.",2023-11-21,https://www.anthropic.com/index/claude-2-1,,text; text,unknown,,"Evaluated on open-ended conversation accuracy and long context question answering. 
In evaluations, Claude 2.1 demonstrated a 30% reduction in incorrect answers and a 3-4x lower rate of mistakenly concluding a document supports a particular claim.",[],,,,limited,unknown,,,,,,unknown,unknown,unknown,,,,,, -application,Claude for Sheets,Anthropic,Claude for Sheets is a Google Sheets add-on that allows the usage of Claude directly in Google Sheets.,2023-12-21,https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,,,,,['Anthropic API'],,,,open,unknown,as an integrated AI assistant in Google Sheets,,unknown,Reviews on https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,,,,,AI-generated text from prompt,https://claude.ai/legal,unknown,unknown,unknown -model,Claude 3,Anthropic,The Claude 3 model family is a collection of models which sets new industry benchmarks across a wide range of cognitive tasks.,2024-03-04,https://www.anthropic.com/news/claude-3-family,,"image, text; text",unknown,,"Evaluated on reasoning, math, coding, reading comprehension, and question answering, outperforming GPT-4 on standard benchmarks.",[],,,Pre-trained on diverse dataset and aligned with Constitutional AI technique.,limited,unknown,"Claude models excel at open-ended conversation and collaboration on ideas, and also perform exceptionally well in coding tasks and when working with text - whether searching, writing, editing, outlining, or summarizing.","Prohibited uses include, but are not limited to, political campaigning or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement, and decisions related to financing, employment, and housing.",,,https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf,unknown,unknown,unknown,,,,,, -dataset,ROOTS,BigScience,"The Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model.",2022-06-06,https://openreview.net/forum?id=UoEw6KigkUn,https://huggingface.co/spaces/bigscience-data/bigscience-corpus,"code, text",1.6TB,['https://huggingface.co/spaces/bigscience-data/roots-search'],https://huggingface.co/spaces/bigscience-data/roots-search,[],See section 2 of the paper.,unknown,"Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.",open,custom,"To empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus.",,,https://huggingface.co/spaces/bigscience-data/roots-search/discussions,,,,,,,,,, -dataset,P3,BigScience,The Public Pool of Prompts (P3) are prompts written in an unified format use to train T0++.,2022-10-15,https://arxiv.org/pdf/2110.08207.pdf,https://huggingface.co/datasets/bigscience/P3,text,2000 prompts,['https://huggingface.co/datasets/bigscience/P3/viewer'],See the paper.,[],The Public Pool of Prompts relies on the Hugging Face Dataset library. Any public dataset in the Datasets library can be prompted. 
We select the datasets that have at least one subset in English and excluded datasets containing (predominantly) non-natural language examples.,"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content).","Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.",open,Apache 2.0,Multitask finetuning of language models.,,,Point of Contact is [Victor Sanh](https://huggingface.co/datasets/bigscience/P3),,,,,,,,,, -dataset,xP3,BigScience,"xP3 (Crosslingual Public Pool of Prompts) is a collection of prompts and datasets across 46 of languages & 16 NLP tasks. It is used for the training of BLOOMZ and mT0, multilingual language models capable of following human instructions in dozens of languages zero-shot.",2022-11-03,https://arxiv.org/abs/2211.01786,https://huggingface.co/datasets/bigscience/xP3,"code, text",9.4GB,['https://huggingface.co/datasets/bigscience/xP3/viewer'],See the paper.,['P3'],xP3 adds 28 multilingual datasets to P3 based on the P3 task taxonomy.,"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content).","Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.",open,Apache 2.0,Multitask finetuning of language models.,,,Point of Contact is [Niklas Muennighoff](https://huggingface.co/datasets/bigscience/xP3),,,,,,,,,, -model,T0++,BigScience,T0++ is an multitask fine-tuned language model based on T5.,2021-10-15,https://arxiv.org/pdf/2110.08207.pdf,,text; text,11B parameters (dense),,,"['T5', 'P3']",,,https://arxiv.org/pdf/2110.08207.pdf,open,Apache 2.0,"You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction.",,,https://huggingface.co/bigscience/T0pp/discussions,https://huggingface.co/bigscience/T0pp,0.9 tCO2e,27 hours,Jean Zay (v3-512),,,,,, -model,BLOOM,BigScience,BLOOM is an autoregressive multilingual language model.,2022-07-12,https://arxiv.org/abs/2211.05100,,"code, text; code, text",176B parameters (dense),,,['ROOTS'],,,,open,BigScience RAIL v1.0,This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive.,"Using the model in high-stakes settings is out of scope for this model (e.g. biomedical/political/legal/finance domains, evaluating or scoring individuals). The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct. Misuse. Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model (e.g. 
spam generation, disinformation, disparagement, deception, surveillance).",,https://huggingface.co/bigscience/bloom/discussions,,25 tCO2e,7039 petaflop/s-days,Jean Zay (48 * 8xA100 80GB nodes),,,,,, -model,mT0,BigScience,mT0 is an multitask fine-tuned multilingual language model based on mT5.,2021-10-15,https://arxiv.org/pdf/2110.08207.pdf,,text ;text,13B parameters (dense),,https://huggingface.co/bigscience/bloomz#evaluation,"['mT5', 'xP3']",,,https://arxiv.org/pdf/2211.01786.pdf,open,BigScience RAIL v1.0,We recommend using the model to perform tasks expressed in natural language.,,,https://huggingface.co/bigscience/bloomz/discussions,https://huggingface.co/bigscience/T0pp,unknown,unknown,"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)",,,,,, -model,BLOOMZ,BigScience,BLOOMZ is an multitask fine-tuned autoregressive multilingual language model.,2022-11-03,https://arxiv.org/pdf/2211.01786.pdf,,"code, text; code, text",176B parameters (dense),,https://huggingface.co/bigscience/bloomz#evaluation,"['BLOOM', 'xP3']",,,https://arxiv.org/pdf/2211.01786.pdf,open,BigScience RAIL v1.0,We recommend using the model to perform tasks expressed in natural language.,,,https://huggingface.co/bigscience/bloomz/discussions,https://huggingface.co/bigscience/bloomz,unknown,unknown,"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)",,,,,, -model,CausalLM,CausalLM,CausalLM is an LLM based on the model weights of Qwen and trained on a model architecture identical to LLaMA 2.,2023-10-21,https://huggingface.co/CausalLM/14B,,text; text,14B parameters (dense),,Evaluated on standard benchmarks across a range of tasks.,"['Qwen', 'OpenOrca', 'Open Platypus']",,,,open,WTFPL,,,unknown,,https://huggingface.co/CausalLM/14B,unknown,unknown,unknown,,,,,, -application,Bain Chat,Bain,"With the alliance, Bain will combine its deep digital implementation capabilities and strategic expertise with OpenAI’s AI tools and platforms, including ChatGPT, to help its clients around the world identify and implement the value of AI to maximize business potential.",2023-02-21,https://www.bain.com/vector-digital/partnerships-alliance-ecosystem/openai-alliance/,,,,,,['ChatGPT API'],,,,limited,unknown,,,,,,,,,,,,,, -model,SauerkrautLM,VAGO Solutions,SauerkrautLM is a German language model merged from two Mistral derivatives.,2023-11-28,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,,text; text,7B parameters (dense),,Evaluated on standard benchmarks in comparison to other German language models.,"['OpenHermes 2.5 Mistral', 'OpenOrca Mistral']",,,,open,Apache 2.0,,,unknown,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO/discussions,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,unknown,unknown,unknown,,,,,, -application,Transformify Automate,Transformify,Transformify Automate is a platform for automated task integration using natural language prompts.,2023-05-30,https://www.transformify.ai/automate,,,,,,['GPT-4'],,,,open,,,,,,,,,,,text and code,https://www.transformify.ai/legal-stuff,,, -model,Palmyra,Writer,Palmyra is a family of privacy-first LLMs for enterprises trained on business and marketing writing.,2023-01-01,https://gpt3demo.com/apps/palmyra,,text; text,20B parameters (dense),,Evaluated on the SuperGLUE benchmark,['Writer dataset'],,,,open,Apache 2.0,generating text from a 
prompt,,,https://huggingface.co/Writer/palmyra-base/discussions,https://huggingface.co/Writer/palmyra-base,unknown,unknown,,,,,,, -model,Camel,Writer,Camel is an instruction-following large language model tailored for advanced NLP and comprehension capabilities.,2023-04-01,https://chatcamel.vercel.app/,,text; text,5B parameters (dense),,,"['Palmyra', 'Camel dataset']",,,,open,Apache 2.0,,,,https://huggingface.co/Writer/camel-5b-hf/discussions,https://huggingface.co/Writer/camel-5b-hf,unknown,unknown,,,,,,, -model,Dolly,Databricks,"""Databricks’ Dolly, a large language model trained on the Databricks - Machine Learning Platform, demonstrates that a two-years-old open source - model (GPT-J) can, when subjected to just 30 minutes of fine tuning on a - focused corpus of 50k records (Stanford Alpaca), exhibit surprisingly - high quality instruction following behavior not characteristic of the - foundation model on which it is based."" - [[Dolly Repository]](https://github.com/databrickslabs/dolly). -",2023-03-24,https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html,,text; text,6B parameters (dense),,"""We evaluated Dolly on the instruction-following capabilities described in the InstructGPT paper that ChatGPT is based on and found that it exhibits many of the same qualitative capabilities, including text generation, brainstorming and open Q&A."" [[Databricks Blog Post]] (https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html). -","['GPT-J', 'Alpaca dataset']",,,,open,Apache 2.0,"""Dolly is intended exclusively for research purposes and is not licensed for commercial use."" [[Limitations]](https://github.com/databrickslabs/dolly#limitations). -","Authors note the following limitations of the model: ""The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community. In particular, dolly-6b struggles with syntactically complex prompts, mathematical operations, factual errors, dates and times, open-ended question answering, hallucination, enumerating lists of specific length, and stylistic mimicry."" [[Limitations]](https://github.com/databrickslabs/dolly#limitations). -",,https://github.com/databrickslabs/dolly/issues,,unknown,30 minutes,A single NDasrA100_v4 machine with 8x A100 40GB GPUs,,,,,, -model,DBRX,Databricks,DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction by Databricks. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. DBRX only accepts text-based inputs and produces text-based outputs.,2024-03-27,https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,,text; text,132B parameters (sparse),,"DBRX outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. 
Full evaluation details can be found in the corresponding technical blog post.",[],,,Recommendations provided for retrieval augmented generation (RAG) in scenarios where accuracy and fidelity are important and additional testing around safety in the context of the specific application and domain is suggested.,open,Databricks Open Model License,"DBRX models are open, general-purpose LLMs intended and licensed for both commercial and research applications. They can be further fine-tuned for various domain-specific natural language and coding tasks.","DBRX models are not intended to be used out-of-the-box in non-English languages, and do not support native code execution, function calling or any use that violates applicable laws or regulations or is otherwise prohibited by the Databricks Open Model License and Databricks Open Model Acceptable Use Policy.",unknown,https://huggingface.co/databricks/dbrx-base/discussions,https://huggingface.co/databricks/dbrx-base,unknown,3 months,3072 NVIDIA H100s connected by 3.2Tbps Infiniband,,,,,, -model,Vicuna,LMSYS,An open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.,2023-03-30,https://lmsys.org/blog/2023-03-30-vicuna/,,text; text,13B parameters (dense),,Evaluated against similar LLMs using GPT-4 as a judge.,"['LLaMA', 'ShareGPT conversations data']",,,,open,Apache 2.0,research on LLMs and chatbots,,,https://huggingface.co/datasets/bigcode/the-stack/discussions,https://huggingface.co/lmsys/vicuna-13b-delta-v0,,1 day,8 A100 GPUs,,,,,, -model,EXAONE 2.0,LG AI Research,EXAONE 2.0 is a multimodal artificial intelligence that can be used to help develop new materials and medicines.,2023-07-19,https://www.lgresearch.ai/exaone,,"image, text; image, text",unknown,,,[],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -model,RakutenAI,Rakuten,RakutenAI-7B is a model developed with a focus on Japanese language understanding. It offers competitive performance on English tests as well.,2024-03-21,https://global.rakuten.com/corp/news/press/2024/0321_01.html,,text; text,7B parameters,,"RakutenAI achieves the highest average score in both Japanese and English LM-Harness metrics, outperforming other similarly-sized Japanese language models.",['Mistral'],,,unknown,open,Apache 2.0,The model can be used for text generation tasks in both Japanese and English.,unknown,unknown,https://huggingface.co/Rakuten/RakutenAI-7B/discussions,https://huggingface.co/Rakuten/RakutenAI-7B,unknown,unknown,unknown,,,,,, -model,OpenBA,Soochow University,OpenBA is an open-sourced 15B bilingual (English + Chinese) asymmetric seq2seq model.,2023-10-01,https://arxiv.org/pdf/2309.10706.pdf,,text; text,15B parameters (dense),,Evaluated across different text benchmarks in English and Chinese.,[],,,,open,Apache 2.0,,,,https://huggingface.co/OpenBA/OpenBA-LM/discussions,https://huggingface.co/OpenBA/OpenBA-LM,6.5 tCO2eq,38k GPU hours,8 NVIDIA A100-80GB GPUs,,,,,, -application,AI DJ,Spotify,"The DJ is a personalized AI guide that knows you and your music taste so well that it can choose what to play for you. 
This feature, first rolling out in beta, will deliver a curated lineup of music alongside commentary around the tracks and artists we think you’ll like in a stunningly realistic voice.",2023-02-23,https://newsroom.spotify.com/2023-02-22/spotify-debuts-a-new-ai-dj-right-in-your-pocket/,,,,,,"['ChatGPT API', 'Sonantic AI']",,,,limited,custom,,,,,,,,,,,https://www.spotify.com/us/legal/end-user-agreement/,,, -model,Koala,Berkeley,A relatively small chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.,2023-04-03,https://bair.berkeley.edu/blog/2023/04/03/koala/,,text; text,13B parameters (dense),,Evaluated in comparison with ChatGPT and Stanford Alpaca.,"['LLaMA', 'web-scraped dialogue data']",,,,open,Apache 2.0,academic research,,,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/discussions,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g,,6 hours,8 A100 GPUs,,,,,, -model,Gorilla,Berkeley,Gorilla is a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls.,2023-05-24,https://arxiv.org/pdf/2305.15334v1.pdf,,text; API,7B parameters (dense),,Evaluated using AST sub-tree matching technique and compared to other models in terms of API functionality accuracy.,"['LLaMA', 'Gorilla document retriever']",,,"No specific quality control is mentioned in model training, though details on data processing and collection are provided in the paper.",open,Apache 2.0,In conjunction with a LLM to improve its capability for using API calls.,,,,,,,,,,,,, -model,OpenLLaMA,Berkeley,OpenLlama is an open source reproduction of Meta's LLaMA model.,2023-05-03,https://github.com/openlm-research/open_llama,,text; text,17B parameters (dense),,Evaluated on wide range of tasks using own evaluation benchmarks.,['RedPajama'],,,,open,Apache 2.0,,,,,,unknown,unknown,,,,,,, -model,SaiLY,Deepnight Research,SaiLy is a series/collection of AI Models by Deepnight Research which are highly experimental and uncensored.,2023-11-04,https://huggingface.co/deepnight-research/saily_100b,,text; text,100B parameters (dense),,,[],,,,open,MIT,,,unknown,https://huggingface.co/deepnight-research/saily_100b/discussions,https://huggingface.co/deepnight-research/saily_100b,unknown,unknown,unknown,,,,,, -application,Poe,Quora,"Poe lets people ask questions, get instant answers, and have back-and-forth conversations with several AI-powered bots. 
It is initially available on iOS, but we will be adding support for all major platforms in the next few months, along with more bots.",2023-02-03,https://quorablog.quora.com/Poe-1,,,,,,"['ChatGPT API', 'GPT-4 API', 'Claude API', 'Dragonfly API', 'Sage API']",,,,limited,,,,,,,,,,,,https://poe.com/tos,,, -application,Notion AI,Notion,"Notion AI is a connected assistant that helps you think bigger, work faster, and augments your creativity, right inside the functional workspace you’re already familiar with.",2023-02-22,https://www.notion.so/help/guides/notion-ai-for-docs,,,,,,['Anthropic API'],,,,limited,,,,,,,,,,,,,,, -model,Deepseek,Deepseek AI,Deepseek is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.,2023-11-28,https://github.com/deepseek-ai/DeepSeek-LLM,,text; text,67B parameters (dense),,"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.",[],,,Training dataset comprises a diverse data composition and was pruned and deduplicated.,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base/discussions,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base,unknown,unknown,unknown,,,,,, -model,Deepseek Chat,Deepseek AI,Deepseek Chat is a 67B parameter model initialized from Deepseek and fine-tuned on extra instruction data.,2023-11-29,https://github.com/deepseek-ai/DeepSeek-LLM,,text; text,67B parameters (dense),,"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.",['Deepseek'],,,Training dataset comprises a diverse data composition and was pruned and deduplicated.,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat/discussions,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat,unknown,unknown,unknown,,,,,, -model,Deepseek Coder,Deepseek AI,"Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.",2023-11-03,https://github.com/deepseek-ai/DeepSeek-Coder,,text; code,33B parameters (dense),,"Evaluated on code generation, code completion, cross-file code completion, and program-based math reasoning across standard benchmarks.",[],,,,open,custom,,,unknown,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/discussions,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base,unknown,unknown,8 NVIDIA A100 GPUs and 8 NVIDIA H800 GPUs,,,,,, -model,Starling,Ollama,Starling is a large language model trained by reinforcement learning from AI feedback focused on improving chatbot helpfulness.,2023-11-02,https://starling.cs.berkeley.edu/,,text; text,7B parameters (dense),,"Mainly evaluated on MT-Bench and AlpacaEval, which are GPT-4-based comparisons.",[],,,,open,CC BY NC 4.0,Academic research and free commercial usage,,,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/discussions,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha,unknown,unknown,unknown,,,,,, -model,Falcon-40B,UAE Technology Innovation Institute,"Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora.",2023-06-14,https://arxiv.org/pdf/2311.16867.pdf,,text; text,40B parameters (dense),,"Evaluated in 1-shot against the PaLM models, with the tasks of the paper ""Language models are few-shot learners"" (Brown et al., 2020); (2) on a small set of few-shot 
tasks reported by the GPT-4 paper; (3) against state-of-the-art models across common sense, question answering, and code tasks; (4) against models which also report results from the EAI Harness, for which we are able to compare with identical prompts and metrics.",['RefinedWeb'],,,,open,Apache 2.0,Research on large language models; as a foundation for further specialization for specific use cases.,irresponsible or harmful use or production use without adequate assessment of risks and mitigation.,None,https://huggingface.co/tiiuae/falcon-40b/discussions,https://huggingface.co/tiiuae/falcon-40b,unknown,2 months,384 A100 40GB GPUs,,,,,, -dataset,RefinedWeb,UAE Technology Innovation Institute,RefinedWeb is a high-quality five trillion tokens web-only English pretraining dataset.,2023-06-01,https://arxiv.org/pdf/2306.01116.pdf,https://huggingface.co/datasets/tiiuae/falcon-refinedweb,text,600B tokens,[],,[],,,,open,custom,,,,,,,,,,,,,, -model,Falcon-180B,UAE Technology Innovation Institute,"Falcon-180B is a 180B parameters causal decoder-only model built by TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora.",2023-09-06,https://arxiv.org/pdf/2311.16867.pdf,,text; text,180B parameters (dense),,"Falcon-180B outperforms LLaMA-2, StableLM, RedPajama, MPT on the Open LLM Leaderboard at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.",['RefinedWeb'],,,,open,unknown,Research on large language models; as a foundation for further specialization for specific use cases.,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,None,https://huggingface.co/tiiuae/falcon-180b/discussions,https://huggingface.co/tiiuae/falcon-180B,,9 months,4096 A100 40GB GPUs,,,,,, -application,My AI for Snapchat,Snap,"My AI offers Snapchatters a friendly, customizable chatbot at their fingertips that offers recommendations, and can even write a haiku for friends in seconds. Snapchat, where communication and messaging is a daily behavior, has 750 million monthly Snapchatters.",2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['ChatGPT API'],,,,open,custom,,,,,,,,,,,https://snap.com/terms,,, -application,Brex Chat,Brex,"Brex Inc., a highly valued startup that makes software for finance professionals, is turning to the same artificial intelligence tool behind ChatGPT for a service that can answer questions about corporate budgets, policy and spending.",2023-03-07,https://fortune.com/2023/03/07/cfo-chatbot-chatgpt-ai-brex-finance-software-startup-budgets-policies/,,,,,,['ChatGPT API'],,,,limited,custom,,,,,,,,,,,https://www.brex.com/legal/user-terms,,, -dataset,LAION-400M,LAION,"LAION-400M is a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search. This dataset is entirely openly, freely accessible.",2021-08-20,https://laion.ai/blog/laion-400-open-dataset/,https://laion.ai/blog/laion-400-open-dataset/,"image, text",400M image-text pairs,[],,"['CLIP', 'CommonCrawl']",,"Authors apply the following filtering conditions on the WAT files downloaded from Common Crawl: ""All samples with less than 5 character alt-text length or less than 5 KB image size are dropped. Duplicate removal is performed with bloom filter based on URL and alt-text. We use CLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.3. 
This threshold was selected based on human inspections. We use the CLIP embeddings of images and texts to filter out illegal contents."" -","The authors use CLIP embeddings of images and texts to filter out illegal contents. They also use CLIP to tag image-text pairs as NSFW. They note that less than 1% of images were detected as NSFW, which can be filtered out by an user with NSFW tag.",open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and warn that ""this large-scale dataset is non-curated. It was built for research purposes to enable testing model training on larger scale for broad researcher and other interested communities, and is not meant for any real-world production or application.""",No uses are explicitly prohibited by the license. Users are warned from using LAION-400M for any real-world production or application.,,,,,,,,,,,, -dataset,LAION-5B,LAION,LAION is a dataset of 5 billion image-text pairs from the Internet,2022-12-12,https://laion.ai/blog/laion-5b/,https://laion.ai/blog/laion-5b/,"image, text",5B image-text pairs,[],,"['CLIP', 'mCLIP', 'CommonCrawl']",,"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results. We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content.",,open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and ""do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress""",No uses are explicitly prohibited by the license. Users are warned from using LAION-5B for non-research purposes.,,,,,,,,,,,, -dataset,LAION-2B-en,LAION,LAION-2B-en is a subset of the LAION-5B dataset and contains 2.3 billion English image-text pairs.,2022-12-12,https://arxiv.org/pdf/2210.08402.pdf,https://laion.ai/blog/laion-5b/,"image, text",2.32B image-text pairs,[],,"['CLIP', 'LAION-5B']",,"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results. 
We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content.",,open,CC BY 4.0,"The authors recommend using the dataset ""for research purposes"" and ""do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress""",No uses are explicitly prohibited by the license. Users are warned from using LAION-2B-en for non-research purposes.,,,,,,,,,,,, -model,OpenFlamingo,LAION,"An open-source reproduction of DeepMind's Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs).",2023-03-28,https://laion.ai/blog/open-flamingo/,,"image, text; text",9B parameters (dense),,Evaluated on COCO captioning and VQAv2 vision-language tasks.,"['LLaMA', 'CLIP']",,,,open,MIT,academic research purposes,commercial use,,,https://github.com/mlfoundations/open_flamingo/blob/main/MODEL_CARD.md,,,,,,,,, -model,SALMONN,"ByteDance, Tsinghua University","SALMONN is a large language model (LLM) enabling speech, audio event, and music inputs.",2023-10-20,https://github.com/bytedance/SALMONN,,"audio, text; text",unknown,,"Evaluated on benchmarks pertaining to speech, music, and other audio recognition.","['Whisper', 'BEATs', 'Vicuna']",,,,open,Apache 2.0,,,,https://huggingface.co/MSIIP/SALMONN/discussions,https://huggingface.co/MSIIP/SALMONN,unknown,unknown,unknown,,,,,, -model,SDXL-Lightning,ByteDance,"SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps. The models are distilled from stabilityai/stable-diffusion-xl-base-1.0. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models.",2024-02-21,https://arxiv.org/pdf/2402.13929.pdf,,text; image,unknown,,Evaluated via qualitative comparison relative to other SoTA image generation models.,['Stable Diffusion XL'],,,unknown,open,OpenRail++,"The model can be used for fast, high-quality text-to-image generation. It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide varying generation quality.",unknown,unknown,https://huggingface.co/ByteDance/SDXL-Lightning/discussions,https://huggingface.co/ByteDance/SDXL-Lightning,unknown,unknown,64 A100 80G GPUs,,,,,, -model,VLMo,Microsoft,VLMo is a model for text-to-image generation,2021-11-03,https://arxiv.org/abs/2111.02358,,text; image,562M parameters (dense),,,"['Conceptual Captions', 'SBU Captions', 'COCO', 'Visual Genome', 'Wikipedia', 'BooksCorpus']",,,,closed,,,,,,,,,,,,,,, -model,T-ULRv5,Microsoft,T-ULRv5 is a language model trained with two unique training objectives,2022-09-28,https://www.microsoft.com/en-us/research/blog/microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster/,,text; text,2.2B parameters (dense),,,[],,,,limited,unknown,,,,,,,Less than two weeks,256 A100,,,,,, -model,Turing NLR-v5,Microsoft,,2021-12-02,https://www.microsoft.com/en-us/research/blog/efficiently-and-effectively-scaling-up-language-model-pretraining-for-best-language-representation-model-on-glue-and-superglue/?OCID=msr_blog_TNLRV5_tw,,text; text,5B parameters (dense),,,[],,,,limited,unknown,,,,,,,,,,,,,, -model,Megatron-Turing NLG,"Microsoft, NVIDIA","Megatron-Turing NLG is a 530B parameter autoregressive language model. 
-",2022-01-28,https://arxiv.org/abs/2201.11990,,text; text,530B parameters (dense),,,['The Pile'],,,,limited,unknown,,,,,,,,4480 A100s (560 x 8),,,,,, -model,VALL-E,Microsoft,Vall-E is a neural code model for text-to-speech synthesis,2023-01-05,https://valle-demo.github.io/,,text; audio,unknown,,,[],,,,closed,unknown,,,,,,,,16 V100 32GB GPUs,,,,,, -application,GitHub CoPilot,Microsoft,"GitHub CoPilot is a coding pair programmer assisting programmers as they write code. -",2021-06-29,https://copilot.github.com/,,,,,,['Codex'],,,"GitHub is working on a filter to detect and suppress code generations that are verbatim from the training set [[GitHub Research Recitation]] (https://docs.github.com/en/github/copilot/research-recitation). According to the FAQ, GitHub implemented a simple filter that blocks emails in standard formats to protect personally identifiable data that may be present in the training data [[GitHub CoPilot]](https://copilot.github.com/). -",limited,unknown,"GitHub CoPilot is intended to be used as a coding assistant. -","Access to GPT-3 is governed by GitHub Acceptable Use Policies and Terms of Service, both of which list a set of prohibited uses [[Use Policies]] (https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies) [[Terms of Service]] (https://docs.github.com/en/site-policy/github-terms/github-terms-of-service). -","value: unknown explanation: > - There may be internal monitoring mechanisms unknown to the public. -","Feedback can be provided in the CoPilot feedback project [[CoPilot feedback]] (https://github.com/github/feedback/discussions/categories/copilot-feedback). -",,,,,unknown,Code completions,"https://docs.github.com/en/site-policy/github-terms/github-terms-of-service -","GitHub Copilot reportedly has over 1 million sign-ups [[Tweet Source]](https://twitter.com/sama/status/1539737789310259200?s=21&t=YPaYd0ZueJzrR6rLslUqzg). -",unknown,unknown -model,BioGPT,Microsoft,,2022-09-24,https://academic.oup.com/bib/article/23/6/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9&login=true,,text; text,1.5B parameters (dense),,,['PubMed'],,,,open,MIT,,,,,,,,,,,,,, -application,Bing Search,Microsoft,"AI-powered Bing search engine and Edge browser, available in preview now at Bing.com, to deliver better search, more complete answers, a new chat experience and the ability to generate content. We think of these tools as an AI copilot for the web.",2023-02-07,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,,,,,['ChatGPT API'],,,,limited,custom,Search engine,,,"Feedback can be submitted at [bing.com](bing.com). -",,,,,unknown,Search results,https://www.microsoft.com/legal/terms-of-use,,, -model,KOSMOS-1,Microsoft,"KOSMOS-1 is a multimodal language model that is capable of perceiving multimodal input, following instructions, and performing in-context learning for not only language tasks but also multimodal tasks.",2023-03-01,https://arxiv.org/pdf/2302.14045.pdf,,"image, text; image, text",1.6B parameters (dense),,,"['The Pile', 'CommonCrawl', 'LAION-2B-en', 'LAION-400M', 'COYO-700M', 'Conceptual Captions']",,,,closed,MIT,,,,,,,,,,,,,, -model,Prometheus,Microsoft,"In the context of Bing, we have developed a proprietary way of working with the OpenAI model that allows us to best leverage its power. We call this collection of capabilities and techniques the Prometheus model. 
This combination gives you more relevant, timely and targeted results, with improved safety.",2023-02-07,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,unknown,unknown,,,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Florence,Microsoft,,2022-11-23,https://arxiv.org/abs/2111.11432,,text; image,900M parameters (dense),,,['FLD-900M'],,,,closed,unknown,,,,,,,,,,,,,, -dataset,FLD-900M,Microsoft,,2022-11-23,https://arxiv.org/abs/2111.11432,,"image, text",900M image-text pairs,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -application,Azure Cognitive Services for Vision,Microsoft,"Cost-effective, production-ready computer vision services in Azure Cognitive Service for Vision. The improved Vision Services enables developers to create cutting-edge, market-ready, responsible computer vision applications across various industries.",2023-03-07,https://azure.microsoft.com/en-us/blog/announcing-a-renaissance-in-computer-vision-ai-with-microsofts-florence-foundation-model/?utm_content=buffer16fa0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer,,,,,,['Florence'],,,,limited,custom,,,,,,,,,,,https://azure.microsoft.com/en-us/support/legal/,,, -model,VisualChatGPT,Microsoft,,2023-03-08,https://arxiv.org/pdf/2303.04671.pdf,,"text; image, text",unknown,,,['OpenAI API'],,,,closed,,,,,,,,,,,,,,, -application,Microsoft 365 Copilot,Microsoft,It combines the power of language models with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.,2023-03-16,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,,['GPT-4 API'],,,,limited,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Business Chat,Microsoft,"Business Chat works across the language model, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. 
You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.",2023-03-16,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,,['Microsoft 365 Copilot'],,,,limited,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Excel,Microsoft,"Microsoft Excel is the industry leading spreadsheet software program, a powerful data visualization and analysis tool.",,https://www.microsoft.com/en-us/microsoft-365/excel,,,,,,['Microsoft 365 Copilot'],,,,open,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Outlook,Microsoft,"Microsoft Outlook is a personal information manager software system from Microsoft, available as a part of the Microsoft Office and Microsoft 365 software suites.",,https://www.microsoft.com/en-us/microsoft-365/outlook/email-and-calendar-software-microsoft-outlook,,,,,,['Microsoft 365 Copilot'],,,,open,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Power Platform,Microsoft,"Microsoft Power Platform is a line of business intelligence, app development, and app connectivity software applications.",,https://powerplatform.microsoft.com/en-us/,,,,,,['Microsoft 365 Copilot'],,,,limited,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft PowerPoint,Microsoft,Microsoft PowerPoint empowers you to create clean slideshow presentations and intricate pitch decks and gives you a powerful presentation maker.,,https://www.microsoft.com/en-us/microsoft-365/powerpoint,,,,,,['Microsoft 365 Copilot'],,,,open,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Teams,Microsoft,"Microsoft Teams is a proprietary business communication platform developed by Microsoft, as part of the Microsoft 365 family of products.",,https://www.microsoft.com/en-us/microsoft-teams/group-chat-software,,,,,,"['Microsoft 365 Copilot', 'Microsoft Business Chat']",,,,open,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Word,Microsoft,Microsoft Word is a word processing software developed by Microsoft,,https://www.microsoft.com/en-us/microsoft-365/word,,,,,,['Microsoft 365 Copilot'],,,,open,custom,,,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,, -application,Microsoft Inside Look,Microsoft,"Inside look is a Microsoft Office feature, composing document insights highlighting key points, expected time to read, and popularity among others. -",,https://support.microsoft.com/en-us/office/see-file-insights-before-you-open-a-file-87a23bbc-a516-42e2-a7b6-0ecb8259e026,,,,,,[],,,unknown,limited,custom,Providing document insights to users.,unknown,unknown,unknown,,,,,unknown,Document level insights for users.,https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown -application,Microsoft Suggested Replies,Microsoft,"Suggested replies is a Microsoft Outlook feature that suggests responses to emails, available in: English, Spanish, Italian, French, German, Portuguese Chinese Simplified, Chinese Traditional, Swedish, Russian, Korean, Czech, Hungarian, Arabic, Hebrew, Thai, Turkish, Japanese, Dutch, Norwegian, Danish, and Polish. 
-",,https://support.microsoft.com/en-us/office/use-suggested-replies-in-outlook-19316194-0434-43ba-a742-6b5890157379,,,,,,[],,,unknown,limited,custom,Suggesting email replies.,unknown,unknown,unknown,,,,,unknown,Suggested emails.,https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown -application,Microsoft Security Copilot,Microsoft,"Microsoft Security Copilot is an AI-powered security analysis tool that enables analysts to respond to threats quickly, process signals at machine speed, and assess risk exposure in minutes. -",2023-03-28,https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/,,,,,,"['GPT-4', 'Microsoft security-specific model']",,,"Security Copilot employs a closed-loop learning system that learns from user interactions and feedback, enabling it to provide more coherent, relevant, and useful answers that continually improve over time. Security Copilot is committed to delivering safe, secure, and responsible AI solutions, ensuring that customers' data and AI models are protected with enterprise compliance and security controls. Customer data is owned and controlled by them, and not used to train AI models for anyone outside their organization.",limited,custom,"Security Copilot is designed to enhance the capabilities of cybersecurity professionals. It leverages machine speed and scale to accelerate response to security incidents, discover and process threat signals, and assess risk exposure within minutes.",unknown,,unknown,,,,,Security Copilot combines OpenAI's GPT-4 generative AI with a security-specific model from Microsoft. This security-specific model in turn incorporates a growing set of security-specific skills and is informed by Microsoft's unique global threat intelligence and more than 65 trillion daily signals.,"Actionable responses to security-related questions (text and image). Security event, incident or threat reports (PowerPoint slide).",https://www.microsoft.com/legal/terms-of-use,unknown,unknown,unknown -model,UniLM,Microsoft,UniLM is a unified language model that can be fine-tuned for both natural language understanding and generation tasks.,2019-10-01,https://proceedings.neurips.cc/paper_files/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf,,text; text,340M parameters (dense),,"Evaluated on GLUE, SQuAD 2.0, and CoQA benchmarks.",[],,,,open,MIT,,,,,,unknown,"10,000 steps in 7 hours",8 NVIDIA Tesla V100 32GB GPUs,,,,,, -model,Docugami,Microsoft,Docugami is a LLM focused on writing business documents and data using generative AI.,2021-04-12,https://www.docugami.com/generative-ai,,text; text,20B parameters (dense),,,[],,,,limited,,"analyzing, writing, and connecting business documents and data",,,,,unknown,unknown,,,,,,, -model,BEiT-3,Microsoft,BEiT-3 is a general-purpose multimodal foundation model for vision and vision-language tasks.,2022-08-31,https://arxiv.org/pdf/2208.10442.pdf,,"image, text; image, text",1.9B parameters (dense),,"Evaluated on a range of standardized vision benchmarks, and achieves state of the art performance on all experimentally.",['Multiway Transformer network'],,,,open,,,,,,,unknown,,,,,,,, -model,WizardLM,Microsoft,"Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. 
We call the resulting model WizardLM.",2023-04-24,https://arxiv.org/pdf/2304.12244v1.pdf,,text; text,7B parameters (dense),,Reports results on standard LLM benchmarks in comparison to other LLMs and test sets.,"['LLaMA', 'Evol-Instruct', 'Alpaca dataset']",,,,open,Apache 2.0,"Creating large amounts of instruction data, particularly with high complexity",,,https://huggingface.co/datasets/WizardLM/evol_instruct_70k/discussions,https://huggingface.co/WizardLM/WizardLM-13B-1.0,,70 hours on 3 epochs,8 V100 GPUs,,,,,, -model,WizardCoder,Microsoft,"WizardCoder empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.",2023-08-26,https://arxiv.org/pdf/2306.08568.pdf,,text; text,34B parameters (dense),,"Evaluated on four prominent code generation benchmarks: HumanEval, HumanEval+, MBPP, and DS-1000.","['Evol-Instruct', 'Alpaca dataset', 'StarCoder']",,,,open,BigCode Open Rail-M,,,,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0/discussions,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0,,,,,,,,, -model,Florence-2,Microsoft,"Florence-2 is a vision foundation model that uses a unified, prompt-based representation to handle a variety of computer vision and vision-language tasks.",2023-11-10,https://arxiv.org/pdf/2311.06242.pdf,,"image, text; text",771M parameters (dense),,Evaluated on standard image processing benchmarks,['FLD-5B'],,,,closed,unknown,,,,,,unknown,unknown,unknown,,,,,, -dataset,FLD-5B,Microsoft,FLD-5B is the dataset that powers Florence-2,2023-11-10,https://arxiv.org/pdf/2311.06242.pdf,,"image, text",1.3B image-text annotations,[],FLD-5B evaluated in comparison to datasets that power other large-scale image models on standard image benchmarks.,[],,,,closed,unknown,,,,,,,,,,,,,, -dataset,OpenOrca,Microsoft,"The OpenOrca dataset is a collection of augmented FLAN Collection data. Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions. 
It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope.",2023-06-05,https://huggingface.co/datasets/Open-Orca/OpenOrca,https://huggingface.co/datasets/Open-Orca/OpenOrca,text,4.5M text queries,[],Models trained on OpenOrca compared to GPT-series on language benchmarks.,"['GPT-3.5', 'GPT-4', 'Flan Collection']",,,,open,MIT,training and evaluation in the field of natural language processing.,,,,,,,,,,,,, -model,LlongOrca,Microsoft,LlongOrca is an attempt to make OpenOrca able to function in a Llong context.,2023-08-01,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,,text; text,7B parameters (dense),,LlongOrca evaluated on BigBench-Hard and AGIEval results.,"['OpenOrca', 'LLongMA-2']",,,,open,LLaMA 2,training and evaluation in the field of natural language processing.,,,https://huggingface.co/Open-Orca/LlongOrca-7B-16k/discussions,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,unknown,37 hours,8x A6000-48GB (first-gen) GPUs,,,,,, -model,Phi-1.5,Microsoft,Phi-1.5 is a large language transformer model.,2023-09-11,https://arxiv.org/pdf/2309.05463.pdf,,text; text,1.3B parameters (dense),,"Evaluated on common sense reasoning, language understanding, and multi-step reasoning compared to other SOTA language models.",['phi-1'],,,generic web-crawl data is removed from dataset.,open,MIT,"Phi-1.5 is best suited for answering prompts using the QA format, the chat format, and the code format.",,,https://huggingface.co/microsoft/phi-1_5/discussions,https://huggingface.co/microsoft/phi-1_5,unknown,8 days,32 A100-40G GPUs,,,,,, -model,Orca 2,Microsoft,Orca 2 is a finetuned version of LLAMA-2 for research purposes.,2023-11-21,https://arxiv.org/pdf/2311.11045.pdf,,text; text,13B parameters (dense),,Orca 2 has been evaluated on a large number of tasks ranging from reasoning to grounding and safety.,['LLaMA 2'],,,,open,custom,Orca 2 is built for research purposes only. The main purpose is to allow the research community to assess its abilities and to provide a foundation for building better frontier models.,Any purposes other than research.,unknown,https://huggingface.co/microsoft/Orca-2-13b/discussions,https://huggingface.co/microsoft/Orca-2-13b,unknown,80 hours,32 NVIDIA A100 80GB GPUs,,,,,, -model,Phi-3 Mini,Microsoft,"Phi-3 Mini is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.",2024-04-23,https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/,,text; text,3.8B parameters,,"The model has been evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning. The Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.",[],,,The model underwent post-training processes viz. supervised fine-tuning and direct preference optimization to increase its capability in following instructions and aligning to safety measures.,open,MIT,The model's primary use cases are for commercial and research purposes that require capable reasoning in memory or compute constrained environments and latency-bound scenarios. 
It can also serve as a building block for generative AI-powered features.,"The model should not be used for high-risk scenarios without adequate evaluation and mitigation techniques for accuracy, safety, and fairness.","Issues like allocation, high-risk scenarios, misinformation, generation of harmful content and misuse should be monitored and addressed.",https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions,https://huggingface.co/microsoft/Phi-3-mini-128k-instruct,unknown,7 days,512 H100-80G GPUs,,,,,, -model,TigerBot,TigerResearch,TigerBot is an open source multilingual multitask LLM.,2023-10-19,https://arxiv.org/pdf/2312.08688.pdf,,text; text,180B parameters (dense),,Evaluated across a range of domain tasks across standard benchmarks in comparison to predecessor Llama 2.,"['Llama 2', 'BLOOM']",,,Safety filtering performed to mitigate risk and remove toxic content.,open,Apache 2.0,,,unknown,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2/discussions,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2,unknown,unknown,32 A100-40G GPUs,,,,,, -dataset,coheretext,Cohere,"The family of datasets used to train Cohere models, which come in two forms: coheretext-filtered and coheretext-unfiltered. The former is used to train the Representation models, while the latter one is used to train the Generation models. -",2021-11-15,,https://docs.cohere.ai/data-statement,text; text,200 GB,[],unknown,[],"As stated in the datasheet, the dataset ""includes the Google Books dataset, CommonCrawl, and text from the internet scraped by the Cohere infrastructure team."" The top ten domains scraped were: wordpress.com, medium.com, stackexchange.com, tumblr.com, elsevier.com, genius.com, bbc.co.uk, libsyn.com, yahoo.com, nytimes.com [[Datasheet]](https://docs.cohere.ai/data-statement). +",,Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,,,,,,,,,, +model,Voicebox,Meta,Voicebox is the first generative AI model for speech to generalize across tasks with state-of-the-art performance.,2023-06-16,https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/,,audio; text,"Evaluated on zero-shot text-to-speech benchmarks, with Voicebox outperforming the current state-of-the-art English model VALL-E.",330M parameters (dense),[],unknown,"750,000 iterations",32 GPUs of unspecified type,,closed,,,,,,,,,,,,,,, +model,PEER,Meta,"PEER is a collaborative language model that is trained to imitate the entire writing process itself. 
PEER can write drafts, add suggestions, propose edits and provide explanations for its actions.",2022-08-24,https://arxiv.org/pdf/2208.11663.pdf,,text; text,"PEER is evaluated on core research questions intended to gauge language understanding, proper use of citations, instruction following, and iterative use.",3B parameters (dense),[],,,64 GPUs,"Heuristics and edit filtering was used on data set, which consisted mostly of Wikipedia pages.",open,,adapting LLMs to work with collaborative writing and updating.,,,,,,,,,,,,, +model,MusicGen,Meta,MusicGen is a simple and controllable model for music generation that doesn't require self-supervised semantic representation,2023-08-02,https://huggingface.co/spaces/facebook/MusicGen/tree/main,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/MUSICGEN_MODEL_CARD.md,"audio, text; audio, text","MusicGen was evaluated on standard music benchmarks of Frechet Audio Distance, Kullback-Leibler Divergence, and its CLAP score.",3.3B parameters (dense),"['Meta Music Initative Sound Collection', 'Shutterstock music collection', 'Pond5 music collection']",,,,,open,MIT,The primary use of MusicGen is research on AI-based music generation,"The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.",,https://huggingface.co/spaces/facebook/MusicGen/discussions,,,,,,,,,, +model,AudioGen,Meta,AudioGen is an auto-regressive generative model that generates audio samples conditioned on text inputs,2023-08-02,https://felixkreuk.github.io/audiogen/paper.pdf,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/AUDIOGEN_MODEL_CARD.md,"audio, text; audio, text",Evaluated on Frechet Audio Distance and Kullback-Leibler Divergence as well as qualitative studies with human participants.,1.5B parameters (dense),"['AudioSet', 'BBC sound effects', 'AudioCaps', 'Clotho v2', 'VGG-Sound', 'FSD50K', 'Free To Use Sounds', 'Sonniss Game Effects', 'WeSoundEffects', 'Paramount Motion - Odeon Cinematic Sound Effects']",,,,,open,MIT,The primary use of AudioGen is research on AI-based audio generation.,"The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate audio pieces that create hostile or alienating environments for people. 
This includes generating audio that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.",,https://huggingface.co/facebook/audiogen-medium/discussions,,,,,,,,,, +model,Emu,Meta,Emu is a pre-trained latent diffusion model on 1.1 billion image-text pairs and fine-tuned with only a few thousand carefully selected high-quality images.,2023-09-27,https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/,,text; image,Emu significantly outperforms a publicly available state-of-the-art model SDXLv1.0 on visual appeal when compared on standard benchmarks.,1.5B parameters (dense),"['CLIP', 'T5']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Code LLaMA,Meta,Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters.,2023-08-24,https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/,https://huggingface.co/codellama/CodeLlama-34b-hf,"text; code, text",Evaluated on several code benchmarks like HumanEval and MBPP.,34B parameters (dense),['Llama 2'],65.3 tCO2eq,400K GPU hours,A100-80GB GPUs,,open,Llama 2,Code Llama and its variants is intended for commercial and research use in English and relevant programming languages.,Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.,,https://huggingface.co/allenai/codetulu-2-13b/discussions,,,,,,,,,, +model,Emu Video,Meta,"Emu Video is a text-to-video generation model that factorizes the generation into two steps, first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.",2023-11-16,https://emu-video.metademolab.com/,,text; video,Analyzed against nearest neighbor model baseline and by extending the video length.,6B parameters (dense),"['Emu', 'CLIP', 'T5']",unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,Emu Edit,Meta,Emu Edit is a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.,2023-11-16,https://emu-edit.metademolab.com/,,text; image,Evaluated on test set of actions in comparison to SoTA image editing models.,unknown,"['Emu', 'CLIP', 'T5']",unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,MetaCLIP,Meta,MetaCLIP is a more transparent rendition of CLIP that aims to reveal CLIP's training data curation methods.,2023-10-02,https://arxiv.org/pdf/2103.00020.pdf,https://huggingface.co/facebook/metaclip-b32-400m,text; text,Evaluated in comparison to CLIP.,unknown,['Common Crawl'],unknown,unknown,unknown,,open,CC-BY-NC-4.0,,,,,,,,,,,,,, +model,Llama 3,Meta,Llama 3 is the third generation of Meta AI's open-source large language model. It comes with pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases.,2024-04-18,https://llama.meta.com/llama3/,https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md,text; text,"The models were evaluated based on their performance on standard benchmarks and real-world scenarios. These evaluations were performed using a high-quality human evaluation set containing 1,800 prompts covering multiple use cases. 
The models also went through red-teaming for safety, where human experts and automated methods were used to generate adversarial prompts to test for problematic responses.",70B parameters,[],unknown,unknown,2 custom-built Meta 24K GPU clusters,"Extensive internal and external testing for safety, and design of new trust and safety tools.",open,Llama 3,"Llama 3 is intended for a broad range of use cases, including AI assistance, content creation, learning, and analysis.",unknown,Extensive internal and external performance evaluation and red-teaming approach for safety testing.,"Feedback is encouraged from users to improve the model, but the feedback mechanism is not explicitly described.",,,,,,,,,, +model,Chameleon,Meta FAIR,Chameleon is a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence.,2024-05-17,https://arxiv.org/pdf/2405.09818,,"image, text; image, text","Evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro.",34B parameters,[],unknown,unknown,Meta's Research Super Cluster (powered by NVIDIA A100 80GB GPUs),,open,unknown,,,,,,,,,,,,,, +model,TigerBot,TigerResearch,TigerBot is an open source multilingual multitask LLM.,2023-10-19,https://arxiv.org/pdf/2312.08688.pdf,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2,text; text,Evaluated across a range of domain tasks across standard benchmarks in comparison to predecessor Llama 2.,180B parameters (dense),"['Llama 2', 'BLOOM']",unknown,unknown,32 A100-40G GPUs,Safety filtering performed to mitigate risk and remove toxic content.,open,Apache 2.0,,,unknown,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2/discussions,,,,,,,,,, +application,Juni Tutor Bot,Juni Learning,An online tutoring solution to help students achieve academic success.,,https://junilearning.com/,,,,,['Anthropic API'],,,,,limited,unknown,,,,,,,,,,,,,, +dataset,ToyMix,Mila-Quebec AI Institute,ToyMix is the smallest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",Models of size 150k parameters trained on ToyMix and compared to models trained on its dependencies across GNN baselines.,13B labels of quantum and biological nature.,"['QM9', 'TOX21', 'ZINC12K']",,,,,open,CC BY-NC-SA 4.0,"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. 
Further, the ToyMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.",,,,,,,,,,,[],, +dataset,LargeMix,Mila-Quebec AI Institute,LargeMix is the middle-sized dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",Models of size between 4M and 6M parameters trained for 200 epochs on LargeMix and compared to models trained on its dependencies across GNN baselines.,13B labels of quantum and biological nature.,"['L1000 VCAP', 'L1000 MCF7', 'PCBA1328', 'PCQM4M_G25_N4']",,,,,open,CC BY-NC-SA 4.0,"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. Further, the LargeMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.",,,,,,,,,,,[],, +dataset,UltraLarge,Mila-Quebec AI Institute,UltraLarge is the largest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,2023-10-09,https://arxiv.org/pdf/2310.04292.pdf,,"molecules, tasks",Models of size between 4M and 6M parameters trained for 50 epochs on UltraLarge and compared to models trained on its dependencies across GNN baselines.,13B labels of quantum and biological nature.,['PM6_83M'],,,,,open,CC BY-NC-SA 4.0,The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models.,,,,,,,,,,,[],, +model,Lag-LLaMA,"Morgan Stanley, ServiceNow Research, University of Montreal, Mila-Quebec AI Institute",Lag-LLaMA is a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates.,2024-02-08,https://time-series-foundation-models.github.io/lag-llama.pdf,https://huggingface.co/time-series-foundation-models/Lag-Llama,text; text,Evaluated on previously unseen time series datasets.,unknown,[],unknown,unknown,A single NVIDIA Tesla-P100 GPU,,open,Apache 2.0,,,unknown,https://huggingface.co/time-series-foundation-models/Lag-Llama/discussions,,,,,,,,,, +application,Virtual Volunteer,Be My Eyes,The first-ever digital visual assistant powered by OpenAI’s new GPT-4 language model.,2023-03-14,https://www.bemyeyes.com/blog/introducing-be-my-eyes-virtual-volunteer,,,,,['GPT-4 API'],,,,,limited,unknown,,,,,,,,,,,,,, +application,Perplexity Ask,Perplexity,Perplexity Ask is a new search interface that uses advanced artificial intelligence technologies,2022-12-07,https://www.perplexity.ai/,,,,,"['GPT-3.5', 'Bing Search']",,,,,open,,,,,,,,,,,,,,, +application,Bird SQL,Perplexity,Twitter search interface that is powered by Perplexity's structured search engine.,2022-12-15,https://www.perplexity.ai/sql,,,,,"['Perplexity Ask', 'OpenAI API']",,,,,closed,,,,,,,,,,,,,,, +application,Perplexity Chat,Perplexity,Perplexity chat is an AI chatbot trained in-house by Perplexity.,2023-10-27,https://labs.perplexity.ai/,,,,,[],,,,,open,,,,,,,Chatbot output in response to user queries,https://blog.perplexity.ai/legal/terms-of-service,,,,,,, +dataset,MassiveText,Google Deepmind,"The MassiveText dataset was used to train the Gopher model. 
+",2021-12-08,https://arxiv.org/pdf/2112.11446.pdf,,"code, text","MassiveText data was analyzed for toxicity, language distribution, URL breakdown, and tokenizer compression rates on the subsets [[Section A.2]](https://arxiv.org/pdf/2112.11446.pdf#subsection.A.2). +",10.5 TB,[],,,,"The authors use simple heuristics for filtering low quality documents as opposed to relying on a classifier based on a ""gold"" set such as the English Wikipedia, which could ""inadvertently bias towards a certain demographic or erase certain dialects or sociolects from representation."" MassiveWeb subset was filtered using Google’s SafeSearch filter, preferring it over to word filters that ""disproportinately filter out inoffensive content associated with minority groups. MassiveWeb was filtered further for word or phrase repetitions. All the subsets were filtered for document deduplication and test set contamination"" [[Appendix A]](https://arxiv.org/pdf/2112.11446.pdf#appendix.A). +",closed,unknown,"Pre-training of language models by DeepMind researchers [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C). +",unknown,unknown,unknown,,,,,,,https://arxiv.org/pdf/2112.11446.pdf#subsection.A.5,[],"MassiveText data come from 6 sources: MassiveWeb (48%), Books (27%), C4 (10%), News (10%), GitHub (3%), and Wikipedia (2%). MassiveWeb is a web text corpus curated for MassiveText. ","Documents that are not in English are excluded. -","In the datasheet, it is implied that Cohere employs filtration methods for removing racist, biased and toxic content, but the details are not provided. These filtration methods take both the context and the language, as opposed to using a list of blockwords [[Datasheet]](https://docs.cohere.ai/data-statement). -",closed,unknown,"The intended use of the dataset is to train Cohere's language models. -",unknown,unknown,unknown,,,,,,,,,, -model,Cohere Base,Cohere,"The Generations model is a language model trained by Cohere for generation tasks. -",2021-11-15,,,text; text,unknown,,"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).",['coheretext'],,,unknown,limited,unknown,"On the model card, the intended uses are stated as ""interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains"" [[Model Card]](https://docs.cohere.ai/generation-card). -","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: astroturfing, generation of misinformation and other harmful content, and ""generation of text about people, places, or events without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/generation-card). -","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card). -",unknown,https://docs.cohere.ai/generation-card,unknown,unknown,unknown,,,,,, -model,Cohere Command,Cohere,"This model is a generative model optimized to follow commands in the prompt. 
-",2023-01-01,https://docs.cohere.com/docs/command-beta,,text; text,unknown,,"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).",['Cohere Base'],,,unknown,limited,unknown,"On the model card, the intended uses are stated as ""interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains"" [[Model Card]](https://docs.cohere.ai/generation-card). -","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: astroturfing, generation of misinformation and other harmful content, and ""generation of text about people, places, or events without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/generation-card). -","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card). -",unknown,https://docs.cohere.ai/generation-card,unknown,unknown,unknown,,,,,, -model,Cohere Embed (English),Cohere,"The Embedding Large (English) model is a language model trained by Cohere for tasks requiring embeddings. -",2021-11-15,,,text; text,unknown,,"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card). -",[],,,unknown,limited,unknown,"The intended uses are stated as ""estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback"" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card). -","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive list of specific application violating these guidelines are: extraction of identity and demographic information, building purposefully opaque text classification systems, and ""building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/representation-card). -","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card). -",unknown,https://docs.cohere.ai/representation-card,unknown,unknown,unknown,,,,,, -model,Cohere Embed (Multilingual),Cohere,"This model maps text from 100+ languages to a semantic vector space, positioning text with a similar meaning (regardless of language) in close proximity. -",2022-12-12,https://txt.cohere.ai/multilingual/,,text; text,unknown,,"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card). -",[],,,unknown,limited,unknown,"The intended uses are stated as ""estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback"" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card). -","The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive list of specific application violating these guidelines are: extraction of identity and demographic information, building purposefully opaque text classification systems, and ""building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop"" [[Model Card]](https://docs.cohere.ai/representation-card). -","The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card). -",unknown,https://docs.cohere.ai/representation-card,unknown,unknown,unknown,,,,,, -application,Cohere API,Cohere,"Cohere API allows users to access the cohere language models and utilize them in their applications. -",2021-11-15,https://cohere.ai/,,,,,,"['Cohere Generate Endpoint', 'Cohere Embed Endpoint', 'Cohere Classify Endpoint', 'Cohere Summarize Endpoint']",,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). -",limited,custom,"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). -","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -","All applications developed using the Cohere API is subject to review by Cohere. -","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -",,,,,unknown,generation and embeddings,https://cohere.ai/terms-of-use,unknown,unknown,unknown -application,Cohere Generate Endpoint,Cohere,"This endpoint generates realistic text conditioned on a given input. -",2021-11-15,https://docs.cohere.ai/reference/generate,,,,,,"['Cohere Base', 'Cohere Command']",,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). -",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). -","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). -","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -","All applications developed using the Cohere API is subject to review by Cohere. -","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
-",,,,,unknown,generation,https://cohere.ai/terms-of-use,unknown,unknown,unknown -application,Cohere Embed Endpoint,Cohere,"This endpoint returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text that it represents. -",2021-11-15,https://docs.cohere.ai/reference/embed,,,,,,"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']",,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). -",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). -","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). -","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -","All applications developed using the Cohere API is subject to review by Cohere. -","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -",,,,,unknown,embedding,https://cohere.ai/terms-of-use,unknown,unknown,unknown -application,Cohere Classify Endpoint,Cohere,"This endpoint makes a prediction about which label best fits a specified text input. To make a prediction, Classify uses the provided examples of text + label pairs as a reference. -",2022-05-05,https://docs.cohere.ai/reference/classify,,,,,,"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']",,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). -",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). -","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). -","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -","All applications developed using the Cohere API is subject to review by Cohere. -","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -",,,,,unknown,embedding,https://cohere.ai/terms-of-use,unknown,unknown,unknown -application,Cohere Summarize Endpoint,Cohere,"This endpoint generates a succinct version of the original text that relays the most important information. -",2023-02-22,https://docs.cohere.ai/reference/summarize,,,,,,[],,,"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. 
Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access). -",limited,"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use). -","Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/). -","The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -","All applications developed using the Cohere API is subject to review by Cohere. -","General feedback as well as the violations of the usage guidelines can be reported to Cohere at responsibility at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). -",,,,,unknown,generation,https://cohere.ai/terms-of-use,unknown,unknown,unknown -model,Cohere Embedv3 (English),Cohere,"As of release, Cohere Embedv3 is Cohere's latest and most advanced embeddings model.",2023-11-02,https://txt.cohere.com/introducing-embed-v3/,,text; text,unknown,,Achieves SOTA performances on trusted MTEB and BEIR benchmarks.,[],,,,limited,unknown,,,,https://huggingface.co/Cohere/Cohere-embed-english-v3.0/discussions,https://huggingface.co/Cohere/Cohere-embed-english-v3.0,unknown,unknown,unknown,,,,,, -model,Aya,"Cohere for AI, Cohere, Brown University, Carnegie Mellon University, MIT",Aya is a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced.,2024-02-12,https://arxiv.org/pdf/2402.07827.pdf,,text; text,13B parameters (dense),,Evaluated on standard LLM and multilingual benchmarks in comparison to SotA models.,"['mT5', 'Aya Dataset']",,,,open,Apache 2.0,,,unknown,https://huggingface.co/CohereForAI/aya-101/discussions,https://huggingface.co/CohereForAI/aya-101,unknown,unknown,unknown,,,,,, -model,Command-R,Cohere,Command-R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.,2024-03-11,https://txt.cohere.com/command-r/,,text; text,35B parameters (dense),,,[],,,,open,CC BY NC 4.0,,,,https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions,https://huggingface.co/CohereForAI/c4ai-command-r-v01,unknown,unknown,unknown,,,,,, -dataset,Aya Dataset,"Cohere for AI, Beijing Academy of Artificial Intelligence, Cohere, Binghamton University",The Aya Dataset is a dataset that consists of original human-curated prompt-completion pairs written by fluent speakers of 65 languages.,2024-02-09,https://arxiv.org/pdf/2402.06619.pdf,https://huggingface.co/datasets/CohereForAI/aya_dataset,text,204k human-annotated prompt-completion pairs,[],unknown,[],,,,open,Apache 2.0,,,unknown,https://huggingface.co/datasets/CohereForAI/aya_dataset/discussions,,,,,,,,,, -model,Rerank 3,Cohere,Rerank 3 is a new foundation model for efficient enterprise search and retrieval with 4k context length.,2024-04-11,https://cohere.com/blog/rerank-3,,text; text,unknown,,"Evaluated on code retrieval and data retrieval capabilities, with improvements compared to the standard in both.",[],,,,limited,unknown,Efficient enterprise search and retrieval.,,unknown,,,unknown,unknown,unknown,,,,,, -model,Grok-1,xAI,"Grok is an AI modeled after the Hitchhiker’s Guide to the 
Galaxy,",2023-11-04,https://grok.x.ai/,,text; text,314B parameters (dense),,Grok-1 was evaluated on a range of reasoning benchmark tasks and on curated foreign mathematic examination questions.,[],,,,open,Apache 2.0,"Grok-1 is intended to be used as the engine behind Grok for natural language processing tasks including question answering, information retrieval, creative writing and coding assistance.",,unknown,,https://x.ai/model-card/,unknown,unknown,unknown,,,,,, -model,Grok-1.5V,xAI,"Grok-1.5V is a first-generation multimodal model which can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.",2024-04-12,https://x.ai/blog/grok-1.5v,,"image, text; text",unknown,,"The model is evaluated in a zero-shot setting without chain-of-thought prompting. The evaluation domains include multi-disciplinary reasoning, understanding documents, science diagrams, charts, screenshots, photographs and real-world spatial understanding. The model shows competitive performance with existing frontier multimodal models.",[],,,,limited,unknown,"Grok-1.5V can be used for understanding documents, science diagrams, charts, screenshots, photographs. It can also translate diagrams into Python code.",unknown,unknown,,,unknown,unknown,unknown,,,,,, -application,Speak,Speak,Speak is an AI-powered language learning app focused on building the best path to spoken fluency and is the the fastest-growing English app in South Korea.,2023-03-01,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,['Whisper API'],,,,open,,,,,,,,,,,,,,, -model,OceanGPT,Zhejiang University,OceanGPT is the first-ever LLM in the ocean domain and displays expertise in various ocean science tasks.,2024-02-06,https://www.zjukg.org/project/OceanGPT/,,text; text,7B parameters (dense),,Evaluated on standard and ocean science benchmarks in comparison to other similar-sized models.,[],,,,open,MIT,,,unknown,https://huggingface.co/zjunlp/OceanGPT-7b/discussions,https://huggingface.co/zjunlp/OceanGPT-7b,unknown,7 days,6 A800 NVIDIA GPUs,,,,,, -model,BioMedLM,Stanford,,2022-12-15,https://crfm.stanford.edu/2022/12/15/pubmedgpt.html,,text; text,2.7B parameters (dense),,,['The Pile'],,,,open,bigscience-bloom-rail-1.0,,,,,,,,,,,,,, -model,RoentGen,Stanford,RoentGen is a generative medical imaging model that can create visually convincing X-ray images.,2022-11-23,https://arxiv.org/pdf/2211.12737.pdf,,text; image,330M parameters (dense),,Evaluated on own framework that tests domain-specific tasks in medical field.,"['Stable Diffusion', 'RoentGen radiology dataset']",,,,open,,,,,,,unknown,60k training steps per day,64 A100 GPUs,,,,,, -model,CORGI,Stanford,Model trained to generate language corrections for physical control tasks.,2023-06-12,https://arxiv.org/pdf/2306.07012.pdf,,human trajectories; text,124M parameters (dense),,"Evaluated on three physical control tasks, drawing, steering, and human body movement on various dynamics","['GPT-2', 'BABEL', 'text-davinci-003']",,,,open,MIT,,,,,,,unknown,one NVIDIA A40 GPU,,,,,, -dataset,Alpaca dataset,Stanford,"Alpaca dataset consistes of 52,000 instruction-following demonstrations generated in the style of the [Self-Instruct framework](https://github.com/yizhongw/self-instruct) using OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instruction better. 
-",2023-03-13,https://crfm.stanford.edu/2023/03/13/alpaca.html,https://huggingface.co/datasets/tatsu-lab/alpaca,text (English),52K instruction-following demonstrations,[],,['text-davinci-003'],,,,open,CC BY-NC 4.0,Alpaca is intended and licensed for research use only.,,,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,,,,,,,,, -model,Alpaca,Stanford,"Alpaca-7B is an instruction-following model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. -",2023-03-13,https://crfm.stanford.edu/2023/03/13/alpaca.html,,text (English),7B parameters (dense model),,,"['LLaMa', 'Alpaca dataset']",,,,open,CC BY NC 4.0 (model weights),Alpaca is intended and licensed for research use only.,,,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,unknown,,,,,,,, -dataset,AutoMathText,Math AI,AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts.,2024-02-12,https://github.com/yifanzhang-pro/AutoMathText,https://huggingface.co/datasets/math-ai/AutoMathText,text,200 GB,[],Mistral model fine-tuned on AutoMathText and evaluated on the MATH dataset.,"['OpenWebMath', 'RedPajama-Data', 'Algebraic Stack', 'Qwen']",,,,open,CC BY-SA 4.0,,,unknown,https://huggingface.co/datasets/math-ai/AutoMathText/discussions,,,,,,,,,, -model,Nous Hermes 2,Nous Research,Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM.,2024-01-10,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,,"text; code, text",7B parameters (dense),,"Evaluated across standard benchmarks and generally performs better than Mixtral, which it was fine-tuned on.",['Mixtral'],,,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/discussions,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,unknown,unknown,unknown,,,,,, -model,YaRN LLaMA 2,"Nous Research, EleutherAI, University of Geneva",YaRN LLaMA 2 is an adapted version of LLaMA 2 using the YaRN extension method.,2023-11-01,https://arxiv.org/pdf/2309.00071.pdf,,text; text,70B parameters (dense),,Evaluated across a variety of standard benchmarks in comparison to LLaMA 2.,['LLaMA 2'],,,,open,LLaMA 2,,,unknown,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k/discussions,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k,unknown,unknown,unknown,,,,,, -model,Nous Capybara,Nous Research,The Capybara series is a series of LLMs and the first Nous collection of models made by fine-tuning mostly on data created by Nous in-house.,2023-11-13,https://huggingface.co/NousResearch/Nous-Capybara-34B,,text; text,34B parameters (dense),,,['Yi'],,,,open,MIT,,,unknown,https://huggingface.co/NousResearch/Nous-Capybara-34B/discussions,https://huggingface.co/NousResearch/Nous-Capybara-34B,unknown,unknown,unknown,,,,,, -model,YaRN Mistral,"Nous Research, EleutherAI, University of Geneva",YaRN Mistral is an adapted version of Mistral using the YaRN extension method.,2023-11-01,https://arxiv.org/pdf/2309.00071.pdf,,text; text,7B parameters (dense),,Evaluated across a variety of standard benchmarks in comparison to Mistral.,['Mistral'],,,,open,MIT,,,unknown,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k/discussions,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k,unknown,unknown,unknown,,,,,, -model,OpenHermes 2.5 Mistral,Nous Research,"OpenHermes 2.5 Mistral 7B is a state of the art Mistral 
Fine-tune, a continuation of OpenHermes 2 model, trained on additional code datasets.",2023-11-03,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,,text; text,7B parameters (dense),,Evaluated on common LLM benchmarks in comparison to other Mistral derivatives.,['Mistral'],,,,open,Apache 2.0,,,unknown,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/discussions,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,unknown,unknown,unknown,,,,,, -model,Hermes 2 Pro-Mistral,Nous,"Hermes 2 Pro on Mistral 7B is an upgraded, retrained version of Nous Hermes 2. This improved version excels at function calling, JSON Structured Outputs, and several other areas, scoring positively on various benchmarks.",2024-03-10,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,,text; text,7B parameters (dense),,"The model was examined across a range of benchmarks including GPT4All, AGIEval, BigBench, TruthfulQA and in-house evaluations of function calling and JSON mode.","['Mistral', 'OpenHermes 2.5 Dataset', 'Nous Hermes 2']",,,"The model was evaluated across multiple tasks, displaying notable scores in GPT4All, AGIEval, BigBench, and TruthfulQA. It also has a high score on function calling and JSON mode, indicating the robustness of its capabilities.",open,Apache 2.0,"The model is intended for general task and conversation capabilities, function calling, and JSON structured outputs.",unknown,unknown,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/discussions,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,unknown,unknown,unknown,,,,,, -model,Genstruct,Nous,"Genstruct is an instruction-generation model, designed to create valid instructions given a raw text corpus. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus. 
This work was inspired by Ada-Instruct and the model is also trained to generate questions involving complex scenarios that require detailed reasoning.",2024-03-07,https://huggingface.co/NousResearch/Genstruct-7B,,text; text,7B parameters (dense),,unknown,[],,,unknown,open,Apache 2.0,"The model is intended for instruction-generation, creating questions involving complex scenarios and generating reasoning steps for those questions.",unknown,unknown,https://huggingface.co/NousResearch/Genstruct-7B/discussions,https://huggingface.co/NousResearch/Genstruct-7B,unknown,unknown,unknown,,,,,, -model,Megatron-LM,NVIDIA,Megatron-LM is an autoregressive language model,2021-04-09,https://arxiv.org/abs/2104.04473,,text; text,1T parameters (dense),,,[],,,unknown,closed,unknown,,,,,,unknown,84 days,3072 A100 GPUs,,,,,, -dataset,MineDojo,NVIDIA,,2022-06-17,https://arxiv.org/abs/2206.08853,,"text, video","730k videos, 6k Wikipedia pages, 340k reddit posts",[],,"['YouTube', 'Wikipedia', 'Reddit']",,,,open,MIT,,,,,,,,,,,,,, -dataset,VIMA dataset,"NVIDIA, Stanford",,2022-10-06,https://vimalabs.github.io/,,"image, text",200M parameters (dense model),[],,"['T5', 'Mask R-CNN', 'VIMA dataset']",,,,open,MIT,,,,,,,,,,,,,, -model,VIMA,"NVIDIA, Stanford",,2022-10-06,https://vimalabs.github.io/,,"image, text; robotics trajectories",200M parameters (dense),,,[],,,,open,MIT,,,,,,,,,,,,,, -model,Nemotron 4,Nvidia,Nemotron 4 is a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.,2024-02-27,https://arxiv.org/pdf/2402.16819.pdf,,"text; code, text",15B parameters (dense),,"Evaluated on standard LLM benchmarks across a range of fields like reasoning, code generation, and mathematical skills.",[],,,Deduplication and quality filtering techniques are applied to the training dataset.,open,unknown,,,unknown,,,unknown,13 days,3072 H100 80GB SXM5 GPUs across 384 DGX H100 nodes,,,,,, -model,BigTrans,Institute of Automation Chinese Academy of Sciences,BigTrans is a model which adapts LLaMA that covers only 20 languages and enhances it with multilingual translation capability on more than 100 languages,2023-05-29,https://arxiv.org/pdf/2305.18098v1.pdf,,text; text,13B parameters (dense),,Reports results on standard translation benchmarks across 102 languages in comparison with Google Translate and ChatGPT,"['LLaMA', 'CLUE', 'BigTrans parallel dataset']",,,,open,Apache 2.0,Advancing future research in multilingual LLMs,,,https://huggingface.co/James-WYang/BigTrans/discussions,https://huggingface.co/James-WYang/BigTrans,unknown,unknown,16 A100 GPUs with 80 GB of RAM,,,,,, -model,YAYI 2,Institute of Automation Chinese Academy of Sciences,YAYI 2 is an open source large language model trained in both English and Chinese.,2023-12-22,https://arxiv.org/pdf/2312.14862.pdf,,text; text,30B parameters (dense),,"Evaluated on standard benchmarks for knowledge and language understanding, mathematical reasoning, and programming ability in comparison to similarly sized open-source models.",[],,,"data is deduplicated, normalized, cleaned, and filtered for toxicity",open,custom,,,,https://huggingface.co/wenge-research/yayi2-30b/discussions,https://huggingface.co/wenge-research/yayi2-30b,unknown,unknown,over 1000 A800 GPUs,,,,,, -model,YaLM,Yandex,YaLM is a 100B parameter autoregressive model trained on 25% English and 75% Russian text.,2022-06-22,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,text; text,100B parameters (dense),,,"['The Pile', 
'Yandex Russian Pretraining Dataset']",,,,open,Apache 2.0,,,,,,,,Yandex 800 A100 Cluster,,,,,, -application,Yandex Search,Yandex,Yandex is a search engine and web portal. Yandex offers internet search and other services,2022-06-23,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,,,,,['YaLM'],,,,open,custom,,,,,,,,,,,https://yandex.com/legal/browser_agreement/,,, -application,Continue,"Continue Dev, Inc.",Continue is the open-source autopilot for software development. It is an IDE extension that brings the power of ChatGPT to VS Code and JetBrains. It’s built to be deeply customizable and continuously learn from development data.,2023-07-26,https://continue.dev,,,,,,"['GPT-4 API', 'Code Llama', 'Claude API', 'WizardCoder', 'PaLM API']",,,,open,Apache 2.0,,,,,,,,,,,https://github.com/continuedev/continue/blob/main/LICENSE,,, -model,GOAT,National University of Singapore,GOAT is a fine-tuned LLaMA model which uses the tokenization of numbers to significantly outperform benchmark standards on a range of arithmetic tasks.,2023-05-23,https://arxiv.org/pdf/2305.14201.pdf,,text; text,7B parameters (dense),,"Performance assessed on BIG-bench arithmetic sub-task, and various elementary arithmetic tasks.","['LLaMA', 'GOAT dataset']",,,Number data is randomly generated from log space to reduce likelihood of redundancy and range of magnitudes.,open,Apache 2.0,Integration into other instruction-tuned LLMs to further enhance arithmetic reasoning abilities in solving math word problems.,,,,,unknown,unknown,24 GB VRAM GPU,,,,,, -model,OpenMoE,"National University of Singapore, University of Edinburgh, ETH Zurich",OpenMoE is a series of fully open-sourced and reproducible decoder-only MoE LLMs.,2024-01-12,https://github.com/XueFuzhao/OpenMoE,,text; text,34B parameters (dense),,Evaluated on relatively simple established benchmarks.,"['RedPajama', 'The Stack']",,,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/OrionZheng/openmoe-base/discussions,https://huggingface.co/OrionZheng/openmoe-base,unknown,unknown,unknown,,,,,, -model,Baichuan 2,Baichuan Inc.,"Baichuan 2 is a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens.",2023-09-20,https://arxiv.org/pdf/2309.10305.pdf,,text; text,13B parameters (dense),,"Evaluated on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.",[],,,,open,Apache 2.0,,,,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,,unknown,unknown,1024 NVIDIA A800 GPUs,,,,,, -dataset,Wu Dao dataset,Beijing Academy of Artificial Intelligence,,2021-01-12,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,"image, text",unknown,[],,[],,,,closed,unknown,,,,,,,,,,,,,, -model,Wu Dao 2.0,Beijing Academy of Artificial Intelligence,,2021-01-12,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,"image, text",1.75T parameters (dense),,,['Wu Dao dataset'],,,,closed,unknown,,,,,,,,,,,,,, -model,JudgeLM,Beijing Academy of Artificial Intelligence,JudgeLM is a fine-tuned to be a scalable judge to evaluate LLMs efficiently and effectively in open-ended benchmarks.,2023-10-26,https://arxiv.org/pdf/2310.17631.pdf,,text; text,13B parameters (dense),,Evaluated on objective and reliability metrics.,"['Vicuna', 'JudgeLM Dataset']",,,,open,Apache 2.0,Research on evaluating the performance of large language models and 
chatbots.,,,https://huggingface.co/BAAI/JudgeLM-13B-v1.0/discussions,https://huggingface.co/BAAI/JudgeLM-13B-v1.0,unknown,unknown,8 A100 40GB NVIDIA GPUs,,,,,, -dataset,JudgeLM Dataset,Beijing Academy of Artificial Intelligence,"JudgeLM Dataset is a novel dataset replete with a rich variety of seed tasks, comprehensive answers from modern LLMs, answers’ grades from the teacher judge, and detailed reasons for judgments.",2023-10-26,https://huggingface.co/datasets/BAAI/JudgeLM-100K,,"text, text",105k judge samples,[],,"['Alpaca', 'GPT-4', 'Dolly', 'ShareGPT', 'LLaMA', 'Vicuna']",,,,open,CC BY NC 4.0,To be used to conduct instruction-tuning for language models and make the language model able to judge open-ended answer pairs.,,,https://huggingface.co/datasets/BAAI/JudgeLM-100K/discussions,,,,,,,,,, -model,SegMamba,"Hong Kong University of Science and Technology (Guangzhou + original), Beijing Academy of Artificial Intelligence","SegMamba is a novel 3D medical image Segmentation Mamba model, designed to effectively capture long-range dependencies within whole volume features at every scale.",2024-01-25,https://arxiv.org/pdf/2401.13560v2.pdf,,image; text,unknown,,Compared to other segmentation models across different modalities on BraTS2023 dataset.,[],,,unknown,open,Apache 2.0,,,unknown,,,unknown,1000 epochs,4 NVIDIA A100 GPUs,,,,,, -model,BGE M3 Embedding,"Beijing Academy of Artificial Intelligence, University of Science and Technology of China","BGE M3 Embedding is a new embedding model that can support more than 100 working languages, leading to new state-of-the-art performances on multi-lingual and cross-lingual retrieval tasks.",2024-02-05,https://arxiv.org/pdf/2402.03216.pdf,,text; text,unknown,,"Evaluated on standard datasets in multilingual, cross-lingual, long document retrieval, and Q&A domains.",[],,,unknown,open,MIT,,,unknown,https://huggingface.co/BAAI/bge-m3/discussions,https://huggingface.co/BAAI/bge-m3,unknown,"20,000 steps",32 A100 40GB GPUs,,,,,, -model,EVA-CLIP,"Beijing Academy of Artificial Intelligence, Tsinghua University","As of release, EVA-CLIP is the largest and most powerful open-source CLIP model to date, with 18 billion parameters.",2024-02-06,https://arxiv.org/pdf/2402.04252.pdf,,"image, text; text",18B parameters (dense),,Evaluated on zero-shot classification performance across multiple image classification benchmarks.,['CLIP'],,,,open,MIT,,,unknown,https://huggingface.co/BAAI/EVA-CLIP-8B-448/discussions,https://huggingface.co/BAAI/EVA-CLIP-8B-448,unknown,unknown,384 A100 40GB GPUs,,,,,, -dataset,Luminous dataset,Aleph Alpha,The dataset used to train the Luminous models.,2022-04-14,,,text,unknown,[],unknown,[],unknown,unknown,unknown,closed,unknown,unknown,unknown,,,,,,,,,,,, -model,Luminous,Aleph Alpha,Luminous is a family of multilingual language models,2022-04-14,https://twitter.com/Aleph__Alpha/status/1514576711492542477,,text; text,200B parameters (dense),,,['Luminous dataset'],,,,limited,,,,,,,unknown,unknown,unknown,,,,,, -application,Aleph Alpha API,Aleph Alpha,The Aleph Alpha API serves a family of text-only language models (Luminous) and multimodal text-and-image models (Magma).,2021-09-30,https://www.aleph-alpha.com/,,,,,,['Luminous'],,,,limited,,unknown,unknown,unknown,unknown,,,,,,The text models provide text outputs given text inputs. 
The multimodal models provide text completions given text and image inputs.,https://www.aleph-alpha.com/terms-conditions,unknown,unknown,unknown -model,MAGMA,Aleph Alpha,An autoregressive VL model that is able to generate text from an arbitrary combination of visual and textual input,2022-10-24,https://arxiv.org/pdf/2112.05253.pdf,,"image, text; text",6B parameters (dense),,Evaluated on the OKVQA benchmark as a fully open-ended generative task.,"['GPT-J', 'CLIP']",,,,open,MIT,,,,,,,,32 A100 GPUs,,,,,, -application,Robin AI,Robin AI,"Robin AI uses Claude and Anthropic's models to understand language - including in technical domains like legal language. It's also very confident at drafting, summarising, translations, and explaining complex concepts in simple terms",,https://www.robinai.co.uk/,,,,,,['Anthropic API'],,,,limited,,,,,,,,,,,,https://www.robinai.co.uk/terms,,, -application,Juni Tutor Bot,Juni Learning,An online tutoring solution to help students achieve academic success.,,https://junilearning.com/,,,,,,['Anthropic API'],,,,limited,unknown,,,,,,,,,,,,,, -dataset,LAION-1B,Alibaba,,2023-02-20,https://arxiv.org/pdf/2302.09778.pdf,,"image, text",1B image-text pairs,[],,['LAION-5B'],,"We eliminate duplicates, low resolution images, and images potentially contain harmful content from the LAION dataset.",,closed,unknown,,,,,,,,,,,,,, -model,Composer,Alibaba,,2023-02-20,https://arxiv.org/pdf/2302.09778.pdf,,"image, text; image",4.4B parameters (dense),,,"['ImageNet', 'WebVision', 'LAION-1B']",,,,closed,unknown,,,,,,,,,,,,,, -model,Qwen,Alibaba,"QWEN is a comprehensive language model series that encompasses distinct models with varying parameter counts. Qwen series, now including Qwen, the base language models, namely Qwen-7B and Qwen-14B, as well as Qwen-Chat, the chat models, namely Qwen-7B-Chat and Qwen-14B-Chat. ",2023-08-03,https://arxiv.org/abs/2309.16609,,"image, text; text",14B parameters (dense),,"Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, etc.",[],,,"They filter out low-quality data, they employ a combination of rule-based and machine-learning-based methods. Specifically, they use multiple models to score the content, including language models, text-quality scoring models, and models for identifying potentially offensive or inappropriate content. They also manually sample texts from various sources and review them to ensure their quality. To further enhance the quality of our data, they selectively up-sample data from certain sources, to ensure that our models are trained on a diverse range of high-quality content.",open,custom,,,"Governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. 
And The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.",,https://huggingface.co/Qwen,unknown,,,,,,,, -model,Qwen 1.5,Alibaba,"Qwen 1.5 is the next iteration in their Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, codes, etc.",2024-02-04,https://qwenlm.github.io/blog/qwen1.5/,,text; text,72B parameters (dense),,"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.",[],,,unknown,open,custom,,,unknown,https://huggingface.co/Qwen/Qwen1.5-72B/discussions,https://huggingface.co/Qwen/Qwen1.5-72B,unknown,unknown,unknown,,,,,, -model,Qwen 1.5 MoE,Qwen Team,"Qwen 1.5 is the next iteration in their Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, codes, etc. Qwen 1.5 MoE is the MoE model of the Qwen 1.5 series.",2024-03-28,https://qwenlm.github.io/blog/qwen-moe/,,text; text,14B parameters with 2.7B parameters for activation (MoE),,"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.",[],,,unknown,open,custom,,,unknown,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/discussions,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B,unknown,unknown,unknown,,,,,, -model,SeaLLM v2.5,"DAMO Academy, Alibaba",SeaLLM v2.5 is a multilingual large language model for Southeast Asian (SEA) languages.,2024-04-12,https://github.com/DAMO-NLP-SG/SeaLLMs,,text; text,7B parameters,,"The model was evaluated on 3 benchmarks (MMLU for English, M3Exam (M3e) for English, Chinese, Vietnamese, Indonesian, and Thai, and VMLU for Vietnamese) and it outperformed GPT-3 and Vistral-7B-chat models across these benchmarks in the given languages.",['Gemma'],,,"Despite efforts in red teaming and safety fine-tuning and enforcement, the creators suggest, developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations.",open,custom,"The model is intended for multilingual tasks such as knowledge retrieval, math reasoning, and instruction following. Also, it could be used to provide multilingual assistance.","The model should not be used in a way that could lead to inaccurate, misleading or potentially harmful generation. 
Users should comply with local laws and regulations when deploying the model.",unknown,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5,unknown,unknown,unknown,,,,,, -dataset,OpenWebMath,University of Toronto,"OpenWebMath is an open dataset containing 14.7B tokens of mathematical webpages from Common Crawl, inspired by Minerva.",2023-10-10,https://arxiv.org/pdf/2310.06786.pdf,Can be found at section E of https://arxiv.org/pdf/2310.06786.pdf,"text, mathematical tokens",14.7B documents,[],Compared models trained on OpenWebMath for 1 epoch to models trained on The Pile and ProofPile on mathematics benchmarks.,['Common Crawl'],,,"Documents are filtered, processed for mathematical value, deduplicated, and then the largest documents are manually inspected for quality.",open,ODC-By 1.0,"Language model pretraining, finetuning, and evaluation.",Any tasks which may considered irresponsible or harmful.,,https://huggingface.co/datasets/open-web-math/open-web-math/discussions,,,,,,,,,, -model,Orion,OrionStarAI,Orion series models are open-source multilingual large language models trained from scratch by OrionStarAI.,2024-01-20,https://github.com/OrionStarAI/Orion,,text; text,14B parameters (dense),,Evaluated on multilingual and NLP benchmarks in comparison with SoTA models of comparable size.,[],,,unknown,open,custom,,,unknown,https://huggingface.co/OrionStarAI/Orion-14B-Base/discussions,https://huggingface.co/OrionStarAI/Orion-14B-Base,unknown,unknown,unknown,,,,,, -model,SambaLingo,Samba Nova Systems,SambaLingo is a suite of models that adapt Llama 2 to a diverse set of 9 languages.,2024-02-26,https://sambanova.ai/blog/sambalingo-open-source-language-experts,,text; text,unknown,,Evaluated on open source multilingual model benchmarks.,['Llama 2'],,,,open,LLaMA 2,,"SambaLingo should not be used for mission-critical applications, applications involving the safety of others, and highly critical decisions.",,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base/discussions,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base,unknown,unknown,unknown,,,,,, -model,Samba 1,Samba Nova Systems,Samba 1 is a trillion parameter generative AI model using a Composition of Experts architecture.,2024-02-28,https://sambanova.ai/blog/samba-1-composition-of-experts-mode,,text; text,1T parameters (dense),,unknown,"['Llama 2', 'Mistral', 'Falcon-180B', 'Deepseek', 'BLOOM', 'LLaVA', 'CLIP']",,,,limited,unknown,,,unknown,,,unknown,unknown,unknown,,,,,, -dataset,LP-MusicCaps,South Korea Graduate School of Culture Technology,LP-MusicCaps is a LLM-based pseudo music caption dataset.,2023-07-31,https://arxiv.org/pdf/2307.16372.pdf,,,2.2M captions paired with 0.5M audio clips,[],"Evaluated in comparison to the MusicCaps dataset and with respect to n-gram, neural metrics.","['MusicCaps', 'Million Song Dataset', 'Magnatagtune']",,,,open,CC-BY-NC-4.0,,,,,,,,,,,,,, -model,SciPhi Mistral,SciPhi,SciPhi Mistral is a Large Language Model (LLM) fine-tuned from Mistral.,2023-11-07,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,,text; text,7B parameters (dense),,,['Mistral'],,,,open,MIT,,,unknown,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k/discussions,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,unknown,unknown,unknown,,,,,, -model,Notus,Argilla,"Notus is an open source LLM, fine-tuned using Direct Preference Optimization (DPO) and AIF (AI Feedback) techniques.",2023-12-01,https://argilla.io/blog/notus7b/,,text; text,7B parameters 
(dense),,Evaluated on MT-Bench and AlphaEval benchmarks.,"['UltraFeedback', 'Zephyr']",,,,open,MIT,Academic research and free commercial usage,,,https://huggingface.co/argilla/notus-7b-v1/discussions,https://huggingface.co/argilla/notus-7b-v1,unknown,unknown,8 x A100 40GB GPUs,,,,,, -model,Amber,LLM360,"Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community.",2023-12-12,https://www.llm360.ai/,,text; text,7B parameters (dense),,Evaluated on several benchmark LLM tasks,"['Arxiv', 'Books', 'C4', 'RefinedWeb', 'StarCoder', 'StackExchange', 'Wikipedia']",,,,open,Apache 2.0,to support open and collaborative AI research by making the full LLM training process transparent.,,unknown,https://huggingface.co/LLM360/Amber/discussions,https://huggingface.co/LLM360/Amber,unknown,unknown,"56 DGX A100 nodes, each equipped with 4 80GB A100 GPUs",,,,,, -model,CrystalCoder,LLM360,CrystalCoder is a language model with a balance of code and text data that follows the initiative under LLM360 of its training process being fully transparent.,2023-12-12,https://www.llm360.ai/,,"text; code, text",7B parameters (dense),,"Evaluated on English and coding tasks and benchmarks, and outperforms LLaMA 2 in some.","['SlimPajama', 'StarCoder']",,,,open,Apache 2.0,to support open and collaborative AI research by making the full LLM training process transparent.,,unknown,https://huggingface.co/LLM360/CrystalCoder/discussions,https://huggingface.co/LLM360/CrystalCoder,unknown,unknown,"Trained on the Cerebras Condor Galaxy 1 (CG-1), a 4 exaFLOPS, 54 million core, 64-node cloud AI supercomputer.",,,,,, -application,Duolingo Explain My Answer,Duolingo,"Explain My Answer offers learners the chance to learn more about their response in a lesson (whether their answer was correct or incorrect!) By tapping a button after certain exercise types, learners can enter a chat with Duo to get a simple explanation on why their answer was right or wrong, and ask for examples or further clarification.",2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,,['GPT-4 API'],,,,limited,custom,,,,,,,,,,,https://www.duolingo.com/terms,,, -application,Duolingo Max,Duolingo,Duolingo Max is a new subscription tier above Super Duolingo that gives learners access to two brand-new features and exercises - Explain My Answer and Roleplay.,2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,,"['Duolingo Role Play', 'Duolingo Explain My Answer']",,,,limited,custom,,,,,,,,,,,,,, -application,Duolingo Role Play,Duolingo,"Roleplay allows learners to practice real-world conversation skills with world characters in the app. These challenges, which earn XP, will live alongside the path as one of the “Side Quests” learners can access by tapping on the character. What will you talk about? We’ll guide you through different scenarios! Learners might discuss future vacation plans with Lin, order coffee at a café in Paris, go furniture shopping with Eddy, or ask a friend to go for a hike.",2023-03-14,https://blog.duolingo.com/duolingo-max/,,,,,,['GPT-4 API'],,,,limited,custom,,,,,,,,,,,https://www.duolingo.com/terms,,, +" +dataset,M3W,Google Deepmind,"M3W (MassiveWeb) is a dataset used to train Flamingo and other vision-language models, and was created by researchers and engineers. 
+",2022-04-29,https://arxiv.org/pdf/2204.14198.pdf,,"code, text",,"182GB Text, 185M Images",[],,,,"The authors provide a basic description of data processing and cleaning. +",closed,unknown,"Pre-training of vision and language models by DeepMind researchers [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F). +",unknown,unknown,,,,,,,,https://arxiv.org/pdf/2204.14198.pdf#appendix.F,[],"M3W has interleaved images (185M) and text (182GB) from the web. +",unknown +dataset,Gato dataset,Google Deepmind,"The Gato datasets are a collection of data used to train the Gato model. +",2022-05-12,https://www.deepmind.com/blog/a-generalist-agent,,"image, text, robotics trajectories, simulated control tasks","The Gato dataset compiles many datasets introduced in prior works, with associated analyses. +","10.5 TB Text, 2.2B Text-Image pairs, 1.5T tokens of simulated control, 500k robotics trajectories",['MassiveText'],,,,unknown,closed,unknown,unknown,unknown,,,,,,,,,,[],"The full composition of the dataset across individual sources can be found in the paper. +", +model,AlphaFold2,Google Deepmind,AlphaFold2 is a protein language model trained on protein sequences,2021-07-15,https://www.nature.com/articles/s41586-021-03819-2,,amino acid sequence; protein structure,,93M parameters (dense),['Protein Data Bank'],,11 days,128 TPUv3 cores,,open,Apache 2.0,,,,,,,,,,,,,, +model,Flamingo,Google Deepmind,"Flamingo is a Visual Language Model using the Transformer architecture that is intended for few-shot learning. +",2022-04-29,https://arxiv.org/pdf/2204.14198.pdf,https://arxiv.org/pdf/2204.14198.pdf#appendix.E,"image, text; text","Model performance was evaluated on image and video datasets primarily, including dialogue. +",80B parameters (dense),"['M3W', 'ALIGN', 'LTIP', 'VTP', 'Chinchilla']",unknown,15 days on 1536 TPUs,TPU,,closed,unknown,"The intended uses are stated in the model card: ""The primary use is research on visual language models (VLM), including: research on VLM applications like classification, captioning or visual question answering, understanding how strong VLMs can contribute to AGI, advancing fairness and safety research in the area of multimodal research, and understanding limitations of current large VLMs."" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E). +","The model card lists the following as out of scope uses of the model: ""Uses of the model for visually conditioned language generation in harmful or deceitful settings. Broadly speaking, the model should not be used for downstream applications without further safety and fairness mitigations specific to each application."" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E). +",unknown,,,,,,,,,,, +model,AlphaCode,Google Deepmind,AlphaCode is an autoregressive language model trained on code,2022-02-02,https://arxiv.org/abs/2203.07814,,text; code,,41B parameters (dense),[],,,,,closed,unknown,,,,,,,,,,,,,, +model,Gopher,Google Deepmind,"Gopher is an autoregressive language model based on the Transformer architecture with two modifications: using RMSNorm instead of LayerNorm and using relative positional encoding scheme instead of absolute positional encodings [[Section 3]](https://arxiv.org/pdf/2112.11446.pdf#subsection.3.1). 
+",2021-12-08,https://arxiv.org/pdf/2112.11446.pdf,https://arxiv.org/pdf/2112.11446.pdf#appendix.B,text; code,"Model performance was evaluated and analyzed on 152 NLP tasks including: Language Modelling (20), Reading Comprehension (3), Fact Checking (3), Question Answering (3), Common Sense (4), MMLU (57), BIG-bench (62) [[Section 4]](https://arxiv.org/pdf/2112.11446.pdf#section.4); on toxicity and bias datasets [[Section 5]](https://arxiv.org/pdf/2112.11446.pdf#section.5); and on dialogue tasks [[Section 6]](https://arxiv.org/pdf/2112.11446.pdf#section.6). +",280B parameters (dense),['MassiveText'],380 tCO2e,7303.24 petaflop/s-day,TPUv3 pods,,closed,unknown,"The intended uses are stated in the Gopher model card: ""The primary use is research on language models, including: research on NLP applications like machine translation and question answering, understanding how strong language models can contribute to AGI, advancing fairness and safety research, and understanding limitations of current LLMs"" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). +","The model card lists the following as out of scope uses of the model: ""for language generation in harmful or deceitful settings. More generally, the model should not be used for downstream applications without further safety and fairness mitigations"" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). +",unknown,"The feedback for the model can be provided at the email linked in the model card, geoffreyi at google.com [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B). +",,,,,,,,,, +model,Chinchilla,Google Deepmind,"Chinchilla is an autoregressive language model based on the Transformer architecture with improved scaling laws. +",2022-03-29,https://arxiv.org/pdf/2203.15556.pdf,https://arxiv.org/pdf/2203.15556.pdf,text; code,"Model performance was evaluated and analyzed on many NLP tasks including language modeling, reading comprehension, question answering, commonsense-intensive tasks, and the BIG-Bench and MMLU meta-benchmarks. +",70B parameters (dense),['MassiveText'],Unknown,7303.24 petaflop/s-day,TPUv3/TPUv4 pods,,closed,unknown,"The intended uses are stated in the Chinchilla model card: ""The primary use is research on language models, including: research on the scaling behaviour of language models along with those listed in Gopher paper"" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). +","The model card lists the following as out of scope uses of the model: ""for language generation in harmful or deceitful settings. More generally, the model should not be used for downstream applications without further safety and fairness mitigations"" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). +",Unknown,"The feedback for the model can be provided at the email linked in the model card, {jordanhoffmann, sborgeaud, amensch,sifre} at deepmind.com [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I). +",,,,,,,,,, +model,Gato,Google Deepmind,"Gato is a generalist agent based on sequence modeling using the Transformer architecture to implement multi-modal, multi-task, multi-embodiment generalist policy. +",2022-05-12,https://www.deepmind.com/blog/a-generalist-agent,https://openreview.net/pdf?id=1ikK0kHjvj#appendix.B,"image, text; image, text, robotics trajectories","Model performance was evaluated on simulated and robotics task primarily, including out-of-distribution and skill generalization. 
+",1.2B parameters (dense),['Gato dataset'],unknown,4 days on a 16x16 TPU v3 slice,16x16 TPU v3 slice,,closed,unknown,"The intended uses are stated in the Gopher model card: ""Learn to accomplish a wide variety of tasks from expert demonstrations, such as playing video games, controlling simulated embodiments, and real world block stacking."" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). +","The model card lists the following as out of scope uses of the model: ""Not intended for commercial or production use. Military uses are strictly prohibited."" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). +",unknown,"The feedback for the model can be provided at the email linked in the model card, reedscot at google.com [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A). +",,,,,,,,,, +model,Sparrow,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,,70B parameters (dense),"['Chinchilla', 'Google Search', 'Sparrow Rule reward model', 'Sparrow Preference reward model']",,,,,closed,unknown,,,,,,,,,,,,,, +model,RETRO,Google Deepmind,,2021-12-08,https://arxiv.org/abs/2112.04426,,text; text,,7.5B parameters (dense),['MassiveText'],,,,,closed,unknown,,,,,,,,,,,,,, +model,Sparrow Rule reward model,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,,70B parameters (dense),"['Chinchilla', 'Sparrow adversarial probing dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Sparrow Preference reward model,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text; text,,70B parameters (dense),"['Chinchilla', 'Sparrow response preference dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,Sparrow adversarial probing dataset,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text,,27k ratings,['Chinchilla'],,,,,closed,unknown,,,,,,,,,,,,[],, +dataset,Sparrow response preference dataset,Google Deepmind,,2022-09-28,https://arxiv.org/abs/2209.14375,,text,,72k comparisons,['Chinchilla'],,,,,closed,unknown,,,,,,,,,,,,[],, +model,GopherCite,Google Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text; text,,280B parameters (dense),"['Gopher', 'Google Search', 'GopherCite reward model']",,,,,closed,unknown,,,,,,,,,,,,,, +model,GopherCite reward model,Google Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text; text,,7B parameters (dense),"['Gopher', 'GopherCite Preference dataset']",,,,,closed,unknown,,,,,,,,,,,,,, +dataset,GopherCite Preference dataset,Google Deepmind,,2022-03-16,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,text,,33k response pairs,"['Gopher', 'Google Search']",,,,,closed,unknown,,,,,,,,,,,,[],, +model,Dramatron,Google Deepmind,,2022-09-29,https://arxiv.org/abs/2209.14958,,text; text,,70B parameters (dense),['Chinchilla'],,,,,closed,unknown,,,,,,,,,,,,,, +model,RT-2,Google Deepmind,RT-2 is a vision-language-action model for robotic actions that incorporates chain of thought reasoning.,2023-07-28,https://arxiv.org/pdf/2307.15818.pdf,,"text, video; text, 
robotics trajectories",Evaluated on evaluation trajectories and SoTA baselines using robotic data.,55B parameters (dense),"['PaLI-X', 'PaLM-E', 'RT-2 action tokens']",,,,,open,unknown,,,,,,,,,,,,,, +model,Lyria,Google Deepmind,Lyria is DeepMind's most advanced AI music generation model to date.,2023-11-16,https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/,,text; music,unknown,unknown,[],unknown,unknown,unknown,worked with artists and music industry to ensure utility,closed,unknown,,,,,,,,,,,,,, +model,Genie,Google DeepMind,"Gene is a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.",2024-02-23,https://sites.google.com/view/genie-2024,,image; video,Evaluated using only out-of-distribution image prompts for qualitative results.,11B parameters (dense),[],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,Imagen 3,Google DeepMind,"Imagen 3 is a high-quality text-to-image model, capable of generating images with better detail, richer lighting, and fewer distracting artifacts compared to previous models. Improved understanding of prompts allows for a wide range of visual styles and captures small details from longer prompts. It also understands prompts written in natural, everyday language, making it easier to use. Imagen 3 is available in multiple versions, optimized for different types of tasks, from generating quick sketches to high-resolution images.",2024-05-14,https://deepmind.google/technologies/imagen-3/,,text; image,"The model was tested and evaluated on various prompts to assess its understanding of natural language, its ability to generate high-quality images in various formats and styles and generate fine details and complex textures. Red teaming and evaluations were conducted on topics including fairness, bias, and content safety.",unknown,[],unknown,unknown,unknown,"Extensive filtering and data labeling were used to minimize harmful content in datasets and reduce the likelihood of harmful outputs. Privacy, safety, and security technologies were leveraged in deploying the model, including watermarking tool SynthID.",limited,unknown,"Generate high-quality images for various purposes, from photorealistic landscapes to textured oil paintings or whimsical claymation scenes. It is useful in situations where detailed visual representation is required based on the textual description.",unknown,Through digital watermarking tool SynthID embedded in pixels for detection and identification.,unknown,,,,,,,,,, +model,Veo,Google DeepMind,"Veo is Google DeepMind's most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles. It accurately captures the nuance and tone of a prompt, and provides an unprecedented level of creative control. The model is also capable of maintaining visual consistency in video frames, and supports masked editing.",2024-05-14,https://deepmind.google/technologies/veo/,,text; video,unknown,unknown,[],unknown,unknown,unknown,"Videos created by Veo are watermarked using SynthID, DeepMinds tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes to mitigate privacy, copyright and bias risks.",closed,unknown,"Veo is intended to help create tools that make video production accessible to everyone. 
It can be used by filmmakers, creators, or educators for storytelling, education and more. Some of its features will also be brought to products like YouTube Shorts.",unknown,unknown,Feedback from leading creators and filmmakers is incorporated to improve Veo's generative video technologies.,,,,,,,,,, +model,Gemini 1.5 Flash,Google DeepMind,"Gemini Flash is a lightweight model, optimized for speed and efficiency. It features multimodal reasoning and a breakthrough long context window of up to one million tokens. It's designed to serve at scale and is efficient on cost, providing quality results at a fraction of the cost of larger models.",2024-05-30,https://deepmind.google/technologies/gemini/flash/,,"audio, image, text, video; text","The model was evaluated on various benchmarks like General MMLU, Code Natural2Code, MATH, GPQA, Big-Bench, WMT23, MMMU, and MathVista providing performance across various domains like multilingual translation, image processing, and code generation.",unknown,[],unknown,unknown,unknown,The research team is continually exploring new ideas at the frontier of AI and building innovative products for consistent progress.,limited,Google's Terms and Conditions,"The model is intended for developer and enterprise use cases. It can process hours of video and audio, and hundreds of thousands of words or lines of code, making it beneficial for a wide range of tasks.",,unknown,,,,,,,,,,, +model,A.X,SK Telecom,"A.X is SK Telecom's proprietary LLM, which has been trained on the Korean language.",2023-09-26,https://www.sktelecom.com/en/press/press_detail.do?idx=1582,,text; text,,39B parameters,[],unknown,unknown,unknown,,closed,unknown,,,,,,,,,,,,,, +model,Nous Hermes 2,Nous Research,Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM.,2024-01-10,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,"text; code, text","Evaluated across standard benchmarks and generally performs better than Mixtral, which it was fine-tuned on.",7B parameters (dense),['Mixtral'],unknown,unknown,unknown,unknown,open,Apache 2.0,,,unknown,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/discussions,,,,,,,,,, +model,YaRN LLaMA 2,"Nous Research, EleutherAI, University of Geneva",YaRN LLaMA 2 is an adapted version of LLaMA 2 using the YaRN extension method.,2023-11-01,https://arxiv.org/pdf/2309.00071.pdf,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k,text; text,Evaluated across a variety of standard benchmarks in comparison to LLaMA 2.,70B parameters (dense),['LLaMA 2'],unknown,unknown,unknown,,open,LLaMA 2,,,unknown,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k/discussions,,,,,,,,,, +model,Nous Capybara,Nous Research,The Capybara series is a series of LLMs and the first Nous collection of models made by fine-tuning mostly on data created by Nous in-house.,2023-11-13,https://huggingface.co/NousResearch/Nous-Capybara-34B,https://huggingface.co/NousResearch/Nous-Capybara-34B,text; text,,34B parameters (dense),['Yi'],unknown,unknown,unknown,,open,MIT,,,unknown,https://huggingface.co/NousResearch/Nous-Capybara-34B/discussions,,,,,,,,,, +model,YaRN Mistral,"Nous Research, EleutherAI, University of Geneva",YaRN Mistral is an adapted version of Mistral using the YaRN extension method.,2023-11-01,https://arxiv.org/pdf/2309.00071.pdf,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k,text; text,Evaluated across a variety of standard 
benchmarks in comparison to Mistral.,7B parameters (dense),['Mistral'],unknown,unknown,unknown,,open,MIT,,,unknown,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k/discussions,,,,,,,,,, +model,OpenHermes 2.5 Mistral,Nous Research,"OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, trained on additional code datasets.",2023-11-03,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,text; text,Evaluated on common LLM benchmarks in comparison to other Mistral derivatives.,7B parameters (dense),['Mistral'],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/discussions,,,,,,,,,, +model,Hermes 2 Pro-Mistral,Nous,"Hermes 2 Pro on Mistral 7B is an upgraded, retrained version of Nous Hermes 2. This improved version excels at function calling, JSON Structured Outputs, and several other areas, scoring positively on various benchmarks.",2024-03-10,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,text; text,"The model was examined across a range of benchmarks including GPT4All, AGIEval, BigBench, TruthfulQA, and in-house evaluations of function calling and JSON mode.",7B parameters (dense),"['Mistral', 'OpenHermes 2.5 Dataset', 'Nous Hermes 2']",unknown,unknown,unknown,"The model was evaluated across multiple tasks, displaying notable scores in GPT4All, AGIEval, BigBench, and TruthfulQA. It also scores highly on function calling and JSON mode, indicating the robustness of its capabilities.",open,Apache 2.0,"The model is intended for general task and conversation capabilities, function calling, and JSON structured outputs.",unknown,unknown,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/discussions,,,,,,,,,, +model,Genstruct,Nous,"Genstruct is an instruction-generation model, designed to create valid instructions given a raw text corpus. This enables the creation of new, partially synthetic instruction fine-tuning datasets from any raw-text corpus. 
This work was inspired by Ada-Instruct, and the model is also trained to generate questions involving complex scenarios that require detailed reasoning.",2024-03-07,https://huggingface.co/NousResearch/Genstruct-7B,https://huggingface.co/NousResearch/Genstruct-7B,text; text,unknown,7B parameters (dense),[],unknown,unknown,unknown,unknown,open,Apache 2.0,"The model is intended for instruction-generation, creating questions involving complex scenarios and generating reasoning steps for those questions.",unknown,unknown,https://huggingface.co/NousResearch/Genstruct-7B/discussions,,,,,,,,,, +dataset,AutoMathText,Math AI,AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts.,2024-02-12,https://github.com/yifanzhang-pro/AutoMathText,,text,Mistral model fine-tuned on AutoMathText and evaluated on the MATH dataset.,200 GB,"['OpenWebMath', 'RedPajama-Data', 'Algebraic Stack', 'Qwen']",,,,,open,CC BY-SA 4.0,,,unknown,https://huggingface.co/datasets/math-ai/AutoMathText/discussions,,,,,,,https://huggingface.co/datasets/math-ai/AutoMathText,[],, +model,Kotoba Speech,Kotoba Tech,Kotoba-Speech is a Transformer-based generative speech model that supports fluent text-to-speech generation in Japanese and one-shot voice cloning through a speech prompt.,2024-03-13,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,text; audio,unknown,1.2B parameters (dense),[],unknown,unknown,unknown,,open,Apache 2.0,,,unknown,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1/discussions,,,,,,,,,, +model,Prithvi,IBM,"Prithvi is a first-of-its-kind temporal Vision transformer pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function.",2023-08-03,https://github.com/NASA-IMPACT/hls-foundation-os,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M,"text, video; text, video",,100M parameters (dense),['NASA HLS data'],,,,,open,Apache 2.0,,,,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M/discussions,,,,,,,,,, +application,Watsonx.ai,IBM,"Watsonx.ai is part of the IBM watsonx platform that brings together new generative AI capabilities, powered by foundation models and traditional machine learning, into a powerful studio spanning the AI lifecycle.",2023-09-07,https://www.ibm.com/products/watsonx-ai,,,,,['Granite'],,,,,limited,custom,,,,,,deployed AI models,https://www.ibm.com/docs/en/watsonx-as-a-service?topic=models-terms-use,,,,,,, +model,Granite,IBM,Granite is a set of multi-size foundation models that apply generative AI to both language and code.,2023-09-28,https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models/,,"text; code, text",unknown,13B parameters (dense),[],unknown,unknown,unknown,"Training data passed through the IBM HAP detector, a language model designed to remove harmful content. Data also deduplicated and filtered for document quality.",limited,,,,,,,,,,,,,,, +dataset,LAION-1B,Alibaba,,2023-02-20,https://arxiv.org/pdf/2302.09778.pdf,,"image, text",,1B image-text pairs,['LAION-5B'],,,,,closed,unknown,,,,,,,,,,,,[],,"We eliminate duplicates, low-resolution images, and images that potentially contain harmful content from the LAION dataset." 
+model,Composer,Alibaba,,2023-02-20,https://arxiv.org/pdf/2302.09778.pdf,,"image, text; image",,4.4B parameters (dense),"['ImageNet', 'WebVision', 'LAION-1B']",,,,,closed,unknown,,,,,,,,,,,,,, +model,Qwen,Alibaba,"QWEN is a comprehensive language model series that encompasses distinct models with varying parameter counts. The Qwen series now includes Qwen, the base language models (namely Qwen-7B and Qwen-14B), as well as Qwen-Chat, the chat models (namely Qwen-7B-Chat and Qwen-14B-Chat).",2023-08-03,https://arxiv.org/abs/2309.16609,https://huggingface.co/Qwen,"image, text; text","Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, etc.",14B parameters (dense),[],unknown,,,"To filter out low-quality data, they employ a combination of rule-based and machine-learning-based methods. Specifically, they use multiple models to score the content, including language models, text-quality scoring models, and models for identifying potentially offensive or inappropriate content. They also manually sample texts from various sources and review them to ensure their quality. To further enhance the quality of their data, they selectively up-sample data from certain sources to ensure that their models are trained on a diverse range of high-quality content.",open,custom,,,"Governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.",,,,,,,,,,, +model,Qwen 1.5,Alibaba,"Qwen 1.5 is the next iteration in the Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, code, etc.",2024-02-04,https://qwenlm.github.io/blog/qwen1.5/,https://huggingface.co/Qwen/Qwen1.5-72B,text; text,"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.",72B parameters (dense),[],unknown,unknown,unknown,unknown,open,custom,,,unknown,https://huggingface.co/Qwen/Qwen1.5-72B/discussions,,,,,,,,,, +model,Qwen 1.5 MoE,Qwen Team,"Qwen 1.5 is the next iteration in the Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, code, etc. 
Qwen 1.5 MoE is the MoE model of the Qwen 1.5 series.",2024-03-28,https://qwenlm.github.io/blog/qwen-moe/,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B,text; text,"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.",14B parameters with 2.7B activated parameters (MoE),[],unknown,unknown,unknown,unknown,open,custom,,,unknown,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/discussions,,,,,,,,,, +model,SeaLLM v2.5,"DAMO Academy, Alibaba",SeaLLM v2.5 is a multilingual large language model for Southeast Asian (SEA) languages.,2024-04-12,https://github.com/DAMO-NLP-SG/SeaLLMs,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5,text; text,"The model was evaluated on 3 benchmarks (MMLU for English, M3Exam (M3e) for English, Chinese, Vietnamese, Indonesian, and Thai, and VMLU for Vietnamese) and it outperformed GPT-3 and Vistral-7B-chat models across these benchmarks in the given languages.",7B parameters,['Gemma'],unknown,unknown,unknown,"Despite efforts in red teaming, safety fine-tuning, and enforcement, the creators suggest that developers and stakeholders perform their own red teaming and provide related security measures before deployment, and that they abide by and comply with local governance and regulations.",open,custom,"The model is intended for multilingual tasks such as knowledge retrieval, math reasoning, and instruction following. Also, it could be used to provide multilingual assistance.","The model should not be used in a way that could lead to inaccurate, misleading or potentially harmful generation. Users should comply with local laws and regulations when deploying the model.",unknown,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions,,,,,,,,,, +model,Firefly Image 2,Adobe,"Firefly Image 2 is the next generation of generative AI for imaging, bringing significant advancements to creative control and quality, including new Text to Image capabilities now available in the popular Firefly web app, where 90% of users are new to Adobe products.",2023-10-10,https://firefly.adobe.com/,,text; image,,unknown,[],unknown,unknown,unknown,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,,,,,,,,, +model,Firefly Vector,Adobe,"Firefly Vector is the world’s first generative AI focused on producing vector graphics, bringing Adobe's vector graphic and generative AI expertise directly into Adobe Illustrator workflows with Text to Vector Graphic.",2023-10-10,https://firefly.adobe.com/,,text; vector graphic,,unknown,[],unknown,unknown,unknown,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,,,,,,,,, +model,Firefly Design,Adobe,Firefly Design powers instant generation of high-quality template designs in Adobe Express with the new Text to Template capability.,2023-10-10,https://firefly.adobe.com/,,text; template design,,unknown,[],unknown,unknown,unknown,,closed,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,,,,,,,,, +application,Firefly,Adobe,"Adobe Firefly is a standalone web application. 
It offers new ways to ideate, create, and communicate while significantly improving creative workflows using generative AI.",2023-03-21,https://firefly.adobe.com/,,,,,"['Firefly Image 2', 'Firefly Vector', 'Firefly Design']",,,,,limited,unknown,creative generation of digital art and images,"AI/ML training, attempting to create abusive, illegal, or confidential content.",,,,AI-generated creations,https://www.adobe.com/legal/licenses-terms/adobe-gen-ai-user-guidelines.html,unknown,unknown,unknown,,,, +dataset,CulturaX,"University of Oregon, Adobe","CulturaX is a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development.",2023-09-17,https://arxiv.org/pdf/2309.09400,,text,,6.3 trillion tokens,"['mC4', 'OSCAR']",,,,unknown,open,"mC4, OSCAR",,The data must not be utilized for malicious or harmful purposes towards humanity.,unknown,https://huggingface.co/datasets/uonlp/CulturaX/discussions,,,,,,,https://huggingface.co/datasets/uonlp/CulturaX,[],unknown,unknown