auto add assets
jxue16 committed Jul 9, 2024
1 parent f825210 commit b6d4919
Showing 22 changed files with 994 additions and 53 deletions.
84 changes: 84 additions & 0 deletions assets/01ai.yaml
@@ -54,3 +54,87 @@
prohibited_uses: ''
monitoring: unknown
feedback: https://huggingface.co/01-ai/Yi-VL-34B/discussions
- type: model
name: MARS5
organization: CAMB.AI
description: MARS5 is a two-stage AR-NAR English speech model capable of generating speech from text prompts and short audio references. The model can handle prosodically challenging scenarios, like sports commentary and anime dialogue, and allows users to 'deep clone' by providing the transcript of the reference audio. The resulting output can be 'steered' by punctuation and capitalization. MARS5 uses two checkpoints: an AR fp16 checkpoint (750M parameters) and an NAR fp16 checkpoint (450M parameters).
created_date: Unknown
url: https://huggingface.co/CAMB-AI/MARS5-TTS
model_card: https://huggingface.co/CAMB-AI/MARS5-TTS
modality: text and audio; audio
analysis: Unknown. Future updates are planned to benchmark performance on standard speech datasets.
size: 1.2B parameters (750M AR + 450M NAR)
dependencies: ["TransFusion repository", "Multinomial diffusion repository", "Mistral-src repository", "minbpe repository", "Vocos from gemelo-ai", "AWS", "huggingface_hub", "torch", "torchaudio", "librosa", "vocos", "encodec"]
training_emissions: Unknown
training_time: Unknown
training_hardware: NVIDIA H100s
quality_control: Unknown. The project roadmap includes improving inference stability, speed, and performance.
access: open
license: GNU AGPL 3.0
intended_uses: The model is designed to synthesize speech from text prompts and audio reference files. These capabilities can be used in TTS and dubbing applications in over 140 languages.
prohibited_uses: Unknown
monitoring: Unknown. The organization actively accepts contributions on GitHub and is planning improvements to the model.
feedback: Users are encouraged to report problems or contribute improvements via GitHub's PR/discussion feature. They can also contact the organization via email at [email protected].
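# Usage sketch (an illustration, not part of this entry): the torch.hub entry point and
# tts() signature below follow the project README as of mid-2024 and may have changed.
#   import torch, librosa
#   mars5, config_class = torch.hub.load('Camb-ai/mars5-tts', 'mars5_english', trust_repo=True)
#   wav, sr = librosa.load('reference.wav', sr=mars5.sr, mono=True)
#   cfg = config_class(deep_clone=True)  # deep clone requires the reference transcript
#   ar_codes, audio = mars5.tts('Text to speak.', torch.from_numpy(wav),
#                               'transcript of reference.wav', cfg=cfg)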
- type: model
name: Kolors
organization: Kuaishou Kolors team
description: Kolors is a large-scale text-to-image generation model based on latent diffusion. It is trained on billions of text-image pairs and shows significant advantages in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. It also supports Chinese and English inputs.
created_date: 2024 (exact date unknown)
url: https://huggingface.co/Kwai-Kolors/Kolors
model_card: https://huggingface.co/Kwai-Kolors/Kolors
modality: text; image
analysis: Unknown
size: Unknown
dependencies: [Diffusers, ChatGLM3]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Measures were taken to ensure the compliance, accuracy, and safety of the training data, but the developers note that, given the diversity and combinability of generated content and the model's inherent randomness, they cannot guarantee the accuracy or safety of the output content.
access: open
license: Apache 2.0
intended_uses: The model is intended to be used for text-to-image synthesis, with the ability to handle both Chinese and English inputs.
prohibited_uses: The model should not be used for purposes that may harm national or societal interests, or for any services that have not been evaluated and registered for safety. It should not be used in ways that could lead to data security issues or public opinion risks, or to risks and liabilities arising from the model being misled, abused, misused, or improperly utilized.
monitoring: Unknown
feedback: Problems with the model can be reported via email at [email protected].
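# Loading sketch via the Diffusers dependency listed above (the KolorsPipeline class and
# the Kwai-Kolors/Kolors-diffusers checkpoint are assumptions based on later diffusers
# releases; the original repository ships its own pipeline code):
#   import torch
#   from diffusers import KolorsPipeline
#   pipe = KolorsPipeline.from_pretrained('Kwai-Kolors/Kolors-diffusers',
#                                         torch_dtype=torch.float16, variant='fp16').to('cuda')
#   image = pipe(prompt='一只可爱的小狗在草地上奔跑').images[0]  # Chinese prompts are supported
#   image.save('kolors_sample.png')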
- type: model
name: ChatTTS
organization: 2NOISE
description: ChatTTS is a text-to-speech model that converts text input into audio output. The model supports batch processing and provides multiple parameter settings for fine control over the generated speech, including specifying the speaker, adjusting the speech speed, and adding laughter. It is provided for academic and research purposes without guarantees of accuracy, completeness, or reliability.
created_date: unknown
url: https://huggingface.co/2Noise/ChatTTS
model_card: https://huggingface.co/2Noise/ChatTTS
modality: text; audio
analysis: unknown
size: unknown
dependencies: [torch, torchaudio, ChatTTS]
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: unknown
access: open
license: unknown
intended_uses: The model is intended for academic, educational, and research use with the ability to convert text into speech. It provides multiple parameters for fine control over the generated speech.
prohibited_uses: The model should not be used for any commercial or illegal purposes.
monitoring: unknown
feedback: Problems with this model can be reported via email at [email protected].
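# Usage sketch (illustrative; the API follows the public README around the time of this
# commit, and method names may differ across versions, e.g. load_models() was later
# renamed load()):
#   import ChatTTS
#   import torch, torchaudio
#   chat = ChatTTS.Chat()
#   chat.load_models()
#   wavs = chat.infer(['Hello from ChatTTS.'])
#   torchaudio.save('output.wav', torch.from_numpy(wavs[0]), 24000)  # 24 kHz output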
- type: model
name: CodeGeeX4-ALL-9B
organization: THUDM
description: CodeGeeX4-ALL-9B is a multilingual code generation model built on GLM-4-9B. It can perform functions such as code completion and generation, code interpreting, web searching, function calling, and code Q&A, covering various scenarios of software development. It has shown remarkable performance on public benchmarks such as BigCodeBench and NaturalCodeBench.
created_date: unknown
url: https://huggingface.co/THUDM/codegeex4-all-9b
model_card: https://huggingface.co/THUDM/codegeex4-all-9b
modality: text; text
analysis: The model was evaluated on several benchmarks including HumanEval, MBPP, NCB, LCB, HumanEvalFIM, and CRUXEval-O. It achieved competitive performances, standing out even among other, larger models.
size: 9B parameters
dependencies: [GLM-4-9B]
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: The model has been evaluated on numerous public benchmarks for performance assessment.
access: open
license: unknown
intended_uses: It can be used for code completion and generation, code interpreting, web searching, function calling, repository-level code Q&A, and various software development scenarios.
prohibited_uses: unknown
monitoring: unknown
feedback: Downstream problems should likely be reported to the model's creators at THUDM; no specific feedback procedure is mentioned in the provided materials.
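# Inference sketch using standard transformers APIs (generation arguments are
# illustrative; see the model card for recommended settings):
#   import torch
#   from transformers import AutoTokenizer, AutoModelForCausalLM
#   tokenizer = AutoTokenizer.from_pretrained('THUDM/codegeex4-all-9b', trust_remote_code=True)
#   model = AutoModelForCausalLM.from_pretrained('THUDM/codegeex4-all-9b',
#       torch_dtype=torch.bfloat16, trust_remote_code=True).to('cuda')
#   inputs = tokenizer.apply_chat_template(
#       [{'role': 'user', 'content': 'Write a quicksort in Python.'}],
#       add_generation_prompt=True, return_tensors='pt').to('cuda')
#   outputs = model.generate(inputs, max_new_tokens=256)
#   print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))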
21 changes: 21 additions & 0 deletions assets/alibaba.yaml
@@ -176,3 +176,24 @@
laws and regulations when deploying the model.
monitoring: unknown
feedback: https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions
- type: model
name: SenseVoice
organization: Alibaba
description: SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). The model demonstrates high-accuracy multilingual speech recognition, speech emotion recognition, and audio event detection. It supports over 50 languages, was trained on more than 400,000 hours of data, is efficient at inference, and offers convenient finetuning scripts and strategies as well as a service deployment pipeline.
created_date: unknown
url: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
model_card: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
modality: audio; text
analysis: The model has been evaluated on multilingual speech recognition, speech emotion recognition, and audio event detection. The evaluations compared SenseVoice with models such as Whisper and found that it surpasses them on nearly all tasks.
size: unknown
dependencies: Unclear; the model may depend on [Whisper], although this is not clearly stated, and the provided Python scripts reference [AutoModel] and [funasr].
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: Not explicitly stated. The model's performance on multiple benchmark datasets suggests that measures were taken to ensure its quality and accuracy.
access: open
license: unknown; the repository is cloned from GitHub and does not clearly state a license.
intended_uses: The model can be used for tasks such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection. It could be useful in a variety of fields, including translation services, virtual assistants, voice-activated systems, and emotion-detection services, among others.
prohibited_uses: Not stated
monitoring: There is no specific mention of monitoring measures being taken for downstream uses of the model.
feedback: Issues could be reported through the GitHub project page.
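# Usage sketch based on the funasr AutoModel interface referenced in the dependencies
# field (the argument names follow FunASR conventions and are not verified here):
#   from funasr import AutoModel
#   model = AutoModel(model='FunAudioLLM/SenseVoiceSmall', trust_remote_code=True)
#   result = model.generate(input='example.wav', language='auto', use_itn=True)
#   print(result[0]['text'])  # transcript with language/emotion/event tags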
21 changes: 21 additions & 0 deletions assets/bain.yaml
@@ -22,3 +22,24 @@
monthly_active_users: ''
user_distribution: ''
failures: ''
- type: model
name: XEUS
organization: Carnegie Mellon University's WAVLab
description: XEUS is a large-scale multilingual speech encoder that was pre-trained on over 1 million hours of publicly available speech data covering over 4,000 languages. It requires fine-tuning for downstream tasks such as speech recognition or translation, and it supports Flash Attention. Its hidden states can also be used with k-means for semantic speech tokenization. XEUS uses the E-Branchformer architecture and is trained using HuBERT-style masked prediction of discrete speech tokens extracted from WavLabLM. During training, the input speech is also augmented with acoustic noise and reverberation.
created_date: 2024 (exact date unknown)
url: https://huggingface.co/espnet/xeus
model_card: https://huggingface.co/espnet/xeus
modality: audio; text
analysis: XEUS tops the ML-SUPERB multilingual speech recognition leaderboard, outperforming others such as MMS, w2v-BERT 2.0, and XLS-R. It also sets a new state-of-the-art on 4 tasks in the monolingual SUPERB benchmark.
size: 577M parameters
dependencies: [WavLabLM, publicly available speech datasets]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Acoustic noise and reverberation added during training for better robustness; model evaluated on multiple benchmarks including ML-SUPERB and SUPERB.
access: open
license: Unknown
intended_uses: For downstream tasks such as speech recognition, translation, and semantic speech tokenization.
prohibited_uses: Should not be used on downstream tasks without the required fine-tuning.
monitoring: Unknown
feedback: Unknown, potentially through the project's GitHub or through the authors' direct communication channels.
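# Sketch of the k-means semantic speech tokenization mentioned above (how the hidden
# states are extracted depends on the ESPnet API; the array below is a placeholder):
#   import numpy as np
#   from sklearn.cluster import KMeans
#   hidden_states = np.load('xeus_hidden_states.npy')  # (num_frames, hidden_dim)
#   kmeans = KMeans(n_clusters=500, n_init='auto').fit(hidden_states)
#   tokens = kmeans.predict(hidden_states)  # one discrete semantic token per frame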
21 changes: 21 additions & 0 deletions assets/casia.yaml
@@ -49,3 +49,24 @@
prohibited_uses: ''
monitoring: ''
feedback: https://huggingface.co/wenge-research/yayi2-30b/discussions
- type: model
name: AstroPT
organization: Aspia Space, Instituto de Astrofísica de Canarias (IAC), UniverseTBD, Astrophysics Research Institute, Liverpool John Moores University, Departamento Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University, Université Paris-Cité
description: AstroPT is an autoregressive pretrained transformer developed with astronomical use-cases in mind. The models are trained on 8.6 million 512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. The training resulted in the creation of foundation models ranging in size from 1 million to 2.1 billion parameters. It is a step towards creating a 'Large Observation Model' – a model trained on data from observational sciences at a scale similar to natural language processing models.
created_date: 2024 (exact date unknown)
url: https://arxiv.org/pdf/2405.14930v1
model_card: https://arxiv.org/pdf/2405.14930v1
modality: image; image
analysis: The models' performance on downstream tasks, as measured by linear probing, was found to improve with model size up to a certain saturation point.
size: 1M to 2.1B parameters. The provided information does not specify whether the models are sparse (e.g., Mixture of Experts), so sparsity cannot be confirmed.
dependencies: [DESI Legacy Survey DR8 Dataset]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The models underwent linear probing to measure performance and identify the parameter saturation point beyond which size no longer improves performance.
access: open. The source code, weights, and dataset for AstroPT have been released under the MIT license.
license: MIT
intended_uses: Developed with astronomical use-cases in mind. The models can be utilized to extract meaningful information from astronomical observations.
prohibited_uses: Unknown
monitoring: Unknown; no monitoring measures for downstream uses are described in the provided information.
feedback: Potential collaborators and users are invited to join the research activities surrounding these models; feedback and issues can presumably be reported to Michael J. Smith ([email protected]).
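# Sketch of the linear probing used for the downstream evaluations mentioned above
# (the embedding and label files are placeholders; a frozen AstroPT checkpoint is
# assumed to supply the features):
#   import numpy as np
#   from sklearn.linear_model import LogisticRegression
#   from sklearn.model_selection import train_test_split
#   X, y = np.load('astropt_embeddings.npy'), np.load('labels.npy')
#   X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
#   probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
#   print('linear-probe accuracy:', probe.score(X_te, y_te))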
21 changes: 21 additions & 0 deletions assets/cognition.yaml
@@ -21,3 +21,24 @@
prohibited_uses: ''
monitoring: ''
feedback: none
- type: model
name: ESM3
organization: EvolutionaryScale
description: ESM3 is a language model for the life sciences designed to program and simulate the code of life. According to its developers, it is the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins. It is trained on billions of proteins from diverse environments. The model was used to generate a completely new green fluorescent protein, and it can generate new proteins in response to prompts, offering a high degree of control over protein design.
created_date: 2024-06-25
url: https://www.evolutionaryscale.ai/blog/esm3-release
model_card: https://www.evolutionaryscale.ai/blog/esm3-release
modality: text; text
analysis: Not provided
size: 98B parameters (dense)
dependencies: [ESM2, AI models inspired by natural language processing models]
training_emissions: unknown
training_time: unknown
training_hardware: Described by the developers as one of the highest-throughput GPU clusters in the world.
quality_control: ESM3 was developed under a responsible development framework, with transparency and accountability emphasized from the start.
access: unknown
license: unknown
intended_uses: ESM3 is made to generate new proteins for a myriad of applications such as for medicine, biology research, and clean energy. It can also be used to simulate evolution and provide understanding of the principles of biology through the generation of synthetic data points including predicted structures and functions for diverse sequences.
prohibited_uses: Not provided
monitoring: Not provided
feedback: Not provided
22 changes: 22 additions & 0 deletions assets/deci.yaml
@@ -23,3 +23,25 @@
prohibited_uses: ''
monitoring: unknown
feedback: none
- type: model
name: Poseidon
organization: Seminar for Applied Mathematics, ETH Zurich, Switzerland & ETH AI Center, Zurich, Switzerland
description: Poseidon is a foundation model for learning the solution operators of partial differential equations (PDEs). It is based on a multiscale operator transformer and implements a new training strategy that leverages semi-group properties of time-dependent PDEs for scalable training data. Poseidon is pretrained on a large scale dataset for the governing equations of fluid dynamics and exhibits a high level of performance across multiple downstream tasks. It also shows remarkable generalizability to new physics that did not feature during pretraining.
created_date: Unknown
url: https://arxiv.org/pdf/2405.19101
model_card: https://arxiv.org/pdf/2405.19101
modality: text; unknown
analysis: Poseidon was evaluated on a suite of 15 downstream tasks of varying complexity involving different PDE types and operators. The model demonstrated excellent performance across all tasks, significantly outperforming baselines in both sample efficiency and accuracy.
size: Unknown
dependencies: [Pretraining dataset for the governing equations of fluid dynamics]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Poseidon underwent extensive evaluations on 15 diverse downstream tasks to gauge its efficacy. The tasks span a wide range of complexities and PDE types. The model has shown the ability to generalize well to new, unseen physics that did not feature during pretraining.
access: open
license: Unknown
intended_uses: Poseidon is intended for learning the solution operators of PDEs in contexts such as computational physics, fluid dynamics, and more. It can be effectively utilized in any task requiring the efficient and accurate resolution of PDEs.
prohibited_uses: Unknown
monitoring: Unknown
feedback: Problems with the model can be reported through the GitHub repository where the source code is hosted.
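# Sketch of the semi-group trick mentioned in the description: because solution
# operators of time-dependent PDEs satisfy S(t+s) = S(t)∘S(s), every ordered pair of
# snapshots in a trajectory yields a valid training example (the helper name is ours):
#   def semigroup_pairs(trajectory):
#       # trajectory: snapshots [u(t_0), ..., u(t_n)] at uniform time steps
#       return [(trajectory[i], trajectory[j], j - i)
#               for i in range(len(trajectory))
#               for j in range(i + 1, len(trajectory))]  # O(n^2) pairs per trajectory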
