auto add assets
jxue16 committed Jul 9, 2024
1 parent f825210 commit b6d4919
Showing 22 changed files with 994 additions and 53 deletions.
84 changes: 84 additions & 0 deletions assets/01ai.yaml
@@ -54,3 +54,87 @@
prohibited_uses: ''
monitoring: unknown
feedback: https://huggingface.co/01-ai/Yi-VL-34B/discussions
- type: model
name: MARS5
organization: CAMB.AI
description: MARS5 is a two-stage AR-NAR English speech model capable of generating speech from text prompts and short audio references. The model can handle prosodically challenging scenarios, like sports commentary and anime dialogue, and allows users to 'deep clone' by providing the transcript of the reference audio. The resulting output can be 'steered' by punctuation and capitalization. MARS5 uses two checkpoints: an AR fp16 checkpoint (750M parameters) and an NAR fp16 checkpoint (450M parameters).
created_date: Unknown
url: https://huggingface.co/CAMB-AI/MARS5-TTS
model_card: https://huggingface.co/CAMB-AI/MARS5-TTS
modality: text and audio; audio
analysis: Unknown. Future updates are planned to benchmark performance on standard speech datasets.
size: 1.2B parameters (750M AR + 450M NAR)
dependencies: ["TransFusion repository", "Multinomial diffusion repository", "Mistral-src repository", "minbpe repository", "Vocos from gemelo-ai", "AWS", "huggingface_hub", "torch", "torchaudio", "librosa", "vocos", "encodec"]
training_emissions: Unknown
training_time: Unknown
training_hardware: NVIDIA H100s
quality_control: Unknown. The project roadmap includes improving inference stability, speed, and performance.
access: open
license: GNU AGPL 3.0
intended_uses: The model is designed to synthesize speech from text prompts and audio reference files. These capabilities can be used in TTS and dubbing applications in over 140 languages.
prohibited_uses: Unknown
monitoring: Unknown. The organization actively accepts contributions on GitHub and is planning improvements to the model.
feedback: Users are encouraged to report problems or contribute improvements via GitHub's PR/discussion feature. They can also contact the organization via email at [email protected].
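# Usage sketch (an illustration, not part of this entry): the torch.hub entry point and
# tts() signature below follow the project README as of mid-2024 and may have changed.
#   import torch, librosa
#   mars5, config_class = torch.hub.load('Camb-ai/mars5-tts', 'mars5_english', trust_repo=True)
#   wav, sr = librosa.load('reference.wav', sr=mars5.sr, mono=True)
#   cfg = config_class(deep_clone=True)  # deep clone requires the reference transcript
#   ar_codes, audio = mars5.tts('Text to speak.', torch.from_numpy(wav),
#                               'transcript of reference.wav', cfg=cfg)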
- type: model
name: Kolors
organization: Kuaishou Kolors team
description: Kolors is a large-scale text-to-image generation model based on latent diffusion. It is trained on billions of text-image pairs and shows significant advantages in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. It also supports Chinese and English inputs.
created_date: 2024 (exact date unknown)
url: https://huggingface.co/Kwai-Kolors/Kolors
model_card: https://huggingface.co/Kwai-Kolors/Kolors
modality: text; image
analysis: Unknown
size: Unknown
dependencies: [Diffusers, ChatGLM3]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Measures were taken to ensure the compliance, accuracy, and safety of the training data, but the developers note that, given the diversity and combinability of generated content and the model's inherent randomness, they cannot guarantee the accuracy or safety of the output content.
access: open
license: Apache 2.0
intended_uses: The model is intended to be used for text-to-image synthesis, with the ability to handle both Chinese and English inputs.
prohibited_uses: The model should not be used for purposes that may harm national or societal interests, or for any services that have not been evaluated and registered for safety. It should not be used in ways that could lead to data security issues or public opinion risks, or to risks and liabilities arising from the model being misled, abused, misused, or improperly utilized.
monitoring: Unknown
feedback: Problems with the model can be reported via email at [email protected].
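# Loading sketch via the Diffusers dependency listed above (the KolorsPipeline class and
# the Kwai-Kolors/Kolors-diffusers checkpoint are assumptions based on later diffusers
# releases; the original repository ships its own pipeline code):
#   import torch
#   from diffusers import KolorsPipeline
#   pipe = KolorsPipeline.from_pretrained('Kwai-Kolors/Kolors-diffusers',
#                                         torch_dtype=torch.float16, variant='fp16').to('cuda')
#   image = pipe(prompt='一只可爱的小狗在草地上奔跑').images[0]  # Chinese prompts are supported
#   image.save('kolors_sample.png')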
- type: model
name: ChatTTS
organization: 2NOISE
description: ChatTTS is a text-to-speech model that converts text input into audio output. The model supports batch processing and provides multiple parameter settings for fine control over the generated speech, including specifying the speaker, adjusting the speech speed, and adding laughter. It is provided for academic and research purposes without guarantees of accuracy, completeness, or reliability.
created_date: unknown
url: https://huggingface.co/2Noise/ChatTTS
model_card: https://huggingface.co/2Noise/ChatTTS
modality: text; audio
analysis: unknown
size: unknown
dependencies: [torch, torchaudio, ChatTTS]
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: unknown
access: open
license: unknown
intended_uses: The model is intended for academic, educational, and research use with the ability to convert text into speech. It provides multiple parameters for fine control over the generated speech.
prohibited_uses: The model should not be used for any commercial or illegal purposes.
monitoring: unknown
feedback: Problems with this model can be reported via email at [email protected].
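# Usage sketch (illustrative; the API follows the public README around the time of this
# commit, and method names may differ across versions, e.g. load_models() was later
# renamed load()):
#   import ChatTTS
#   import torch, torchaudio
#   chat = ChatTTS.Chat()
#   chat.load_models()
#   wavs = chat.infer(['Hello from ChatTTS.'])
#   torchaudio.save('output.wav', torch.from_numpy(wavs[0]), 24000)  # 24 kHz output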
- type: model
name: CodeGeeX4-ALL-9B
organization: THUDM
description: CodeGeeX4-ALL-9B is a multilingual code generation model built on GLM-4-9B. It can perform functions such as code completion and generation, code interpreting, web searching, function calling, and code Q&A, covering various scenarios of software development. It has shown remarkable performance on public benchmarks such as BigCodeBench and NaturalCodeBench.
created_date: unknown
url: https://huggingface.co/THUDM/codegeex4-all-9b
model_card: https://huggingface.co/THUDM/codegeex4-all-9b
modality: text; text
analysis: The model was evaluated on several benchmarks including HumanEval, MBPP, NCB, LCB, HumanEvalFIM, and CRUXEval-O. It achieved competitive performances, standing out even among other, larger models.
size: 9B parameters
dependencies: [GLM-4-9B]
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: The model has been evaluated on numerous public benchmarks for performance assessment.
access: open
license: unknown
intended_uses: It can be used for code completion and generation, code interpreting, web searching, function calling, repository-level code Q&A, and various software development scenarios.
prohibited_uses: unknown
monitoring: unknown
feedback: Downstream problems should likely be reported to the model's creators at THUDM; no specific feedback procedure is mentioned in the provided materials.
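# Inference sketch using standard transformers APIs (generation arguments are
# illustrative; see the model card for recommended settings):
#   import torch
#   from transformers import AutoTokenizer, AutoModelForCausalLM
#   tokenizer = AutoTokenizer.from_pretrained('THUDM/codegeex4-all-9b', trust_remote_code=True)
#   model = AutoModelForCausalLM.from_pretrained('THUDM/codegeex4-all-9b',
#       torch_dtype=torch.bfloat16, trust_remote_code=True).to('cuda')
#   inputs = tokenizer.apply_chat_template(
#       [{'role': 'user', 'content': 'Write a quicksort in Python.'}],
#       add_generation_prompt=True, return_tensors='pt').to('cuda')
#   outputs = model.generate(inputs, max_new_tokens=256)
#   print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))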
21 changes: 21 additions & 0 deletions assets/alibaba.yaml
@@ -176,3 +176,24 @@
laws and regulations when deploying the model.
monitoring: unknown
feedback: https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions
- type: model
name: SenseVoice
organization: Alibaba
description: SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). The model demonstrates high-accuracy multilingual speech recognition, speech emotion recognition, and audio event detection. It supports over 50 languages, was trained on more than 400,000 hours of data, is efficient at inference, and offers convenient finetuning scripts and strategies as well as a service deployment pipeline.
created_date: unknown
url: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
model_card: https://huggingface.co/FunAudioLLM/SenseVoiceSmall
modality: audio; text
analysis: The model has been evaluated on multilingual speech recognition, speech emotion recognition, and audio event detection. The evaluations compared SenseVoice with models such as Whisper and found that it surpasses them on nearly all tasks.
size: unknown
dependencies: Unclear; the model may depend on [Whisper], although this is not clearly stated, and the provided Python scripts reference [AutoModel] and [funasr].
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: Not explicitly stated. The model's performance on multiple benchmark datasets suggests that measures were taken to ensure its quality and accuracy.
access: open
license: unknown; the repository is cloned from GitHub and does not clearly state a license.
intended_uses: The model can be used for tasks such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection. It could be useful in a variety of fields, including translation services, virtual assistants, voice-activated systems, and emotion-detection services, among others.
prohibited_uses: Not stated
monitoring: There is no specific mention of monitoring measures being taken for downstream uses of the model.
feedback: Issues could be reported through the GitHub project page.
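# Usage sketch based on the funasr AutoModel interface referenced in the dependencies
# field (the argument names follow FunASR conventions and are not verified here):
#   from funasr import AutoModel
#   model = AutoModel(model='FunAudioLLM/SenseVoiceSmall', trust_remote_code=True)
#   result = model.generate(input='example.wav', language='auto', use_itn=True)
#   print(result[0]['text'])  # transcript with language/emotion/event tags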
21 changes: 21 additions & 0 deletions assets/bain.yaml
@@ -22,3 +22,24 @@
monthly_active_users: ''
user_distribution: ''
failures: ''
- type: model
name: XEUS
organization: Carnegie Mellon University's WAVLab
description: XEUS is a large-scale multilingual speech encoder that was pre-trained on over 1 million hours of publicly available speech data covering over 4,000 languages. It requires fine-tuning for downstream tasks such as speech recognition or translation, and it supports Flash Attention. Its hidden states can also be used with k-means for semantic speech tokenization. XEUS uses the E-Branchformer architecture and is trained using HuBERT-style masked prediction of discrete speech tokens extracted from WavLabLM. During training, the input speech is also augmented with acoustic noise and reverberation.
created_date: 2024 (exact date unknown)
url: https://huggingface.co/espnet/xeus
model_card: https://huggingface.co/espnet/xeus
modality: audio; text
analysis: XEUS tops the ML-SUPERB multilingual speech recognition leaderboard, outperforming others such as MMS, w2v-BERT 2.0, and XLS-R. It also sets a new state-of-the-art on 4 tasks in the monolingual SUPERB benchmark.
size: 577M parameters
dependencies: [WavLabLM, publicly available speech datasets]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Acoustic noise and reverberation added during training for better robustness; model evaluated on multiple benchmarks including ML-SUPERB and SUPERB.
access: open
license: Unknown
intended_uses: For downstream tasks such as speech recognition, translation, and semantic speech tokenization.
prohibited_uses: Should not be used on downstream tasks without the required fine-tuning.
monitoring: Unknown
feedback: Unknown, potentially through the project's GitHub or through the authors' direct communication channels.
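# Sketch of the k-means semantic speech tokenization mentioned above (how the hidden
# states are extracted depends on the ESPnet API; the array below is a placeholder):
#   import numpy as np
#   from sklearn.cluster import KMeans
#   hidden_states = np.load('xeus_hidden_states.npy')  # (num_frames, hidden_dim)
#   kmeans = KMeans(n_clusters=500, n_init='auto').fit(hidden_states)
#   tokens = kmeans.predict(hidden_states)  # one discrete semantic token per frame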
21 changes: 21 additions & 0 deletions assets/casia.yaml
@@ -49,3 +49,24 @@
prohibited_uses: ''
monitoring: ''
feedback: https://huggingface.co/wenge-research/yayi2-30b/discussions
- type: model
name: AstroPT
organization: Aspia Space, Instituto de Astrofísica de Canarias (IAC), UniverseTBD, Astrophysics Research Institute, Liverpool John Moores University, Departamento Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University, Université Paris-Cité
description: AstroPT is an autoregressive pretrained transformer developed with astronomical use-cases in mind. The models are trained on 8.6 million 512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. The training resulted in the creation of foundation models ranging in size from 1 million to 2.1 billion parameters. It is a step towards creating a 'Large Observation Model' – a model trained on data from observational sciences at a scale similar to natural language processing models.
created_date: 2024 (exact date unknown)
url: https://arxiv.org/pdf/2405.14930v1
model_card: https://arxiv.org/pdf/2405.14930v1
modality: image; image
analysis: The models' performance on downstream tasks, as measured by linear probing, was found to improve with model size up to a certain saturation point.
size: 1M to 2.1B parameters. The provided information does not specify whether the models are sparse (e.g., Mixture of Experts), so sparsity cannot be confirmed.
dependencies: [DESI Legacy Survey DR8 Dataset]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The models underwent linear probing to measure performance and identify the parameter saturation point beyond which size no longer improves performance.
access: open. The source code, weights, and dataset for AstroPT have been released under the MIT license.
license: MIT
intended_uses: Developed with astronomical use-cases in mind. The models can be utilized to extract meaningful information from astronomical observations.
prohibited_uses: Unknown
monitoring: Unknown; no monitoring measures for downstream uses are described in the provided information.
feedback: Potential collaborators and users are invited to join the research activities surrounding these models; feedback and issues can presumably be reported to Michael J. Smith ([email protected]).
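# Sketch of the linear probing used for the downstream evaluations mentioned above
# (the embedding and label files are placeholders; a frozen AstroPT checkpoint is
# assumed to supply the features):
#   import numpy as np
#   from sklearn.linear_model import LogisticRegression
#   from sklearn.model_selection import train_test_split
#   X, y = np.load('astropt_embeddings.npy'), np.load('labels.npy')
#   X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
#   probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
#   print('linear-probe accuracy:', probe.score(X_te, y_te))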
21 changes: 21 additions & 0 deletions assets/cognition.yaml
@@ -21,3 +21,24 @@
prohibited_uses: ''
monitoring: ''
feedback: none
- type: model
name: ESM3
organization: EvolutionaryScale
description: ESM3 is a language model for the life sciences designed to program and simulate the code of life. According to its developers, it is the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins. It is trained on billions of proteins from diverse environments. The model was used to generate a completely new green fluorescent protein, and it can generate new proteins in response to prompts, offering a high degree of control over protein design.
created_date: 2024-06-25
url: https://www.evolutionaryscale.ai/blog/esm3-release
model_card: https://www.evolutionaryscale.ai/blog/esm3-release
modality: text; text
analysis: Not provided
size: 98B parameters (dense)
dependencies: [ESM2, AI models inspired by natural language processing models]
training_emissions: unknown
training_time: unknown
training_hardware: Described by the developers as one of the highest-throughput GPU clusters in the world.
quality_control: ESM3 was developed under a responsible development framework, with transparency and accountability emphasized from the start.
access: unknown
license: unknown
intended_uses: ESM3 is made to generate new proteins for a myriad of applications such as for medicine, biology research, and clean energy. It can also be used to simulate evolution and provide understanding of the principles of biology through the generation of synthetic data points including predicted structures and functions for diverse sequences.
prohibited_uses: Not provided
monitoring: Not provided
feedback: Not provided
22 changes: 22 additions & 0 deletions assets/deci.yaml
@@ -23,3 +23,25 @@
prohibited_uses: ''
monitoring: unknown
feedback: none
- type: model
name: Poseidon
organization: Seminar for Applied Mathematics, ETH Zurich, Switzerland & ETH AI Center, Zurich, Switzerland
description: Poseidon is a foundation model for learning the solution operators of partial differential equations (PDEs). It is based on a multiscale operator transformer and implements a new training strategy that leverages semi-group properties of time-dependent PDEs for scalable training data. Poseidon is pretrained on a large scale dataset for the governing equations of fluid dynamics and exhibits a high level of performance across multiple downstream tasks. It also shows remarkable generalizability to new physics that did not feature during pretraining.
created_date: Unknown
url: https://arxiv.org/pdf/2405.19101
model_card: https://arxiv.org/pdf/2405.19101
modality: text; unknown
analysis: Poseidon was evaluated on a suite of 15 downstream tasks of varying complexity involving different PDE types and operators. The model demonstrated excellent performance across all tasks, significantly outperforming baselines in both sample efficiency and accuracy.
size: Unknown
dependencies: [Pretraining dataset for the governing equations of fluid dynamics]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Poseidon underwent extensive evaluations on 15 diverse downstream tasks to gauge its efficacy. The tasks span a wide range of complexities and PDE types. The model has shown the ability to generalize well to new, unseen physics that did not feature during pretraining.
access: open
license: Unknown
intended_uses: Poseidon is intended for learning the solution operators of PDEs in contexts such as computational physics, fluid dynamics, and more. It can be effectively utilized in any task requiring the efficient and accurate resolution of PDEs.
prohibited_uses: Unknown
monitoring: Unknown
feedback: Problems with the model can be reported through the GitHub repository where the source code is hosted.
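# Sketch of the semi-group trick mentioned in the description: because solution
# operators of time-dependent PDEs satisfy S(t+s) = S(t)∘S(s), every ordered pair of
# snapshots in a trajectory yields a valid training example (the helper name is ours):
#   def semigroup_pairs(trajectory):
#       # trajectory: snapshots [u(t_0), ..., u(t_n)] at uniform time steps
#       return [(trajectory[i], trajectory[j], j - i)
#               for i in range(len(trajectory))
#               for j in range(i + 1, len(trajectory))]  # O(n^2) pairs per trajectory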
