clean up changes
jxue16 committed Sep 28, 2024
1 parent 2195c62 commit 3517dc0
Showing 22 changed files with 83 additions and 172 deletions.
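Most of the edits in this commit are mechanical cleanups of the asset metadata: the `access` field is normalized to lowercase (`Open` → `open`), placeholder `model_card`, `license`, and `size` values are replaced with concrete ones, and benchmark names that had been listed under `dependencies` are dropped. As a rough illustration, a minimal sketch of the lowercase-normalization step is shown below; the `assets/` glob and the script itself are assumptions for illustration and are not part of this commit.

```python
# Illustrative sketch (not part of this commit): normalize the `access` field
# to lowercase across the asset YAML files, mirroring the `Open` -> `open`
# edits in this diff.
import pathlib
import re

ASSETS_DIR = pathlib.Path("assets")  # assumed repository layout
ACCESS_RE = re.compile(r"^(\s*access:\s*)(\S+)\s*$")

for path in sorted(ASSETS_DIR.glob("*.yaml")):
    lines = path.read_text(encoding="utf-8").splitlines()
    changed = False
    for i, line in enumerate(lines):
        match = ACCESS_RE.match(line)
        if match and match.group(2) != match.group(2).lower():
            lines[i] = match.group(1) + match.group(2).lower()
            changed = True
    if changed:
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        print(f"normalized access field in {path}")
```

A line-based rewrite is sketched here rather than a YAML load/dump round trip so that the hand-written layout of the asset files would be preserved; fields that need judgment, such as licenses, model cards, and parameter counts, appear as individual edits in the diff below.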
4 changes: 2 additions & 2 deletions assets/ai21.yaml
@@ -319,7 +319,7 @@
monitoring: ''
feedback: https://huggingface.co/ai21labs/Jamba-v0.1/discussions
- type: model
-name: Jamba 1.5 Open Model Family (Jamba 1.5 Mini, Jamba 1.5 Large)
+name: Jamba 1.5
organization: AI21
description: A family of models that demonstrate superior long context handling,
speed, and quality. Built on a novel SSM-Transformer architecture, they surpass
@@ -342,7 +342,7 @@
and Jamba 1.5 Large used 8xA100 80GB GPUs.
quality_control: The models were evaluated on the Arena Hard benchmark. For maintaining
long context performance, they were tested on the RULER benchmark.
-access: Open
+access: open
license: Jamba Open Model License
intended_uses: The models are built for enterprise scale AI applications. They
are purpose-built for efficiency, speed, and ability to solve critical tasks
2 changes: 1 addition & 1 deletion assets/aleph_alpha.yaml
@@ -127,7 +127,7 @@
quality_control: The model comes with additional safety guardrails via alignment
methods to ensure safe usage. Training data is carefully curated to ensure compliance
with EU and national regulations.
-access: Open
+access: open
license: Aleph Open
intended_uses: The model is intended for use in domain-specific applications,
particularly in the automotive and engineering industries. It can also be tailored
7 changes: 3 additions & 4 deletions assets/anthropic.yaml
@@ -625,16 +625,15 @@
quality_control: The model underwent a red-teaming assessment, and has been tested
and refined by external experts. It was also provided to the UK's AI Safety
Institute (UK AISI) for a pre-deployment safety evaluation.
-access: Open
+access: open
license: unknown
intended_uses: The model is intended for complex tasks such as context-sensitive
customer support, orchestrating multi-step workflows, interpreting charts and
graphs, transcribing text from images, as well as writing, editing, and executing
code.
prohibited_uses: Misuse of the model is discouraged though specific use cases
are not mentioned.
-monitoring: Unknown
-of misuse, and policy feedback from external experts has been integrated to
-ensure robustness of evaluations.
+monitoring: Unknown of misuse, and policy feedback from external experts has been
+integrated to ensure robustness of evaluations.
feedback: Feedback on Claude 3.5 Sonnet can be submitted directly in-product to
inform the development roadmap and improve user experience.
3 changes: 1 addition & 2 deletions assets/aspia_space,_institu.yaml
@@ -14,12 +14,11 @@
url: https://arxiv.org/pdf/2405.14930v1
model_card: unknown
modality: image; image
-provide insights)
analysis: The models’ performance on downstream tasks was evaluated by linear
probing. The models follow a similar saturating log-log scaling law to textual
models, their performance improves with the increase in model size up to the
saturation point of parameters.
-size: Ranges from 1 million to 2.1 billion parameters.
+size: 2.1B parameters
dependencies: [DESI Legacy Survey DR8]
training_emissions: Unknown
training_time: Unknown
12 changes: 5 additions & 7 deletions assets/evolutionaryscale.yaml
@@ -12,25 +12,23 @@
created_date: 2024-06-25
url: https://www.evolutionaryscale.ai/blog/esm3-release
model_card: unknown
-modality: text; text
-textual descriptions of proteins as outputs)
+modality: text; image, text
analysis: The model was tested in the generation of a new green fluorescent protein.
Its effectiveness was compared to natural evolutionary processes, and it was
deemed to simulate over 500 million years of evolution.
size: 98B parameters (Dense)
-dependencies: [ESM2(base model), largest dataset of proteins]
+dependencies: []
training_emissions: Unknown
training_time: Unknown
-training_hardware: One of the highest throughput GPU clusters in the world.
+training_hardware: unknown
quality_control: The creators have put in place a responsible development framework
to ensure transparency and accountability from the start. ESM3 was tested in
the generation of a new protein, ensuring its quality and effectiveness.
-access: Open
+access: open
license: Unknown
intended_uses: To engineer biology from first principles. It functions as a tool
for scientists to create proteins for various applications, including medicine,
biology research, and clean energy.
prohibited_uses: Unknown
-monitoring: Unknown
-though specific measures are not specified.
+monitoring: Unknown though specific measures are not specified.
feedback: Unknown
8 changes: 4 additions & 4 deletions assets/team_glm,_zhipu_ai,_.yaml → assets/glm.yaml
@@ -12,22 +12,22 @@
which tool(s) to use.
created_date: 2023-07-02
url: https://arxiv.org/pdf/2406.12793
-model_card: unknown
+model_card: https://huggingface.co/THUDM/glm-4-9b
modality: text; text
analysis: Evaluations show that GLM-4, 1) closely rivals or outperforms GPT-4
in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval,
2) gets close to GPT-4-Turbo in instruction following as measured by IFEval,
3) matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms
GPT-4 in Chinese alignments as measured by AlignBench.
-size: From 6.2 billion parameters to 9 billion parameters and 130 billion parameters.
-dependencies: [GPT models, GLM-10B, GLM-130B]
+size: 9B parameters
+dependencies: []
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: High-quality alignment is achieved via a multi-stage post-training
process, which involves supervised fine-tuning and learning from human feedback.
access: Open
-license: Unknown
+license: Apache 2.0
intended_uses: General language modeling, complex tasks like accessing online
information via web browsing and solving math problems using Python interpreter.
prohibited_uses: Unknown
2 changes: 1 addition & 1 deletion assets/google.yaml
@@ -1870,7 +1870,7 @@
2. The model has improvements in safety and efficiency over the first generation.
The deployment of Gemma 2 on Vertex AI, scheduled for the next month, will offer
effortless management of the model.
-access: Open
+access: open
license: Gemma (commercially-friendly license given by Google DeepMind)
intended_uses: Gemma 2 is designed for developers and researchers for various
AI tasks. It can be used via the integrations it offers with other AI tools/platforms
5 changes: 2 additions & 3 deletions assets/laion_e.v..yaml
@@ -25,15 +25,14 @@
quality_control: The model utilized lists of link and image hashes provided by
partner organizations. These were used to remove inappropriate links from the
original LAION-5B dataset to create Re-LAION-5B.
-access: Open
+access: open
license: Apache 2.0
intended_uses: Re-LAION-5B is designed for research on language-vision learning.
It can also be used by third parties to clean existing derivatives of LAION-5B
by generating diffs and removing all matched content from their versions.
prohibited_uses: The dataset should not be utilized for purposes that breach legal
parameters or ethical standards, such as dealing with illegal content.
-monitoring: This version is a response to continuous scrutiny & safety revisions.
-It's also meant to allow inspection and validation by a broad community.
+monitoring: unknown
feedback: Problems with the dataset should be reported to the LAION organization.
They have open lines for communication with their partners and the broader research
community.
9 changes: 2 additions & 7 deletions assets/lg_ai_research.yaml
@@ -16,19 +16,14 @@
EXAONE 3.0 was competitive in English and excellent in Korean compared to other
large language models of a similar size.
size: 7.8B parameters (dense)
-dependencies:
-- GQA
-- SwiGLU
-- Rotary Position Embeddings
-- MeCab
-- BBPE
+dependencies: [MeCab]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: Extensive pre-training on a diverse dataset, and advanced post-training
techniques were employed to enhance instruction-following capabilities. The
model was also trained to fully comply with data handling standards.
-access: Open
+access: open
license: Unknown
intended_uses: The model was intended for non-commercial and research purposes.
The capabilities of the model allow for use cases that involve advanced AI and
4 changes: 2 additions & 2 deletions assets/meta.yaml
@@ -865,8 +865,8 @@
text summarization, multilingual conversational agents, and coding assistants.
It is the largest and most capable openly available foundation model.
created_date: 2024-07-23
-url: https://ai.meta.com/blog/meta-llama-3-1/?utm_source=twitter&utm_medium=organic_social&utm_content=video&utm_campaign=llama31
-model_card: unknown
+url: https://ai.meta.com/blog/meta-llama-3-1/
+model_card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
modality: text; text
analysis: The model was evaluated on over 150 benchmark datasets that span a wide
range of languages. An experimental evaluation suggests that the model is competitive
10 changes: 5 additions & 5 deletions assets/microsoft.yaml
@@ -1003,23 +1003,23 @@
created_date: 2024-09-08
url: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
model_card: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
-modality: Unknown
+modality: text; text
analysis: The model was evaluated across a variety of public benchmarks, comparing
with a set of models including Mistral-Nemo-12B-instruct-2407, Llama-3.1-8B-instruct,
Gemma-2-9b-It, Gemini-1.5-Flash, and GPT-4o-mini-2024-07-18. It achieved a similar
level of language understanding and math as much larger models. It also displayed
superior performance in reasoning capability, even with only 6.6B active parameters.
It was also evaluated for multilingual tasks.
-size: 6.6B active parameters
-dependencies: [Phi-3]
+size: 61B parameters (sparse); 6.6B active parameters
+dependencies: [Phi-3 dataset]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The model was enhanced through supervised fine-tuning, proximal
policy optimization, and direct preference optimization processes for safety
measures.
-access: Open
-license: Unknown
+access: open
+license: MIT
intended_uses: The model is intended for commercial and research use in multiple
languages. It is designed to accelerate research on language and multimodal
models, and for use as a building block for generative AI powered features.
6 changes: 3 additions & 3 deletions assets/mistral.yaml
@@ -122,7 +122,7 @@
quality_control: The model underwent an advanced fine-tuning and alignment phase.
Various measures such as accuracy comparisons with other models and instruction-tuning
were implemented to ensure its quality.
-access: Open
+access: open
license: Apache 2.0
intended_uses: The model can be used for multilingual applications, understanding
and generating natural language as well as source code, handling multi-turn
@@ -153,7 +153,7 @@
training_time: Unknown
training_hardware: Unknown
quality_control: Unknown
-access: Open
+access: open
license: Apache 2.0
intended_uses: The model is intended for code generation and can be utilized as
a local code assistant.
@@ -172,7 +172,7 @@
created_date: 2024-07-16
url: https://mistral.ai/news/mathstral/
model_card: unknown
-modality: Text-to-text (presumed based on description)
+modality: text; text
analysis: The model's performance has been evaluated on the MATH and MMLU industry-standard
benchmarks. It scored notably higher on both these tests than the base model
Mistral 7B.
20 changes: 4 additions & 16 deletions assets/qwen_team.yaml
@@ -9,32 +9,20 @@
in terms of mathematical capabilities.
created_date: 2024-08-08
url: https://qwenlm.github.io/blog/qwen2-math/
-model_card: unknown
+model_card: https://huggingface.co/Qwen/Qwen2-Math-72B
modality: text; text
analysis: Models have been evaluated on a series of math benchmarks, demonstrating
outperformance of the state-of-the-art models in both the English and Chinese
language.
-size: The size of the largest model in the Qwen2-Math series is 72B parameters.
-dependencies:
-- GSM8K
-- Math
-- MMLU-STEM
-- CMATH
-- GaoKao Math Cloze
-- GaoKao Math QA
-- OlympiadBench
-- CollegeMath
-- GaoKao
-- AIME2024
-- AMC2023
-- CN Middle School 24
+size: 72B parameters
+dependencies: []
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The models were tested with few-shot chain-of-thought prompting
and evaluated across mathematical benchmarks in both English and Chinese.
access: open
-license: Unknown
+license: Tongyi Qianwen
intended_uses: These models are intended for solving complex mathematical problems.
prohibited_uses: Uses that go against the ethical usage policies of Qwen Team.
monitoring: Unknown
35 changes: 0 additions & 35 deletions assets/roblox.yaml

This file was deleted.

27 changes: 0 additions & 27 deletions assets/samba.yaml
@@ -57,30 +57,3 @@
prohibited_uses: ''
monitoring: unknown
feedback: none
-- type: model
-name: sarvam-2b
-organization: sarvamAI
-description: This is an early checkpoint of sarvam-2b, a small, yet powerful language
-model pre-trained from scratch on 2 trillion tokens. It is designed to be proficient
-in 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi,
-Oriya, Punjabi, Tamil, and Telugu) + English.
-created_date: 2024-08-15
-url: https://huggingface.co/sarvamai/sarvam-2b-v0.5
-model_card: https://huggingface.co/sarvamai/sarvam-2b-v0.5
-modality: text; text
-analysis: Analysis for the model is not yet provided; however, it has been reported
-that more technical details like evaluations and benchmarking will be posted
-soon.
-size: Unknown
-dependencies: []
-training_emissions: Unknown
-training_time: Unknown
-training_hardware: NVIDIA NeMo™ Framework, Yotta Shakti Cloud, HGX H100 systems.
-quality_control: Unknown
-access: Open
-license: Unknown
-intended_uses: The model can be used for text completion and supervised fine-tuning,
-particularly in the languages it was trained on.
-prohibited_uses: Unknown
-monitoring: Unknown
-feedback: Unknown
13 changes: 6 additions & 7 deletions assets/stability_ai.yaml
@@ -18,14 +18,13 @@
dependencies: []
training_emissions: Unknown
training_time: Unknown
-training_hardware: NVIDIA RTX GPUs, TensorRT, AMD’s APUs, consumer GPUs and MI-300X
-Enterprise GPUs
+training_hardware: unknown
quality_control: They have conducted extensive internal and external testing of
this model and have implemented numerous safeguards to prevent harms. Safety
measures were implemented from the start of training the model and continued
throughout testing, evaluation, and deployment.
access: open
-license: Stability Non-Commercial Research Community License
+license: Stability Community License
intended_uses: The model can be used by professional artists, designers, developers,
and AI enthusiasts for creating high-quality image outputs from text inputs.
prohibited_uses: Large-scale commercial use requires contacting the organization
@@ -63,7 +62,7 @@
working to refine and optimize the model beyond the current synthetic datasets
it has been trained on.
access: open
-license: Stable AI License
+license: Stability Community License
intended_uses: This model can be used for creating dynamic multi-angle videos,
with applications in game development, video editing, and virtual reality. It
allows professionals in these fields to visualize objects from multiple angles,
@@ -88,22 +87,22 @@
professions.
created_date: 2024-08-01
url: https://stability.ai/news/introducing-stable-fast-3d
-model_card: unknown
+model_card: https://huggingface.co/stabilityai/stable-fast-3d
modality: image; 3D
analysis: The model was evaluated on its ability to quickly and accurately transform
a single image into a detailed 3D asset. This evaluation highlighted the model's
unprecedented speed and quality, marking it as a valuable tool for rapid prototyping
in 3D work. Compared to the previous SV3D model, Stable Fast 3D offers significantly
reduced inference times--0.5 seconds versus 10 minutes--while maintaining high-quality
output.
-size: Unknown
+size: unknown
dependencies: [TripoSR]
training_emissions: Unknown
training_time: Unknown
training_hardware: unknown
quality_control: Unknown
access: open
-license: Stability AI Community
+license: Stability Community License
intended_uses: The model is intended for use in game development, virtual reality,
retail, architecture, design and other graphically intense professions. It allows
for rapid prototyping in 3D work, assisting both enterprises and indie developers.
2 changes: 1 addition & 1 deletion assets/stanford.yaml
@@ -156,7 +156,7 @@
created_date: 2024-09-08
url: https://arxiv.org/pdf/2406.06512
model_card: unknown
-modality: Image; text
+modality: image; text
analysis: Merlin has been comprehensively evaluated on 6 task types and 752 individual
tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification,
phenotype classification, and zero-shot cross-modal retrieval, while model adapted