fix colon
rishibommasani committed Apr 9, 2024
1 parent e5700cd commit dfea0f7
Showing 1 changed file with 61 additions and 26 deletions.
assets/bigcode.yaml: 87 changes (61 additions, 26 deletions)
@@ -13,16 +13,18 @@
size: 15.5B parameters (dense)
dependencies: [The Stack]
training_emissions: 16.68 tons of CO2eq
training_time: StarCoderBase: 320,256 GPU hours
training_time: 320,256 GPU hours
training_hardware: 512 A100 80GB GPUs distributed across 64 nodes
quality_control: No specific quality control is mentioned for model training, though
details on data processing and how the tokenizer was trained are provided in
the paper.
access: open
license: BigCode Open RAIL-M v1.0
intended_uses: As a foundation model to fine-tune and create more specialized models that support use cases such as code completion, fill-in-the-middle, and text summarization. Can also be used with the Tech Assistant prompt, but it is not an instruction model given training limitations.
prohibited_uses: 'See BigCode Open RAIL-M license and FAQ'
intended_uses: As a foundation model to fine-tune and create more specialized
models that support use cases such as code completion, fill-in-the-middle, and
text summarization. Can also be used with the Tech Assistant prompt, but it is
not an instruction model given training limitations.
prohibited_uses: See BigCode Open RAIL-M license and FAQ
monitoring: ''
feedback: https://huggingface.co/bigcode/starcoder/discussions
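
The commit title "fix colon" refers to the training_time change above: the old value "StarCoderBase: 320,256 GPU hours" is a bare scalar containing a second colon, which a YAML parser rejects as an illegal nested mapping. A minimal sketch of the failure and the fix using PyYAML (the snippet is illustrative, not part of the repository):

import yaml

broken = "training_time: StarCoderBase: 320,256 GPU hours\n"
fixed = "training_time: 320,256 GPU hours\n"

try:
    yaml.safe_load(broken)
except yaml.YAMLError as err:
    # PyYAML rejects the bare nested colon (a "mapping values are not allowed..." error).
    print("broken entry:", err)

print(yaml.safe_load(fixed))  # {'training_time': '320,256 GPU hours'}

Quoting the whole value ('StarCoderBase: 320,256 GPU hours') would also parse; the commit instead drops the StarCoderBase prefix.
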
- type: model
@@ -32,31 +34,39 @@
analysis on GitHub stars' association with data quality.
created_date: 2023-02-24
url: https://arxiv.org/pdf/2301.03988.pdf
model_card: 'https://huggingface.co/bigcode/santacoder'
model_card: https://huggingface.co/bigcode/santacoder
modality: code; code
analysis: Evaluated on MultiPL-E benchmarks.
size: 1.1B parameters (dense)
dependencies: [The Stack, BigCode Dataset]
training_emissions: '124 kg of CO2eq'
training_emissions: 124 kg of CO2eq
training_time: 14,284 GPU hours
training_hardware: 96 NVIDIA Tesla V100 GPUs
quality_control: ''
access: open
license: BigCode Open RAIL-M v1
intended_uses: 'The model was trained on GitHub code. As such it is not an instruction model and commands like "Write a function that computes the square root." do not work well. You should phrase commands like they occur in source code such as comments (e.g. # the following function computes the sqrt) or write a function signature and docstring and let the model complete the function body.'
prohibited_uses: 'See BigCode Open RAIL-M license and FAQ'
intended_uses: 'The model was trained on GitHub code. As such it is not an instruction
model and commands like "Write a function that computes the square root." do
not work well. You should phrase commands like they occur in source code such
as comments (e.g. # the following function computes the sqrt) or write a function
signature and docstring and let the model complete the function body.'
prohibited_uses: See BigCode Open RAIL-M license and FAQ
monitoring: ''
feedback: 'https://huggingface.co/bigcode/santacoder/discussions'
feedback: https://huggingface.co/bigcode/santacoder/discussions
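
The SantaCoder intended_uses note above says to phrase requests as source code (a comment, or a signature plus docstring) rather than as natural-language instructions. A rough sketch of that usage with Hugging Face transformers; the prompt and generation settings are illustrative, and the trust_remote_code flag follows the model card since the checkpoint ships custom modeling code:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

# Phrase the request as code, not as a natural-language instruction.
prompt = (
    "def sqrt_newton(x: float) -> float:\n"
    "    \"\"\"Compute the square root of x with Newton's method.\"\"\"\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The model completes the function body from the signature and docstring; as the entry notes, an imperative prompt such as "Write a function that computes the square root." tends to work poorly.
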
- type: dataset
name: The Stack
organization: BigCode
description: The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as from other code snippets.
description: The Stack contains over 6TB of permissively-licensed source code
files covering 358 programming languages. The Stack serves as a pre-training
dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis
of programs from natural language descriptions as well as from other code snippets.
created_date: 2022-11-20
url: https://arxiv.org/pdf/2211.15533.pdf
datasheet: https://huggingface.co/datasets/bigcode/the-stack
modality: code
size: 6 TB
sample: [https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train]
sample:
- https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train
analysis: Evaluated models trained on The Stack on HumanEval and MBPP and compared
against similarly-sized models.
dependencies: [GitHub]
@@ -65,16 +75,20 @@
quality_control: allowed users whose data were included in The Stack to opt out
access: open
license: The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including attribution clauses when relevant. Provenance information is provided for each data point.
license: The Stack is a collection of source code from repositories with various
licenses. Any use of all or part of the code gathered in The Stack must abide
by the terms of the original licenses, including attribution clauses when relevant.
Provenance information is provided for each data point.
intended_uses: creating code LLMs
prohibited_uses: 'See https://huggingface.co/datasets/bigcode/the-stack'
prohibited_uses: See https://huggingface.co/datasets/bigcode/the-stack
monitoring: ''
feedback: 'https://huggingface.co/datasets/bigcode/the-stack/discussions'
feedback: https://huggingface.co/datasets/bigcode/the-stack/discussions
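
Because The Stack ships per-file license and provenance metadata (see the license and quality_control fields above), downstream users can stream a single language subset and keep that metadata for attribution. A rough sketch with the datasets library; the data_dir layout follows the dataset card, access may require accepting the dataset's terms on the Hub, and the metadata column names are assumptions to verify against the datasheet:

from datasets import load_dataset

# Stream one language subset rather than downloading the full ~6 TB.
ds = load_dataset(
    "bigcode/the-stack",
    data_dir="data/python",
    split="train",
    streaming=True,
)

for example in ds.take(3):
    # Column names are assumptions; check the datasheet for the exact schema.
    print(example.get("max_stars_repo_name"), example.get("max_stars_repo_licenses"))
    print(example.get("content", "")[:120])
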
- type: model
name: StarCoder2-15B
organization: BigCode
description: StarCoder2-15B is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. The training was carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.
description: StarCoder2-15B is a 15B parameter model trained on 600+ programming
languages from The Stack v2, with opt-out requests excluded. The training was
carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.
created_date: 2024-02-28
url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html
model_card: https://huggingface.co/bigcode/starcoder2-15b
@@ -90,15 +104,22 @@
came from to apply the proper attribution.
access: open
license: BigCode OpenRail-M
intended_uses: The model was trained on GitHub code as well as additional selected data sources such as arXiv and Wikipedia. As such it is not an instruction model and commands like "Write a function that computes the square root." do not work well. Intended to generate code snippets from a given context, but not for writing actual functional code directly.
prohibited_uses: 'See BigCode Open RAIL-M license and FAQ'
intended_uses: The model was trained on GitHub code as well as additional selected
data sources such as arXiv and Wikipedia. As such it is not an instruction model
and commands like "Write a function that computes the square root." do not work
well. Intended to generate code snippets from a given context, but not for writing
actual functional code directly.
prohibited_uses: See BigCode Open RAIL-M license and FAQ
monitoring: unknown
feedback: https://huggingface.co/bigcode/starcoder2-15b/discussions
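
The StarCoder2-15B description above notes training with the Fill-in-the-Middle objective. In the StarCoder family, infilling is driven by FIM sentinel tokens in the prompt rather than a separate API; the rough sketch below assumes the usual <fim_prefix>/<fim_suffix>/<fim_middle> token names, which should be checked against the model's tokenizer. Loading a 15B checkpoint also needs substantial GPU memory, and device_map="auto" requires the accelerate package:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

prefix = "def mean(xs: list[float]) -> float:\n    return "
suffix = " / len(xs)\n"
# Fill-in-the-middle: the model generates the missing span after <fim_middle>.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
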
- type: model
name: StarCoder2-7B
organization: BigCode
description: StarCoder2-7B is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3.5+ trillion tokens.
description: StarCoder2-7B is a 7B parameter model trained on 17 programming
languages from The Stack v2, with opt-out requests excluded. The model uses
Grouped Query Attention, a context window of 16,384 tokens with sliding window
attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective
on 3.5+ trillion tokens.
created_date: 2024-02-28
url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html
model_card: https://huggingface.co/bigcode/starcoder2-7b
@@ -115,22 +136,31 @@
access: open
license: BigCode OpenRail-M
intended_uses: Intended to generate code snippets from a given context, but not
for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source code is English, although other languages are also present. As such the model is capable of generating code snippets provided some context, but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model's limitations.
prohibited_uses: 'See BigCode Open RAIL-M license and FAQ'
for writing actual functional code directly. The model has been trained on source
code from 17 programming languages. The predominant language in source code is
English, although other languages are also present. As such the model is capable
of generating code snippets provided some context, but the generated code is not
guaranteed to work as intended. It can be inefficient and contain bugs or exploits.
See the paper for an in-depth discussion of the model's limitations.
prohibited_uses: See BigCode Open RAIL-M license and FAQ
monitoring: unknown
feedback: https://huggingface.co/bigcode/starcoder2-7b/discussions
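
The StarCoder2-7B description above cites Grouped Query Attention and a 16,384-token context with a 4,096-token sliding window. Those choices should be visible in the published config; the attribute names below are assumptions based on the Starcoder2 configuration class in recent transformers releases and are worth verifying:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-7b")

# Attribute names are assumptions based on the Starcoder2 config class.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)      # fewer than attention heads => grouped-query attention
print("sliding window:", config.sliding_window)             # expected 4096 per the description
print("context length:", config.max_position_embeddings)    # expected 16384 per the description
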
- type: model
name: StarCoder2-3B
organization: BigCode
description: StarCoder2-3B is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+ trillion tokens.
description: StarCoder2-3B is a 3B parameter model trained on 17 programming
languages from The Stack v2, with opt-out requests excluded. The model uses
Grouped Query Attention, a context window of 16,384 tokens with sliding window
attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective
on 3+ trillion tokens.
created_date: 2024-02-28
url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html
model_card: https://huggingface.co/bigcode/starcoder2-3b
modality: code; text
analysis: See https://arxiv.org/pdf/2402.19173.pdf
size: 3B parameters (dense)
dependencies: [The Stack v2]
training_emissions: 16,107.01 kgCO2eq
training_time: 97,120 hours (cumulative)
training_hardware: 160 A100 GPUs
quality_control: The model was filtered for permissive licenses and code with
@@ -139,7 +169,12 @@
access: open
license: BigCode OpenRail-M
intended_uses: Intended to generate code snippets from a given context, but not
for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source code is English, although other languages are also present. As such the model is capable of generating code snippets provided some context, but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model's limitations.
prohibited_uses: 'See BigCode Open RAIL-M license and FAQ'
for writing actual functional code directly. The model has been trained on source
code from 17 programming languages. The predominant language in source code is
English, although other languages are also present. As such the model is capable
of generating code snippets provided some context, but the generated code is not
guaranteed to work as intended. It can be inefficient and contain bugs or exploits.
See the paper for an in-depth discussion of the model's limitations.
prohibited_uses: See BigCode Open RAIL-M license and FAQ
monitoring: unknown
feedback: https://huggingface.co/bigcode/starcoder2-3b/discussions
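
As a rough sanity check on the StarCoder2-3B training figures above, reading the 97,120 cumulative hours as GPU-hours (an assumption) across the listed 160 A100 GPUs gives the approximate wall-clock training time:

# Assumes "97,120 hours (cumulative)" means GPU-hours summed over all devices.
gpu_hours = 97_120
num_gpus = 160
wall_clock_hours = gpu_hours / num_gpus  # 607.0 hours
print(wall_clock_hours, "hours ~", round(wall_clock_hours / 24, 1), "days")  # ~25.3 days
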
