InfiniteAICreations/awesome-llm-datasets

😎 Awesome lists about all kinds of LLM-related datasets

Mathematics Datasets

  • MATH Dataset: A collection of 12,500 challenging competition mathematics problems. Each problem comes with a step-by-step solution, so the dataset can be used to train models to derive answers and generate explanations.
  • GSM8K Dataset: A collection of 8,500 grade school math problems. It tests the multi-step reasoning abilities of models and exposes their limitations, even though the problems themselves are simple.
  • MathQA: A large-scale dataset of math word problems.
  • AQUA-RAT: An algebraic word problem dataset of multiple-choice questions annotated with rationales.
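
Math benchmarks such as GSM8K are commonly scored by final-answer exact match. Here is a minimal sketch. It assumes GSM8K-style reference solutions that end with a line of the form `#### <answer>`, which is the convention in the GSM8K release. The helpers `extract_final_answer` and `exact_match_accuracy` are hypothetical names. The normalization step (dropping thousands separators and whitespace) is an illustrative choice, not an official evaluation script.

```python
import re
from typing import Optional, Sequence

# Assumption: GSM8K-style solutions put the final answer after "####" on its own line.
_FINAL_ANSWER = re.compile(r"####\s*(.+?)\s*$")


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the normalized text after the last '####' marker, or None if absent."""
    for line in reversed(solution.strip().splitlines()):
        match = _FINAL_ANSWER.search(line)
        if match:
            # Illustrative normalization: drop thousands separators and surrounding spaces.
            return match.group(1).replace(",", "").strip()
    return None


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose extracted final answer equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        pred_answer = extract_final_answer(pred)
        if pred_answer is not None and pred_answer == extract_final_answer(ref):
            hits += 1
    return hits / len(references)


if __name__ == "__main__":
    # Made-up example in GSM8K format, not taken from the dataset.
    reference = "Tom has 3 bags with 4 apples each.\n3 * 4 = 12\n#### 12"
    prediction = "There are 3 bags of 4 apples, so 12 apples.\n#### 12"
    print(extract_final_answer(reference))  # → 12
    print(exact_match_accuracy([prediction], [reference]))  # → 1.0
```

Exact match is deliberately strict. For example, "12.0" and "12" would not match. Real evaluation harnesses often add numeric normalization on top of this.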

Coding Datasets

Perplexity (PPL)
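
Perplexity is the exponential of the average negative log-likelihood that a language model assigns to the tokens of a held-out text: `PPL = exp(-(1/N) * sum_i log p(x_i | x_<i))`. Here is a minimal sketch. It assumes you already have per-token natural-log probabilities from some model. The function name `perplexity` is hypothetical, and obtaining the log-probabilities is out of scope.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities log p(x_i | x_<i).

    PPL = exp(-(1/N) * sum_i log p(x_i | x_<i))
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)  # average negative log-likelihood
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    logprobs = [math.log(0.25)] * 10
    print(round(perplexity(logprobs), 6))  # → 4.0
```

Lower is better. A perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens.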

CommonsenseQA

MMLU

Image

Medical Science

Cartoon Animation Dataset

Web Datasets

  • MS MARCO Web Search: A large-scale, information-rich web dataset featuring millions of real clicked query-document labels.
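
Click-derived query-document labels like these are typically used to evaluate retrieval and ranking models. Here is a minimal sketch of mean reciprocal rank at cutoff 10 (MRR@10), the official metric of the original MS MARCO passage ranking task. The input layout is an assumption for illustration and is not the dataset's file format: one ranked list of document IDs per query, plus one set of relevant (clicked) IDs per query. `mrr_at_k` is a hypothetical name.

```python
from typing import Mapping, Sequence, Set


def mrr_at_k(
    rankings: Mapping[str, Sequence[str]],
    relevant: Mapping[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document within the top k.

    rankings: query ID -> ranked document IDs (best first).
    relevant: query ID -> set of relevant (e.g. clicked) document IDs.
    Queries with no ranking, or no relevant hit in the top k, contribute 0.
    """
    if not relevant:
        return 0.0
    total = 0.0
    for query_id, relevant_docs in relevant.items():
        ranked = rankings.get(query_id, [])
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant_docs:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(relevant)


if __name__ == "__main__":
    # Hypothetical IDs, not taken from MS MARCO.
    rankings = {"q1": ["d3", "d7", "d1"], "q2": ["d9", "d2"]}
    relevant = {"q1": {"d7"}, "q2": {"d5"}}
    print(mrr_at_k(rankings, relevant))  # → 0.25
```

The average is taken over the queries in `relevant`, not over those in `rankings`. A query the system failed to rank therefore lowers the score instead of being silently skipped.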

Conversational Datasets
