Not sure whether this feature belongs in this library or would require a completely separate one. I am proposing the creation of a library where LLM benchmarks can be run, for example evaluating a model on HumanEval. Such a library would make evaluating LLMs much easier. I liked the dockerized approach they used at https://github.com/NVlabs/verilog-eval to safely evaluate generated code. Evaluating the math and reasoning skills of LLMs could also be beneficial.
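To make the idea concrete, here is a rough Python sketch of what a sandboxed, HumanEval-style evaluation loop could look like. The `generate` callable, the problem-dict fields, and the container settings are illustrative assumptions rather than an existing API (the real HumanEval schema and harness, and the verilog-eval Docker setup, handle more details than this):

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def evaluate_candidate(candidate_code: str, test_code: str, timeout: int = 30) -> bool:
    """Run a candidate solution plus its tests inside a throwaway Docker
    container so untrusted model output never executes on the host."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "candidate.py").write_text(candidate_code + "\n\n" + test_code)
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",        # no network access for generated code
                "--memory", "512m",         # cap memory usage
                "-v", f"{tmp}:/work:ro",    # mount the solution read-only
                "python:3.11-slim",
                "python", "/work/candidate.py",
            ],
            capture_output=True,
            timeout=timeout,
        )
    return result.returncode == 0


def pass_at_1(problems: Iterable[dict], generate: Callable[[str], str]) -> float:
    """Fraction of problems whose single sampled completion passes its tests.

    Assumes each problem dict has a 'prompt' and a self-contained 'test'
    block -- a simplification of the actual HumanEval format."""
    passed, total = 0, 0
    for problem in problems:
        completion = generate(problem["prompt"])  # hypothetical model call
        try:
            ok = evaluate_candidate(problem["prompt"] + completion, problem["test"])
        except subprocess.TimeoutExpired:
            ok = False
        passed += int(ok)
        total += 1
    return passed / max(total, 1)
```

A real harness would also need k-sample generation for pass@k, per-problem logging, and cleanup of containers that outlive the client-side timeout, but the core loop (generate, run in isolation, score) would look roughly like this.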