When to leverage CoT for answering #5
Comments
The current evaluation of ChartQA in the article does not use the instruction template of MathQA. We have tested that using the MathQA instruction template to ask the mathematical questions in ChartQA further improves accuracy over the numbers reported in the current version of the paper (we will update the paper later). As for when to use the CoT form to answer, we think it should depend on the user (consistent with "think step by step" prompting when using GPT). Take a question such as "What is the difference between x1 and x2?": the user can of course ask it with the normal QA template, but this is obviously a mathematical question, so users should be more inclined to use the mathematical template, which obtains more accurate results than normal QA (this conclusion is also verified in the paper).
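To illustrate the template choice described in the reply above, here is a minimal Python sketch. The template strings and the `build_prompt` helper are hypothetical placeholders, not the paper's actual prompts (those live in the repository's evaluation code); the point is only that a mathematical question gets wrapped in a CoT-style instruction while other questions use a plain QA instruction.

```python
# Hypothetical illustration of the template choice described above.
# The exact wording of the real templates is defined in the repository's
# evaluation code; these strings are placeholders.

NORMAL_QA_TEMPLATE = "Answer the question based on the chart: {question}"
MATH_COT_TEMPLATE = (
    "Answer the question based on the chart. "
    "Think step by step and show your reasoning: {question}"
)

def build_prompt(question: str, is_math: bool) -> str:
    """Wrap a ChartQA question in either the normal QA or the CoT/math template."""
    template = MATH_COT_TEMPLATE if is_math else NORMAL_QA_TEMPLATE
    return template.format(question=question)

# A numerical comparison is clearly a mathematical question, so the
# math/CoT template is the better choice; a lookup question is not.
print(build_prompt("What is the difference between x1 and x2?", is_math=True))
print(build_prompt("What does the blue bar represent?", is_math=False))
```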
Thank you for your response. Regarding the application of the instruction template, could you please clarify whether it was utilized across all questions within the ChartQA dataset to enhance accuracy, or was it specifically employed only for mathematical questions?
As I said in my previous answer, the ChartQA accuracy in the current version of the article is measured entirely with the ordinary QA instruction, so the problem you mentioned does not exist. Secondly, we later switched the mathematical problems in ChartQA to the mathematical template (most problems in ChartQA are element extraction or mathematical problems); this further improves ChartQA accuracy over the current version of the article. We will update the article later and make the instructions.json used for testing ChartQA public, but for the current version of the article it is not needed, because the normal QA template is used for all of ChartQA.
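As a rough idea of how such a per-question instruction file could be consumed, here is a hedged sketch. The field names, file layout, and template strings are assumptions made purely for illustration; the real instructions.json had not been released at this point in the thread, so its actual schema may differ.

```python
import json

# Assumed layout, for illustration only: one record per ChartQA question,
# tagged with the template type it should be evaluated with. The real
# instructions.json may use a different schema.
records = [
    {"question": "What is the sum of the two largest bars?", "template": "math"},
    {"question": "Which country does the red line represent?", "template": "qa"},
]
with open("instructions_example.json", "w") as f:
    json.dump(records, f, indent=2)

# Placeholder template strings; the real ones live in the repository's evaluation code.
TEMPLATES = {
    "qa": "Answer the question based on the chart: {question}",
    "math": "Answer the question based on the chart, thinking step by step: {question}",
}

with open("instructions_example.json") as f:
    for record in json.load(f):
        print(TEMPLATES[record["template"]].format(question=record["question"]))
```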
Looking forward to your update!
Could you please make the instructions.json file available at your earliest convenience?
For this version, we just use the normal QA template, which is in accessory/single_turn_eval.py; you can refer to issue #6.
Hi, I notice that in the updated version the performance on MathQA is unchanged, but there is a performance increase on ChartQA. It seems you just changed the evaluation process. Could you provide any details about the evaluation of ChartQA in the current version? Thanks a lot!
We changed the test instruction of ChartQA: we use the MathQA instruction template for the mathematical problems in ChartQA, whereas we just used the normal QA template for all of ChartQA earlier.
Thank you for the excellent work! After a thorough review of the paper, I have some inquiries:
It's clear that a CoT-style instruction is utilized for generating responses to numerical questions involving charts. I noticed that CoT is applied within the MathQA dataset, as mentioned in your paper. However, for other datasets, was CoT employed consistently? How do you determine when to leverage CoT for answering, and when to provide direct responses? Specifically, in the evaluation of the ChartQA dataset, was CoT used?
I eagerly await your response.