Recreate results found in table 1 #4
Comments
Hello, thanks for your question. Our recent results on another task set also show a performance drop for gpt-3.5-instruct compared to text-davinci-003 on decision-making tasks. This may be due to differences in the base capabilities of the GPT models. Besides, it is odd that your history memory size is only 4 after an epoch of training on 10 tasks. Could you please double-check your training process? So far, I don't see any unusual arguments in your launch command.
@zdy023 It seems I didn't run with the --train argument; now I get a much larger history memory size. However, with the new arguments I do not get a higher success rate (0.022 compared to 0.070). Is this normal on your side?
Hello, I don't think this is a normal result. I haven't yet run experiments on WebShop with gpt-instruct. I will follow your settings and try to reproduce the results in the coming weeks when I'm free.
@theblackcat102 Hello, just to be sure, are you using the model
Hello, we ran experiments with gpt-3.5-turbo-instruct and obtained an average score of 0.54 and a success rate of 0.22. This is about half the performance of text-davinci-003, which is consistent with our observations on the WikiHow task set. We plan to test more recent models in the coming weeks. Once the results are ready, we will update the repository.
Hi, I wanted to check whether running launchw.sh is the command that reproduces the numbers in Table 1.
I'm trying to rerun REMEMBERER with gpt-3.5-instruct-0913 because davinci-003 is no longer accessible from the OpenAI platform.
But the results I got are quite low, with a success rate of only 0.07.
I was wondering if there are any parameters I didn't get right in launchw.sh?
This was the command found in launchw.sh: