Disallow new steps/actions if the isCompleted flag is set #31

Open
PeterAJansen opened this issue Oct 28, 2022 · 8 comments

Comments

@PeterAJansen
Collaborator

Certain conditions cause the isCompleted flag to be set, for example a negative score in the task, which signifies task failure.

Right now we set the isCompleted flag to be true, but if an agent doesn't check for this, it's still possible for it to continue to send commands to the environment. There is a report that this might let the tasks be gamed (especially the forced-choice tasks), as the agent might be able to take steps that ultimately further increase the score. We should likely modify the step() code so that if the isCompleted flag is set, it disallows further steps to be processed, to prevent any issues with agents reporting erroneously high scores in the future.
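A minimal sketch of the proposed guard, written as a client-side wrapper rather than the actual step() change. `ToyEnv`, `GuardedEnv`, the `(observation, score, is_completed)` return shape, and the scores of 100 and -100 are all assumptions made for illustration, not the project's real API:

```python
class ToyEnv:
    """Hypothetical stand-in environment (not the real one)."""

    def __init__(self):
        self.score = 0
        self.is_completed = False

    def step(self, action):
        # Focusing on the wrong box fails the task; the right box solves it.
        if action == "focus on box B":
            self.score, self.is_completed = -100, True
        elif action == "focus on box A":
            self.score, self.is_completed = 100, True
        return f"You {action}.", self.score, self.is_completed


class GuardedEnv:
    """Hypothetical wrapper: stops forwarding actions once is_completed is set."""

    def __init__(self, env):
        self.env = env
        self.final_result = None

    def step(self, action):
        if self.final_result is not None:
            # Episode already ended: ignore the action, repeat the final result.
            return self.final_result
        result = self.env.step(action)
        if result[2]:  # is_completed
            self.final_result = result
        return result


# Unguarded, a failed agent can still raise its score afterwards.
raw = ToyEnv()
raw.step("focus on box B")
gamed_score = raw.step("focus on box A")[1]

# Guarded, the score is frozen at the point of failure.
guarded = GuardedEnv(ToyEnv())
guarded.step("focus on box B")
frozen_score = guarded.step("focus on box A")[1]
```

Doing the same check inside the environment itself would protect every client, including agents that never look at the flag.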

@MarcCote
Collaborator

MarcCote commented Nov 1, 2022

Should that be handled on the server side?

@PeterAJansen
Collaborator Author

It sounds like a good idea to handle it on the server side.

It's also been suggested that we make this a flag that can be set, to let folks choose whether they continue to run after task failure. That should be easy enough to add, as long as we make sure that folks report this in their papers, and add some very obvious output somewhere to remind folks which mode they're using. :)
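One way the optional flag could look, as a sketch only. The names `guard_steps`, `allow_after_completion`, and `ToyTask`, the banner text, and the `(observation, score, is_completed)` tuple are hypothetical and not part of the existing code:

```python
import sys


def guard_steps(step_fn, allow_after_completion=False, out=sys.stderr):
    """Wrap a step function; by default, freeze the result once the task completes."""
    mode = "CONTINUE-AFTER-COMPLETION" if allow_after_completion else "STOP-ON-COMPLETION"
    # Very obvious output, so the active mode is hard to miss in logs.
    print(f"*** step mode: {mode} (report this mode alongside any scores) ***", file=out)

    final = []  # holds the first completed result, if any

    def step(action):
        if final and not allow_after_completion:
            return final[0]
        result = step_fn(action)
        if result[2] and not final:  # result[2] is is_completed
            final.append(result)
        return result

    return step


class ToyTask:
    """Hypothetical task used only to exercise the wrapper."""

    def __init__(self):
        self.score = 0
        self.done = False

    def step(self, action):
        if action == "fail":
            self.score, self.done = -100, True
        elif action == "succeed":
            self.score, self.done = 100, True
        return action, self.score, self.done
```

With the default, `guard_steps(ToyTask().step)` keeps returning the failing result after "fail"; with `allow_after_completion=True`, a later "succeed" still changes the score, which is why the mode needs to be reported.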

@MarcCote
Collaborator

MarcCote commented Nov 3, 2022

That seems weird. What is the need for ignoring task failure? Do they want a no-goal variation, i.e., just an exploration mode?

@yuchenlin
Member

yuchenlin commented Apr 5, 2023

Hey Marc and Peter,

I actually really want the "ignoring task failure" feature, so that we can have an env that enables agents to learn from their failures and acquire the knowledge to avoid such failures. I can manually set this to False, but it seems that I cannot change the negative score (-1) back to its original value on the server side, even though the game can be continued. Any suggestions?

More ideas on this issue:
I think there are two types of task failure: 1) actions that cannot be reversed, which do indeed need to be disallowed, and 2) failures caused by unclear issues.
Say, in the task of boiling something: I found that if the agent picks up X, so that X is in its inventory, and then tries to focus on X, this causes a negative score and the task ends. This might be a bit too strict for evaluation.

Thank you very much! :D

@PeterAJansen
Collaborator Author

The issue with focusing is that it's a critical part of the evaluation for most tasks, and that removing the hard criterion that the agent has to focus on the right thing would allow an agent to quickly game the tasks. For example, if the task is to measure something, and focus on box A if it's greater than some threshold, and focus on box B if it's less than some threshold, the agents will quickly learn to just focus on box A then B and get 100% task performance in two steps. Adding the focus mechanism was a method of (a) ensuring that the actions are intentional, and (b) providing a method of scoring that doesn't rely on natural language generation, but instead uses an analog of a forced choice task. If you remove the failure of the forced choice task, then it's like being able to select every multiple choice answer on a multiple choice test. :)
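The forced-choice argument can be illustrated with a toy scorer. This is a sketch under stated assumptions, not the actual scoring code: `score_focus_sequence`, the two-box setup, and the score values 100, -100, and 0 are all hypothetical.

```python
def score_focus_sequence(focus_actions, correct_box, wrong_focus_fails=True):
    """Score a sequence of focus actions in a hypothetical two-box task.

    wrong_focus_fails=True models the current hard criterion: the first
    focus decides the outcome. False models ignoring task failure.
    """
    for box in focus_actions:
        if box == correct_box:
            return 100   # focused on the right box: task solved
        if wrong_focus_fails:
            return -100  # focused on the wrong box: task failed, episode ends
        # Otherwise the wrong focus is ignored and the agent keeps going.
    return 0             # never focused on the right box


# An agent that measures nothing and always focuses on A, then B:
blind_policy = ["A", "B"]

strict = [score_focus_sequence(blind_policy, c) for c in ("A", "B")]
lenient = [score_focus_sequence(blind_policy, c, wrong_focus_fails=False) for c in ("A", "B")]
```

Under the strict rule the blind policy succeeds only when box A happens to be correct; with failure ignored it scores full marks on both variants in two steps, which is the "select every answer" problem described above.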

@yuchenlin
Member

Thank you very much for the explanations, Peter! :D Really appreciate it.

@MarcCote
Collaborator

> Say, in the task of boiling something: I found that if the agent picks up X, so that X is in its inventory, and then tries to focus on X, this causes a negative score and the task ends. This might be a bit too strict for evaluation.

That seems like an issue. The agent should be able to focus on the task object even if it is already in its inventory.

@PeterAJansen
Collaborator Author

Hmm, I think in this case the object it's focusing on isn't a task object, right, @yuchenlin? Could you copy/paste the playthrough, so we can figure out if there's an issue?
