Training SAC with raw image as input #25
Hello,
The policies that I used are DDPG and SAC. I have updated the issue above. Thanks for your reply~
I wanted to say "policy architecture": it seems that you are not using a CNN if you are using the default hyperparameters... That explains your results.
Yes, I am using the default hyperparameters... May I know which part I should change in order to use the raw image to train a SAC model? In sac.yml, should I change the policy from policy: 'MlpPolicy' to policy: 'CnnPolicy'?
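If the config file is the route, the change could look like the sketch below. Only the `policy` line is the point here; the environment key and the other hyperparameter values are illustrative placeholders, not copied from this repo's sac.yml:

```yaml
# Hypothetical sac.yml fragment -- env key and values are placeholders.
DonkeyVae-v0-level-0:
  policy: 'CnnPolicy'    # was 'MlpPolicy'; CnnPolicy expects image observations
  buffer_size: 30000
  batch_size: 64
  learning_rate: 0.0003
```

Note that CnnPolicy assumes the observation space is an image, so the environment must return frames (not a flat VAE latent) for this to apply.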
I would recommend that you read the stable-baselines documentation and look at the rl zoo; there are plenty of examples of RL with images.
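For intuition on why the default MlpPolicy struggles here: a fully connected policy flattens each frame into one long vector, while the VAE hands the policy a compact latent code. The resolution and latent size below are illustrative assumptions, not values taken from this repo:

```python
# Compare the input size an MLP policy sees with raw pixels versus a
# VAE latent. Resolution and latent size are hypothetical examples.
def input_dims(height, width, channels):
    """Flattened dimensionality of a raw camera frame."""
    return height * width * channels

raw = input_dims(80, 160, 3)   # hypothetical camera resolution
latent = 32                    # hypothetical VAE latent size

print(raw, latent, raw // latent)  # raw input is orders of magnitude larger
```

A CNN sidesteps this by sharing weights spatially instead of learning one weight per pixel.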
Hello, I changed the policy to
@ChunJyeBehBeh did you manage to train without VAE ? |
@ChunJyeBehBeh @Adnan-annan I am also trying to train without VAE. Did you have any success yet? Would you mind sharing your results and the methods you've tried?
The policies that I have tried are DDPG and SAC. I used the master branch, and below are the two commands to reproduce the error.
python train.py --algo sac -n 5000
python train.py --algo ddpg -n 5000
Thanks for this good repo. It is a very good starting point for learning reinforcement learning in the autonomous driving area. I have successfully trained a SAC model using the VAE features as input.
Now I want to try using the raw image as input. I have set N_COMMAND_HISTORY to zero. I use the master branch. For the first 300 steps, the steering and throttle vary between -1 and 1 because random actions are sampled:
https://github.com/araffin/learning-to-drive-in-5-minutes/blob/fb82bc77593605711289e03f95dcfb6d3ea9e6c3/algos/custom_sac.py#L89
But after that, the policy keeps outputting an extreme value, either -1 or 1, for the steering. So the donkey car goes out of the lane quickly, and this keeps repeating without showing any learning progress.
The image below shows that the episode length drops from 95 to 50 steps after the policy starts to output the actions.
Below is the plot of the throttle output [SAC with raw image input]. It stays constant at 1 after a few episodes.
Below is the plot of the throttle output [SAC with VAE input]. The model tries to learn how to steer and varies the output between -1 and 1.
Sorry for opening so many issues.