Multiple starting issues #23

Open
rtasik opened this issue Apr 3, 2019 · 7 comments

@rtasik

rtasik commented Apr 3, 2019

Dear community,

first of all, I am not able to install torcs 1.3.6 by following the commands you provided. I tried this multiple times, reverting my VM each time, first following your version 1 branch and then the master branch, but after the installation torcs is simply not found. On my other VM, I tried using MADRaS with gym-torcs 1.3.1, which I had already installed from yanpanlau's repository (DDPG-Keras-TensorFlow). I just wanted to explore DDPG driving with multiple vehicles, which is how I came to your repository. I don't know whether the version is the issue, but it would be enough for me to get the experiments running, even with version 1.3.1. So I am trying to run the examples as described, which turns out to be problematic. More precisely:

Behavior reflex -> single agent:
For Quickrace, one scr_server is already selected. Then I close torcs and go with the following:

robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.behavior_reflex.playGame_DDPG 3101
is_training : 1
Starting best_reward : -10000
600000.0
6000
10000
1
config_file : ~/.torcs/config/raceman/quickrace.xml
2019-04-03 11:20:54.783635: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-03 11:20:54.795673: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2019-04-03 11:20:54.801725: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x34ab010 executing computations on platform Host. Devices:
2019-04-03 11:20:54.801761: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/robert/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Could not find old network weights
I have been asked to use port:  3101
Trying to set connection
Trying to establish connection
Waiting for server on 3101............
Count Down : 15
Trying to establish connection
Waiting for server on 3101............
Count Down : 14
Trying to establish connection
Waiting for server on 3101............

I tried again, this time starting a quick race in torcs first (because of the "Waiting for server on 3101" message) and then executing the script again with port 3101. Interestingly, the simulation starts and displays the vehicle and the track, but the car does not start driving. From the first console, where I started torcs, I get:

Visual Properties Report
------------------------
Compatibility mode, properties unknown.
Can't open file tracks/oval/backyard4/backyard4.png
gfParmSetStr: fopen (config/raceman/quickrace.xml, "wb") failed
WARNING: grscene:initBackground Failed to open shadow2.rgb for reading
WARNING:         no shadow mapping on cars for this track 
Waiting for request on port 3101
OpenAL backend info:
  Vendor: OpenAL Community
  Renderer: OpenAL Soft
  Version: 1.1 ALSOFT 1.18.2
  Available sources: 256
  Available buffers: 1024 or more
  Dynamic Sources: requested: 235, created: 235
  #static sources: 21
  #dyn sources   : 235
gfParmSetStr: fopen (config/graph.xml, "wb") failed
Timeout for client answer
Timeout for client answer
Timeout for client answer
Timeout for client answer

From the second console, where I started the single agent, the output looks familiar again:

robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.behavior_reflex.playGame_DDPG 3101
is_training : 1
Starting best_reward : -10000
600000.0
6000
10000
1
config_file : ~/.torcs/config/raceman/quickrace.xml
2019-04-03 11:30:29.466835: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-03 11:30:29.470110: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2019-04-03 11:30:29.470318: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x1967420 executing computations on platform Host. Devices:
2019-04-03 11:30:29.470376: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/robert/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Could not find old network weights
I have been asked to use port:  3101
Trying to set connection
Trying to establish connection
Trying to set connection
Trying to establish connection
Waiting for server on 3101............
Count Down : 15
Trying to establish connection
Waiting for server on 3101............
Count Down : 14
Trying to establish connection
Waiting for server on 3101............

Behavior reflex -> multiple agents

robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.behavior_reflex.multi_agent
numb of workers is3
2019-04-03 11:36:17.432516: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-03 11:36:17.439372: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2019-04-03 11:36:17.439582: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x1cf2d40 executing computations on platform Host. Devices:
2019-04-03 11:36:17.439647: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/robert/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
/home/robert/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1702: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).
  warnings.warn('An interactive session is already active. This can '
Could not find old network weights
I have been asked to use port:  3001
Trying to set connection
Trying to establish connection
Could not find old network weights
I have been asked to use port:  3002
Trying to set connection
Trying to establish connection
Waiting for server on 3001............
Count Down : 15
Trying to establish connection
Could not find old network weights
I have been asked to use port:  3003
Trying to set connection
Trying to establish connection
Waiting for server on 3002............
Count Down : 15
Trying to establish connection
Waiting for server on 3001............
Count Down : 14
Trying to establish connection
Waiting for server on 3003............
Count Down : 15
Trying to establish connection
Waiting for server on 3002............
Count Down : 14
Trying to establish connection
Waiting for server on 3001............
Count Down : 13
Trying to establish connection
Waiting for server on 3003............
Count Down : 14
Trying to establish connection
Waiting for server on 3002............

PID -> single agent

robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.pid.playGame_DDPG_pid 3001
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/robert/Desktop/MADRaS/MADRaS/example_controllers/pid/playGame_DDPG_pid.py", line 15, in <module>
    from utils.gym_madras import MadrasEnv
ModuleNotFoundError: No module named 'utils.gym_madras'

For the multiple agents file, I receive the same error as for the single agent.

I am hoping that version 1.3.1 is not the cause of the problem, because I saw the vehicle, the track, etc., so the simulation itself seems to work. I think the issue is rather how I am starting it, and maybe the missing utils.gym_madras. Can somebody please help in any way? I am becoming really desperate and would be very grateful for any help!

Best regards,
Robert

@rudrasohan
Member

rudrasohan commented Apr 3, 2019

Hello Robert,

Thanks a lot for identifying these issues.

The wiki for the Version1 Branch has not been properly updated in a long time, which is why you faced the above problems.

  • A crucial step was missing from the running instructions: sh scripts/startTorcs.sh. Running this script in a separate terminal tab automatically relaunches TORCS whenever an episode ends. Sometimes the autostart.sh script needs a bit of tinkering, because xte (which sends virtual keystrokes) behaves a bit differently on each machine and depends on the state in which the sim window opens each time.

  • For training and saving on the Version 1 branch, toggle train_indicator. Before running in training mode, you need to create two folders named weights and save_network_checkpoints at the base of the repo. The behavior_reflex and pid examples use these, respectively, for saving.

  • To run the experiments, you need to specify the port through which the scripts interact with the sim. The port depends on the scr-server you have chosen; e.g., scr-server1 waits on port 3001, and so on.

  • For TORCS, you can use the TORCS install instructions on master even for the Version1 branch, as that version contains some major fixes over the original.
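
The port convention in the third point can be sketched as follows. This is a minimal illustration, assuming the port is simply a base value plus the scr-server index; the helper name `scr_server_port` and the `base_port` parameter are hypothetical and not part of MADRaS. The default base of 3000 matches the example given (scr-server1 on 3001); the logs earlier in this thread show an install waiting on 3101, which would correspond to a base of 3100.

```python
def scr_server_port(index, base_port=3000):
    """Return the port an scr-server is assumed to wait on.

    Hypothetical helper: assumes port = base_port + index, as in
    "scr-server1 waits on port 3001".
    """
    if index < 1:
        raise ValueError("scr-server indices start at 1")
    return base_port + index


# Ports for three scr-servers under the assumed default base.
ports = [scr_server_port(i) for i in range(1, 4)]
print(ports)
```

The value passed on the command line should then match whatever port TORCS reports in its "Waiting for request on port ..." line.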

We will be updating the Version1 wiki ASAP.

@rtasik
Author

rtasik commented Apr 4, 2019

Hello Sohan!

Thank you very much for answering so quickly! Also, thank you for clarifying things.

I launched sh startTorcs.sh and clicked on Quick Race -> New Race; the scr_server then loads and prints Waiting for request on port 3101. In the other shell, I execute python3 -m example_controllers.behavior_reflex.playGame_DDPG 3101. So the port is correct and the simulation starts, but the vehicle does not drive: the shell running sh startTorcs.sh prints timeouts while the other shell keeps trying to connect, as you can see in the attached screenshot below.

[Screenshot: playGame_DDPG]

For the multi_agent experiment, the vehicles are there but they just remain idle. The result is the same as with the single agent, only that three instances are trying to connect:

robert@robert-VirtualBox:~$ cd Desktop/MADRaS/MADRaS/
robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.behavior_reflex.multi_agent
numb of workers is3
2019-04-04 10:52:19.272680: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-04 10:52:19.279133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2019-04-04 10:52:19.279341: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x1895370 executing computations on platform Host. Devices:
2019-04-04 10:52:19.279413: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/robert/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
/home/robert/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1702: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).
  warnings.warn('An interactive session is already active. This can '
Could not find old network weights
I have been asked to use port:  3101
Trying to set connection
Trying to establish connection
Trying to set connection
Trying to establish connection
Waiting for server on 3101............
Count Down : 15
Trying to establish connection
Could not find old network weights
I have been asked to use port:  3102
Trying to set connection
Trying to establish connection
Trying to set connection
Trying to establish connection
Could not find old network weights
I have been asked to use port:  3103
Trying to set connection
Trying to establish connection
Trying to set connection
Trying to establish connection
Waiting for server on 3101............
Count Down : 14
Trying to establish connection
Waiting for server on 3102............
Count Down : 15
Trying to establish connection
Waiting for server on 3103............
Count Down : 15
Trying to establish connection
Waiting for server on 3101............
Count Down : 13
Trying to establish connection
Waiting for server on 3102............
Count Down : 14
Trying to establish connection
Waiting for server on 3103............

Also for multi_agent, training is activated with 1. The folders weights and save_network_checkpoints are at the base of the repo as you said, i.e. in Desktop/MADRaS. With this placement, training did not occur. Are the folders really in the correct directory?
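
One way to rule out a misplaced folder is to create both directories programmatically under whatever path is taken to be the repository base. A minimal sketch, assuming the base is the directory passed in; the function name `ensure_save_dirs` is hypothetical, and only the two folder names come from the reply above.

```python
import os


def ensure_save_dirs(repo_base):
    """Create the two save folders under repo_base if they are missing."""
    paths = []
    for name in ("weights", "save_network_checkpoints"):
        path = os.path.join(repo_base, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```

Calling `ensure_save_dirs(os.getcwd())` from the directory in which the `python3 -m ...` command is run would place the folders relative to that working directory. Whether the scripts resolve the folders relative to the working directory or to the repository root is not stated in this thread.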

Maybe there is another code fragment or line that needs to be adjusted for training/testing the vehicles? I simply don't know why it is not working, since the ports are correct. For the PID examples, I still receive both errors that I described in my first post above.

I am running Ubuntu 18.04.2 and I can install neither the Version1 torcs nor the master torcs. I followed the installation guide, but for both versions torcs is not found afterwards. I have also reported this to the torcs members, and I am still trying to install it. But since the simulation opens and reacts to MADRaS, I guess the version does not matter. What puzzles me even more is why I was able to install version 1.3.1.

Best,
Robert

@rudrasohan
Member

rudrasohan commented Apr 5, 2019

When you are using python3 -m example_controllers.behavior_reflex.playGame_DDPG 3101 the last value specifies the port number, so it must be set accordingly.

To run the experiments, you need to specify the port through which the scripts interact with the sim. The port depends on the scr-server you have chosen; e.g., scr-server1 waits on port 3001, and so on.

So what you need to do is replace 3101 with 3001, i.e. run python3 -m example_controllers.behavior_reflex.playGame_DDPG 3001 for single-agent training with the vehicle scr-server1.

I think multi-agent training is also being started as the port was changed. And for multi-agent training, do make sure that you have added as many scr-servers as specified in the code. See the running instructions on how to add more scr-servers.

@rtasik
Author

rtasik commented Apr 6, 2019

I am aware of the specification of the port numbers. Despite the correct selection of the ports, the server yields Timeout for client answer and the client Waiting for server.

[Screenshot: SingleAgent]

Replacing 3101 with 3001 does not solve this issue: first, the race does not start at all, and second, scr_server1 is Waiting for request on port 3101, not 3001. The same holds for multi_agent.
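
A quick way to see which port is actually occupied is to test whether anything on the machine is bound to it. The sketch below assumes the scr_server listens on a UDP port (an assumption about the SCR protocol, not something stated in this thread); `udp_port_in_use` is a hypothetical helper and not part of MADRaS or TORCS.

```python
import socket


def udp_port_in_use(port, host=""):
    """Return True if a UDP socket is already bound to `port` locally.

    Tries to bind a fresh UDP socket; a failure to bind is taken to mean
    that another process (for example a TORCS scr_server) holds the port.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((host, port))
    except OSError:
        return True
    finally:
        probe.close()  # always release the probe socket
    return False
```

With TORCS printing Waiting for request on port 3101, one would expect `udp_port_in_use(3101)` to return True and `udp_port_in_use(3001)` to return False. The check only shows that some process holds the port, not which one.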

@rudrasohan
Member

Which version of torcs are you using? Is it different from the one specified in our repo?

@rtasik
Author

rtasik commented Apr 6, 2019

I am using torcs version 1.3.1, which is based on the gym-torcs repo https://github.com/ugo-nama-kun/gym_torcs. As mentioned in my first comment, I was not able to install the torcs version from your repo using your installation instructions. For this attempt, I created a clean VM and tried to install your torcs 1.3.6 with exactly the same commands.

Thus, make && make install yields:

[Screenshot 2019-04-06 at 20:34:33]

I tried executing both commands separately with a preceding sudo. I get the following errors:

sudo make:
[Screenshot: sudomakeinstall]

sudo make install:
[Screenshot: sudomake]

Do you think that I am still missing any packages? I have installed all packages that are listed on your README.

@rtasik
Author

rtasik commented Apr 7, 2019

I was able to install the torcs from your repo, so torcs 1.3.6 is able to run. The single agent as well as the multi agent from behavior reflex are connecting. But again, I am facing some issues:

  • When starting training with the behavior reflex single agent, the car turns right in the first episode and drives into a wall. Then, because of startTorcs.sh, torcs is relaunched and displays SELECT RACE, but the other shell continues with what looks like training, displaying increasing episode and step counts. Selecting a race in torcs then again shows the car turning right and driving into a wall. This does not seem right, does it?

  • After some training episodes, torcs reports cannot bind socket, which then prevents torcs from starting.

  • For the pid training with single as well as multi agent I receive the following:

robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$ python3 -m example_controllers.pid.playGame_DDPG_pid 3001
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/robert/Desktop/MADRaS/MADRaS/example_controllers/pid/playGame_DDPG_pid.py", line 15, in <module>
    from utils.gym_madras import MadrasEnv
ModuleNotFoundError: No module named 'utils.gym_madras'
robert@robert-VirtualBox:~/Desktop/MADRaS/MADRaS$
  • For the command python3 -m example_controllers.pid.multi_agent, the error is the same as above. What can I do to solve this?

Can you please help me out again? I would really appreciate any further help here.
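
The ModuleNotFoundError above can be narrowed down by asking Python whether it can locate the module at all from the current working directory. A minimal sketch using the standard library; `module_is_importable` is a hypothetical helper name, and the check says nothing about why the module is missing.

```python
import importlib.util


def module_is_importable(name):
    """Return True if `name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, AttributeError):
        # Raised when a parent (e.g. `utils`) is missing or is not a package.
        return False
```

Running `module_is_importable("utils.gym_madras")` in a Python session started from the same directory as the failing command shows whether a `utils` package containing `gym_madras` is visible there. If it returns False while `module_is_importable("utils")` returns True, the package is found but does not contain that module on the checked-out branch.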
