How to run eqPly when socket connection cannot be made #650

Open
mateooo300 opened this issue Apr 29, 2019 · 11 comments

Comments

@mateooo300

I cannot run any of the example programs because the program cannot complete the connection handshake. The error log is below.

PID.Thread | Filename:line | ms | Message
3736.Main ../eq/init.cpp:115 0 Equalizer v2.1.0 initializing
3736.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from libPression.so.2.0.0
3736.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from /home/matthew/Equalizer2/Equalizer/build/lib/libPression.so.2.0.0
3736.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from /home/matthew/Equalizer2/Equalizer/build/lib/Debug/libPression.so.2.0.0
3736.Main /Pression/pression/plugin.cpp:167 3 Loaded 30 plugins from /mnt/to_hdd1/matthew/Equalizer2/Equalizer/build/lib/libEqualizerCompressor.so
3736.Main ../Lunchbox/lunchbox/init.cpp:70 3 Log level DEBUG topics 4 date Sun Apr 28 19:48:51 2019
3736.Cmd0 /Lunchbox/lunchbox/thread.cpp:155 5 Thread #3 type co::detail::CommandThread successfully initialized
3736.Rcv0 /Lunchbox/lunchbox/thread.cpp:155 5 Thread #2 type co::detail::ReceiverThread successfully initialized
3736.Main ../Collage/co/localNode.cpp:395 5 node 1ed07d1060b23668:6d2e12a6d268894a listening
3736.Main ../eq/fabric/client.cpp:74 5 Connecting server TCPIP#0#localhost##2032#default#
3736.Main ../Collage/co/localNode.cpp:1004 6 Connecting RP[7:node 36476dcdf8e2d45c:2d28029123718bb4 closed, TCPIP#0#localhost##2032#default#]
3736.Main llage/co/socketConnection.cpp:193 6 Could not connect to 'localhost:2032': Connection refused (111)
3736.Main ../Collage/co/localNode.cpp:1021 6 Node RP[7:node 36476dcdf8e2d45c:2d28029123718bb4 closed, TCPIP#102400#localhost##2032#default#] unreachable, all connections failed to connect
3736.Cmd1 /Lunchbox/lunchbox/thread.cpp:155 0 Thread #5 type co::detail::CommandThread successfully initialized
3736.Rcv1 /Lunchbox/lunchbox/thread.cpp:155 1 Thread #4 type co::detail::ReceiverThread successfully initialized
3736.Main ../Collage/co/localNode.cpp:395 1 node f7ed5c6bc7b62b2e:d4f88630a91b8bbe listening
3736.1 /Lunchbox/lunchbox/thread.cpp:155 1 Thread #1 type eq::server::(anonymous namespace)::ServerThread successfully initialized
Running server:
#Equalizer 0 ascii

global
{
}
server 
{
}

3736.Server ../eq/server/server.cpp:107 1
3736.Main ../Collage/co/localNode.cpp:1057 1 Node connection handshake timeout - RP[6:node 36476dcdf8e2d45c:2d28029123718bb4 closed] not a Collage node?
3736.Main ../examples/eqPly/eqPly.cpp:98 1 Can't open server
3736.Rcv0 ../Collage/co/localNode.cpp:1808 1 Can't create node of type 256, disconnecting
3736.Rcv1 ./Collage/co/fdConnection.cpp:82 2 Got EOF, closing ANON_PIPE#1024000###0#default#
3736.Rcv1 ../Collage/co/connection.cpp:282 2 Read on dead connection
3736.Cmd0 ../Collage/co/worker.ipp:53 2 Leaving worker thread co::WorkerThreadco::CommandQueue*
3736.Rcv0 ../Collage/co/localNode.cpp:1367 2 Leaving receiver thread of co::LocalNode*
3736.Main ../Collage/co/localNode.cpp:538 2 1 nodes connected during cleanup
3736.Main ../Collage/co/localNode.cpp:545 2 RP[4:node 1ed07d1060b23668:6d2e12a6d268894a closed]
3736.Main ../Collage/co/localNode.cpp:409 2 0 connections open after close
3736.Main ../Collage/co/init.cpp:84 3 DataOStream compressed 0 -> 0 of 268 @ -2147483648 MB/s 0 runs, saved 0 of 0 brutto sent (-nan%)

@eile
Member

eile commented Apr 30, 2019

Which Linux are you using? For some reason Collage can't send any data through a pipe() created to the process-local server. I've never seen this before.
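
To rule out the operating system itself, one can check that an anonymous pipe carries data at all. The following is only a minimal sketch under that assumption: it uses Python's `os.pipe()` and does not exercise Collage or its `co::PipeConnection` code path, so a passing result only shows that plain `pipe()` I/O works for this user on this machine. The function name `pipe_roundtrip` is made up for illustration.

```python
import os


def pipe_roundtrip(payload: bytes = b"ping") -> bytes:
    """Write payload into an anonymous pipe and read it back.

    Keep the payload small (well below the pipe buffer size) so the
    write cannot block while nobody is reading yet.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)
        # Closing the write end lets the reader see EOF after the payload.
        os.close(write_fd)
        write_fd = -1
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)


if __name__ == "__main__":
    print("pipe OK" if pipe_roundtrip() == b"ping" else "pipe FAILED")
```

If this prints "pipe FAILED" or raises an error, the problem lies below Equalizer; if it passes, the fault is more likely in how the process-local server connection is set up.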

@mateooo300
Author

I'm using Ubuntu 14.04.5. Could the issue stem from the fact that I am running the program over SSH rather than directly on the local machine? When I specify --eq-server, it seems to accept the host name and IP, but Collage is still unable to establish a connection.

@eile
Member

eile commented May 2, 2019

Running OpenGL programs through SSH is of little use nowadays, as indirect rendering is very poorly supported. I haven't tested this setup for at least 10 years. If you really need to run remotely, I recommend a VirtualGL setup.

@mateooo300
Author

The same issue arises when the example is run locally as well as through VirtualGL. Could the issue stem from anything else? Could this be a firewall issue, since the connection is being refused? We turned our firewall off in an attempt to establish a connection, but still no luck. I installed Equalizer under my user profile, which is not under /opt. Does it need to be installed somewhere else for connections to be established?
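
On Linux, "Connection refused (111)" is `ECONNREFUSED`, which normally means that nothing is accepting connections on that port (a firewall that actively rejects packets can look the same). A minimal sketch for checking this independently of Equalizer is shown below. The helper name `probe` is hypothetical; port 2032 is simply the port that appears in the log above.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a plain TCP connect and describe the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an errno value instead of raising for
        # errors reported by the underlying connect() call.
        code = sock.connect_ex((host, port))
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    # Any other errno (for example a timeout) is reported by name.
    return errno.errorcode.get(code, str(code))


# Example (not executed here): probe("localhost", 2032)
```

A result of "refused" with the firewall off suggests that no server is listening on that port, which is consistent with the client then falling back to a process-local server in the first log.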

@mateooo300
Author

Also, am I supposed to start 'eqServer' first? When I do that and pass the server address to 'eqPly', I get the following error:

PID.Thread | Filename:line | ms | Message
5610.Main ../eq/init.cpp:115 1 Equalizer v2.1.0 initializing
5610.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from libPression.so.2.0.0
5610.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from /home/matthew/Equalizer2/Equalizer/build/lib/libPression.so.2.0.0
5610.Main /Pression/pression/plugin.cpp:167 1 Loaded 22 plugins from /home/matthew/Equalizer2/Equalizer/build/lib/Debug/libPression.so.2.0.0
5610.Main /Pression/pression/plugin.cpp:167 3 Loaded 30 plugins from Equalizer2/Equalizer/build/lib/libEqualizerCompressor.so
5610.Main ../Lunchbox/lunchbox/init.cpp:70 4 Log level DEBUG topics 4 date Mon May 6 17:37:17 2019
5610.Rcv0 /Lunchbox/lunchbox/thread.cpp:155 5 Thread #2 type co::detail::ReceiverThread successfully initialized
5610.Cmd0 /Lunchbox/lunchbox/thread.cpp:155 5 Thread #3 type co::detail::CommandThread successfully initialized
5610.Main ../Collage/co/localNode.cpp:395 5 node 62aff2aa52890235:4995eaae636af959 listening
5610.Main ../eq/fabric/client.cpp:74 5 Connecting server TCPIP#0#caes_wks2##2032#default#
5610.Main ../Collage/co/localNode.cpp:1004 5 Connecting RP[7:node 24c13fff5e3fafee:8de285d2e319cf40 closed, TCPIP#0#caes_wks2##2032#default#]
5610.Main llage/co/socketConnection.cpp:202 6 Connected TCPIP#102400#caes_wks2##2032#default#
5610.Main ../Collage/co/localNode.cpp:1057 6 Node connection handshake timeout - RP[8:node 24c13fff5e3fafee:8de285d2e319cf40 closed, TCPIP#102400#caes_wks2##2032#default#] not a Collage node?
5610.Main ../seq/detail/application.cpp:70 6 Can't open Equalizer server
5610.Cmd0 ../Collage/co/worker.ipp:53 7 Leaving worker thread co::WorkerThreadco::CommandQueue*
5610.Rcv0 ../Collage/co/localNode.cpp:580 7 RP[3:node 9daaf4d159bde9d1:ea8f662180b95ac6 closed] disconnected from node 62aff2aa52890235:4995eaae636af959 closed
5610.Rcv0 chbox/lunchbox/referenced.cpp:34 7 Assert: _refCount == 0 [Deleting object with ref count 1] in:
11: lunchbox::abort(bool)
10: lunchbox::Referenced::~Referenced()
9: co::Buffer::~Buffer()
8: co::Buffer::~Buffer()
7: co::detail::BufferCache::flush()
6: co::BufferCache::flush()
5: co::LocalNode::_runReceiverThread()
4: co::detail::ReceiverThread::run()
3: lunchbox::Thread::_runChild()
2: lunchbox::Thread::runChild(void*)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f9d10f58184]
0: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f9d1000503d]

And the following information is generated in the terminal window which has the server running:

5368.Rcv0 llage/co/socketConnection.cpp:431 263237 Accepted TCPIP#102400#127.0.0.1##41406#default#
5368.Rcv0 ./Collage/co/fdConnection.cpp:82 263238 Got EOF, closing TCPIP#102400#127.0.0.1##41406#default#
5368.Rcv0 ../Collage/co/connection.cpp:282 263238 Read on dead connection
5368.Rcv0 /Collage/co/connectionSet.cpp:614 263238 Cannot select connection RP[5:co::SocketConnection 0x7f7c480038c0 state closed description TCPIP#102400#127.0.0.1##41406#default#], connection co::Connection* doesn't have a file descriptor
5368.Rcv0 ../Collage/co/localNode.cpp:580 263238 RP[3:node 62aff2aa52890235:4995eaae636af959 closed] disconnected from node 9daaf4d159bde9d1:ea8f662180b95ac6 listening, TCPIP#102400#caes_wks2##2032#default#

@mateooo300
Author

I attempted to compile the program on yet another machine with no firewall and fully updated versions of C++, CMake, Ninja, and Boost. We also tried a different graphics card. We received exactly the same errors as before. Could you provide version numbers for all the components needed to compile eqPly? Also, do you have compiled example executables that we can try so that we can troubleshoot further? Thanks

@mateooo300
Author

Any luck on this issue?

@eile
Member

eile commented Jun 13, 2019

I'm not sure what's going on on your system. I'm currently on a stock Debian with a fresh checkout and build, and everything works.

  • You should not need to launch a server manually
  • Your problem is that the client can't communicate with the in-process server. Your separate server has the same issue. 'Node connection handshake timeout' means that the client connected to the server, but the initial handshake did not get anything back from the server.
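
The difference between the two failure modes in the logs ("Connection refused" versus a connection that is established but never answered) can be reproduced with plain sockets. The sketch below is only an illustration under stated assumptions: it is not the Collage handshake, the `hello` payload is arbitrary, and the function name `classify_handshake` is made up.

```python
import socket


def classify_handshake(host, port, hello=b"hello", timeout=1.0):
    """Connect, send a greeting, and wait for any reply.

    Returns "refused" if no listener accepts the connection,
    "handshake-timeout" if the TCP connection succeeds but the peer
    stays silent, and "answered" if the peer sends anything back.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    with sock:
        try:
            sock.sendall(hello)
            reply = sock.recv(1024)
        except socket.timeout:
            return "handshake-timeout"
        except (ConnectionResetError, BrokenPipeError):
            return "closed-by-peer"
    # An empty read means the peer closed the connection without replying.
    return "answered" if reply else "closed-by-peer"
```

In these terms, the first eqPly log corresponds to "refused" for localhost:2032, while the run against the hand-started server corresponds to "handshake-timeout": the TCP connection is made, but nothing comes back.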

Can you run coNetperf locally:

  • ./bin/coNetperf -s :4242
  • ./bin/coNetperf -c :4242 in another shell

You should see messages such as: Send perf: 6324.23MB/s (6324.23pps)

@mateooo300
Author

mateooo300 commented Jun 13, 2019

Thank you for getting back to me.

When I run coNetperf on client side I see:

PID.Thread | Filename:line | ms | Message
26230.Main ../Lunchbox/lunchbox/init.cpp:70 0 Log level DEBUG topics 4 date Thu Jun 13 09:05:04 2019
26230.Main llage/co/socketConnection.cpp:202 1 Connected TCPIP#102400#127.0.0.1##4242#default#
Send perf: 3636.74MB/s (3636.74pps)
Send perf: 4047.74MB/s (4047.74pps)
Send perf: 4000.82MB/s (4000.82pps)......

When I run on server side I see:

PID.Thread | Filename:line | ms | Message
26228.Main ../Lunchbox/lunchbox/init.cpp:70 0 Log level DEBUG topics 4 date Thu Jun 13 09:04:53 2019
26228.1 llage/co/socketConnection.cpp:704 1 Listening on caes_wks2[*]:4242 (TCPIP#102400#caes_wks2##4242#default#)
26228.1 llage/co/socketConnection.cpp:431 10804 Accepted TCPIP#102400#127.0.0.1##56120#default#
26228.1 /Lunchbox/lunchbox/thread.cpp:155 10805 Thread #1 type (anonymous namespace)::Selector successfully initialized
Recv perf: 3604.17MB/s (3604.17pps) from TCPIP#102400#127.0.0.1##56120#default#
Recv perf: 4048.28MB/s (4048.28pps) from TCPIP#102400#127.0.0.1##56120#default#
Recv perf: 4000.33MB/s (4000.33pps) from TCPIP#102400#127.0.0.1##56120#default#

So they're communicating, however the example program still timed out. Do I have a mismatch somewhere in the client or server?

@eile
Member

eile commented Jun 14, 2019

> So they're communicating, however the example program still timed out. Do I have a mismatch somewhere in the client or server?

If you're not hand-starting the server, that can't happen.

Let's move up one level: Please retry the same with coNodePerf. This should use a similar code path as the client/server handshake.

@mateooo300
Author

When I run coNodePerf on server side I see:
./coNodeperf -s :4242
PID.Thread | Filename:line | ms | Message
18423.Main ../Lunchbox/lunchbox/init.cpp:70 0 Log level DEBUG topics 4 date Fri Jun 14 09:12:30 2019
18423.Cmd0 /Lunchbox/lunchbox/thread.cpp:155 2 Thread #2 type co::detail::CommandThread successfully initialized
18423.Rcv0 /Lunchbox/lunchbox/thread.cpp:155 2 Thread #1 type co::detail::ReceiverThread successfully initialized
18423.Main ../Collage/co/localNode.cpp:395 2 node e326aec2b0dee12c:d151883e1b1008a2 listening
18423.Main llage/co/socketConnection.cpp:704 3 Listening on caes_wks2[*]:43472 (TCPIP#102400#caes_wks2##43472#default#)

When I run coNodePerf on client side I see:

./coNodeperf -c :4242
PID.Thread | Filename:line | ms | Message
18503.Main ../Lunchbox/lunchbox/init.cpp:70 0 Log level DEBUG topics 4 date Fri Jun 14 09:13:15 2019
18503.Cmd0 /Lunchbox/lunchbox/thread.cpp:155 1 Thread #2 type co::detail::CommandThread successfully initialized
18503.Rcv0 /Lunchbox/lunchbox/thread.cpp:155 1 Thread #1 type co::detail::ReceiverThread successfully initialized
18503.Main ../Collage/co/localNode.cpp:395 1 node b64e68d5ba8fdd12:56202c66bff6b88f listening
18503.Main ../Collage/co/localNode.cpp:1004 2 Connecting RP[4:node 1d789f8d5dfa3a2b:a1917fd8c4e38776 closed, TCPIP#0###4242#default#]
18503.Main llage/co/socketConnection.cpp:193 2 Could not connect to '127.0.0.1:4242': Connection refused (111)
18503.Main ../Collage/co/localNode.cpp:1021 2 Node RP[4:node 1d789f8d5dfa3a2b:a1917fd8c4e38776 closed, TCPIP#102400#127.0.0.1##4242#default#] unreachable, all connections failed to connect
18503.Main llage/co/socketConnection.cpp:704 2 Listening on caes_wks2[*]:40233 (TCPIP#102400#caes_wks2##40233#default#)

The program seems to hang when trying to connect to my localhost (127.0.0.1).
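
One detail in the logs above may be relevant: the server side was started with `-s :4242` but reports "Listening on caes_wks2[*]:43472", and the client's connection to 127.0.0.1:4242 is then refused. Whether coNodeperf interprets `-s` the same way coNetperf does is not confirmed anywhere in this thread, so treat this as an observation, not a diagnosis. A small sketch for pulling the actual listening port out of such log lines (the helper name `listening_ports` is hypothetical):

```python
import re

# Matches e.g. "Listening on caes_wks2[*]:43472 (TCPIP#...)"
_LISTEN_RE = re.compile(r"Listening on (\S+?)(?:\[\*\])?:(\d+)")


def listening_ports(log_text: str) -> list:
    """Return every port number reported in 'Listening on ...' log lines."""
    return [int(match.group(2)) for match in _LISTEN_RE.finditer(log_text)]


server_log = (
    "18423.Main llage/co/socketConnection.cpp:704 3 Listening on "
    "caes_wks2[*]:43472 (TCPIP#102400#caes_wks2##43472#default#)"
)
requested_port = 4242
ports = listening_ports(server_log)
if requested_port not in ports:
    print(f"requested port {requested_port}, but listening on {ports}")
```

If the ports differ, pointing the client at the port the server actually reports would be a quick way to test whether the node-level handshake works.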
