skein.exceptions.ConnectionError: Unable to connect to application #231

Open
lucio-f opened this issue Mar 24, 2021 · 0 comments

Hi, I'm trying to run the echo-server example, but I can't communicate with the running server. The server starts fine, but when I run the client (or even just try to retrieve something from the key-value store) I always get the same error: skein.exceptions.ConnectionError: Unable to connect to application. For example:

$ kinit
$ skein driver start
$ APPID=$(skein application submit ./spec.yaml)
$ python

>>> import skein
>>> client = skein.Client(log_level="debug")
21/03/24 16:34:54 DEBUG skein.Driver: Starting Skein version 0.8.1
21/03/24 16:34:54 DEBUG skein.Driver: Logging in using ticket cache
21/03/24 16:34:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:34:55 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:34:56 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:34:56 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:34:56 INFO skein.Driver: Driver started, listening on 45765
21/03/24 16:34:56 DEBUG skein.Driver: Reporting gRPC server port back to the launching process
>>> apps = client.get_applications()
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:35:34 INFO conf.Configuration: resource-types.xml not found
21/03/24 16:35:34 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
>>> app = client.connect(apps[0].id)
>>> app.ui
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 86, in __repr__
    return "WebUI<address=%r>" % self.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 83, in address
    return self._ui_info.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/utils.py", line 210, in __get__
    res = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 59, in _ui_info
    resp = self._client._call('UiInfo', proto.UIInfoRequest())
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/core.py", line 279, in _call
    raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to application
  • Relevant logs/tracebacks
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:application.master.log
LogLastModifiedTime:Wed Mar 24 16:32:40 +0100 2021
LogLength:2073
LogContents:
21/03/24 16:32:37 INFO skein.ApplicationMaster: Starting Skein version 0.8.1
21/03/24 16:32:37 INFO skein.ApplicationMaster: Running as user luciof
21/03/24 16:32:37 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/03/24 16:32:37 INFO skein.ApplicationMaster: Application specification successfully loaded
21/03/24 16:32:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:32:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:32:38 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:38 INFO skein.ApplicationMaster: gRPC server started at epod071.vgt.vito.be:43071
21/03/24 16:32:39 INFO skein.ApplicationMaster: WebUI server started at epod071.vgt.vito.be:34293
21/03/24 16:32:39 INFO skein.ApplicationMaster: Registering application with resource manager
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:39 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO skein.ApplicationMaster: Initializing service 'server'.
21/03/24 16:32:39 INFO skein.ApplicationMaster: REQUESTED: server_0
21/03/24 16:32:40 INFO skein.ApplicationMaster: Starting container_e4897_1611572280718_141323_01_000002...
21/03/24 16:32:40 INFO skein.ApplicationMaster: RUNNING: server_0 on container_e4897_1611572280718_141323_01_000002
End of LogType:application.master.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
***************************************************************************************


Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:937
LogContents:
ls -l:
total 16
-rw-------. 1 luciof hadoop  491 Mar 24 16:32 container_tokens
-rwx------. 1 luciof hadoop 5303 Mar 24 16:32 launch_container.sh
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
144966289    4 drwxr-s---   3 luciof   hadoop       4096 Mar 24 16:32 .
144966295    4 -r-x------   1 luciof   luciof       1013 Mar 24 16:32 ./.skein.crt
144966299    4 -rw-------   1 luciof   hadoop        491 Mar 24 16:32 ./container_tokens
144966298    8 -rwx------   1 luciof   hadoop       5303 Mar 24 16:32 ./launch_container.sh
168559658    4 -r-x------   1 luciof   luciof       1704 Mar 24 16:32 ./.skein.pem
127926301    4 -r-x------   1 luciof   luciof       1407 Mar 24 16:32 ./.skein.proto
144966297    4 drwxr-s---   2 luciof   hadoop       4096 Mar 24 16:32 ./tmp
144966292 7660 -r-x------   1 luciof   luciof    7842343 Mar 24 16:32 ./.skein.jar
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************


End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************


Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:35,545 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:36,471 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************************


Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************


Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:5303
LogContents:
#!/bin/bash

set -o pipefail -e
export PRELAUNCH_OUT="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export JAVA_HOME=${JAVA_HOME:-"/usr/java/default"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/3.1.4.0-315/hadoop/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/3.1.4.0-315/hadoop-yarn"}
export HADOOP_HOME=${HADOOP_HOME:-"/usr/hdp/3.1.4.0-315/hadoop"}
export PATH=${PATH:-"/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin"}
export HADOOP_TOKEN_FILE_LOCATION="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/container_tokens"
export CONTAINER_ID="container_e4897_1611572280718_141323_01_000001"
export NM_PORT="45454"
export NM_HOST="epod071.vgt.vito.be"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323"
export LOCAL_USER_DIRS="/data1/hadoop/yarn/local/usercache/luciof/,/data2/hadoop/yarn/local/usercache/luciof/,/data3/hadoop/yarn/local/usercache/luciof/"
export LOG_DIRS="/data1/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data2/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export USER="luciof"
export LOGNAME="luciof"
export HOME="/home/"
export PWD="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export JVM_PID="$$"
export MALLOC_ARENA_MAX="4"
export NM_AUX_SERVICE_spark_shuffle=""
export NM_AUX_SERVICE_timeline_collector=""
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export NM_AUX_SERVICE_spark2_shuffle=""
export SKEIN_APPLICATION_ID="application_1611572280718_141323"
export LANG="en_US.UTF-8"
export APP_SUBMIT_TIME_ENV="1616599954132"
export TIMELINE_FLOW_NAME_TAG="echoserver"
export TIMELINE_FLOW_VERSION_TAG="1"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1611572280718_141323"
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*"
export TIMELINE_FLOW_RUN_ID_TAG="1616599954132"
echo "Setting up job resources"
ln -sf "/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/11/.skein.pem" ".skein.pem"
ln -sf "/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/13/.skein.proto" ".skein.proto"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/.skein.crt" ".skein.crt"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/skein.jar" ".skein.jar"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
chmod 640 "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
ls -l 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx128M -Dskein.log.level=INFO -Dskein.log.directory=/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001 com.anaconda.skein.ApplicationMaster hdfs://hacluster/user/luciof/.skein/application_1611572280718_141323 >/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/application.master.log 2>&1"
End of LogType:launch_container.sh.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
************************************************************************************


End of LogType:server.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
***************************************************************************


Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:882033
LogContents:
ls -l:
total 24
-rw-------. 1 luciof hadoop  427 Mar 24 16:32 container_tokens
lrwxrwxrwx. 1 luciof hadoop  115 Mar 24 16:32 environment -> /data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/environment.tar.gz
-rwx------. 1 luciof hadoop 4791 Mar 24 16:32 launch_container.sh
lrwxrwxrwx. 1 luciof hadoop  106 Mar 24 16:32 server.py -> /data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/server.py
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
161091031    4 drwxr-s---   3 luciof   hadoop       4096 Mar 24 16:32 .
161090987    4 -r-x------   1 luciof   luciof       1334 Mar 24 16:32 ./server.py
161091039    8 -rwx------   1 luciof   hadoop       4791 Mar 24 16:32 ./launch_container.sh
42467355    4 -r-x------   1 luciof   luciof         49 Mar 24 16:32 ./.skein.sh
161091032    4 drwxr-s---   2 luciof   hadoop       4096 Mar 24 16:32 ./tmp
42467353    4 -r-x------   1 luciof   luciof       1704 Mar 24 16:32 ./.skein.pem
41033904    4 -r-x------   1 luciof   luciof       1013 Mar 24 16:32 ./.skein.crt
41025562    4 drwx------  11 luciof   luciof       4096 Mar 24 16:32 ./environment
41025565    4 drwx------   3 luciof   luciof       4096 Mar 24 16:32 ./environment/ssl
[rest of the unpacking]
161091040    4 -rw-------   1 luciof   hadoop        427 Mar 24 16:32 ./container_tokens
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************


End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************


Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:42 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:41,690 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:42,505 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************************


Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************
  • Version information

    • Python version: 3.8.8
    • Hadoop version: 3.1.1.3.1.4.0-315
    • Skein version: 0.8.1
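
One possible way to narrow this down, as a minimal sketch rather than a confirmed diagnosis: the application master log above reports its gRPC server at epod071.vgt.vito.be:43071, so a plain TCP check from the client machine would show whether that port is reachable at all (for instance, whether a firewall between the edge node and the worker nodes is in the way). The helper name can_connect and the 3-second timeout are assumptions made for illustration; only the Python standard library is used, not the Skein API.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests raw TCP
    reachability and says nothing about TLS or gRPC-level problems.
    """
    try:
        # create_connection resolves the host name and opens the socket;
        # DNS failures, refusals, and timeouts all surface as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For this report, the call would be can_connect("epod071.vgt.vito.be", 43071), run on the same machine as the Python session above. A False result would point at name resolution or network access between the client and the node running the application master; a True result would suggest looking elsewhere, for example at the certificates used for the connection.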