Hi, I'm trying to run the echo-server example but I can't communicate with the running server. The server starts fine, but when I run the client (or even just try to retrieve something from the key-value store) I always get the same error: skein.exceptions.ConnectionError: Unable to connect to application. For example:
$ kinit
$ skein driver start
$ APPID=$(skein application submit ./spec.yaml)
$ python
>>> import skein
>>> client = skein.Client(log_level="debug")
21/03/24 16:34:54 DEBUG skein.Driver: Starting Skein version 0.8.1
21/03/24 16:34:54 DEBUG skein.Driver: Logging in using ticket cache
21/03/24 16:34:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:34:55 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:34:56 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:34:56 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:34:56 INFO skein.Driver: Driver started, listening on 45765
21/03/24 16:34:56 DEBUG skein.Driver: Reporting gRPC server port back to the launching process
>>> apps = client.get_applications()
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:35:34 INFO conf.Configuration: resource-types.xml not found
21/03/24 16:35:34 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
>>> app = client.connect(apps[0].id)
>>> app.ui
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 86, in __repr__
    return "WebUI<address=%r>" % self.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 83, in address
    return self._ui_info.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/utils.py", line 210, in __get__
    res = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 59, in _ui_info
    resp = self._client._call('UiInfo', proto.UIInfoRequest())
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/core.py", line 279, in _call
    raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to application
Relevant logs/tracebacks
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:application.master.log
LogLastModifiedTime:Wed Mar 24 16:32:40 +0100 2021
LogLength:2073
LogContents:
21/03/24 16:32:37 INFO skein.ApplicationMaster: Starting Skein version 0.8.1
21/03/24 16:32:37 INFO skein.ApplicationMaster: Running as user luciof
21/03/24 16:32:37 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/03/24 16:32:37 INFO skein.ApplicationMaster: Application specification successfully loaded
21/03/24 16:32:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:32:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:32:38 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:38 INFO skein.ApplicationMaster: gRPC server started at epod071.vgt.vito.be:43071
21/03/24 16:32:39 INFO skein.ApplicationMaster: WebUI server started at epod071.vgt.vito.be:34293
21/03/24 16:32:39 INFO skein.ApplicationMaster: Registering application with resource manager
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:39 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO skein.ApplicationMaster: Initializing service 'server'.
21/03/24 16:32:39 INFO skein.ApplicationMaster: REQUESTED: server_0
21/03/24 16:32:40 INFO skein.ApplicationMaster: Starting container_e4897_1611572280718_141323_01_000002...
21/03/24 16:32:40 INFO skein.ApplicationMaster: RUNNING: server_0 on container_e4897_1611572280718_141323_01_000002
End of LogType:application.master.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
***************************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:937
LogContents:
ls -l:
total 16
-rw-------. 1 luciof hadoop 491 Mar 24 16:32 container_tokens
-rwx------. 1 luciof hadoop 5303 Mar 24 16:32 launch_container.sh
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
144966289 4 drwxr-s--- 3 luciof hadoop 4096 Mar 24 16:32 .
144966295 4 -r-x------ 1 luciof luciof 1013 Mar 24 16:32 ./.skein.crt
144966299 4 -rw------- 1 luciof hadoop 491 Mar 24 16:32 ./container_tokens
144966298 8 -rwx------ 1 luciof hadoop 5303 Mar 24 16:32 ./launch_container.sh
168559658 4 -r-x------ 1 luciof luciof 1704 Mar 24 16:32 ./.skein.pem
127926301 4 -r-x------ 1 luciof luciof 1407 Mar 24 16:32 ./.skein.proto
144966297 4 drwxr-s--- 2 luciof hadoop 4096 Mar 24 16:32 ./tmp
144966292 7660 -r-x------ 1 luciof luciof 7842343 Mar 24 16:32 ./.skein.jar
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************
End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:35,545 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:36,471 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:5303
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export JAVA_HOME=${JAVA_HOME:-"/usr/java/default"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/3.1.4.0-315/hadoop/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/3.1.4.0-315/hadoop-yarn"}
export HADOOP_HOME=${HADOOP_HOME:-"/usr/hdp/3.1.4.0-315/hadoop"}
export PATH=${PATH:-"/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin"}
export HADOOP_TOKEN_FILE_LOCATION="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/container_tokens"
export CONTAINER_ID="container_e4897_1611572280718_141323_01_000001"
export NM_PORT="45454"
export NM_HOST="epod071.vgt.vito.be"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323"
export LOCAL_USER_DIRS="/data1/hadoop/yarn/local/usercache/luciof/,/data2/hadoop/yarn/local/usercache/luciof/,/data3/hadoop/yarn/local/usercache/luciof/"
export LOG_DIRS="/data1/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data2/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export USER="luciof"
export LOGNAME="luciof"
export HOME="/home/"
export PWD="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export JVM_PID="$$"
export MALLOC_ARENA_MAX="4"
export NM_AUX_SERVICE_spark_shuffle=""
export NM_AUX_SERVICE_timeline_collector=""
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export NM_AUX_SERVICE_spark2_shuffle=""
export SKEIN_APPLICATION_ID="application_1611572280718_141323"
export LANG="en_US.UTF-8"
export APP_SUBMIT_TIME_ENV="1616599954132"
export TIMELINE_FLOW_NAME_TAG="echoserver"
export TIMELINE_FLOW_VERSION_TAG="1"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1611572280718_141323"
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*"
export TIMELINE_FLOW_RUN_ID_TAG="1616599954132"
echo "Setting up job resources"
ln -sf "/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/11/.skein.pem" ".skein.pem"
ln -sf "/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/13/.skein.proto" ".skein.proto"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/.skein.crt" ".skein.crt"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/skein.jar" ".skein.jar"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
chmod 640 "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
ls -l 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx128M -Dskein.log.level=INFO -Dskein.log.directory=/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001 com.anaconda.skein.ApplicationMaster hdfs://hacluster/user/luciof/.skein/application_1611572280718_141323 >/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/application.master.log 2>&1"
End of LogType:launch_container.sh.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
************************************************************************************
End of LogType:server.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
***************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:882033
LogContents:
ls -l:
total 24
-rw-------. 1 luciof hadoop 427 Mar 24 16:32 container_tokens
lrwxrwxrwx. 1 luciof hadoop 115 Mar 24 16:32 environment -> /data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/environment.tar.gz
-rwx------. 1 luciof hadoop 4791 Mar 24 16:32 launch_container.sh
lrwxrwxrwx. 1 luciof hadoop 106 Mar 24 16:32 server.py -> /data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/server.py
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
161091031 4 drwxr-s--- 3 luciof hadoop 4096 Mar 24 16:32 .
161090987 4 -r-x------ 1 luciof luciof 1334 Mar 24 16:32 ./server.py
161091039 8 -rwx------ 1 luciof hadoop 4791 Mar 24 16:32 ./launch_container.sh
42467355 4 -r-x------ 1 luciof luciof 49 Mar 24 16:32 ./.skein.sh
161091032 4 drwxr-s--- 2 luciof hadoop 4096 Mar 24 16:32 ./tmp
42467353 4 -r-x------ 1 luciof luciof 1704 Mar 24 16:32 ./.skein.pem
41033904 4 -r-x------ 1 luciof luciof 1013 Mar 24 16:32 ./.skein.crt
41025562 4 drwx------ 11 luciof luciof 4096 Mar 24 16:32 ./environment
41025565 4 drwx------ 3 luciof luciof 4096 Mar 24 16:32 ./environment/ssl
[rest of the unpacking]
161091040 4 -rw------- 1 luciof hadoop 427 Mar 24 16:32 ./container_tokens
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************
End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:42 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:41,690 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:42,505 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************
Version information
Python version: 3.8.8
Hadoop version: 3.1.1.3.1.4.0-315
Skein version: 0.8.1
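The application master log reports its gRPC server at epod071.vgt.vito.be:43071. One thing that might help narrow this down is checking whether that port is reachable at all from the machine where the client runs. This is only a rough sketch using the standard library: the helper name `can_connect` is made up for illustration and is not part of skein, and a successful TCP connect says nothing about whether the TLS or gRPC layer works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper for illustration only; not a skein API.
    It checks network reachability, not TLS or gRPC health.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

It could be called as `can_connect("epod071.vgt.vito.be", 43071)` from the client host. The port is assigned per run, so the value has to be taken from the current application.master.log. If this returns False, a firewall or routing rule between the client host and the YARN nodes would be a plausible suspect.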