configure yarn cluster in skein #208

Open
priyamgupta01 opened this issue Feb 7, 2020 · 5 comments

Comments

@priyamgupta01

Where do we need to provide the YARN configuration in skein? By default, when I submit a skein application using "skein application submit sample.yaml", it tries to connect to 0.0.0.0:8032.

log:
INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

My YARN cluster is remote, so how can I provide its details to skein?

sample.yaml:

name: hello_world
queue: default

master:
  resources:
    vcores: 1
    memory: 512 MiB
  script: |
    sleep 60
    echo "Hello World!"

@Kriszhou1

Have you solved the problem?

@priyamgupta01
Author

priyamgupta01 commented Mar 9, 2020 via email

@georgepachitariu

I think I can answer this:
Skein uses the Hadoop environment variables, so you need to set this:
export HADOOP_CONF_DIR=/my/hadoop_conf

I haven't tried running it like this, but depending on the error you get afterwards, you might also need to add the Hadoop jar libraries. For now, just post here the error you get afterwards.
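
To check what HADOOP_CONF_DIR actually gives the client, a small Python sketch like the one below can help. It is not part of skein; the function names are made up for illustration. It reads yarn-site.xml from a configuration directory and reports the ResourceManager address a client would use, assuming the stock Hadoop defaults (hostname 0.0.0.0, port 8032) when nothing is configured. It does not handle ResourceManager HA settings or general variable substitution.

```python
import os
import xml.etree.ElementTree as ET


def read_yarn_site(conf_dir):
    """Return the name -> value properties found in conf_dir/yarn-site.xml."""
    path = os.path.join(conf_dir, "yarn-site.xml")
    if not os.path.isfile(path):
        return {}
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resourcemanager_address(conf_dir):
    """Best-effort guess of the ResourceManager address for this conf dir.

    Falls back to the Hadoop defaults, which is where the
    0.0.0.0:8032 in the log above comes from.
    """
    props = read_yarn_site(conf_dir)
    host = props.get("yarn.resourcemanager.hostname", "0.0.0.0")
    address = props.get(
        "yarn.resourcemanager.address",
        "${yarn.resourcemanager.hostname}:8032",
    )
    # Only the one substitution used by the default value is expanded here.
    return address.replace("${yarn.resourcemanager.hostname}", host)
```

Calling resourcemanager_address(os.environ["HADOOP_CONF_DIR"]) in the same shell that runs skein should print the remote ResourceManager. If it still prints 0.0.0.0:8032, the directory is probably missing yarn-site.xml or the hostname property.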

@wgwi

wgwi commented May 2, 2020

We had the same error, while normal MR jobs work.
Each time we submit a skein job, it gets stuck at the ACCEPTED stage. After a while, the job fails. We do have HADOOP_HOME and HADOOP_CONF_DIR set. Tracing the logs:

licy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
20/05/01 13:26:07 INFO retry.RetryInvocationHandler: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort, while invoking ApplicationMasterProtocolPBClientImpl.registerApplicationMaster over null after 16 failover attempts. Trying to failover after sleeping for 20759ms.
20/05/01 13:26:29 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)"
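
A connection attempt to 0.0.0.0 on one of the ResourceManager ports usually means the process that wrote the log did not see a configured address and fell back to the Hadoop default. The Python sketch below (a hypothetical helper, not part of skein or Hadoop) scans log text for such wildcard addresses and maps the port to the yarn-site.xml property that has that default port. The port-to-property table reflects the stock defaults in yarn-default.xml; a cluster with custom ports would need a different table.

```python
import re

# Default ResourceManager ports and the properties that define them.
DEFAULT_RM_PORTS = {
    8030: "yarn.resourcemanager.scheduler.address",
    8031: "yarn.resourcemanager.resource-tracker.address",
    8032: "yarn.resourcemanager.address",
    8033: "yarn.resourcemanager.admin.address",
}

_WILDCARD = re.compile(r"0\.0\.0\.0:(\d+)")


def unset_rm_properties(log_text):
    """List properties whose default 0.0.0.0:<port> address shows up in the log."""
    found = []
    for match in _WILDCARD.finditer(log_text):
        prop = DEFAULT_RM_PORTS.get(int(match.group(1)))
        if prop and prop not in found:
            found.append(prop)
    return found
```

For the log above, the 8030 line points at the scheduler address, which is what the application master uses to register. That suggests the configuration was missing where the application master ran, not only on the submitting client, though the thread does not confirm this.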

I tried using IPython to find out why:

In [20]: app = client.submit_and_connect(spec)                                               
20/05/02 08:42:58 INFO skein.Driver: Uploading application resources to hdfs://node14:9000/user/root/.skein/application_1588338572036_0007
20/05/02 08:42:58 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
20/05/02 08:42:58 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
20/05/02 08:42:58 DEBUG skein.Driver: Writing script for service 'hello' to hdfs://node14:9000/user/root/.skein/application_1588338572036_0007/hello.sh
20/05/02 08:42:58 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
20/05/02 08:42:58 DEBUG skein.Driver: Uploading file:/opt/anaconda3/lib/python3.7/site-packages/skein/java/skein.jar to hdfs://node14:9000/user/root/.skein/application_1588338572036_0007/skein.jar
20/05/02 08:42:58 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
20/05/02 08:42:59 DEBUG skein.Driver: Writing application specification to hdfs://node14:9000/user/root/.skein/application_1588338572036_0007/.skein.proto
20/05/02 08:42:59 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
20/05/02 08:42:59 INFO skein.Driver: Submitting application...
20/05/02 08:42:59 INFO impl.YarnClientImpl: Submitted application application_1588338572036_0007
20/05/02 08:42:59 DEBUG skein.Driver: New watcher callback requested for application application_1588338572036_0007
20/05/02 08:42:59 DEBUG skein.Driver: No watcher exists for application_1588338572036_0007, creating one

Then it stops here.

@wgwi

wgwi commented May 10, 2020

After compiling skein from source and using a minimal YARN configuration (the 8031, 8032,... addresses), everything works with Hadoop 3.2.1.
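
The exact configuration used here is not shown in the thread. As a rough illustration only, the Python sketch below generates a yarn-site.xml that sets the ResourceManager hostname and the addresses on the default ports 8030, 8031 and 8032. The function name and the example hostname are placeholders, and a real cluster will likely need more properties than these.

```python
import xml.etree.ElementTree as ET


def minimal_yarn_site(rm_host):
    """Build a yarn-site.xml string with explicit ResourceManager addresses."""
    props = {
        "yarn.resourcemanager.hostname": rm_host,
        "yarn.resourcemanager.scheduler.address": f"{rm_host}:8030",
        "yarn.resourcemanager.resource-tracker.address": f"{rm_host}:8031",
        "yarn.resourcemanager.address": f"{rm_host}:8032",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "rm.example.com" is a placeholder hostname, not taken from this thread.
    print(minimal_yarn_site("rm.example.com"))
```

The output would go into the directory that HADOOP_CONF_DIR points to, on the client and on the nodes that run the containers.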
