This repository has been archived by the owner on Oct 7, 2021. It is now read-only.

Stop server from crashing when agent reports failure #91

Open
Mierdin opened this issue Sep 23, 2016 · 7 comments

Comments

@Mierdin
Member

Mierdin commented Sep 23, 2016

Need to do something meaningful here, preferably passing status to client (as well as performing any cleanup actions)

ERRO[0101] Agent reported failure during testing

This probably has a bit to do with https://github.com/Mierdin/todd/issues/90 as well.

Previously, a simple error message was posted to the gatheredData map in executetestrun, but that code has since been commented out. The "status" of a testrun is tracked elsewhere, so these two things need to be coordinated.
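
One way the coordination described above could look is sketched here. This is only an illustration under stated assumptions: `testRunTracker`, `recordAgentReport`, and `statusOf` are hypothetical names, not ToDD's actual types or functions, and the status strings are placeholders. The idea is that an agent failure is written to both the gathered data and the testrun status under one lock, so the server can report the failure to the client instead of crashing.

```go
package main

import (
	"fmt"
	"sync"
)

// testRunTracker is a hypothetical structure (not ToDD's real type) that keeps
// the per-agent data and the overall testrun status behind a single mutex.
type testRunTracker struct {
	mu           sync.Mutex
	status       map[string]string            // testrun UUID -> status
	gatheredData map[string]map[string]string // testrun UUID -> agent UUID -> data
}

func newTestRunTracker() *testRunTracker {
	return &testRunTracker{
		status:       make(map[string]string),
		gatheredData: make(map[string]map[string]string),
	}
}

// recordAgentReport stores one agent's report. A failure marks the whole
// testrun as "fail" and is never overwritten by a later success.
func (t *testRunTracker) recordAgentReport(testUUID, agentUUID, data string, failed bool) {
	t.mu.Lock()
	defer t.mu.Unlock()

	if t.gatheredData[testUUID] == nil {
		t.gatheredData[testUUID] = make(map[string]string)
	}
	if failed {
		t.gatheredData[testUUID][agentUUID] = "error"
		t.status[testUUID] = "fail"
		return
	}
	t.gatheredData[testUUID][agentUUID] = data
	if t.status[testUUID] != "fail" {
		t.status[testUUID] = "finished"
	}
}

// statusOf returns the status a client would be shown for a testrun.
func (t *testRunTracker) statusOf(testUUID string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.status[testUUID]
}

func main() {
	tr := newTestRunTracker()
	tr.recordAgentReport("test-1", "agent-a", `{"latency":"10"}`, false)
	tr.recordAgentReport("test-1", "agent-b", "", true)
	fmt.Println(tr.statusOf("test-1")) // prints "fail"
}
```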

@hanya1203

hanya1203 commented Feb 28, 2017

Hello,
I'm trying to test ToDD,
but I got this error:
ERRO[0185] Agent reported failure during testing

I configured ToDD on CentOS 7.
The server, agent, RabbitMQ server, and etcd server are all on one host.
rabbitmq 3.6.6-1.el6.noarch
etcd 2.3.7

The command I ran is as follows.

[root@localhost scripts]# todd run test-ping-dns-dc -j -y
RUNNING TEST: a9e6d3cf4595194b1c737bb41593ae4f555f3f753aee63415b00bd3818a8a7ba
(Please be patient while the test finishes...)
Problem subscribing to testrun updates stream: Invalid status received.
Will now watch the testrun metrics API for 45 seconds to see if we get a result that way. Please wait...
Failed to retrieve test data after 45 seconds. Something must be wrong -

Do you have any idea what is wrong?
I can provide more information if needed.

@hanya1203

hanya1203 commented Feb 28, 2017

I've uploaded server.cfg and agent.cfg,
with the file names converted to .txt.

server.txt
agent.txt

@matthieugouel

Hello!
When you see that error in the server logs, do you also see an error in the agent logs?
If so, can you paste it here?

@Mierdin
Member Author

Mierdin commented Mar 6, 2017

@hanya1203 Yes, please post full server and agent logs so we can help you out.

@hanya1203

hanya1203 commented Mar 10, 2017

@Mierdin @slydetech
Thank you for your advice.
When I tried to collect the logs, I hit a new error,
and the server didn't recognize the agents.
Do you have any idea?

todd command log

[root@localhost todd]# docker exec toddserver todd agents
No agents found.

Agent log

INFO[0520] AGENTADV -- 2017-03-10 07:21:46.560882068 +0000 UTC
DEBU[0520] Agent task received: {"type":"DownloadAsset","assets":["http://\u003cnil\u003e:8090/factcollectors/get_hostname","http://\u003cnil\u003e:8090/factcollectors/get_addresses","http://\u003cnil\u003e:8090/testlets/http","http://\u003cnil\u003e:8090/testlets/iperf"]}
INFO[0520] Downloading http://<nil>:8090/factcollectors/get_hostname to /opt/todd/agent/assets/factcollectors/get_hostname
ERRO[0520] Error while downloading 'http://<nil>:8090/factcollectors/get_hostname': Get http://%3Cnil%3E:8090/factcollectors/get_hostname: dial tcp: lookup <nil>: invalid domain name
ERRO[0520] Get http://%3Cnil%3E:8090/factcollectors/get_hostname: dial tcp: lookup <nil>: invalid domain name
WARN[0520] The KeyValue task failed to initialize
DEBU[0522] Retrieving value of key - unackedGroup
DEBU[0524] Retrieving value of key - unackedGroup
DEBU[0526] Retrieving value of key - unackedGroup
DEBU[0528] Retrieving value of key - unackedGroup
DEBU[0530] Retrieving value of key - unackedGroup
DEBU[0530] Asset found locally: get_addresses (with hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
DEBU[0530] Asset found locally: get_hostname (with hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
DEBU[0530] Asset found locally: http (with hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
ERRO[0530] Problem running fact-gathering collector

Server log

INFO[0060] Beginning group calculation
INFO[0060] Getting /todd/agents' key value
WARN[0060] Agent list empty when queried
INFO[0060] Accessing objects at/todd/objects/group/
WARN[0060] ToDD object store empty when queried
DEBU[0060] Setting '/todd/groupmap' key
INFO[0060] Updated group map in etcd: {}
DEBU[0065] Loaded assets: map[factcollectors:map[get_addresses:630e6d795ee363e3741e0eba74350645c8ea1896bb1a4cb1c7f18c71fdee6462 get_hostname:e3dd6bd5888c29046bafcf013c258bbc32e4ff3c502a45b1ea5a125894db98a2] testlets:map[http:9bbb9448e6dfa8f0f09c70871f73a1176a3f2f95a3aeae940c3cd0079cbd5426 iperf:ff8ddea702fcf07bbabf3f6e78943d8be339626573931e5dcc5ef1bd8521d134]]
DEBU[0067] Agent advertisement recieved: {"Uuid":"cd11bcc3d24094564d52716bfcfef64778eab35d0f3208b9b7c5e95261e2f5ad","DefaultAddr":"172.17.0.4","Expires":0,"LocalTime":"2017-03-10T07:22:16.593628356Z","Facts":{},"FactCollectors":{"get_addresses":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","get_hostname":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"Testlets":{"http":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}}
WARN[0067] Agent cd11bcc3d24094564d52716bfcfef64778eab35d0f3208b9b7c5e95261e2f5ad did not have the required asset files. This advertisement is ignored.
DEBU[0067] Sent task to cd11bcc3d24094564d52716bfcfef64778eab35d0f3208b9b7c5e95261e2f5ad: {"type":"DownloadAsset","assets":["http://\u003cnil\u003e:8090/factcollectors/get_addresses","http://\u003cnil\u003e:8090/factcollectors/get_hostname","http://\u003cnil\u003e:8090/testlets/http","http://\u003cnil\u003e:8090/testlets/iperf"]}
DEBU[0070] Loaded assets: map[factcollectors:map[get_addresses:630e6d795ee363e3741e0eba74350645c8ea1896bb1a4cb1c7f18c71fdee6462 get_hostname:e3dd6bd5888c29046bafcf013c258bbc32e4ff3c502a45b1ea5a125894db98a2] testlets:map[http:9bbb9448e6dfa8f0f09c70871f73a1176a3f2f95a3aeae940c3cd0079cbd5426 iperf:ff8ddea702fcf07bbabf3f6e78943d8be339626573931e5dcc5ef1bd8521d134]]

@Mierdin
Member Author

Mierdin commented Mar 11, 2017

@hanya1203 I moved this to #132 - let's please discuss this there

@namachieli

A workaround I'm using in my testlets is to always return nil for err, but include {"result":"Successful"} or {"result":"UnSuccessful"} in the data returned from the testlet.

That way I can still allow testing failures without a panic.

Granted, this isn't preferable to gracefully handling failures, but it's a workaround I can use for now.
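
For anyone wanting to try the same approach, here is a minimal sketch of what that pattern might look like. The names `runTestlet` and `probe`, the `reason` and `latency_ms` keys, and the function signature are assumptions made for illustration only and may not match ToDD's actual testlet interface; only the `result` key and its two values come from the comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// probe is a hypothetical stand-in for the real measurement a testlet performs.
func probe(target string) (string, error) {
	if target == "" {
		return "", errors.New("empty target")
	}
	return "12.5", nil
}

// runTestlet never returns a non-nil error. A failed measurement is reported
// inside the returned data under the "result" key instead.
func runTestlet(target string) (map[string]string, error) {
	data := map[string]string{}

	latency, err := probe(target)
	if err != nil {
		data["result"] = "UnSuccessful"
		data["reason"] = err.Error() // hypothetical extra key for debugging
		return data, nil
	}

	data["result"] = "Successful"
	data["latency_ms"] = latency // hypothetical metric name
	return data, nil
}

func main() {
	for _, target := range []string{"192.0.2.1", ""} {
		data, err := runTestlet(target)
		fmt.Println(data["result"], err)
	}
}
```

The trade-off is that whatever consumes the metrics has to check the `result` field itself, since the server no longer sees the run as failed.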
