Detect and report corrupted files #7867
I need to rebase.
I tested this locally on crab-dev-tw01, but a look by a second pair of eyes would surely be good.
Thanks Stefano. Please see the inline comments.
Good points. I have more fundamental issues now, see #7548, but I will get to these as well.
I made some changes, after the gfal timeout and this: https://github.com/belforte/CRABServer/blob/7be608a0a25fb4149b0ed3ff7a8a68312450d2d1/src/python/TaskWorker/Actions/RetryJob.py#L313-L321
@novicecpp maybe you could consider using this in your copy-cat tests, to give it a first shake?
LGTM (+an inline suggestion).
Thanks Stefano.
"""
check if job stdout contains a message indicating a corrupted file and reports this
via a json file taskname.corrupted.job.<crabid>.<retry>.json
returns True/False according to corrupted yes/no
"""
Suggested change:
"""
check if job stdout contains a message indicating a corrupted file and reports this
via a json file taskname.corrupted.job.<crabid>.<retry>.json
returns True/False according to corrupted yes/no
Ref: https://github.com/dmwm/CRABServer/issues/7548
"""
Add a reference for context and for the schema of the logs we are trying to parse.
Sure. I will deploy it tomorrow, in test12.
A first step toward fixing #7548.
This PR includes code to write a JSON file in
/eos/cms/store/temp/user/corrupted/
which we can use to check that things are OK before calling rucioClient.declare_suspicious_file_replicas().
There are also a few simple pylint fixes.
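For readers who want a concrete picture of the check described in the docstring, here is a minimal sketch, not the code in this PR. The function name `check_corrupted_file`, its parameters, the `CORRUPTION_PATTERN` regular expression, and the fields of the JSON report are all assumptions made for illustration; only the report file name pattern `taskname.corrupted.job.<crabid>.<retry>.json` comes from the docstring. The sketch only writes the report and does not call Rucio.

```python
import json
import os
import re

# Hypothetical marker: the actual message searched for in the job stdout
# is not shown in this conversation, so this pattern is an assumption.
CORRUPTION_PATTERN = re.compile(r"corrupted file[:\s]+(?P<name>\S+)", re.IGNORECASE)


def check_corrupted_file(stdout_path, taskname, crab_id, retry, report_dir):
    """
    Scan a job stdout file for a corrupted-file message.
    If one is found, write <taskname>.corrupted.job.<crab_id>.<retry>.json
    into report_dir and return True; otherwise return False.
    """
    match = None
    with open(stdout_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = CORRUPTION_PATTERN.search(line)
            if match:
                break  # the first hit is enough for this sketch
    if not match:
        return False

    # Assumed report layout; the real schema is defined by the PR code.
    report = {
        "taskname": taskname,
        "crabid": crab_id,
        "retry": retry,
        "file": match.group("name"),
        "message": match.group(0),
    }
    report_name = f"{taskname}.corrupted.job.{crab_id}.{retry}.json"
    with open(os.path.join(report_dir, report_name), "w", encoding="utf-8") as out:
        json.dump(report, out)
    return True
```

Writing the report to a file first, rather than declaring the replica suspicious right away, matches the intent stated above: the JSON files can be inspected to confirm that detection behaves as expected before any call to rucioClient.declare_suspicious_file_replicas() is enabled.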