-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JENKINS-39203 - Only show blocks containing a WarningAction as unstable #1954
Conversation
@@ -27,7 +27,7 @@ | |||
<revision>1.15.0</revision> | |||
<changelist>-SNAPSHOT</changelist> | |||
<java.level>8</java.level> | |||
<jenkins.version>2.121.1</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement--> | |||
<jenkins.version>2.138.4</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required for the workflow-api update. Current LTS is 2.164.2, not sure if you want to go all the way to that version though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you need to update some Dockerfile
as the comment asks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Dockerfile uses FROM jenkins/jenkins:lts
, and that version is currently 2.164.2. I wasn't sure if we want to update to such a new version, hence my comment in the PR. Perhaps the comment in pom.xml should just be deleted now that the Dockerfile always tracks the latest LTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the comment in pom.xml should just be deleted now that the Dockerfile always tracks the latest LTS.
Or the Dockerfile
should be made to pin a version. Depends on what it is being used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would however update the one in Jenkinsfile (but that looks like its already 2.138.4, so it doesn't matter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point, this line needs to be deleted or modified since we wouldn't be able to test on a baseline older than 2.138.4 anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead.
I did learn recently when i cleaned up the jenkinsfile, the weekly ath isn't run anymore anyways so no fears of breaking things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d5861ae.
@@ -45,6 +45,7 @@ | |||
<git.version>3.8.0</git.version> | |||
<access-modifier-checker.failOnError>true</access-modifier-checker.failOnError> | |||
<workflow-support.version>3.2</workflow-support.version> | |||
<pipeline-model-definition.version>1.3.7</pipeline-model-definition.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a property to make it easy to keep everything in sync, and the update is required for the JENKINS-43292 regression tests to work.
</dependency> | ||
<dependency> | ||
<groupId>org.jenkinsci.plugins</groupId> | ||
<artifactId>pipeline-model-extensions</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one of the other Declarative plugins, not sure why it wasn't here already.
assertStageAndEdges(nodes, "stage 1 marked as unstable", BlueRun.BlueRunState.NOT_BUILT, BlueRun.BlueRunResult.NOT_BUILT, "stage 2 wait"); | ||
assertStageAndEdges(nodes, "stage 2 wait", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.UNSTABLE); | ||
assertStageAndEdges(nodes, "stage 1 marked as unstable", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.UNSTABLE, "stage 2 wait"); | ||
assertStageAndEdges(nodes, "stage 2 wait", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.SUCCESS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test changed, does it mean existing behaviour might break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this test in particular is the exact case described in JENKINS-39203. stage 2 wait
succeeded, but previously it would have been shown as unstable. The fix is that only the stage that actually set the status to unstable should be marked as unstable. I'm not sure why stage 1 was previously computed to be NOT_BUILT, but I will see if I can figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably all good, i just get super nervous when tests change. Josh will be able to give better feedback next week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this test is definitely expected to change since it was asserting exactly the behavior that we don't want anymore (i.e. it was testing the bug). No rush on reviewing, this is on top of 3 other PRs and will likely change over time, so probably better to wait to review until I upgrade it from draft status to real PR status anyway.
@@ -1,6 +1,6 @@ | |||
node { | |||
stage ("stage 1 marked as unstable") { | |||
currentBuild.result = 'UNSTABLE'; | |||
unstable('foo') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should look into whether RunWrapper.setResult
can be deprecated. Tricky because this was done as a method on a POJO rather than a Step
, so we have no TaskListener
available. Ideally currentBuild
would be deprecated as a whole: jenkinsci/workflow-basic-steps-plugin#71
@@ -27,7 +27,7 @@ | |||
<revision>1.15.0</revision> | |||
<changelist>-SNAPSHOT</changelist> | |||
<java.level>8</java.level> | |||
<jenkins.version>2.121.1</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement--> | |||
<jenkins.version>2.138.4</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you need to update some Dockerfile
as the comment asks?
I can't seem to reply to the comment, but I think the comment was resolved a long time ago as https://github.com/jenkinsci/blueocean-plugin/blob/master/docker/official/Dockerfile says lts? maybe? |
Hrm, want to label this correctly, is it a bug or enhancement? I'm thinking bug fix more than new feature |
@halkeye I guess it's somewhat on the border between them, but from a user's perspective it probably makes the most sense to treat it as a bug. |
I wonder what release drafter will do if its labeled as both :) |
NodeGraphBuilder graph = NodeGraphBuilder.NodeGraphBuilderFactory.getInstance(run); | ||
List<FlowNodeWrapper> nodes = graph.getPipelineNodes(); | ||
|
||
// top should be BlueRunResult.FAILURE, but the status is currently computed from the beginning of top to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because StandardChunkVisitor
doesn't handled nested stages. See further discussion and an ignored test here: https://github.com/jenkinsci/pipeline-graph-analysis-plugin/pull/25/files#r277844096.
Closing and reopening to see if those ATH failures were flakes or not. |
Restart stage is still slightly flakey, but the other one is a new error |
Tried reproducing the |
All downstream changes have been released. Strictly speaking, the only necessary non-test change in this plugin is the one in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one @dwnusbaum, going to make lots of folks happy 🎉
When I get a few min I'll poke at the test and see if I can figure out what's wrong |
Ok, so from the last 3 builds, which don't have any interesting changes between them, it looks like there are some test flakes. In build 8, the following 2 tests failed:
In build 9, 4 tests failed:
In Build 10, 2 tests failed:
The AllTest.javaTest failure is consistent, but I don't understand what that test is doing. It looks like it just reruns the integration test using Based on the screenshot for |
yea, All test is just a wrapper to run all the other tests. So it always
fails if the others fail.
Restart stage and fav's have been flaking for a while, probably a 90%
success rate
PipelineNodeTest.nodesWithPartialParallels is new to me
Based on the screenshot for FavoritesCardTest, it looks like maybe the
build completed before the status was detected as running, so the test
timed out because the build was already done, but I'm not sure.
Yep thats about right, we've cleaned it up a lot, but there's still a
couple races we've missed.
…On Wed, May 15, 2019 at 2:25 PM Devin Nusbaum ***@***.***> wrote:
Ok, so from the last 3 builds, which don't have any interesting changes
between them, it looks like there are some test flakes.
In build 8, the following 2 tests failed:
- AllTest.javaTest
- RestartStageTest.restartStage
In build 9, 4 tests failed:
- AllTest.javaTest
- FavoritesCardsTest.testFreestyle
- PipelineNodeTest.nodesWithPartialParallels
- RestartStageTest.restartStage
In Build 10, 2 tests failed:
- AllTest.javaTest
- FavoritesCardsTest.testFreestyle
The AllTest.javaTest failure is consistent, but I don't understand what
that test is doing. It looks like it just reruns the integration test using mvn
test, but as part of the ATH? Or maybe that test is just a wrapper for
all of the other tests, so it fails if *any* ATH test fails?
Based on the screenshot
<https://ci.blueocean.io/job/blueocean/job/PR-1954/10/artifact/target/screenshots/io.blueocean.ath.offline.personalization.FavoritesCardsTest_testFreestyle.png>
for FavoritesCardTest, it looks like maybe the build completed before the
status was detected as running, so the test timed out because the build was
already done, but I'm not sure.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1954?email_source=notifications&email_token=AAA24B37SO62JNEASAO7XXTPVR5TDA5CNFSM4HHAJTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVP7W6A#issuecomment-492829560>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAA24B7M2JUIMFOKDYG5XQDPVR5TDANCNFSM4HHAJTUA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM good job!
status = new NodeRunStatus( | ||
StatusAndTiming.computeChunkStatus2(run, nodeBefore, firstExecuted, lastNode, nodeAfter)); | ||
StatusAndTiming.computeChunkStatus2(run, nodeBefore, firstNode, lastNode, nodeAfter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this change wasn't strictly necessary, but it turns out that it is, and it fixes things like JENKINS-57579. I can't quite tell what firstExecuted
is expected to be, and it seems like it will vary based on context, but firstNode
is always (even for in-progress builds), going to be the start of a stage step block or parallel branch block, and that is what we expect over in StatusAndTiming.computeChunkStatus2
anyway.
My guess as to the problem is that in some cases firstExecuted
ends up being identical to or later than lastNode
in the graph, and so setting up the graph traversal here in StatusAndTiming.findWarningsBetween fails and the method returns null.
@halkeye Anything in particular you want me to look into here? It looks like we are just seeing a bunch of different flaky test failures. I looked into the |
nope, not right now, I've been poking at trying to fix them (#1962) but they are mostly timing based e2e tests that are failing. I've moved projects so I only have personal time to commit to this project now. |
Failing test in latest build#12 has been flakily failing on master as well across multiple build: https://ci.blueocean.io/blue/organizations/jenkins/blueocean/detail/PR-1954/12/tests/. I can't get ATH to run locally to reproduce to see why this test is flaky. In any case, fixing this flaky test is out of scope of this PR. Given wider interest in this PR, I propose we merge it and address flaky ATH tests separately. @michaelneale @halkeye @kshultzCB |
looks like a good improvement - not sure of impact of dependency changes (and the new pipeline dependency - but likely it is fine). Looks like an improvement of things anyway. |
@dwnusbaum so mostly flakes that fail that you see? |
…thPartialParallels for precise waiting
pipeline-model-extensions was already a dependency through pipeline-model-definition, it just wasn't declared explicitly in pom.xml.
Yes, as far as I can tell the ATH failures are not consistent. The one non-ATH failure that seems a bit sketchy is I went ahead and pushed 629d57f, which changes that test from guessing how long to wait using |
Rebuilding due to infra failure:
|
Latest test failure is |
woo hoo green ! |
@dwnusbaum nice job, and nice blog post: https://jenkins.io/blog/2019/07/05/jenkins-pipeline-stage-result-visualization-improvements/ |
Description
See jenkinsci/workflow-api-plugin#91, jenkinsci/pipeline-graph-analysis-plugin#25, jenkinsci/workflow-basic-steps-plugin#83, and JENKINS-39203. (I am intentionally not updating the status of that ticket or assigning it to myself until I am ready to merge/release all relevant changes to avoid giving users premature hope of a fix and so that I have clear answers for what will and won't work and when any fixes will be released.)
This PR adjusts the nodes used when calling
StatusAndTiming.computeChunkStatus2
for stages. Previously,before
was the stage step start,firstNode
was the first atom node in the stage,lastNode
was the end of the stage step's body, andafter
was the stage step end. Now,firstNode
is the start of the stage step's body. This change matches the standard behavior ofStageChunkFinder
, which means that we can create tests in pipeline-graph-analysis that actually match the way the APIs are used by Blue Ocean rather than being subtly different, and it makes it easier to look through the current block inside ofcomputeChunkStatus2
(I am a little curious about whetherfirstExecuted
can be removed entirely, but haven't tested that out yet).This PR also adds regression tests for JENKINS-43292.
Screenshot with the changes (without the changes, all of the yellow exclamation points would be green check marks:
Submitter checklist
Reviewer checklist