JENKINS-39203 - Only show blocks containing a WarningAction as unstable #1954

dwnusbaum · 2019-04-18T21:01:49Z

Description

See jenkinsci/workflow-api-plugin#91, jenkinsci/pipeline-graph-analysis-plugin#25, jenkinsci/workflow-basic-steps-plugin#83, and JENKINS-39203. (I am intentionally not updating the status of that ticket or assigning it to myself until I am ready to merge/release all relevant changes to avoid giving users premature hope of a fix and so that I have clear answers for what will and won't work and when any fixes will be released.)

This PR adjusts the nodes used when calling StatusAndTiming.computeChunkStatus2 for stages. Previously, before was the stage step start, firstNode was the first atom node in the stage, lastNode was the end of the stage step's body, and after was the stage step end. Now, firstNode is the start of the stage step's body. This change matches the standard behavior of StageChunkFinder, which means that we can create tests in pipeline-graph-analysis that actually match the way the APIs are used by Blue Ocean rather than being subtly different, and it makes it easier to look through the current block inside of computeChunkStatus2 (I am a little curious about whether firstExecuted can be removed entirely, but haven't tested that out yet).

This PR also adds regression tests for JENKINS-43292.

Screenshot with the changes (without the changes, all of the yellow exclamation points would be green check marks:

Submitter checklist

Link to JIRA ticket in description, if appropriate.
Change is code complete and matches issue description
Appropriate unit or acceptance tests or explanation to why this change has no tests
Reviewer's manual test instructions provided in PR description. See Reviewer's first task below.

Reviewer checklist

Run the changes and verified the change matches the issue description
Reviewed the code
Verified that the appropriate tests have been written or valid explanation given

dwnusbaum · 2019-04-18T21:02:44Z

pom.xml

@@ -27,7 +27,7 @@
    <revision>1.15.0</revision>
    <changelist>-SNAPSHOT</changelist>
    <java.level>8</java.level>
-    <jenkins.version>2.121.1</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement-->
+    <jenkins.version>2.138.4</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement-->


Required for the workflow-api update. Current LTS is 2.164.2, not sure if you want to go all the way to that version though.

So do you need to update some Dockerfile as the comment asks?

The Dockerfile uses FROM jenkins/jenkins:lts, and that version is currently 2.164.2. I wasn't sure if we want to update to such a new version, hence my comment in the PR. Perhaps the comment in pom.xml should just be deleted now that the Dockerfile always tracks the latest LTS.

Perhaps the comment in pom.xml should just be deleted now that the Dockerfile always tracks the latest LTS.

Or the Dockerfile should be made to pin a version. Depends on what it is being used for.

I would however update the one in Jenkinsfile (but that looks like its already 2.138.4, so it doesn't matter)

Ah good point, this line needs to be deleted or modified since we wouldn't be able to test on a baseline older than 2.138.4 anymore.

Go ahead.

I did learn recently when i cleaned up the jenkinsfile, the weekly ath isn't run anymore anyways so no fears of breaking things

Done in d5861ae.

dwnusbaum · 2019-04-18T21:03:51Z

pom.xml

@@ -45,6 +45,7 @@
    <git.version>3.8.0</git.version>
    <access-modifier-checker.failOnError>true</access-modifier-checker.failOnError>
    <workflow-support.version>3.2</workflow-support.version>
+    <pipeline-model-definition.version>1.3.7</pipeline-model-definition.version>


Moved to a property to make it easy to keep everything in sync, and the update is required for the JENKINS-43292 regression tests to work.

dwnusbaum · 2019-04-18T21:05:25Z

pom.xml

+        </dependency>
+        <dependency>
+            <groupId>org.jenkinsci.plugins</groupId>
+            <artifactId>pipeline-model-extensions</artifactId>


Just one of the other Declarative plugins, not sure why it wasn't here already.

halkeye · 2019-04-18T21:21:41Z

...ipeline-api-impl/src/test/java/io/jenkins/blueocean/rest/impl/pipeline/GraphBuilderTest.java

-        assertStageAndEdges(nodes, "stage 1 marked as unstable", BlueRun.BlueRunState.NOT_BUILT, BlueRun.BlueRunResult.NOT_BUILT, "stage 2 wait");
-        assertStageAndEdges(nodes, "stage 2 wait", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.UNSTABLE);
+        assertStageAndEdges(nodes, "stage 1 marked as unstable", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.UNSTABLE, "stage 2 wait");
+        assertStageAndEdges(nodes, "stage 2 wait", BlueRun.BlueRunState.FINISHED, BlueRun.BlueRunResult.SUCCESS);


This test changed, does it mean existing behaviour might break?

Yeah this test in particular is the exact case described in JENKINS-39203. stage 2 wait succeeded, but previously it would have been shown as unstable. The fix is that only the stage that actually set the status to unstable should be marked as unstable. I'm not sure why stage 1 was previously computed to be NOT_BUILT, but I will see if I can figure it out.

Its probably all good, i just get super nervous when tests change. Josh will be able to give better feedback next week

Yeah this test is definitely expected to change since it was asserting exactly the behavior that we don't want anymore (i.e. it was testing the bug). No rush on reviewing, this is on top of 3 other PRs and will likely change over time, so probably better to wait to review until I upgrade it from draft status to real PR status anyway.

jglick · 2019-04-22T19:01:46Z

...impl/pipeline/earlyUnstableStatusShouldReportPunStateAsRunningAndResultAsUnknown.jenkinsfile

@@ -1,6 +1,6 @@
 node {
    stage ("stage 1 marked as unstable") {
-        currentBuild.result = 'UNSTABLE';
+        unstable('foo')


We should look into whether RunWrapper.setResult can be deprecated. Tricky because this was done as a method on a POJO rather than a Step, so we have no TaskListener available. Ideally currentBuild would be deprecated as a whole: jenkinsci/workflow-basic-steps-plugin#71

jglick · 2019-04-22T19:02:47Z

pom.xml

@@ -27,7 +27,7 @@
    <revision>1.15.0</revision>
    <changelist>-SNAPSHOT</changelist>
    <java.level>8</java.level>
-    <jenkins.version>2.121.1</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement-->
+    <jenkins.version>2.138.4</jenkins.version> <!-- Should be kept in sync with the version resulting from Dockerfile FROM statement-->


So do you need to update some Dockerfile as the comment asks?

halkeye · 2019-04-22T19:04:21Z

I can't seem to reply to the comment, but I think the comment was resolved a long time ago as https://github.com/jenkinsci/blueocean-plugin/blob/master/docker/official/Dockerfile says lts? maybe?

halkeye · 2019-04-23T15:18:43Z

Hrm, want to label this correctly, is it a bug or enhancement? I'm thinking bug fix more than new feature

dwnusbaum · 2019-04-23T15:20:14Z

@halkeye I guess it's somewhat on the border between them, but from a user's perspective it probably makes the most sense to treat it as a bug.

halkeye · 2019-04-23T15:20:52Z

I wonder what release drafter will do if its labeled as both :)

dwnusbaum · 2019-04-23T19:51:35Z

...ipeline-api-impl/src/test/java/io/jenkins/blueocean/rest/impl/pipeline/GraphBuilderTest.java

+        NodeGraphBuilder graph = NodeGraphBuilder.NodeGraphBuilderFactory.getInstance(run);
+        List<FlowNodeWrapper> nodes = graph.getPipelineNodes();
+
+        // top should be BlueRunResult.FAILURE, but the status is currently computed from the beginning of top to the


This is because StandardChunkVisitor doesn't handled nested stages. See further discussion and an ignored test here: https://github.com/jenkinsci/pipeline-graph-analysis-plugin/pull/25/files#r277844096.

dwnusbaum · 2019-05-14T21:13:20Z

Closing and reopening to see if those ATH failures were flakes or not.

halkeye · 2019-05-15T17:49:47Z

Restart stage is still slightly flakey, but the other one is a new error

dwnusbaum · 2019-05-15T19:37:23Z

Tried reproducing the PipelineNodeTest.nodeWithPartialParallels failure but wasn't able to. I'll try to track down the source of the ATH tests, perhaps they are looking for a specific result or something that has intentionally changed.

dwnusbaum · 2019-05-15T19:42:15Z

All downstream changes have been released. Strictly speaking, the only necessary non-test change in this plugin is the one in NodeRunStatus.java that will change the per-step status displayed to support WarningAction. The change in PipelineNodeGraphVisitor.java described in the PR description could be backed out, but the change makes the usage of StatusAndTiming.computeChunkStatus2 here much easier to test elsewhere and more consistent with no drawbacks as far as I can tell.

vivek

Nice one @dwnusbaum, going to make lots of folks happy 🎉

halkeye · 2019-05-15T21:18:42Z

When I get a few min I'll poke at the test and see if I can figure out what's wrong

dwnusbaum · 2019-05-15T21:25:04Z

Ok, so from the last 3 builds, which don't have any interesting changes between them, it looks like there are some test flakes.

In build 8, the following 2 tests failed:

AllTest.javaTest
RestartStageTest.restartStage

In build 9, 4 tests failed:

AllTest.javaTest
FavoritesCardsTest.testFreestyle
PipelineNodeTest.nodesWithPartialParallels
RestartStageTest.restartStage

In Build 10, 2 tests failed:

AllTest.javaTest
FavoritesCardsTest.testFreestyle

The AllTest.javaTest failure is consistent, but I don't understand what that test is doing. It looks like it just reruns the integration test using mvn test, but as part of the ATH? Or maybe that test is just a wrapper for all of the other tests, so it fails if any ATH test fails?

Based on the screenshot for FavoritesCardTest, it looks like maybe the build completed before the status was detected as running, so the test timed out because the build was already done, but I'm not sure.

halkeye · 2019-05-15T21:44:07Z

yea, All test is just a wrapper to run all the other tests. So it always fails if the others fail. Restart stage and fav's have been flaking for a while, probably a 90% success rate PipelineNodeTest.nodesWithPartialParallels is new to me

Based on the screenshot for FavoritesCardTest, it looks like maybe the

build completed before the status was detected as running, so the test timed out because the build was already done, but I'm not sure. Yep thats about right, we've cleaned it up a lot, but there's still a couple races we've missed.

…

On Wed, May 15, 2019 at 2:25 PM Devin Nusbaum ***@***.***> wrote: Ok, so from the last 3 builds, which don't have any interesting changes between them, it looks like there are some test flakes. In build 8, the following 2 tests failed: - AllTest.javaTest - RestartStageTest.restartStage In build 9, 4 tests failed: - AllTest.javaTest - FavoritesCardsTest.testFreestyle - PipelineNodeTest.nodesWithPartialParallels - RestartStageTest.restartStage In Build 10, 2 tests failed: - AllTest.javaTest - FavoritesCardsTest.testFreestyle The AllTest.javaTest failure is consistent, but I don't understand what that test is doing. It looks like it just reruns the integration test using mvn test, but as part of the ATH? Or maybe that test is just a wrapper for all of the other tests, so it fails if *any* ATH test fails? Based on the screenshot <https://ci.blueocean.io/job/blueocean/job/PR-1954/10/artifact/target/screenshots/io.blueocean.ath.offline.personalization.FavoritesCardsTest_testFreestyle.png> for FavoritesCardTest, it looks like maybe the build completed before the status was detected as running, so the test timed out because the build was already done, but I'm not sure. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1954?email_source=notifications&email_token=AAA24B37SO62JNEASAO7XXTPVR5TDA5CNFSM4HHAJTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVP7W6A#issuecomment-492829560>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA24B7M2JUIMFOKDYG5XQDPVR5TDANCNFSM4HHAJTUA> .

olamy

LGTM good job!

dwnusbaum · 2019-05-29T18:36:11Z

...api-impl/src/main/java/io/jenkins/blueocean/rest/impl/pipeline/PipelineNodeGraphVisitor.java

                status = new NodeRunStatus(
-                    StatusAndTiming.computeChunkStatus2(run, nodeBefore, firstExecuted, lastNode, nodeAfter));
+                    StatusAndTiming.computeChunkStatus2(run, nodeBefore, firstNode, lastNode, nodeAfter));


I thought this change wasn't strictly necessary, but it turns out that it is, and it fixes things like JENKINS-57579. I can't quite tell what firstExecuted is expected to be, and it seems like it will vary based on context, but firstNode is always (even for in-progress builds), going to be the start of a stage step block or parallel branch block, and that is what we expect over in StatusAndTiming.computeChunkStatus2 anyway.

My guess as to the problem is that in some cases firstExecuted ends up being identical to or later than lastNode in the graph, and so setting up the graph traversal here in StatusAndTiming.findWarningsBetween fails and the method returns null.

dwnusbaum · 2019-06-03T17:10:45Z

@halkeye Anything in particular you want me to look into here? It looks like we are just seeing a bunch of different flaky test failures. I looked into the PipelineNodeTest.nodesWithPartialParallels failure since you hadn't seen it before, but I haven't been able to reproduce it, and looking at the test I don't really see how my changes would make it flaky unless it was already flaky.

halkeye · 2019-06-03T17:39:11Z

nope, not right now, I've been poking at trying to fix them (#1962) but they are mostly timing based e2e tests that are failing.

I've moved projects so I only have personal time to commit to this project now.

vivek · 2019-06-05T03:45:44Z

Failing test in latest build#12 has been flakily failing on master as well across multiple build: https://ci.blueocean.io/blue/organizations/jenkins/blueocean/detail/PR-1954/12/tests/. I can't get ATH to run locally to reproduce to see why this test is flaky. In any case, fixing this flaky test is out of scope of this PR.

Given wider interest in this PR, I propose we merge it and address flaky ATH tests separately. @michaelneale @halkeye @kshultzCB

michaelneale · 2019-06-05T03:52:05Z

looks like a good improvement - not sure of impact of dependency changes (and the new pipeline dependency - but likely it is fine). Looks like an improvement of things anyway.

michaelneale · 2019-06-05T03:53:31Z

@dwnusbaum so mostly flakes that fail that you see?

…thPartialParallels for precise waiting

dwnusbaum · 2019-06-05T16:46:14Z

and the new pipeline dependency

pipeline-model-extensions was already a dependency through pipeline-model-definition, it just wasn't declared explicitly in pom.xml.

so mostly flakes that fail that you see?

Yes, as far as I can tell the ATH failures are not consistent. The one non-ATH failure that seems a bit sketchy is PipelineNodeTest.nodesWithPartialParallels, but I haven't been able to reproduce that failure locally. I recreated the test in a live instance based on this PR, and still wasn't able to reproduce it.

I went ahead and pushed 629d57f, which changes that test from guessing how long to wait using Thread.sleep to using the semaphore step which was designed exactly for this purpose for testing. Now, we know that when we make the other calls in the test the Pipeline must have progressed to the point where there are definitely 3 steps in the left branch and 2 steps in the right branch. If we don't get there, then the test will time out, showing us that something has gone wrong earlier. (If we actually cared about the inputs in the test or the visualization, changing them wouldn't work, but as it is they are just used to stop the execution for the test)

dwnusbaum · 2019-06-05T18:20:23Z

Rebuilding due to infra failure:

Could not transfer artifact org.jenkins-ci.plugins:pipeline-graph-analysis:jar:1.10 from/to repo.jenkins-ci.org (http://repo.jenkins-ci.org/public/): Failed to transfer file: http://repo.jenkins-ci.org/public/org/jenkins-ci/plugins/pipeline-graph-analysis/1.10/pipeline-graph-analysis-1.10.jar. Return code is: 502 , ReasonPhrase:Bad Gateway.

dwnusbaum · 2019-06-06T15:10:30Z

Latest test failure is PipelineNodeTest.sequentialParallelStagesLongRun. Not sure what is wrong with that one, but I haven't been able to reproduce it locally, and since it doesn't have anything to do with statuses I don't think it's related to my change. Best guess is some unusual race condition being hit in this loop that runs every 100ms. Closing and reopening for a fresh build.

michaelneale · 2019-06-06T22:22:27Z

woo hoo green !

rodrigc · 2019-07-12T22:57:16Z

@dwnusbaum nice job, and nice blog post: https://jenkins.io/blog/2019/07/05/jenkins-pipeline-stage-result-visualization-improvements/

dwnusbaum added 2 commits April 18, 2019 17:00

[JENKINS-39203] Only show blocks containing a WarningAction as unstable

1c75bf6

[JENKINS-43292] Add tests for parallel branches with failFast

c108556

dwnusbaum commented Apr 18, 2019

View reviewed changes

halkeye reviewed Apr 18, 2019

View reviewed changes

dwnusbaum added 2 commits April 22, 2019 13:01

[JENKINS-39203] Switch to incremental versions of dependencies

853e73a

[JENKINS-39203] Add smoke test for stages with unstable status

4723daa

dwnusbaum mentioned this pull request Apr 22, 2019

[JENKINS-39203] Create new action to mark a FlowNode with a result such as unstable jenkinsci/workflow-api-plugin#91

Merged

jglick reviewed Apr 22, 2019

View reviewed changes

halkeye added bug enhancement labels Apr 23, 2019

dwnusbaum commented Apr 23, 2019

View reviewed changes

halkeye changed the title ~~[JENKINS-39203] Only show blocks containing a WarningAction as unstable~~ JENKINS-39203 - Only show blocks containing a WarningAction as unstable Apr 24, 2019

dwnusbaum added 5 commits April 24, 2019 17:06

[JENKINS-39203] Handle WarningAction when displaying individual steps

7d54632

[JENKINS-39203] Use WarningAction.getResult for per-step results

d3e61e9

Merge branch 'master' into JENKINS-39203

8eeacbf

Remove old baseline from weekly ATH builds

d5861ae

Update parent pom to fix Javadoc crash on JDK 11

86ed9c4

abayer approved these changes May 9, 2019

View reviewed changes

dwnusbaum closed this May 14, 2019

dwnusbaum reopened this May 14, 2019

[JENKINS-39203] Update to released versions of plugins

7ec470b

dwnusbaum marked this pull request as ready for review May 15, 2019 19:37

dwnusbaum requested a review from imeredith as a code owner May 15, 2019 19:37

dwnusbaum requested review from olamy, scherler, sophistifunk and vivek as code owners May 15, 2019 19:37

vivek approved these changes May 15, 2019

View reviewed changes

olamy approved these changes May 15, 2019

View reviewed changes

Merge branch 'master' into JENKINS-39203

6da5350

dwnusbaum commented May 29, 2019

View reviewed changes

Use SemaphoreStep instead of Thread.sleep in PipelineNodeTest.nodesWi…

629d57f

…thPartialParallels for precise waiting

dwnusbaum closed this Jun 5, 2019

dwnusbaum reopened this Jun 5, 2019

dwnusbaum closed this Jun 6, 2019

dwnusbaum reopened this Jun 6, 2019

halkeye merged commit 6696ce5 into jenkinsci:master Jun 6, 2019

dwnusbaum deleted the JENKINS-39203 branch June 7, 2019 14:17

dwnusbaum mentioned this pull request Aug 23, 2019

[JENKINS-58085] Fix issue where stages are shown as waiting to start when they are running/completed #2017

Merged

7 tasks

JENKINS-39203 - Only show blocks containing a WarningAction as unstable #1954

JENKINS-39203 - Only show blocks containing a WarningAction as unstable #1954

Conversation

dwnusbaum commented Apr 18, 2019 • edited Loading

Description

Submitter checklist

Reviewer checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwnusbaum Apr 22, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwnusbaum Apr 22, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

halkeye commented Apr 22, 2019

halkeye commented Apr 23, 2019

dwnusbaum commented Apr 23, 2019 • edited Loading

halkeye commented Apr 23, 2019

Choose a reason for hiding this comment

dwnusbaum commented May 14, 2019

halkeye commented May 15, 2019

dwnusbaum commented May 15, 2019

dwnusbaum commented May 15, 2019

vivek left a comment

Choose a reason for hiding this comment

halkeye commented May 15, 2019

dwnusbaum commented May 15, 2019

halkeye commented May 15, 2019 via email

olamy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwnusbaum commented Jun 3, 2019

halkeye commented Jun 3, 2019

vivek commented Jun 5, 2019

michaelneale commented Jun 5, 2019

michaelneale commented Jun 5, 2019

dwnusbaum commented Jun 5, 2019

dwnusbaum commented Jun 5, 2019

dwnusbaum commented Jun 6, 2019 • edited Loading

michaelneale commented Jun 6, 2019

rodrigc commented Jul 12, 2019

dwnusbaum commented Apr 18, 2019 •

edited

Loading

dwnusbaum Apr 22, 2019 •

edited

Loading

dwnusbaum Apr 22, 2019 •

edited

Loading

dwnusbaum commented Apr 23, 2019 •

edited

Loading

dwnusbaum commented Jun 6, 2019 •

edited

Loading