diff --git a/docs/proposal-cylc-set.md b/docs/proposal-cylc-set.md index a42a8fa2..770ac99b 100644 --- a/docs/proposal-cylc-set.md +++ b/docs/proposal-cylc-set.md @@ -1,7 +1,5 @@ # PROPOSAL: A New Command for Manual Task Forcing -(IN DISCUSSION: NOT ACCEPTED YET) - ## Background Users need to be able to intervene and affect the progression of workflows. @@ -12,63 +10,38 @@ outputs: it just spawns downstream children with the corresponding prerequisites set, *as if* the outputs had been completed). -### Task Status Reset is Not Needed - -I am proposing that we do NOT support Cylc 7 style task status reset, e.g. -resetting a `failed` task to `succeeded`. - -Primarily this is for provenance reasons: the run history will be clearer if a -task's status reflects what its last job did, followed by evidence that the user -told the scheduler to carry on *as if* something else had happened. +## Set & Task Status -NOTE we *could* choose to change the task status as well as carry on *as if* -the forced status had been achieved naturally. However, that's not needed to -make the workflow continue in Cylc 8, so we can choose based on clarity of run -history. Do we want (e.g.): +When setting outputs we have the option to change the task status or not. +Do we want (e.g.): - A force-succeeded task that actually failed but is indistinguishable from a naturally succeeded one without looking at run history -- Or a failed task that really failed, which the workflow nevertheless - continued on from because the user said to carry on *as if* it had succeeded. +- Or a failed task which the workflow nevertheless + continued on from because the user said to carry on *as if* it had succeeded + but which is indistinguishable from a failed task on which no intervention + has been performed. -This may be a matter of opinion, but I prefer the latter. +This proposal considers the former option for reasons of visibility, +provenance and consistency with the task-job model. ## Requirements -0. 
Force **trigger** a target task. - - Make it run now, regardless of prerequisites. - 1. Force set specified **prerequisites** of a target task. - - This contributes to the task's readiness to run. - - It is not equivalent to setting the parent output, unless the task is an only child. - - It is not equivialent to triggering, unless the task has only one - unsatisfied prerequisite. - - DEFAULT: set all prerequisites (this would be equivalent to trigger) + - This contributes to the task's readiness to run. + - It is not equivalent to setting the parent output, unless the task is an only child. + - This will spawn the task if not already in the pool. 2. Force set specified **outputs** of a target task. - - This sets the corresponding prerequisites of child tasks. - - And it contributes to the completion of *incomplete tasks*. - - (It will not spawn a task with the specified outputs completed - that would create an incomplete task) - - Set *implied outputs* as well (see command help below). - - If the `succeeded` or `failed` outputs are set, disable automatic retries. - - DEFAULT: set all required outputs. - -3. Force expire tasks. - - Expire means "we no longer need to run (or rerun) this task". - - Expire can be automatic (clock-expire) or manual. - - Allow waiting tasks to expire without running at all. - - Allow the scheduler to forget incomplete tasks without re-running to complete them. - - Make `cylc remove` obsolete (currently, incomplete tasks have to be - "removed" if not re-run to completion). + - This sets the corresponding prerequisites of child tasks. + - And it contributes to the completion of *incomplete tasks*. + - It will not spawn a task with the specified outputs completed - that would create an incomplete task. + - Set *implied outputs* as well (see command help below). + - If the `succeeded` or `failed` outputs are set, disable automatic retries. + - DEFAULT: set all required outputs. 
-Expiration is a bigger topic in its own right, due to its connection to task -outputs and automatic triggering. See -[Task Expire Proposal](proposal-task-expire.md) - -## The New Command - -(Proposed) +## CLI ``` $ cylc set [OPTIONS] [TASK(S)] @@ -80,38 +53,76 @@ Setting a prerequisite contributes to the task's readiness to run. Setting an output contributes to the task's completion, and sets the corresponding prerequisites of child tasks. -Setting a future waiting task to expire allows the scheduler to forget it -without running it. WARNING: nothing downstream of the task will run -automatically after this. - -Setting a finished-but-incomplete task to expire allows the scheduler to forget -it without completing its required outputs. WARNING: nothing downstream of -of the incomplete outputs with run automatically after this. - Setting an output also results in any implied outputs being set: - started implies submitted - - succeeded, failed, and custom outputs, imply started - - succeeded implies all required custom outputs - - failed does not imply custom outputs + - succeeded and failed imply started + - custom outputs and expired do not imply any other outputs. + +By default this sets all required outputs for the given task(s). [OPTIONS] - --flow=INT`: flow(s) to attribute spawned tasks. Default: all active flows. - If a task already spawned, merge flows. + --flow=INT: Flow(s) to attribute spawned tasks. Default: all active flows. + If a task already spawned, use current flows. - --pre=PRE: set a prerequisite to satisfied (multiple use allowed). + --meta=DESCRIPTION: description of triggered flow (with --flow=new). - --pre=all: set all prerequisites to satisfied. Equivalent to trigger. + --wait: Wait for merge with current active flows before flowing on. - --out=OUT: set an output (and implied outputs) to completed (multiple use allowed). + --pre=PRE: Set a prerequisite to satisfied e.g. `foo:succeeded` + (multiple use allowed, may be comma separated). 
- --out=all or [no-options]: set all required outputs. + --pre=all: Set all prerequisites to satisfied. Equivalent to trigger. - `--expire`: allow the scheduler to forget a task without running it, or - without rerunning it if incomplete. + --out=OUT: Set an output e.g. `succeeded` (and implied outputs) to completed + (multiple use allowed, may be comma separated). + --out=required Or if no outputs or prerequisites are specified: set all required + outputs. ``` +This would replace `cylc set-outputs`. + +### Validation + +Setting a prerequisite or output which does not apply to the task should result +in a warning to ensure typos and simple mistakes are communicated back to the +user. + +To support this work, Cylc custom outputs will need to be validated +to prohibit the `all` and `required` keywords as well as the comma character. +The `_cylc` prefix should also be reserved for any future internal uses as +already done for some other interfaces (e.g. xtriggers). + +Examples: +```ini +[runtime] + [[]] + [[[outputs]]] + # OK + foo = + foo_bar = + foo-bar = + # ERROR + foo,bar = # commas aren't valid in the graph + foo bar = # spaces aren't valid in the graph + all = # all is a keyword + required = # required is a keyword + _cylc = # _cylc is a special prefix +``` + + +### Extension + +When setting prereqs/outputs it would be useful to provide visual confirmation +of the result. + +We could display the `cylc show` output (which can be generated Scheduler side +without the requirement for a second API call). + +This can also help users to discover valid prereqs / outputs in the event of +warnings. + ## QUESTIONS @@ -126,8 +137,11 @@ anyway. Is there a case for unsatisfying the prerequisites of a waiting partially satisfied task that has not run yet? 
-Use cases aren't obvious, but **we should probably support this for -completeness.** +Use cases aren't obvious, suggest omitting this functionality until a +compelling use case is discovered for the sake of interface/implementation +simplicity. See also `cylc remove` which reduces the pressure on this. + +See: [Remove Proposal](proposal-remove.html) ### Un-completing outputs? @@ -135,93 +149,193 @@ completeness.** There's no point in unsetting a completed output because any downstream action will have already been spawned (on demand). -I think the only exception is in a flow-wait scenario, which is pretty niche. +Use cases aren't obvious, suggest omitting this functionality until a +compelling use case is discovered for the sake of interface/implementation +simplicity. See also `cylc remove` which reduces the pressure on this. -No need to support this? +See: [Remove Proposal](proposal-remove.html) ## Example Use Cases +> Note: In these CLI examples the workflow ID and cycle point have been omitted +> for brevity, so `cylc set //a` is shorthand for +> `cylc set ///a`. -### 1. Forget an incomplete failed task - -Expire it. +### 1. Carry on as if a failed task had succeeded - -### 2. Carry on as if a failed task had succeeded +`foo => post` E.g. `foo` failed and is incomplete, but I fixed some files externally so the workflow can carry on as if it had succeeded. +``` +$ cylc set //foo --out=succeeded +# Or simply: +$ cylc set //foo +``` + Set the task's `succeeded` output, to: - spawn children of that output, with the corresponding prerequisite set - complete the task, so scheduler can forget it The database will record: -- `foo` state `failed`, and with the `failed` output -- then `foo:succeeded` set by manual forcing +- foo: state=succeeded, outputs={submitted,started,failed,succeeded} -### 3. Trigger a new flow downstream of target task +### 2. 
Set off-flow prerequisites, to prep for a new flow -`foo => post` +If a flow is not entirely self-contained, you can prime the appropriate tasks +before triggering the flow, by either (whichever is more convenient): +- setting their off-flow prerequisites, or +- setting the parent outputs of the off-flow prerequisites -If we don't want to run `foo` again, setting `foo:succeed` (for a new flow) is -more convenient than triggering all the downstream children individually. +E.G. if we wanted to start a new flow at task `a`: -This will spawn all `post` at once, with prerequisite `foo:succeed` satisfied. +``` +# the tasks we want the flow to run +a => b => c +# the off-flow prerequisites +a_cold => a +b_cold => b +c_cold => c +``` +Then we could either set the prerequisites of the downstream tasks: -### 4. Set off-flow prerequisites, to prep for a new flow +``` +# set off-flow prerequisites and trigger a new flow on a +$ cylc set //a //b //c --flow=new --pre=a_cold:succeeded,b_cold:succeeded,c_cold:succeeded +WARNING //a has no prerequisite b_cold:succeeded +WARNING //a has no prerequisite c_cold:succeeded +WARNING //b has no prerequisite a_cold:succeeded +WARNING //b has no prerequisite c_cold:succeeded +WARNING //c has no prerequisite a_cold:succeeded +WARNING //c has no prerequisite b_cold:succeeded +``` -If a flow is not entirely self-contained, you can prime the appropriate tasks -before triggering the flow, by either (whichever is more convenient): -- setting their off-flow prerequisites, or -- setting the parent outputs of the off-flow prerequisites +Or the outputs of the upstream tasks: + +``` +# set off-flow outputs and trigger a new flow on a +$ cylc set //a_cold //b_cold //c_cold --flow=new [--out=succeeded] +``` + +Note, in both cases, using `cylc set` will spawn task `a` with all +prerequisites satisfied so a subsequent trigger of task a is not required. 
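The warnings shown in the prerequisite example above could be produced by a check along the following lines. This is only a minimal sketch, not the scheduler's implementation: the function name `unmatched_prerequisites` and the `TASK_PREREQS` mapping are hypothetical stand-ins for whatever the scheduler knows about each task's prerequisites.

```python
def unmatched_prerequisites(task_prereqs, requested):
    """Return (task, prerequisite) pairs which should produce a warning.

    task_prereqs: hypothetical mapping of task name -> set of prerequisites.
    requested: the prerequisites given via --pre on the command line.
    """
    warnings = []
    for task in sorted(task_prereqs):
        for prereq in requested:
            if prereq not in task_prereqs[task]:
                warnings.append((task, prereq))
    return warnings


# Hypothetical representation of the example graph (a => b => c, plus the
# off-flow *_cold parents).
TASK_PREREQS = {
    "a": {"a_cold:succeeded"},
    "b": {"a:succeeded", "b_cold:succeeded"},
    "c": {"b:succeeded", "c_cold:succeeded"},
}
REQUESTED = ["a_cold:succeeded", "b_cold:succeeded", "c_cold:succeeded"]

for task, prereq in unmatched_prerequisites(TASK_PREREQS, REQUESTED):
    print(f"WARNING //{task} has no prerequisite {prereq}")
```

Each requested prerequisite is still applied to the task it does belong to; the warnings only report the task/prerequisite combinations that did not match.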
(Note this is more than a convenient alternative to `cylc trigger` because the flow needs to progress according to the graph edges, and we don't want to trigger the off-low parent tasks just to provide the off-flow prerequisites). -### 5. Skip to next cycle after externally-caused failures +### 3. Skip to next cycle after externally-caused failures + +> Not necessarily a `cylc set` use case, but included for completeness. Many tasks failed due to (say) external disk error. I want to start over at the next cycle point rather than re-run the failed tasks. -- Trigger a new flow from next cycle (using `cylc trigger` or `cylc set`) -- Expire all the incomplete failed tasks so the scheduler can forget them +Either skip all tasks in the cycle and complete any incomplete tasks: + +See: [Skip Mode Proposal](proposal-skip-mode.html) +``` +# configure all tasks in the cycle to skip +$ cylc broadcast -n '*' -p -s 'run mode = skip' +# handle any incomplete tasks by setting their required outputs +$ cylc set //* --out=succeeded +``` -### 6.?? Set jobs to failed when a job platform is known to be down +Or, with [cylc-flow#5416](https://github.com/cylc/cylc-flow/issues/5416) +trigger all start tasks in the next flow: -I don't think this case is valid. (Unless I've misunderstood the requirement?). +``` +# trigger all start tasks in the next cycle +$ cylc trigger --start-cycle= # syntax not yet decided +# remove all n=0 tasks in the cycle +$ cylc remove ///* +``` -If jobs were submitted already they will have already failed naturally. -Otherwise, you can prevent job submission by holding the tasks, or expiring -them if necessary. +### 4. Set jobs to failed when a job platform is known to be down -### 7.?? 
Set switch tasks at an optional branch point, to direct the future flow +This case describes the scenario where a job has successfully submitted (or +even started) on a remote platform which subsequently becomes un-contactable +leaving us with a job "stuck" in the submitted/running state. We cannot poll +or kill these tasks, in order to allow the task to be retried, we must orphan +the broken submission. -I'm not sure this is valid either. Why would we need to do this? +``` +$ cylc set --out=failed +``` + +The database will record: +- foo: state=failed, outputs={submitted,started,failed} + + +### 5. Set switch tasks at an optional branch point, to direct the future flow + +Graph branching can occur automatically via optional graph outputs. +Users may need to manually control this logic by setting outputs manually E.G: + +``` +foo:a? => a => ... +foo:b? => b => ... +``` ``` -foo:x? => bar # take this branch regardless of x or y at run time? -foo:y? => baz +$ cylc set //foo --out=a,succeeded --wait +``` + +The database will record: +- foo: state=waiting, outputs={a} + +Notes: +* A completion output (`succeeded` in this case) must be specified if you + don't want the task to be re-run by an approaching flow. + The task may have a completion condition like + `completion = succeeded or (failed and a)` so we don't attempt to guess the + completion status for the user. They may also have legitimate use cases for + not specifying a completion output. +* The `--wait` is necessary if you don't want the workflow to flow on from this + task until the flow washes over it as in this case. + -# or: +### 6. Expire a task -foo:fail? => bar # take this branch regardless of succeed or fail at run time? -foo? => baz +Clock-expires allow tasks to be automatically expired, however, users may wish +to action this logic manually e.g. for testing. + +This is no different to setting any other output, task completion is evaluated +as normal. 
+ +``` +$ cylc set // --out=expired ``` -Should `foo` still run when the flow reaches it, but only trigger the right -branch regardless of what outputs it completes? That amounts to "the graph is -wrong"! +The database will record: +- : state=expired, outputs={expired} + -Or do we want to pretend that `foo` has run already and generated the chosen -output, but don't flow on from there until the flow catches up? +### 7. Spawning parentless tasks -In that case: `cylc broadcast` to change the `script` to simply send the chosen -output and exit. +Parentless tasks cannot currently be spawned because there are no upstream +outputs which can spawn them. + +See [cylc-flow#5572](https://github.com/cylc/cylc-flow/issues/5572). + +If `--pre=all` is used but there are no prerequisites to satisfy, then +`cylc set` should still spawn the task as it would have if there had been +prerequisites to set. + +E.G: + +``` +@clock => foo => bar +``` + +``` +# spawn task foo but don't satisfy its xtriggers +$ cylc set --pre=all //foo +``` diff --git a/docs/proposal-interventions.md b/docs/proposal-interventions.md new file mode 100644 index 00000000..83b7a301 --- /dev/null +++ b/docs/proposal-interventions.md @@ -0,0 +1,273 @@ +# Intervention Feature Gaps + +At present a few Cylc 7 use cases are not currently possible at Cylc 8 or are +technically possible, but difficult to action, or have little to no feedback +via the user interfaces when actioned. + +We have been addressing these limitations one at a time so far. However, since +the remaining use cases/interventions overlap quite a bit it makes more sense +to consider them in one go to ensure we have all bases covered, as changes to +one proposal may mandate changes to another. 
+ +Most of the functionality issues which remain are around the `cylc reset` +command which provided a powerful and surprisingly intuitive interface for +performing a much broader range of interventions than it was initially designed +for including: + +* Re-running tasks. +* Expiring tasks. +* Testing / development. +* Reverting unwanted changes. +* Skipping sections of a workflow. +* Orphaning jobs. +* Etc. + + +## Proposals + +Here are the new proposals which will be referenced in the use-cases below. + +* [1] [Set](proposal-cylc-set.html) +* [2] [Optional Outputs](proposal-optional-output-extension.html) +* [3] [Remove](proposal-remove.html) +* [4] [Skip](proposal-skip-mode.html) + + +## Use Cases Considered + +Here follows a list of use cases which are either not currently possible, or +awkward to action in Cylc 8. + +If approved, this list of interventions should be padded out with the more +straight-forward cases and turned into a new section of the documentation. The +proposals also open up new use cases which are not covered here but should also +be documented. + + +### 1. I triggered tasks I didn't mean to, they may have run on. I want to undo this. + +**Description:** + +* User messed up a glob and ran things they didn't mean to. +* Or triggered a family including things they didn't expect it to. +* Or the graph followed a pathway they didn't want it to. +* Etc. +* They now want to roll back those changes to recover. + +**Cylc 7 Intervention:** + +* Hold the workflow (suppress downstream consequences until ready). +* Kill any unwanted active tasks. +* Reset unwanted tasks to waiting. +* Resume the workflow. + +**Proposed Intervention:** + +* [3] Remove any unwanted tasks. + * `cylc remove ` + +This will kill any active tasks, remove any waiting tasks from the pool +and strip the specified flow numbers from any tasks, outputs and corresponding +prerequisites of the selected tasks. 
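The flow-number stripping described above can be pictured with a small sketch. It assumes, purely for illustration, that the pool is a mapping of task name to a set of integer flow numbers; `strip_flows` and this data model are hypothetical and are not taken from the Remove proposal or from cylc-flow.

```python
def strip_flows(task_flows, selected, flows_to_remove):
    """Return a copy of task_flows with flow numbers stripped from selected tasks.

    task_flows: hypothetical mapping of task name -> set of flow numbers.
    selected: names of the tasks chosen for removal.
    flows_to_remove: flow numbers to strip from the selected tasks.
    """
    return {
        task: (flows - flows_to_remove) if task in selected else set(flows)
        for task, flows in task_flows.items()
    }


pool = {"a": {1, 2}, "b": {1}, "c": {2}}
stripped = strip_flows(pool, selected={"a", "b"}, flows_to_remove={1})
print(stripped)
```

In this sketch task `b` is left with no flow numbers at all; what the scheduler does with such a task is a matter for the Remove proposal and is not modelled here.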
+ +As the new intervention only requires one action it is not necessary to +pause/resume the workflow although that may still be desired. + + +### 2. I want to Skip a task and allow the workflow to continue. + +**Description:** + +* A waiting task isn't needed for some reason, continue as if it succeeded. +* Or a completed task has not produced its required outputs. + +**Cylc 7 Intervention:** + +* Reset task to succeeded. + +**Proposed Intervention:** + +* [1] Set succeeded task output. + * `cylc set [--out=all]` + + +### 3. I want to Skip a cycle of tasks and allow the workflow to continue. + +**Description:** + +* A more advanced form of the previous example, but where we are skipping + multiple tasks which are unlikely to all be in the pool so cannot be + selected with a simple glob. +* Sometimes users want to skip a cycle of tasks and continue as if they had + run and succeeded. +* Note the user might only want to skip selected families within + the cycle which suffers from the same globbing problem at Cylc 8 + with skipping cycles as family selectors also only match tasks in the pool. + +**Cylc 7 Intervention:** + +* Reset `*.` to succeeded. +* OR Reset `*.` to expired and manually trigger downstream tasks as + required. + +**Proposed Intervention:** + +* [4] Configure the relevant tasks/families/cycles to skip when they are run. + The user can configure what outputs are generated when a task skips as + desired. + * `cylc broadcast -p -n -s 'run mode = skip'` + + +### 4. I need to orphan a "stuck" job submission + +**Description:** + +* A job submission has got stuck due to a batch system failure or network + issue. +* I need Cylc to forget about this submission. +* I may want Cylc to re-submit this task to another platform, or I may + want to allow graph branching to handle the failure. + +**Cylc 7 Intervention:** + +* Reset task to waiting/succeeded/failed/submit-failed. + +**Proposed Intervention:** + +* Either, [1] set a completed output on the task. 
+ * `cylc set --out=failed` +* Or, [3] remove the task. + * `cylc remove ` + + +### 5. I need to terminate a chain of automatic retries. + +**Description:** + +* My task has automatic retries configured. +* But there is a systematic problem with my task / platform which means the + retries cannot succeed. +* I want to terminate the retry sequence to allow graph branching to handle + the failure OR to avoid wasting compute / confusing operators whilst the + error is fixed. + +**Cylc 7 Intervention:** + +* Reset task to waiting/failed/submit-failed. + +**Proposed Intervention:** + +* [1] Set the failed/submit-failed task output. + This cancels the retry chain as per the cylc-set proposal. + * `cylc set --out=failed` + + +### 6. Set outputs on switch tasks ahead of the flow. + +**Description:** + +* I have a task which controls graph branching, say through the use of optional + outputs. +* I need to override the automated behaviour and take control of this to + respond to external issues or for testing purposes. + +**Cylc 7 Intervention:** + +* Re-write the task's script using `cylc broadcast`. + +This use case was not especially prevalent at Cylc 7 due to the limitations +of suicide triggers but will likely become more common at Cylc 8 as switches +currently embedded in task logic are elevated to the workflow level for better +visibility/control/provenance/graph-interaction. + +**Proposed Intervention:** + +* [1] Set the desired outputs on the task and a completion output to prevent + the task from being rerun by the approaching flow. + * `cylc set --out=,succeeded --wait` + + +### 7. Spawn a parentless task. + +**Description:** + +* I need to spawn a parentless task (which may have xtriggers which I don't + want to skip). +* E.G. as part of triggering a new flow. + +``` +@clock => a => b +``` + +**Cylc 7 Intervention:** + +* Insert the task. + +**Proposed Intervention:** + +* [1] Set all prerequisites of the task to spawn it. 
+ * `cylc set --pre=all` + +The `cylc set` command sets prereqs/outputs and implicitly spawns tasks when +they are not yet present. In this case we want the task to spawn but have no +prerequisites to set. + + +### 8. I need my graph to branch on an expiry event. + +**Description:** + +* I have mitigations to perform if the graph falls behind schedule. + +**Cylc 7 Intervention:** + +* Configure expiry for an appropriate task/family/cycle. +* Use suicide triggers on the `expired` output to action graph changes. + +**Proposed Intervention:** + +* Configure expiry for a parentless task (to avoid late event detection). +* [2] Use optional outputs logic to perform graph branching. + +This would need to come with the documented advice to use a single +head-of-cycle task to detect expiry, then use graph branching to determine +which tasks are run. This is a change for several Cylc 7 workflows which +configure families or entire cycles of tasks to expire, as this approach will +no longer work at Cylc 8, under this, or the previous proposal. However, having +gone through the use cases, it would appear these cases could all be converted +to use the new approach. + + +### 9. I want to run a cycle of tasks ahead of or behind the flow. + +**Description:** + +* I want to re-run a historical cycle. +* I want to run a future cycle. + +**Cylc 7 Intervention:** + +* Insert the cycle of tasks. +* And/or reset to waiting if the selection intersects with the pool. + +**Proposed Intervention:** + +In situations where users can easily list the start tasks of a cycle, they can +trigger a new flow starting from these tasks. However, this is error-prone and +many users do not understand the graph structure of the workflow they are +working on well enough. The list of start tasks may differ depending on the +cycle and could be too long to type out. + +* Trigger a new flow at the specified cycle with start tasks automatically + determined using cold-start logic. 
+ +This functionality is discussed/tracked in +[cylc-flow#5416](https://github.com/cylc/cylc-flow/issues/5416), however, +is sufficiently orthogonal to be omitted from this proposal pending +future discussion :) + +Open questions (to be resolved on #5416): +* What should the syntax be? +* Which commands should support this, `play`, `trigger`, `set`? +* [How] should the stop cycle point be configured in combination? diff --git a/docs/proposal-optional-output-extension.md b/docs/proposal-optional-output-extension.md new file mode 100644 index 00000000..194a9456 --- /dev/null +++ b/docs/proposal-optional-output-extension.md @@ -0,0 +1,542 @@ +# PROPOSAL: Optional Output Extension + +A proposal which extends the +[optional outputs model](https://cylc.github.io/cylc-admin/proposal-new-output-syntax.html) +to solve existing limitations and support the expired state by providing more +information to Cylc about the user's intentions for graph branching. + +Supersedes: +* [cylc-admin/task expire proposal](https://github.com/cylc/cylc-admin/blob/fc564ddb26c1476dd051b059ca9a31829a20bf30/docs/proposal-task-expire.md) +* [cylc-flow#5423](https://github.com/cylc/cylc-flow/issues/5423) +* [cylc-flow#3294](https://github.com/cylc/cylc-flow/issues/3294) + + +## Background: Task Expiry + +There are currently three problems with expiry (as of cylc-flow 8.1.4): + +1. **Expire triggers are a [bit broken](https://github.com/cylc/cylc-flow/issues/5364)** + + This is a bug which can be fixed without fundamental change to the expiry + implementation so can be ignored for the purpose of this proposal. + +2. **Expiry triggers don't fit in with [optional outputs](https://github.com/cylc/cylc-flow/issues/5361)** + + Under the existing logic, if expiry is optional, then success must be too which means + that failure would go uncaught making expiry unsafe. + + There are fundamentally two solutions to this problem. + + 1. Move expiry to a separate model. 
+ * The existing + [proposal-task-expire](https://cylc.github.io/cylc-admin/proposal-task-expire.html) + follows this approach. + 2. Modify optional output logic to allow expiry to fit. + * The approach this proposal follows. + +3. **Expiry events can be detected late, or even missed due to the task pool implementation.** + + This is due to the coupling of expiry logic into the task pool. + + This proposal does not address the late event problem and recommends this + should be a documented limitation. + + +## Background: Optional Outputs + +The SoD model initially had implicit graph branching. This was a neat feature; however, it +meant that non-success cases were not properly caught, so it was not suitable behaviour for +real-world use. + +The problem was that Cylc had not been provided with the required information to tell +the difference between an "expected" or "permitted" outcome and a "problem" outcome. + +Consequently, optional outputs were introduced in order to provide Cylc with enough +information to resolve these situations and stall, where appropriate, or continue/stop +otherwise. This turned implicit branching explicit, providing the runtime safety +required for production use. + +Unfortunately, the optional outputs system doesn't have enough information at +present to adequately handle expiry as an optional output as `:expire?` would +imply `:succeeded?` and `:failed?`. This is a limitation of the optional output +mechanism which this proposal aims to address by providing sufficient +information to the scheduler to allow it to differentiate between "permitted" +and "problem" outcomes. + + +## Proposal + +1. Expiry should remain implemented as a task output. + + If a task is clock-expired OR if `expired` output is set manually, then the + task status should change to expired as per + [cylc set proposal](https://cylc.github.io/cylc-admin/proposal-new-output-syntax.html). + +2. 
The *default* task completion condition should be: + + > If optional outputs are defined, at least one must be generated. + + Where outputs are considered "optional" according to their declaration in + the graph and `completion` expression if defined. + + This is a breaking change for some examples where failure is permitted, but + not handled in the graph (`a? => b`). These cases are relatively rare, can + be caught by validation and are easy to fix. + +3. The completion condition should be a configurable expression. + + The completion condition can be set to tolerate cases not handled in the graph or express + more complex completion criteria. + + The expression should be pure Python, evaluated in a restricted context which permits only + `and` & `or` operators, but not: + + * The `not` operator, because we cannot logically support negation or + exclusive `or`s in Cylc [1]. + * Imports, function calls, etc. + + ``` + # OK + completion = succeeded or failed + completion = succeeded and (x or y or z) + + # ERROR + completion = not failed + completion = (x and not(y or z)) or (y and not(x or z)) or (z and not(x or y)) + completion = import os + ``` + + Expression to be validated (i.e. parsed) at validate time. + + If the completion expression is defined, any outputs which are optional in the expression + should be marked as optional in the graph if used in the graph: + + ``` + completion(a) = succeeded and (x or y or z) + + # OK + a:x? => x + a:y? => y + a:z? => z + x | y | z => b + + # ERROR + a? => w # ":succeeded" must be required + a:x? => x + a:y? => y + a:z? 
=> z + x | y | z => b + + # ERROR + a:x => x # ":x" must be optional + a:y => y # ":y" must be optional + a:z => z # ":z" must be optional + x | y | z => b + ``` + + Note that the validator can use the following logic to determine which outputs are optional + in a given expression: + + ```python + { # output: is_optional + output: restricted_eval(condition, {o: o != output for o in outputs}) + for output in outputs + } + ``` + +4. Succeeded, failed and expired are three orthogonal completion outcomes. + + * `:expired` must always be optional because that's the nature of clock-expiry. + * If `:succeeded` or `:failed` are referenced in the graph where expiry + is present, then they must be optional too. + + Examples: + + ``` + # OK + a:succeeded => x + + # OK + a:succeeded? => x + + # OK + a:succeeded? => x + a:failed? => y + + # OK + a:succeeded? => x + a:expired? => y + + # OK + a:succeeded? => x + a:failed? => y + a:expired? => z + + # ERROR + a:succeeded => x # :succeeded must be optional + a:expired? => z + + # ERROR + a:failed? => x + a:expired => z # :expired must be optional + + # ERROR + a:succeeded? => x + a:failed? => y + a:expired => z # :expired must be optional + ``` + +5. Clock-expire should infer `expired?`: + + The `:expired?` output exists if: + + * Explicitly referenced in the graph e.g. `a:expired? => x`. + * Or, if clock-expiry is configured for the task. + + Wherever expiry is configured or handled, graph branching can happen + as the `:succeeded` and `:failed` outputs might not happen. + + So for this example: + + ``` + clock-expire = a + a? => b + ``` + + The task "a" has two optional outputs: + + * succeeded + * expired + + And the default completion expression is: + + ``` + completion(a) = succeeded or expired + ``` + + Examples: + + ``` + # OK + clock-expire = a + a? => x + + # OK + clock-expire = a + a? => x + a:expire? 
=> y + + # ERROR + clock-expire = a + a => x # :succeeded must be optional + + # ERROR + clock-expire = a + a:failed => x # :failed must be optional + ``` + +6. Expiry should be considered for tasks with partially satisfied prerequisites. + + Expiry should be computed for tasks in both the "main" and "hidden" pools + (rather than just the "main" pool as at present) to prevent workflows from + stalling on tasks with partially satisfied prerequisites where expiry is + configured. + +7. Late delivery of expiry events to be a documented limitation. + + Cases where the task to be expired is not parentless, e.g.: + + ``` + a[-P1D]? | a[-P1D]:expire? => a? + + a:succeed? => x + a:expire? => y # this cannot happen until a[-P1D] has succeeded/expired + ``` + + Could receive expire events late due to the previous cycle running behind + schedule. + + To handle this, these cases will need to be adapted so that a parentless task + is a dependency of the task to be expired, e.g.: + + ```diff + + dummy => a + a[-P1D]? | a[-P1D]:expire? => a? + + a:succeed? => x + a:expire? => y # this cannot happen until a[-P1D] has succeeded/expired + ``` + + In this example, the task `dummy` will run and succeed as soon as it enters + the runahead window which will cause `a` to be spawned with partially + satisfied prerequisites at which point it will be considered for expiry + events according to (6). + +8. Manually triggering a task should not cause an expiry event. + + If a task has expiry configured and a user force-triggers it, it should not + expire in response to this trigger. + + This is special to the trigger command (which infers run); manually + setting prerequisites on a task should not disable expiry. + + +## Examples + +### The "xyz" problem + +``` +a:x? => x +a:y? => y +a:z? => z +x | y | z => b +``` + +Optional outputs for task "a": + +* x +* y +* z + +``` +completion(a) = succeeded and (x or y or z) +``` + +### Expire Branch + +``` +clock-expire = a + +a:succeed? => x +a:expired? 
=> y +x | y => z +``` + +Optional outputs for task "a": + +* succeeded +* expired + +``` +completion(a) = succeeded or expired +``` + +### Expire Halt + +``` +clock-expire = a + +a:succeeded? => x +``` + +Optional outputs for task "a": + +* succeeded +* expired (inferred by `clock-expire = a`) + +``` +completion(a) = succeeded or expired +``` + +Early halt in the event of expiry now works the same as with optional outputs. +Early halt in this example is now explicit in that it follows an optional output, making it clear +that the `x` branch might not happen and that expiry must be handled if early halt is not desirable. + +### Output Groups + +A contrived example demonstrating how more complex completion conditions can be implemented. + +In this example, we don't consider the task complete unless one pair of outputs is generated: + +``` +a:w? => w => w1 => ... +a:x? => x => x1 => ... +a:y? => y => y1 => ... +a:z? => z => z1 => ... +... +# with some conditional join later on along the lines of: +(w9 & x9) | (y9 & z9) => end + +[a] + completion = succeeded and ((w and x) or (y and z)) +``` + +Optional outputs for task "a" (defined in completion expression): + +* w +* x +* y +* z + +### Flaky Pipe + + + +``` +a? => b? => c? + +[a, b, c] +# There is only one optional output here because a:failed? has not +# been handled in the graph. +# So we must permit failure in the completion expression to permit failure. +completion = succeeded or failed +``` + +Optional outputs for task "a" (defined in completion expression): + +* succeeded +* failed + + + +### Recovery Task + +``` +a? | recover => b +a:failed? => recover +``` + +Optional outputs for task "a": + +* succeeded +* failed + +``` +completion(a) = succeeded or failed +``` + +### Error Outputs 1 + +A safer version of the recovery task example where a specific error case is caught and turned into +a custom output in order to avoid random errors (e.g. syntax errors) from triggering recovery logic: + +``` +a? 
| recover => b +a:error_x? => recover + +[a] + script = """ + this + that + if [[ $? == 42 ]]; then + cylc message -- x + fi + tother + """ + [outputs] + error_x = x +``` + +Optional outputs for task "a": + +* succeeded +* error_x + +``` +completion(a) = succeeded or error_x +``` + +### Error Outputs 2 + +An example where multiple failure cases have been assigned different outputs, not all of which +are used in the graph. + +``` +# break the chain if a fails with an expected error but stall otherwise +a? => b + +[a] + script = """ + ... + """ + completion = succeeded or (failed and (error_x or error_y or error_z)) + [outputs] + error_x = ... + error_y = ... + error_z = ... +``` + +Optional outputs for task "a": + +* succeeded +* failed +* error_x +* error_y +* error_z + +### Complex Conditional + +Contrived example to demonstrate how bringing expiry into the optional output system allows it to +be used in combination with other triggers: + +``` +b? +c? +(a? & b:expired?) | (a:expired? & c:failed?) => x +``` + +``` +completion(a) = succeeded or expired +completion(b) = succeeded or expired +completion(c) = succeeded or failed +``` + + +## Preference To The Expire Proposal + +This proposal replaces the +[task expire proposal](https://github.com/cylc/cylc-admin/blob/fc564ddb26c1476dd051b059ca9a31829a20bf30/docs/proposal-task-expire.md) +which recommended converting expiry from an output to a task attribute in order +to work around the issue with `:expired?` inferring `:succeeded?`. + +That proposal has technical merit as expiry might be better thought of as a +sort of inverse prerequisite; however, there are drawbacks with its approach. + +Arguments in favor of keeping expiry as a task state: + +1. The "expired" task state records the expiry event/intervention in a way which is intuitive to + users and visible with existing tools. + Review, GUI and Tui already work with task states (incl. expired) and provide tools for + filtering tasks by state. +2. 
Expired tasks are not instantly removed so remain in the n-window and can be manually triggered + if necessary. +3. The "expired" state can be used in inter-workflow triggers which may be necessary where downstream + workflows depend on a real-time workflow with catch-up logic. +4. `:expire` triggers are used in the graph making them work the same as task outputs. + Implementing them differently to task outputs but representing them the same is illogical. + Xtriggers might be considered a workaround, however, they + cannot be used with optional outputs or conditional triggers so would not be able to replace + `:expire` triggers for all cases meaning that task expiry will continue to be represented as + a task state in the graph. +5. Other than implementation "correctness", there doesn't seem to be any practical advantage to + users in changing the implementation of expiry from task state to attribute from a + behaviour perspective. +6. At the moment expiry fits into the task state model in a way which is fairly intuitive. Changing + this means that expiry will be a bespoke edge case which users will have to learn. The ideal + solution would be to unify expiry with the existing model rather than creating a new one. +7. Expiry, success and failure are fundamentally orthogonal outcomes. Implementing expiry via + a different model causes this relationship to break down. A consequence of this is that + the graph branching which results from expiry becomes implicit rather than explicit. + This has safety implications. + + +## Footnotes + +[1] If a task is run and yields the output `x`, but is then manually re-triggered and yields the + output `y`, then both the `x` and `y` outputs are completed on the task. Therefore Cylc cannot + enforce mutually exclusive outputs e.g. `x xor y` so we should not attempt to support this + in the `completion` expression. 
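As an illustration of requirement 3 in the optional-output proposal (the restricted `completion` expression and the validator logic for finding optional outputs), the following is a minimal sketch only. The name `restricted_eval` is taken from the proposal's own snippet; its implementation using the standard library `ast` module, the error types raised, and the helper name `optional_outputs` are assumptions made for this example and are not the Cylc implementation.

```python
import ast


def restricted_eval(expression, outputs):
    """Evaluate a completion expression permitting only `and`, `or` and names.

    `outputs` maps output names to booleans (True = output generated).
    Hypothetical sketch: anything other than and/or/names is rejected.
    """
    # mode="eval" rejects statements such as "import os" with a SyntaxError
    tree = ast.parse(expression, mode="eval")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BoolOp):
            # BoolOp.op is always either ast.And or ast.Or
            values = [_eval(value) for value in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            if node.id not in outputs:
                raise ValueError(f"unknown output: {node.id}")
            return bool(outputs[node.id])
        # `not`, function calls, comparisons, literals, etc. all end up here
        raise ValueError(f"not permitted: {type(node).__name__}")

    return _eval(tree)


def optional_outputs(expression, outputs):
    """Return {output: is_optional} using the logic quoted in the proposal.

    An output is optional if the expression can still be satisfied when
    every other output is generated but this one is not.
    """
    return {
        output: restricted_eval(expression, {o: o != output for o in outputs})
        for output in outputs
    }


result = optional_outputs(
    "succeeded and (x or y or z)", ["succeeded", "x", "y", "z"]
)
print(result)
```

For the "xyz" example this reports `succeeded` as required and `x`, `y` and `z` as optional, which matches the graph markings that the proposal says the validator should enforce.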
diff --git a/docs/proposal-remove.md b/docs/proposal-remove.md new file mode 100644 index 00000000..114c5f9b --- /dev/null +++ b/docs/proposal-remove.md @@ -0,0 +1,238 @@ +# PROPOSAL: Cylc Remove Extension + +The `cylc remove` command currently only removes tasks which are in the pool +but not active. Otherwise it logs a warning and does nothing. + +This proposal extends remove-like functionality to cover other cases and +better address the remaining use cases. + +> By remove-like, I mean things currently covered by several related terms / +> ideas which all pertain to erasing the past, present or future of a specific +> task instance. The resulting functionality does not *need* to be provided in +> a single command or interface or use the same terminology, however, it is +> helpful to consider all cases together even if they subsequently go their +> separate ways in implementation. + + +## Background + +There are some cases where users want to wipe historical data, e.g.: + +> I triggered some tasks I did not mean to, I need to undo this. + +The Cylc 7 intervention was: + +* Hold the workflow. +* Kill unwanted active tasks. +* Reset unwanted tasks to waiting. +* Resume the workflow. + +The user's intention is to erase tasks from the workflow's history in order +to allow it to continue as if they hadn't run. The Cylc 7 reset capability +combined with the task pool model appears to have been a remarkably +intuitive model for users in these situations. + +At present (8.1.4) the Cylc 8 intervention is: + +* Pause the workflow. +* Kill unwanted active tasks. +* Wait for wanted active tasks to complete. +* Start a new flow by setting outputs on all of the tasks you want the workflow + to run on from. +* Stop the original flow. +* Play the workflow. + +Or another combination of flow, stop, trigger and set-outputs to the same effect. + +This is far too challenging and may require a level of knowledge of graph +structure that the operator is not privy to. 
+ +There are also potentially some use cases which overlap with interventions +where `--flow=new` might be used, but would be cumbersome due to the +trouble of tracking down outputs/prerequisites and starting/stopping +flows. + +This proposal considers alternatives to `--flow=new` in order to reduce the +intervention to: +* "Forget" unwanted tasks. + +The need for pausing/resuming, originating from the need to suppress spawning, +is alleviated. + + +## Removing Tasks In Different N-Window Cases + +There are four cases to consider for remove-like functionality. + +### 1. Historical tasks (n<0). + +**Proposed removal behaviour:** + +Remove the requested flow(s) from: +* The task (DB); +* its outputs (DB); +* and any corresponding prerequisites this task satisfied on waiting tasks in + the pool (and DB), removing them if these outputs were the only ones to be + satisfied for the task. + +**Notes:** + +* This is the closest we can get to "erasing" the task from the workflow's + history. +* Anything downstream which has already run will also have to be "forgotten" if + that is the intention. + +**Visibility:** + +* In the GUI the user will still see the task, but in a different flow + (when flows are supported in the GUI). + +### 2. Active tasks {preparing,submitted,running} (n=0). + +**Proposed removal behaviour:** + +Remove the requested flow(s) from: +* The task (DB); +* its outputs (DB); +* and any corresponding prerequisites this task satisfied on waiting tasks in + the pool (and DB), removing them if these outputs were the only ones to be + satisfied for the task. + +If removing the specified flow(s) would leave the task in no flow +(i.e. `flow=None`), then we also send the kill command to the running job but +don't track its outcome. Then reset the job to failed/submit-failed as +appropriate to orphan the submission and remove the task (i.e. complete it, +bypassing required output checking). The failed/submit-failed output should +not be yielded into the pool. 
+ +This avoids the issue of having un-tracked active jobs in the DB (which could +appear in the GUI and confuse users) as well as preventing resource contention +issues if the task is re-run and bypassing any possible flow-merge conditions. + +**Visibility:** + +In the GUI the user will still see the task in the [submit-]failed state with +a [submit-]failed job, but in a different flow (when flows are supported in the +GUI). + +### 3. Waiting tasks which are ready to run (n=0). + +**Proposed removal behaviour:** + +Remove the task from the pool. + +Insert a waiting no-flow task into the task history in the DB. + +**Notes:** + +- If we just remove a task from the pool, then the GUI will still show it + because the n>0 window is generated from the workflow definition not the + task pool! We will also have no record of the removal. +- The only way to impact the task's representation in the GUI is to put + something there for it to display. The data store will subsequently load this + in along with its flow numbers. + +**Visibility:** + +In the GUI the user will still see the task, but in a different flow +(when flows are supported in the GUI). + +### 4. Future tasks (n>0). + +It is proposed that remove functionality shall not be extended to future tasks. + +Instead another proposal will handle this case with the addition of a new task +run mode called "skip" which can be configured via `cylc broadcast`. + + +## Proposal + +1. The `cylc remove` command is to be extended to cover cases 1 and 2 with + its behaviour modified for case 3. +2. When outputs are removed, any corresponding prerequisites on downstream + tasks in the pool shall be unset providing they were "naturally satisfied". + + This means prerequisites which have been satisfied by `cylc trigger` or + `cylc set` will not be unset as the result of `cylc remove` as these + have been manually satisfied by separate interventions. +3. 
If removing a task causes all of the prerequisites on a downstream task to be + unset (i.e. if the downstream task was spawned as a result of outputs from + this task alone) then the downstream task shall be removed from the pool. +4. This command shall not be extended to cover the selective un-setting of + specific prerequisites or outputs. +5. When all flows are removed from active tasks, Cylc should first attempt to + kill the job as it serves no further purpose and may cause resource + contention issues if left running, however, failure of the kill command + should be tolerated, though logged. + The state should be reset to "failed" (for running tasks) + or "submit-failed" (for submitted tasks) in order to orphan the submission + if the kill failed, but also prevent the problem of active tasks appearing + in the workflow history despite no longer being tracked/managed by Cylc and + to allow the task to be re-run as part of a subsequent flow-front as needed. + + * The task should then be removed from the pool as completed. + * The failed/submit-failed output should not spawn any downstream tasks. + * The failed/submit-failed event handlers should not be run. +6. The `cylc remove` command shall accept the `--flow` option: + - if specified, only the listed flow numbers should be removed. + - If not specified, and the task being removed is in the pool, then the + flow numbers from the task proxy should be used. + - If not specified and the task being removed is not in the pool, + the standard `all` default should apply. +7. To clarify removal operations as well as flow logic in general, visual + filtering of flows shall be supported in `cylc gui` and `cylc tui` + as already planned - https://github.com/cylc/cylc-ui/issues/470. + +## CLI + +``` +cylc remove [OPTIONS] + +Erase a task from the workflow's history. + +This removes the specified flow numbers from a task and any outputs it +generated. 
The task will still exist, however, not in the specified flows so +will not influence workflow evolution in those flows. + +Removing a task will also unset any prerequisites it satisfied on downstream +waiting tasks. + +If all flows are removed from a task, it and its outputs will be left in the +`None` flow. This preserves a record that the task ran, but it will not +influence active flows in any way. + +If the task is active (i.e. preparing, submitted or running), then Cylc +will attempt to kill the task and will change its status to either failed, if +it is running, or submit-failed, if it is not. + +Waiting tasks which have one or more of their prerequisites satisfied can be +removed. + +Waiting tasks which are ahead of the flow (i.e. which do not have any of their +prerequisites satisfied) cannot be removed because they do not exist yet. You +can, however, configure them to "skip" by changing the "run-mode" e.g: + $ cylc broadcast -p -n -s 'run mode = skip' + +Or you could remove them from the workflow definition and reinstall + reload. + +Examples: + # remove a task which has already run + # (any tasks downstream of this task which have already run or are currently + # running will be left alone. The task and its outputs will be left in the + # None flow) + $ cylc remove + + # remove a task which is running + # (Cylc will attempt to kill the task and will set the status to failed or + # submit-failed. The task and its outputs will be left in the None flow) + $ cylc remove + + # remove a task from a specified flow + # (the task may remain in other flows) + $ cylc remove --flow=1 + + +[OPTIONS] + + --flow=all The flow(s) to remove this task from. 
+``` diff --git a/docs/proposal-skip-mode.md b/docs/proposal-skip-mode.md new file mode 100644 index 00000000..5838a665 --- /dev/null +++ b/docs/proposal-skip-mode.md @@ -0,0 +1,101 @@ +# PROPOSAL: Skip Run-Mode For Tasks + +A proposal for a configuration to "skip" tasks in order to resolve the issue +of skipping families or cycles of tasks which are not yet present in the pool +or which have non-uniform required outputs by utilising broadcast task +matching. + +This is effectively an automated solution to calling `cylc set --output=skip` +which allows tasks to be "selected" according to broadcast rules opening +up a wider range of use cases. + +## Background + +Sometimes users may wish to "skip" one or more tasks in order to +short-circuit a graph. + +With Cylc 7 they would do something like: + +```bash +# skip a family +cylc reset '.*' -s succeeded +# skip a cycle +cylc reset '*.' -s succeeded +``` + +However, at Cylc 8 tasks cannot be "selected" by globs in this way as globs +apply only to the pool and Cylc 7 style reset is not permitted. + +> **Note:** Resetting to "waiting" use cases are catered for by + `cylc trigger --flow=new`. + + +## Proposal + +1. A new task run mode called "skip" to sit alongside the existing "live" + and "simulation" modes. The implementation would follow the same + code-pathway as simulation mode, i.e. it would fake submission and + execution, but yield real outputs into the workflow. + +2. The "skip" mode would be configured by + `[runtime][][skip]` to separate it from + `[runtime][][simulation]`. + The valid configurations would be: + * `outputs` - Define the outputs to be generated when this task runs + in skip mode. By default, all required outputs will be generated. + * `disable task event handlers` - Disable the event handlers which would + normally be called on task lifecycle events. By default event handlers + will be turned off. 
+ + The skip mode will not support simulation mode configurations such + as `speedup factor` which are not relevant to this functionality. + +3. The run mode should be controlled by a new task configuration + `[runtime][]run mode` with the default being `live`. + As a runtime configuration, this can be defined in the workflow for + development / testing purposes or set by `cylc broadcast`. + + If the `run mode` is set to `simulation` or `skip` in the workflow + configuration, then `cylc validate` and `cylc lint` should produce a + warning (similar to development features in other languages / systems). + + Examples: + ```bash + # run plotting tasks in skip mode (i.e. turn them off) + cylc broadcast -n plotting -p '*' -s 'run mode=skip' + + # run debugging tasks in live mode (i.e. turn them on) + cylc broadcast -n DEBUG -p '*' -s 'run mode=live' + ``` + +4. The `cylc set --out` option should accept the `skip` value which should + set the outputs defined in `[runtime][][skip]outputs`. + The `skip` keyword should not be allowed in custom outputs. + +5. Tasks with `run mode = skip` will continue to abide by the `is_held` + flag as normal. + +6. Force-triggering a task will not override the `run mode`. + +7. [Extension] The configured `run mode` should be made available as a task + attribute so that the UIs can display skip/simulation mode tasks with + a task modifier (the held and runahead "badges" are task modifiers). + + The order of precedence for this modifier should be below the held and + runahead states, but above queued. + + Examples: + + ``` + Task(is_held=True, run_mode='skip') => is_held modifier + Task(is_runahead=True, run_mode='simulation') => runahead modifier + Task(is_queued=True, run_mode='skip') => skip modifier + ``` + +8. [Extension] When tasks are run in skip mode, the prerequisites which + correspond to the outputs they generate should be marked as + `satisfied by skip mode` rather than `satisfied naturally` for + provenance reasons. 
+ + For the purpose of `cylc remove` logic, `satisfied by skip mode` should + be treated the same as `satisfied naturally`. diff --git a/docs/proposal-task-expire.md b/docs/proposal-task-expire.md deleted file mode 100644 index e1890a2a..00000000 --- a/docs/proposal-task-expire.md +++ /dev/null @@ -1,134 +0,0 @@ -# How Task Expire Should Work - -(IN DISCUSSION: NOT ACCEPTED YET) - -## What Expire Means - -A task can be configured to *expire* instead of run: -- automatically: if too far behind the wall clock -- manually: forced to expire by the user - -Either way, the implication is that there is no point in running the task now. -Perhaps (e.g.) its output files have been obtained by other means, or are -not needed anymore. - -Automatic and forced expiration have the same effect: the expired task can be -"forgotten" by the scheduler without running it to complete its outputs. - - -### Expired Tasks Should Not Cause a Stall - -If a task expires, the workflow definition, or the user, has decreed that under -current conditions it *should* expire, i.e., we don't need to run it anymore. - -Therefore expired tasks should not be retained as incomplete tasks (which is a -surprising/error condition) and allowed to stall the workflow. - -### Tasks Downstream of an Expired Task Should Not Cause a Stall - -If the workflow definition, or the user, has decreed that under current -circumstances a task should expire, i.e. that it should not run all all, then -by implication nothing downstream of it should run either. - - -``` -a => foo => bar # a succeeds, foo expires so foo never runs -``` -So there is no reason to stall on account of `bar` here. Required outputs are -always conditional on the owner task actually running in the first place, and -the downstream graph should not spawn at all if the outputs it hangs off are -not generated. - -From the perspective of `bar` this no different than `foo` not running -because the branch it lives on wasn't taken at all: - -``` -a:x? 
=> foo => bar # a does not generate :x, so foo never runs -``` - -### Expire Triggers Must Be Used To Avoid Early Halt, if That's What's Needed - -An expired task obviously terminates the branch that it belongs to. If a task -does not run, whatever the reason, then downstream tasks that depend on it -running should not spawn. - -Depending on graph structure, this has the potential to result in an early -scheduler shutdown (if according to the graph there is nothing else to run). - -However that *could be exactly what's wanted*, even in a cycling workflow that -has all future cycles cut off by an expired task. We cannot presume that the -graph is wrong if it says this should happen. - -If downstream tasks, or a different graph branch, need to run after a task -expires, then expire triggers must be used to achieve that. - -### The :expire Output Should Be Marked Optional - - -We should enforce that :expire triggers be marked optional, because "required -expiration" doesn't really make sense. - -### Optional :expire Does Not Mean :succeed Must Be Optional - -**NOTE this is moot if we decide that `:expire` should not be considered a -task output - see below.** - -Required outputs are always conditional on the owner task actually running, and -an expired task does not run at all. So `:succeed` (and other outputs) can -still be required even if expiration is optional. Required success really -means, *if the task runs, its success is required*. - -The status of an expired task's required outputs is the same as that of the -required outputs of a task on a branch not taken at runtime. - -``` -foo:x? => bar -foo:y? => baz -``` - -Here, bar:succeed is required ONLY if bar runs, i.e. if branch x is taken. If -branch y is taken, the scheduler does not car that bar did not succeed. - -``` -foo | foo:expire? => bar -``` - -Here, foo:succeed is required ONLY if foo runs, i.e. if it doesn't expire. If -foo expires, the scheduler should not care that foo (and bar) did not succeed. 
- -## Expired Should Really Be a Task Attribute, Not a Task State - -`Expired` is currently a task state. It should really be an attribute (like -`held`) because the underlying pre-expired state is potentially useful -information: we could easily distinguish between waiting tasks that expired -without running to achieve completion, and finished-but-incomplete tasks that -were force-expired without re-running to achieve completion. - -I don't think anyone is deeply invested in expired as a state. - -Treating `:expire` as a task output is not actually necessary, and it invites -confusion about how it fits into the optional outputs framework: expiration -can't be required so it it must be optional, but does that imply success (and -every other output!) must be optional too? - -As discussed above, optional expire does NOT imply optional success - because -expiration prevents a task from running at all, and the required nature of -"real" outputs is always contingent on the task running in the first place. - -So that's unfortunately a nuance of optional outputs that users will have to -understand if we continue to treat `:expire` as an output. - -Expiration prevents a task from running in the first place, so it makes more -sense to think of expiration as something the scheduler does TO the task, not -something that the task does. (In fact, a task can in principle "be expired" -long before worfklow activity reaches it in the graph). - -So we could use different notation to express this in the graph and allow -triggering off of expiration, e.g.: - -``` -foo => bar -@expire(foo) => baz # if the scheduler (or user) expires foo, run baz -``` - -Now users do not even have to wonder if `foo:expire?` implies `foo:succeed?`.