Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planner comparison and update #1061

Merged
merged 17 commits into from
Feb 26, 2024
Merged

Conversation

milroy
Copy link
Member

@milroy milroy commented Aug 22, 2023

Edit: this PR has grown in scope to enable updates to planner and planner_multi

To update the Fluxion resource graph, its traverser will need to be reinitialized. The initialization process assembles vectors of resource counts and types via graph traversal to update the planner_multi pruning filters. Currently, counts and types are stored as separate vectors in planner_multi under the assumption that their order and size don't change. The index of a count must be the same as the index of the corresponding resource type string. This assumption is valid as long as the resource graph is static, but allowing the resource graph to change without updating the planners and their metadata will cause scheduling inconsistency (e.g., incorrect traversal pruning) because of incorrect resource types and counts.

This PR adds comparison operators for fundamental planner member data classes to scope the comparison to the correct part of the code. The rescoping avoids relying on awkward class functions for comparison of member data objects and their data.

The PR also restructures planner_multi to use boost::multi_index container to replace and simplify planner_multi member data. The boost::multi_index container has the important capability of searching and comparing by multiple indices which can be vector-like (random_access), and hash-like (hashed_unique). Adding two indexes for lookups enables support for altering resource types, counts, and their orders (indices) in a natural way. Performing these transformations with vectors would be higher complexity both in time and lines of code.

Finally, the PR enables the update functionality in the planner and planner_multi C interfaces and creates unit tests for updating planner and planner_multi.

Edit: this PR also re-enables the Valgrind tests and fixes UB in planner unit tests.

@milroy milroy changed the title Comparison update Planner comparison update Aug 22, 2023
@milroy milroy force-pushed the comparison-update branch 2 times, most recently from 0328643 to 32e97ca Compare October 10, 2023 17:08
jameshcorbett
jameshcorbett previously approved these changes Nov 14, 2023
Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there more changes you wanted to add @milroy ? It looks reasonable to me, although the commits need bodies.

Question though--is the motivation for this mostly to make the code more idiomatic? It doesn't look like it saves much complexity.

@milroy milroy force-pushed the comparison-update branch 10 times, most recently from 95cd5a7 to ca89c6e Compare January 10, 2024 01:10
@milroy milroy changed the title Planner comparison update Planner comparison and update Jan 10, 2024
@milroy milroy force-pushed the comparison-update branch 12 times, most recently from 2bac648 to 6d6ce89 Compare January 13, 2024 07:27
@milroy
Copy link
Member Author

milroy commented Feb 25, 2024

Deleting a question I asked above because there's a pretty obvious answer that just took a bit of searching.

@trws
Copy link
Member

trws commented Feb 25, 2024

Guessing the answer was UB propagation from the undefined value of bo? I started to type an answer earlier but didn’t get it done before we had to board. Starting a review.

Copy link
Member

@trws trws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending the value safety question this looks great.

@@ -193,6 +229,26 @@ int planner::restore_track_points ()
return rc;
}

int planner::update_total (uint64_t resource_total)
{
int64_t delta = resource_total - m_total_resources;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is m_total_resources guaranteed to be smaller than resource_total?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case where resource_total > m_total_resources is when the graph is being grown, and resource_total < m_total_resources means the graph is being shrunk. There aren't any guarantees on the value of resource_total except that it's nonnegative.

@jameshcorbett asked a form of this question, too, and my reasoning has been that negative values for point->remaining (e.g., when (point->remaining + delta) < 0) shouldn't cause problems because they should only happen for reservations (which will be cleared at the next schedule loop). I was thinking that adding a check for negative remaining wasn't worth the performance penalty.

That said, I don't expect update_total to be called often, and adding a guard check may prevent complicated problems with removing spans if that process isn't carefully performed upstream (i.e., in planner_c_interface).

@milroy
Copy link
Member Author

milroy commented Feb 25, 2024

Guessing the answer was UB propagation from the undefined value of bo?

Yeah, that was it, and I added two commits to fix other occurrences in the unit tests. I thought I was right about the comma-separated syntax for declaration and initialization and went down the wrong investigative hole for a while.

before we had to board

Thanks for the review and have a good flight!

Problem: to update the Fluxion resource graph, the traverser will need
to be reinitialized, which assembles vectors of resource counts and
types. The ordering of the counts and types depends on how graph is
mutated. Currently, counts and types are stored as separate vectors in
`planner_multi` under the assumption that their order and size don't
change. Removing that assumption will cause inconsistency in
scheduling.

Add a `boost::multi_index` container to replace and simplify
`planner_multi` member data. The `boost::multi_index` container has the
important capability of searching and comparing by multiple indices
which can be vector-like (`random_access`), and hash-like
(`hashed_unique`). Adding two indexes for lookups enables support for
altering resource types, counts, and their orders (indices) in a
natural way. Performing these transformations with `vectors` would be
high-complexity both in time and code.
@milroy milroy force-pushed the comparison-update branch 4 times, most recently from 2c8d0bf to 9038b13 Compare February 26, 2024 19:05
Problem: when updating the Fluxion resource graph the `planner`
scheduled points' remaining resource count must be updated.

Add an `update` function to add or subtract from the current remaining
resource count at each scheduled point. Create a guard to ensure the
remaining count is nonnegative.
Problem: when updating the Fluxion resource graph `request_multi`
`counts`' indices can be reordered. Reordering will cause inconsistent
lookups.

Update the `request_multi` `counts` to be map-based to handle reordering
of indices.
Problem: when updating the Fluxion resource graph planners, resources,
counts, and their orders can be changed by mutation of the resource
graph topology and attributes.

Add `update` functionality to `planner_multi` and the C interface to add
new resources and planners, reorder, modify, and permute existing
resources, and delete removed resources.
Problem: updating a `planner` requires unit tests to maintain correct
functionality.

Add update unit tests for `planner`.
Problem: updating a `planner_multi` requires unit tests to maintain
correct functionality.

Add update unit tests for `planner_multi`.
Problem: the migration to cmake broke the ability to run Valngrind
tests, and Valgrind doesn't detect leaks in important Fluxion
components when running under Flux.

Add capability to find Valgrind and run Valgrind tests. Add additional
tests for planner and schema.
Problem: the current planner tests do not destroy the planners created
during the tests, resulting in memory leaks that will pollute Valgrind
test output.

Add calls to destroy the planners to eliminate the memory leaks.
Problem: planner_multi_t contexts are declared and initialized using a
comma-separated list that can result in UB.

Correct the declaration and initialization.
Problem: planner_multi_t contexts and ints are declared and initialized
using a comma-separated list that can result in UB.

Correct the declarations and initializations.
@milroy
Copy link
Member Author

milroy commented Feb 26, 2024

Thanks for the reviews all! Setting MWP.

@milroy milroy added the merge-when-passing mergify.io - merge PR automatically once CI passes label Feb 26, 2024
@mergify mergify bot merged commit 2c61dcf into flux-framework:master Feb 26, 2024
23 checks passed
@milroy milroy deleted the comparison-update branch February 26, 2024 19:44
Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 85.37736% with 31 lines in your changes missing coverage. Please review.

Project coverage is 70.9%. Comparing base (cd2fb0f) to head (6f16387).
Report is 308 commits behind head on master.

Files with missing lines Patch % Lines
resource/planner/c++/planner_multi.cpp 83.5% 16 Missing ⚠️
resource/planner/c/planner_multi_c_interface.cpp 78.3% 13 Missing ⚠️
resource/planner/c++/planner.cpp 93.1% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1061     +/-   ##
========================================
+ Coverage    70.6%   70.9%   +0.2%     
========================================
  Files          95      96      +1     
  Lines       12744   12819     +75     
========================================
+ Hits         9010    9099     +89     
+ Misses       3734    3720     -14     
Files with missing lines Coverage Δ
resource/planner/c++/mintime_resource_tree.cpp 90.4% <100.0%> (+0.3%) ⬆️
resource/planner/c++/mintime_resource_tree.hpp 100.0% <ø> (ø)
resource/planner/c++/planner_internal_tree.cpp 100.0% <100.0%> (ø)
resource/planner/c++/planner_internal_tree.hpp 100.0% <ø> (ø)
resource/planner/c++/planner_multi.hpp 100.0% <100.0%> (ø)
resource/planner/c++/scheduled_point_tree.cpp 92.3% <100.0%> (+11.2%) ⬆️
resource/planner/c++/scheduled_point_tree.hpp 100.0% <ø> (ø)
resource/planner/c/planner_c_interface.cpp 70.1% <100.0%> (+0.1%) ⬆️
resource/traversers/dfu_impl.hpp 94.7% <100.0%> (+0.1%) ⬆️
resource/planner/c++/planner.cpp 83.5% <93.1%> (+0.8%) ⬆️
... and 2 more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fluxion Valgrind tests aren't run
4 participants