Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modules: Move feasibility/satisfiability checking into a new module #1285

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Commits on Sep 28, 2024

  1. Split feasibility from resource via CLI option.

    Problem: sched.feasibility RPCs take up too much of
    	 sched-fluxion-resource's single-threaded time.
    
    Add CLI option to create a 'feasibility version' of
    sched-fluxion-resource called sched-fluxion-satisfiability
    that can run on multiple ranks.
    Jacob Tkeio authored and jacobtkeio committed Sep 28, 2024
    Configuration menu
    Copy the full SHA
    f4fa865 View commit details
    Browse the repository at this point in the history

Commits on Sep 29, 2024

  1. Make s-f-satisfiability acquire from s-f-resource

    Problem: Only one resource.acquire RPC can be active at a time, but both
             s-f-resource and s-f-satisfiability try to open one.
    
    Make s-f-satisfiability call s-f-resource.notify to populate
    	its resources (kind of like s-f-qmanager does).
    Make s-f-resource send its resources instead of null on notify RPCs
    	to accomodate s-f-satisfiability.
    Finally, force the FIRST matching policy for s-f-satisfiability.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    a5ab60d View commit details
    Browse the repository at this point in the history
  2. Split feasibility into its own full module

    Problem: Having a 'satisfiability version' of the s-f-resource module
             makes resource_match.cpp hard to read and less maintainable.
    
    Split resource_match.cpp into a resource.cpp and feasibility.cpp
    that contain their respective modules. Leave the code common to both
    in resource_match.cpp with a new header, resource_match.hpp. This
    simplifies adding new modules that acquire and match resources.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    372a805 View commit details
    Browse the repository at this point in the history
  3. Resolve CMakeLists.txt linking error

    Problem: Tests segfault when unloading s-f-resource under flux-broker
    
    Create a sched-fluxion-resource-module library that loads the
    'resource' target only once between both s-f-resource and
    s-f-feasibility. The previous CMakeLists.txt was loading
    the 'resource' target twice, which caused a segfault during
    unload when running under 'flux broker' on static variable
    ANY_RESOURCE_TYPE{"*"} in matcher.cpp.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    9276dfc View commit details
    Browse the repository at this point in the history
  4. Cancel feasibility module on s-f-resource error

    Problem: The feasibility module ignores s-f-resource after it gets
             resources from s-f-resource.notify, but it should exit when
             s-f-resource does to prevent odd behavior after an
             s-f-resource reload, especially with different resources.
    
    Make s-f-feasibility listen for errors on the .notify stream.
    
    Send graph expiration time to feasibility module
    
    Problem: s-f-resource.notify does not currently send graph expiration
             info to s-f-feasibility.
    
    Send it.
    
    Fix several errors
    
    Problem: Improper use of git including forgetting to add a file in
             the previous commit and losing changes during a merge.
    
    Fix various things including broken type of m_acquired_resources and
        unnecessary+broken code in resource_match_opts.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    e37f43b View commit details
    Browse the repository at this point in the history
  5. Load feasibility on rank 0 only by default

    Problem: s-f-feasibility launches on all ranks by default. This is
             probably not a good default behavior.
    
    Change rc1.d/01-s-f to launch s-f-feasibility on only rank 0.
    Change rc3.d/01-s-f to remove s-f-feasibility from all ranks.
    This allows for any layout of s-f-feasibility instances while
    guaranteeing at least one.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    80776f6 View commit details
    Browse the repository at this point in the history
  6. Remove unnecessary check in notify_request_cb

    Problem: notify_request_cb waits for resource.acquire if it
             has not yet recieved resources. However, it is
             guaranteed to have resources since init_resource-
             _graph must return before flux_reactor_run starts,
             and notify_request_cb can only happen after that.
    
    Remove the check.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    2ba4a06 View commit details
    Browse the repository at this point in the history
  7. Handle 'feasibility.check' RPC in s-f-feasibility

    Problem: s-f-feasibility performs feasibility checking through
             the sched.feasibility RPC. However, RFC 27 requires
             feasibility checking to be performed in feasibility.check.
    
    Make s-f-feasibility register the 'feasibility' service and
    'feasibility.check' RPC instead of the 'sched.feasibility' RPC.
    Remove 'sched.feasibility' forwarder cb in qmanager.cpp.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    1346e5a View commit details
    Browse the repository at this point in the history
  8. Remove 'sched.feasibility' calls from tests

    Problem: Some tests call 'sched.feasibility', which no loger exists.
    
    Swap 'sched.feasibility' for 'feasibility.check'.
    
    Add feasibility module to sched-sharness and t1020
    
    Problem: Some tests that need satisfiability information do not
             load and unload s-f-feasibility.
    
    Add load_feasibility, reload_feasibility, and remove_feasibility
    functions to sched-sharness.sh and the relevant tests.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    1e5a6eb View commit details
    Browse the repository at this point in the history
  9. Add feasibility module test

    Problem: All calls to feasibility.check are in tests for s-f-resource,
    	 not tests for s-f-feasibility.
    
    Move them to a separate test for the feasibility module.
    
    Disallow load-file behavior in feasibility test
    
    Problem: t4014 expects feasibility to load resources from a resource
             module that was passed a load-file, which is not desired
             behavior.
    
    Update t4014 to expect failure on such a load.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    b0b5801 View commit details
    Browse the repository at this point in the history
  10. Apply clang formatting

    Problem: The formatting did not pass the CI code formatting check.
    
    Apply the required changes.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    44d5bb5 View commit details
    Browse the repository at this point in the history
  11. Remove resource marking from feasibility init

    Problem: As a holdover from s-f-resource, s-f-feasibility marks its
             acquired resources as DOWN, which is unnecessary.
    
    Remove this resource marking from init_resource_graph.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    0d43cb2 View commit details
    Browse the repository at this point in the history
  12. Make s-f-resource propagate flux_respond_pack err

    Problem: If notify_request_cb fails flux_respond_pack,
             it responds with nothing, leaving feasibility
             without resources but active.
    
    Make notify_request_cb send a flux_respond_error on
    a flux_respond_pack_error.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    e57c11a View commit details
    Browse the repository at this point in the history
  13. Fix broken && chains in t4014

    Problem: fedora40 CI fails on t4014 due to """
      error: bug in the test script: broken &&-chain:
          load_feasibility
          flux dmesg -c | grep -q "File exists"
    """
    
    Add && between successive statements in t4014.
    jacobtkeio committed Sep 29, 2024
    Configuration menu
    Copy the full SHA
    5c573ad View commit details
    Browse the repository at this point in the history