Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

react immediately to wakeups #1736

Merged
merged 18 commits into from
Mar 11, 2024
Merged

react immediately to wakeups #1736

merged 18 commits into from
Mar 11, 2024

Conversation

kostmo
Copy link
Member

@kostmo kostmo commented Jan 23, 2024

Fixes #1598

Demo

Illustrates immediate (same-tick) reaction to a change of a watched cell:

scripts/play.sh -i data/scenarios/Testing/1598-detect-entity-change.yaml --autoplay --speed 1

Background

Robots can be scheduled to wakeup from a watch by the wakeWatchingRobots function.
Robots may be actually awakened by the wakeUpRobotsDoneSleeping function.

Previously, wakeWatchingRobots would only ever schedule wakeups at currentTick + 1. Then, once the next tick is reached, in wakeUpRobotsDoneSleeping only robots that had been scheduled to wake on precisely the current tick could actually be awakened.

But this made it impossible for one robot to cause another robot, which had been sleeping, to actually wake up within the same tick that scheduling of the wakeup is performed.

The importance of this comes into play for system robots that are watching a cell and intended to react instantaneously to player actions (usually by running code in an instant block).

During each "tick", every active robot gets exactly one "turn" to execute code. Given the following ID assignments:

Robot ID Robot name
0 base
1 systemBot

the systemBot will always execute after the base robot in a given tick.

If the systemBot is watching a given cell, a modification of that cell by the base will schedule a wakeup of the systemBot. If the scheduled wakeup tick is currentTick + 1, then the base will have another execution turn before the systemBot gets an opportunity to react to the base's action at the current tick, causing the systemBot to overlook actions that may be relevant to a goal condition.

Solution

In contast, if we schedule the systemBot wakeup for currentTick instead of currentTick + 1, then it should have an opportunity to wake, run, and react to the base's action before the base's next turn.

But in the status quo, the selection of which robots to run in a given tick is made only once, before iteration over those robots begins. We need instead to update the robots-to-run pool dynamically, after each iteration on an individual robot. The most appropriate data structure for the iteration pool is probably a Monotone priority queue, but we approximate this with an IntSet of RIDs and the minView function.

Being able to alter the list of active robots in the middle of a given tick's robot iteration has required a change to the runRobotIDs function. Instead of using a forM_ to iterate over a static list of RIDs, we have a custom iteration function iterateRobots to handle the dynamic set.

Performance

Benchmarks were performed locally with stack and GHC 9.6.

I had tested the replacement of the forM_ loop with iterateRobots in isolation, and it had zero effect on benchmarks. But some other trivial changes tested in isolation, such as the addition of docstrings, yielded a +6% increase in benchmarks. So there seems to be some instability in the ability of the compiler to perform certain optimizations.

With this PR, increases in benchmark times ranging from 7% to 11% were observed. The obvious culprit is that wakeUpRobotsDoneSleeping is now called N times per tick, rather than once per tick, where N is the number of robots that are initially (or become) active during that tick.

Mitigating with an "awakened" set

In the common case (i.e. when there are no watching robots), the wakeUpRobotsDoneSleeping that is now executed N times per tick (instead of once per tick) incurs a Map lookup, which is O(log M) in the number of distinct future wakeup times across all robots.

However, those extra N invocations only exist to serve the purpose of watching robots that may need to wake up at the current tick. It may be more efficient to have a dedicated Set in the GameState for such robots that gets populated parallel to this insertion:

[ (currentTick, currTickWakeable)

Then if this set remains null, we can avoid paying for an additional O(N * log M) operations entailed by the Map lookups into internalWaitingRobots.

Result

Indeed, storing awakened bots in a new currentTickWakeableBots set restored the benchmarks nearly to the baseline. Instead of the regressions in the 10-20% range observed before, now only a few benchmarks had at most 3-4% increases.

CI regressions

In CI, (snake, powerset, and word-search) exceeded their timeout thresholds in GHC 9.2 and GHC 9.4. No regressions were observed with GHC 9.6. To accommodate the lowest common denominator, I bumped the thresholds for those three scenarios to get a passing build.

I don't believe it is worth further effort or investigation to optimize for GHC 9.4, since such efforts will be moot for GHC 9.6 and onward.

Test invocation:

cabal test swarm-integration --test-show-details streaming --test-options '--color always --pattern "word-search"'
Scenario GHC 9.4.8 GHC 9.4.8 GHC 9.6.4 GHC 9.6.4
Before After Before After
snake 1.84s 5.38s 1.62s 1.67s
powerset 1.66s 5.09s 1.56s 1.66s
word-search 0.56s 1.91s 0.44s 0.48s

Potential improvements

TickNumber as Map key

waitingRobots is of type Map TickNumber [RID]. Instead of Map, we could have a newtype that encapsulates a IntMap to make lookups by TickNumber more efficient.

@kostmo kostmo force-pushed the experimental/use-splitlookup branch 3 times, most recently from 7f0bd39 to 670f316 Compare January 23, 2024 02:22
@kostmo kostmo changed the base branch from main to debug/log-watch-events January 23, 2024 02:22
@swarm-game swarm-game deleted a comment from mergify bot Jan 23, 2024
@kostmo kostmo changed the title [EXPERIMENTAL] use splitlookup for wakeup use splitLookup for wakeups Jan 23, 2024
@kostmo kostmo force-pushed the experimental/use-splitlookup branch from 670f316 to c8fd3f5 Compare January 23, 2024 18:58
@kostmo kostmo changed the title use splitLookup for wakeups react immediately to wakeups Jan 24, 2024
@kostmo kostmo changed the base branch from debug/log-watch-events to main January 24, 2024 00:43
@kostmo kostmo force-pushed the experimental/use-splitlookup branch 2 times, most recently from 13a8081 to fcd46f0 Compare January 24, 2024 00:49
@kostmo kostmo force-pushed the experimental/use-splitlookup branch from fcd46f0 to 8920058 Compare March 4, 2024 01:52
@kostmo kostmo force-pushed the experimental/use-splitlookup branch 3 times, most recently from 256eca0 to bd968b9 Compare March 4, 2024 06:59
@kostmo kostmo force-pushed the experimental/use-splitlookup branch 2 times, most recently from bf9aaa7 to ad5c589 Compare March 4, 2024 07:05
@kostmo kostmo marked this pull request as ready for review March 4, 2024 07:06
@kostmo kostmo requested a review from byorgey March 4, 2024 07:06
@byorgey
Copy link
Member

byorgey commented Mar 4, 2024

Currently looking through this. How does it interact with the single-step debugging mode? Do we need to update the singleStep function so that it has corresponding behavior?

@kostmo
Copy link
Member Author

kostmo commented Mar 6, 2024

How does it interact with the single-step debugging mode? Do we need to update the singleStep function so that it has corresponding behavior?

Yes, I suppose we would. Will look into this.

Copy link
Member

@byorgey byorgey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, with the possible caveat of needing to update singleStep.

@kostmo kostmo force-pushed the experimental/use-splitlookup branch from 7f0b299 to 80d5b76 Compare March 11, 2024 02:34
@kostmo kostmo added the merge me Trigger the merge process of the Pull request. label Mar 11, 2024
@mergify mergify bot merged commit 01ae0e4 into main Mar 11, 2024
11 checks passed
@mergify mergify bot deleted the experimental/use-splitlookup branch March 11, 2024 02:49
@kostmo kostmo mentioned this pull request Mar 12, 2024
mergify bot pushed a commit that referenced this pull request Mar 15, 2024
This improves the test introduced in #1736 in a few ways:
* Place the system robot's code inline in the `program` field
    * even though the system bot had been using `instant`, the `run` command itself incurs a delay.  So this ensures the system robot is ready to monitor the cell immediately
* System robot uses exclusively `Intangible` commands in its monitoring loop
    * This allows removal of the `instant` command.
    * `Surveil`, which does not appear in any scenario yet, is made `Intangible`.
* Remove the initial `wait 2` in the player robot's solution.

Tested to ensure that this still exercises the problem that #1736 set out to solve, by commenting out this line (causing the scenario to fail).

https://github.com/swarm-game/swarm/blob/01ae0e45d75c30619af76ae2438dc1b92ef61e08/src/swarm-engine/Swarm/Game/State/Robot.hs#L396

## Demo

    scripts/play.sh -i data/scenarios/Testing/1598-detect-entity-change.yaml --autoplay --speed 1
@kostmo kostmo added the CHANGELOG Once merged, this PR should be highlighted in the changelog for the next release. label Jul 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CHANGELOG Once merged, this PR should be highlighted in the changelog for the next release. merge me Trigger the merge process of the Pull request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System robot should recognize lock combo in single tick
2 participants