react immediately to wakeups #1736

kostmo · 2024-01-23T00:49:52Z

Demo

Illustrates immediate (same-tick) reaction to a change of a watched cell:

scripts/play.sh -i data/scenarios/Testing/1598-detect-entity-change.yaml --autoplay --speed 1

Background

Robots can be scheduled to wakeup from a watch by the wakeWatchingRobots function.
Robots may be actually awakened by the wakeUpRobotsDoneSleeping function.

Previously, wakeWatchingRobots would only ever schedule wakeups at currentTick + 1. Then, once the next tick is reached, in wakeUpRobotsDoneSleeping only robots that had been scheduled to wake on precisely the current tick could actually be awakened.

But this made it impossible for one robot to cause another robot, which had been sleeping, to actually wake up within the same tick that scheduling of the wakeup is performed.

The importance of this comes into play for system robots that are watching a cell and intended to react instantaneously to player actions (usually by running code in an instant block).

During each "tick", every active robot gets exactly one "turn" to execute code. Given the following ID assignments:

Robot ID	Robot name
`0`	`base`
`1`	`systemBot`

the systemBot will always execute after the base robot in a given tick.

If the systemBot is watching a given cell, a modification of that cell by the base will schedule a wakeup of the systemBot. If the scheduled wakeup tick is currentTick + 1, then the base will have another execution turn before the systemBot gets an opportunity to react to the base's action at the current tick, causing the systemBot to overlook actions that may be relevant to a goal condition.

Solution

In contast, if we schedule the systemBot wakeup for currentTick instead of currentTick + 1, then it should have an opportunity to wake, run, and react to the base's action before the base's next turn.

But in the status quo, the selection of which robots to run in a given tick is made only once, before iteration over those robots begins. We need instead to update the robots-to-run pool dynamically, after each iteration on an individual robot. The most appropriate data structure for the iteration pool is probably a Monotone priority queue, but we approximate this with an IntSet of RIDs and the minView function.

Being able to alter the list of active robots in the middle of a given tick's robot iteration has required a change to the runRobotIDs function. Instead of using a forM_ to iterate over a static list of RIDs, we have a custom iteration function iterateRobots to handle the dynamic set.

Performance

Benchmarks were performed locally with stack and GHC 9.6.

I had tested the replacement of the forM_ loop with iterateRobots in isolation, and it had zero effect on benchmarks. But some other trivial changes tested in isolation, such as the addition of docstrings, yielded a +6% increase in benchmarks. So there seems to be some instability in the ability of the compiler to perform certain optimizations.

With this PR, increases in benchmark times ranging from 7% to 11% were observed. The obvious culprit is that wakeUpRobotsDoneSleeping is now called N times per tick, rather than once per tick, where N is the number of robots that are initially (or become) active during that tick.

Mitigating with an "awakened" set

In the common case (i.e. when there are no watching robots), the wakeUpRobotsDoneSleeping that is now executed N times per tick (instead of once per tick) incurs a Map lookup, which is O(log M) in the number of distinct future wakeup times across all robots.

However, those extra N invocations only exist to serve the purpose of watching robots that may need to wake up at the current tick. It may be more efficient to have a dedicated Set in the GameState for such robots that gets populated parallel to this insertion:

swarm/src/swarm-engine/Swarm/Game/State/Robot.hs

Line 387 in ad5c589

[ (currentTick, currTickWakeable)

Then if this set remains null, we can avoid paying for an additional O(N * log M) operations entailed by the Map lookups into internalWaitingRobots.

Result

Indeed, storing awakened bots in a new currentTickWakeableBots set restored the benchmarks nearly to the baseline. Instead of the regressions in the 10-20% range observed before, now only a few benchmarks had at most 3-4% increases.

CI regressions

In CI, (snake, powerset, and word-search) exceeded their timeout thresholds in GHC 9.2 and GHC 9.4. No regressions were observed with GHC 9.6. To accommodate the lowest common denominator, I bumped the thresholds for those three scenarios to get a passing build.

I don't believe it is worth further effort or investigation to optimize for GHC 9.4, since such efforts will be moot for GHC 9.6 and onward.

Test invocation:

cabal test swarm-integration --test-show-details streaming --test-options '--color always --pattern "word-search"'

Scenario	GHC 9.4.8	GHC 9.4.8	GHC 9.6.4	GHC 9.6.4
	Before	After	Before	After
`snake`	1.84s	5.38s	1.62s	1.67s
`powerset`	1.66s	5.09s	1.56s	1.66s
`word-search`	0.56s	1.91s	0.44s	0.48s

Potential improvements

`TickNumber` as `Map` key

waitingRobots is of type Map TickNumber [RID]. Instead of Map, we could have a newtype that encapsulates a IntMap to make lookups by TickNumber more efficient.

byorgey · 2024-03-04T10:44:06Z

Currently looking through this. How does it interact with the single-step debugging mode? Do we need to update the singleStep function so that it has corresponding behavior?

kostmo · 2024-03-06T03:40:28Z

How does it interact with the single-step debugging mode? Do we need to update the singleStep function so that it has corresponding behavior?

Yes, I suppose we would. Will look into this.

byorgey

This all looks good to me, with the possible caveat of needing to update singleStep.

This improves the test introduced in #1736 in a few ways: * Place the system robot's code inline in the `program` field * even though the system bot had been using `instant`, the `run` command itself incurs a delay. So this ensures the system robot is ready to monitor the cell immediately * System robot uses exclusively `Intangible` commands in its monitoring loop * This allows removal of the `instant` command. * `Surveil`, which does not appear in any scenario yet, is made `Intangible`. * Remove the initial `wait 2` in the player robot's solution. Tested to ensure that this still exercises the problem that #1736 set out to solve, by commenting out this line (causing the scenario to fail). https://github.com/swarm-game/swarm/blob/01ae0e45d75c30619af76ae2438dc1b92ef61e08/src/swarm-engine/Swarm/Game/State/Robot.hs#L396 ## Demo scripts/play.sh -i data/scenarios/Testing/1598-detect-entity-change.yaml --autoplay --speed 1

kostmo force-pushed the experimental/use-splitlookup branch 3 times, most recently from 7f0bd39 to 670f316 Compare January 23, 2024 02:22

kostmo changed the base branch from main to debug/log-watch-events January 23, 2024 02:22

swarm-game deleted a comment from mergify bot Jan 23, 2024

kostmo changed the title ~~[EXPERIMENTAL] use splitlookup for wakeup~~ use splitLookup for wakeups Jan 23, 2024

kostmo force-pushed the experimental/use-splitlookup branch from 670f316 to c8fd3f5 Compare January 23, 2024 18:58

kostmo changed the title ~~use splitLookup for wakeups~~ react immediately to wakeups Jan 24, 2024

kostmo mentioned this pull request Jan 24, 2024

react immediately to wakeups #1601

Closed

kostmo changed the base branch from debug/log-watch-events to main January 24, 2024 00:43

kostmo force-pushed the experimental/use-splitlookup branch 2 times, most recently from 13a8081 to fcd46f0 Compare January 24, 2024 00:49

This was referenced Jan 24, 2024

[EXPERIMENTAL] perf improvements to #1736 #1739

Closed

[EXPERIMENTAL] benchmark comparison with GHC 9.6.3 #1745

Closed

kostmo force-pushed the experimental/use-splitlookup branch from fcd46f0 to 8920058 Compare March 4, 2024 01:52

kostmo mentioned this pull request Mar 4, 2024

[EXPERIMENTAL] another intset replacement #1781

Draft

restyled-io bot mentioned this pull request Mar 4, 2024

Restyle react immediately to wakeups #1783

Closed

kostmo force-pushed the experimental/use-splitlookup branch 3 times, most recently from 256eca0 to bd968b9 Compare March 4, 2024 06:59

restyled-io bot mentioned this pull request Mar 4, 2024

Restyle react immediately to wakeups #1784

Closed

kostmo force-pushed the experimental/use-splitlookup branch 2 times, most recently from bf9aaa7 to ad5c589 Compare March 4, 2024 07:05

kostmo marked this pull request as ready for review March 4, 2024 07:06

kostmo requested a review from byorgey March 4, 2024 07:06

kostmo mentioned this pull request Mar 5, 2024

[EXPERIMENTAL] perf improvement using set #1786

Closed

byorgey approved these changes Mar 6, 2024

View reviewed changes

use splitLookup for waiting robots

7503012

kostmo added 17 commits March 10, 2024 17:24

small refactor of wakeUpRobotsDoneSleeping

d50a031

return list of active robots from wakeUpRobotsDoneSleeping

f35f7ec

add docstrings

9941944

type alias for game stepping

8bbe0c9

explicit recursion to iterate tick robots

b165617

refactor wakeUpRobotsDoneSleeping

c4c556c

finally fix bug

96208df

tests for instant entity change detection

0ca6945

rebase fix

33e6418

parameterize benchmarking script

aef7ec6

Ensure that awakened robots are scheduled for the proper tick

568c874

use IntSet instead of Set RID for robotsWatching

b2b898f

Add docs, code cleanup

a673ecc

bump time limits for powerset, word-search, and snake

8d0bde9

docfix, reorder function

256f3e3

performance enhancement using awakened set

e49d29c

update docs

80d5b76

kostmo mentioned this pull request Mar 11, 2024

[WIP] debugger robot runpool #1790

Draft

kostmo force-pushed the experimental/use-splitlookup branch from 7f0b299 to 80d5b76 Compare March 11, 2024 02:34

kostmo added the merge me Trigger the merge process of the Pull request. label Mar 11, 2024

mergify bot merged commit 01ae0e4 into main Mar 11, 2024
11 checks passed

mergify bot deleted the experimental/use-splitlookup branch March 11, 2024 02:49

kostmo mentioned this pull request Mar 12, 2024

fix watch wakeup test #1791

Merged

kostmo mentioned this pull request Jun 10, 2024

Parallel semantics for stepping robots #1925

Open

kostmo added the CHANGELOG Once merged, this PR should be highlighted in the changelog for the next release. label Jul 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

react immediately to wakeups #1736

react immediately to wakeups #1736

kostmo commented Jan 23, 2024 •

edited

Loading

byorgey commented Mar 4, 2024 •

edited

Loading

kostmo commented Mar 6, 2024

byorgey left a comment

react immediately to wakeups #1736

react immediately to wakeups #1736

Conversation

kostmo commented Jan 23, 2024 • edited Loading

Demo

Background

Solution

Performance

Mitigating with an "awakened" set

Result

CI regressions

Potential improvements

TickNumber as Map key

byorgey commented Mar 4, 2024 • edited Loading

kostmo commented Mar 6, 2024

byorgey left a comment

Choose a reason for hiding this comment

kostmo commented Jan 23, 2024 •

edited

Loading

`TickNumber` as `Map` key

byorgey commented Mar 4, 2024 •

edited

Loading