Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow clients to implement large hash arrays without a collection-rehash feedback loop in the public MPS #230

Merged
merged 2 commits into from
Jun 16, 2023

Conversation

thejayps
Copy link
Contributor

@thejayps thejayps commented Jun 2, 2023

Provides a solution to issue #215

Progress towards resolving #110

This isn't the nicest way to solve the problem, but so far it works well for Confifura who have encountered the issue.

Part of the plan to meet Configura's requirements by separating and reviewing each implementation.

This branch cherry-picks and edits #76 to only include the "hash array" feature.

Documentation for the original requirement is here: https://www.ravenbrook.com/project/mps/issue/job003435/

@thejayps thejayps requested review from rptb1 and UNAA008 June 2, 2023 14:01
@thejayps thejayps added this to the version 1.118 milestone Jun 2, 2023
@rptb1 rptb1 added the essential Will cause failure to meet customer requirements. Assign resources. label Jun 5, 2023
@rptb1
Copy link
Member

rptb1 commented Jun 5, 2023

I think it would be worth writing a test case for this. It would need to create hash tables using LDs and other objects, then provoke and detect the case where the rehashing goes into a loop. It's quite a subtle test, and I don't think we should do it for version 1.118. Perhaps raise an issue. Also, check the MMQA suite and the custom/cet branch for relevant stuff. We can and should reproduce it in CET. It might be an example of a test that could be implemented neatly in the Scheme interpreter.

Note: #247

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

Executing proc.review.entry

  1. Start time 1953
  2. proc.review.entry.criteria:
    applying entry.universal, entry.design, entry.impl.
  3. entry.universal.author: This work was done by manually isolating specific changes that form the implementation of the "hash arrays" feature from Configura's custom production MPS source. I am not the original author, but I performed the work to isolate this feature, and I request and permit review. Configura have given us permission to do this work: https://info.ravenbrook.com/mail/2022/01/18/12-53-38/0/

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

paused 20:10

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

resuming 2124
4. entry.universal.source-available: 2 mail threads: The story behind "Ravenbrook MPS sources are out of sync with
Configura's sources" and Gathering information on Configura specific functionality describe the approach to this work. (Links to the base email only, p4 grep can find all the subsequent responses). In addition, the issue #215 links to a description of the problem this functionality was designed to solve. There aren't any branches documented in https://info.ravenbrook.com/project/mps/branch/ that describe the original genesis of the requirement.
5. source-approved - there is no available evidence that any source document has exited review. All source documents should be considered "unreviewed"
6. entry.design.rfc - there isn't much evidence available for why the design approach for this solution was originally chosen, however the working of the design itself seems to be described adequately
7. Entry finished at 2205

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

Entry took ~1h (but included writing up issues that arose with the entry criteria)

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

Executing proc.review.plan

  1. Started at 2221
  2. @thejayps, @rptb1, @UNAA008 will be available
  3. The change itself consists of 3 lines of code changed, and 70 lines of documentation. The code changes expose pre-existing implementation to the MPS API so the "product document" is not limited to the 3 lines of code changed. I estimate less than 100 lines may need to be checked for correctness by the correctness reviewer.
    Doc 70 lines
    Code ~100 lines but those lines need tracing and finding so I'll apply a multiplier ->(10 loc per min) = 10 mins *1.5 = 15 mins checking code.
    ~30 mins checking documentation for fresh eyes (i.e unfamiliar with code or requirement) in clarity role.
  4. Role ideas:
    @rptb1 checking correctness ~15/20 mins, additional time to be spent specifically identifying defects that might have been detected by a testbench (since one does not exist)
    @thejayps checking .sources and .consitency (and leading)
    @UNAA008 on .clarity: "The clarity checking role focuses on whether the product document is clear and obvious. This is a good role to give to someone who has never seen the product document before, but who is in the intended readership. Anything that is unclear to them is a defect."
  5. Time: checking will take 30 mins, but could be extended to 45 while calibration of checking rates is ongoing.
  6. Sources and instructions are available - TODO: send to checkers
  7. Plan finished at 2249
  8. Plan took ~30 mins

@thejayps
Copy link
Contributor Author

thejayps commented Jun 5, 2023

The source documents for this review are:
Issues:
#110
#215
Mails:
https://info.ravenbrook.com/mail/2023/01/13/14-45-37/0/
(note that this is the first mail in the thread - other mails in the thread might provide more context but won't add much else)

What is to be reviewed:

  • The documentation contained in this change is largely the "product document" in terms of documentation, but consistency with other documentation will need to be reviewed by the .consistency checker.
  • The code change is only 3 lines: The code product document mostly consists of those lines of code that would not have been executed if the three changed lines had not existed. The .correctness role has time factored in to identify the product document. My estimation is that it's less than 100 lines. I won't provide further guidance on what I think the product document is as it's more likely that defects will be found if the correctness checker discovers the product document for themselves.

Rules
https://github.com/Ravenbrook/mps/blob/586336a7021c2276a3f798ad10e6e4ab5f438226/procedure/rule.code.rst
https://github.com/Ravenbrook/mps/blob/586336a7021c2276a3f798ad10e6e4ab5f438226/procedure/rule.code.style.rst
https://github.com/Ravenbrook/mps/blob/586336a7021c2276a3f798ad10e6e4ab5f438226/procedure/rule.generic.rst
https://github.com/Ravenbrook/mps/blob/586336a7021c2276a3f798ad10e6e4ab5f438226/procedure/rule.style.rst
apply.

@rptb1
Copy link
Member

rptb1 commented Jun 6, 2023

Executing proc.review.plan

There is nothing noted for proc.review.plan.schedule. I realise we arranged this in advance, but you should record it.

From the invitation:

Review will take place at 1300 on 2023-06-06.

@thejayps
Copy link
Contributor Author

thejayps commented Jun 6, 2023

Here is proc.review.check for checkers

@thejayps
Copy link
Contributor Author

thejayps commented Jun 6, 2023

Beginning proc.review.ko at 1304
@rptb1 on .correctness and identifying defects that could be hidden because of lack of testbench
@thejayps on .sources and .consistency
@UNAA008 on .clarity
Checking started at 1320
Logging will take place at 1350 (optionally checking might be extended by 15 mins if checkers would like more time).

Copy link
Member

@rptb1 rptb1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Executing proc.review.check

  1. Start time 13:20.
  2. Checked out branch and ran make test on my machine (lii6gc) without issue.
  3. Also built the manual with make -C manual clean html without issue.
  4. Checked the manual and the header changes.
  5. Checking existing hash arrays code in poolamc.c.
  6. m. https://github.com/Ravenbrook/mps/blob/branch/2023-05-31/hash-arrays/code/poolamc.c#L516 pointless comment.
  7. M. https://github.com/Ravenbrook/mps/blob/branch/2023-05-31/hash-arrays/code/poolamc.c#L961-L962 "defer the size accounting" with what intention? (Could refer to design.mps.strategy section.) rule.generic.clear [Please see https://github.com/Hash arrays mechanism is not general #233#issuecomment-1580131391 when editing this. RB 2023-05-07]
  8. Finished checking for logging meeting 13:50.
  9. Found 1M 6m 1q.

manual/source/pool/amc.rst Outdated Show resolved Hide resolved
manual/source/pool/amc.rst Outdated Show resolved Hide resolved
manual/source/pool/amc.rst Outdated Show resolved Hide resolved
manual/source/pool/amc.rst Outdated Show resolved Hide resolved
@thejayps
Copy link
Contributor Author

thejayps commented Jun 6, 2023

M1 this issue does not link to documentation for the original requirement which seems to be here: https://www.ravenbrook.com/project/mps/issue/job003435/

Fix: added link in #230 description above

M2 This feature is not tested (stress, correctness or regression). This job describes the lack of stress testing: https://www.ravenbrook.com/project/mps/issue/job003963/

Raise: #247

comment1: nice improvements to the documentation noted in the past could be relevant https://www.ravenbrook.com/project/mps/issue/job004109/

Raise: #249

Copy link
Contributor

@UNAA008 UNAA008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment is that the mechanism that setting
MPS_KEY_AP_HASH_ARRAYS to true uses to not contribute to nursery space
is not described. But perhaps it is correct that implementation
details (which could change) are not discussed.

Raise: #249

@rptb1
Copy link
Member

rptb1 commented Jun 6, 2023

M1 this issue does not link to documentation for the original requirement which seems to be here: https://www.ravenbrook.com/project/mps/issue/job003435/

We can update the Perforce fixes to link this to the hash arrays implementation as well.

@rptb1
Copy link
Member

rptb1 commented Jun 6, 2023

NQ. Is ramping required? If not we could deprecate or remove it.
NQ. Is ramping documented? If so, we could mention in the manual that hash arrays are ramping.

@rptb1
Copy link
Member

rptb1 commented Jun 6, 2023

Editing may involve updating and tagging design.mps.strategy so I suggest the editor also take note of

mps/design/strategy.txt

Lines 565 to 567 in 8be637f

- 2013-06-04 NB_ Checked this in although it's far from complete.
Pasted in my 'ramping notes' from email, which mention some bugs
which I may have fixed (TODO: check this).

@thejayps
Copy link
Contributor Author

thejayps commented Jun 6, 2023

Begin logging at 1407: (no extra time checking but took a break)
@rptb1 found 1M 6m 1question
@UNAA008 1 comment
@thejayps found 2M 1m 1comment
Logging finished at 1440. Brainstorming to start at 1450

@thejayps thejayps added the pending Something needs doing, even if closed. label Jun 6, 2023
@thejayps
Copy link
Contributor Author

thejayps commented Jun 6, 2023

Brainstorming started at 1505
Considering @thejayps Ms:
M1 #230 (comment):
@thejayps + @rptb1 together suggest a "proc.excavate" procedure which suggests ways to help look for symbols/requirements etc in as many sources as possible. @thejayps original idea was an easy or documented way to search the mail archive. And beyond into the Harlequin era - something @rptb1 is capable of doing.
M2: @rptb1 maybe invest more in our test infrastructure to make tests easier to write. Eg make it easy to test in scheme.
@thejayps brainstorm ways to have more resource available to the MPS team.
Note (@rptb1) that testing has never been a major way of ensuring quality in the MPS.
@rptb1 M #230 (review)
@thejayps suggests a heat map or statistical analysis of tags. @rptb1 notes it's a graph tracing problem.

PI: @UNAA008 suggests that 5 mins brainstorming each defect isn't always enough.
Logging finished at 1540

@rptb1
Copy link
Member

rptb1 commented Jun 9, 2023

NQ. Is ramping required? If not we could deprecate or remove it.

Answer: @thejayps please check this for Configura, but I think this is a good requirement to keep, on reflection.

NQ. Is ramping documented? If so, we could mention in the manual that hash arrays are ramping.

Answer: Ramping is documented in the Allocation Patterns topic of the manual.

@thejayps
Copy link
Contributor Author

Begin proc.review.edit

  1. Start time 1733

@thejayps
Copy link
Contributor Author

6. m. https://github.com/Ravenbrook/mps/blob/branch/2023-05-31/hash-arrays/code/poolamc.c#L516 pointless comment.

from #230 (review)

Fix: deleted comment in commit

@thejayps
Copy link
Contributor Author

editing paused at 1815

Copy link
Contributor Author

@thejayps thejayps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documenting proc.review.edit

manual/source/release.rst Outdated Show resolved Hide resolved
manual/source/pool/amc.rst Outdated Show resolved Hide resolved
manual/source/pool/amc.rst Outdated Show resolved Hide resolved
@thejayps
Copy link
Contributor Author

NQ. Is ramping required? If not we could deprecate or remove it.

Answer: @thejayps please check this for Configura, but I think this is a good requirement to keep, on reflection.
from #230 (comment)

Answer to @rptb1: see https://info.ravenbrook.com/mail/2023/06/15/20-56-30/0/

@thejayps
Copy link
Contributor Author

7. M. https://github.com/Ravenbrook/mps/blob/branch/2023-05-31/hash-arrays/code/poolamc.c#L961-L962 "defer the size accounting" with what intention?  (Could refer to design.mps.strategy section.)  rule.generic.clear  [Please see https://github.com/[Hash arrays mechanism is not general #233](https://github.com/Ravenbrook/mps/issues/233)#issuecomment-1580131391 when editing this. RB 2023-05-07] from https://github.com/Ravenbrook/mps/pull/230#pullrequestreview-1465077305

Raise: #249

@thejayps
Copy link
Contributor Author

M1 this issue does not link to documentation for the original requirement which seems to be here: https://www.ravenbrook.com/project/mps/issue/job003435/

We can update the Perforce fixes to link this to the hash arrays implementation as well.

from #230 (comment)

pass: @rptb1

Assigning PR to @rptb1 and PR should remain "pending"

@thejayps
Copy link
Contributor Author

executing proc.review.exit

  1. Start time 1730
  2. Transgression - Allow clients to implement large hash arrays without a collection-rehash feedback loop in the public MPS  #230 (comment) has no response. Raise Design for "hash arrays" needs to clearer #249 is the appropriate action
  3. Transgression - estimation of remaining defects has not taken place: 3 M defects found: using 75% rule 1 likely remains
  4. exit.pass
  5. review.exit.calc:
    • hours used: ~12
    • hours saved: 30-12 = 18
    • major defects remaining: ~1
  6. exit finished at 1740 - took 10 mins

@thejayps
Copy link
Contributor Author

executing proc.merge.pull-request (currently in #228)

  1. Start time 1813
  2. Checkist "no" to item 3: There is no testcase, we have decided to proceed and write one later: Allow clients to implement large hash arrays without a collection-rehash feedback loop in the public MPS  #230 (comment)
  3. Merge conflicts found. Pausing at 1825

@thejayps thejayps merged commit de19bf4 into master Jun 16, 2023
@thejayps
Copy link
Contributor Author

Merge resumed and completed at 1935

@rptb1
Copy link
Member

rptb1 commented Jun 17, 2023

Fixing up merge mistakes by catching up from master, re-merging, and force-pushing to master. See #253 for prevention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
essential Will cause failure to meet customer requirements. Assign resources. pending Something needs doing, even if closed.
Development

Successfully merging this pull request may close these issues.

Large hash tables can result in a feedback loop of collections and reallocations of the hash table
3 participants