Increase Tree Count to 50 & Update Memory Calculation for RCF models #1181
Signed-off-by: Kaituo Li <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1181 +/- ##
============================================
- Coverage 80.54% 80.53% -0.01%
- Complexity 4599 4601 +2
============================================
Files 336 336
Lines 19091 19119 +28
Branches 1987 1993 +6
============================================
+ Hits 15377 15398 +21
- Misses 2769 2772 +3
- Partials 945 949 +4
* numberOfTrees * dimension * 4 * averagePointStoreUsage + 77192);
long thresholdSize = 6 * (dimension * 8 + 16) + shingleSize * 8 + 624;
return compactRcfSize + thresholdSize;
int pointStoreCapacity = Math.max(sampleSize * numberOfTrees + 1, 2 * sampleSize);
nit: sampleSize * numberOfTrees is calculated multiple times; consider storing it in a variable if it doesn't change.
fixed
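For readers following the thread, here is a minimal sketch of the kind of change the nit asks for, using only the two expressions that are fully visible in the diff excerpt above. The class name `RcfMemorySketch`, the method names, and the local `totalSamples` are hypothetical, and the argument values in `main` are illustrative; the truncated `compactRcfSize` expression is deliberately left out because its beginning is not shown.

```java
// Hypothetical sketch; not the code merged in this PR.
final class RcfMemorySketch {
    private RcfMemorySketch() {}

    // Point store capacity, with the repeated product hoisted into one local.
    static int pointStoreCapacity(int sampleSize, int numberOfTrees) {
        // Computed once; any other term that needs the product would reuse it.
        int totalSamples = sampleSize * numberOfTrees;
        return Math.max(totalSamples + 1, 2 * sampleSize);
    }

    // Threshold model size in bytes, copied from the visible diff line.
    static long thresholdSize(int dimension, int shingleSize) {
        return 6L * (dimension * 8L + 16) + shingleSize * 8L + 624;
    }

    public static void main(String[] args) {
        // Illustrative inputs only (assumed, not taken from the PR).
        System.out.println(pointStoreCapacity(256, 50)); // prints 12801
        System.out.println(thresholdSize(8, 8));         // prints 1168
    }
}
```

Using `long` arithmetic in `thresholdSize` mirrors the `long thresholdSize` declaration in the diff and avoids `int` overflow for large dimensions.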
Signed-off-by: Kaituo Li <[email protected]>
…1181)
* Parameter tuning
Signed-off-by: Kaituo Li <[email protected]>
* reuse capacity calculation
Signed-off-by: Kaituo Li <[email protected]>
---------
Signed-off-by: Kaituo Li <[email protected]>
(cherry picked from commit e9a3782)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…1181) (#1186)
* Parameter tuning
* reuse capacity calculation
---------
(cherry picked from commit e9a3782)
Signed-off-by: Kaituo Li <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
This PR accomplishes two main objectives:
Methodology for Memory Calculation:
The updated memory calculation follows the methodology in PR #222: run TRCF or RCFCaster on one million data points and measure object sizes from jmap heap dumps. This white-box approach examines every field listed in the heap dump, so it accounts for effects such as the node store size changing when parameters change. In addition, the point store ratio is adjusted according to shingle size, using a heuristic constant derived from empirical data.
Experimental Validation:
The memory size formula's accuracy was validated through experiments, with results showing a close match between estimated and actual memory usage, within a tolerable variance. Detailed experiment data can be found here:
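A minimal sketch of how such a comparison could be expressed, assuming the check is a relative error between the estimated size and the size measured from a heap dump. The class name `EstimateValidationSketch`, both method names, and the `tolerance` parameter are hypothetical; the PR does not state the tolerance it used.

```java
// Hypothetical sketch of an estimate-versus-measurement check.
final class EstimateValidationSketch {
    private EstimateValidationSketch() {}

    // Relative error of the estimate with respect to the measured size.
    static double relativeError(long estimatedBytes, long actualBytes) {
        if (actualBytes <= 0) {
            throw new IllegalArgumentException("actualBytes must be positive");
        }
        return Math.abs(estimatedBytes - actualBytes) / (double) actualBytes;
    }

    // True when the estimate is within the caller-chosen relative tolerance.
    static boolean withinTolerance(long estimatedBytes, long actualBytes, double tolerance) {
        return relativeError(estimatedBytes, actualBytes) <= tolerance;
    }
}
```

A caller would pass the formula's output as `estimatedBytes` and the heap-dump measurement as `actualBytes`, choosing `tolerance` to match whatever variance is considered acceptable.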
Testing done:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.