feat: round numbers to reduce undeterministic behavior #3740

Merged
3 commits merged on Oct 21, 2024

@badGarnet (Collaborator) commented Oct 19, 2024

This PR rounds the floating-point numbers associated with coordinates in pdfminer_processing.py. This helps eliminate randomness caused by machine precision in bounding-box overlap detection. Currently the rounding is set to the machine precision of Python's float (np.float64), obtained via np.finfo(float), which yields a resolution of 1e-15.
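The idea can be illustrated with a short NumPy sketch. This is not the code from pdfminer_processing.py: it assumes boxes are float64 arrays in [x1, y1, x2, y2] order, and `round_coords` and `boxes_overlap` are hypothetical helpers. A right edge computed as 0.1 + 0.2 is stored as 0.30000000000000004, so two boxes that should merely touch appear to overlap until both are rounded to 15 decimals.

```python
import numpy as np

# np.finfo(float) describes float64; .precision is the number of reliable
# decimal digits (15), matching a resolution of 1e-15.
DECIMALS = np.finfo(float).precision


def round_coords(coords: np.ndarray, decimals: int = DECIMALS) -> np.ndarray:
    """Hypothetical helper: round bbox coordinates to strip floating-point noise."""
    return np.round(coords, decimals)


def boxes_overlap(a: np.ndarray, b: np.ndarray) -> bool:
    """Hypothetical strict-overlap test for boxes given as [x1, y1, x2, y2]."""
    return bool(a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3])


# Box A's right edge comes from arithmetic (0.1 + 0.2 -> 0.30000000000000004);
# box B's left edge is the literal 0.3. The boxes should only touch.
box_a = np.array([0.0, 0.0, 0.1 + 0.2, 1.0])
box_b = np.array([0.3, 0.0, 1.0, 1.0])

print(boxes_overlap(box_a, box_b))                              # True: spurious overlap
print(boxes_overlap(round_coords(box_a), round_coords(box_b)))  # False after rounding
```

Rounding both operands before comparing makes the result independent of which arithmetic path produced each coordinate.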

future work

We should reduce the rounding to 6 digits after the decimal point, since float32 has a resolution of only 1e-6. However, doing so breaks existing tests. A follow-up is required to tune the threshold values in pdfminer_processing.py so that they work with 1e-6 resolution.
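A hedged sketch of why the switch to 6 digits interacts with existing thresholds: an overlap smaller than float32's resolution is still visible at 15 digits but rounds away at 6, so a comparison tuned against 15-digit values can change its outcome. The `overlap_width` helper and the edge values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Reliable decimal digits reported by NumPy for each dtype.
F32_DECIMALS = np.finfo(np.float32).precision  # 6
F64_DECIMALS = np.finfo(np.float64).precision  # 15


def overlap_width(right_edge_a: float, left_edge_b: float, decimals: int) -> float:
    """Hypothetical helper: horizontal overlap of two boxes after rounding both edges."""
    return max(0.0, round(right_edge_a, decimals) - round(left_edge_b, decimals))


# A 4e-7 overlap: kept at 15 digits, erased at 6 digits.
right_a, left_b = 0.5000004, 0.5
print(overlap_width(right_a, left_b, F64_DECIMALS) > 0)  # True
print(overlap_width(right_a, left_b, F32_DECIMALS) > 0)  # False
```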

@badGarnet badGarnet marked this pull request as ready for review October 20, 2024 21:46
@pawel-kmiecik (Contributor) left a comment

LGTM

@badGarnet badGarnet added this pull request to the merge queue Oct 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 21, 2024
@scanny scanny added this pull request to the merge queue Oct 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Oct 21, 2024
@scanny scanny force-pushed the feat/round-floating-point-number-before-computation branch from 826186b to 669d717 Compare October 21, 2024 17:34
@scanny scanny enabled auto-merge October 21, 2024 17:35
@scanny scanny added this pull request to the merge queue Oct 21, 2024
Merged via the queue into main with commit e764bc5 Oct 21, 2024
41 checks passed
@scanny scanny deleted the feat/round-floating-point-number-before-computation branch October 21, 2024 18:56