Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TypeRecovery] Re-work for Fixed Iteration Strategy & Using POSSIBLE_TYPES #3733

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

DavidBakerEffendi
Copy link
Collaborator

@DavidBakerEffendi DavidBakerEffendi commented Oct 11, 2023

Benefits

  • Slightly improved performance
  • Fixed context blindness based on compilation unit scope
  • Type recovery stores its intermediate values in POSSIBLE_TYPES in order to avoid clashing with the must-style of DYNAMIC_TYPE_HINT_FULL_NAME

Breaking Changes

  • The iterations are a lot more granular and reflect access paths better, thus the default number of iterations (2) may not be enough, and sometimes 3 or 4 are required. e.g. to get something like <returnValue>.<indexAccess>.<indexAccess>. one needs an iteration count of at least 3.
  • One will no longer be able to specify the compilation unit node, it will always be File and the pass will iterate through Methods from that file while checking variables captured in by a parent scope.

Main Changes

  • The recovery runs per method and no longer per "compilation unit" specified by the subclass:
    • Starting from the file level, imports are parsed and provide the method-level task a deep copy of a pre-populated symbol table and a fresh diff graph
    • When a task is completely successful, the resulting diff graph is absorbed
    • Each method-level task is submitted to a shared global executor service
  • To achieve this, methods are required to check their parent method(s) for potentially shared symbols, which were, until now, worked around by looking at the whole file as a single scope. This had the unfortunate side effect for being context insensitive
  • Some basic work is in place towards caching the CPG instead of querying for every small piece of info by adding the GraphCache object to the TypeRecoveryState

Next Steps

TODO: Work towards a fully ConcurrentWriterCpgPass solution by expanding upon the GraphCache
TODO: Make use of Declaration nodes in the symbol table to avoid many repeated AST crawls

Closes PR #3551

cc: @ml86 @max-leuthaeuser

@DavidBakerEffendi DavidBakerEffendi added the type recovery Concerns Joern's WIP type recovery label Oct 11, 2023
@DavidBakerEffendi
Copy link
Collaborator Author

There appears to be something flaky about the tests. Will investigate further

@DavidBakerEffendi
Copy link
Collaborator Author

Update: Seems like method-level parallelism isn't causing the flakiness, and rather the file-level one.

@DavidBakerEffendi
Copy link
Collaborator Author

I think I am falling prey to technical debt. It may be good to start reworking this pass in general, visiting on a per-statement basis and keeping things simpler. There is a lot of noise coming from these heuristics, as they are not normalized nor controlled. It would be a good idea to also separate these heuristics and have them callable through some other mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type recovery Concerns Joern's WIP type recovery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants