-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure symlinked realpaths are not loaded twice #3189
base: master
Are you sure you want to change the base?
Ensure symlinked realpaths are not loaded twice #3189
Conversation
closes oracle#3138 Use a hash to cache realpath lookups and another to cache realpath -> feature similar to what was done in CRuby: https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L358-L402 https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L1195-L1289
1df367c
to
89e47ef
Compare
Design-wise that change looks pretty hacky to me. |
@@ -140,6 +143,16 @@ def self.find_feature_or_file(feature, use_feature_provided = true) | |||
end | |||
end | |||
|
|||
def self.load_unless_realpath_loaded(feature, path) | |||
# TODO: does this need to be synchronized? | |||
realpath = (@features_realpath_cache[path] ||= File.realpath(path) || path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK File.realpath never returns false|nil, always a String, or an exception if some part does not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That did seem to be the case for me but I wasn't sure if it was true on all platforms.
I think I was looking at this C code and it made me want to program defensively (the semantics are probably different, though).
https://github.com/ruby/ruby/blob/7ce4b716bdb5bcfc8b30ffcd034ce7aded1f72b9/load.c#L87-L88
@@ -140,6 +143,16 @@ def self.find_feature_or_file(feature, use_feature_provided = true) | |||
end | |||
end | |||
|
|||
def self.load_unless_realpath_loaded(feature, path) | |||
# TODO: does this need to be synchronized? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes of course, you could use a TruffleRuby::ConcurrentMap
.
But I think this code would need to move to Java anyway, because we cannot have an extra backtrace entry for require
, everything must happen in Primitive.load_feature
(or another Primitive reusing it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was wondering if it should be under with_synchronized_features
.
That's a good point, though, might be able to do the whole change in java 🤔
@@ -35,6 +35,9 @@ module FeatureLoader | |||
# A snapshot of $LOADED_FEATURES, to check if the @loaded_features_index cache is up to date. | |||
@loaded_features_version = -1 | |||
|
|||
@features_realpath_cache = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this cache is correct, for instance it seems broken if CWD changes or $LOAD_PATH changes.
Probably that part shouldn't be cached.
You could cache absolute_path->realpath though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, we probably want absolute path as the key.
@@ -292,13 +305,24 @@ def self.get_loaded_features_index | |||
unless @loaded_features_version == $LOADED_FEATURES.version | |||
raise '$LOADED_FEATURES is frozen; cannot append feature' if $LOADED_FEATURES.frozen? | |||
@loaded_features_index.clear | |||
previous_realpaths = @features_realpath_cache.dup | |||
@features_realpath_cache.clear | |||
@loaded_features_realpaths.clear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this cache never needs to be cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was necessary to satisfy the specs, and I saw the CRuby code doing something simliar:
https://github.com/ruby/ruby/blob/7ce4b716bdb5bcfc8b30ffcd034ce7aded1f72b9/load.c#L363-L364
https://github.com/ruby/ruby/blob/7ce4b716bdb5bcfc8b30ffcd034ce7aded1f72b9/load.c#L377-L384
If #3138 is not really an issue in practice I would probably delay this. |
In my testing it seemed that the first time the realpath was stored it was saved to LOADED_FEATURES as whatever was actually requested. I tried to follow what the ruby C code was doing, but it's probably more complicated than what I have so far here.
Might be worth a discussion 👍 |
Action item for me here is I need to file a CRuby issue to suggest using realpath for everything, it would be so much simpler and consistent, and have that discussed in a dev meeting. Also would be useful to get an answer to #3138 (comment), cc @nirvdrum |
closes #3138
Use a hash to cache realpath lookups
and another to cache realpath -> feature.
This is based off of what I was reading in the CRuby code:
https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L358-L402
https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L1195-L1289
but I could easily be wrong about something here.
Let me know if you see problems with the approach or a better way to do something here.
I did have a few questions remaining:
Should the body of the
load_unless_realpath_loaded
method be synchronized? Does it make sense to wrap it inwith_synchronized_features
?There was a commit in CRuby:
ruby/ruby@972f274#:~:text=To%20avoid%20concurrency%20issues%20when%20rebuilding%20the%20loaded%20features%0Aindex%2C%20the%20building%20of%20the%20index%20itself%20is%20left%20alone%2C%20and%0Aafterwards%2C%20a%20separate%20loop%20is%20done%20on%20a%20copy%20of%20the%20loaded%20feature%0Asnapshot%20in%20order%20to%20rebuild%20the%20realpaths%20hash.
Do we not need to worry about that here because it's already being called inside
with_synchronized_features
? Or would that still be a concern?