Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use context cache for imported contexts #304

Merged
merged 14 commits into from
Jan 22, 2024
Merged

Conversation

skodapetr
Copy link
Contributor

Should resolve issue #292, as now the cache is used for imported contexts as well.

In order to work user must set JsonLdOptions.documentCache as it is by default null meaning no context caching is applied at all.

@filip26 filip26 self-requested a review December 30, 2023 15:03
Copy link
Owner

@filip26 filip26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing. HTTP caching MUST be part of underlying HTTP client implementation (preferred) or DocumentLoader implementation. Mixing caching with a processor implementation violates a separation of concerns.  (as has been mentioned here #292 (comment)) FYI: ContextCache is going to be deprecated in the future version. Please let's keep the code clean as much as possible. 

Should resolve issue #292, as now the cache is used for imported contexts as well.

Obviously a test case is missing. SHOULD is not an acceptable resolution.

@skodapetr
Copy link
Contributor Author

Thank you for your kind comment pointing out opportunities for improving the pull request.

It seems like I've misunderstood our conversation in #292. based on the conversation I thought that you prefer to avoid caching HTTP requests in the codebase of titanium-json-ld. At the same time, you seemed to be open to the idea of caching higher-level resources like documents and contexts as this is already happening in the codebase.

That is why I've opted for re-using the already existing cache for "imported" contexts as well. The idea was to introduce caching where it was missing in a way compatible with the existing code. In addition, caching on this level prevents unnecessary parsing or JSON documents, leading to better performance. As of the performance, I agree that caching on a higher level would add additional performance benefits, yet it would not be aligned with the existing code.

Reading your comment I've concluded that my understanding so far was wrong and you prefer caching on the level of HTTP requests. Perhaps I was too quick to reject this option as the drawbacks of caching on multiple places, and performance penalty looks to me like good enough reasons for dismissal.

Anyway, back to the pull request. The objective is to cache HTTP requests, where the caching should be implemented in an underlying HttpClient. I'm not sure whether you mean com.apicatalog.jsonld.http.DefaultHttpClient or another (custom) implementation of com.apicatalog.jsonld.http.HttpClient. For the second case, it can be implemented outside titanium-json-ld codebase as users can set custom com.apicatalog.jsonld.http.HttpClient for com.apicatalog.jsonld.loader.HttpLoader.

Can you please comment on the preferred solution?

@hmottestad
Copy link

If you are using the built-in HttpClient in Java 11 then I don't believe that it supports caching. The JEP says that it's only meant to cover 80-90% of application needs and is meant to replace the outdated HttpURLConnection API.

If you are instead using the far more common Apache HttpClient then there is this builder here that should enable caching: https://hc.apache.org/httpcomponents-client-4.5.x/current/httpclient-cache/apidocs/org/apache/http/impl/client/cache/CachingHttpClientBuilder.html

@filip26
Copy link
Owner

filip26 commented Jan 8, 2024

@skodapetr What about having a specialized document loader, a wrapper that acts as a cache? e.g.

JsonLd.expand(...).loader(new LRUDocumentCache(loader, cacheparams))...

This solution has several benefits. It keeps processor code clean of caching, end-users can decide if/how to use it, the cache can be shared between calls, and anyone can implement its own alternative cache. What do you think? Also, check StaticContextLoader for an inspiration.

@hmottestad Titanium does not provide direct support to Apache HTTP Client but uses Java 11 HtttClient and OkHttp (Java 8). Both implementations support caching, it just has to be enabled.

@hmottestad
Copy link

@filip26 I didn't know that Java 11 HttpClient supports caching. Any chance you could point me to some documentation, might be useful in the future?

@skodapetr
Copy link
Contributor Author

skodapetr commented Jan 8, 2024

@filip26 It is not the most performant solution, but it will help keep the code clean and may actually ease removal of caching from the main algorithm. Should I create a new PR or update this one?

EDIT: You mention ability to use custom implementation, should the LRUDocumentCache be part of the PR?

@filip26
Copy link
Owner

filip26 commented Jan 8, 2024

@hmottestad sorry, I was looking into Apache HttpClient documentation by mistake. You are right about Java 11 HttpClient.

@filip26
Copy link
Owner

filip26 commented Jan 8, 2024

@skodapetr please, can you elaborate why you don't think it's a performant solution?

@hmottestad
Copy link

@hmottestad sorry, I was looking into Apache HttpClient documentation by mistake. You are right about Java 11 HttpClient.

That's what I initially stumbled upon too. Would have been nice if they could have named it something a bit more unique to avoid confusion with the Apache HttpClient 😔

@skodapetr
Copy link
Contributor Author

In 5.6.4, 5.2.4, and 5.2.5 we can cache either the input documents (solutions we plan to implement) or results produced by processing the documents - mostly contexts.

Good example is 5.2.4. If there is a cache hit it returns from the algorithm in the "if" part. Using only document level caching, we save request in loadDocument, yet rest of the algorithm is executed every time.

I may need to clarify that this is not an issue of the currently implemented approach, but rather of the intention to remove ContextCache. At the same time I understand the decision to prefer readability and maintainability over performance for some specific cases.

@filip26
Copy link
Owner

filip26 commented Jan 9, 2024

@skodapetr If we both agree that using a separate cache based on DocumentLoader is the way to go, then let's add:

  • a cache implementation - MUST
  • a simple unit test to test only it does what is expected - MUST
  • "how to use a cache" paragraph to README or even a separate document - NICE

Copy link
Owner

@filip26 filip26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just minor stuff. Thank you!

@filip26 filip26 merged commit f4b905b into filip26:main Jan 22, 2024
5 checks passed
@skodapetr skodapetr deleted the feature/292 branch January 22, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large numbers of HTTP requests for the same JSON-LD context due to lack of caching
3 participants