Large numbers of HTTP requests for the same JSON-LD context due to lack of caching #292

jakubklimek · 2023-12-01T08:12:44Z

Describe the bug
When parsing a JSON-LD file (JSON-LD to RDF), already loaded external contexts are not cached. In a situation where there is a hierarchy of external contexts referencing each other, and used multiple times in a file, an HTTP request is made every time such a JSON-LD context is required. This results to a large number of identical HTTP requests, flooding the server hosting the contexts, triggering rate limiting, GOAWAY messages, etc., not to mention performance issues.

To Reproduce
Try transforming inzeraty.json to RDF. It contains 36 job advertisements, reusing other JSON-LD contexts.

Expected behavior
Each required context is loaded only once per transformation. Subsequent usages of that context should use the already available copy.
E.g. the JSON-LD playground works this way and transforms the file immediately with 17 HTTP requests for contexts, some of them also duplicate.

Additional context
Originally, the server hosting the contexts, running nginx, had the default keepalive_requests 1000;, which resulted in GOAWAY in approx. 13 seconds of loading the file. I have raised the limit to 10000, which resulted in GOAWAY in 5 minutes. Only after I raised it to 100000, in 13 minutes, the file got processed.

The text was updated successfully, but these errors were encountered:

skodapetr · 2023-12-18T08:25:11Z

It seems the issue is a lack of caching in ActiveContextBuilder line 244.
On this line new document is requested every time without checking the cache.
There are cache mechanisms in JsonLdOptions, yet they are not used here.

I see several ways to tackle this issue:

Introduce a custom HttpLoader with another cache. We may be able to do this outside the
titanium-json-ld code base.
Add caching to ActiveContextBuilder.
2.1) We can leverage existing contextCache as we are working with the context here anyway.
The main issue is that contextCache stores JsonValue instead of JsonStructure.
2.2) We can add a third cache importedDocumentCache (to JsonLdOptions) to the existing contextCache, documentCache.
That would not only allow to use of proper type but also when left empty (null) would not change
existing behaviour.
2.3) We can use documentCache, but I would rather avoid it. Using documentCache
would lead to unnecessary parsing of JSON harming performance. This would be equivalent to 1) but in titanium-json-ld code base.

@filip26 Can I get your thoughts, is there a solution you would prefer?

filip26 · 2023-12-18T12:05:23Z

HTTP cache must be fully handled by provided HttpLoader implementation. e.g. native Java 11 HttpClient. I'm not in favor to start implementing another HTTP cache as a part of the processor.

How to cache parsed and pre-processed context is another thing. In the next major version I would like to see a separate method allowing you to pre-fetch/process a context separatelly.

Something like

var context = JsonLd.getContent(....);
var expanded =  JsonLd.expand(input, context);

So you would be able to re-use a context of a set contexts between calls.

skodapetr · 2023-12-18T13:05:39Z

So that takes 1), and 2.3) out of the question.
From my perspective the difference between 2.1 and 2.2 is backwards compatibility.

As of the

Jsonld.getContext( .... );

That is a neat idea. Yet, I do not see how this would address the issue. I would need to know all contexts used throughout the document tree, which is often not the case.

I see that you do aim on the high-level API (JsonLd object). As I should be able to use re-use cache by sharing the JsonLdOptions between the calls. Is that correct? If so, we are back to 2.1 or 2.2.

filip26#292

filip26 · 2024-01-24T15:18:15Z

@jakubklimek can we close this issue as a shareable cache has been delivered by @skodapetr ?

jakubklimek · 2024-01-24T15:22:31Z

@filip26 I would prefer to close it when the resolution makes its way to a release and we can test it in our tool. But I will understand if you want to close it based on the fact that the corresponding test is passing. We can always raise another issue later.

filip26 · 2024-01-24T15:26:37Z

@jakubklimek great, it's easier link an issue to PR and close it together, to keep it trackable, instead of waiting on a release and then do the closing manually. It's part of 1.4.0-SNAPSHOT now.

Please feel free to re-open or raise a new issue.

jakubklimek mentioned this issue Dec 12, 2023

Better support for JSON-LD Titanium development? apache/jena#2128

Closed

skodapetr added a commit to skodapetr/titanium-json-ld that referenced this issue Dec 29, 2023

Use context cache for imported contexts

e9b3ee1

filip26#292

skodapetr mentioned this issue Dec 29, 2023

Use context cache for imported contexts #304

Merged

filip26 linked a pull request Jan 24, 2024 that will close this issue

Use context cache for imported contexts #304

Merged

filip26 closed this as completed Jan 24, 2024

filip26 added this to the 1.4.0 milestone Jan 24, 2024

filip26 added the enhancement New feature or request label Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large numbers of HTTP requests for the same JSON-LD context due to lack of caching #292

Large numbers of HTTP requests for the same JSON-LD context due to lack of caching #292

jakubklimek commented Dec 1, 2023 •

edited

Loading

skodapetr commented Dec 18, 2023

filip26 commented Dec 18, 2023 •

edited

Loading

skodapetr commented Dec 18, 2023

filip26 commented Jan 24, 2024

jakubklimek commented Jan 24, 2024

filip26 commented Jan 24, 2024

Large numbers of HTTP requests for the same JSON-LD context due to lack of caching #292

Large numbers of HTTP requests for the same JSON-LD context due to lack of caching #292

Comments

jakubklimek commented Dec 1, 2023 • edited Loading

skodapetr commented Dec 18, 2023

filip26 commented Dec 18, 2023 • edited Loading

skodapetr commented Dec 18, 2023

filip26 commented Jan 24, 2024

jakubklimek commented Jan 24, 2024

filip26 commented Jan 24, 2024

jakubklimek commented Dec 1, 2023 •

edited

Loading

filip26 commented Dec 18, 2023 •

edited

Loading