
Benchmarks and performance improvements #272

Closed
wants to merge 31 commits from the benchmark-and-performance branch

Conversation

hmottestad

Currently work-in-progress!

@hmottestad
Author

@filip26 I'm working on some performance improvements and I want to share my current progress with you.

Comment on lines +54 to +59
@Benchmark
public int toRdfApiGet() throws JsonLdError {
RdfDataset rdfDataset = new ToRdfApi(document).get();

return rdfDataset.size();
}
Author

I've started by creating a benchmark for converting JSON-LD to RDF. The file that I am using is from https://data.gov.be/en and is licensed very permissively. It contains DCAT metadata about public Belgian datasets; it doesn't contain the datasets themselves, just the metadata.

Author

Btw, I've excluded the IO and parsing since these are outside the control of Titanium JSON-LD. When I profiled the parsing side, I saw that pretty much all of it is done by the Jakarta JSON library. It's also fairly fast compared to the RDF conversion (approximately 1/10th of the total time).
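For reference, a minimal sketch of how such a benchmark can keep parsing outside the timed region with JMH; the resource path and JMH settings here are illustrative, not necessarily what this PR uses:

```
import java.io.IOException;
import java.io.InputStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import com.apicatalog.jsonld.JsonLdError;
import com.apicatalog.jsonld.api.ToRdfApi;
import com.apicatalog.jsonld.document.Document;
import com.apicatalog.jsonld.document.JsonDocument;

@State(Scope.Benchmark)
public class LoadingBenchmark {

    private Document document;

    @Setup(Level.Trial)
    public void setUp() throws JsonLdError, IOException {
        // Parsing (done by the Jakarta JSON library) happens once here,
        // outside the timed region. The resource name is illustrative.
        try (InputStream in = getClass().getResourceAsStream("/benchmark/datagovbe.jsonld")) {
            document = JsonDocument.of(in);
        }
    }

    @Benchmark
    public int toRdfApiGet() throws JsonLdError {
        // Only the JSON-LD to RDF conversion is measured.
        return new ToRdfApi(document).get().size();
    }
}
```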

Comment on lines 3 to 81
## Original results
```
Benchmark Mode Cnt Score Error Units
LoadingBenchmark.toRdfApiGet avgt 5 5859.114 ± 522.500 ms/op
```

## Current results
```
Benchmark Mode Cnt Score Error Units
LoadingBenchmark.toRdfApiGet avgt 5 3363.833 ± 448.450 ms/op
```
Author

Currently I'm at close to a 2x performance improvement.

Owner

The improvement is great! I'm afraid that without a reference computing environment (CPU, memory, etc.) the numbers are not helpful to someone who has just visited the project page. Please also note that after this PR is released, the table for the previous version becomes obsolete.

Not sure if that is doable, but it would be great to somehow use GitHub Actions and virtual machines (as a reference environment) to generate such a report for each PR.

Author

Yeah, the results depend heavily on what computer you're using. The file is nice for tracking the performance of this branch, but I'll delete it before merging.

I would love to have a GitHub Action or similar to run the benchmarks with a before/after comparison for each PR. I haven't found anything that does this yet.

Owner

Perhaps there is no need to run an action before; we can keep the benchmark results for the main branch and use them for comparison. I haven't found anything easy to use yet either.

Comment on lines 121 to 126
} else if (absoluteIris.contains(subject)) {
rdfSubject = Rdf.createIRI(subject);
} else if (UriUtils.isAbsoluteUri(subject, uriValidation)) {
absoluteIris.add(subject);
rdfSubject = Rdf.createIRI(subject);

} else {
Author

Validating the IRI using the built-in Java URI implementation is really costly, which is why I've added a simple cache so that we don't need to validate the same IRI multiple times.
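A minimal sketch of the caching idea described above; the class and method names are illustrative (the real change is the diff shown), and the UriUtils import path is assumed from the library layout:

```
import java.util.HashSet;
import java.util.Set;

import com.apicatalog.jsonld.uri.UriUtils;

class ValidatedIriCache {

    // IRIs that have already passed the expensive validation; the
    // java.net.URI-based check then runs at most once per distinct IRI.
    private final Set<String> absoluteIris = new HashSet<>();

    boolean isAbsoluteIri(final String candidate, final boolean uriValidation) {
        if (absoluteIris.contains(candidate)) {
            return true;                                 // validated earlier, skip the costly check
        }
        if (UriUtils.isAbsoluteUri(candidate, uriValidation)) {
            absoluteIris.add(candidate);                 // cache the positive result
            return true;
        }
        return false;
    }
}
```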

Comment on lines +92 to +101
var indexGraphName = index.get(graphName);

if (indexGraphName != null) {
var indexSubject = indexGraphName.get(subject);
if (indexSubject != null) {
return indexSubject.containsKey(property);
}
}
return false;

Author

The old code called both containsKey(...) and get(...); the new code only calls get(...) and then does a null check instead. It's a very minor performance improvement.
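For contrast, a sketch of the double-lookup pattern being replaced; the nested map types are purely illustrative:

```
import java.util.Map;

import jakarta.json.JsonValue;

class DoubleLookupExample {

    static boolean containsOldStyle(
            final Map<String, Map<String, Map<String, JsonValue>>> index,
            final String graphName, final String subject, final String property) {

        // Each level is hashed and probed twice: once for containsKey and once for get.
        if (index.containsKey(graphName)                                     // lookup #1
                && index.get(graphName).containsKey(subject)) {              // lookups #2 and #3
            return index.get(graphName).get(subject).containsKey(property);  // lookups #4, #5 and #6
        }
        return false;
    }
}
```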

Comment on lines 108 to 123
// 3.
if (elementObject.containsKey(Keywords.TYPE)) {
handle3(elementObject);
}

if (JsonUtils.isObject(item)) {
itemValue = item.asJsonObject();
// 4.
if (elementObject.containsKey(Keywords.VALUE)) {
handle4(elementObject);
}

} else if (JsonUtils.isArray(item)) {
itemValue = item.asJsonArray();
// 5.
else if (elementObject.containsKey(Keywords.LIST)) {
handle5(elementObject);
}
Author

The first thing I've done here is to split the code into multiple methods so it's easier to see what code is taking the most time to execute.

if (activePropertyValue.isEmpty()) {
build = JsonProvider.instance().createArrayBuilder(List.of(reference)).build();
} else {
build = JsonProvider.instance().createArrayBuilder(activePropertyValue).add(reference).build();
Author

This is currently the most resource-intensive part of the code. I'm not sure what to do about it since it's more of an architectural issue.

Essentially the problem is that every time we add the reference object we need to create a copy of the JsonArray object.

This code allocates 500 MB worth of JsonArray objects when converting the benchmark file containing 600 000 triples. That is 1/3 of all the allocations.
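A small illustration of that copy cost (the references list is hypothetical): with an immutable JsonArray, appending k references one at a time copies 1 + 2 + ... + (k - 1) earlier elements, i.e. on the order of k²/2 array slots in total.

```
import java.util.List;

import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonValue;

class ImmutableAppendCost {

    static JsonArray appendAll(final List<JsonValue> references) {
        JsonArray values = JsonValue.EMPTY_JSON_ARRAY;
        for (JsonValue reference : references) {
            // Every append rebuilds the whole array: all previously added
            // elements are copied into a fresh JsonArray.
            values = Json.createArrayBuilder(values)
                    .add(reference)
                    .build();
        }
        return values;
    }
}
```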

elementObject.remove(Keywords.REVERSE);
for (int i = 0, activePropertyValueSize = activePropertyValue.size(); i < activePropertyValueSize; i++) {
JsonValue e = activePropertyValue.get(i);
if (referenceHashCode == e.hashCode()) {
Author

When checking that the reference object is not in the activePropertyValue JsonArray, the majority of the time was spent in the .equals method. This was taking up a significant amount of the total time.

Since the hashCode is cached within each object, checking whether the hash codes are the same dramatically reduces the time it takes to check for the existence of the reference object.

I also noticed that the reference object is never present when converting the benchmark file, so I presume that this code is meant as some kind of edge-case handling.

@filip26
Owner

filip26 commented Jul 31, 2023

Hi @hmottestad
thank you for contributing to titanium-json-ld and congratulations on the performance improvement, that's huge! Please give me a few days to find time to go through it.

General notes on code style

  • please use Java code style conventions
  • avoid cryptic names and magic numbers, e.g. handle5; more descriptive names keep the code readable and maintainable
  • whitespace: do not use tabs, avoid trailing spaces, and use empty lines wisely to help with code readability

(the current code base is not perfect, I hope it gets fixed incrementally with new PRs)

Regarding data for benchmarks, are there any other public datasets we could use? Could we use the existing test data to get other inputs to benchmark?

I'm looking forward to working with you to get this PR merged.

@hmottestad
Author

:)

The magic numbers are based on the numbering in the comments in the code. I'm not sure what the numbers actually refer to, but I get a feeling that it might be numbering from a spec of some kind. Do you know what the numbers refer to?

I'm not sure what code style you are using, but I'll try to match it in my editor as closely as I can. How many spaces do you use for indentation? Any other code style issues that I should keep in mind? Maybe you have a max line width?

I saw that you have a static code analyzer, I'll fix up any issues it mentions.

@hmottestad
Author

As for test data I don't really have any other suggestions. The existing files are quite small, which would give different optimisation priorities. It's still a good idea to optimise for small files too, especially since an important use case for JSON-LD is to embed data in webpages using schema.org.

It would be nice to have a test file that contains a lot of different graphs.

We can also make different versions of the test data by expanding, compacting, flattening and framing the file.

I would like to make more benchmarks to cover expanding, compacting, flattening and framing.

@filip26
Owner

filip26 commented Jul 31, 2023

The numbers you see match steps in the algorithms described by the JSON-LD spec, but when the code is split into pieces the numbers make it harder to understand, and descriptive names are better. As I said, the current code base is far from perfect; let's hope it gets improved incrementally by PRs. Regarding code style, I've noticed Codacy complaining about underscores in names, which is more of a C/C++ thing.

The Google or Oracle code style for Java is a good source. There is no need to follow every recommendation, but there is also no need to violate anything that's considered common practice, e.g. naming conventions.

@hmottestad
Author

I'll probably go with inlining the code back into the big method when I'm done with the performance improvements. That way I don't need to figure out names :P

@hmottestad force-pushed the benchmark-and-performance branch 2 times, most recently from a97b9e5 to fc4cc58 on August 1, 2023 at 20:38
Comment on lines 1 to 109
## Original results
```
Benchmark Mode Cnt Score Error Units
LoadingBenchmark.toRdfApiGet avgt 5 5859.114 ± 522.500 ms/op
```

```
Benchmark Mode Cnt Score Error Units
BasicProcessingAlgorithmsBenchmark.compact avgt 5 1896.694 ± 81.070 ms/op
BasicProcessingAlgorithmsBenchmark.expand avgt 5 673.813 ± 9.570 ms/op
BasicProcessingAlgorithmsBenchmark.flatten avgt 5 3586.524 ± 124.667 ms/op

```


#### OOMBenchmark.toRdfApiGet
```
Iteration 1: 7466.977 ms/op
Iteration 2: 6891.456 ms/op
Iteration 3: Terminating due to java.lang.OutOfMemoryError: Java heap space
```


## Current results
```
Benchmark Mode Cnt Score Error Units
LoadingBenchmark.toRdfApiGet avgt 5 1998.111 ± 135.659 ms/op
```

```
Benchmark Mode Cnt Score Error Units
BasicProcessingAlgorithmsBenchmark.compact avgt 5 1541.122 ± 33.854 ms/op
BasicProcessingAlgorithmsBenchmark.expand avgt 5 376.108 ± 17.546 ms/op
BasicProcessingAlgorithmsBenchmark.flatten avgt 5 816.228 ± 28.191 ms/op
```

#### OOMBenchmark.toRdfApiGet
```
Iteration 1: 2420.124 ms/op
Iteration 2: 1260.448 ms/op
Iteration 3: 1204.927 ms/op
Iteration 4: 1191.012 ms/op
Iteration 5: 1193.424 ms/op
Iteration 6: 1209.333 ms/op
Iteration 7: Terminating due to java.lang.OutOfMemoryError: Java heap space
```
Author

JsonArray is immutable. I've made a custom implementation that is not immutable and it really cuts down on array creation and copying. As far as I can tell it's fine with the JSON-LD algorithm, but it might be problematic for other parts of the JSON processing code.

JSON-LD to RDF now takes 2 seconds instead of ~6 seconds, so a 3x improvement.

Memory pressure is also heavily reduced. With Xmx16GB we can now run 6 iterations instead of just 2, before running out of memory.

Expand and flatten are also considerably faster.

Author

I'm wondering if it might be a good idea to have a dedicated internal representation of JSON-LD instead of relying on Jakarta JSON. We could convert back to the Jakarta JSON representation at the end, once we know that all objects and lists are final.

Owner

filip26 commented Aug 1, 2023

Yes, it would be great to have an internal API to process JSON. It would be better not to mix too much into this PR; let's have a separate PR, or a special branch, where we can work on that.

  • just replicating the jakarta.json interfaces, or having a mutable implementation, is not an option
  • other popular JSON parsers should be supported and the code should be prepared for that, e.g. Jackson, Gson. See issue Jackson JSONP Provider #111
  • we need an API to read and write JSON
    • creating new objects is costly (could be compensated by having a pool, but ...)
    • I was experimenting with something I call a JSON cursor API, but the main disadvantage is debugging, which is painful
  • this should be the main feature of the v2.x line

Owner

filip26 commented Aug 1, 2023

> JsonArray is immutable. I've made a custom implementation that is not immutable and it really cuts down on array creation and copying. As far as I can tell it's fine with the JSON-LD algorithm, but it might be problematic for other parts of the JSON processing code.
>
> JSON-LD to RDF now takes 2 seconds instead of ~6 seconds. So a 3x improvement.
>
> Memory pressure is also heavily reduced. With Xmx16GB we can now run 6 iterations instead of just 2, before running out of memory.
>
> Expand and flatten are also considerably faster.

I'm afraid this change is premature and should be done as part of the internal JSON processor. There are many libraries, and applications in production, using Titanium as a core component. Those libraries might or might not be using the Glassfish provider. Mixing in a custom implementation that is later processed by a provider we are not aware of could be a source of hard-to-debug/fix run-time issues. Please let's not do this. Possible options are:

  • re-write the algorithm to work better with immutable arrays
  • extract part of the algorithm, if possible, and use List - not sure it's easily doable, and it could be as costly as building a brand new internal JSON processor

Author

It's just a proof of concept to show that there is a lot to be gained from allowing mutability. That's why I added a comment to the top of the MutableJsonArray to explain that it's not the way to go.

JsonArray is immutable by definition so there's not much we can do about the JSON processors.

The algorithm doesn't seem to require an immutable list; it actually uses the term "append".

[screenshot: the JSON-LD specification step that uses the term "append"]

I think it might be worthwhile to use an ArrayList as an intermediary and then convert it back to a JsonArray once we are done. Users would still only see the JsonArray.
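A rough sketch of that idea, assuming the mutable buffer never escapes the algorithm; the class and method names are hypothetical:

```
import java.util.ArrayList;
import java.util.List;

import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonArrayBuilder;
import jakarta.json.JsonValue;

class NodeReferenceBuffer {

    // Mutable buffer used only while the algorithm is running.
    private final List<JsonValue> values = new ArrayList<>();

    void append(final JsonValue reference) {
        values.add(reference);                 // amortised O(1), no array copy
    }

    // Materialise an immutable JsonArray once, when the result is final.
    JsonArray toJsonArray() {
        final JsonArrayBuilder builder = Json.createArrayBuilder();
        for (JsonValue value : values) {
            builder.add(value);
        }
        return builder.build();
    }
}
```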

@hmottestad
Author

Btw. I have a bit of a vested interest in Titanium JSON-LD. I'm the co-project lead on Eclipse RDF4J and we would like to use Titanium JSON-LD in order to support JSON-LD 1.1.

@filip26
Owner

filip26 commented Aug 1, 2023

That's great, I hope you are aware that jakarta JSON and the Glassfish provider are under the Eclipse umbrella as well ;)

int oHashCode = o.hashCode();
if(valueList.size() == 1){
JsonValue jsonValue = valueList.get(0);
if(oHashCode == jsonValue.hashCode()){
Owner

Please note that a.hashCode() == b.hashCode() could be true even if those objects are different (an explanation).
Out of curiosity, why is the equals method so slow?

Author

The various objects cache their hash codes after they are calculated, while the equals method can't really cache anything. This makes checking whether the hash codes are equal a lot faster. Two objects with different hash codes should never be equal, but of course there is nothing stopping a hash collision. Hash collisions are fairly rare though, especially since we are not truncating the hash code in any way. They do occur, which is why I confirm with .equals(...) as a safeguard.

The general check of whether the reference element already exists in the array seems to be a way of handling an edge case, since it never happens while processing the 600 000-triple test file I'm using.
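A sketch of the check as described, with .equals(...) kept as the collision safeguard; this is a standalone illustration rather than the exact code in the PR:

```
import jakarta.json.JsonArray;
import jakarta.json.JsonValue;

class ReferenceLookup {

    static boolean contains(final JsonArray activePropertyValue, final JsonValue reference) {
        final int referenceHashCode = reference.hashCode();    // cached inside the value object
        for (int i = 0, size = activePropertyValue.size(); i < size; i++) {
            final JsonValue candidate = activePropertyValue.get(i);
            if (referenceHashCode == candidate.hashCode()      // cheap pre-filter
                    && reference.equals(candidate)) {          // confirm, since hashes can collide
                return true;
            }
        }
        return false;
    }
}
```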

@@ -63,27 +65,44 @@
<version>${jakarta.json.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
Owner

filip26 commented Aug 1, 2023

I wonder if it would be hard to hide the dependencies needed for benchmarking under a separate goal, e.g. something like

mvn compile benchmark

Having the benchmark as an mvn goal would help keep track of performance.

Also have benchmark sources and resources separated, e.g.

/src/benchmark/java
/src/benchmark/resources

or something like that. One of Titanium's main advantages is its minimal dependencies.

Author

You're thinking of the guava dependency. That's only needed if we end up using the bloom filter.

The benchmark dependencies are all scoped to test, which is the easiest way to not include them in the final build.

In RDF4J we've added a step to the Maven build that generates the JMH jar files, which makes it possible to run the benchmarks from the command line. I usually just run them from IntelliJ using a JMH plugin. We can also add a main method to each benchmark that runs it; that's helpful for developers using NetBeans or Eclipse.
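For illustration, a main method per benchmark could look roughly like this using the standard JMH Runner API; the options shown are arbitrary defaults:

```
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class LoadingBenchmarkRunner {

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(LoadingBenchmark.class.getSimpleName()) // run only this benchmark class
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .build();
        new Runner(options).run();
    }
}
```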

@filip26
Owner

filip26 commented Aug 2, 2023

@hmottestad please, before you make more changes, let's split this PR into several. The first should introduce the benchmark itself so that it can be executed from the command line. The others should focus on one change at a time so it's easy to evaluate the impact. For example, I've noticed you are removing the use of Optional, making the code more error-prone and harder to read/maintain. It's hard to see the real impact of your changes on the performance itself right now.

@hmottestad
Author

I'm in the profile-optimize-benchmark loop at the moment. I try to document my results by updating the README.md file with the benchmark results for each commit. So a commit contains changes and updated results to show how the changes impact the performance.

There isn't much point splitting this into a bunch of different branches. I usually try to optimize the hot path, but where the hot path is changes every time I improve it, so a change won't usually show much performance improvement if there is still another code path that is particularly slow. I would end up with a dozen branches where only one shows any real improvement, and the others wouldn't look like they are doing anything until some of the other branches are merged.

Lambdas, streams and Optional add a decent amount of performance overhead. Don't get me wrong, Java has really fast implementations of these, and there are plenty of things that can be made faster by using lambdas, streams and Optional. They do often make the code more readable, so it's a trade-off between performance and readability. A good middle ground is often to create descriptive util methods for repetitive code, to make up for it being a bit more complicated than using, say, a stream. It really depends on whether they are on the hot path, so it's not that they can't be used at all.
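A hypothetical example of that trade-off: both methods below do the same lookup, but the Optional variant allocates a wrapper object on every call, which adds up on a hot path.

```
import java.util.Map;
import java.util.Optional;

import jakarta.json.JsonValue;

class TermLookup {

    // Readable, but allocates an Optional wrapper per call.
    static JsonValue findTermWithOptional(final Map<String, JsonValue> terms, final String name) {
        return Optional.ofNullable(terms.get(name)).orElse(JsonValue.NULL);
    }

    // Same behaviour with a plain null check, no extra allocation.
    static JsonValue findTermPlain(final Map<String, JsonValue> terms, final String name) {
        final JsonValue value = terms.get(name);
        return value != null ? value : JsonValue.NULL;
    }
}
```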

Once I'm happy with the performance results I'll start cleaning things up and removing changes that don't affect the performance significantly. At the moment this is still work in progress and very much a draft PR. It's good if you can follow along, but keep in mind that performance optimization is a very experimental process and things will be in flux for a while still.

@filip26
Owner

filip26 commented Aug 2, 2023

@hmottestad thank you for trying to make performance improvements that could have an impact on the processing of large datasets, leaving aside whether moving huge amounts of data using JSON-LD as a transport format is good practice. Many other libraries use Titanium to process much smaller datasets, and it's always a question of when to sacrifice code maintainability to save 0.01%.

I appreciate your effort, but this is not a one-man show focusing only on one goal, getting Titanium into RDF4J.

I'm trying to listen to you; please do the same for me. Otherwise I'm afraid it will take a long time for me to get through one huge PR and understand and evaluate the impact of the changes, balancing the possible gains against the losses (maintainability, error-prone code).

As I've pointed out earlier, documenting your changes in the README is not a good way to share your results. Let's do it right and start by setting up a transparent process so anyone, not just you or me, can see the real impact on huge as well as small datasets.

There are no miracle easy solutions left, just big challenges that require true commitment and dedication to software engineering.

@hmottestad
Author

[screenshot: performance comparison table for the small-files benchmark]

Here is a performance comparison for the small-files benchmark. The benchmark measures the time it takes to convert a Document to RDF. I think the results are really good: the smallest improvement is 1.18x and the largest is 2x.

It may seem insignificant that an operation that took 0.034 ms now takes 0.017 ms, since in milliseconds it's not a big difference. But if you have a computer that loads JSON-LD files all day long, you can now load twice as many files per day.

@hmottestad
Author

I still want to take a look at the ones with less of an improvement though to see if I can improve the performance a bit for them too.

@hmottestad
Author

@filip26 if you want to test them out for yourself then you can checkout commit ec54540 to get the original results and compare them to the head of the branch.

@VladimirAlexiev

@hmottestad @filip26 About benchmarks:

I'll try to mobilize some colleagues at Ontotext to help with the benchmarking effort

@hmottestad
Author

@VladimirAlexiev that would be great! I've been using https://github.com/umbreak/jsonld-benchmarks a bit and managed to make it somewhat faster. I don't feel that it's a very representative benchmark, though, since it's based on the W3C compliance suite. I don't really want to optimize for parsing "a/b/c/d/e/../../../../../../../f" URL path fragments.

What I really need are some real world examples so I can optimize for how users would actually be using Titanium JSON-LD.

At the moment I also don't have any real world examples of JSON-LD Framing. I remember talking to someone a few years ago who was using framing to convert JSON-LD into "regular" json to make the frontend devs happy. I think that is probably a typical use case for framing. Do you know if Ontotext has any real-world examples of framing?

@hmottestad
Author

I'm now done with what I hope is the last optimization. I implemented a URI validator that handles the more common cases directly and delegates to the Java URI class when we're not certain whether the URI is valid or not.
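A rough sketch of that kind of validator, assuming a deliberately conservative fast path that only accepts simple http-style URIs outright and delegates everything else to java.net.URI; all names and character checks here are illustrative simplifications, not the PR's actual code:

```
import java.net.URI;
import java.net.URISyntaxException;

final class FastUriValidator {

    private FastUriValidator() {
    }

    static boolean isAbsoluteUri(final String uri) {
        if (isClearlyValidHttpStyle(uri)) {
            return true;                      // fast path: no java.net.URI object needed
        }
        try {
            return new URI(uri).isAbsolute(); // uncertain: delegate to the full validation
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Accepts only a conservative subset: scheme "://" followed by unreserved
    // ASCII characters and '/'. URIs with ports, queries, fragments or
    // percent-encoding fall through to java.net.URI.
    private static boolean isClearlyValidHttpStyle(final String uri) {
        final int schemeEnd = uri.indexOf("://");
        if (schemeEnd < 1 || !isValidScheme(uri, schemeEnd) || schemeEnd + 3 >= uri.length()) {
            return false;
        }
        for (int i = schemeEnd + 3; i < uri.length(); i++) {
            final char c = uri.charAt(i);
            if (!isUnreservedAscii(c) && c != '/') {
                return false;
            }
        }
        return true;
    }

    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    private static boolean isValidScheme(final String uri, final int end) {
        if (!isAsciiLetter(uri.charAt(0))) {
            return false;
        }
        for (int i = 1; i < end; i++) {
            final char c = uri.charAt(i);
            if (!isAsciiLetter(c) && !isAsciiDigit(c) && c != '+' && c != '-' && c != '.') {
                return false;
            }
        }
        return true;
    }

    private static boolean isUnreservedAscii(final char c) {
        return isAsciiLetter(c) || isAsciiDigit(c) || c == '-' || c == '.' || c == '_' || c == '~';
    }

    private static boolean isAsciiLetter(final char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(final char c) {
        return c >= '0' && c <= '9';
    }
}
```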

@hmottestad
Author

hmottestad commented Aug 13, 2023

@filip26 Here are my next steps:

  • inline all the methods I extracted
  • create a PR with only the benchmarks so we can have a baseline to start with
  • pick out the optimizations into separate PRs so you can review them

Do you guys require a contributor agreement, like what the Eclipse Foundation requires?

@koptan

koptan commented Oct 2, 2023

Any progress on this @hmottestad? We are looking for more performance improvements in this lib ASAP... thanks

@hmottestad
Author

hmottestad commented Oct 2, 2023

@koptan great to hear that you're interested in this work. I've started on the process of merging in the performance improvements by first creating a branch with just the benchmarks. After that I'll start making separate PRs for each of the major performance improvements.

It'll probably take a few months, possibly more, all depending on how much free time I have and how much free time @filip26 has to review my PRs. I would highly recommend sponsoring @filip26 on his GitHub profile to show how important this library is to you.

Feel free to use my branch locally if you need something ASAP.

@filip26
Owner

filip26 commented Oct 2, 2023

@koptan Please, this is an open source project supported by volunteers working in their spare time, usually. If you want to speed up things then you have plenty of options, even becoming a sponsor. Thanks.

@hmottestad closed this on Oct 25, 2023
@hmottestad
Author

@koptan If you are still interested in these performance improvements then I've integrated them into my fork of Titanium. https://mvnrepository.com/artifact/no.hasmac/hasmac-json-ld
