Lazy Parsing #33
Hi, currently, lazy parsing is not supported. However, I've been thinking about it, and it will likely be the next feature I add to the library. There are at least two approaches to lazy/limited parsing.

The first is schema-based parsing, which materializes only the POJO fields you declare:

```java
List<Tweet> tweets = parser.parse(bytes);
```

The second is on-demand iteration, which parses values only when they are accessed:

```java
Iterator<JsonValue> tweets = jsonValue.get("statuses").arrayIterator();
while (tweets.hasNext()) {
    JsonValue tweet = tweets.next();
    JsonValue user = tweet.get("user");
    if (user.get("default_profile").asBoolean()) { // parse the default_profile field here
        System.out.println(user.get("screen_name").asString());
    }
}
```

I'd like to support both approaches, but I don't know yet which will be implemented first.
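As an illustration only, the on-demand iterator contract can be sketched with a minimal hand-rolled lazy scanner. The class name `LazyArrayIterator` and its restriction to a flat JSON array of unescaped strings are my own simplifications, not the library's API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: lazily iterate a flat JSON array of strings, e.g.
// ["a","b","c"], decoding each element only when next() is called. A real
// lazy parser would walk a SIMD-built structural index instead.
final class LazyArrayIterator implements Iterator<String> {
    private final byte[] json;
    private int pos;

    LazyArrayIterator(byte[] json) {
        this.json = json;
        this.pos = 1; // skip the leading '['
    }

    @Override
    public boolean hasNext() {
        while (pos < json.length && (json[pos] == ',' || json[pos] == ' ')) pos++;
        return pos < json.length && json[pos] == '"';
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        int start = ++pos;                 // skip the opening quote
        while (json[pos] != '"') pos++;    // find the closing quote (escapes not handled)
        String value = new String(json, start, pos - start, StandardCharsets.UTF_8);
        pos++;                             // skip the closing quote
        return value;
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        byte[] bytes = "[\"alice\",\"bob\",\"carol\"]".getBytes(StandardCharsets.UTF_8);
        Iterator<String> it = new LazyArrayIterator(bytes);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next()).append(';');
        System.out.println(sb); // alice;bob;carol;
    }
}
```

A real implementation would also handle escapes, nesting, and whitespace; this only shows how `hasNext`/`next` defer all decoding work until an element is actually requested.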
There are already a lot of libraries that parse to POJOs well, handle missing fields, and tune and optimize parsing in many different ways. That doesn't mean they beat SIMD, of course, but for me SIMD stands for the ability to really parse a large file/JSON in a streaming manner; apart from jsoniter, not many libraries can do that, and especially not this fast. I think a key note on stream parsing is that InputStream must be supported, since in many frameworks, especially high-performance ones, accessing a byte[] might be expensive or even impossible (Netty's UnsafeBuffer, for example), so that must be taken into account.
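To sketch the InputStream concern: until a parser consumes chunks natively, callers have to bridge a stream into a byte[]. The helper below is hypothetical (not part of the library) and exists only to show the extra copy such a bridge pays, which a chunk-aware parser would avoid:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical bridge: drain an InputStream into a byte[] for a parser that
// only accepts byte[]. A true streaming parser would instead consume each
// chunk as it arrives, never materializing the whole document.
public class StreamBridge {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];        // reusable read buffer
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);           // a streaming parser would parse here instead
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("{\"ok\":true}".getBytes("UTF-8"));
        byte[] bytes = readAll(in);
        System.out.println(bytes.length); // 11
    }
}
```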
Okay, so you're referring to the feature described here: #19, rather than on-demand field parsing. Nevertheless, both of these features are important, and I have them on the roadmap. However, I don't know yet when they will be delivered.
Hi @zekronium. Would you mind sharing the code of your benchmark?
I can invite you to the repository
OK, so please add me to the repository.
Invite sent
Did it help? Have you also noticed in my benchmark that, when I serialize to a map manually and build the full structure, it is actually much slower than the same task in fastjson or even Jackson?
It did, although I haven't started working on this yet. Regarding serializing to a map, I'll look into it.
Hi,
My understanding is that the key benefit of SIMD is that we can "progressively" parse a stream of JSON, hence the tape-reading-like implementation.
However, in benchmarks with varying stop points, i.e. parsing only 20/50/80% of the JSON and exiting early, the throughput seems to be almost the same and directly correlated with the size of the JSON.
If I pre-pad the array like the library does, I get a more realistic result, with throughput varying depending on how deep the parsing goes, but the overall throughput still stays roughly the same. Is there a lot of pre-parsing going on?
The bars are different sizes of JSON.
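One plausible explanation, assuming the library follows the simdjson two-stage design: stage 1 builds a structural index over the entire input before any value is materialized, so exiting early only skips part of the cheaper stage 2, and throughput stays proportional to document size rather than to the stop point. A toy model of that cost structure (my own sketch, not the library's code):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy cost model: simdjson-style parsers run "stage 1" over the ENTIRE input
// to index structural characters before any value is read, so an early exit
// during "stage 2" cannot skip that full scan.
public class TwoStageModel {
    // Stage 1: always touches every byte, regardless of what the caller
    // later consumes in stage 2.
    static List<Integer> structuralIndex(byte[] json) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < json.length; i++) {
            byte b = json[i];
            if (b == '{' || b == '}' || b == '[' || b == ']' || b == ':' || b == ',') {
                idx.add(i);
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        byte[] doc = "{\"a\":[1,2],\"b\":3}".getBytes(StandardCharsets.UTF_8);
        List<Integer> tape = structuralIndex(doc);
        // Even if stage 2 stops after the first element, stage 1 has already
        // paid for the whole document.
        System.out.println(tape.size()); // 8
    }
}
```

If this model holds, an early-exit benchmark mostly measures stage 1, which would explain why the bars track document size so closely.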