Officially support "ndjson" #33

Open
hgschmie opened this issue Jan 11, 2019 · 10 comments

Comments

@hgschmie

While it is possible to massage the object mapper enough to produce newline-delimited JSON ("ndjson", see http://ndjson.org/), it would be great to have "official" support in Jackson through a module. Reading ndjson into a collection, list or stream would be straightforward as well (e.g. see https://github.com/CjHare/systematic-trading/blob/master/systematic-trading-backtest-output-elastic/src/main/java/com/systematic/trading/backtest/output/elastic/serialize/NdjsonListSerializer.java for a basic example of what a serializer could look like).

@cowtowncoder
Member

I may be missing something, but Jackson already supports reading and writing sequences of whitespace-separated values, not just individual values.

Reading is most convenient via ObjectMapper.readerFor(elementType).readValues(), which returns a MappingIterator (although creating a JsonParser and manually calling readValue() also works).

Writing is done using ObjectMapper.writerFor().writeValues(), which returns a SequenceWriter.
The separator to use is configurable on a per-call basis (the default is a single space I think, not a newline).

There may be room for improved ergonomics, of course, for example through additional convenience methods.
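
For readers landing here, a minimal sketch of the two calls described above, assuming Jackson 2.x databind on the classpath. The class and method names (RootValueSequences, readAll, writeAll) are made up for this example, and binding to Map is only a stand-in for a real element type.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SequenceWriter;

import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Jackson.
final class RootValueSequences {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Reads every whitespace-separated root value (spaces or newlines) as a Map.
    static List<Map<String, Object>> readAll(String content) throws IOException {
        List<Map<String, Object>> result = new ArrayList<>();
        try (MappingIterator<Map<String, Object>> it =
                 MAPPER.readerFor(Map.class).readValues(content)) {
            while (it.hasNext()) {
                result.add(it.next()); // values are bound one at a time
            }
        }
        return result;
    }

    // Writes each element as its own root value, using the default separator.
    static String writeAll(List<?> values) throws IOException {
        StringWriter out = new StringWriter();
        try (SequenceWriter seq = MAPPER.writer().writeValues(out)) {
            for (Object value : values) {
                seq.write(value); // one JSON value per call
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        List<Map<String, Object>> records = readAll("{\"foo\":1}\n{\"foo\":2}\n");
        System.out.println(records.size() + " records read");
        System.out.println(writeAll(records));
    }
}
```

Iterating with hasNext()/next() instead of collecting into a list keeps only one bound value in memory at a time, which is usually the point of a line-delimited format.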

@tommysdk

tommysdk commented Apr 5, 2019

Reading newline-delimited json to a list can be achieved using the following piece of code:

MappingIterator<MyClass> iterator = new ObjectMapper().readerFor(MyClass.class).readValues(content);
var list = iterator.readAll();

@pdaulbyscottlogic

> And writing using ObjectMapper.writerFor().writeValues() which returns SequenceWriter.
> Separator to use is configurable on per-call basis (default is single space I think, not newline).

Would you be able to expand on how to do this? (This is the first result when searching for "jackson ndjson".)

I've done

ObjectWriter objectWriter = new ObjectMapper().writer(new DefaultPrettyPrinter());
SequenceWriter writer = objectWriter.writeValues(stream);
writer.init(true);

and now I have a writer that I can only call write() on, which by default puts a comma between each object.

What can I do to change this to a newline?

@cowtowncoder
Member

There is something else going on if a comma is being added: the default would be a single space. A comma would only be used within values output as JSON Arrays.

I suspect that you might be assuming that write() expands Java arrays and Collections into separate values: this is not the case. Each call to write() writes a single JSON value, so if you want to "unpack" them (write the contained elements as newline-separated values), you need to use writeAll() instead (or iterate over the values and call write() on each).

And finally, to change separator from space to linefeed, call withRootValueSeparator() on ObjectWriter:

objectWriter = objectWriter.withRootValueSeparator("\n");
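
Putting the pieces of this answer together, a rough sketch of writing a collection as ndjson, assuming Jackson 2.x databind. NdjsonWriterSketch and toNdjson are hypothetical names used only for illustration, and the StringWriter stands in for whatever stream you actually write to.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.SequenceWriter;

import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Jackson.
final class NdjsonWriterSketch {
    // No pretty printer here, so each value stays on a single line.
    private static final ObjectWriter NDJSON_WRITER =
        new ObjectMapper().writer().withRootValueSeparator("\n");

    static String toNdjson(List<?> values) throws IOException {
        StringWriter out = new StringWriter();
        try (SequenceWriter seq = NDJSON_WRITER.writeValues(out)) {
            // writeAll() unpacks the collection: one root value per element.
            // A single write(values) would emit one JSON array instead.
            seq.writeAll(values);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toNdjson(List.of(Map.of("foo", 1), Map.of("foo", 2))));
    }
}
```

The separator is written between values, so the output carries no trailing newline; append one yourself if the consumer expects it.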

@acruise

acruise commented May 19, 2023

Hey folks,

I have an idiosyncratic ndjson stream where lines can be extremely long (>1MB per line) and a lot of the values are numbers, booleans or nulls, not strings... So I really want to avoid creating Strings as long as possible, and to use JsonParser lazily.

So far so good, I'm parsing the first line/object without creating megabytes of garbage, but I can't see how to get JsonParser to read past the newline and proceed to the next object in the file.

The data is coming from a transient stream, not a file, so mark/reset or pushback might be problematic. Any tips? Thanks!

@cowtowncoder
Member

cowtowncoder commented May 19, 2023

@acruise JsonParser itself simply... "works"; you iterate over tokens and it will produce JsonToken stream to read from. It just happens to have a sequence of values.

So I guess I am asking... how are you trying to use it?

@acruise

acruise commented May 19, 2023

> JsonParser itself simply... "works"; you iterate over tokens and it will produce JsonToken stream to read from. It just happens to have a sequence of values.
>
> So I guess I am asking... how are you trying to use it?

Thanks! What should I expect to see in the token stream for this?

{"foo":1}
{"foo":2}

... an END_OBJECT followed immediately by a START_OBJECT?

@cowtowncoder
Member

@acruise Exactly. So,

    JsonToken.START_OBJECT, JsonToken.FIELD_NAME, JsonToken.VALUE_NUMBER_INT, JsonToken.END_OBJECT,
    JsonToken.START_OBJECT, JsonToken.FIELD_NAME, JsonToken.VALUE_NUMBER_INT, JsonToken.END_OBJECT,
    null
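
As a way to check this locally, here is a small sketch that walks the token stream of a two-line input using only jackson-core (2.x assumed). NdjsonTokenWalk and tokens are names invented for this example; a real consumer would act on each token rather than collect them.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jackson.
final class NdjsonTokenWalk {
    // Collects every token until nextToken() returns null (end of input).
    static List<JsonToken> tokens(InputStream in) throws IOException {
        List<JsonToken> seen = new ArrayList<>();
        try (JsonParser parser = new JsonFactory().createParser(in)) {
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                // The newline between objects is plain whitespace to the parser,
                // so END_OBJECT is followed directly by the next START_OBJECT.
                seen.add(token);
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "{\"foo\":1}\n{\"foo\":2}\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(tokens(new ByteArrayInputStream(input)));
    }
}
```

The parser reads forward from the InputStream, so no mark/reset or pushback is needed; the caller just has to consume each END_OBJECT before expecting the next record.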

@acruise

acruise commented May 19, 2023

@cowtowncoder thanks! It turns out I wasn't consuming the expected END_OBJECTs at the end of each line 😅

@cowtowncoder
Member

@acruise Ah! Yes, and feeding END_OBJECT to anything else is likely either going to barf or consider it the end of things. Glad it turned out to be something simple. :)
