-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON configuration and Jackson Streaming Object Processor #5225
Conversation
This PR is currently lacking a lot of documentation. I know! In addition to reviews, I'm hoping to have some field testing of the APIs. Here's an example script that shows a lot of the primitive functionality: from datetime import datetime
from deephaven.table import Table
from deephaven.json import (
string_,
bool_,
char_,
byte_,
short_,
int_,
long_,
float_,
double_,
instant_,
big_integer_,
big_decimal_,
array_,
tuple_,
object_,
any_,
)
from deephaven.stream import blink_to_append_only
from deephaven.json import JsonValueType
from deephaven.json.table import json_table, source
def table(options: JsonValueType, content: str, multi_value_support: bool = False) -> Table:
return blink_to_append_only(
json_table(options, [source(content=content)], multi_value_support=multi_value_support)
)
def weather() -> Table:
return blink_to_append_only(
json_table(
{
"features": [
{"properties": {"timestamp": datetime, "textDescription": str}}
]
},
[source(url="https://api.weather.gov/stations/KNYC/observations")],
)
)
for i, example in enumerate(
[
table(bool, "true"),
table(int, "42"),
table(float, "42.42"),
table(str, '"foo bar"'),
table(datetime, '"2009-02-13T23:31:30.123456789Z"'),
table((bool, int), "[true, 42]"),
table({"foo": int}, '{"foo": 42}'),
table([int], "[1, 2, 4, 8, null]"),
table(
[{"name": str, "xy": (float, float)}],
'[{"name": "foo", "xy": [1.1, 2.2]}, {"name": "bar", "xy": [-3, 42]}]',
),
table(bool_(), "false"),
table(char_(), '"X"'),
table(byte_(), "42"),
table(short_(), "42"),
table(int_(), "42"),
table(long_(allow_decimal=True), "42.42"),
table(long_(allow_string=True), '"42"'),
table(float_(), "42.42"),
table(double_(), "42.42"),
table(instant_(), '"2009-02-13T23:31:30.123456789Z"'),
table(instant_(number_format="ns"), "1234567890123456789"),
table(big_integer_(), "1234567890123456789999999999999999999999"),
table(
big_decimal_(),
"1234567890123456789999999999999999999999.9876543210000000001234",
),
table(tuple_((bool_(), int_())), "[true, 42]"),
table(object_({"foo": int_(), "bar": long_()}), '{"foo": 42, "bar": 43}'),
table(
array_(object_({"foo": int_(), "bar": long_()})),
'[{"foo": 42, "bar": 43}, {"foo": 44, "bar": 45}]',
),
table(any_(), '[[[[], [[], [], [{"foo": 42}]]]]]'),
table(int_(), "1 2 3 4", multi_value_support=True),
weather(),
]
):
globals()[f"example_{i}"] = example |
Here's a fun one: from deephaven.stream import blink_to_append_only
from deephaven.json import JsonValueType
from deephaven.json.table import json_table, source
deephaven_core_issues = blink_to_append_only(json_table(
[
{
"number": int,
"title": str,
"user": {"login": str},
"labels": [{"name": str}],
"state": str,
"assignee": {"login": str},
"milestone": {"title": str},
"comments": int,
"body": str,
}
],
[
source(url="https://api.github.com/repos/deephaven/deephaven-core/issues?per_page=100&page=1"),
source(url="https://api.github.com/repos/deephaven/deephaven-core/issues?per_page=100&page=2"),
source(url="https://api.github.com/repos/deephaven/deephaven-core/issues?per_page=100&page=3"),
source(url="https://api.github.com/repos/deephaven/deephaven-core/issues?per_page=100&page=4"),
source(url="https://api.github.com/repos/deephaven/deephaven-core/issues?per_page=100&page=5"),
],
)).sort_descending(["number"]) |
This PR adds a declarative JSON configuration object that allows users to specify the schema of a JSON message. It is meant to have good out-of-the-box defaults, while still allowing power users to modify some of the finer parsing details (should this int field be parseable from a string? should null values be allowed? what if a field is missing? etc). The JSON configuration layer is not tied to any specific implementation; it is introspectible, and could have alternative implementations with other parsing backends. (I could imagine a DHE use-case where they do code-generation based on the JSON configuration, somewhat like the DHE avro ObjectProcessor code generator.) Out of the box, there's an ObjectProcessor implementation based on the Jackson streaming APIs; that is, the data flows from byte[]s (or InputStream, relevant for very-large-files) to the output WritableChunks without the need for the intermediating Jackson databind layer (TreeNode). This saves a large layer of allocation that our current kafka json_spec layer relies upon. The ObjectProcessor layer means that this can be used in other places that expose ObjectProcessor layers and want 1-to-1 record-to-row (currently, Kafka). Part of deephaven#5222
caf160f
to
1320749
Compare
extensions/json/src/main/java/io/deephaven/json/ValueOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/AnyOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/CharOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/ValueOptionsRestrictedUniverseBase.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/ValueOptionsSingleValueBase.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/CharOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/StringOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/ObjectFieldOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/ObjectOptions.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
py/server/deephaven/json/jackson.py
Outdated
# todo: should be on the classpath by default, but doesn't have to be | ||
# todo: would be nice if this code could live in the JSON jar | ||
_JProvider = jpy.get_type("io.deephaven.json.jackson.JacksonProvider") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODOs.
@jmao-denver should take a look. he has a pattern for dealing with possibly missing libs that may be appropriate here.
py/server/deephaven/json/jackson.py
Outdated
# todo: not on the classpath by default | ||
# todo: would be nice if this code could live in the BSON jar | ||
_JProvider = jpy.get_type("io.deephaven.bson.jackson.JacksonBsonProvider") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODOs.
@jmao-denver should take a look. he has a pattern for dealing with possibly missing libs that may be appropriate here.
extensions/json/src/main/java/io/deephaven/json/BoolOptions.java
Outdated
Show resolved
Hide resolved
extensions/json/src/main/java/io/deephaven/json/TupleOptions.java
Outdated
Show resolved
Hide resolved
* @return the date-time formatter | ||
*/ | ||
@Default | ||
public DateTimeFormatter dateTimeFormatter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to want to be able to parse "the way DateTimeUtils
does".
*/ | ||
@Immutable | ||
@BuildableStyle | ||
public abstract class LocalDateOptions extends ValueOptionsSingleValueBase<LocalDate> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have LocalTime
? ZonedDateTime
?
*/ | ||
@Immutable | ||
@BuildableStyle | ||
public abstract class ObjectKvOptions extends ValueOptionsRestrictedUniverseBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider whether we can produce a guide like:
https://immutables.github.io/immutable.html
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/Mixin.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/IntMixin.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/IntMixin.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/IntMixin.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/ObjectMixin.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/ObjectMixin.java
Outdated
Show resolved
Hide resolved
A more general question? Should we support JSON Schema?
|
It should be relatively "straightforward" for us to implement a "JSON Schema to json Value" adapter in the future if we find value in it. This was considered for the java side of things, but calling it a "standard" means there are a bunch of completing implementations and different versions, and none of the java versions seemed to fit the bill for what we want to use it for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Python changes LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/JacksonProvider.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/JacksonProvider.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/JacksonProvider.java
Outdated
Show resolved
Hide resolved
extensions/json-jackson/src/main/java/io/deephaven/json/jackson/JacksonProvider.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Python changes LGTM.
<root level="warn"> | ||
<root level="info"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the consequence of this change? Where do these logs go? Who sees them? What gets lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I didn't mean to commit this. I do think this should be done. There was discussion earlier about missing logs from the embedded server. It doesn't go to the console, only the web logs. I'll re-open this as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#236 |
This PR adds a declarative JSON configuration object that allows users to specify the schema of a JSON message. It is meant to have good out-of-the-box defaults, while still allowing power users to modify some of the finer parsing details (should this int field be parseable from a string? should null values be allowed? what if a field is missing? etc). The JSON configuration layer is not tied to any specific implementation; it is introspectible, and could have alternative implementations with other parsing backends. (I could imagine a DHE use-case where they do code-generation based on the JSON configuration, somewhat like the DHE avro ObjectProcessor code generator.)
Out of the box, there's an ObjectProcessor implementation based on the Jackson streaming APIs; that is, the data flows from
byte[]
s (orInputStream
, relevant for very-large-files) to the outputWritableChunk
s without the need for the intermediating Jackson databind layer (TreeNode
). This saves a large layer of allocation that our current kafka json_spec layer relies upon. The ObjectProcessor layer means that this can be used in other places that expose ObjectProcessor layers and want 1-to-1 record-to-row (currently, Kafka).Part of #5222