Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets.
Based on this, it is expected that JSON files with top level (anonymous) arrays should be properly understood and decoded by the Amazon Ion Hive SerDe.
Hi! It is true that Ion is a superset of JSON, but it doesn't follow that JSON Arrays should necessarily be treated as Rows/Structs by the Ion SerDe. I understand why it seems implied, but it's not a given.
We don't have any plans for active development on the Hive SerDe but other ecosystem integrations (namely Trino) are in-flight. In what engine/deployment are you using the Hive SerDe? Trino? AWS Athena? Spark? Something else?
According to JSON standards (RFC 4627, ECMA-404, and RFC 8259), an array is a legal top-level JSON text.
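According to the Amazon Ion Hive SerDe documentation: "Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets."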
Based on this, it is expected that JSON files with top level (anonymous) arrays should be properly understood and decoded by the Amazon Ion Hive SerDe.
For example:
[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]
However, the Ion Hive SerDe does not properly interpret these files:
Table definition:
However, queries against this table return no results, and the SerDe passes no input bytes to the execution engine:
In my testing, the OpenX JSON SerDe correctly handles similar data files.
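As a possible workaround, the sketch below rewrites a file whose top-level value is a JSON array into one JSON object per line, using only the Python standard library. It assumes the SerDe treats each top-level struct as one row, which the maintainer's reply suggests but this thread does not confirm. The function name `array_to_json_lines` is hypothetical and is not part of any Ion or Hive tooling.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Turn a JSON text whose top-level value is an array of objects
    into newline-delimited JSON (one object per line).

    Assumption: the reader maps each top-level object to one row.
    """
    value = json.loads(text)
    if not isinstance(value, list):
        raise ValueError("expected a top-level JSON array")

    lines = []
    for item in value:
        if not isinstance(item, dict):
            raise ValueError("expected every array element to be a JSON object")
        # Compact separators keep each record on a single line.
        lines.append(json.dumps(item, separators=(",", ":")))

    # Terminate every record with a newline; an empty array yields an empty string.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = '[{"a": "b", "b": 123, "c": true}, {"a": "z", "b": 456, "c": false}]'
    print(array_to_json_lines(sample), end="")
```

Run against the example above, this produces two lines, each holding one of the original objects. Whether the Ion Hive SerDe then returns two rows would still need to be checked in the engine being used.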