Apache Arrow Flight and Arrow ADBC for data transfer and data loading #36121
taher-cldcvr
started this conversation in
New Connector Request
Replies: 2 comments
-
Any response on this?
-
Any response on this?
-
Currently, Airbyte uses custom Airbyte JSON streams to transfer data between sources and destinations. This causes significant CPU overhead whenever the data has to be translated from JSON to another format. Each Airbyte stream also has to carry the schema for its JSON records, which adds further data overhead. JSON is also a poor fit for serialization over the network: it is a verbose text encoding that has to be parsed again on the receiving side.
An efficient way to handle such workloads would be to translate the data to the Apache Arrow format.
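To make the overhead argument concrete, here is a minimal sketch that compares a row-oriented JSON encoding with a simple columnar binary encoding. It uses only the Python standard library as a stand-in and is not the Arrow format or the Airbyte protocol; the field names (`id`, `price`), the header layout, and the helper functions are all hypothetical and chosen only for illustration.

```python
import json
import struct

# Hypothetical sample records; the field names are invented for this sketch.
records = [{"id": i, "price": i * 1.5} for i in range(1000)]


def encode_json_lines(rows):
    """Row-oriented text: every record repeats its field names."""
    return "\n".join(json.dumps(r) for r in rows).encode("utf-8")


def encode_columnar(rows):
    """Columnar binary: schema written once, then one packed buffer per column.

    This is a toy layout, NOT the Arrow IPC format.
    """
    schema = json.dumps({"id": "int64", "price": "float64"}).encode("utf-8")
    n = len(rows)
    ids = struct.pack(f"<{n}q", *(r["id"] for r in rows))
    prices = struct.pack(f"<{n}d", *(r["price"] for r in rows))
    header = struct.pack("<II", len(schema), n)  # schema length, row count
    return header + schema + ids + prices


def decode_columnar(buf):
    """Read the columns back without any text parsing of the values."""
    schema_len, n = struct.unpack_from("<II", buf, 0)
    offset = 8 + schema_len
    ids = struct.unpack_from(f"<{n}q", buf, offset)
    offset += 8 * n
    prices = struct.unpack_from(f"<{n}d", buf, offset)
    return [{"id": i, "price": p} for i, p in zip(ids, prices)]


json_bytes = encode_json_lines(records)
col_bytes = encode_columnar(records)
print("JSON lines bytes:", len(json_bytes))
print("columnar bytes:  ", len(col_bytes))
```

For this sample the columnar buffer is smaller than the JSON output, because field names are stored once instead of once per record and the values are fixed-width binary. Real Arrow buffers add validity bitmaps, alignment padding, and metadata, so the exact numbers would differ.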
Advantages:
Disadvantages:
Converting Arrow data to textual or row-oriented targets such as JSON, CSV, or JDBC can make destinations slower because of the data translation involved. If the target format is columnar, such as Parquet, ORC, or Iceberg, translating from Arrow is much cheaper.
I am happy to collaborate and contribute to this approach.