Cast types on read
`COPY FROM parquet` is too strict when matching the Postgres tupledesc schema to the parquet file schema.
e.g. an `INT32` column in the parquet schema cannot be read into a Postgres column with the `int64` type.
We can avoid this by casting each arrow array to the array type expected by the tupledesc
schema, if the cast is possible. We can use the `arrow-cast` crate, which is part of the same project
as `arrow`. Its public API lets us check whether a cast is possible between two arrow types and perform the cast.

With that, we can cast between all type pairs that arrow-cast allows. Some examples:
- INT16 => INT32
- UINT32 => INT64
- FLOAT32 => FLOAT64
- LargeUtf8 => UTF8
- LargeBinary => Binary
- Array, and Map with castable fields, e.g. [UINT16] => [INT64]
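
The check-then-cast flow can be sketched with a toy model. This is a standard-library-only illustration, not the code in this commit: `ToyType`, `can_cast`, `ReadPlan` and `plan_read` are hypothetical names, and the cast table holds only the pairs mentioned in this message rather than arrow-cast's actual rules. In the real crate, the corresponding entry points are `arrow_cast::cast::can_cast_types` and `arrow_cast::cast::cast`.

```rust
// Toy stand-in for arrow's `DataType`; only a handful of types are modeled.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int16,
    Int32,
    Int64,
    UInt16,
    UInt32,
    Float32,
    Float64,
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    List(Box<ToyType>),
}

// Assumption: this table lists only the pairs named in the commit message.
// It is much stricter than what arrow-cast really accepts.
fn can_cast(from: &ToyType, to: &ToyType) -> bool {
    use ToyType::*;
    match (from, to) {
        _ if from == to => true,
        (Int16, Int32)
        | (Int32, Int64)
        | (UInt16, Int64)
        | (UInt32, Int64)
        | (Float32, Float64)
        | (LargeUtf8, Utf8)
        | (LargeBinary, Binary) => true,
        // A list is castable when its element type is castable.
        (List(f), List(t)) => can_cast(f, t),
        _ => false,
    }
}

// What the reader should do with a column of the file.
#[derive(Debug, PartialEq)]
enum ReadPlan {
    AsIs,
    Cast { to: ToyType },
}

// Decide how to read a file column into a table column.
fn plan_read(file_type: &ToyType, table_type: &ToyType) -> Result<ReadPlan, String> {
    if file_type == table_type {
        Ok(ReadPlan::AsIs)
    } else if can_cast(file_type, table_type) {
        Ok(ReadPlan::Cast { to: table_type.clone() })
    } else {
        Err(format!("cannot cast {:?} to {:?}", file_type, table_type))
    }
}
```

Under this toy table, `plan_read(&ToyType::Int32, &ToyType::Int64)` yields a cast plan, while a pair outside the table, such as `Int64` to `Int16`, yields an error instead of a silent mismatch.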

**Considerations**
- arrow-cast matches struct fields by position when it casts a struct. This differs from
  how we match table fields by name, which is why this PR does not allow casting structs yet.
- Some casts are allowed by arrow but not by Postgres,
  e.g. INT32 => DATE32 is possible in arrow but not in Postgres. This gives users much more
  flexibility, but some types can unexpectedly be cast to different types.
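
Both considerations can be illustrated with a small standard-library sketch. It does not use arrow. `ToyStruct`, `match_by_position`, `match_by_name` and `date32_to_ymd` are hypothetical helpers. The date part assumes that an INT32 value cast to DATE32 is taken as the day count since 1970-01-01, which is how Arrow defines Date32.

```rust
// Hypothetical stand-in for a struct value: ordered (field name, value) pairs.
type ToyStruct = Vec<(&'static str, i64)>;

// Positional matching: the i-th source value lands in the i-th target field.
// Field names are ignored, so a different field order silently swaps values.
fn match_by_position(source: &ToyStruct, target_fields: &[&'static str]) -> ToyStruct {
    target_fields
        .iter()
        .zip(source.iter())
        .map(|(name, (_, value))| (*name, *value))
        .collect()
}

// Name matching: each target field takes the source value with the same name.
fn match_by_name(
    source: &ToyStruct,
    target_fields: &[&'static str],
) -> Vec<(&'static str, Option<i64>)> {
    target_fields
        .iter()
        .map(|name| {
            let value = source.iter().find(|(n, _)| n == name).map(|(_, v)| *v);
            (*name, value)
        })
        .collect()
}

// Converts a day count since 1970-01-01 to (year, month, day) in the
// proleptic Gregorian calendar (the well-known "civil from days" algorithm).
fn date32_to_ymd(days: i32) -> (i64, i64, i64) {
    let z = i64::from(days) + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}
```

With a source struct `[("x", 1), ("y", 2)]` and target fields `["y", "x"]`, positional matching puts `1` into `y` and `2` into `x`, whereas name matching keeps each value with its field. Likewise, an integer column holding `19000` would read as the date 2022-01-08 if it were cast to DATE32.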

Closes #67.
aykut-bozkurt committed Nov 14, 2024
1 parent 518a5ac commit 984567b
Showing 10 changed files with 912 additions and 287 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -21,6 +21,7 @@ pg_test = []

[dependencies]
arrow = {version = "53", default-features = false}
+arrow-cast = {version = "53", default-features = false}
arrow-schema = {version = "53", default-features = false}
aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
aws-credential-types = {version = "1.2", default-features = false}