`COPY FROM parquet` is too strict when matching the Postgres tupledesc schema to the parquet file schema, e.g. an `INT32` column in the parquet schema cannot be read into a Postgres column of type `int64`. We can avoid this situation by casting the arrow array to the array type expected by the tupledesc schema, if the cast is possible.

We can make use of the `arrow-cast` crate, which lives in the same project as `arrow`. Its public API lets us check whether a cast is possible between two arrow types and then perform the cast (a sketch follows below). With that we can cast between all allowed arrow types. Some examples:
- INT16 => INT32
- UINT32 => INT64
- FLOAT32 => FLOAT64
- LargeUtf8 => UTF8
- LargeBinary => Binary
- Array and Map with castable fields, e.g. [UINT16] => [INT64]

**Considerations**
- arrow-cast matches struct fields by position when casting, which is different from how we match table fields by name. This is why we do not allow casting structs yet in this PR.
- Some of the casts allowed by arrow are not allowed by Postgres, e.g. INT32 => DATE32 is possible in arrow but not in Postgres. This gives users much more flexibility, but some types can unexpectedly cast to different types.

Closes #67.
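A minimal sketch of the check-then-cast flow, assuming the `arrow` and `arrow-cast` crates as dependencies; the column values and the expected type below are illustrative only, not taken from the actual implementation:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array};
use arrow::datatypes::DataType;
use arrow_cast::{can_cast_types, cast};

fn main() {
    // Hypothetical example: the parquet column arrives as INT32,
    // but the tupledesc expects a Postgres int64 column.
    let parquet_array: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
    let expected_type = DataType::Int64;

    // Only cast when arrow-cast reports the conversion is possible.
    if can_cast_types(parquet_array.data_type(), &expected_type) {
        let casted = cast(&parquet_array, &expected_type).expect("cast failed");
        assert_eq!(casted.data_type(), &DataType::Int64);
    }
}
```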