When subsequent records have an incompatible schema, the pa.Table.from_pandas() call fails:
Example:
----> 3 result = source.read(cache=cache)
9 frames
/usr/local/lib/python3.10/dist-packages/airbyte/source.py in read(self, cache, streams, write_strategy, force_full_refresh)
    592         )
    593         print(f"Started `{self.name}` read operation at {pendulum.now().format('HH:mm:ss')}...")
--> 594         cache.processor.process_airbyte_messages(
    595             self._tally_records(
    596                 self._read(
/usr/local/lib/python3.10/dist-packages/airbyte/_processors/base.py in process_airbyte_messages(self, messages, write_strategy, max_batch_size)
    207         for stream_name, stream_batch in stream_batches.items():
    208             batch_df = pd.DataFrame(stream_batch)
--> 209             record_batch = pa.Table.from_pandas(batch_df)
    210             self._process_batch(stream_name, record_batch)
    211             progress.log_batch_written(stream_name, len(stream_batch))
/usr/local/lib/python3.10/dist-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    611
    612     if nthreads == 1:
--> 613         arrays = [convert_column(c, f)
    614                   for c, f in zip(columns_to_convert, convert_fields)]
    615     else:
/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
    611
    612     if nthreads == 1:
--> 613         arrays = [convert_column(c, f)
    614                   for c, f in zip(columns_to_convert, convert_fields)]
    615     else:
/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    598             e.args += ("Conversion failed for column {!s} with type {!s}"
    599                        .format(col.name, col.dtype),)
--> 600             raise e
    601         if not field_nullable and result.null_count > 0:
    602             raise ValueError("Field {} was non-nullable but pandas column "
/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    592
    593         try:
--> 594             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
    595         except (pa.ArrowInvalid,
    596                 pa.ArrowNotImplementedError,
/usr/local/lib/python3.10/dist-packages/pyarrow/array.pxi in pyarrow.lib.array()
/usr/local/lib/python3.10/dist-packages/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
/usr/local/lib/python3.10/dist-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: ("Could not convert 'false' with type str: tried to convert to boolean", 'Conversion failed for column attributes with type object')
Reported in Slack:
https://airbytehq.slack.com/archives/C06FZ238P8W/p1709053334428409?thread_ts=1708526473.508759&cid=C06FZ238P8W
Related issue:
Hopefully this will also be resolved by #67.
aaronsteers