🐛 Bug: Failure when subsequent records have fundamentally incompatible schemas #89

aaronsteers · 2024-03-01T21:52:06Z

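When subsequent records in a stream batch have incompatible schemas, the pa.Table.from_pandas() call fails. In the example below, the attributes column contains the string 'false' where a boolean was expected: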

Example:

----> 3 result = source.read(cache=cache)

9 frames
/usr/local/lib/python3.10/dist-packages/airbyte/source.py in read(self, cache, streams, write_strategy, force_full_refresh)
    592         )
    593         print(f"Started `{self.name}` read operation at {pendulum.now().format('HH:mm:ss')}...")
--> 594         cache.processor.process_airbyte_messages(
    595             self._tally_records(
    596                 self._read(

/usr/local/lib/python3.10/dist-packages/airbyte/_processors/base.py in process_airbyte_messages(self, messages, write_strategy, max_batch_size)
    207         for stream_name, stream_batch in stream_batches.items():
    208             batch_df = pd.DataFrame(stream_batch)
--> 209             record_batch = pa.Table.from_pandas(batch_df)
    210             self._process_batch(stream_name, record_batch)
    211             progress.log_batch_written(stream_name, len(stream_batch))

/usr/local/lib/python3.10/dist-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()

/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    611 
    612     if nthreads == 1:
--> 613         arrays = [convert_column(c, f)
    614                   for c, f in zip(columns_to_convert, convert_fields)]
    615     else:

/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
    611 
    612     if nthreads == 1:
--> 613         arrays = [convert_column(c, f)
    614                   for c, f in zip(columns_to_convert, convert_fields)]
    615     else:

/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    598             e.args += ("Conversion failed for column {!s} with type {!s}"
    599                        .format(col.name, col.dtype),)
--> 600             raise e
    601         if not field_nullable and result.null_count > 0:
    602             raise ValueError("Field {} was non-nullable but pandas column "

/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    592 
    593         try:
--> 594             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
    595         except (pa.ArrowInvalid,
    596                 pa.ArrowNotImplementedError,

/usr/local/lib/python3.10/dist-packages/pyarrow/array.pxi in pyarrow.lib.array()

/usr/local/lib/python3.10/dist-packages/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()

/usr/local/lib/python3.10/dist-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: ("Could not convert 'false' with type str: tried to convert to boolean", 'Conversion failed for column attributes with type object')
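To make the failure mode concrete, here is a minimal stdlib-only sketch that flags columns whose values change Python type between records, which is the condition the ArrowInvalid error above points to. This is only an illustration and not the fix from #67; the helper name find_mixed_type_columns and the sample batch are hypothetical and are not part of PyAirbyte or PyArrow.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def find_mixed_type_columns(records: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Return columns whose non-null values have more than one Python type.

    Hypothetical helper for illustration only; not a PyAirbyte or PyArrow API.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for column, value in record.items():
            if value is not None:  # nulls do not conflict with any column type
                seen[column].add(type(value).__name__)
    # Keep only the columns where more than one type was observed.
    return {column: types for column, types in seen.items() if len(types) > 1}


# Assumed sample batch: `attributes` is a boolean in the first record
# and the string 'false' in the next, mirroring the error message above.
batch = [
    {"id": 1, "attributes": True},
    {"id": 2, "attributes": "false"},
]

conflicts = find_mixed_type_columns(batch)
print(conflicts)
```

A check like this could run on a batch before it is converted, so that a conflicting column is reported by name instead of surfacing as a conversion error deep inside the Arrow call stack.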

Reported in Slack:

https://airbytehq.slack.com/archives/C06FZ238P8W/p1709053334428409?thread_ts=1708526473.508759&cid=C06FZ238P8W


aaronsteers · 2024-03-05T21:49:17Z

Related issue:

🐛 Bug: Columns with type ["null", "array"] can't be typecasted #101

Hopefully this will also be resolved by #67.

aaronsteers mentioned this issue Mar 4, 2024

Fix: Resolve conflicting schema issues by removing dependency on PyArrow #67

Merged

aaronsteers closed this as completed in #67 Mar 7, 2024

aaronsteers self-assigned this Mar 11, 2024
