
🐛 Bug: BigQuery to_pandas() and read_sql_table() don't work #165

Open
aaronsteers opened this issue Apr 3, 2024 · 4 comments
Labels
accepting pull requests · bug · good first issue

Comments

@aaronsteers
Contributor

aaronsteers commented Apr 3, 2024

Discovered when working on another PR:

It appears to_pandas() fails when run against BigQuery.

sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(bigquery://dataline-integration-testing?credentials_path=%2Fvar%2Ffolders%2Fs2%2Fvn4r87x53fx8v_n79pxyvc_r0000gq%2FT%2Ftmpq6yf1owc) schema 'test_deleteme_c6wj0k': (users)

Repro condition documented in code here:
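For anyone picking this up: the `InvalidRequestError` above is raised by SQLAlchemy's `MetaData.reflect()` when a name passed via `only=` is not among the tables the dialect's inspector reports. A minimal sketch of the same error shape, using in-memory SQLite rather than BigQuery (table names here are purely illustrative):

```python
import sqlalchemy as sa

# In-memory SQLite stands in for BigQuery here: the failure above comes from
# the same SQLAlchemy code path (MetaData.reflect with only=[...]).
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE users (id INTEGER)"))

meta = sa.MetaData()
try:
    # Requesting a name the dialect's inspector does not report raises the
    # same "Could not reflect: requested table(s) not available" error.
    meta.reflect(bind=engine, only=["users_missing"])
except sa.exc.InvalidRequestError as exc:
    err_msg = str(exc)
    print(err_msg)
```

Against BigQuery the table does exist, so the likely bug is a mismatch (schema/dataset qualification or name casing) between the name we request and the names the BigQuery dialect's inspector returns.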

@aaronsteers aaronsteers added the good first issue, accepting pull requests, and bug labels Apr 16, 2024
@aaronsteers aaronsteers changed the title 🐛 Bug: BigQuery to_pandas() doesn't work 🐛 Bug: BigQuery to_pandas() and read_sql_table() don't work Jul 9, 2024
@aaronsteers
Contributor Author

aaronsteers commented Jul 9, 2024

Found the same issue in:

Added read_sql_table() to the issue title.
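For the read_sql_table() half of the title: pandas' `read_sql_table()` reflects the table through SQLAlchemy before selecting from it, so it hits the same reflection step that fails with the BigQuery dialect (assuming that is the call path here). A minimal SQLite sketch of the call working end to end, with illustrative names:

```python
import pandas as pd
import sqlalchemy as sa

# Illustrative table; in-memory SQLite stands in for the cache backend.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE users (id INTEGER, name TEXT)"))
    conn.execute(sa.text("INSERT INTO users VALUES (1, 'alice')"))

# read_sql_table() first reflects "users" via SQLAlchemy, then reads it;
# against BigQuery the reflection step raises InvalidRequestError instead.
df = pd.read_sql_table("users", engine)
print(df)
```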

@MinuraPunchihewa

Hey @aaronsteers,
I am keen to give this a shot if that is OK.

@aaronsteers
Contributor Author

@MinuraPunchihewa - That would be great!

@MinuraPunchihewa

Hey @aaronsteers,
I am trying to get up and running with the BigQuery connector via PyAirbyte and this is what my code looks like:

import json

import airbyte as ab

# Create and install the source:
source: ab.Source = ab.get_source("source-bigquery")

# Configure the source:
source.set_config(
    config={
        "project_id": "<MY-PROJECT-ID>",
        "credentials_json": json.dumps({
            "<MY>": "<CREDS>"
        }),
        "dataset_id": "<MY-DATASET-ID>"
    }
)

# Verify the config and creds by running `check`:
source.check()

# Select the stream to sync, then read it:
source.select_streams(["<MY-TABLE>"])
source.read()

The Docker image is pulled successfully and the connection check passes; however, something goes wrong during the sync.

I am not sure I understand what happened. This is what I see in my log file:

2024-11-04 01:27:39 - INFO - INFO i.a.i.s.b.BigQuerySource(main):219 starting source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:40 - INFO - INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 integration args: {spec=null}
2024-11-04 01:27:40 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):132 Running integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:40 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):133 Command: SPEC
2024-11-04 01:27:40 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):134 Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
2024-11-04 01:27:42 - INFO - INFO i.a.i.s.b.BigQuerySource(main):219 starting source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:42 - INFO - INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 integration args: {check=null, config=/tmp/tmph2h8sggv.json}
2024-11-04 01:27:42 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):132 Running integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:42 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):133 Command: CHECK
2024-11-04 01:27:42 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):134 Integration config: IntegrationConfig{command=CHECK, configPath='/tmp/tmph2h8sggv.json', catalogPath='null', statePath='null'}
2024-11-04 01:27:42 - INFO - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-11-04 01:27:47 - INFO - INFO i.a.i.s.b.BigQuerySource(lambda$getCheckOperations$0):90 The source passed the basic query test!
2024-11-04 01:27:50 - INFO - INFO i.a.i.s.b.BigQuerySource(lambda$getCheckOperations$1):97 The source passed the Dataset query test!
2024-11-04 01:27:52 - INFO - INFO i.a.i.s.b.BigQuerySource(main):219 starting source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:52 - INFO - INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 integration args: {discover=null, config=/tmp/tmppvpgfp1m.json}
2024-11-04 01:27:52 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):132 Running integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:52 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):133 Command: DISCOVER
2024-11-04 01:27:52 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):134 Integration config: IntegrationConfig{command=DISCOVER, configPath='/tmp/tmppvpgfp1m.json', catalogPath='null', statePath='null'}
2024-11-04 01:27:52 - INFO - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-11-04 01:27:57 - INFO - INFO i.a.i.s.b.BigQuerySource(main):219 starting source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 integration args: {read=null, catalog=/tmp/tmpnuzdiecc.txt, state=/tmp/tmps7a8ayqv.txt, config=/tmp/tmp1citfouj.json}
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):132 Running integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):133 Command: READ
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.b.IntegrationRunner(runInternal):134 Integration config: IntegrationConfig{command=READ, configPath='/tmp/tmp1citfouj.json', catalogPath='/tmp/tmpnuzdiecc.txt', statePath='/tmp/tmps7a8ayqv.txt'}
2024-11-04 01:27:57 - INFO - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.s.r.s.StateManagerFactory(createStateManager):57 Stream state manager selected to manage state object with type STREAM.
2024-11-04 01:27:57 - INFO - INFO i.a.c.i.s.r.s.CursorManager(createCursorInfoForStream):199 Cursor field set in state but not present in catalog. Stream: MDB_DB_HANDLER_TEST_TEST. Original Cursor Field: null. Original value: null. Resetting cursor.
2024-11-04 01:27:59 - INFO - ERROR i.a.c.i.b.AirbyteExceptionHandler(uncaughtException):64 Something went wrong in the connector. See the logs for more details. java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because the return value of "io.airbyte.protocol.models.v0.ConfiguredAirbyteStream.getCursorField()" is null
	at io.airbyte.cdk.db.IncrementalUtils.getCursorField(IncrementalUtils.java:18) ~[airbyte-cdk-core-0.13.2.jar:?]
	at io.airbyte.cdk.db.IncrementalUtils.getCursorFieldOptional(IncrementalUtils.java:29) ~[airbyte-cdk-core-0.13.2.jar:?]
	at io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.validateCursorFieldForIncrementalTables(AbstractDbSource.java:210) ~[airbyte-cdk-db-sources-0.13.2.jar:?]
	at io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.read(AbstractDbSource.java:167) ~[airbyte-cdk-db-sources-0.13.2.jar:?]
	at io.airbyte.cdk.integrations.base.IntegrationRunner.readSerial(IntegrationRunner.java:275) ~[airbyte-cdk-core-0.13.2.jar:?]
	at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:173) ~[airbyte-cdk-core-0.13.2.jar:?]
	at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125) ~[airbyte-cdk-core-0.13.2.jar:?]
	at io.airbyte.integrations.source.bigquery.BigQuerySource.main(BigQuerySource.java:220) ~[io.airbyte.airbyte-integrations.connectors-source-bigquery.jar:?]

2024-11-04 01:27:59 - ERROR - Something went wrong in the connector. See the logs for more details.
2024-11-04 01:27:59 - INFO - {"type":"TRACE","trace":{"type":"ERROR","emitted_at":1.730663879962E12,"error":{"message":"Something went wrong in the connector. See the logs for more details.","internal_message":"java.lang.NullPointerException: Cannot invoke \"java.util.List.size()\" because the return value of \"io.airbyte.protocol.models.v0.ConfiguredAirbyteStream.getCursorField()\" is null","stack_trace":"java.lang.NullPointerException: Cannot invoke \"java.util.List.size()\" because the return value of \"io.airbyte.protocol.models.v0.ConfiguredAirbyteStream.getCursorField()\" is null\n\tat io.airbyte.cdk.db.IncrementalUtils.getCursorField(IncrementalUtils.java:18)\n\tat io.airbyte.cdk.db.IncrementalUtils.getCursorFieldOptional(IncrementalUtils.java:29)\n\tat io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.validateCursorFieldForIncrementalTables(AbstractDbSource.java:210)\n\tat io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.read(AbstractDbSource.java:167)\n\tat io.airbyte.cdk.integrations.base.IntegrationRunner.readSerial(IntegrationRunner.java:275)\n\tat io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:173)\n\tat io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125)\n\tat io.airbyte.integrations.source.bigquery.BigQuerySource.main(BigQuerySource.java:220)\n","failure_type":"system_error"}}}

Do you have any idea what might have gone wrong here?
