poll .GetBatch() instead of using operation.result() #841
@wazi55 we still need a changelog entry on this. Here's how to add a changelog entry.
operation = self.job_client.create_batch(request=request)  # type: ignore
# this takes quite a while, waiting on GCP response to resolve
# (not a google-api-core issue, more likely a dataproc serverless issue)
response = operation.result(polling=self.result_polling_policy)

state = "PENDING"
The linting checks are failing with the following message:
python_submissions.py:127:9: F841 local variable 'operation' is assigned to but never used
I suggest either:
- removing it, or
- (preferred) using it to set the initial state variable
I took the easier path 🔢
…last 2 seconds sleep if the response is in one of the 3 options
state = "State.PENDING"
while state not in ["State.SUCCEEDED", "State.FAILED", "State.CANCELLED"]:
    time.sleep(2)
@colin-rogers-dbt I'm a polling filthy casual, is there a more canonical way of polling besides .sleep(2)?
Not really, we need to have some sort of polling interval
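For illustration, here is a minimal sketch of what a fixed-interval polling loop could look like once a timeout is added. The helper name `poll_until`, its parameters, and the plain-string states are all hypothetical and not part of this PR; the `sleep` argument is injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Iterable


def poll_until(
    get_state: Callable[[], str],
    terminal_states: Iterable[str],
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() every `interval` seconds until it returns a terminal state.

    Hypothetical helper: raises TimeoutError if no terminal state is seen
    before `timeout` seconds have elapsed.
    """
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout
    state = get_state()
    while state not in terminal:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout} seconds")
        sleep(interval)  # the polling interval discussed above
        state = get_state()
    return state
```

Checking the state before the first sleep means a batch that is already finished returns immediately instead of paying one full interval.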
response = operation.result(polling=self.result_polling_policy)

state = "State.PENDING"
while state not in ["State.SUCCEEDED", "State.FAILED", "State.CANCELLED"]:
Pretty sure this is an enum defined somewhere in the bigquery python client. We should rely on that and not hardcoded strings.
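A rough sketch of the enum-based check suggested here. The `State` class below is a hypothetical stand-in so the snippet runs on its own; in the adapter it would be replaced by the enum shipped with the Google client library (for Dataproc batches this is likely `google.cloud.dataproc_v1.Batch.State`, which should be verified against the installed version).

```python
from enum import Enum


class State(Enum):
    # Hypothetical stand-in for the client library's batch state enum.
    PENDING = 1
    RUNNING = 2
    SUCCEEDED = 3
    FAILED = 4
    CANCELLED = 5


# Terminal states are compared as enum members, not as strings.
TERMINAL_STATES = frozenset({State.SUCCEEDED, State.FAILED, State.CANCELLED})


def is_terminal(state: State) -> bool:
    """Return True once the batch can no longer change state."""
    return state in TERMINAL_STATES
```

Comparing members directly avoids depending on how the enum happens to be rendered by `str()`, which is what the hardcoded `"State.SUCCEEDED"` strings rely on.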
Thanks for getting this started. I'm going to take this one over to address tests/missing functionality as part of #929.
resolves: #734
docs dbt-labs/docs.getdbt.com/#
Problem
Serverless Spark polling within the dbt-bigquery adapter spends an extra 1.5 minutes waiting for the cluster to be torn down instead of just waiting for the job to finish.
Solution
Replaced the call to operation.result with polling getBatch. Added a randomly generated batch_id to the createBatch request so that the subsequent getBatch calls can reference the batch by that same ID.
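The flow described above can be sketched roughly as follows. This is not the adapter's code: `FakeBatchClient`, `Batch`, `State`, and `run_batch` are hypothetical stand-ins that only mimic "create with a chosen ID, then get by name", and the resource name layout `{parent}/batches/{batch_id}` is assumed to match what Dataproc uses.

```python
import time
import uuid
from dataclasses import dataclass
from enum import Enum


class State(Enum):  # hypothetical stand-in for the Dataproc batch state enum
    PENDING = 1
    RUNNING = 2
    SUCCEEDED = 3
    FAILED = 4
    CANCELLED = 5


TERMINAL = {State.SUCCEEDED, State.FAILED, State.CANCELLED}


@dataclass
class Batch:  # hypothetical stand-in for the batch resource
    name: str
    state: State


class FakeBatchClient:
    """Hypothetical in-memory client that replays a fixed list of states."""

    def __init__(self, states):
        self._states = iter(states)
        self._names = set()

    def create_batch(self, parent: str, batch_id: str) -> None:
        self._names.add(f"{parent}/batches/{batch_id}")

    def get_batch(self, name: str) -> Batch:
        if name not in self._names:
            raise KeyError(name)
        return Batch(name=name, state=next(self._states))


def run_batch(client, parent: str, interval: float = 2.0, sleep=time.sleep) -> Batch:
    # Choosing the ID up front is what lets us look the batch up by name later.
    batch_id = str(uuid.uuid4())
    client.create_batch(parent=parent, batch_id=batch_id)
    name = f"{parent}/batches/{batch_id}"
    batch = client.get_batch(name=name)
    while batch.state not in TERMINAL:
        sleep(interval)
        batch = client.get_batch(name=name)
    return batch
```

The loop returns as soon as the batch itself reports a terminal state, rather than waiting for the long-running operation to resolve. A production version would also want a timeout so a stuck batch cannot block forever.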
Checklist