[Feature] [Regression] support non-literal batch_id config for python models on dataproc #1321

Open
3 tasks done
maxmckittrick opened this issue Aug 16, 2024 · 1 comment
maxmckittrick commented Aug 16, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

currently, the default batch ID for python models submitted to dataproc is simply str(uuid.uuid4()); this was last changed in #1020.

this works, and is sufficient to avoid 409 Already exists: Failed to create batch errors from dataproc when submitting batches with duplicate names. however, since the test changes included in #1014, passing any non-literal batch_id in the model config causes a parsing error, e.g.:

18:19:35  Running with dbt=1.8.5
18:19:36  Registered adapter: bigquery=1.8.2
18:19:36  Unable to do partial parsing because of a version mismatch
18:19:39  Encountered an error:
Parsing Error
  Error when trying to literal_eval an arg to dbt.ref(), dbt.source(), dbt.config() or dbt.config.get()
  malformed node or string on line 49: <ast.Name object at 0x169b599f0>
  https://docs.python.org/3/library/ast.html#ast.literal_eval
  In dbt python model, `dbt.ref`, `dbt.source`, `dbt.config`, `dbt.config.get` function args only support Python literal structures
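
For context, the failure above can be reproduced outside dbt with the standard library alone. This is a minimal sketch, not dbt's actual parser code; the helper name is_literal is made up for illustration. It only shows that ast.literal_eval accepts literal values and rejects names and function calls, which is why a variable or a uuid call passed as batch_id fails at parse time.

```python
import ast


def is_literal(expr: str) -> bool:
    """Return True if `expr` is a Python literal that ast.literal_eval accepts.

    Hypothetical helper for illustration only; dbt's own check lives in its parser.
    """
    try:
        ast.literal_eval(expr)
        return True
    except (ValueError, SyntaxError):
        # Names, calls, and other non-literal nodes raise
        # "ValueError: malformed node or string".
        return False


print(is_literal('"my-static-batch-id"'))  # True: a string literal
print(is_literal("batch_id"))              # False: a bare name (ast.Name)
print(is_literal("str(uuid.uuid4())"))     # False: a function call (ast.Call)
```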

this makes passing any non-default batch_id more or less impossible: using a variable to assign a dynamic batch ID at runtime throws an error from literal_eval, while a static batch ID lets a model run on dataproc only once before dataproc returns a 409 error.

Describe alternatives you've considered

one alternative would be to amend the default_batch_id config to combine the model name with either a uuid or a non-static dbt env var, maybe invocation_id (unsure if this would only work on dbt cloud). this would avoid the previous errors when using created_at, as mentioned in #1006.
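
A rough sketch of what the uuid variant could look like, to make the idea concrete. The function name descriptive_batch_id and its parameters are hypothetical and not part of dbt-bigquery. It also assumes that Dataproc batch IDs are limited to 4-63 characters of lowercase letters, digits, and hyphens, which should be checked against the Dataproc docs.

```python
import re
import uuid


def descriptive_batch_id(model_name: str, max_len: int = 63) -> str:
    """Build a batch ID of the form '<model-name>-<uuid hex>'.

    Hypothetical helper; the 63-character limit and the allowed
    character set are assumptions about Dataproc's batch ID rules.
    """
    suffix = uuid.uuid4().hex  # 32 lowercase hex chars, unique per call
    # Lowercase the model name and replace runs of other characters with "-".
    slug = re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")
    # Leave room for the separator and the uuid suffix.
    slug = slug[: max_len - len(suffix) - 1].rstrip("-")
    return f"{slug}-{suffix}" if slug else suffix


print(descriptive_batch_id("My_Python_Model"))
```

Because the uuid suffix changes on every call, repeated runs of the same model would get distinct batch names, while the model name stays visible in the dataproc console.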

Who will this benefit?

everyone who wants to see descriptive batch names in dataproc!

Are you interested in contributing this feature?

yes, I'm a regular dbt user but haven't contributed anything here before :)

Anything else?

I've confirmed this is broken in both dbt-core v1.8.5/dbt-bigquery v1.8.2 and dbt-core v1.7.16/dbt-bigquery v1.7.9

@maxmckittrick maxmckittrick added enhancement New feature or request triage labels Aug 16, 2024
@amychen1776 amychen1776 added python Pull requests that update Python code and removed triage labels Aug 28, 2024
amychen1776 commented Aug 28, 2024

@maxmckittrick Thank you for opening up the issue.
What are the use cases for which you use the batch ids? (I assume it's to help you identify the queries?)

@amychen1776 amychen1776 added python_models and removed python Pull requests that update Python code labels Aug 28, 2024
@amychen1776 amychen1776 changed the title [Feature] support non-literal batch_id config for python models on dataproc [Feature] [Regression] support non-literal batch_id config for python models on dataproc Aug 28, 2024