Removal of drop table if exists "no_schema"."$scratch" #149
Comments
Bonjour @Alain-Barrette ,
Can you confirm that you want your model to be located as :
no_schema is only meaningful for root_path or schema configurations; $scratch is the default location for persistent materializations. Are you on S3? I remember issues with long table deletion times. Looking forward, |
Hi @fabrice-etanchaud , No problem continuing in French if you like. In the meantime, I am adding the main data engineer who works with dbt here. Alexandre Cote will review this and answer you. We may indeed have some bad configuration in this project. Thanks |
Hi @Alain-Barrette and @fabrice-etanchaud , If I understand the situation correctly, this drop table happens when QBC is trying to create views? I see in the relevant create_or_replace_view macro we have a call to drop the table with the same name as the view:
@fabrice-etanchaud do you know the reasoning for why this was added? |
This could probably help. |
Hi @Alain-Barrette , Yes that's what I was suspecting :
no_schema is only meaningful for root_path or schema configurations. As it seems all your models are materialized as views (I know nothing about your custom model config in the properties or sql files), datalake and root_path will not be used at all. Could you please check that the "DeclarationAnnuelleAMFLegacyApplication3JeuxDonneesPolicesEnVigueurFinAnnee" model is in a subdirectory mentioned in your dbt_project.yml or its twin_strategy is ' Hi @ravjotbrar : The correct code is in the I wish Dremio had a shared catalog for views and tables; the database/schema and datalake/root_path double configuration adds complexity to the adapter. Maybe there is a simpler way to go? |
Removing datalake and root_path from the dbt_project.yml file does not seem to avoid the drop table instruction. Note that there is no functional impact from this issue, only a possible optimization. |
I can't explain this |
That's better ! Please, take into account this :
So you shouldn't configure your project's default database to 'no_schema' (unless your space is named like this ?) What is the materialized configuration of your "PlatiniumCrc..." model ? Are you still experiencing long drop times ? |
The drop time depends on the load on Dremio. At the moment we are testing with a small number of objects to reduce test time. We will do further testing when this issue is resolved. |
Would it be easier to diagnose this by booking a meeting between @fabrice-etanchaud and @alexcotecbq? |
@fabrice-etanchaud : we can even talk in French if you want :-) |
Why not ?
|
We don't have any documentation on twin_strategy, so we have not configured it. If you have a URL for this, I'll take it. |
The Dremio team wrote a wiki page: https://github.com/dremio/dbt-dremio/wiki/Using-Materializations-with-Dremio#optional-twin-strategy-configuration I am currently on office time. I created dbt-dremio in my spare time three years ago; it is now officially maintained by Dremio! Don't tell my manager I still have a double life :-), |
OK, we were recently looking at the Dremio and DBT websites so I wasn't aware of the option. |
Yes, twin_strategy is not an original dbt configuration, it's specific to dremio. |
Situation resolved with the new configuration tested by @alexcotecbq, which is much better. Thanks to all |
models: |
Yes, many thanks @fabrice-etanchaud and @ravjotbrar !!! |
Glad to hear it worked as expected ! |
Did you try to tell dbt to process more than 4 models at the same time (with regards to your DAG) ? https://docs.getdbt.com/reference/dbt-jinja-functions/target |
Testing was done with 8 and 16 threads. There was no performance difference between the two levels. We will do further testing later when we hit the 1,500-object mark in our project. |
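For reference, the thread count discussed above is normally set per target in `profiles.yml`. The following is only a sketch: the profile name `cbq` and target name `qa` are taken from the query comment in this issue, and the connection settings are deliberately left out because they are not shown anywhere in this thread.

```yaml
# profiles.yml (sketch; connection settings intentionally omitted)
cbq:                  # profile name as it appears in the dbt query comment
  target: qa
  outputs:
    qa:
      type: dremio
      threads: 8      # upper bound on models dbt builds concurrently
      # ... Dremio connection settings go here ...
```

The same value can also be overridden for a single run with `dbt run --threads 16`. The DAG still limits how many models can actually run in parallel.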
It seems like we might need to consider removing "clone" as the default twin_strategy option and make it "allow" instead. |
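As an illustration of the option discussed here, a project-wide override in `dbt_project.yml` might look like the sketch below. The project key `cbq` is an assumption (borrowed from the profile name in the query comment; the real project name is not shown in this thread), and the exact semantics of each value should be checked against the wiki page linked above.

```yaml
# dbt_project.yml (sketch; project name "cbq" is assumed, not confirmed)
models:
  cbq:
    # "clone" is described in this thread as the current default;
    # "allow" is the value proposed as a replacement default.
    +twin_strategy: allow
```

The setting can also be narrowed to a subdirectory of models or to a single model through the usual dbt config mechanisms, which may be preferable if only some models need the twin behaviour.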
In a standard database engine, views and tables share the same namespace. In dbt, when a given model changes from table to view, or from view to table, the previous table or view is of course replaced by the new view or table. |
@fabrice-etanchaud : By the way, is it possible to use a sys table with the adapter? No CREATE VDS instruction is passed to Dremio in my case. Here is the code in the .sql file:
And here is the code in the .yml file:
My colleague @Alain-Barrette would like to include administration views in Dremio via DBT. |
Hi @alexcotecbq , great idea ! And what about open sourcing it if you can ! |
Yes I am sure because the path is TravailEquipe.EquipeData and DBT message is:
|
@fabrice-etanchaud : it turned out to be a "code 18" after all. Hooray for Friday! |
Thanks @alexcotecbq, I didn't know that expression! Have a good "fin de semaine" (weekend)! |
Describe the enhancement requested
Deploying our code takes a lot of time. We are currently deploying 1,025 objects in 16 minutes. Much of this is due to:
/* {"app": "dbt", "dbt_version": "1.4.1", "profile_name": "cbq", "target_name": "qa", "node_id": "model.cbq.DeclarationAnnuelleAMFLegacyApplication3JeuxDonneesPolicesEnVigueurFinAnnee"} */
drop table if exists "no_schema"."$scratch"."DeclarationAnnuelleAMFLegacyApplication3JeuxDonneesPolicesEnVigueurFinAnnee"
This drop takes between 1 and 22 seconds here, depending on the load on the server and the thread level we are using.
Since there will never be an object named "no_schema"."$scratch".%, I don't see the need for this.
Is it possible to have this removed?
Justification for this enhancement
Speed increase during deployment.