Interesting, yes, that seems very similar. It makes me wonder whether there is a consistency issue in the Glue API, where a delete on an existing table can fail with "table not found" if it happens too soon after the table is created. I can't find anything in the docs specific to that, though.
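If that hypothesis holds, one possible mitigation is to confirm the table is actually gone rather than trusting a single delete call. The sketch below is only an illustration of that idea: `delete_table` and `table_exists` are hypothetical callables standing in for catalog calls (they are not awswrangler or boto3 functions), `LookupError` stands in for a "table not found" response, and it assumes the existence check is not affected by the same staleness.

```python
import time
from typing import Callable


def delete_until_gone(
    delete_table: Callable[[], None],
    table_exists: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 0.5,
) -> bool:
    """Retry a delete until the table is confirmed gone.

    Both callables are hypothetical stand-ins for catalog calls.
    Returns True once table_exists() reports False, else False.
    """
    for attempt in range(attempts):
        try:
            delete_table()
        except LookupError:
            # Stand-in for a "table not found" response. Under the
            # hypothesis above this may be a stale answer, so it is
            # not treated as proof that the table is gone.
            pass
        if not table_exists():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A caller would wrap its real delete and lookup calls in the two callables and log a warning when the function returns `False`, so that leftover tables are at least visible.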
Describe the bug
When concurrent processes attempt to update the same Iceberg table and one hits an ICEBERG_COMMIT_ERROR, sometimes the temp_table created by delete_from_iceberg_table fails to get cleaned up. Reading through the source code, I don't see a clear path for how this would be possible, since the code seems to correctly catch the exception and clean up in `finally`, but nonetheless the temp tables are getting left behind somehow.

I have not been able to create a reliable minimal code sample that reproduces this behavior consistently, but in production we occasionally hit commit errors and accumulate temp tables in the Glue catalog as a result:
How to Reproduce
Broadly:
Have two processes call to_iceberg on the same catalog table simultaneously, with write params similar to:
If a commit error occurs, sometimes a temp table is left behind.
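A minimal harness for the "simultaneous calls" part might look like the sketch below. It is not a confirmed reproduction: it uses threads rather than separate processes for brevity, and `write_fn` is a hypothetical callable in which you would place your own to_iceberg call with your own write parameters (the original parameters are not reproduced here).

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_simultaneously(
    write_fn: Callable[[int], None], workers: int = 2
) -> List[Optional[BaseException]]:
    """Start `workers` calls to write_fn at the same moment.

    write_fn is a hypothetical stand-in for the real write call.
    Returns one entry per worker: None on success, or the exception raised.
    """
    barrier = threading.Barrier(workers)

    def _run(worker_id: int) -> Optional[BaseException]:
        barrier.wait()  # hold every worker until all are ready
        try:
            write_fn(worker_id)
            return None
        except Exception as exc:
            # Collect the error instead of raising so that every
            # worker's outcome can be inspected afterwards.
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_run, range(workers)))
```

After each run, any returned exception can be checked for a commit error, and the catalog can be inspected for leftover temp tables.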
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.11
AWS SDK for pandas version
3.7.3
Additional context
No response