
BulkInsert issue with changing database schema #72

Open
riedd2 opened this issue Mar 23, 2022 · 11 comments
riedd2 commented Mar 23, 2022

Hey all,

I'm using BulkInsertAsync in parallel against tables whose schemas differ or change over time.
I run into problems because the framework caches the table schema. I found the following related issue: #24

Unfortunately, the static `InformationSchemaManager.ClearInformationSchemaTable();` workaround does not work in our scenario.

I would like to implement a custom schema cache to prevent / control the schema caching. However, the approach mentioned in the linked ticket is no longer available.

Can you advise me in this scenario?

Thanks for the help.
Kind Regards
David

@JonathanMagnan JonathanMagnan self-assigned this Mar 24, 2022
@JonathanMagnan (Member)

Hello @riedd2 ,

How many schemas do you have?

If you only have two or three schemas, one way is to create a context that inherits from the main context for each schema. Every context/schema pair is then unique, which makes everything easier.
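As a rough sketch of that suggestion, assuming EF Core's `ToTable(name, schema)` mapping (the schema names and the `Order` entity below are made up for illustration):

```csharp
using Microsoft.EntityFrameworkCore;

// One derived context per schema: each context type maps the same
// entities into a different database schema, so every context/schema
// pair is unique. "Sales", "Archive", and Order are illustrative only.
public class MainContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected virtual string Schema => "dbo";

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map every entity into this context's schema.
        modelBuilder.Entity<Order>().ToTable("Order", Schema);
    }
}

public class SalesContext : MainContext
{
    protected override string Schema => "Sales";
}

public class ArchiveContext : MainContext
{
    protected override string Schema => "Archive";
}
```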

If it doesn't work for you, let me know more about your current scenario and we will look at what we can do.

Best Regards,

Jon

@riedd2 (Author)

riedd2 commented Mar 24, 2022

Hey @JonathanMagnan

Thanks for your reply.

I'm creating tables based on JSON schemas (we have a lot of them).
These schemas change due to natural schema evolution, and we have scenarios where we run bulk operations against different versions of the same schema in parallel (e.g. testing). With the schema being cached, this causes issues, and I would like to be able to control the cache in such a scenario.

@JonathanMagnan (Member)

Thank you @riedd2 ,

Before we start to see if we can find a solution, we have one last question:

Are you using purely the Bulk Operations library? Or are you using EF6 or EF Core?

@riedd2 (Author)

riedd2 commented Mar 24, 2022

I'm sorry, I should have mentioned that from the beginning.
We are using Z.BulkOperations (v3.0.52) for inserting data and EF Core for reading (queries are generated based on the schema).

I hope this answers your question.

@JonathanMagnan (Member)

Hello @riedd2 ,

I did a follow-up with my developer today, as we are late on this request, and we are currently not sure how to handle this scenario. We probably don't fully understand what you mean by running multiple bulk operations in parallel on different versions of the schema (do you mean a table here, or really a schema such as dbo?).

The current schema/table information is "cached" in a ConcurrentDictionary.

If we make this property public:

```csharp
public class InformationSchemaManager
{
	// will become public
	internal static ConcurrentDictionary<string, Table> MemoryTable = new ConcurrentDictionary<string, Table>();
}
```

Will it be enough for you as you will be able to set your own implementation of the ConcurrentDictionary and handle it the way you want?

@riedd2 (Author)

riedd2 commented Apr 6, 2022

Hey @JonathanMagnan,

Thanks for your response.
I'll try to make a simplified example:

Scenario 1

  • We create table "example" in an in-memory database
  • We run bulk operation (insert) against it
  • We select the data to verify it (Select * from example)

Scenario 2

  • We create table "example" (this time with an additional column) in an in-memory database
  • We run bulk operation (insert) against it
  • We select the data to verify it (Select * from example)

These scenarios run in parallel on different in-memory databases, and (if I understand correctly) BulkOperations will cache the schema of table "example" from whichever scenario runs first. If scenario 1 runs first, the bulk insert in scenario 2 will ignore the additional column, since the schema/table information is still cached from scenario 1.
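The race described above can be sketched roughly like this. Note this is an assumption-heavy illustration: the `BulkOperation` members used (`DestinationTableName`, `BulkInsert`) and the connection handling are my best guess at the Z.BulkOperations usage discussed in this thread, and the column names are made up:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;
using Z.BulkOperations;

// Two test databases, same table name "example", different columns.
// Whichever insert runs first populates the shared schema cache; the
// second insert then sees the first table's column list.
void RunScenario(string connectionString, DataTable data)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    var bulk = new BulkOperation(connection)
    {
        DestinationTableName = "example"
    };

    // Scenario 1: data has columns (Id, Name).
    // Scenario 2: data has columns (Id, Name, Extra) -- but if
    // scenario 1 ran first, the cached schema for "example" lacks
    // "Extra", so the additional column is silently ignored.
    bulk.BulkInsert(data);
}
```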

This issue bubbled up in our tests surrounding database (schema) evolution, e.g. testing logic against previous version as well as the current one. But this could also happen in a production scenario.

I think with your proposed solution we should be able to address the issue in our case.

Thanks for the help.

@JonathanMagnan (Member)

Oh thank you, now everything makes sense given that you are using an in-memory database.

@riedd2 (Author)

riedd2 commented Apr 13, 2022

Yes, the issue can occur quite easily in the test scenario using in-memory databases.
This is, however, also something we might encounter in a production scenario, since we run multiple replicas of the code which inserts the data. If the schema then changes, it would be nice to be able to reset or control the schema cache for the bulk operation to prevent the issue mentioned above.

@JonathanMagnan (Member)

Hello @riedd2 ,

Unfortunately, the idea to make the dictionary public has not been accepted.

However, my developer added the option DisableInformationSchemaCache=true

You can disable the cache per operation or globally:

```csharp
BulkOperationManager.BulkOperationBuilder = builder => { builder.DisableInformationSchemaCache = true; };
```

So whenever you don't want to use the cache, you can now disable it.
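For the per-operation variant, I would expect something along these lines. Only the `DisableInformationSchemaCache` option itself comes from this thread; the surrounding `BulkOperation` usage and names are assumptions for illustration:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;
using Z.BulkOperations;

// Skip the information-schema cache for this one insert, leaving the
// global caching behaviour unchanged for all other operations.
using var connection = new SqlConnection(connectionString);
connection.Open();

var bulk = new BulkOperation(connection)
{
    DestinationTableName = "example",
    DisableInformationSchemaCache = true  // option added in this thread
};
bulk.BulkInsert(dataTable);
```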

Could this new option work for your scenario?

Best Regards,

Jon

@riedd2 (Author)

riedd2 commented May 5, 2022

Hey @JonathanMagnan

Sorry for the late reply.

In general, the proposed solution should address the issue, assuming `DisableInformationSchemaCache=true` only causes the operation to ignore the cache.

With the option above, we would need to disable the cache globally, or at least once the database structure changes for the first time. This means we lose the benefit the cache provides; I'm not sure how much of an impact that will have.

If we cannot implement / access the cache, would it be possible to have an option to just clear the cache on demand? We could do this once we rebuild the database and let operation start caching again with the new schema.

Thank you for your help.
Cheers
David

@JonathanMagnan (Member)

Hello @riedd2 ,

To clear the cache, you have the method `InformationSchemaManager.ClearInformationSchemaTable();`, but you already said in your first message that it does not work for your scenario.
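The "clear on demand" pattern David asked about in the previous comment would look roughly like this, using the `ClearInformationSchemaTable` method Jon mentions. The `RebuildDatabase` helper is hypothetical application-side code, not part of the library:

```csharp
using Z.BulkOperations;

// Clear the schema cache only at the moment the database structure
// changes; subsequent bulk operations then re-read and re-cache the
// new schema, keeping the cache's benefit the rest of the time.
void OnSchemaChanged()
{
    RebuildDatabase();                                      // hypothetical helper
    InformationSchemaManager.ClearInformationSchemaTable(); // from this thread
}
```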
