-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add: series migration to be unique #3417
Conversation
Well, looks like I missed something. I need to step away for a bit but can try reviewing the error later tonight. |
Oh, right. It may be failing because it depends on #3416? The copy to |
This might be a separate issue. I see the integration test is failing as well due to some pkg runtime failure. I'll investigate. |
OK, try patching the change I made in 21c77dc. This will fix the integration test failure. |
// Update all BookSeries records for this series to point to the most recent series | ||
const [seriesUpdated] = await queryInterface.sequelize.query( | ||
` | ||
UPDATE BookSeries | ||
SET seriesId = :mostRecentSeriesId | ||
WHERE seriesId IN ( | ||
SELECT id | ||
FROM Series | ||
WHERE name = :name AND libraryId = :libraryId | ||
AND id != :mostRecentSeriesId | ||
) | ||
`, | ||
{ | ||
replacements: { | ||
name: duplicate.name, | ||
libraryId: duplicate.libraryId, | ||
mostRecentSeriesId: mostRecentSeries.id | ||
} | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's an issue here that the test doesn't uncover because the bookSeries table you create there is not set-up correctly.
If you examine an actual ABS database, you will see that the bookSeries table has a unique contraint on [bookId, seriesId].
This means that if some book with id book1
points to two series duplicates with ids seriesDup1
and seriesDup2
, then before this query there are two bookSeries records [book1, seriesDup1]
and [book1, seriesDup2]
, but after this query there will be [book1, seriesDup2]
and [book1, seriesDup2]
, which violates the unique constraint and will throw an exception.
I don't know if such a situation occurs in user databases, but we have to assume it does.
Your test won't uncover this (even if it has the right test) since it creates the bookSeries table without the unique constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing the unique constraint on [bookId, seriesId]
CREATE TABLE bookSeries (
id UUID,
sequence VARCHAR (255),
bookId UUID,
seriesId UUID,
createdAt DATETIME,
CONSTRAINT BOOKSERIES_PK PRIMARY KEY (
id
),
CONSTRAINT FK_bookSeries_books FOREIGN KEY (
bookId
)
REFERENCES books (id) ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT FK_bookSeries_series_2 FOREIGN KEY (
seriesId
)
REFERENCES series (id) ON DELETE CASCADE
ON UPDATE CASCADE
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I was basing my tables off of. If you look at the actual SQLite database, you can see the table is created as below (the unique constraint at the very end of the line)
CREATE TABLE `bookSeries` (`id` UUID PRIMARY KEY, `sequence` VARCHAR(255), `bookId` UUID REFERENCES `books` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, `seriesId` UUID REFERENCES `series` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, `createdAt` DATETIME, UNIQUE (`bookId`, `seriesId`))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@advplyr where is the statement above coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably here?
audiobookshelf/server/models/BookSeries.js
Lines 32 to 66 in b35fabb
static init(sequelize) { | |
super.init( | |
{ | |
id: { | |
type: DataTypes.UUID, | |
defaultValue: DataTypes.UUIDV4, | |
primaryKey: true | |
}, | |
sequence: DataTypes.STRING | |
}, | |
{ | |
sequelize, | |
modelName: 'bookSeries', | |
timestamps: true, | |
updatedAt: false | |
} | |
) | |
// Super Many-to-Many | |
// ref: https://sequelize.org/docs/v6/advanced-association-concepts/advanced-many-to-many/#the-best-of-both-worlds-the-super-many-to-many-relationship | |
const { book, series } = sequelize.models | |
book.belongsToMany(series, { through: BookSeries }) | |
series.belongsToMany(book, { through: BookSeries }) | |
book.hasMany(BookSeries, { | |
onDelete: 'CASCADE' | |
}) | |
BookSeries.belongsTo(book) | |
series.hasMany(BookSeries, { | |
onDelete: 'CASCADE' | |
}) | |
BookSeries.belongsTo(series) | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that in itself indicates that databases started at different points in time are not consistent (sigh)...
This requires another migration...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the migration manager support multiple migration scripts for the same release? Or, do I need to change something else in this migration script? I'm a little confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, mutliple migrations scripts in the same version aren't supported.
To be clear, I am not requiring any additional changes in your migration. I was just saying that fact that we have the existing codebase running databases with different schemas is worrying, but wasn't suggesting that we do anything immediate about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how that could have happened since that relationship has always been setup like that since the first release of that table. It could be that I'm the only one with that difference and it happened during the initial development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
A couple of ideas regarding setting up the unit test tables correctly:
|
Thanks, I'll update my tests to include the rest of the indexes/constraints for the two tables. Is how I'm doing the I like the idea of working off of the database created by the server, but I think that will not work as well for migrations. For this migration, I also updated the model in I would love a test database created by the server that is copied and then used by tests as a way to automate API tests or other e2e tests, though. I had tried playing with that shortly after you added the test framework, but never figured it out. |
Oh, I see what you mean, you're saying have several test databases made beforehand (possibly with some test data)? |
This seems like a good idea but I'm not as familiar with building out unit tests like this. That size is negligible |
In response to mikiher's comments about missing the unique constraint for bookId and seriesId, how do we want to handle combining the book into the same series (with different sequence values)? Should I sort the |
No, not necessarily several database, just one for each migration script you write, that is either empty or contains a small amount of pre-migration dummy data (that was added using ABS). Once you have such a database you:
These tests will (maybe) run a little slower than standard tests because of all the file system operations, but I don't think this will be a huge overhead. If you really care about test performance, we can potentially optimize this: instead of using a file-based temp database, you can:
|
IMO you should not concatenate - the squence column in bookSeries is not meant to have multiple values. You should have a (simple) algorithm that decides which one to keep, just like with the series table. |
Regarding testing with a snapshot of an ABS database, there's even an easier way: you can dump a database as a list of SQL statements. So for example, I just dumped an empty ABS database export.sql using SQLiteStudio (or you can do it with sqlite3 CLI). It contains all the sql queries needed to bring the database into its current state (including table creation, value insertion, index creation, etc.) The dump weighs just about 13kb. You can put it in
And voila - you have an in-memory database identical to the original. Again, if the database is large, this can be expensive, but for small empty/near-empty databases, this will likely take very little time. And of course the added value is that it's very simple to implement. |
Of course the test database would need to be copied/created from a pre-migration database (that's the whole point running the tests, isn't it?). I'm not sure I see why that would be an issue, though. |
Yes, thanks for clarifying. I realized what you meant after I sent that message. |
I have added 3 new test cases to address your initial comments about the same book being able to exist in multiple series. Before removing duplicate series in the Should I change how the database is initialized in the tests to be the in memory from an old ABS database before this PR is merged? Thanks for your patience while I figure this out. |
@nichwall I'm still looking at it. I didn't have much time today, will continue tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an SQL programmer would not have written it like this (they would use a fixed number of bulk update/delete queries with inner joins instead of all these loops), but I don't see any additional logic issues.
Other than asking that you test it on a few actual large databases (including ones that have duplicate series), and fixing the little nitpicks above, I approve.
// Update all BookSeries records for this series to point to the most recent series | ||
const [seriesUpdated] = await queryInterface.sequelize.query( | ||
` | ||
UPDATE BookSeries | ||
SET seriesId = :mostRecentSeriesId | ||
WHERE seriesId IN ( | ||
SELECT id | ||
FROM Series | ||
WHERE name = :name AND libraryId = :libraryId | ||
AND id != :mostRecentSeriesId | ||
) | ||
`, | ||
{ | ||
replacements: { | ||
name: duplicate.name, | ||
libraryId: duplicate.libraryId, | ||
mostRecentSeriesId: mostRecentSeries.id | ||
} | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very odd. This is what I get in my DDL tab for bookSeries:
CREATE TABLE bookSeries (
id UUID PRIMARY KEY,
sequence VARCHAR (255),
createdAt DATETIME NOT NULL,
bookId UUID REFERENCES books (id) ON DELETE CASCADE
ON UPDATE CASCADE,
seriesId UUID REFERENCES series (id) ON DELETE CASCADE
ON UPDATE CASCADE,
UNIQUE (
bookId,
seriesId
)
);
Perhaps your database is very old, from before the implicit constraints were introduced? The SQL code looks quite different. In any case, as I mnetioned above, the documentation clearly says that the table is implicitly created with the composite unique constraint becasue of the many-to-many relation betweeb books and series (bookAuthors is similar in that regard).
Thanks for taking the time to review. I'll try and get those comments taken care of this evening. And you're correct, I am not a SQL person, but I figure an inefficient loop will be good enough for a one-time migration. |
This is working great in my testing. I came across one unrelated issue when testing the migration failing. I had the db file open in DBeaver so the file was locked when the logic to restore backup was happening. That was throwing an exception inside the catch block when trying to move the file. I'm not sure if anything can be done there but I wanted to point that out incase either of you had thoughts. Thanks for all the effort put into this first migration. I'm not sure yet if this should be merged before v2.14.0 is ready. Sequelize will create the index automatically without a migration and the migration is idempotent so it won't break anything if it is merged early. |
In response to @mikiher's comment about database migrations from previous versions, I checked out every release from server version 2.3.0 through 2.13.4 (latest at the time of writing), started the server, dumped the database to an SQL file, and then moved on to the next server version. I then ran a diff on all of the databases newly created to see when changes were implemented. Note that this does not include migrations from old databases, this is just how ABS created the database in each of these server versions. There were 7 files with differences, summarized below. It took a bit because work and life got really busy this past week. Source file schema_v2.3.3.sql to schema_v2.3.4.sql had differencesThis is the migration which stopped loading everything from the database into memory.
Source file schema_v2.4.0.sql to schema_v2.4.1.sql had differences
Source file schema_v2.4.1.sql to schema_v2.4.2.sql had differences
Source file schema_v2.4.3.sql to schema_v2.4.4.sql had differences
Source file schema_v2.7.0.sql to schema_v2.7.1.sql had differences
Source file schema_v2.7.2.sql to schema_v2.8.0.sql had differences
Source file schema_v2.10.1.sql to schema_v2.11.0.sql had differences
|
Are you meaning server version 2.14.0 is released, and then merging this PR, and if so that the unique index is included in 2.14.0 without the migration? If so, I don't think we should add the index without the migration, because anyone affected by the duplicate series issue will have errors when updating. I could be mixing things up, though. |
No, we definitely merge this before releasing v2.14.0 |
Thanks! |
This PR resolves #3207 and is a continuation of #3326.
A migration script is added for
2.13.5
, so it will work for both a minor and patch release (not sure what the plans are for the next release). I added some unit tests for this migration, but also tested it manually before writing the unit tests. I also updated theSeries
model to have an index for series name and libraryId, because series can have the same name in different libraries.I am not sure if I did the QueryInterface parts of the migration script correctly, so if y'all can check it that would be great. Basically, I am using raw SQL instead of Sequelize, so I'm not sure if this is the correct way to do things.