Add: series migration to be unique #3417

Merged 15 commits on Oct 12, 2024
6 changes: 3 additions & 3 deletions server/migrations/changelog.md
@@ -2,6 +2,6 @@

Please add a record of every database migration that you create to this file. This will help us keep track of changes to the database schema over time.

- | Server Version | Migration Script Name | Description |
- | -------------- | --------------------- | ----------- |
- |                |                       |             |
+ | Server Version | Migration Script Name        | Description                                       |
+ | -------------- | ---------------------------- | ------------------------------------------------- |
+ | v2.13.5        | v2.13.5-series-column-unique | Series must have unique names in the same library |
200 changes: 200 additions & 0 deletions server/migrations/v2.13.5-series-column-unique.js
@@ -0,0 +1,200 @@
const Logger = require('../Logger')

/**
* @typedef MigrationContext
* @property {import('sequelize').QueryInterface} queryInterface - a Sequelize QueryInterface object.
* @property {import('../Logger')} logger - a Logger object.
*
* @typedef MigrationOptions
* @property {MigrationContext} context - an object containing the migration context.
*/

/**
* This upward migration script cleans any duplicate series in the `Series` table and
* adds a unique index on the `name` and `libraryId` columns.
*
* @param {MigrationOptions} options - an object containing the migration context.
* @returns {Promise<void>} - A promise that resolves when the migration is complete.
*/
async function up({ context: { queryInterface, logger } }) {
// Upwards migration script
logger.info('UPGRADE BEGIN: 2.13.5-series-column-unique ')

// The steps taken to deduplicate the series are as follows:
// 1. Find all duplicate series in the `Series` table.
// 2. Iterate over the duplicate series and find all book IDs that are associated with the duplicate series in `bookSeries` table.
// 2.a For each book ID, check if the ID occurs multiple times for the duplicate series.
// 2.b If so, keep only one of the rows that has this bookId and seriesId.
// 3. Update `bookSeries` table to point to the most recent series.
// 4. Delete the older series.

// Use the queryInterface to query the Series table and find duplicates in the `name` and `libraryId` columns
const [duplicates] = await queryInterface.sequelize.query(`
SELECT name, libraryId, MAX(updatedAt) AS latestUpdatedAt, COUNT(name) AS count
FROM Series
GROUP BY name, libraryId
HAVING COUNT(name) > 1
`)

// Print out how many duplicates were found
logger.info(`[2.13.5 migration] Found ${duplicates.length} duplicate series`)

// Iterate over each duplicate series
for (const duplicate of duplicates) {
// Report the series name that is being deleted
logger.info(`[2.13.5 migration] Deduplicating series "${duplicate.name}" in library ${duplicate.libraryId}`)

// Determine any duplicate book IDs in the `bookSeries` table for the same series
const [duplicateBookIds] = await queryInterface.sequelize.query(
`
SELECT bookId, COUNT(bookId) AS count
FROM BookSeries
WHERE seriesId IN (
SELECT id
FROM Series
WHERE name = :name AND libraryId = :libraryId
)
GROUP BY bookId
HAVING COUNT(bookId) > 1
`,
{
replacements: {
name: duplicate.name,
libraryId: duplicate.libraryId
}
}
)

// Iterate over the duplicate book IDs (if any) and keep only the first row for each (bookId, seriesId) pair
for (const { bookId } of duplicateBookIds) {
logger.info(`[2.13.5 migration] Deduplicating bookId ${bookId} in series "${duplicate.name}" of library ${duplicate.libraryId}`)
// Get all rows of `BookSeries` table that have the same `bookId` and `seriesId`. Sort by `sequence` with nulls sorted last
const [duplicateBookSeries] = await queryInterface.sequelize.query(
`
SELECT id
FROM BookSeries
WHERE bookId = :bookId
AND seriesId IN (
SELECT id
FROM Series
WHERE name = :name AND libraryId = :libraryId
)
ORDER BY sequence NULLS LAST
`,
{
replacements: {
bookId,
name: duplicate.name,
libraryId: duplicate.libraryId
}
}
)

// Keep the first row (lowest `sequence`, nulls last); the rest will be deleted
duplicateBookSeries.shift()

// Delete the remaining duplicate rows
if (duplicateBookSeries.length > 0) {
const [deletedBookSeries] = await queryInterface.sequelize.query(
`
DELETE FROM BookSeries
WHERE id IN (:ids)
`,
{
replacements: {
ids: duplicateBookSeries.map((row) => row.id)
}
}
)
}
logger.info(`[2.13.5 migration] Finished cleanup of bookId ${bookId} in series "${duplicate.name}" of library ${duplicate.libraryId}`)
}

// Get the most recent series that matches the `name` and `libraryId`
const [mostRecentSeries] = await queryInterface.sequelize.query(
`
SELECT id
FROM Series
WHERE name = :name AND libraryId = :libraryId
ORDER BY updatedAt DESC
LIMIT 1
`,
{
replacements: {
name: duplicate.name,
libraryId: duplicate.libraryId
},
type: queryInterface.sequelize.QueryTypes.SELECT
}
)

if (mostRecentSeries) {
// Update all BookSeries records for this series to point to the most recent series
const [seriesUpdated] = await queryInterface.sequelize.query(
`
UPDATE BookSeries
SET seriesId = :mostRecentSeriesId
WHERE seriesId IN (
SELECT id
FROM Series
WHERE name = :name AND libraryId = :libraryId
AND id != :mostRecentSeriesId
)
`,
{
replacements: {
name: duplicate.name,
libraryId: duplicate.libraryId,
mostRecentSeriesId: mostRecentSeries.id
}
}
)
Contributor:
I believe there's an issue here that the test doesn't uncover, because the bookSeries table you create there is not set up correctly.

If you examine an actual ABS database, you will see that the bookSeries table has a unique constraint on [bookId, seriesId].

This means that if some book with id book1 points to two series duplicates with ids seriesDup1 and seriesDup2, then before this query there are two bookSeries records [book1, seriesDup1] and [book1, seriesDup2], but after this query there will be [book1, seriesDup2] and [book1, seriesDup2], which violates the unique constraint and will throw an exception.

I don't know if such a situation occurs in user databases, but we have to assume it does.

Your test won't uncover this (even if it has the right test) since it creates the bookSeries table without the unique constraint.

Owner:
I'm not seeing the unique constraint on [bookId, seriesId]

CREATE TABLE bookSeries (
    id        UUID,
    sequence  VARCHAR (255),
    bookId    UUID,
    seriesId  UUID,
    createdAt DATETIME,
    CONSTRAINT BOOKSERIES_PK PRIMARY KEY (
        id
    ),
    CONSTRAINT FK_bookSeries_books FOREIGN KEY (
        bookId
    )
    REFERENCES books (id) ON DELETE CASCADE
                          ON UPDATE CASCADE,
    CONSTRAINT FK_bookSeries_series_2 FOREIGN KEY (
        seriesId
    )
    REFERENCES series (id) ON DELETE CASCADE
                           ON UPDATE CASCADE
);

Contributor Author:

Yeah, that's what I was basing my tables off of. If you look at the actual SQLite database, you can see the table is created as below (the unique constraint at the very end of the line)

CREATE TABLE `bookSeries` (`id` UUID PRIMARY KEY, `sequence` VARCHAR(255), `bookId` UUID REFERENCES `books` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, `seriesId` UUID REFERENCES `series` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, `createdAt` DATETIME, UNIQUE (`bookId`, `seriesId`))
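One way to check which variant a given installation actually has is to read the recorded DDL back out of `sqlite_master` — a sketch; the database path below is an assumption, adjust it to your config directory:

```shell
# Print the DDL SQLite recorded for the bookSeries table.
# /config/absdatabase.sqlite is an assumed path; adjust for your install.
sqlite3 /config/absdatabase.sqlite \
  "SELECT sql FROM sqlite_master WHERE type='table' AND name='bookSeries';"
```

If the printed statement ends with a `UNIQUE` clause on bookId and seriesId, the composite constraint is present; if it matches the constraint-free DDL shown earlier, it is not.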

Contributor:

@advplyr where is the statement above coming from?

Contributor Author:

Probably here?

static init(sequelize) {
  super.init(
    {
      id: {
        type: DataTypes.UUID,
        defaultValue: DataTypes.UUIDV4,
        primaryKey: true
      },
      sequence: DataTypes.STRING
    },
    {
      sequelize,
      modelName: 'bookSeries',
      timestamps: true,
      updatedAt: false
    }
  )
  // Super Many-to-Many
  // ref: https://sequelize.org/docs/v6/advanced-association-concepts/advanced-many-to-many/#the-best-of-both-worlds-the-super-many-to-many-relationship
  const { book, series } = sequelize.models
  book.belongsToMany(series, { through: BookSeries })
  series.belongsToMany(book, { through: BookSeries })
  book.hasMany(BookSeries, {
    onDelete: 'CASCADE'
  })
  BookSeries.belongsTo(book)
  series.hasMany(BookSeries, {
    onDelete: 'CASCADE'
  })
  BookSeries.belongsTo(series)
}

Contributor:

Ah, that in itself indicates that databases started at different points in time are not consistent (sigh)...
This requires another migration...

Contributor Author (@nichwall, Sep 18, 2024):

Does the migration manager support multiple migration scripts for the same release? Or, do I need to change something else in this migration script? I'm a little confused.

Contributor:

No, multiple migration scripts in the same version aren't supported.

To be clear, I am not requiring any additional changes in your migration. I was just saying that the fact that the existing codebase is running databases with different schemas is worrying, but I wasn't suggesting that we do anything immediate about it.

Owner:

I'm not sure how that could have happened since that relationship has always been setup like that since the first release of that table. It could be that I'm the only one with that difference and it happened during the initial development.

Contributor:

I see, thanks!


// Delete the older series
const seriesDeleted = await queryInterface.sequelize.query(
`
DELETE FROM Series
WHERE name = :name AND libraryId = :libraryId
AND id != :mostRecentSeriesId
`,
{
replacements: {
name: duplicate.name,
libraryId: duplicate.libraryId,
mostRecentSeriesId: mostRecentSeries.id
}
}
)
}
}

logger.info(`[2.13.5 migration] Deduplication complete`)

// Create a unique index based on the name and library ID for the `Series` table
await queryInterface.addIndex('Series', ['name', 'libraryId'], {
unique: true,
name: 'unique_series_name_per_library'
})
logger.info('Added unique index on Series.name and Series.libraryId')

logger.info('UPGRADE END: 2.13.5-series-column-unique ')
}

/**
* This removes the unique index on the `Series` table.
*
* @param {MigrationOptions} options - an object containing the migration context.
* @returns {Promise<void>} - A promise that resolves when the migration is complete.
*/
async function down({ context: { queryInterface, logger } }) {
// Downward migration script
logger.info('DOWNGRADE BEGIN: 2.13.5-series-column-unique ')

// Remove the unique index
await queryInterface.removeIndex('Series', 'unique_series_name_per_library')
logger.info('Removed unique index on Series.name and Series.libraryId')

logger.info('DOWNGRADE END: 2.13.5-series-column-unique ')
}

module.exports = { up, down }
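The migration's overall flow — group series by (name, libraryId), keep the most recently updated row in each duplicate group, repoint book links, and drop the rest — can be sketched over in-memory data. This is a simplified illustration of the steps, not the Sequelize code above; the sample rows are hypothetical:

```javascript
// Hypothetical series rows with a duplicate pair in the same library.
const series = [
  { id: 's1', name: 'Dune', libraryId: 'lib1', updatedAt: 1 },
  { id: 's2', name: 'Dune', libraryId: 'lib1', updatedAt: 5 },
  { id: 's3', name: 'Dune', libraryId: 'lib2', updatedAt: 3 }
]
const bookSeries = [
  { id: 'bs1', bookId: 'b1', seriesId: 's1' },
  { id: 'bs2', bookId: 'b2', seriesId: 's2' }
]

// 1. Group by (name, libraryId), mirroring the GROUP BY ... HAVING COUNT > 1 query.
const groups = new Map()
for (const s of series) {
  const key = `${s.name}|${s.libraryId}`
  if (!groups.has(key)) groups.set(key, [])
  groups.get(key).push(s)
}

const keepIds = new Set(series.map((s) => s.id))
for (const rows of groups.values()) {
  if (rows.length < 2) continue
  // 2. Keep the most recently updated series in each duplicate group.
  const survivor = rows.reduce((a, b) => (a.updatedAt >= b.updatedAt ? a : b))
  for (const row of rows) {
    if (row.id === survivor.id) continue
    keepIds.delete(row.id)
    // 3. Repoint book links from the doomed series to the survivor.
    for (const bs of bookSeries) {
      if (bs.seriesId === row.id) bs.seriesId = survivor.id
    }
  }
}

// 4. "Delete" the older series; only s2 and s3 remain.
const deduped = series.filter((s) => keepIds.has(s.id))
```

Note that this sketch, like the repointing UPDATE in the migration, does not collapse duplicate (bookId, seriesId) pairs — which is exactly why the migration deduplicates BookSeries rows first.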
6 changes: 6 additions & 0 deletions server/models/Series.js
@@ -84,6 +84,12 @@ class Series extends Model {
// collate: 'NOCASE'
// }]
// },
{
// unique constraint on name and libraryId
fields: ['name', 'libraryId'],
unique: true,
name: 'unique_series_name_per_library'
},
{
fields: ['libraryId']
}