Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize system_files table performance #216

Closed
wants to merge 5 commits into from
Closed

Optimize system_files table performance #216

wants to merge 5 commits into from

Conversation

ericp-mrel
Copy link
Contributor

This is in reference to this discussion: #208

This PR will drop the single column indexes and convert them into a composite index for improved performance and convert the attachment_id column to bigint if the database column only contains numeric values.

Feel free to suggest any changes/improvements because this is my first attempt at this type of a change and want this change to be as safe as possible.

@ericp-mrel
Copy link
Contributor Author

Wasn't sure if there was really a way to write tests to be able to seed some test data before running the migration.
If there is, I'd be happy to try and implement some tests to make sure this goes as smoothly as possible.

@mjauvin
Copy link
Member

mjauvin commented Jun 25, 2021

@bennothommo
Copy link
Member

@ericp-mrel does the composite index make that much difference? I was under the impression that there wouldn't be much difference between a composite index, or the separate indexes. I figured that only changing the VARCHAR to an integer column would suffice.

@bennothommo bennothommo added needs response Issues/PRs where a maintainer is awaiting a response from the submitter maintenance PRs that fix bugs, are translation changes or make only minor changes labels Jun 26, 2021
@bennothommo bennothommo added this to the v1.2.0 milestone Jun 26, 2021
@ericp-mrel
Copy link
Contributor Author

@bennothommo In my limited testing it didn't make a ton of difference, but changing VARCHAR to int obviously made the biggest difference. I can remove the index changes if you'd rather not switch to a composite one.

@bennothommo
Copy link
Member

@ericp-mrel let's do that, if you don't mind. Just keeps the potential points of failure to a minimum.

@bennothommo bennothommo added Status: Revision Needed and removed needs response Issues/PRs where a maintainer is awaiting a response from the submitter labels Jun 27, 2021
@LukeTowers
Copy link
Member

@bennothommo I'm fine with changing the index as well, it did result in a performance increase prior to the id column type change so theoretically it will still help even with the varchar to bigint change. @ericp-mrel perhaps you could do a performance test for the difference between the index changed and unchanged (both using bigint attachment_id columns)?

@LukeTowers
Copy link
Member

Also, it looks like my other comment on the subject never posted, but is there a way we can use a DB query to determine if the columns can be changed rather than doing that in PHP, especially with the is_numeric() function which would return true for some forms of data that can't be reliably converted

@bennothommo
Copy link
Member

bennothommo commented Jun 29, 2021

@LukeTowers I had this problem a couple of years ago with a project - my understanding at the time was that the different database server types had different ways of handling this, so there might not be a query that can be accurately run across all our supported types.

That being said, things might've changed in the last couple of years since. Perhaps a regular expression?

@ericp-mrel
Copy link
Contributor Author

I tried to find out if there was a way to run a query to determine if there are non-int values in a column, but I couldn't really find anything. I then thought about if a single query could even work for all database engines, because I know some databases handle types differently and I don't have any experience with Postgres or MSSQL and I wanted to make this as reliable as possible which is why I settled on handling this in PHP.

I am open to handling this with a SQL query if it is possible and if anyone knows of a reliable way to do this across all database engines.

@RomainMazB
Copy link
Contributor

RomainMazB commented Jun 30, 2021

Sidenote: I didn't noticed that Sam rollbacked my previous PR that fixed the keyType on PostgreSQL installations: octobercms/library#502 (before that we saved the keys as integers into a varchar column without typecasting them as strings, PostgreSQL is more strict about types)

I didn't tested yet but this PR should fix the bug Sam re-introduced as we now will be more consistent about the keyType and value which will be both as integers from the model to the database.

@RomainMazB
Copy link
Contributor

RomainMazB commented Jul 20, 2021

That's my feeling too, as said in one on my previous comment's edit here, the more we was digging this, the more complicated the query was, the more I felt sad to use SQL to solve this.

But @LukeTowers was worried about is_numeric method which could generate false truthy here.

Now that we have both solution you are free to chose.

@bennothommo
Copy link
Member

@RomainMazB I'd suggested a fix here (#216 (comment)) which should resolve @LukeTowers concerns. Everything would need to be an integer to pass that check.

@github-actions
Copy link

This pull request will be closed and archived in 3 days, as there has been no activity in the last 60 days.
If this is still being worked on, please respond and we will re-open this pull request.
If this pull request is critical to your business, consider joining the Premium Support Program where a Service Level Agreement is offered.

@github-actions github-actions bot added the stale Issues/PRs that have had no activity and may be archived label Sep 19, 2021
@LukeTowers LukeTowers removed the stale Issues/PRs that have had no activity and may be archived label Sep 19, 2021
@github-actions
Copy link

This pull request will be closed and archived in 3 days, as there has been no activity in the last 60 days.
If this is still being worked on, please respond and we will re-open this pull request.
If this pull request is critical to your business, consider joining the Premium Support Program where a Service Level Agreement is offered.

@github-actions github-actions bot added the stale Issues/PRs that have had no activity and may be archived label Nov 19, 2021
@LukeTowers LukeTowers removed the stale Issues/PRs that have had no activity and may be archived label Nov 19, 2021
@github-actions
Copy link

This pull request will be closed and archived in 3 days, as there has been no activity in the last 60 days.
If this is still being worked on, please respond and we will re-open this pull request.
If this pull request is critical to your business, consider joining the Premium Support Program where a Service Level Agreement is offered.

@github-actions github-actions bot added the stale Issues/PRs that have had no activity and may be archived label Jan 19, 2022
@LukeTowers LukeTowers removed the stale Issues/PRs that have had no activity and may be archived label Jan 19, 2022
@github-actions
Copy link

This pull request will be closed and archived in 3 days, as there has been no activity in the last 60 days.
If this is still being worked on, please respond and we will re-open this pull request.
If this pull request is critical to your business, consider joining the Premium Support Program where a Service Level Agreement is offered.

@github-actions github-actions bot added the stale Issues/PRs that have had no activity and may be archived label Mar 21, 2022
@bennothommo bennothommo removed the stale Issues/PRs that have had no activity and may be archived label Mar 21, 2022
@github-actions
Copy link

This pull request will be closed and archived in 3 days, as there has been no activity in the last 60 days.
If this is still being worked on, please respond and we will re-open this pull request.
If this pull request is critical to your business, consider joining the Premium Support Program where a Service Level Agreement is offered.

@github-actions github-actions bot added the stale Issues/PRs that have had no activity and may be archived label May 21, 2022
@LukeTowers LukeTowers removed the stale Issues/PRs that have had no activity and may be archived label May 21, 2022
@mjauvin
Copy link
Member

mjauvin commented Jul 3, 2022

I would definitely delay this to 1.2.1 personally.

@LukeTowers LukeTowers modified the milestones: v1.2.0, v1.2.1 Jul 4, 2022
@bennothommo bennothommo modified the milestones: v1.2.1, v1.2.2 Sep 12, 2022
@ericp-mrel ericp-mrel closed this by deleting the head repository Feb 20, 2023
@LukeTowers
Copy link
Member

@bennothommo what was left on this to get it merged?

@bennothommo
Copy link
Member

@LukeTowers sorry for the delay on getting back to you on this one.

On reviewing it, I think this PR has kinda gone in two directions. Originally, it was to change the indexes in the system_files table to a composite index for apparently better performance. I'm not too keen on this as I'm not sure of its effect on other database types and I haven't seen any hard numbers to determine its performance benefit.

However, there is also a fix to change the attachment IDs to a bigint column if the table only contains numbers. I think this is needed because some strict SQL servers (like PostgreSQL or MySQL with some SQL modes in play) will type the column as a string column and will reject integer values outright, as opposed to coercing them to strings. This part of the PR I believe is needed to fix the linked issues.

@LukeTowers
Copy link
Member

@bennothommo are you interested in resubmitting the relevant part of the PR that you want to keep?

As an aside, @ericp-mrel did provide hard performance numbers here: #208 (reply in thread) and they were fairly convincing in favour of this PR in its entirety.

@bennothommo
Copy link
Member

Fair enough, so it does have a performance benefit (for MySQL), but we'd need to see if that's the same with the other database types too.

I can certainly make a PR for the bigint change.

@mjauvin mjauvin removed this from the v1.2.2 milestone May 22, 2023
@LukeTowers
Copy link
Member

Ping @bennothommo 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance PRs that fix bugs, are translation changes or make only minor changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to migrate OctoberCMS on Postgres database
5 participants