Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36563 - Remove orphaned content unit cleanup from index_content #10638

Merged
merged 2 commits into from
Jul 12, 2023

Conversation

sjha4
Copy link
Member

@sjha4 sjha4 commented Jul 7, 2023

What are the changes introduced in this pull request?

Move remove content unit orphan deletionout of index_content to avoid increasing indexing times which impacts both syncs and publishes. Instead orphan deletion only happens during OrphanCleanup now.

Considerations taken when implementing this change?

What are the testing steps for this pull request?

Sync a repo. You should see the task: "Remove orphaned content units" run after completion of sync or publish CV actions asynchronously.
0. Check Katello::Rpm.count in foreman-rake console.

  1. Create a lot of orphaned content. You can create these by running below in foreman-rake console:
i = 1
while(i<50000)
f = Katello::Rpm.new(name: i, pulp_id: i)
f.save!
i +=1
end
  1. Check Katello::Rpm.count in console.
  2. Complete sync any small repo. It shouldn't take ridiculously long. You can compare this against earlier snaps.
  3. In foreman-rake console,
    Katello::Rpm.count should stay the same.
  4. Run bundle exec rails katello:delete_orphaned_content . Go to foreman task in UI and wait for it to complete.
  5. Check Katello::Rpm.count again. The orphaned dummy rpm records should get cleaned up..

If you want to use real data, sync several large repos, publish them into a few CVs and create several versions.
That should create enough data to notice slow IndexContent action before this change.

@theforeman-bot
Copy link

Issues: #36563

@sjha4 sjha4 force-pushed the sync_repo_optimization branch 2 times, most recently from a6eb979 to 06f9cd6 Compare July 7, 2023 18:02
Copy link
Contributor

@parthaa parthaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work ok . Checked sync and saw the new action

@sjha4 sjha4 force-pushed the sync_repo_optimization branch 3 times, most recently from 9dd0f85 to 70fc5ba Compare July 10, 2023 04:15
Copy link
Contributor

@parthaa parthaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the remove orphan calls. Behavior looked good. ACK

@@ -31,6 +31,10 @@ def run
output[:new_content][content_type.label] = new_count - initial_counts[content_type.label]
end
end

def finalize
ForemanTasks.async_task(OrphanCleanup::RemoveOrphanedContentUnits, {repo_id: input[:id]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm questioning now if we even need to destroy orphaned units after syncing.

For the orphaned filter rule problem, the solution is this:

def clean_filter_rules(repo_associations_to_destroy)

Otherwise, I don't think orphaned content units should cause any other problems.

Plus, if a user syncs 20 repositories, they're going to get 20 orphan cleanup tasks running, which could eat up resources on their system.

Perhaps we should just rely on the weekly orphan cleanup?

If we do that, I just suggest that we change the following error messages to tell the users to run orphan cleanup instead of run a complete sync:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe we should change those error messages with your PR here regardless. Complete sync no longer deletes the orphaned units itself.

end

def run
content_types_to_index = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content_types_to_index makes it sound a bit like the types will be indexed instead of checked for orphans.

repo = ::Katello::Repository.find(input[:id])
content_types_to_index = repo.repository_type.content_types_to_index
else
fail "Pass either a repository to determine content type or destroy_all to destroy all orphaned content units"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about removing the destroy_all param? That way only repo_id would be needed and the logic would be a bit simplified. Orphan cleanup should only be deleting useless data, so I don't think a safety net is needed.

@sjha4
Copy link
Member Author

sjha4 commented Jul 11, 2023

Updated the PR to run orphan content unit deletion only in orphan cleanup task which should run once a week in deployed environments.
Removed inputs from task to make it run on all content units during the orphan cleanup.
Updated error messages to point to the rake task.

@jeremylenz
Copy link
Member

PR title needs an update now ;)

@sjha4 sjha4 changed the title Fixes #36563 - Move remove orphan content units to async task Fixes #36563 - Remove orphaned content unit cleanup from index_content Jul 11, 2023
@sjha4
Copy link
Member Author

sjha4 commented Jul 11, 2023

Should be ready for re-review.

app/services/katello/pulp3/repository.rb Outdated Show resolved Hide resolved
@@ -7,27 +7,16 @@ class RemoveOrphans < Actions::Base
end
def plan(proxy)
sequence do
if proxy.pulp_primary?
::Katello::RootRepository.orphaned.destroy_all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this was pulled out of the run phase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really..This was the only operation in the run phase after refactoring the content unit deletion..and this doesn't seem like something that needs to be in the run phase..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a DB operation, should try not to put it in the run phase because run doesn't run in a transaction :)

Copy link
Member

@ianballou ianballou Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a heavy DB operation in the planning phase eat up a Puma worker? Also we shouldn't need transactions here at least since we're deleting orphans. But even if we did need one, couldn't we do a transaction block?

(I don't think deleting orphaned root repos will be a heavy operation, but I'm a little confused because we stick heavy DB ops in tasks in other places)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused because we stick heavy DB ops in tasks in other places

Usually they're in plan or finalize though, right? Those are fine because they run in transactions by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack to that..We tend to put only things that need API calls to other products in run phase generally. For this task, I'd be ok keeping this in run phase over the plan phase. Any failures on this are set to rescue=skip and also we do want to still plan other pulp orphan deletion actions even if the orphan deletion in katello fails. A plan phase failure wouldn't plan those pulp tasks.

We could put this in finalize if we want. Let me know if we feel strongly about that and I can update this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I trust your judgment to choose the phase correctly :)

Copy link
Member

@ianballou ianballou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me!

@sjha4 sjha4 merged commit 51f7a12 into Katello:master Jul 12, 2023
3 of 4 checks passed
wbclark pushed a commit to wbclark/katello that referenced this pull request Jul 20, 2023
Katello#10638)

* Fixes #36563 - Move remove orphan content units to async task

* Fixes #36563 - Run orphan content unit deletion during orphan cleanup only

(cherry picked from commit 51f7a12)
wbclark pushed a commit to wbclark/katello that referenced this pull request Jul 21, 2023
Katello#10638)

* Fixes #36563 - Move remove orphan content units to async task

* Fixes #36563 - Run orphan content unit deletion during orphan cleanup only

(cherry picked from commit 51f7a12)
wbclark pushed a commit that referenced this pull request Jul 24, 2023
#10638)

* Fixes #36563 - Move remove orphan content units to async task

* Fixes #36563 - Run orphan content unit deletion during orphan cleanup only

(cherry picked from commit 51f7a12)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants