Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36563 - Remove orphaned content unit cleanup from index_content #10638

Merged
merged 2 commits into from
Jul 12, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
module Actions
module Katello
module OrphanCleanup
class RemoveOrphanedContentUnits < Actions::Base
def plan(options = {})
plan_self(id: options[:repo_id], destroy_all: options[:destroy_all])
end

def run
content_types_to_index = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content_types_to_index makes it sound a bit like the types will be indexed instead of checked for orphans.

if input[:destroy_all]
::Katello::RepositoryTypeManager.enabled_repository_types.each_value do |repo_type|
content_types_to_index << repo_type.content_types_to_index
end
elsif input[:id]
repo = ::Katello::Repository.find(input[:id])
content_types_to_index = repo.repository_type.content_types_to_index
else
fail "Pass either a repository to determine content type or destroy_all to destroy all orphaned content units"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about removing the destroy_all param? That way only repo_id would be needed and the logic would be a bit simplified. Orphan cleanup should only be deleting useless data, so I don't think a safety net is needed.

end
content_types_to_index.flatten.each do |type|
type.model_class.orphaned.destroy_all
end
end

def rescue_strategy
Dynflow::Action::Rescue::Skip
end
end
end
end
end
19 changes: 4 additions & 15 deletions app/lib/actions/katello/orphan_cleanup/remove_orphans.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,27 +7,16 @@ class RemoveOrphans < Actions::Base
end
def plan(proxy)
sequence do
if proxy.pulp_primary?
::Katello::RootRepository.orphaned.destroy_all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this was pulled out of the run phase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really..This was the only operation in the run phase after refactoring the content unit deletion..and this doesn't seem like something that needs to be in the run phase..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a DB operation, should try not to put it in the run phase because run doesn't run in a transaction :)

Copy link
Member

@ianballou ianballou Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a heavy DB operation in the planning phase eat up a Puma worker? Also we shouldn't need transactions here at least since we're deleting orphans. But even if we did need one, couldn't we do a transaction block?

(I don't think deleting orphaned root repos will be a heavy operation, but I'm a little confused because we stick heavy DB ops in tasks in other places)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused because we stick heavy DB ops in tasks in other places

Usually they're in plan or finalize though, right? Those are fine because they run in transactions by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack to that..We tend to put only things that need API calls to other products in run phase generally. For this task, I'd be ok keeping this in run phase over the plan phase. Any failures on this are set to rescue=skip and also we do want to still plan other pulp orphan deletion actions even if the orphan deletion in katello fails. A plan phase failure wouldn't plan those pulp tasks.

We could put this in finalize if we want. Let me know if we feel strongly about that and I can update this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I trust your judgment to choose the phase correctly :)

plan_action(RemoveOrphanedContentUnits, destroy_all: true)
end
if proxy.pulp3_enabled?
plan_action(
Actions::Pulp3::Orchestration::OrphanCleanup::RemoveOrphans,
proxy)
end
plan_self
end
end

def run
models = []
::Katello::RepositoryTypeManager.enabled_repository_types.each_value do |repo_type|
indexable_types = repo_type.content_types_to_index
models += indexable_types&.map(&:model_class)
models.select! { |model| model.many_repository_associations }
end
models.each do |model|
model.joins("left join katello_#{model.repository_association} on #{model.table_name}.id = katello_#{model.repository_association}.#{model.unit_id_field}").where("katello_#{model.repository_association}.#{model.unit_id_field} IS NULL").destroy_all
end

::Katello::RootRepository.orphaned.destroy_all
end
end
end
Expand Down
4 changes: 4 additions & 0 deletions app/lib/actions/katello/repository/index_content.rb
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@ def run
output[:new_content][content_type.label] = new_count - initial_counts[content_type.label]
end
end

def finalize
ForemanTasks.async_task(OrphanCleanup::RemoveOrphanedContentUnits, {repo_id: input[:id]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm questioning now if we even need to destroy orphaned units after syncing.

For the orphaned filter rule problem, the solution is this:

def clean_filter_rules(repo_associations_to_destroy)

Otherwise, I don't think orphaned content units should cause any other problems.

Plus, if a user syncs 20 repositories, they're going to get 20 orphan cleanup tasks running, which could eat up resources on their system.

Perhaps we should just rely on the weekly orphan cleanup?

If we do that, I just suggest that we change the following error messages to tell the users to run orphan cleanup instead of run a complete sync:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe we should change those error messages with your PR here regardless. Complete sync no longer deletes the orphaned units itself.

end
end
end
end
Expand Down
4 changes: 2 additions & 2 deletions app/models/katello/concerns/pulp_database_unit.rb
Original file line number Diff line number Diff line change
Expand Up @@ -169,9 +169,9 @@ def with_identifiers(ids)

def orphaned
if many_repository_associations
where.not(:id => repository_association_class.where(:repository_id => ::Katello::Repository.all).select(unit_id_field))
where.not(:id => repository_association_class.select(unit_id_field))
else
where.not(:repository_id => ::Katello::Repository.all)
where.not(:repository_id => ::Katello::Repository.select(:id))
end
end

Expand Down
1 change: 0 additions & 1 deletion app/models/katello/repository.rb
Original file line number Diff line number Diff line change
Expand Up @@ -935,7 +935,6 @@ def index_content(options = {})
repository_type.content_types_to_index.each do |type|
Katello::Logging.time("CONTENT_INDEX", data: {type: type.model_class}) do
Katello::ContentUnitIndexer.new(content_type: type, repository: self, optimized: !full_index).import_all
type.model_class.orphaned.destroy_all
end
end
repository_type.index_additional_data_proc&.call(self)
Expand Down
2 changes: 1 addition & 1 deletion app/models/katello/root_repository.rb
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ class RootRepository < Katello::Model
where(:http_proxy_policy => RootRepository::USE_SELECTED_HTTP_PROXY).
where(:http_proxy_id => http_proxy_id)
}
scope :orphaned, -> { where.not(id: Katello::Repository.pluck(:root_id).uniq) }
scope :orphaned, -> { where.not(id: Katello::Repository.select(:root_id)) }
scope :redhat, -> { joins(:provider).merge(Katello::Provider.redhat) }
scope :custom, -> { where.not(:id => self.redhat) }
delegate :redhat?, :provider, :organization, to: :product
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
require 'katello_test_helper'
module Katello
class RemoveOrphanedContentUnitsTest < ActiveSupport::TestCase
def setup
@rhel6 = Repository.find(katello_repositories(:rhel_6_x86_64).id)
end

def test_orphaned_content_task_destroys_orphans
rpm = katello_rpms(:one)
rpm.repository_rpms.destroy_all
ForemanTasks.sync_task(Actions::Katello::OrphanCleanup::RemoveOrphanedContentUnits, {repo_id: @rhel6.id})
assert_raises(ActiveRecord::RecordNotFound) { rpm.reload }
end

def test_orphaned_content_task_destroy_all
rpm = katello_rpms(:one)
rpm.repository_rpms.destroy_all
ForemanTasks.sync_task(Actions::Katello::OrphanCleanup::RemoveOrphanedContentUnits, {destroy_all: true})
assert_raises(ActiveRecord::RecordNotFound) { rpm.reload }
end
end
end
21 changes: 5 additions & 16 deletions test/actions/katello/orphan_cleanup/remove_orphans_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -30,36 +30,25 @@ class TestBase < ActiveSupport::TestCase
class RemoveOrphansPlanTest < TestBase
let(:action_class) { ::Actions::Katello::OrphanCleanup::RemoveOrphans }

it 'plans proxy orphans cleanup with pulp3 primary' do
it 'plans proxy orphans cleanup and content unit orphan cleanup on pulp3 primary' do
smart_proxy = SmartProxy.pulp_primary
tree = plan_action_tree(action_class, smart_proxy)

assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::RemoveOrphans)
assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::DeleteOrphanRepositoryVersions)
assert_tree_planned_with(tree, ::Actions::Katello::OrphanCleanup::RemoveOrphanedContentUnits)
end

it 'plans proxy orphans cleanup with pulp3 mirror' do
it 'plans proxy orphans cleanup without content unit orphan cleanup on pulp3 mirror' do
smart_proxy = FactoryBot.create(:smart_proxy, :pulp_mirror, :with_pulp3)
smart_proxy.stubs(:pulp_primary?).returns(false)
tree = plan_action_tree(action_class, smart_proxy)

assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::RemoveOrphans)
assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::DeleteOrphanDistributions)
assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::DeleteOrphanRemotes)
assert_tree_planned_with(tree, Actions::Pulp3::OrphanCleanup::DeleteOrphanRepositoryVersions)
end

it 'runs and removes orphan content units' do
smart_proxy = SmartProxy.pulp_primary
file_unit_orphan = Katello::FileUnit.new(:name => "file_unit", :pulp_id => "orphaned")
file_unit_orphan.save!
docker_unit_orphan = Katello::DockerTag.new(:name => "docker_unit", :pulp_id => "orphaned_docker")
docker_unit_orphan.save!
action = create_action(action_class)
action.expects(:plan_self)
plan_action action, smart_proxy
run_action action
assert_raises(ActiveRecord::RecordNotFound) { file_unit_orphan.reload }
assert_raises(ActiveRecord::RecordNotFound) { docker_unit_orphan.reload }
refute_tree_planned(tree, ::Actions::Katello::OrphanCleanup::RemoveOrphanedContentUnits)
end
end
end
4 changes: 2 additions & 2 deletions test/models/repository_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -710,12 +710,12 @@ def test_with_errata
assert_includes Repository.with_errata([errata]), @rhel6
end

def test_index_content_destroys_orphans
def test_orphaned_content_task_destroys_orphans
rpm = katello_rpms(:one)
::Katello::ContentUnitIndexer.any_instance.stubs(:import_all).returns(true)
::Katello::Repository.any_instance.stubs(:import_distribution_data).returns(true)
rpm.repository_rpms.destroy_all
@rhel6.index_content
ForemanTasks.sync_task(Actions::Katello::OrphanCleanup::RemoveOrphanedContentUnits, {repo_id: @rhel6.id})
assert_raises(ActiveRecord::RecordNotFound) { rpm.reload }
end

Expand Down