Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip reindex on normal make up where index alias exists #22819

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

KevinMind
Copy link
Contributor

@KevinMind KevinMind commented Nov 4, 2024

Follow up: #22805

Description

When running make up, if we are not explicitly cleaning and reseting the database and we have a non-empty database it is fair to assume that indexing may have already happened and we can safely skip reindex if the es index is already aliased

Context

See: LINK TO COMMENT for explanation

Testing

Run make up after running make down and you should expect ES indexing to happen. Then run make up again and expect it not to happen. Adding INIT_CEAN=True or wiping the database should also cause reindexing.

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind mentioned this pull request Nov 4, 2024
5 tasks
@KevinMind KevinMind force-pushed the skip-reindex-already-aliased branch 2 times, most recently from c94776b to e229395 Compare November 4, 2024 16:39
@KevinMind KevinMind requested review from a team and diox and removed request for a team November 4, 2024 16:40
Copy link
Member

@diox diox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#22805 (comment) still applies:

  • You should continue to pass --wipe (important for initial setup)
  • You should only skip if the alias exists, not the index

@KevinMind
Copy link
Contributor Author

#22805 (comment) still applies:

* You should continue to pass `--wipe` (important for initial setup)

* You should only skip if the alias exists, not the index

Is this not addressing the second point?

For the first point. Looking back at the original point

If add-ons are saved before the initial reindex is done, then ES will automatically create an index called "addons" that won't be an alias. It won't have the schema however, so some data will be incorrect, and subsequent calls to reindex without passing --wipe will fail.

So if I don't pass wipe, and addons are created before I call the "reindex" command there is a potential failure? It's not totally clear to me why that would be. If I take your words literally in that case ther would still not be an alias so we would recreate the index anyway.. so that is why we need both the skip if aliased and wipe so if we don't skip due to non aliased index we also remove the existing invalid index first? Do I get it? Not sure.. halllp 🤷

@diox
Copy link
Member

diox commented Nov 6, 2024

Is this not addressing the second point?

It's at the wrong place: it should be done for your new skip_if_exists argument a few lines above. Don't want to change the line that you changed, that has different consequences in terms of what index we touch...

So if I don't pass wipe, and addons are created before I call the "reindex" command there is a potential failure? It's not totally clear to me why that would be. If I take your words literally in that case ther would still not be an alias so we would recreate the index anyway.. so that is why we need both the skip if aliased and wipe so if we don't skip due to non aliased index we also remove the existing invalid index first? Do I get it? Not sure.. halllp 🤷

If you don't pass --wipe and add-ons are created before we call the initial reindex, then reindex will never complete. It will create a timestamped addons-XXXX index, and after if has indexed all add-ons it will want to swap the addons alias to point to that index, and that will fail because it's not an alias. You'll get a traceback in the worker logs, and the reindex command will infinitely loop waiting for the task to successfully complete, which it never will.

BTW, the reason we do that alias switch is to allow a full reindex while still having a fully working search. The idea is:

  • You execute the command to reindex, that creates a new index almost immediately, but addons alias is not updated to point to it just yet. A bunch of indexing tasks are triggered in the background.
  • All reads still go to the old index, because addons still points to that old index.
  • All writes go to both the old index and the new one (in addition to the indexing tasks that were just triggered).
  • Eventually all indexing tasks finish, and the alias is switched to point to the new index, and we stop using the old index & delete it.

This is particularly important in production: we want both reads and writes to continue working transparently while we do a full reindex that might take a couple dozen minutes.

@KevinMind
Copy link
Contributor Author

@diox fixed

"""
Test reindex command with skip_if_exists option when index already exists.
"""
mock_es.indices.exists.return_value = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the right mock - it should be exists_alias - why is this test passing ?

@@ -58,4 +58,4 @@ def handle(self, *args, **options):
# We should reindex even if no data is loaded/modified
# because we might have a fresh instance of elasticsearch
else:
call_command('reindex', '--wipe', '--force', '--noinput')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should keep the --force here. It prevents the error from being raise when there is already a reindexation in progress, which I think is what we want here, in case somehow it was triggered from elsewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants