support backup preservation [BF-2219] #171
Conversation
Codecov Report
@@ Coverage Diff @@
## master #171 +/- ##
==========================================
+ Coverage 78.77% 78.81% +0.03%
==========================================
Files 17 17
Lines 4400 4493 +93
Branches 995 1016 +21
==========================================
+ Hits 3466 3541 +75
- Misses 696 716 +20
+ Partials 238 236 -2
Force-pushed from d6c91c0 to 12be115
Looks good and functional. The tests are also great!
However, there are two main points that might need refinement:
- the HTTP endpoints could be better
- the named tuple PreservationRequest brings no value and obfuscates the code a bit

On less important points (not blocking):
- the wasteful iteration over state["backups"] to find the one we want by stream_id looks like it's not the right data structure; I'm not sure if there's a reason for it being a list
- type hints missing here and there
- some boolean expressions
myhoard/controller.py (Outdated)
def mark_backup_preservation(self, preservation_request: PreservationRequest) -> None:
    backup_to_preserve = self._get_backup_by_stream_id(preservation_request.stream_id)
    if not backup_to_preserve:
        raise Exception(f"Stream {preservation_request.stream_id} was not found in completed backups.")
Let's not raise the base Exception. We can also create our own; it'll be easier to handle them.
Why? This is the pattern used in the entire code base.
Oh, you're right. Then this becomes out of scope. I'll resolve this. Maybe we can create a ticket for new-developer, WDYT?
for backup in self.state["backups"]:
    if backup["stream_id"] == stream_id:
        return backup["site"]
raise KeyError(f"Stream {stream_id} not found in backups")
return backup
Do we often call that? If we always want to access backups by their stream_id, it might be worth considering changing self.state["backups"] from a list[Backup] to a dict[stream_id, Backup].
I don't think this should be in the scope of the ticket, though. But depending on your opinion and knowledge, it might be worth creating a ticket for this (if it makes sense), good for new-developer.
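A rough sketch of what that could look like (illustrative only; the data shapes and names below are invented for the example, not taken from the actual myhoard code):

```python
# Illustrative sketch: keying state["backups"] by stream_id gives O(1) lookup
# instead of scanning a list. Shapes and names are invented for this example.
from typing import Any, Dict

state: Dict[str, Any] = {
    "backups": {
        "stream-1": {"stream_id": "stream-1", "site": "default"},
        "stream-2": {"stream_id": "stream-2", "site": "other"},
    }
}

def get_backup_by_stream_id(stream_id: str) -> Dict[str, Any]:
    try:
        return state["backups"][stream_id]  # direct dict lookup by key
    except KeyError:
        raise KeyError(f"Stream {stream_id} not found in backups") from None
```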
We do filter backups by stream_id lots of times... but that sounds like a major change and multiple places in the code will be affected; not sure if it will be worth it at the end of the day.
Okay, out of scope then. 👍
At the end of the review, I'll create a few tickets about all the points mentioned, to keep track of them so they can be prioritized by Amichai at a later date.
#175 handles this
@@ -1408,11 +1434,12 @@ def _purge_old_binlogs(self, *, mysql_maybe_not_running=False):
        self.stats.gauge_float("myhoard.binlog.time_since_any_purged", current_time - last_purge)
        self.stats.gauge_float("myhoard.binlog.time_since_could_have_purged", current_time - last_could_have_purged)

-    def _refresh_backups_list(self):
+    def _refresh_backups_list(self, force_refresh: bool = False):
Suggested change:
-    def _refresh_backups_list(self, force_refresh: bool = False):
+    def _refresh_backups_list(self, force_refresh: bool = False) -> List[Backup]:
This requires a lot of changes, since Controller.get_backup_list is not returning List[Backup]; it returns List[Dict[.....]]. I'd rather keep this PR small and not involve any refactoring work that is not required. I'll create an issue for this.
Opened #175 for this
        interval = self.backup_refresh_interval_base
        if self.mode == self.Mode.active:
            interval *= self.BACKUP_REFRESH_ACTIVE_MULTIPLIER
-        if time.time() - self.state["backups_fetched_at"] < interval:
+        if force_refresh is False and time.time() - self.state["backups_fetched_at"] < interval:
Suggested change:
-        if force_refresh is False and time.time() - self.state["backups_fetched_at"] < interval:
+        if not force_refresh and time.time() - self.state["backups_fetched_at"] < interval:
Clicked the wrong button.
Force-pushed from aabe449 to 0efe31c
Review dismissed: some suggestions are not related to this PR.
Force-pushed from 0efe31c to a67d291
Force-pushed from b468e5c to 5464b20
This has quite a lot of commits that look like internal development history. I didn't check whether some make sense as individual commits, but they should probably mostly be squashed.
Force-pushed from 180c599 to 02605fb
Sometimes it becomes necessary to retain a backup: for instance, a backup may be in the process of being restored and yet suddenly get removed because of its age and because the current number of backups exceeds the maximum allowed to be retained. To prevent such a scenario, one can request the preservation of a specific backup until a particular date. To initiate this process, a PUT request to /backup/{stream_id}/preserve must be executed. This prompts the controller to add the preservation request under 'pending_preservation_requests', meaning it will subsequently update the backup's preservation status while in 'active' mode. Preserving a backup involves storing a 'preserve.json' file in the site's object storage, which contains the designated 'preserve until' date. [BF-2219]
Force-pushed from 02605fb to 707c475
Applied suggestions and squashed all commits into a final one.
Leaving this open for the team currently working on this component to review, but as far as I'm concerned it looks good now.
One gotcha this introduces: if a backup is preserved for so long that we end up purging a backup newer than the preserved one, we end up with a hole in the stream of binlogs, and the mechanism that automatically restores an older backup plus the binlogs of all newer backups when we encounter a broken backup will not work. I suspect that won't be an issue in practice, as this is presumably just a temporary measure we can use to keep backups for some extra minutes or hours.
About this change: What it does, why it matters
Supports requests for preserving/holding a backup until a particular date; this way we can guarantee that the backup is not deleted while a restoration is in progress.
If the oldest backup is being preserved, myhoard will not purge any backup but will keep generating new backups. Purging resumes once the preservation date has passed.
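As a rough illustration of that purge rule (this is not the actual myhoard implementation; the function name and data shapes are invented for the sketch):

```python
# Illustrative sketch only: backups are considered oldest-first, and purging
# stops at the first backup that is still preserved, so that backup and every
# newer one is kept while new backups continue to be created.
from datetime import datetime, timezone

def backups_to_purge(backups, now=None):
    """backups: dicts ordered oldest -> newest, optionally carrying a
    'preserve_until' timezone-aware datetime."""
    now = now or datetime.now(timezone.utc)
    purgeable = []
    for backup in backups:
        preserve_until = backup.get("preserve_until")
        if preserve_until is not None and now < preserve_until:
            break  # the oldest remaining backup is preserved; stop purging here
        purgeable.append(backup)
    return purgeable
```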
How to request a backup preservation?
To preserve a backup, a request needs to be sent to the HTTP API. For example:
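(Illustrative sketch using Python's requests library: the host, port, and exact payload shape are assumptions; only the PUT /backup/{stream_id}/preserve route and the preserve_until field come from the description in this PR.)

```python
# Illustrative sketch: request preservation of a backup until a given date.
# The base URL and payload format are assumptions, not taken from this PR.
import requests

MYHOARD_URL = "http://localhost:16001"  # assumption: your myhoard HTTP API address
stream_id = "<stream-id>"  # the stream_id of the backup to preserve

response = requests.put(
    f"{MYHOARD_URL}/backup/{stream_id}/preserve",
    json={"preserve_until": "2023-06-30T12:00:00+00:00"},  # assumed ISO 8601 format
    timeout=10,
)
response.raise_for_status()
```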
This will trigger the controller to fetch the respective backup stream and store preserved_info.json in the object storage; the file will contain the requested preserve_until. After the file is stored, all backup metadata is immediately refreshed, and myhoard takes the updated preserve_until values into account before trying to purge old backups.

How to cancel a backup preservation?
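(Again an illustrative sketch under the same assumptions as above; whether the API expects null, an empty string, or an omitted field for preserve_until is a guess.)

```python
# Illustrative sketch: cancel a preservation by clearing preserve_until.
# The base URL and the exact "empty" value the API expects are assumptions.
import requests

MYHOARD_URL = "http://localhost:16001"  # assumption: your myhoard HTTP API address
stream_id = "<stream-id>"  # the stream_id of the backup whose preservation is cancelled

response = requests.put(
    f"{MYHOARD_URL}/backup/{stream_id}/preserve",
    json={"preserve_until": None},
    timeout=10,
)
response.raise_for_status()
```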
This will trigger the controller to fetch the respective backup stream and store preserved_info.json in the object storage, but it will store an empty "preserve_until" value.