Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For Loki Sources position files: Use a best effort atomic rename #5772

Merged
merged 5 commits into from
Nov 16, 2023

Conversation

mattdurham
Copy link
Collaborator

@mattdurham mattdurham commented Nov 14, 2023

PR Description

This switches the bookmark and positions file in loki.sources to use a more resilient method. This is primarily an issue on windows where a reboot on writing to the file causes an issue. The default os.rename in windows is not reliable. The atomic package uses a much more reliable method though there are issues where it doesnt work. These are generally limited to network file systems and non-ntfs. This should cover 99.9% of our cases without problem.

Most of the changes are fork lifting from promtail.

Which issue(s) this PR fixes

Closes #4916

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated

@mattdurham mattdurham changed the title Use a best effort atomic rename For Loki Sources position files: Use a best effort atomic rename Nov 14, 2023
@mattdurham mattdurham marked this pull request as ready for review November 14, 2023 20:38
@mattdurham mattdurham requested review from tpaschalis, erikbaranowski and thampiotr and removed request for tpaschalis November 15, 2023 13:48
Copy link
Contributor

@thampiotr thampiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions. Would be nice if we could test this somehow too, although I realise it's on Windows so would be limited utility.

//go:build windows
// +build windows

// This code is copied from Promtail with minor changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a commit sha of what promtail it's copied from? Makes it easier to refresh with changes in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add that in.

Comment on lines 50 to 73
if err != nil {
return nil, err
}
// otherwise open the current one.
file, err := os.OpenFile(path, os.O_RDWR, 0666)
if err != nil {
return nil, err
}
fileContent, err := io.ReadAll(file)
if err != nil {
return nil, err
}
fileString := string(fileContent)
// load the current bookmark.
bm, err := win_eventlog.CreateBookmark(fileString)
if err != nil {
return nil, err
}
return &bookMark{
handle: bm,
path: path,
isNew: fileString == "",
buf: buf,
}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this fragment of code few things can still go wrong and possibly the agent will be crashlooping until someone manually repairs the bookmark file or, more likely, deletes it.

Should we instead have on a high-level something like this?

func newBookmark() {
  if not fileExists {
    createNewOne()
  } else {
    err := useExistingFile()
    if err != nil {
      renameCorruptedFile()
      createNewOne()
    }
  }
}

The idea being that if we detect a corrupted file, we attempt to recover.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we error on the create bookmark then we could call it again with a blank string to force a new bookmark.

@mattdurham
Copy link
Collaborator Author

Unit tests are hard since it would need to stop during the write to disk.

@mattdurham mattdurham merged commit a608dd6 into main Nov 16, 2023
10 checks passed
@mattdurham mattdurham deleted the positions_file_rename branch November 16, 2023 14:29
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Agend Flow crash after client reboot
2 participants