Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFE] enable file logging sink that occasionally checks if original inode/device matches the current per-path one #239

Open
jnpkrn opened this issue Nov 30, 2016 · 11 comments

Comments

@jnpkrn
Copy link
Contributor

jnpkrn commented Nov 30, 2016

Working on something entirely unrelated, I came across something that
has a potential to affect pacemaker+corosync stack (apparently using
libqb which can possibly offer the solution) wrt. logging when logrotate
program installed and scheduled to run regularly (common case with
the former precondition).

The problem is that logrotate.d configuration shipped with corosync:

https://github.com/corosync/corosync/blob/v2.4.2/conf/logrotate/corosync.in#L4

contains copytruncate keyword, which asks for troubles per logrotate(8):

Note that there is a very small time slice between copying the file
and truncating it, so some logging data might be lost.

That's definitely undesired effect at any rate.

As that unrelated stuff that lead me here was Python based and related
to logging, the solution was quite simple:

https://hg.python.org/cpython/file/3.6/Lib/logging/handlers.py#l416

i.e., using logging.handlers.WatchedFileHandler from standard library.

From there, immediate extrapolation onto libqb feature set can be seen,
and hence this RFE.

The Python's implementation is pretty simple, though it's not very
efficient. I can imagine that making the check every X messages
or when the last message was accepted for emitting more than Y
seconds ago, would be a better choice, as it combines the advantages:

  • nothing being lost (unless even more extreme circumstances), because at
    worst, the data are, in the temporary window, written still to the rotated log
  • efficiency, because it doesn't stat the target file every time,
    especially if there's a burst of messages to proceed

When this is done, it would be wise to have corosync drop copytruncate
directive from logrotate configuration, only relying on implicit rotating
and maybe implicit/explicit create.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 1, 2016

When at it, Python offers handlers implementing also their own log
rotation, based on:

So one of the options would be to mimic these as well/instead.

On the other hand, logrotate is highly configurable, with custom
hooks etc., and is something providing unified interface to log files
handling from administrator's POV.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 2, 2016

When speaking of corosync, the risk of losing data in

very small time slice between copying the file and truncating it

is amplified by the fact that corosync + logging thread are running with
real-time priority under normal circumstances, whereas cron daemon
and, in turn, it's logrotate child are running with non-elevated
priority.

If I understand the thing properly, there are possibly two problems
with logrotate + copytruncate:

  1. original file X is being copied to X' while another logrotation
    event is started doing the same procedure, possibly silently
    corrupting X' along

  2. original file X has been copied to X', but before ftruncate
    is called, new data has been written to X that will get
    subsequently lost when the truncation kicks in

For 1., it seems solved thanks to O_EXCL to open.

But it doesn't look there's a reliable solution for 2. (there's no
call that would atomically combine file swap (already can be atomic
with renameat2 + RENAME_EXCHANGE) with some kind of freopen
like logic (reassociating existing file descriptors with the newly
swapped file in one go).

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 6, 2016

Note that pacemaker itself also uses copytruncate option:
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-1.1.16/extra/logrotate/pacemaker.in#L9
(affects pacemaker prior to switch to corosync's log file in the runtime, or pacemaker_remoted + perhaps more).

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 7, 2016

One of the things that were brought up in relation this was a question
of a backward compatibility, because there's no hard relevance between
say pacemaker and its logrotate file and the libqb that is indeed used
(which may be short of the new functionality).

One of the possibilities to employ in the updated logrotate files is to
use firstscript option with custom command to check the libqb version
(through *.so version info) and if not high enough, to trigger logrotate
with an out-of-path original configuration (copytruncate) and
subsequently fail to denote the procedure should not continue.


For the new logrotate config itself and regarding what libqb could be
taught to do, it was suggested that the libqb user could opt-in to have
libqb listen for SIGHUP or the like and restart the logging stream
instead of what was originally proposed (i.e. check inode on some
log events actively). Problem is that if the libqb's user is also
dealing with other signals on its own (or via libqb's mainloop),
there may be some undesired interferences.

Another alternative would be to use inotify, but that's limited to
a single platform only.

A compromise could be that libqb will create a named pipe X.pipe for
the target log file X which then will be written to per the logrotate
configuration. This currently seems the best technical solution:

  • named pipes should be available everywhere we desire
  • custom command in logrotate config would be immediately clear on
    whether the libqb supporting the signalling mechanism is in use
    thanks to existing X.pipe file (a la test -p), and thus supporting
    the sketched fallback scheme for older libqb in use
    (it could be viceversa: old -> new)
  • no sign of "bad polling" (i.e., checking whether inode has changes
    over and over), just a "good polling" (poll on X.pipe)
  • signal-safe

@kgaillot
Copy link
Contributor

kgaillot commented Dec 8, 2016

For the new logrotate config itself and regarding what libqb could be
taught to do, it was suggested that the libqb user could opt-in to have
libqb listen for SIGHUP or the like and restart the logging stream

To clarify, I was thinking that libqb would simply provide a "reopen log files" function, and the library user could call that from its own signal handler for some available signal (e.g. SIGHUP, SIGUSR1 or SIGUSR2).

I don't quite understand the pipe idea. What would write to the pipe, and what would read from it?

@chrissie-c
Copy link
Contributor

Yes, I don't like the pipe idea. libraries have no business providing that sort of unexpected interface. If logs need rotating then i's the application's responsibility to say when and how it's triggered.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 8, 2016 via email

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 9, 2016 via email

@jfriesse
Copy link
Member

My 2cents. Looks like a extremely overengineered solution. Ability to reopen log seems like a viable feature and it's just good enough solution.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 12, 2016 via email

@jnpkrn jnpkrn added the RFE label Oct 31, 2017
@jnpkrn jnpkrn added the design label Dec 15, 2017
@jnpkrn
Copy link
Contributor Author

jnpkrn commented Dec 15, 2017

I don't know why, I've missed there was a pre-existing issue #142.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants