Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: New design #27

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

WIP: New design #27

wants to merge 3 commits into from

Conversation

vimalk78
Copy link

This PR attempts to rewrite the log-metrics-file-exporter with a new design.

Earlier design used linux inotify via package github.com/fsnotify/fsnotify, which abstarcts inotify callbacks with its own defined values. Also adding a IN_MODIFY watch on a log file generates too many events which very quickly leads to overflow error.

New design directly uses golang.org/x/sys/unix package to use inotify for better control, and uses IN_ONESHOT flag to reduce the number of events generated when log files are written. Please see DESIGN.md for details.

/cc @jcantrill @alanconway

/hold

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vimalk78

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

RUN make build

#FROM registry.access.redhat.com/ubi8
FROM centos:centos8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only way we can install "strace" and "strace-cmd

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added for debugging purpose only, will remove it.

RUN sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
RUN sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*
#RUN echo 'kernel.yama.ptrace_scope = 0' >> /etc/sysctl.conf
RUN yum -y install strace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we can not collapse these statements?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added for debugging purpose only, will remove it.

@@ -0,0 +1,7 @@
*
!Makefile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean to ignore the Makefile when copying?... same woth go.mod and go.sum. Don't we need these to build the image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line says ignore all files, except the following lines which start with !
So the entries in this file are the files which get added.

set of watch flags are used for directories and files.

Watches for folders:
- `IN_CREATE`: watch is triggered when a file or direcory is created in this directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

as one returned in step 1. Once this happens, there are no more events received for file changes. the loop essentially halts.

#### Workaround
Save the watch desctiptor for files, and compare the new watch descriptor with earlier one, if same add watch again will the returned watch descriptor is different.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...add until(?) the returned watch...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

// a new container directory got created, add a watch for this directory
glog.V(3).Infof("A new container got created. container: %q", newdir)
}
must(n.WatchDir(newdir))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails the entire process if the watch fails right? Is there no recovery from here from that which we know?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best is to restart the process

// ignore files which are not log files
if strings.HasSuffix(logfile, ".log") {
glog.V(3).Infof("a new log file got created: %q\n", logfile)
must(n.WatchLogFile(logfile))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best is to restart the process

case e.IsDelete():
if e.path != "" {
if e.IsDir() {
// a directory got created in a directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update or remove comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

case <-tickerCh:
/**/
sizemtx.Lock()
glog.V(0).Infof("sizes(%d): %s\nEventsSent: %d, EventsHandled: %d\n", len(Sizes), func(m map[string]int64) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like we should bump the verbosity since Marshalling is likely an expensive operation. Should the marshal actually have a if block around the actual work so it only happens when v>X

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for debugging purpose only, will remove this once metrics are added.

n.mtx.RUnlock()

l := strings.Join(wl, ",\n")
glog.V(0).Infof("Watching paths: \n%s\nTotal watches: %d\n", l, len(wl))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. seems expensive

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is like a running log of paths being watched and is also for debugging purpose. It is getting printed once every tick, which can be made configurable, and log level changed to 3.

@alanconway
Copy link
Member

@vimalk78 I've been poking around. You don't need to create a separate watch for every file. A watch on a directory will also be notified about events to files in that directory, with the file name in the event. That may solve your mystery with running into the same watch descriptor and hanging - perhaps you are running out of watch descriptors. Watching files only will reduce the number greatly.

That does create a problem with ONESHOT - while you're not watching the directory you could miss file create/delete events. Possibly you can solve that by having 2 watches per directory. One is only for create/delete and is not ONESHOT.
The other is for all other file change events, and is ONEHSOT. That means ONESHOT lets you ignore some WRITE events, but the continuous create/delete watch won't miss any create or delete events.

Here's a demos:

package main

import (
	"bytes"
	"fmt"
	"os"
	"time"
	"unsafe"

	"golang.org/x/sys/unix"
)

func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	fd := must(unix.InotifyInit1(unix.IN_CLOEXEC | unix.IN_NONBLOCK))
	wf := os.NewFile(uintptr(fd), "")
	_ = os.MkdirAll("/tmp/test", 0777)
	_ = must(unix.InotifyAddWatch(fd, "/tmp/test", unix.IN_ALL_EVENTS))
	go func() {
		const size = unix.SizeofInotifyEvent
		buf := make([]byte, size*1000)
		for {
			n := must(wf.Read(buf[:]))
			for i := 0; i+size < n; {
				e := *(*unix.InotifyEvent)(unsafe.Pointer(&buf[i]))
				i += size      // After the event struct
				if e.Len > 0 { // Event has a name
					end := bytes.IndexByte(buf[i:], 0)
					name := string(buf[i : i+end]) // NULL terminated string
					fmt.Println(name)
				}
				fmt.Println(e)
				i += int(e.Len) // Move past name
			}
		}
	}()
	f := must(os.Create("/tmp/test/x"))
	fmt.Fprintln(f, "hello")
	fmt.Fprintln(f, "goodbye")
	f.Close()
	f = must(os.Create("/tmp/test/y"))
	f.Close()
	time.Sleep(time.Minute)
}

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 9, 2023

@vimalk78: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images 6b211e2 link true /test images
ci/prow/test 6b211e2 link true /test test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants