This repository has been archived by the owner on May 6, 2020. It is now read-only.

Enable logs to be stored for successful CI builds #944

Open
amshinde opened this issue Mar 7, 2018 · 16 comments

Comments

@amshinde
Contributor

amshinde commented Mar 7, 2018

Currently we can only retrieve the logs when a build has failed with Jenkins. We should be able to retrieve them for successful builds as well, to be able to inspect whether we are running with the correct environment.

@amshinde
Contributor Author

amshinde commented Mar 7, 2018

@chavafg Can you take a look at this?

@jodh-intel
Contributor

I'm guessing this should really have been raised on https://github.com/clearcontainers/jenkins.

/cc @grahamwhaley as this might have implications for the metrics system storage requirements.

@grahamwhaley
Contributor

We should probably discuss and define which logs, and how much debug they have in them.
If we take all the system logs and have all the CC debug enabled in the toml, for instance, then the logs come out pretty big (hundreds of KB, IIRC), which we may not want to gather and store for every run.
If we know what info we want in advance, then we could run some commands at startup, such as cc-runtime cc-env, docker info and @jodh-intel's magic system info collection script. We could even run all of those, gather their output into a file, and add that file to the stored 'results archive' in Jenkins, which would help reduce pollution in the console output screen/log.
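For illustration, a minimal sketch of such a startup collection step, assuming an output file name of env-info.txt and that all three commands are available on the CI node (none of this reflects the current job configuration):

    #!/bin/bash
    # Sketch only: gather environment info into one file that the Jenkins job
    # could archive as a build artifact. The file name and command set are
    # illustrative, not what the CI currently does.
    out="env-info.txt"
    {
        echo "=== cc-runtime cc-env ==="
        cc-runtime cc-env
        echo "=== docker info ==="
        sudo docker info
        echo "=== collect-data script ==="
        sudo cc-collect-data.sh
    } > "$out" 2>&1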

@chavafg I think it was recently pointed out that the metrics CI logs were already pretty big, and I should check that, as that is not intentional.

@jodh-intel
Contributor

For reference, that magic script is https://github.com/clearcontainers/runtime/blob/master/data/collect-data.sh.in.

@amshinde - can you give a concrete example where retaining logs would have helped? I'm not disagreeing that it's a good idea, but it would be good to explore if there are other ways to give you what you want.

How long do we think we'll need to store logs? "Forever" probably won't cut it, so would a month (4 releases) be sufficient, do you think?

But as @grahamwhaley is suggesting, I'm not sure we need to keep the logs as long as we can capture the environment the tests ran in, to allow a test run to be recreated, namely:

  • [x] the commit version of every component.
  • [x] the runtime config.
  • [x] the version of the container manager being used.
  • [x] the container manager config.
  • [x] the version of the distro.
  • [ ] the package set being used (rpm -qa / dpkg -l).

As denoted by the checkboxes, the collect-data.sh script captures almost all we need here. The package set is the only missing item (although the script does capture the versions of any CC packages installed on the system already).
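For the missing piece, something along these lines could capture the package set (a sketch only; the output file name is arbitrary and the distro detection is deliberately simplistic):

    # Sketch: capture the full installed package set, the one item the
    # collect script does not gather yet. The output file name is arbitrary.
    if command -v dpkg >/dev/null 2>&1; then
        dpkg -l > packages.txt
    elif command -v rpm >/dev/null 2>&1; then
        rpm -qa | sort > packages.txt
    else
        echo "no rpm or dpkg found" > packages.txt
    fi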

For reference, the output of the collect script when gzip -9'd is ~6k (for a system without any CC errors in the journal).

If we decide to store full logs for all PRs, we'll need something in place to warn about the ENOSPC that is almost guaranteed to happen one day... 😄
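As a rough idea, the teardown could include a simple free-space check so we hear about it before ENOSPC actually hits (the threshold and path below are arbitrary guesses, not anything configured today):

    # Sketch: warn when the artifact partition is filling up instead of
    # waiting for ENOSPC. The 90% threshold and the path are arbitrary.
    artifact_dir="/var/lib/jenkins"
    used=$(df --output=pcent "$artifact_dir" | tail -n1 | tr -dc '0-9')
    if [ "${used:-0}" -ge 90 ]; then
        echo "WARNING: ${artifact_dir} is ${used}% full" >&2
    fi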

@jodh-intel
Contributor

Oh - we might also want to include procenv output (see clearcontainers/jenkins#5) for things like system limits, etc.

@grahamwhaley
Contributor

Agree on logs and longevity. I'm going to presume Jenkins has some plugin or setting that can manage and expire the gathered results files, and we should indeed look at that (we do collect the .csv results files for the metrics at present, for instance, but do not expire them).

@grahamwhaley
Contributor

procenv was the magic I was thinking of :-)

@jodh-intel
Contributor

Ah - soz - so much magic about! ;)

@chavafg
Contributor

chavafg commented Mar 7, 2018

I think @amshinde's concern is knowing the agent version, which at some point last week was wrong while testing the latest PRs.
As for keeping the logs, I can add a rule to gather them in the Azure Jenkins configuration, so the metrics Jenkins will not be impacted. But the Azure Jenkins server may also run into storage issues in the future if the logs we keep on every run continue to grow.
As @jodh-intel and @grahamwhaley said, it would be better to gather just the information we require instead of keeping all the logs from the execution.

@jodh-intel
Contributor

@chavafg - we could just run cc-collect-data.sh in the teardown script, couldn't we? That way we get the info we want and also ensure that the script is being run regularly. If we need the complete list of packages, it would be easy to add an extra --all-packages option or similar.
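For example, the teardown might end up doing something like this (a sketch only; --all-packages does not exist yet and is exactly the extra option proposed above, and the artifacts/ directory is made up):

    # Sketch: run the collect script at teardown and keep its (compressed)
    # output as a Jenkins artifact. --all-packages is the proposed, not yet
    # existing, option; the artifacts/ path is illustrative.
    mkdir -p "${WORKSPACE:-.}/artifacts"
    sudo cc-collect-data.sh --all-packages > collect-data.log 2>&1 || true
    gzip -9 collect-data.log
    mv collect-data.log.gz "${WORKSPACE:-.}/artifacts/"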

@chavafg
Contributor

chavafg commented Mar 7, 2018

@jodh-intel yes, I think that would be best. Does cc-collect-data.sh collect the agent version? Because I have seen that it appears as unknown.

[Agent]
  Type = "hyperstart"
  Version = "<<unknown>>"

@jodh-intel
Contributor

@chavafg - good point! No, it doesn't.

I've had a think about this and I can think of two ways we could do this:

The gross hack

We could capture the agent version by adding something like a "--full" option to the cc-collect-data.sh script. With that option the script would run as normal, but would then:

  • enable full debug
  • change cc-collect-data.sh to run:
    sudo docker run --runtime cc-runtime busybox true

  • look at the proxy messages in the system journal, because the first message from the agent will contain its version string.

But it's a hack ;)
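For concreteness, the hack might look roughly like this (the journal identifier and grep pattern are guesses at the proxy's log format, not verified):

    # Sketch of the hack: start a throwaway container so the agent announces
    # itself, then fish its version out of the proxy's journal messages.
    # The 'cc-proxy' identifier and the grep pattern are guesses.
    sudo docker run --runtime cc-runtime busybox true
    sudo journalctl -t cc-proxy --no-pager | grep -i version | tail -n1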

The slightly-less gross option

Change the runtime so that it loop-mounts the currently configured container image read-only (with mount -o ro,noatime,noload (thanks @grahamwhaley)), then runs cc-agent --version and grabs the output.
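Outside the runtime, the same idea could be prototyped in the collect script along these lines (the image path and the agent binary location inside the image are assumptions, as is whether the agent binary will even run on the host):

    # Sketch: loop-mount the configured container image read-only and ask the
    # agent binary inside it for its version. The image path and the agent's
    # location within the image are assumptions.
    img="/usr/share/clear-containers/clear-containers.img"
    mnt=$(mktemp -d)
    sudo mount -o loop,ro,noatime,noload "$img" "$mnt"
    "$mnt/usr/bin/cc-agent" --version || true
    sudo umount "$mnt"
    rmdir "$mnt"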

That seems like the best option, but wdyt @grahamwhaley, @sboeuf, @sameo?

@grahamwhaley
Contributor

Very recently I had also considered that we could loop-mount the .img file and run the agent on the host with --version to extract that info. We could either do that in the collect script or have the runtime do it. Doing it in the runtime feels a little skanky, but I guess then we could in theory add the info into cc-env.

@jodh-intel
Contributor

jodh-intel commented Mar 7, 2018

I was having similar feelings about having that sort of code in the runtime too. That said, we do sort of have precedent if you look at cc-check.go which calls modinfo(8).

I'm happy for us to have this purely in the collect script, but yes, if it doesn't go in the runtime, we need to remove the Agent.Version field that @chavafg highlighted, as currently it's static.

@amshinde
Contributor Author

amshinde commented Mar 7, 2018

@chavafg @jodh-intel @grahamwhaley Gathering the agent version was one of the requirements I had in mind, as we were running with the wrong agent last week. What I really wanted to look at were the CRI-O logs, to check the lifecycle events and verify that the container storage driver we pass is actually the one being used by CRI-O.
I would say that for successful builds one is typically interested in the logs just after the build, so I am ok with keeping them around for a week, or even just a couple of days.

@grahamwhaley
Contributor

It looks like the Jenkins 'discard old builds' option may also give us the ability to specify how long to keep artifacts, btw.

mcastelino pushed a commit to mcastelino/tests that referenced this issue on Jan 23, 2019: "ci: cleanup: add timeouts to docker on cleanups" (branch …eout_docker)