Add metric collection via njmon
This patch adds a new role to manage "njmon". This tool collects
metrics on the hypervisor so that we can check how its resources were
used.

Setting the `cifmw_monitoring` boolean to `true` enables njmon
installation at the very beginning of the reproducer.yml run, as well as
graph generation at the very end of the run.

This can help in understanding potential issues, for instance those
related to memory shortage (oom-killer), slow I/O (disks) or clogged CPU
(overprovisioning).

One of the advantages of njmon, in addition to being really small and
light, is its ability to send data to a remote InfluxDB. This would
allow you to display the graphs in real time in Grafana, for instance.
cjeanner committed Aug 29, 2024
1 parent 88a6b80 commit d468184
Showing 16 changed files with 455 additions and 9 deletions.
4 changes: 4 additions & 0 deletions docs/dictionary/en-custom.txt
@@ -107,6 +107,7 @@ dataplane
dataplanedeployments
dataplanenodeset
dataplanenodesets
dataset
dcn
dd'd
ddr
@@ -206,6 +207,7 @@ igmp
igogicbjyxbzig
ihbyb
img
influxdb
ingressvips
ini
init
@@ -319,6 +321,8 @@ nfs
nftables
nic
nigzpbgugpsavdmfyl
njmon
njmonchart
nlcggvjgnsdxn
nmcli
nmstate
1 change: 1 addition & 0 deletions docs/source/usage/01_usage.md
@@ -69,6 +69,7 @@ are shared among multiple roles:
- `cifmw_parent_scenario`: (String or List(String)) path to existing scenario/parameter file to inherit from.
- `cifmw_configure_switches`: (Bool) Specifies whether switches should be configured. Computes in `reproducer.yml` playbook. Defaults to `false`.
- `cifmw_crc_default_network`: (String) name of the untagged network used to address DNS on the crc node. Default is `default`.
- `cifmw_monitoring`: (Bool) Enable metric collection via njmon on the hypervisor. Defaults to `false`.
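
As an illustration, a minimal parameter file enabling monitoring might look like the sketch below. The file name `my-params.yml` is hypothetical; only `cifmw_monitoring` comes from the list above.

```yaml
# my-params.yml (hypothetical file name, used only for this example)
# Enable njmon metric collection on the hypervisor for the reproducer run.
cifmw_monitoring: true
```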

```{admonition} Words of caution
:class: danger
7 changes: 7 additions & 0 deletions reproducer-clean.yml
@@ -59,6 +59,13 @@
path: "{{ lookup('env', 'HOME') }}/ci-framework-data/ci-reproducer"
state: absent

    - name: Remove njmon related data
      tags:
        - deepscrub
      ansible.builtin.import_role:
        name: "njmon"
        tasks_from: "cleanup.yml"

- name: Remove basedir
tags:
- never
34 changes: 25 additions & 9 deletions reproducer.yml
@@ -59,6 +59,12 @@
tasks_from: rhos_release.yml
roles:
- role: ci_setup
  tasks:
    - name: Deploy monitoring if requested
      when:
        - cifmw_monitoring | default(false) | bool
      ansible.builtin.include_role:
        name: njmon

- name: Prepare switches
vars:
@@ -77,12 +83,22 @@
ansible.builtin.command: # noqa: command-instead-of-module
cmd: iptables -I LIBVIRT_FWI 1 -o ocpbm -j ACCEPT

- name: Run deployment if instructed to
when:
- cifmw_deploy_architecture | default(false) | bool
no_log: "{{ cifmw_nolog | default(true) | bool }}"
async: "{{ 7200 + cifmw_test_operator_timeout | default(3600) }}" # 2h should be enough to deploy EDPM and rest for tests.
poll: 20
delegate_to: controller-0
ansible.builtin.command:
cmd: "/home/zuul/deploy-architecture.sh"
    - name: Try/always pattern
      block:
        - name: Run deployment if instructed to
          when:
            - cifmw_deploy_architecture | default(false) | bool
          no_log: "{{ cifmw_nolog | default(true) | bool }}"
          async: "{{ 7200 + cifmw_test_operator_timeout | default(3600) }}" # 2h should be enough to deploy EDPM and rest for tests.
          poll: 20
          delegate_to: controller-0
          ansible.builtin.command:
            cmd: "/home/zuul/deploy-architecture.sh"

      always:
        - name: Generate metrics if needed
          when:
            - cifmw_monitoring | default(false) | bool
          ansible.builtin.include_role:
            name: njmon
            tasks_from: chart.yml
63 changes: 63 additions & 0 deletions roles/njmon/README.md
@@ -0,0 +1,63 @@
# njmon

Install [njmon](https://nmon.sourceforge.io/pmwiki.php?n=Site.Njmon) from source,
configure it and run it in the background.

## Privilege escalation
None

## Parameters
* `cifmw_njmon_basedir`: Base directory. Defaults to `{{ cifmw_basedir }}` which defaults to `~/ci-framework-data`.
* `cifmw_njmon_repository`: njmon repository. Defaults to `http://sourceforge.net/projects/nmon/files`.
* `cifmw_njmon_release`: njmon release. Defaults to `v83`.
* `cifmw_njmon_archive`: njmon archive name. Defaults to `njmon_linux_{{ cifmw_njmon_release }}.zip`.
* `cifmw_njmon_output_dir`: Output directory for njmon data. Defaults to `{{ cifmw_njmon_basedir }}/artifacts/njmon_stats`.
* `cifmw_njmon_options`: Additional njmon options. Defaults to `[]`.
* `cifmw_njmon_chart_release`: njmonchart release. Defaults to `v40`.
* `cifmw_njmon_chart_archive`: njmonchart archive name. Defaults to `njmonchart_{{ cifmw_njmon_chart_release }}.zip`.
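
A minimal sketch of overriding some of these parameters from a variables file. The output path is an arbitrary example and not a recommended value; the release values simply restate the defaults listed above.

```yaml
# Illustrative overrides only.
cifmw_njmon_release: "v83"        # same as the default, pinned explicitly
cifmw_njmon_chart_release: "v40"  # same as the default, pinned explicitly
# Hypothetical location chosen for this example:
cifmw_njmon_output_dir: "/var/tmp/njmon_stats"
```

Since `cifmw_njmon_archive` and `cifmw_njmon_chart_archive` are derived from the release values, changing a release also changes the archive name that gets fetched.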

### Default options
By default, we inject the following options via the `cifmw_njmon_default_opts` parameter:
```
-m {{ cifmw_njmon_output_dir }}
-K {{ cifmw_njmon_basedir }}/tmp/njmon.pid
-f
-s 10
-n
```
It is NOT recommended to change those default options.

## How to visualize data

### InfluxDB

If you have a Grafana infrastructure, you can pass the needed options to instruct
njmon to ship its data to InfluxDB. Check `njmon -h` or the website for the correct options.
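
As a sketch, extra options would go through the `cifmw_njmon_options` list. The entries below are placeholders, not real njmon flags, and this assumes the role appends each list entry to the njmon command line, which this README does not spell out.

```yaml
# Placeholders only: replace each entry with the actual flag and value
# reported by `njmon -h` for your njmon release.
cifmw_njmon_options:
  - "<influxdb-host-option> <your-influxdb-host>"
  - "<influxdb-port-option> <your-influxdb-port>"
```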

### njmonchart

You can fetch [njmonchart](https://nmon.sourceforge.io/pmwiki.php?n=Site.Njmon) from the website,
and run it against the dataset. It will generate not-so-beautiful, yet useful charts to visualize
the resources.

You can also import the `chart.yml` tasks to fetch njmonchart and generate the charts (see examples).

## Examples

```yaml
- name: Deploy and start njmon
  ansible.builtin.import_role:
    name: njmon

# do your other tasks, resources will be recorded

- name: Install njmonchart and generate HTML outputs
  ansible.builtin.import_role:
    name: njmon
    tasks_from: chart.yml

- name: Cleanup njmon
  ansible.builtin.import_role:
    name: njmon
    tasks_from: cleanup.yml
```
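
If the recorded tasks may fail, the charts can still be produced by wrapping them in a `block` with an `always` section, similar to what `reproducer.yml` does in this change. The following is only a sketch: the workload script path is hypothetical.

```yaml
- name: Deploy and start njmon
  ansible.builtin.import_role:
    name: njmon

- name: Run the workload and render charts whatever the outcome
  block:
    - name: Run a workload (hypothetical script path)
      ansible.builtin.command:
        cmd: /usr/local/bin/my-workload.sh

  always:
    # Runs whether the block succeeded or failed.
    - name: Install njmonchart and generate HTML outputs
      ansible.builtin.include_role:
        name: njmon
        tasks_from: chart.yml
```

With the default options njmon records a sample every 10 seconds, so a very short workload may not leave enough data points for useful charts.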
31 changes: 31 additions & 0 deletions roles/njmon/defaults/main.yml
@@ -0,0 +1,31 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


# All variables intended for modification should be placed in this file.

cifmw_njmon_basedir: >-
  {{
    cifmw_basedir | default(ansible_user_dir ~ '/ci-framework-data')
  }}
cifmw_njmon_repository: http://sourceforge.net/projects/nmon/files
cifmw_njmon_release: "v83"
cifmw_njmon_archive: "njmon_linux_{{ cifmw_njmon_release }}.zip"
cifmw_njmon_output_dir: "{{ cifmw_njmon_basedir }}/artifacts/njmon_stats"
cifmw_njmon_options: []
# njmonchart related content
cifmw_njmon_chart_release: "v40"
cifmw_njmon_chart_archive: "njmonchart_{{ cifmw_njmon_chart_release }}.zip"
Empty file added roles/njmon/files/.gitkeep
Empty file.
30 changes: 30 additions & 0 deletions roles/njmon/meta/main.yml
@@ -0,0 +1,30 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


galaxy_info:
  author: CI Framework
  description: CI Framework Role -- njmon
  company: Red Hat
  license: Apache-2.0
  min_ansible_version: "2.14"
  namespace: cifmw
  galaxy_tags:
    - cifmw

# List your role dependencies here, one per line. Be sure to remove the '[]' above,
# if you add dependencies to this list.
dependencies: []
38 changes: 38 additions & 0 deletions roles/njmon/molecule/default/converge.yml
@@ -0,0 +1,38 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


- name: Converge
  hosts: all
  roles:
    - role: "njmon"
  tasks:
    # By default, njmon dumps data every 10 seconds.
    # Waiting 30 seconds should yield 3 sets
    # of data, enough to generate graphs.
    - name: Wait 30 seconds before generating graphs
      ansible.builtin.pause:
        seconds: 30

    - name: Install njmonchart and generate graphs
      ansible.builtin.import_role:
        name: "njmon"
        tasks_from: "chart.yml"

    - name: Clean njmon
      ansible.builtin.import_role:
        name: "njmon"
        tasks_from: "cleanup.yml"
11 changes: 11 additions & 0 deletions roles/njmon/molecule/default/molecule.yml
@@ -0,0 +1,11 @@
---
# Mainly used to override the defaults set in .config/molecule/
# By default, it uses the "config_podman.yml" - in CI, it will use
# "config_local.yml".
log: true

provisioner:
  name: ansible
  log: true
  env:
    ANSIBLE_STDOUT_CALLBACK: yaml
54 changes: 54 additions & 0 deletions roles/njmon/tasks/chart.yml
@@ -0,0 +1,54 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

- name: Create directories
  ansible.builtin.file:
    mode: "0755"
    path: "{{ cifmw_njmon_basedir }}/tmp/njmonchart"
    state: directory

- name: Get njmonchart archive
  ansible.builtin.unarchive:
    dest: "{{ cifmw_njmon_basedir }}/tmp/njmonchart"
    remote_src: true
    src: "{{ cifmw_njmon_repository }}/{{ cifmw_njmon_chart_archive }}"

- name: Gather existing JSONs
  register: _njmon_jsons
  ansible.builtin.find:
    path: "{{ cifmw_njmon_output_dir }}"
    pattern: "*.json"

# We don't want to use the `creates` parameter here:
# the JSON file is created once when njmon starts, and gets
# updated during the whole lifetime of the application.
# This means that, if we run the role multiple times, we'll
# get updated charts every time.
- name: Generate HTML from JSONs
  ansible.builtin.command:
    chdir: "{{ cifmw_njmon_basedir }}/tmp/njmonchart"
    cmd: >-
      python3 njmonchart_linux_{{ cifmw_njmon_chart_release }}.py
      {{ item.path }}
  loop: "{{ _njmon_jsons.files }}"
  loop_control:
    label: "{{ item.path | basename }}"

- name: Output HTML location
  ansible.builtin.debug:
    msg: >-
      You can find the generated HTML files in
      {{ cifmw_njmon_output_dir }}
65 changes: 65 additions & 0 deletions roles/njmon/tasks/cleanup.yml
@@ -0,0 +1,65 @@
---
# Copyright Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

- name: Ensure we have a PID file
  register: _njmon_running
  ansible.builtin.stat:
    path: "{{ cifmw_njmon_basedir }}/tmp/njmon.pid"
    get_attributes: false
    get_checksum: false
    get_mime: false

- name: Manage service if we have a PID
  when:
    - _njmon_running.stat.exists
  block:
    - name: Get njmon PID
      register: _njmon_pid
      ansible.builtin.slurp:
        path: "{{ cifmw_njmon_basedir }}/tmp/njmon.pid"

    # The service may already be dead. Let's not
    # fail the playbook in case "kill" can't find the PID.
    - name: Kill njmon using its PID
      vars:
        _pid: "{{ _njmon_pid.content | b64decode }}"
      failed_when: false
      ansible.builtin.command:
        cmd: "kill {{ _pid }}"

- name: Remove temporary directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  loop:
    - "{{ cifmw_njmon_basedir }}/tmp/njmon"
    - "{{ cifmw_njmon_basedir }}/tmp/njmonchart"

- name: Remove njmon binary
  ansible.builtin.file:
    path: "{{ ansible_user_dir }}/bin/njmon"
    state: absent

- name: Remove configuration and data (deepscrub only)
  tags:
    - never
    - deepscrub
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  loop:
    - "{{ cifmw_njmon_basedir }}/artifacts/njmon_opts.txt"
    - "{{ cifmw_njmon_output_dir }}"