feat(probe): Fortiswitch port stats #219

Open
wants to merge 6 commits into
base: master
34 changes: 33 additions & 1 deletion README.md
@@ -51,7 +51,37 @@ Global:
* _WebUI/State_
* `fortigate_last_reboot_seconds`
* `fortigate_last_snapshot_seconds`

* _SwitchController/ManageSwitch/PortStats_
* `fortiswitch_status`
* `fortiswitch_port_status`
* `fortiswitch_port_speed_bps`
* `fortiswitch_port_transmit_packets_total`
* `fortiswitch_port_transmit_bytes_total`
* `fortiswitch_port_transmit_unicast_packets_total`
* `fortiswitch_port_transmit_multicast_packets_total`
* `fortiswitch_port_transmit_broadcast_packets_total`
* `fortiswitch_port_transmit_errors_total`
* `fortiswitch_port_transmit_drops_total`
* `fortiswitch_port_transmit_oversized_packets_total`
* `fortiswitch_port_receive_packets_total`
* `fortiswitch_port_receive_bytes_total`
* `fortiswitch_port_receive_unicast_packets_total`
* `fortiswitch_port_receive_multicast_packets_total`
* `fortiswitch_port_receive_broadcast_packets_total`
* `fortiswitch_port_receive_errors_total`
* `fortiswitch_port_receive_drops_total`
* `fortiswitch_port_receive_oversized_packets_total`
* _SwitchController/ManageSwitch/Health_
* `fortiswitch_health_summary_cpu`
Owner commented:

  • Avoid using words like `mem` when `memory` works.
  • These health metric names have no unit in them (e.g. add `_celsius` to `fortiswitch_health_temperature`).

* `fortiswitch_health_summary_mem`
* `fortiswitch_health_summary_uptime`
* `fortiswitch_health_summary_temp`
* `fortiswitch_health_temperature`
* `fortiswitch_health_performance_stats_cpu_user`
* `fortiswitch_health_performance_stats_cpu_system`
* `fortiswitch_health_performance_stats_cpu_idle`
* `fortiswitch_health_performance_stats_cpu_nice`

Per-VDOM:

* _System/VDOMResources_
@@ -402,6 +432,8 @@ To improve security, limit permissions to required ones only (least privilege pr
|System/Status | *any* |api/v2/monitor/system/status |
|System/Time/Clock | sysgrp.cfg |api/v2/monitor/system/time |
|System/VDOMResources | sysgrp.cfg |api/v2/monitor/system/resource/usage |
|SwitchController/ManageSwitch/PortStats | |api/v2/monitor/switch-controller/managed-switch?port_stats=true |
|SwitchController/ManageSwitch/Health | |api/v2/monitor/switch-controller/managed-switch/health |
|User/Fsso | authgrp |api/v2/monitor/user/fsso |
|VPN/IPSec | vpngrp |api/v2/monitor/vpn/ipsec |
|VPN/Ssl/Connections | vpngrp |api/v2/monitor/vpn/ssl |
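
The Owner's naming comment on the README metric list above (spell out `memory`, put the unit in the metric name) could be checked mechanically. The following is only a sketch using the Go standard library: the helper `lintMetricName`, the accepted unit suffixes, the extra `temp` abbreviation, and the renamed metric `fortiswitch_health_temperature_celsius` are all assumptions made for illustration and are not part of this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSuffixes lists name endings this sketch accepts as carrying a unit.
// The list is an assumption for illustration, not taken from the PR.
var unitSuffixes = []string{"_celsius", "_seconds", "_ratio", "_bytes", "_total"}

// abbreviations maps shortened name segments to their spelled-out form.
// "mem" comes from the review comment; "temp" is an assumed extension.
var abbreviations = map[string]string{"mem": "memory", "temp": "temperature"}

// lintMetricName returns human-readable problems found in a metric name.
func lintMetricName(name string) []string {
	var problems []string
	for _, seg := range strings.Split(name, "_") {
		if full, ok := abbreviations[seg]; ok {
			problems = append(problems, fmt.Sprintf("segment %q should be %q", seg, full))
		}
	}
	hasUnit := false
	for _, suf := range unitSuffixes {
		if strings.HasSuffix(name, suf) {
			hasUnit = true
			break
		}
	}
	if !hasUnit {
		problems = append(problems, "name has no unit suffix")
	}
	return problems
}

func main() {
	names := []string{
		"fortiswitch_health_summary_mem",         // name from the PR
		"fortiswitch_health_temperature",         // name from the PR
		"fortiswitch_health_temperature_celsius", // hypothetical rename
	}
	for _, n := range names {
		fmt.Printf("%s: %v\n", n, lintMetricName(n))
	}
}
```

With the unit in the name, the `unit` label that the probe currently attaches to the temperature and CPU metrics would probably become redundant, though that is a judgment for the PR author and Owner.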
151 changes: 151 additions & 0 deletions pkg/probe/fortiswitch_health.go
@@ -0,0 +1,151 @@
package probe

import (
"log"

"github.com/bluecmd/fortigate_exporter/pkg/http"
"github.com/prometheus/client_golang/prometheus"
)

func probeSwitchHealth(c http.FortiHTTP, meta *TargetMetadata) ([]prometheus.Metric, bool) {
var (
mSumCPU = prometheus.NewDesc(
"fortiswitch_health_summary_cpu",
"Summary CPU health",
[]string{"rating", "fortiswitch", "VDOM"}, nil,
Owner commented:

Lowercase all labels

)
mSumMem = prometheus.NewDesc(
"fortiswitch_health_summary_mem",
"Summary MEM health",
[]string{"rating", "fortiswitch", "VDOM"}, nil,
)
mSumUpTime = prometheus.NewDesc(
"fortiswitch_health_summary_uptime",
"Summary Uptime",
Owner commented:

If you have the documentation, it would be really great if these descriptions could be made more verbose, so that I, for example, have some idea what these mean. But if you don't, that's fine; sometimes we simply don't know.

[]string{"rating", "fortiswitch", "VDOM"}, nil,
)
mSumTemp = prometheus.NewDesc(
"fortiswitch_health_summary_temp",
"Summary Temperature health",
[]string{"rating", "fortiswitch", "VDOM"}, nil,
)
mTemp = prometheus.NewDesc(
"fortiswitch_health_temperature",
"Temperature per switch sensor",
[]string{"unit", "module", "fortiswitch", "VDOM"}, nil,
)
mCpuUser = prometheus.NewDesc(
"fortiswitch_health_performance_stats_cpu_user",
"Fortiswitch CPU user usage",
[]string{"unit", "fortiswitch", "VDOM"}, nil,
)
mCpuSystem = prometheus.NewDesc(
"fortiswitch_health_performance_stats_cpu_system",
"Fortiswitch CPU system usage",
[]string{"unit", "fortiswitch", "VDOM"}, nil,
)
mCpuIdle = prometheus.NewDesc(
"fortiswitch_health_performance_stats_cpu_idle",
"Fortiswitch CPU idle",
[]string{"unit", "fortiswitch", "VDOM"}, nil,
)
mCpuNice = prometheus.NewDesc(
"fortiswitch_health_performance_stats_cpu_nice",
"Fortiswitch CPU nice usage",
[]string{"unit", "fortiswitch", "VDOM"}, nil,
)
)
type Sum struct {
Value float64 `json:"value"`
Rating string `json:"rating"`
}
type Status struct {
Value float64 `json:"value"`
Unit string `json:"unit"`
}
type Uptime struct {
Days Status `json:"days"`
Hours Status `json:"hours"`
Minutes Status `json:"minutes"`
}
type Network struct {
In1Min Status `json:"in-1min"`
In10Min Status `json:"in-10min"`
In30Min Status `json:"in-30min"`
}
type Memory struct {
Used Status `json:"used"`
}
type CPU struct {
User Status `json:"user"`
System Status `json:"system"`
Nice Status `json:"nice"`
Idle Status `json:"idle"`
}
type PerformanceStatus struct {
CPU CPU `json:"cpu"`
Memory Memory `json:"memory"`
Network Network `json:"network"`
Uptime Uptime `json:"uptime"`
}
type Temperature struct {
Module string
Status Status
}
type Summary struct {
Overall string `json:"overall"`
CPU Sum
Memory Sum
Uptime Sum
Temperature Sum
}
type Poe struct {
Value int `json:"value"`
MaxValue int `json:"max_value"`
Unit string `json:"unit"`
}
type Results struct {
PerformanceStatus PerformanceStatus `json:"performance-status"`
Temperature []Temperature `json:"temperature"`
Summary Summary `json:"summary"`
Poe Poe `json:"poe"`
}

type swResponse struct {
Results map[string]Results `json:"results"`
VDOM string
}
var r swResponse
//var r map[string]swResponse
Owner commented:

Remove dead code

//var r []swResponse

if err := c.Get("api/v2/monitor/switch-controller/managed-switch/health", "vdom=root", &r); err != nil {
log.Printf("Error: %v", err)
return nil, false
}
m := []prometheus.Metric{}
//for _, sw := range r {
for fswitch, hr := range r.Results {

m = append(m, prometheus.MustNewConstMetric(mSumCPU, prometheus.GaugeValue, hr.Summary.CPU.Value, hr.Summary.CPU.Rating, fswitch, r.VDOM))
Owner commented:

The rating likely shouldn't be a label but rather a metric like "xxx_is_good". That, or an enum label on a _rating metric. Otherwise it's really hard to use it in alerts.

m = append(m, prometheus.MustNewConstMetric(mSumMem, prometheus.GaugeValue, hr.Summary.Memory.Value, hr.Summary.Memory.Rating, fswitch, r.VDOM))
m = append(m, prometheus.MustNewConstMetric(mSumUpTime, prometheus.GaugeValue, hr.Summary.Uptime.Value, hr.Summary.Uptime.Rating, fswitch, r.VDOM))
m = append(m, prometheus.MustNewConstMetric(mSumTemp, prometheus.GaugeValue, hr.Summary.Temperature.Value, hr.Summary.Temperature.Rating, fswitch, r.VDOM))

for _, ts := range hr.Temperature {
m = append(m, prometheus.MustNewConstMetric(mTemp, prometheus.GaugeValue, ts.Status.Value, ts.Status.Unit, ts.Module, fswitch, r.VDOM))
}

CpuUnit := hr.PerformanceStatus.CPU.System.Unit
/*if CpuUnit == "%" {
CpuUnit = "%%"
}*/

m = append(m, prometheus.MustNewConstMetric(mCpuUser, prometheus.GaugeValue, hr.PerformanceStatus.CPU.User.Value, CpuUnit, fswitch, r.VDOM))
m = append(m, prometheus.MustNewConstMetric(mCpuNice, prometheus.GaugeValue, hr.PerformanceStatus.CPU.Nice.Value, CpuUnit, fswitch, r.VDOM))
m = append(m, prometheus.MustNewConstMetric(mCpuSystem, prometheus.GaugeValue, hr.PerformanceStatus.CPU.System.Value, CpuUnit, fswitch, r.VDOM))
m = append(m, prometheus.MustNewConstMetric(mCpuIdle, prometheus.GaugeValue, hr.PerformanceStatus.CPU.Idle.Value, CpuUnit, fswitch, r.VDOM))
}
//}
return m, true
}
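
The two alternatives the Owner suggests for the rating (a 0/1 "is good" metric, or an enum-style label on a `_rating` metric) can be sketched without the Prometheus client library. This is an illustration under assumptions, not the PR's code: the helpers `isGood` and `ratingEnum`, the metric names in the output, and the rating values `fair` and `poor` are hypothetical (only `good` appears in the PR's test data). The label names are lowercased, in line with the earlier review comment.

```go
package main

import "fmt"

// knownRatings is the set of rating values this sketch emits series for.
// Only "good" appears in the PR's test data; "fair" and "poor" are
// placeholders assumed for illustration.
var knownRatings = []string{"good", "fair", "poor"}

// isGood maps a rating string to a 0/1 gauge value.
func isGood(rating string) float64 {
	if rating == "good" {
		return 1
	}
	return 0
}

// ratingEnum returns one sample per known rating:
// 1 for the current rating, 0 for every other known rating.
func ratingEnum(current string) map[string]float64 {
	out := make(map[string]float64, len(knownRatings))
	for _, r := range knownRatings {
		if r == current {
			out[r] = 1
		} else {
			out[r] = 0
		}
	}
	return out
}

func main() {
	sw, vdom, rating := "FS00000000000024", "root", "good"

	// Option 1: a boolean-style metric.
	fmt.Printf("fortiswitch_health_summary_cpu_is_good{fortiswitch=%q,vdom=%q} %g\n",
		sw, vdom, isGood(rating))

	// Option 2: an enum-style metric with one series per possible rating.
	enum := ratingEnum(rating)
	for _, r := range knownRatings {
		fmt.Printf("fortiswitch_health_summary_cpu_rating{fortiswitch=%q,rating=%q,vdom=%q} %g\n",
			sw, r, vdom, enum[r])
	}
}
```

Either form keeps the set of series stable when a rating changes, so an alert can compare a value against 0 or 1 instead of matching on a label that appears and disappears.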
90 changes: 90 additions & 0 deletions pkg/probe/fortiswitch_health_test.go
@@ -0,0 +1,90 @@
package probe

import (
"strings"
"testing"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestSwitchHealth(t *testing.T) {
c := newFakeClient()
c.prepare("api/v2/monitor/switch-controller/managed-switch/health", "testdata/fsw-health.jsonnet")
r := prometheus.NewPedanticRegistry()
if !testProbe(probeSwitchHealth, c, r) {
t.Errorf("probeSwitchHealth() returned non-success")
}

em := `
# HELP fortiswitch_health_performance_stats_cpu_idle Fortiswitch CPU idle
# TYPE fortiswitch_health_performance_stats_cpu_idle gauge
fortiswitch_health_performance_stats_cpu_idle{VDOM="root",fortiswitch="FS00000000000024",unit="%%"} 100
fortiswitch_health_performance_stats_cpu_idle{VDOM="root",fortiswitch="FS00000000000027",unit="%%"} 100
fortiswitch_health_performance_stats_cpu_idle{VDOM="root",fortiswitch="FS00000000000030",unit="%%"} 100
fortiswitch_health_performance_stats_cpu_idle{VDOM="root",fortiswitch="FS00000000000038",unit="%%"} 100
# HELP fortiswitch_health_performance_stats_cpu_nice Fortiswitch CPU nice usage
# TYPE fortiswitch_health_performance_stats_cpu_nice gauge
fortiswitch_health_performance_stats_cpu_nice{VDOM="root",fortiswitch="FS00000000000024",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_nice{VDOM="root",fortiswitch="FS00000000000027",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_nice{VDOM="root",fortiswitch="FS00000000000030",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_nice{VDOM="root",fortiswitch="FS00000000000038",unit="%%"} 0
# HELP fortiswitch_health_performance_stats_cpu_system Fortiswitch CPU system usage
# TYPE fortiswitch_health_performance_stats_cpu_system gauge
fortiswitch_health_performance_stats_cpu_system{VDOM="root",fortiswitch="FS00000000000024",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_system{VDOM="root",fortiswitch="FS00000000000027",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_system{VDOM="root",fortiswitch="FS00000000000030",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_system{VDOM="root",fortiswitch="FS00000000000038",unit="%%"} 0
# HELP fortiswitch_health_performance_stats_cpu_user Fortiswitch CPU user usage
# TYPE fortiswitch_health_performance_stats_cpu_user gauge
fortiswitch_health_performance_stats_cpu_user{VDOM="root",fortiswitch="FS00000000000024",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_user{VDOM="root",fortiswitch="FS00000000000027",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_user{VDOM="root",fortiswitch="FS00000000000030",unit="%%"} 0
fortiswitch_health_performance_stats_cpu_user{VDOM="root",fortiswitch="FS00000000000038",unit="%%"} 0
# HELP fortiswitch_health_summary_cpu Summary CPU health
# TYPE fortiswitch_health_summary_cpu gauge
fortiswitch_health_summary_cpu{VDOM="root",fortiswitch="FS00000000000024",rating="good"} 0
fortiswitch_health_summary_cpu{VDOM="root",fortiswitch="FS00000000000027",rating="good"} 0
fortiswitch_health_summary_cpu{VDOM="root",fortiswitch="FS00000000000030",rating="good"} 0
fortiswitch_health_summary_cpu{VDOM="root",fortiswitch="FS00000000000038",rating="good"} 0
# HELP fortiswitch_health_summary_mem Summary MEM health
# TYPE fortiswitch_health_summary_mem gauge
fortiswitch_health_summary_mem{VDOM="root",fortiswitch="FS00000000000024",rating="good"} 10
fortiswitch_health_summary_mem{VDOM="root",fortiswitch="FS00000000000027",rating="good"} 15
fortiswitch_health_summary_mem{VDOM="root",fortiswitch="FS00000000000030",rating="good"} 50
fortiswitch_health_summary_mem{VDOM="root",fortiswitch="FS00000000000038",rating="good"} 32
# HELP fortiswitch_health_summary_temp Summary Temperature health
# TYPE fortiswitch_health_summary_temp gauge
fortiswitch_health_summary_temp{VDOM="root",fortiswitch="FS00000000000024",rating="good"} 48.952749999999995
fortiswitch_health_summary_temp{VDOM="root",fortiswitch="FS00000000000027",rating="good"} 46.156000000000006
fortiswitch_health_summary_temp{VDOM="root",fortiswitch="FS00000000000030",rating="good"} 39.71875
fortiswitch_health_summary_temp{VDOM="root",fortiswitch="FS00000000000038",rating="good"} 41.624750000000006
# HELP fortiswitch_health_summary_uptime Summary Uptime
# TYPE fortiswitch_health_summary_uptime gauge
fortiswitch_health_summary_uptime{VDOM="root",fortiswitch="FS00000000000024",rating="good"} 3.928968e+07
fortiswitch_health_summary_uptime{VDOM="root",fortiswitch="FS00000000000027",rating="good"} 3.928974e+07
fortiswitch_health_summary_uptime{VDOM="root",fortiswitch="FS00000000000030",rating="good"} 2.661288e+07
fortiswitch_health_summary_uptime{VDOM="root",fortiswitch="FS00000000000038",rating="good"} 2.661258e+07
# HELP fortiswitch_health_temperature Temperature per switch sensor
# TYPE fortiswitch_health_temperature gauge
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000024",module="sensor1(CPU Board Temp)",unit="celsius"} 41.937
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000024",module="sensor2(MAIN Board Temp1)",unit="celsius"} 63.875
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000024",module="sensor3(MAIN Board Temp2)",unit="celsius"} 51.312
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000024",module="sensor4(MAIN Board Temp3)",unit="celsius"} 38.687
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000027",module="sensor1(CPU Board Temp)",unit="celsius"} 39
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000027",module="sensor2(MAIN Board Temp1)",unit="celsius"} 60.625
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000027",module="sensor3(MAIN Board Temp2)",unit="celsius"} 48.937
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000027",module="sensor4(MAIN Board Temp3)",unit="celsius"} 36.062
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000030",module="sensor1(CPU Board Temp)",unit="celsius"} 33.875
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000030",module="sensor2(MAIN Board Temp1)",unit="celsius"} 53.75
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000030",module="sensor3(MAIN Board Temp2)",unit="celsius"} 41
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000030",module="sensor4(MAIN Board Temp3)",unit="celsius"} 30.25
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000038",module="sensor1(CPU Board Temp)",unit="celsius"} 35.437
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000038",module="sensor2(MAIN Board Temp1)",unit="celsius"} 55.625
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000038",module="sensor3(MAIN Board Temp2)",unit="celsius"} 43.125
fortiswitch_health_temperature{VDOM="root",fortiswitch="FS00000000000038",module="sensor4(MAIN Board Temp3)",unit="celsius"} 32.312
`
if err := testutil.GatherAndCompare(r, strings.NewReader(em)); err != nil {
t.Fatalf("metric compare: err %v", err)
}
}