Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Ilya Kheifets <[email protected]>
1 parent 20ebf8f, commit cbf165b
Showing 8 changed files with 312 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# Changelog | ||
|
||
## Unreleased | ||
- add metrics dashboard | ||
|
||
### Changed | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,254 @@ | ||
<form version="1.1" theme="dark"> | ||
<label>sc4snmp</label> | ||
<fieldset submitButton="false" autoRun="true"></fieldset> | ||
<row> | ||
<panel> | ||
<title>SNMP polling status</title> | ||
<input type="dropdown" token="poll_status_host" searchWhenChanged="true"> | ||
<label>SNMP device</label> | ||
<choice value="*">all</choice> | ||
<default>*</default> | ||
<initialValue>*</initialValue> | ||
<fieldForLabel>ip</fieldForLabel> | ||
<fieldForValue>ip</fieldForValue> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" "Scheduler: Sending due task sc4snmp;*;*;poll" | rex field=_raw "Sending due task sc4snmp;(?<ip>.+);(?<num>\d+);poll" | stats count by ip</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
</search> | ||
</input> | ||
<chart> | ||
<title>If the polling status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.snmp.tasks.poll $poll_status_host$ | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.axisTitleX.visibility">visible</option> | ||
<option name="charting.axisTitleY.visibility">visible</option> | ||
<option name="charting.axisTitleY2.visibility">visible</option> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.chart.nullValueMode">connect</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="charting.legend.placement">right</option> | ||
<option name="height">331</option> | ||
<option name="refresh.display">progressbar</option> | ||
<option name="trellis.enabled">0</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20sourcetype%3D%22*%3Acontainer%3Asplunk-connect-for-snmp-*%22%20splunk_connect_for_snmp.snmp.tasks.poll%20$poll_status_host$%20%7C%20rex%20field%3D_raw%20%22Task%20splunk_connect_for_snmp.*%5C%5B*%5C%5D%20(%3F%3Cstatus%3E%5Cw%2B)%22%20%7C%20where%20status%20!%3D%20%22received%22&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP schedule of polling tasks</title> | ||
<input type="dropdown" token="poll_host" searchWhenChanged="true"> | ||
<label>SNMP device</label> | ||
<choice value="*">all</choice> | ||
<default>*</default> | ||
<initialValue>*</initialValue> | ||
<fieldForLabel>ip</fieldForLabel> | ||
<fieldForValue>ip</fieldForValue> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" "Scheduler: Sending due task sc4snmp;*;*;poll" | rex field=_raw "Sending due task sc4snmp;(?<ip>.+);(?<num>\d+);poll" | stats count by ip</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
</search> | ||
</input> | ||
<chart> | ||
<title>This chart shows when SC4SNMP last scheduled polling for your SNMP device. Scheduling works if the tasks run regularly.</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" Scheduler: Sending due task sc4snmp;$poll_host$;*poll | timechart count</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="height">331</option> | ||
<option name="refresh.display">progressbar</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20sourcetype%3D%22*%3Acontainer%3Asplunk-connect-for-snmp-*%22%20Scheduler%3A%20Sending%20due%20task%20sc4snmp%3B$poll_host$%3B*poll&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
</row> | ||
<row> | ||
<panel> | ||
<title>SNMP walk status</title> | ||
<input type="dropdown" token="walk_status_host"> | ||
<label>SNMP device</label> | ||
<choice value="*">all</choice> | ||
<default>*</default> | ||
<initialValue>*</initialValue> | ||
<fieldForLabel>ip</fieldForLabel> | ||
<fieldForValue>ip</fieldForValue> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" "Scheduler: Sending due task sc4snmp;*;walk" | rex field=_raw "Sending due task sc4snmp;(?<ip>.+);walk" | stats count by ip</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
</search> | ||
</input> | ||
<chart> | ||
<title>If the walk status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.snmp.tasks.walk $walk_status_host$ | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="height">327</option> | ||
<option name="refresh.display">progressbar</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20sourcetype%3D%22*%3Acontainer%3Asplunk-connect-for-snmp-*%22%20splunk_connect_for_snmp.snmp.tasks.walk%20$walk_status_host$%20%7C%20rex%20field%3D_raw%20%22Task%20splunk_connect_for_snmp.*%5C%5B*%5C%5D%20(%3F%3Cstatus%3E%5Cw%2B)%22%20%7C%20where%20status%20!%3D%20%22received%22&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP schedule for walk tasks</title> | ||
<input type="dropdown" token="walk_host"> | ||
<label>SNMP device</label> | ||
<choice value="*">all</choice> | ||
<default>*</default> | ||
<initialValue>*</initialValue> | ||
<fieldForLabel>ip</fieldForLabel> | ||
<fieldForValue>ip</fieldForValue> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" "Scheduler: Sending due task sc4snmp;*;walk" | rex field=_raw "Sending due task sc4snmp;(?<ip>.+);walk" | stats count by ip</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
</search> | ||
</input> | ||
<chart> | ||
<title>This chart shows when SC4SNMP last scheduled a walk for your SNMP device. Scheduling works if the tasks run regularly.</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" Scheduler: Sending due task sc4snmp;$walk_host$;walk | timechart count</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="height">324</option> | ||
<option name="refresh.display">progressbar</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20sourcetype%3D%22*%3Acontainer%3Asplunk-connect-for-snmp-*%22%20Scheduler%3A%20Sending%20due%20task%20sc4snmp%3B$walk_host$%3Bwalk&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
</row> | ||
<row> | ||
<panel> | ||
<title>SNMP trap status</title> | ||
<chart> | ||
<title>If the trap status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.snmp.tasks.trap | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="height">332</option> | ||
<option name="refresh.display">progressbar</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20sourcetype%3D%22*%3Acontainer%3Asplunk-connect-for-snmp-*%22%20splunk_connect_for_snmp.snmp.tasks.trap%20%7C%20rex%20field%3D_raw%20%22Task%20splunk_connect_for_snmp.*%5C%5B*%5C%5D%20(%3F%3Cstatus%3E%5Cw%2B)%22%20%7C%20where%20status%20!%3D%20%22received%22&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP trap authorisation</title> | ||
<chart> | ||
<title>A failed status means that you have an SNMP authorisation problem.</title> | ||
<search> | ||
<query>index=* "ERROR Security Model failure for device" OR "splunk_connect_for_snmp.snmp.tasks.trap\[*\] succeeded" | eval status=if(searchmatch("succeeded"), "succeeded", "failed") | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">all</option> | ||
<option name="height">329</option> | ||
<option name="refresh.display">progressbar</option> | ||
<drilldown> | ||
<link target="_blank">search?q=index%3D*%20%22ERROR%20Security%20Model%20failure%20for%20device%22&earliest=-24h@h&latest=now</link> | ||
</drilldown> | ||
</chart> | ||
</panel> | ||
</row> | ||
<row> | ||
<panel> | ||
<title>SNMP send to Splunk status</title> | ||
<chart> | ||
<title>If the send status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.splunk.tasks.send | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">none</option> | ||
<option name="refresh.display">progressbar</option> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP enrich task status</title> | ||
<chart> | ||
<title>If the enrich status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.enrich.tasks.enrich | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">none</option> | ||
<option name="refresh.display">progressbar</option> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP prepare task status</title> | ||
<chart> | ||
<title>If the prepare status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.splunk.tasks.prepare | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">none</option> | ||
<option name="refresh.display">progressbar</option> | ||
</chart> | ||
</panel> | ||
<panel> | ||
<title>SNMP inventory poller task status</title> | ||
<chart> | ||
<title>If the inventory poller status is unsuccessful, copy the SPL query from this chart to find the failed tasks. Error log messages are explained at https://splunk.github.io/splunk-connect-for-snmp/main/bestpractices/</title> | ||
<search> | ||
<query>index=* sourcetype="*:container:splunk-connect-for-snmp-*" splunk_connect_for_snmp.inventory.tasks.inventory_setup_poller | rex field=_raw "Task splunk_connect_for_snmp.*\[*\] (?<status>\w+)" | where status != "received" | timechart count by status</query> | ||
<earliest>-24h@h</earliest> | ||
<latest>now</latest> | ||
<refresh>5m</refresh> | ||
<refreshType>delay</refreshType> | ||
</search> | ||
<option name="charting.chart">line</option> | ||
<option name="charting.drilldown">none</option> | ||
<option name="refresh.display">progressbar</option> | ||
</chart> | ||
</panel> | ||
</row> | ||
</form> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# Dashboard | ||
|
||
Using the dashboard, you can monitor SC4SNMP and confirm that it is healthy and working correctly. | ||
|
||
## Presetting | ||
|
||
1. [Create metrics indexes](gettingstarted/splunk-requirements.md#requirements-for-splunk-enterprise-or-enterprise-cloud) in Splunk. | ||
2. Enable metrics logging for your runtime: | ||
* For K8S install [Splunk OpenTelemetry Collector for K8S](gettingstarted/sck-installation.md) | ||
* For docker-compose use [Splunk logging driver for docker](dockercompose/9-splunk-logging.md) | ||
|
||
## Install dashboard | ||
|
||
1. In the Splunk platform, open **Search -> Dashboards**. | ||
2. Click **Create New Dashboard** and create an empty dashboard. Be sure to choose Classic Dashboards. | ||
3. In the **Edit Dashboard** view, go to Source and replace the initial XML with the contents of [dashboard/dashboard.xml](https://github.com/splunk/splunk-connect-for-snmp/blob/main/dashboard/dashboard.xml) published in the SC4SNMP repository. | ||
4. Save your changes. Your dashboard is ready to use. | ||
|
||
|
||
## Metrics explanation | ||
|
||
### Polling dashboards | ||
|
||
To check that polling of your device is working correctly, first check the **SNMP schedule of polling tasks** dashboard. | ||
This chart shows when SC4SNMP last scheduled polling for your SNMP device. Scheduling works if the tasks run regularly. | ||
|
||
After confirming that SC4SNMP schedules polling tasks for your SNMP device, make sure that polling itself is working. | ||
To do that, look at the **SNMP polling status** dashboard. If everything is working, you will see only the **succeeded** status. | ||
If something is going wrong, you will also see other statuses (as in the screenshot). In that case, use the [troubleshooting docs](bestpractices.md). | ||
|
||
![Polling dashboards](images/dashboard/polling_dashboard.png) | ||
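
As a rough illustration of what the **SNMP polling status** chart counts, the sketch below applies the same pattern as the dashboard's `rex` command to a few log lines in Python and drops the `received` status, as the dashboard query does. The sample log lines are hypothetical and only approximate the shape the pattern expects. Python spells the named group `(?P<status>...)` where SPL uses `(?<status>...)`.

```python
import re
from collections import Counter

# Same pattern as the dashboard's rex command, with Python's named-group syntax.
STATUS_RE = re.compile(r"Task splunk_connect_for_snmp.*\[*\] (?P<status>\w+)")

# Hypothetical log lines, invented for illustration only.
SAMPLE_LINES = [
    "Task splunk_connect_for_snmp.snmp.tasks.poll[11111111-aaaa] received",
    "Task splunk_connect_for_snmp.snmp.tasks.poll[11111111-aaaa] succeeded in 0.42s",
    "Task splunk_connect_for_snmp.snmp.tasks.poll[22222222-bbbb] received",
    "Task splunk_connect_for_snmp.snmp.tasks.poll[22222222-bbbb] retry: timeout",
]


def count_statuses(lines):
    """Count task statuses, ignoring 'received' like the dashboard query."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group("status") != "received":
            counts[match.group("status")] += 1
    return counts


status_counts = count_statuses(SAMPLE_LINES)
print(dict(status_counts))
```

A healthy device would show only `succeeded` in this count. Any other key is a status worth investigating with the troubleshooting docs.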
|
||
### Walk dashboards | ||
|
||
To check that the walk of your device is working correctly, first check the **SNMP schedule for walk tasks** dashboard. | ||
This chart shows when SC4SNMP last scheduled a walk for your SNMP device. Scheduling works if the tasks run regularly. | ||
|
||
After confirming that SC4SNMP schedules walk tasks for your SNMP device, make sure that the walk itself is working. | ||
To do that, look at the **SNMP walk status** dashboard. If everything is working, you will see only the **succeeded** status. | ||
If something is going wrong, you will also see other statuses (as in the screenshot). In that case, use the [troubleshooting docs](bestpractices.md). | ||
|
||
![Walk dashboards](images/dashboard/walk_dashboard.png) | ||
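
The **SNMP device** dropdown on the walk panels is filled from scheduler log lines. The sketch below mimics that `rex ... | stats count by ip` step in Python, assuming scheduler lines of the form `Sending due task sc4snmp;<ip>;walk`. The sample lines and addresses are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the dropdown's rex command, with Python's named-group syntax.
WALK_RE = re.compile(r"Sending due task sc4snmp;(?P<ip>.+);walk")

# Hypothetical scheduler log lines, invented for illustration only.
SAMPLE_LINES = [
    "Scheduler: Sending due task sc4snmp;10.0.0.1;walk",
    "Scheduler: Sending due task sc4snmp;10.0.0.2;walk",
    "Scheduler: Sending due task sc4snmp;10.0.0.1;walk",
    "Scheduler: Sending due task sc4snmp;10.0.0.1;1;poll",  # poll task, not a walk
]


def walks_per_device(lines):
    """Count scheduled walk tasks per device, like 'stats count by ip'."""
    counts = Counter()
    for line in lines:
        match = WALK_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


walk_counts = walks_per_device(SAMPLE_LINES)
print(dict(walk_counts))
```

A device that is missing from this count has had no walk scheduled in the searched time range.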
|
||
### Trap dashboards | ||
|
||
First, check the **SNMP trap authorisation** dashboard. If you see only the **succeeded** status, authorisation is configured correctly. Otherwise, use the [troubleshooting docs](bestpractices.md#identifying-traps-issues). | ||
|
||
After confirming that there are no trap authorisation issues, check that the trap tasks are working correctly. To do that, go to the **SNMP trap status** dashboard. If you see only the **succeeded** status, everything is working. Otherwise, you will see entries with other statuses. | ||
|
||
![Trap dashboards](images/dashboard/trap_dashboard.png) | ||
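
The **SNMP trap authorisation** chart labels each matching event as `succeeded` or `failed` with `eval status=if(searchmatch("succeeded"), "succeeded", "failed")`. A minimal Python approximation of that labelling is sketched below. It assumes the lines have already been filtered by the dashboard's base search, uses a plain case-sensitive substring check in place of `searchmatch`, and runs on hypothetical sample lines.

```python
from collections import Counter

# Hypothetical events that already matched the dashboard's base search.
SAMPLE_LINES = [
    "ERROR Security Model failure for device 10.0.0.5",
    "Task splunk_connect_for_snmp.snmp.tasks.trap[33333333-cccc] succeeded in 0.10s",
    "ERROR Security Model failure for device 10.0.0.6",
]


def classify(line):
    """Label an event the way the dashboard's eval does (approximation)."""
    return "succeeded" if "succeeded" in line else "failed"


auth_counts = Counter(classify(line) for line in SAMPLE_LINES)
print(dict(auth_counts))
```

Any non-zero `failed` count corresponds to the authorisation problem described above.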
|
||
### Other dashboards | ||
|
||
SC4SNMP also has tasks that run as callbacks for walk and poll. For example, **send** publishes results to Splunk. After a successful walk or poll, these callbacks must finish as well. Check that these tasks show only a successful status. | ||
|
||
![Other dashboards](images/dashboard/other_dashboard.png) |