It is being worked on in the open in an effort to replace the node manager plugin. Please report any bugs by creating an issue.
Plugin to collect inventory, health, power/thermal related metrics from platforms which expose such data through Intel Node Manager, DCMI or generic IPMI interfaces. Currently it is using Ipmitool or open ipmi driver to collect data. This plugin is based on the snap-plugin-collector-node-manager and extend the coverage to more IPMI enabled platforms.
Plugin collects specified metrics in-band on OS level
- Support server platforms with Intel Node Manager, DCMI or generic IPMI support.
- Currently it works only on Linux Servers (will be tested on a subset of Linux distributions)
- Ipmitool needs to be installed on platform
You can get the pre-built binaries for your OS and architecture at snap's Github Releases page.
Fork https://github.com/intelsdi-x/snap-plugin-collector-intel-dcm-platform
Clone repo into $GOPATH/src/github/intelsdi-x/
:
$ git clone https://github.com/<yourGithubID>/snap-plugin-collector-intel-dcm-platform
Build the plugin by running make in repo:
$ make
This builds the plugin in /build/rootfs
On OS level user needs to load modules:
- ipmi_msghandler
- ipmi_devintf
- ipmi_si
Those modules provides specific IPMI device which can collect data from NM, DCMI or generic IPMI
There are currently 7 configuration options:
- mode - defines mode of plugin work, possible values: legacy_inband, legacy_inband_openipmi, oob
- channel - defines communication channel address (default: "0x00")
- slave - defines target address (default: "0x00")
- user - for OOB mode only, user for authentication to remote host
- password - for OOB mode only, password for authentication to remote host
- host - for OOB mode only, BMC IP address of host which will be monitored OOB
- protocol - defines the communication protocol used to collect metric data, possible values: node_manager, dcmi, ipmi
Sample configuration of intel dcm platform plugin:
{
"control" : {
"plugins": {
"collector": {
"intel-dcm-platform": {
"all": {
"protocol": "node_manager",
"mode": "legacy_inband",
"channel": "0x06",
"slave": "0x2C"
}
}
}
}
}
}
This plugin has the ability to gather the following metrics:
Namespace | Data Type | Description (optional) |
---|---|---|
/intel/dcm/airflow/cur | uint16 | Current Volumetric Airflow |
/intel/dcm/airflow/avg | uint16 | Average Volumetric Airflow |
/intel/dcm/airflow/max | uint16 | Maximal Volumetric Airflow |
/intel/dcm/airflow/min | uint16 | Minimal Volumetric Airflow |
/intel/dcm/cups/cpu_cstate | uint16 | CUPS CPU Bandwidth |
/intel/dcm/cups/io_bandwith | uint16 | CUPS I/O Bandwidth |
/intel/dcm/cups/memory_bandwith | uint16 | CUPS Memory Bandwidth |
/intel/dcm/power/cpu/cur | uint16 | Current CPU power consumption |
/intel/dcm/power/cpu/avg | uint16 | Average CPU power consumption |
/intel/dcm/power/cpu/max | uint16 | Maximal CPU power consumption |
/intel/dcm/power/cpu/min | uint16 | Minimal CPU power consumption |
/intel/dcm/power/policy/power_limit | uint16 | Power policy |
/intel/dcm/margin/cpu/tj | uint16 | Margin-to-throttle functional (CPU) |
/intel/dcm/margin/cpu/tj/margin_offset | uint16 | Margin-to-spec reliability (CPU) |
/intel/dcm/power/memory/cur | uint16 | Current Memory power consumption |
/intel/dcm/power/memory/avg | uint16 | Average Memory power consumption |
/intel/dcm/power/memory/max | uint16 | Maximal Memory power consumption |
/intel/dcm/power/memory/min | uint16 | Minimal Memory power consumption |
/intel/dcm/power/system/cur | uint16 | Current Platform power consumption |
/intel/dcm/power/system/avg | uint16 | Average Platform power consumption |
/intel/dcm/power/system/max | uint16 | Maximal Platform power consumption |
/intel/dcm/power/system/min | uint16 | Minimal Platform power consumption |
/intel/dcm/temperature/cpu/cpu/<cpu_id> | uint16 | Current CPU temperature |
/intel/dcm/temperature/pmbus/VR/<VR_id> | uint16 | Current VR's temperature |
/intel/dcm/temperature/memory/dimm/<dimm_id> | uint16 | Current Memory dimms temperature |
/intel/dcm/temperature/outlet/cur | uint16 | Current Outlet (exhaust air) temperature |
/intel/dcm/temperature/outlet/avg | uint16 | Average Outlet (exhaust air) temperature |
/intel/dcm/temperature/outlet/max | uint16 | Maximal Outlet (exhaust air) temperature |
/intel/dcm/temperature/outlet/min | uint16 | Minimal Outlet (exhaust air) temperature |
/intel/dcm/temperature/inlet/cur | uint16 | Current Inlet Temperature |
/intel/dcm/temperature/inlet/avg | uint16 | Average Inlet Temperature |
/intel/dcm/temperature/inlet/max | uint16 | Maximal Inlet Temperature |
/intel/dcm/temperature/inlet/min | uint16 | Minimal Inlet Temperature |
/intel/dcm/inventory/firmware_version | string | Version of management firmware |
/intel/dcm/inventory/bmc_mac | string | MAC address string of BMC |
/intel/dcm/inventory/product_manufacturer | string | Product Manufacturer name queried from FRU |
/intel/dcm/inventory/product_name | string | Product Name queried from FRU |
/intel/dcm/inventory/product_serial | string | Product Serial number queried from FRU |
/intel/dcm/health/processor | string | "OK" for good state and other message for corresponding processor error |
/intel/dcm/health/memory | string | "OK" for good state and other message for corresponding memory error |
/intel/dcm/health/fan | string | "OK" for good state and other message for corresponding fan error |
/intel/dcm/health/powersupply | string | "OK" for good state and other message for corresponding power supply error |
/intel/dcm/health/driverslot | string | "OK" for good state and other message for corresponding driver error |
Namespace | Tag | Description |
---|---|---|
/intel/dcm/* | source | Host IP address |
Example task manifest to use Intel OPEN DCM Platform plugin:
{
"version": 1,
"schedule": {
"type": "simple",
"interval": "5s"
},
"workflow": {
"collect": {
"metrics": {
"/intel/dcm/power/system/avg": {},
"/intel/dcm/power/system/max": {},
"/intel/dcm/power/system/min": {},
"/intel/dcm/inventory/product_name ": {},
"/intel/dcm/inventory/product_manufacturer ":{},
"/intel/dcm/inventory/firmware_version":{},
"/intel/dcm/health/powersupply":{},
"/intel/dcm/health/fan":{},
"/intel/dcm/health/processor":{},
"/intel/dcm/thermal/inlet/cur":{},
"/intel/dcm/thermal/inlet/max":{},
},
"config": {
},
"process": null,
"publish": [
{
"plugin_name": "file",
"plugin_version": 2,
"config": {
"file": "/tmp/published_dcminfo"
}
}
]
}
}
}
As we launch this plugin, we have a few items in mind for the next release:
- Remove IPMI tool support
- More health info support
- Scalability support for multiple hosts
This repository is one of many plugins in snap, a powerful telemetry framework. See the full project at http://github.com/intelsdi-x/snap To reach out to other users, head to the main framework
We love contributions!
There's more than one way to give back, from examples to blogs to code updates. See our recommended process in CONTRIBUTING.md.
snap, along with this plugin, is an Open Source software released under the Apache 2.0 License.
- Author: Dancy Ding
- Author: Xin Dong
- Author: Jialei Wang
And thank you! Your contribution, through code and participation, is incredibly important to us.