cisco_telemetry_mdt input plugin does not properly handle subtree format for NXOS DME telemetry #15922

Closed
yusufshalaby opened this issue Sep 21, 2024 · 0 comments · Fixed by #15923
Labels
bug unexpected problem or unintended behavior

yusufshalaby commented Sep 21, 2024

Relevant telegraf.conf

[agent]
  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  quiet = false

[[inputs.cisco_telemetry_mdt]]
 transport = "grpc"
 service_address = ":57000"

[[outputs.file]]
 files = ["/tmp/metrics.out"]
 data_format = "influx"

Logs from Telegraf

Note: these logs are of limited use because Telegraf does not report any error.

telegraf-1  | 2024-09-21T02:54:20Z I! Loading config: /etc/telegraf/telegraf.conf
telegraf-1  | 2024-09-21T02:54:20Z I! Starting Telegraf 1.31.1 brought to you by InfluxData the makers of InfluxDB
telegraf-1  | 2024-09-21T02:54:20Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
telegraf-1  | 2024-09-21T02:54:20Z I! Loaded inputs: cisco_telemetry_mdt
telegraf-1  | 2024-09-21T02:54:20Z I! Loaded aggregators: 
telegraf-1  | 2024-09-21T02:54:20Z I! Loaded processors: 
telegraf-1  | 2024-09-21T02:54:20Z I! Loaded secretstores: 
telegraf-1  | 2024-09-21T02:54:20Z I! Loaded outputs: file
telegraf-1  | 2024-09-21T02:54:20Z I! Tags enabled: host=c08343f8c7a7
telegraf-1  | 2024-09-21T02:54:20Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"c08343f8c7a7", Flush Interval:10s
telegraf-1  | 2024-09-21T02:54:20Z D! [agent] Initializing plugins
telegraf-1  | 2024-09-21T02:54:20Z D! [agent] Connecting outputs
telegraf-1  | 2024-09-21T02:54:20Z D! [agent] Attempting connection to [outputs.file]
telegraf-1  | 2024-09-21T02:54:20Z D! [agent] Successfully connected to outputs.file
telegraf-1  | 2024-09-21T02:54:20Z D! [agent] Starting service inputs
telegraf-1  | 2024-09-21T02:54:30Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
telegraf-1  | 2024-09-21T02:54:36Z D! [inputs.cisco_telemetry_mdt]  Accepted Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:36Z D! [inputs.cisco_telemetry_mdt]  No measurement alias for encoding path: sys/intf
telegraf-1  | 2024-09-21T02:54:36Z D! [inputs.cisco_telemetry_mdt]  Closed Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:40Z D! [outputs.file]  Wrote batch of 65 metrics in 12.849875ms
telegraf-1  | 2024-09-21T02:54:40Z D! [outputs.file]  Buffer fullness: 0 / 10000 metrics
telegraf-1  | 2024-09-21T02:54:41Z D! [inputs.cisco_telemetry_mdt]  Accepted Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:41Z D! [inputs.cisco_telemetry_mdt]  Closed Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:46Z D! [inputs.cisco_telemetry_mdt]  Accepted Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:46Z D! [inputs.cisco_telemetry_mdt]  Closed Cisco MDT GRPC dialout connection from 192.168.65.1:42053
telegraf-1  | 2024-09-21T02:54:50Z D! [outputs.file]  Wrote batch of 130 metrics in 22.650458ms

System info

Telegraf 1.32, NXOS 10.3, Docker Desktop version 27.2.0 on macOS 14.6.1

Docker

services:
  telegraf:
    image: telegraf:1.32-alpine
    volumes:
      - ${PWD}/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - ${PWD}/metrics.out:/tmp/metrics.out:rw
    ports:
      - "57000:57000"

Steps to reproduce

  1. Log into an NXOS switch and configure descriptions on some of the interfaces. For example:
     interface eth1/1
         descr this_is_eth1/1
     interface eth1/2
         descr this_is_eth1/2
     interface eth1/3
         descr this_is_eth1/3
  2. Now configure the telemetry on the switch like so:
     telemetry
       destination-group 100
         ip address <YOUR_IP> port 57000 protocol gRPC encoding GPB
         use-vrf management
         use-chunking size 4096
       sensor-group interfaces
         path sys/intf query-condition query-target=subtree&target-subtree-class=l1PhysIf
       subscription 200
         dst-grp 100
         snsr-grp interfaces sample-interval 30000
  3. Run docker compose up -d using the given telegraf config on the host with the IP address used above.
  4. Observe metrics.out

Expected behavior

There should be 64 metrics. The measurement for each metric should be sys/intf, and the tags should be host, path, source, subscription, sys/intf, and dn. The dn values take the form sys/intf/phys-[ethx/y], depending on the interface. The descr field of each metric should match the description set on the switch for that interface.

Actual behavior

eth1/1 has eth1/2’s description, eth1/2 has eth1/3’s description, and so on until it loops back around and eth1/64 has eth1/1's description. Here is a sample metric from my metrics.out:

sys/intf,dn=sys/intf/phys-[eth1/1],host=75c755542199,path=sys/intf,source=switch2,subscription=200,sys/intf=sys/intf descr="this_is_eth1/2",dot1qEtherType=33024i,duplex="auto",ethpmCfgFailedTs=0i,ethpmCfgState=0i,id="eth1/1"... 1726757486464000000

I truncated it for brevity.
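
To check metrics.out for this mismatch without reading every line by eye, something like the following Go sketch could help. It is only an illustration: the function name descrMatchesDN and the two regular expressions are my own, it assumes the this_is_<interface> description convention from the reproduction steps, and it does not implement a full line protocol parser. The sample line is a shortened copy of the metric above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical helpers: these patterns only cover the l1PhysIf metrics
// shown in this issue, not line protocol in general.
var (
	dnRe    = regexp.MustCompile(`dn=sys/intf/phys-\[([^\]]+)\]`)
	descrRe = regexp.MustCompile(`descr="([^"]*)"`)
)

// descrMatchesDN extracts the interface name from the dn tag and the descr
// field from one metric line, and reports whether descr follows the
// "this_is_<interface>" convention used in the reproduction steps.
func descrMatchesDN(line string) (ifName, descr string, ok bool) {
	dn := dnRe.FindStringSubmatch(line)
	d := descrRe.FindStringSubmatch(line)
	if dn == nil || d == nil {
		return "", "", false
	}
	return dn[1], d[1], d[1] == "this_is_"+dn[1]
}

func main() {
	// Shortened version of the sample metric from this issue.
	line := `sys/intf,dn=sys/intf/phys-[eth1/1],host=75c755542199,path=sys/intf,source=switch2,subscription=200,sys/intf=sys/intf descr="this_is_eth1/2",dot1qEtherType=33024i 1726757486464000000`
	ifName, descr, ok := descrMatchesDN(line)
	fmt.Printf("%s descr=%q match=%v\n", ifName, descr, ok)
}
```

For the sample metric this reports a mismatch, since the dn tag names eth1/1 while descr names eth1/2.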

Additional info

I have recreated this on several different switches running NXOS 10.3. The bug occurs for all DME telemetry queries where query-target=subtree. Such a query returns a flat list of objects, each of which can be mapped back to its DME path using its dn field. The plugin sets dn as a tag, but does so incorrectly, resulting in off-by-one errors.
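
To make the intended mapping concrete, here is a minimal Go sketch of pairing each object in a flat list with its own dn. It is not the plugin's code and does not model the GPB wire format: the object type and the indexByDN function are hypothetical stand-ins, and the two sample objects use the descriptions from the reproduction steps.

```go
package main

import "fmt"

// object is a hypothetical stand-in for one entry of the flat subtree
// response: a set of attribute names and values that includes "dn".
type object map[string]string

// indexByDN keys every object by the dn attribute found on that same
// object, so attributes such as descr stay attached to the right path.
func indexByDN(objs []object) map[string]object {
	out := make(map[string]object, len(objs))
	for _, o := range objs {
		dn, ok := o["dn"]
		if !ok {
			continue // skip entries that carry no dn
		}
		out[dn] = o
	}
	return out
}

func main() {
	objs := []object{
		{"dn": "sys/intf/phys-[eth1/1]", "descr": "this_is_eth1/1"},
		{"dn": "sys/intf/phys-[eth1/2]", "descr": "this_is_eth1/2"},
	}
	byDN := indexByDN(objs)
	fmt.Println(byDN["sys/intf/phys-[eth1/1]"]["descr"])
}
```

The point of the sketch is only that the dn used to tag a metric must come from the same object as its fields; in the output described above it appears to come from a neighbouring object instead.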

I narrowed down the issue in the plugin and will be submitting a PR shortly with a corresponding test case that captures the issue.
