[WIP] cake plugins #941

Open
wants to merge 29 commits into master

RubenKelevra
Contributor

@RubenKelevra RubenKelevra commented Sep 11, 2018

cake.cake_ renders the drops, overlimits and requeues as well as the backlog
cake_wlp3s0-day

cake.cake_tin_ renders the packets sent in each tin
cake_tin_wlp3s0-day
This plugin handles diffserv3, diffserv4 and diffserv8, although diffserv8 is somewhat incompatible with the others because its tins are named differently.

@sumpfralle thanks for your initial feedback.

  • No, the arrays won't work on POSIX shells, so I changed the shell required to bash.
  • I thought about the name again, but it still looks fine to me.
  • I really dislike those multigraph plugins, since they, well, multigraph, which would be a mess with all those values. So I think I'll stick with the stacked area graphs per interface. Thanks for your idea on that anyway. :)
  • The exit 1 for autoconf is exactly as in network.tc_, so I'd like to stick with this for the moment. If you think that's an issue, feel free to change my plugin as well when you fix network.tc_.

@dtaht

dtaht commented Sep 11, 2018

This is very cool. A couple notes:

  1. I like seeing the backlog stat!
  2. Marks are happening more often now that Apple has deployed ECN.

Having a daily stat would be awesome

@RubenKelevra
Contributor Author

@dtaht
The first plugin already supports backlog. The queue was just always empty, which is why it doesn't show up on those graphs.

The issue is that the value is only sampled once per 5-minute period, so it's not an average... (I currently have no idea how to fix this without a service.) Over a period of a day it should still produce a nice graph, though.

Marks is a great idea. :)

cake.cake_  renders the drops, overlimits and requeues as well as the backlog
cake.cake_tin_ renders the packets sent in each tin
@RubenKelevra RubenKelevra changed the title cake: add two simple plugins [WIP] cake plugins Sep 11, 2018
@RubenKelevra
Contributor Author

RubenKelevra commented Sep 11, 2018

I'm currently working on some more plugins after some reconsidering and feedback. :)

cake tin delay graph:
cake_tin_delay_wlp3s0-day

@RubenKelevra
Contributor Author

@dtaht there's your ECN graph. It's quite boring without an ECN-capable server.

cake_tin_ecn_wlp3s0-day

@tohojo

tohojo commented Sep 11, 2018

I would suggest using the JSON output of TC rather than trying to parse the human-readable output with regexes. Especially if you want to get at the tin stats...

Any version of TC that has support for CAKE is also going to support JSON output...
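A rough sketch of what the JSON route could look like, assuming `jq` is installed. The per-tin key names (`sent_packets`, `avg_delay_us`) and the `tins` array are written from memory of iproute2's CAKE output and should be checked against `tc -s -j qdisc show dev <dev>` on the target system; the helper `tin_field` and the sample document are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read tc's JSON on stdin and print one per-tin field,
# one value per line. The key names are assumptions; verify them locally.
tin_field() {
    jq -r --arg f "$1" '.[] | select(.kind == "cake") | .tins[] | .[$f]'
}

# Trimmed, made-up sample in the assumed shape of `tc -s -j qdisc show dev DEV`.
sample='[{"kind":"cake","tins":[{"sent_packets":2356277,"avg_delay_us":1600},{"sent_packets":11172,"avg_delay_us":15},{"sent_packets":4799,"avg_delay_us":12}]}]'

# On a live system this would be something like:
#   tc -s -j qdisc show dev "$dev" | tin_field sent_packets
if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$sample" | tin_field avg_delay_us
fi
```

The appeal of this approach is that a tin name containing a space or a changed column width no longer affects the parser; the cost is the extra `jq` dependency.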

as well as up to days, not that I would expect those numbers, but...
@RubenKelevra
Contributor Author

@tohojo
Hey Toke,

Thanks for the idea, but parsing JSON in a shell script isn't an easy task. Python is much better suited for it, but that would mean a complete rewrite and quite a lot of overhead for loading the Python interpreter on each call.

I think I'll stick with simple screen parsing, also because the original network.tc_ plugin is written the same way.

Any thoughts on the graphs themselves?

@RubenKelevra
Contributor Author

RubenKelevra commented Sep 11, 2018

The cake_tin_throughput_ looks boring, but should show some quite interesting graphs on systems with more traffic.

It combines the traffic limit set for cake with the average traffic for each tin.

cake_tin_throughput_wlp3s0-day

@tohojo

tohojo commented Sep 11, 2018

Munin doesn't have some kind of json parsing facility? You could also depend on something like jq for the parsing...

As for the graphs themselves:

The backlog/dropped/overlimits/requeues graph is not CAKE-specific. Not sure how Munin organises this; is there a generic qdisc plugin as well / in addition to?

Are you handling a varying number of tins correctly?

@RubenKelevra
Contributor Author

RubenKelevra commented Sep 11, 2018

@tohojo
Well, no. Munin just calls executable files and grabs their standard output, so you can build a plugin in any language you like. Calling 10 Python scripts every 5 minutes just to measure some numbers for one interface is likely overkill.
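For readers unfamiliar with that contract, here is a minimal sketch of a wildcard plugin: Munin runs the file with `config` to learn the graph layout and without an argument to fetch values, and the interface name is taken from the symlink name. The graph title, the field name and the `read_drops` stub are hypothetical and not taken from the plugins in this pull request.

```shell
#!/bin/sh
# Sketch of the Munin plugin contract; all names here are illustrative only.

# Wildcard plugins derive their target from the symlink name,
# e.g. cake_wlp3s0 -> wlp3s0.
iface_from_name() {
    printf '%s\n' "${1##*cake_}"
}

# Stub standing in for real parsing of `tc -s qdisc show dev IFACE`.
read_drops() {
    echo 1417
}

plugin_main() {
    iface=$(iface_from_name "$1")
    case "$2" in
        config)
            # Describe the graph and its fields.
            echo "graph_title CAKE drops on $iface"
            echo "graph_category network"
            echo "drops.label drops"
            echo "drops.type DERIVE"
            echo "drops.min 0"
            ;;
        *)
            # Default action: print the current values.
            echo "drops.value $(read_drops "$iface")"
            ;;
    esac
}

# In a real plugin the last line would be: plugin_main "$0" "$1"
plugin_main /etc/munin/plugins/cake_wlp3s0 config
plugin_main /etc/munin/plugins/cake_wlp3s0 fetch
```

Because each run is a fresh process, interpreter start-up cost is paid on every poll, which is the overhead argument made above.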

The backlog/dropped/overlimits/requeues graph is not CAKE-specific. Not sure how Munin organises this; is there a generic qdisc plugin as well / in addition to?

Yes, there's tc_ in the network folder. But since it's designed for other output, it probably won't show the stuff you'd like to see. It's also a lot bigger. I just wanted to see the drops and the backlog for cake, so I cut the tc_ plugin down to graph only those. Munin handles this change fine, since the plugin has a different name.

Are you handling a varying number of tins correctly?

Those plugins read the settings with the keywords listed by tc, so diffserv3, diffserv4 and diffserv8 are implemented. They are named like the tc output, so for diffserv8 it's Tin 1, Tin 2... etc.

If you set best effort, a plugin that depends on the diffserv tins will no longer output anything, which means the graph disappears.

Edit: I meant best effort, not flowblind; corrected.
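As an illustration of handling a varying number of tins, the sketch below reads the tin names from the table header instead of hard-coding them. It is not the parsing used in the plugins; the helper names are hypothetical, and the sample is a shortened copy of the kind of `tc -s qdisc show` output discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helpers: derive tin names and tin count from tc's text output.

# The header row is the line directly above the "thresh" row.
tin_names() {
    awk '/^ *thresh / { print prev; exit } { prev = $0 }' |
        sed -e 's/Best Effort/besteffort/' -e 's/Tin \([0-9]\)/tin\1/g' |
        tr '[:upper:]' '[:lower:]' |
        tr -s ' ' '\n' |
        sed '/^$/d'
}

# Every numeric row has one label plus one column per tin.
tin_count() {
    awk '/^ *pkts / { print NF - 1; exit }'
}

sample_tc_output() {
    cat <<'EOF'
qdisc cake 800e: root refcnt 2 bandwidth 18Mbit diffserv3 dual-dsthost
 backlog 0b 0p requeues 0

                   Bulk  Best Effort        Voice
  thresh       1125Kbit       18Mbit     4500Kbit
  pkts          2356277        11172         4799
EOF
}

sample_tc_output | tin_names
sample_tc_output | tin_count
```

The two-word name `Best Effort` is the reason a plain whitespace split on the header is not enough, so it is normalised before splitting.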

@tohojo

tohojo commented Sep 11, 2018 via email

@dtaht

dtaht commented Sep 11, 2018

I would plot ecn marks and drops on the same graph. And a five minute summary of those is probably fine for many systems. We really don't drop all that much. Can a log scale be used? In which case you could actually include packets. cool work, exciting! I wonder if snmp has a mib that includes ecn....

@heistp

heistp commented Sep 11, 2018

Cool, I might make use of this at an ISP in the process of trying to get cake deployed. I was hoping we'd be able to look at some of the cake tin stats alongside the SmokePing graphs.

I do second the idea that using jq might make the parsing more robust, and it's pretty lightweight, but I also understand if you're not ok with the dependency. Thanks for the effort!

@RubenKelevra
Contributor Author

@tohojo

Right, fair enough; you're the one who has to update the script if the
output parsing breaks, so obviously your call ;)

I'm fine with this. ;)

Right. Well, mostly a UI issue, I guess; and up to the Munin upstream if
they want two different-but-related plugins :)

Munin plugins are split between this community contribution repo and an official one, so the barrier for community plugins is lower. :)

What happens if the CAKE configuration changes while the same monitor
instance is running?

The graph is reconfigured instantly. This means old values will probably disappear because they are incompatible, and the new ones appear.

@RubenKelevra
Contributor Author

@dtaht

I would plot ecn marks and drops on the same graph.

Yes, that's a good idea. I'll implement it that way today.

Can a log scale be used?

Yep, that's possible. So you want packets, drops and ecn marks in one graph? That could be too much if you consider 8 tins...

I guess individual graphs would be better then. We could use a logarithmic scale with a stacked area graph for packets and drops, so you easily get the proportion, and a line for ECN marks. That's the way the first graph is built: backlog is a line and everything else is a stacked area.

But I'm not sure about the direction: packets at the bottom, and a line to see how many packets are ECN marked?
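The layout described here (stacked areas plus one line on a logarithmic axis) could be expressed in a plugin's `config` output roughly as follows. This is a sketch only: the field names are placeholders rather than the ones used in the plugins, and `--logarithmic` is passed through `graph_args` to rrdtool.

```shell
#!/bin/sh
# Sketch: emit a Munin config with stacked areas plus one line on a log axis.
# Field names are placeholders, not the ones used by the plugins in this PR.
emit_config() {
    echo "graph_title CAKE packets, drops and ECN marks"
    echo "graph_args --logarithmic"
    echo "graph_vlabel packets per second"
    first=1
    for field in packets drops; do
        echo "$field.label $field"
        echo "$field.type DERIVE"
        echo "$field.min 0"
        if [ "$first" -eq 1 ]; then
            echo "$field.draw AREA"      # first series starts the stack
            first=0
        else
            echo "$field.draw STACK"     # later series pile on top
        fi
    done
    # ECN marks are drawn as a thin line over the stacked areas.
    echo "marks.label ECN marks"
    echo "marks.type DERIVE"
    echo "marks.min 0"
    echo "marks.draw LINE1"
}

emit_config
```

With eight tins the same loop would emit eight stacked series per graph, which is the readability concern raised above.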

cool work, exciting!

Graphs of stuff you build are always exciting. Great to hear that from you, too. :)

@RubenKelevra
Contributor Author

@heistp

Cool, I might make use of this at an ISP in the process of trying to get cake deployed. I was hoping we'd be able to look at some of the cake tin stats alongside the SmokePing graphs.

I do second the idea that using jq might make the parsing more robust, and it's pretty lightweight, but I also understand if you're not ok with the dependency. Thanks for the effort!

Hey cool!

I would recommend deploying it in a test environment first, to see whether the graphs work correctly. I'm especially worried about cake_tin_throughput_. It "should" work the way it's intended, but I haven't found the time to check it against some synthetic throughput to confirm that it's right.

I guess it would also be great if we could get some graphs with real traffic back, for a subpage on bufferbloat.net that shows how it's working. :)

@RubenKelevra
Contributor Author

RubenKelevra commented Sep 12, 2018

Today I'll push a change that lets the plugins detect the corresponding ingress device by parsing the tc filter output of the device each plugin is configured for.

If the ifb device is found, the graph is reconfigured to show the ingress traffic below the axis, as you're used to from the traffic graphs.
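A hedged sketch of such a detection step follows. It assumes that ingress traffic is redirected with a `mirred` action and that tc prints it in the form shown in the sample line; both the sample line and the helper `ifb_for` are assumptions and not copied from the plugins.

```shell
#!/bin/sh
# Hypothetical helper: extract the redirect target from tc filter output
# read on stdin. Prints nothing if no mirred redirect is present.
ifb_for() {
    sed -n 's/.*mirred (Egress Redirect to device \([^)]*\)).*/\1/p' | head -n 1
}

# Assumed shape of one line of `tc filter show dev wlp3s0 parent ffff:`.
sample_filter=' action order 1: mirred (Egress Redirect to device ifb4wlp3s0) stolen'

# On a live system this would be something like:
#   tc filter show dev "$dev" parent ffff: | ifb_for
printf '%s\n' "$sample_filter" | ifb_for
```

An empty result is the natural signal to fall back to the single-direction graph.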

Note that Munin might not handle reconfigured graphs as expected. It may be wise to delete the corresponding rrd files after updating your plugin files, with something like this:

# rm -f /var/lib/munin/myhostname/myhostname-cake_*

To use the plugins, install them in e.g. /usr/lib/munin/plugins/, make them executable (chmod 755), and link each one to the network interface you want to monitor, like this:

# ln -s /usr/lib/munin/plugins/cake_tin_throughput_ /etc/munin/plugins/cake_tin_throughput_wlp3s0

The corresponding virtual interface, which mirrors the ingress traffic of wlp3s0 so that cake can handle it too, is detected automatically and plotted downwards in the graphs.

If there's no virtual mirrored interface, the graphs look as they did before.
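Plotting one direction downwards is usually done in Munin by hiding the ingress field and attaching it to the egress field as its `negative`. The sketch below shows that pairing for three tins; the `_in`/`_out` field names are hypothetical and the plugins in this pull request may name things differently.

```shell
#!/bin/sh
# Sketch: pair an egress field with an ingress field so that Munin mirrors
# the ingress series below the x-axis. Field names are hypothetical.
emit_pair() {
    name=$1
    # Ingress series: collected, but not drawn on its own.
    echo "${name}_in.label $name"
    echo "${name}_in.type DERIVE"
    echo "${name}_in.min 0"
    echo "${name}_in.graph no"
    # Egress series: drawn, with the ingress series as its mirrored negative.
    echo "${name}_out.label $name"
    echo "${name}_out.type DERIVE"
    echo "${name}_out.min 0"
    echo "${name}_out.negative ${name}_in"
}

for tin in bulk besteffort voice; do
    emit_pair "$tin"
done
```

When no mirrored interface is found, a plugin would simply skip the `_in` fields and the `negative` lines.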

Here are some examples:

cake_wlp3s0-day
cake_tin_pps_wlp3s0-day
cake_tin_delay_wlp3s0-day

@gamanakis

gamanakis commented Sep 17, 2018

@RubenKelevra thanks for this!
I set it up on my router, but some scripts cannot read some of the stats.
The values that are reported seem to be correct, though.

cake_tin_delay:

bulk.value 0.000039000
besteffort.value
voice.value

cake_tin_throughput:

bandwidth.value 19000000
bulk.value 30725536
besteffort.value
voice.value

@RubenKelevra
Contributor Author

@sumpfralle

Personally I would really like to use these plugins in the OpenWrt routers of our local wireless community. Thus I would appreciate sh-only plugins.

What do you think about a cut-down version in a personal repo as an OpenWrt feed? This would allow developing these plugins without any restrictions and make the sh ones much smaller. I guess you don't want to run munin-node anyway, so a simpler output you can push via HTTP or similar would be enough, right?

@RubenKelevra
Contributor Author

RubenKelevra commented Sep 18, 2018

@gamanakis

Thanks for the feedback. Could you please post the output of tc -s qdisc show dev YOURDEV for debugging purposes? Thanks!

Also the tc and sch_cake version would be nice.

@gamanakis

I am running sch_cake/tc from repos:
github.com/dtaht/sch_cake
github.com/dtaht/tc-adv

root@arch:~# tc -s qdisc show dev br-lan
qdisc cake 800e: root refcnt 2 bandwidth 18Mbit diffserv3 dual-dsthost ingress split-gso rtt 100.0ms noatm overhead 18 mpu 64 
 Sent 3458265254 bytes 2370831 pkt (dropped 1417, overlimits 4457648 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 582625b of 4Mb
 capacity estimate: 18Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       64 /    1518
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh       1125Kbit       18Mbit     4500Kbit
  target         16.1ms        5.0ms        5.0ms
  interval      111.1ms      100.0ms      100.0ms
  pk_delay        7.3ms         88us        253us
  av_delay        1.6ms         15us         12us
  sp_delay          2us          2us          2us
  backlog            0b           0b           0b
  pkts          2356277        11172         4799
  bytes      3458013591      2041653       341452
  way_inds         4425          140            0
  way_miss         4865         7942           25
  way_cols            0            0            0
  drops            1415            2            0
  marks              32            0            0
  ack_drop            0            0            0
  sp_flows           25            3            1
  bk_flows            2            0            0
  un_flows            0            0            0
  max_len         21196        13626         6056
  quantum           300          549          300

root@arch:~# ./cake_tin_delay_br-lan 
bulk.value 0.001600000
besteffort.value 
voice.value 
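The empty values line up with the tins whose av_delay is printed in `us` rather than `ms`, which suggests unit handling as a possible cause. That is an inference from the output above, not something stated in the thread. A minimal sketch of a converter to seconds, with `to_seconds` as a hypothetical helper that covers only the `us`, `ms` and `s` suffixes:

```shell
#!/bin/sh
# Hypothetical helper: convert a tc time string such as "1.6ms" or "15us"
# to seconds with nine decimals. Unknown suffixes make it exit non-zero.
to_seconds() {
    awk -v v="$1" 'BEGIN {
        num = v;  sub(/[a-z]+$/, "", num)     # numeric part
        unit = v; sub(/^[0-9.]+/, "", unit)   # unit suffix
        if      (unit == "us") f = 0.000001
        else if (unit == "ms") f = 0.001
        else if (unit == "s")  f = 1
        else exit 1                           # unknown unit: fail loudly
        printf "%.9f\n", num * f
    }'
}

# av_delay row from the table above: 1.6ms, 15us, 12us
for v in 1.6ms 15us 12us; do
    to_seconds "$v"
done
```

Failing on an unknown suffix, rather than printing an empty value, makes this kind of gap easier to spot in the plugin output.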

@sumpfralle
Collaborator

Personally I would really like to use these plugins in the OpenWrt routers of our local wireless community. Thus I would appreciate sh-only plugins.

What do you think about a cut-down version in a personal repo as an OpenWrt feed? This would allow developing these plugins without any restrictions and make the sh ones much smaller [..]

Using ash/dash-compatible features (instead of bash) is in my experience rarely a real restriction.
Only 17 out of 161 of munin's core shell plugins use bash. Thus it does not seem to be a tough challenge :)

I guess you don't want to run munin-node anyway, so a simpler output you can push via HTTP or similar would be enough, right?

There is the excellent muninlite available in OpenWrt. I use it a lot.

But I can understand, that you may want to stick to the environment you are comfortable with. In this case I will propose changes (after your plugin is merged), that will make it run with sh-only.

@RubenKelevra
Contributor Author

Personally I would really like to use these plugins in the OpenWrt routers of our local wireless community. Thus I would appreciate sh-only plugins.

What do you think about a cut-down version in a personal repo as an OpenWrt feed? This would allow developing these plugins without any restrictions and make the sh ones much smaller [..]

Using ash/dash-compatible features (instead of bash) is in my experience rarely a real restriction.
Only 17 out of 161 of munin's core shell plugins use bash. Thus it does not seem to be a tough challenge :)

I guess you don't want to run munin-node anyway, so a simpler output you can push via HTTP or similar would be enough, right?

There is the excellent muninlite available in OpenWrt. I use it a lot.

But I can understand, that you may want to stick to the environment you are comfortable with. In this case I will propose changes (after your plugin is merged), that will make it run with sh-only.

Yeah, I'm comfortable with sh, it's just that it sometimes looks ugly :D

This, for example, is from me for sh:
https://github.com/VfN-NRW/offline-ssid/blob/master/gluon-offline-ssid/files/usr/sbin/ff-offline-ssid

@RubenKelevra
Contributor Author

@gamanakis
Thanks for your report, found the bug and it should be fixed in 8cc759a

@RubenKelevra
Contributor Author

@sumpfralle I found some rendering issues after (I suspect) switching to logarithmic scale. Maybe you have an idea about that:

cake_tin_pps_wlp3s0-day__
cake_wlp3s0-day__

One direction isn't rendered properly but is cut off, while the numbers below the graph clearly show that the values are collected correctly.

@tohojo

tohojo commented Sep 26, 2018 via email

@sumpfralle
Collaborator

How is your progress?
Can I help you somehow?

@dtaht

dtaht commented Nov 15, 2018

Yes, I'm rooting for you here, too! Can I help?

@sumpfralle
Collaborator

Ping?

@sumpfralle
Collaborator

I am willing to help if something is missing. What needs to be done?

@RubenKelevra
Contributor Author

@dtaht @sumpfralle sorry guys, those notifications somehow got lost between work. I'm currently working on a small thing for the next 1-2 days; after that I'll complete this pull request :)

@RubenKelevra
Contributor Author

But the good thing: the plugins still work, and they run fine unchanged. These are the latest graphs I got (I disabled munin a while ago):

cake_tin_delay_wlp3s0-week
cake_tin_throughput_wlp3s0-week
cake_wlp3s0-week

@dtaht

dtaht commented Jan 23, 2020 via email

@sumpfralle
Collaborator

@RubenKelevra: ping?

Do you need help with some finer details?

@RubenKelevra
Contributor Author

@sumpfralle thanks for the offer, my workload just shifted massively due to the current crisis. It's on my to-do list, but I just have no time 🙄

@dtaht

dtaht commented Apr 11, 2020

There's kind of a related discussion about putting some stats in luci, also, on the openwrt forums: https://forum.openwrt.org/t/sqm-reporting/59960/5

@dtaht

dtaht commented Apr 11, 2020

there is kind of a related discussion about cake and luci over here: https://forum.openwrt.org/t/sqm-reporting/59960/5

@github-actions

Stale pull request message

@sumpfralle
Collaborator

Ping?

@dtaht

dtaht commented Jul 24, 2021

Is this project still alive?


8 participants