Transparent Cake Box

Thank you, I'll check my home setup's results now that I know what to look for.

Lorenzo

Hi moeller0,
I tried running multiple dslreports tests in parallel and got unexpected results: no fair sharing and increased bufferbloat, so I'd like to try the dual-cake setup.

I think the right configuration is to set either ingress or egress for both interfaces, because they face opposite directions (please correct me if I'm wrong), but I don't know which is better or how to apply the dual-xxxhost options.

Many thanks
Lorenzo

Since you use flent, could you post the RRUL_CS8 "all" plot here in the thread and annotate at what times the other Windows host ran its speedtests, please?

Correct, but ingress shaping requires an IFB, which incurs some processing cost, so in the case of using two interfaces, always instantiate sqm-scripts on egress (by setting the ingress bandwidth to 0, which denotes "do not shape", as actually shaping to 0 would end up with a non-functional link...)
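A minimal sketch of what such a dual-egress setup could look like in /etc/config/sqm (interface name and rate are placeholders taken from this thread, not tested config):

```
config queue
        option interface 'eth0.1'      # egress of this iface carries one traffic direction
        option qdisc 'cake'
        option script 'layer_cake.qos'
        option upload '18944'          # shape this interface's egress
        option download '0'            # 0 = do not shape ingress, so no IFB is created
        option enabled '1'
```

A second, mirrored queue section on the other interface then covers the opposite direction.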

Hope that helps

Best Regards

I didn't keep the results, next time I'll try both configurations and post all the info.

Bye
Lorenzo

Just to confirm this: flent will automatically save a data file (even if you just requested a plot). So unless you actively deleted that file, it should be somewhere on your Linux machine, most likely in the directory from which you called flent.
The name would be (for a hypothetical rrul_cs8 test performed on 2017-06-06):

rrul_cs8-2017-06-06T235936.159804.$YOURNAME.flent.gz
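If it is not in the current directory, a quick search from the shell should turn it up:

```shell
# Look for any flent data files below the current directory
find . -name '*.flent.gz' 2>/dev/null
```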

Maybe we are lucky :wink:

Best Regards

I deleted everything, so no saved results, too bad :wink:
Dual queue setup seems better: http://www.dslreports.com/speedtest/22049399

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 90494382844 bytes 124737125 pkt (dropped 0, overlimits 0 requeues 10)
backlog 0b 0p requeues 10
maxpacket 1514 drop_overlimit 0 new_flow_count 26 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-sqm root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 800f: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
Sent 334242905 bytes 378520 pkt (dropped 10576, overlimits 466115 requeues 0)
backlog 0b 0p requeues 0
memory used: 1411200b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 775us 382us
av_delay 0us 44us 61us
sp_delay 0us 8us 11us
pkts 0 388540 556
bytes 0 350124323 33914
way_inds 0 11158 0
way_miss 0 9256 21
way_cols 0 0 0
drops 0 10576 0
marks 0 1 0
sp_flows 0 1 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 90

qdisc cake 8011: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
Sent 203765278 bytes 366162 pkt (dropped 1734, overlimits 260102 requeues 0)
backlog 0b 0p requeues 0
memory used: 185472b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 2.1ms 263us
av_delay 0us 139us 30us
sp_delay 0us 9us 12us
pkts 0 366987 909
bytes 0 206316066 56386
way_inds 0 20898 0
way_miss 0 10093 14
way_cols 0 0 0
drops 0 1734 0
marks 0 0 0
sp_flows 0 1 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 167

qdisc mq 0: dev wlan0 root
Sent 4059264 bytes 14492 pkt (dropped 0, overlimits 0 requeues 179)
backlog 0b 0p requeues 179
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 49300 bytes 404 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 4009964 bytes 14088 pkt (dropped 0, overlimits 0 requeues 179)
backlog 0b 0p requeues 179
maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0

I'll post flent results in the next few days; now I need to work!

Lorenzo

Hi all,
some news from my tests about link layer adaptation: I found an optimal value of 12 bytes by trial and error, and applied it only to the WAN iface.

rrul_cs8

double-layer-cake-lla12-all-scaled

Now I'm trying to get fair sharing, but running 2 parallel flent tests from different machines against the same server (flent-london.bufferbloat.net) gave me these results:

parallell rrul_cs8

lla-12 is from the previous test
double-layer-cake-lla12-box-totals

What's going wrong?

p.s.
I have the flent results if they can be useful

Thanks
L.

Interesting, could you post the output of "cat /etc/config/sqm", "tc -d qdisc" and "tc -s qdisc" again please?

Not sure, the bandwidth sharing looks roughly okay, but the latency skyrockets. I wonder, could you repeat that test with both shapers set to 15000? Ingress shaping is a bit approximate and will generally need more headroom the more flows you have. The thing is, qdisc shapers traditionally shape their output to the desired rate, which means that most of the time more packets are coming in than are let through, and the excess is dropped; for ingress shaping that behaviour is not ideal. Cake's principal author believes he has a solution for that (by making cake attempt to shape its incoming rate), but that is still in testing.
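On OpenWrt that change could be made from the shell roughly like this (the @queue section indices are an assumption, check them against your /etc/config/sqm first; a sketch, not tested):

```
uci set sqm.@queue[0].upload='15000'
uci set sqm.@queue[1].upload='15000'
uci commit sqm
/etc/init.d/sqm restart
```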

Best Regards

I will try lowering the shapers; for now, here is the output you asked for:

cat /etc/config/sqm

config queue 'eth1'
option ingress_ecn 'ECN'
option itarget 'auto'
option etarget 'auto'
option linklayer 'none'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option eqdisc_opts 'dual-dsthost'
option upload '18944'
option qdisc 'cake'
option script 'layer_cake.qos'
option download '0'
option interface 'eth0.1'
option enabled '1'

config queue 'eth2'
option ingress_ecn 'ECN'
option itarget 'auto'
option etarget 'auto'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option interface 'eth0.2'
option upload '18944'
option qdisc 'cake'
option script 'layer_cake.qos'
option eqdisc_opts 'dual-srchost'
option download '0'
option enabled '1'
option linklayer 'ethernet'
option overhead '12'

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-sqm root refcnt 2
qdisc cake 802b: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
qdisc cake 802d: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
linklayer ethernet overhead 12
qdisc mq 0: dev wlan0 root
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 294566352341 bytes 428909789 pkt (dropped 0, overlimits 0 requeues 61)
backlog 0b 0p requeues 61
maxpacket 1514 drop_overlimit 0 new_flow_count 496 ecn_mark 1
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-sqm root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 802b: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
Sent 4117621179 bytes 4669927 pkt (dropped 51697, overlimits 5501306 requeues 0)
backlog 0b 0p requeues 0
memory used: 1538208b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 13.9ms 427us
av_delay 0us 2.9ms 58us
sp_delay 0us 11us 12us
pkts 0 4696334 25290
bytes 0 4188830147 1533464
way_inds 0 378899 0
way_miss 0 155289 103
way_cols 0 0 0
drops 0 51697 0
marks 0 144 0
sp_flows 0 2 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 460

qdisc cake 802d: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
Sent 1549530083 bytes 3993040 pkt (dropped 3802, overlimits 1695814 requeues 0)
backlog 0b 0p requeues 0
memory used: 401184b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 78.1ms 417us 237us
av_delay 8.6ms 23us 20us
sp_delay 8us 9us 9us
pkts 14841 3915078 66923
bytes 9174840 1510343623 35764375
way_inds 0 287762 0
way_miss 6 167085 123
way_cols 0 0 0
drops 459 2571 772
marks 0 49 0
sp_flows 0 0 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 1526 1526 1526

qdisc mq 0: dev wlan0 root
Sent 5486303 bytes 19662 pkt (dropped 0, overlimits 0 requeues 204)
backlog 0b 0p requeues 204
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 66106 bytes 504 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 5420197 bytes 19158 pkt (dropped 0, overlimits 0 requeues 204)
backlog 0b 0p requeues 204
maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0

Again, many thanks for your support!

Lorenzo


As I expected, you are not using cake's overhead compensation but tc's stab option. That is not bad in itself (it actually is quite fine), but stab does not account for the overhead the kernel automatically adds for ethernet interfaces, namely 14 bytes (6 dst MAC, 6 src MAC, 2 ethertype). In essence this expands your specified overhead of 12 bytes into a more reasonable 26 bytes. Why do I say 12 bytes is unreasonable? Because the ethernet header alone takes more than that...
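For illustration, one way to let cake do the accounting itself would be to hand it an explicit overhead keyword instead of relying on stab, e.g. 26 bytes to cover the 14-byte ethernet header plus the 12 extra bytes intended here (a sketch only, mirroring the shaper settings from this thread, not tested config):

```
tc qdisc replace dev eth0.2 root cake bandwidth 18944Kbit diffserv3 dual-srchost overhead 26
```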

Best Regards

Hi moeller0,
I've noticed some strange behaviours during my tests, so I reverted to LinkLayerAdaptation=none to keep it simple. I also disabled ingress_ecn for the LAN iface to reflect the single-cake setup's defaults.

/etc/config/sqm

config queue 'eth1'
option itarget 'auto'
option etarget 'auto'
option linklayer 'none'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option qdisc_really_really_advanced '1'
option eqdisc_opts 'dual-dsthost'
option qdisc 'cake'
option download '0'
option interface 'eth0.1'
option upload '16384'
option enabled '1'
option squash_dscp '0'
option squash_ingress '0'
option egress_ecn 'NOECN'
option ingress_ecn 'NOECN'
option script 'layer_cake.qos'

config queue 'eth2'
option itarget 'auto'
option etarget 'auto'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option qdisc_really_really_advanced '1'
option interface 'eth0.2'
option qdisc 'cake'
option eqdisc_opts 'dual-srchost'
option download '0'
option upload '16384'
option enabled '1'
option linklayer 'none'
option egress_ecn 'NOECN'
option ingress_ecn 'ECN'
option script 'layer_cake.qos'

Results:

rrul

rrul_double_piece_of_cake

rrul_cs8

rrul_cs8_double_piece_of_cake

Same settings for both tests except the queue setup script, but opposite ping results; could it be related to the squash_dscp / squash_ingress settings?
If I understand your post here correctly, piece_of_cake and layer_cake should have different defaults, is that correct?

Many thanks
Lorenzo

Quick question: I was just trying to understand your request when the data and the request disappeared. I hope you found a solution to your question. If so, would you mind sharing it, via PM if you prefer?

Best Regards

Hi moeller0,
the post is back again.
I double checked and found a misconfigured SQUASH setting that was causing that strange behaviour:

Hi moeller0,
given that I'm not a network admin and until a few weeks/days/hours ago I didn't know anything about QoS, DSCP, ECN etc., I'm trying to understand how the different cake scripts and options affect traffic shaping.

I summarise what I think I've understood:

I can't take advantage of layer_cake because I use this transparent approach and the main router that eventually sets the DS field is after the cake-box. I could only rely on applications setting DS bits at the source.

Layer_cake is also heavier and introduces a bit of latency trying to shape into different queues, so the best approach in my scenario is piece_of_cake plus squash/ignore DSCP on ingress, is that correct?

I also have a couple of questions about flent tests:

  • What's the difference between the rrul and rrul_cs8 tests? Which one is better suited for troubleshooting?
  • Do you have a good example of a box_ping test using layer_cake? Just to know what it looks like.

Thank you very much.

Lorenzo

But that is the charming idea behind DSCP markings: ideally the end points of a connection set them, and the intermediary networks either honor or ignore them. Unfortunately, in reality what often happens is that intermediary networks re-map the DSCP fields to different values for their internal usage. But the idea is very much that the applications are the ones requesting a specific DSCP, and it is up to the network to either honor or ignore this. Some applications actually set meaningful DSCPs already (I believe ssh does), so layer_cake might improve things even if only the egress packets have meaningful markings...

For ingress you would need to run a few packet captures to figure out whether you want to trust the incoming DSCPs or not; if you do, set both squash_dscp and squash_ingress to 0. (The first instructs cake to remap the DSCPs to all-zero, the default for the TOS field, which is universally interpreted as best effort.)

Not necessarily, yes layer_cake is more computationally expensive, but I have not quantified how much more expensive, and it might still do the right thing for you assuming your internal applications set the "correct" DSCPs. I guess you need to try it out?

rrul uses four flows per direction, all with different DSCP markings. rrul_cs8 uses 8 flows per direction, each using one of the 8 DSCP class selector (CS) markings. So rrul_cs8 will simply offer more flows and will also sample the priority-band strategy of the whole end-to-end link a bit better. For fast links, having 8 instead of 4 flows will make the measured total bandwidth come closer to the real limit (more interleaving of the different TCPs probing for the bandwidth limit).
I like the CS system as it a) only uses 3 of the 6 DSCP bits and b) I strongly believe that 8 different priority bands should be sufficient for most home users. (Heck, many ISPs use the 3 priority bits in the VLAN tags and do just fine with just 8 priority bands, and wifi/wmm uses just four different priority classes; so the full 6 bits of DSCP markings seem like quite an overkill. I would also love it if everybody agreed to split the 6 bits into two groups of three each, one group for the endpoints to code their intention, and one group for each intermediary network to use for real; that way at least the intention would be carried end-to-end, but this is just a dream.)
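For reference, both tests are invoked the same way, only the test name differs (a sketch using the server from earlier in this thread; -l sets the test length in seconds, -t a title):

```
flent rrul     -H flent-london.bufferbloat.net -l 60 -t "rrul baseline"
flent rrul_cs8 -H flent-london.bufferbloat.net -l 60 -t "rrul_cs8 baseline"
# re-plot a previously saved data file:
flent -i rrul_cs8-*.flent.gz -p all_scaled -o all_scaled.png
```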

Not at the moment; also, I typically see more effects in local tests than when I go through the internet. I will see whether I can create one later. (Typically I see a stronger effect on the bandwidth, as the latency probes are still sparse and will typically be boosted in comparison to the bulk TCP packets in each of the priority bands that cake uses, so the pings are often flat even though the bandwidth graphs show differences)...

Best Regards

Thank you moeller0, your answers are very useful for me to understand how things work!

Just to be sure:
ingress -> cake on wan iface
egress -> cake on lan iface

There isn't a squash_egress option, right?

For now this is beyond my skills :wink:

I've already tried, and in every test layer_cake showed higher avg (a few ms) and peak (CS1_BK up to more than double) ping values. I didn't report all my results, because I'm not sure I'm testing correctly.
In fact I'm almost sure I always miss something! :slight_smile:

I'm also having problems with the various DSCP marking standards, understanding how they overlap and/or work together.

while (!fully_understand) { try(); fail(); learn(); } :grinning:

Best regards
Lorenzo

I had forgotten about your exact topology; what I meant was packets that pass from your internal network to the internet. But now that you remind me: cake will naturally be attached to the egress side of an interface (for ingress shaping you need the IFB device), so in your case the shaper on the WAN side effectively shapes egress, and the one on the LAN egress side handles packets coming from the internet. I hope this clears things up?

With a heavy emphasis on "now", you are making a lot of progress in understanding these things in a very short amount of time (it took me way longer).

Well, CS1_BK is the background "scavenger" class, so it is intended to only use up left-over bandwidth and yield quickly to more important packets; higher RTT values for probes marked CS1 are therefore to be expected and just show things working as intended. I would be more interested in the relative RTTs of the other classes. Could you maybe post the "all" plots here, as they allow a decent first glimpse into the general sqm performance?

Welcome to the club :wink: As far as I can tell everybody nowadays hates strict precedence, but other than that there is no really strict consensus on what to use when. There are some heuristics based on some DSCP markings that actually are used in the wild (e.g. by VoIP applications and VoIP servers), but all in all it is a mess. IMHO not least because the DSCP bits are not guaranteed to be stable end-to-end; instead they are free for everybody to use and (re-)set whenever they please. That said, on the egress side you have (well, potentially) full control over which applications use which markings (I believe in Windows that can be set with a group policy, so it might not need to be configured explicitly on each machine, but I have zero actual experience myself) and how the AQM interprets those... (Okay, by using cake you will need to make your applications use those DSCP markings that cake actually handles...)

Best Regards

I understand that ingress is traffic coming from the interface to the router and egress is traffic from the router to the interface.

Summary

Simple sqm setup (cake on wan interface shaping both ingress and egress)

wan ingress -> download
wan egress -> upload

dual setup (lan & wan in transparent bridge configuration)

wan egress -> upload
lan egress -> download

Let's start with the doubts: how to apply the ECN and DSCP settings in my scenario.
Having cake on 2 interfaces means a lot of possible combinations, and this is giving me a headache :wink:

In the simple scenario I have 4 combinations for ECN; in the dual setup there could be 16, but I'm almost sure many don't make any real sense, so I'd like to know which are the good ones.

For example:
simple setup: ingress ECN, egress NOECN (default)
what is the equivalent for the dual setup:
WAN ingress ECN, WAN egress NOECN, LAN ingress NOECN, LAN egress NOECN
or WAN ingress ECN, WAN egress ECN, LAN ingress NOECN, LAN egress NOECN

And what if I want ECN enabled both for ingress and egress?

The DSCP settings are even more confusing:
there's only an ingress option both for squash and ignore, while I'm using only egress...
Also, does setting ignore = 1 and squash = 0 (or vice versa) make any sense?

Maybe these are silly questions, but for me it is important to know which tests have relevance.

Yes, of course, but with piece_of_cake I get 35ms median and avg for all 8 streams, while with layer_cake I don't see any gain, just worse CS1 results... maybe it's caused by some weird settings :slight_smile: Here is a zip with some flent results: https://file.io/FAPVZF

Is it possible that rrul and rrul_cs8 show different results because of the ISP honoring some DSCP markings, but not CS?

Many thanks
Lorenzo

Oh, that one is simple: ideally one uses ECN in both directions (any client not desiring to use ECN can simply be configured not to use it), which means you simply set both shapers to use outbound ECN:

Explicit congestion notification (ECN) status on outbound packets (egress).: ECN

Since you have no shaper on the ingress leg of either instance, it does not matter what you put into the equivalent "inbound ECN" field.
UNLESS you really do not want to use ECN signalling, in which case you set both to NOECN. The defaults of inbound ECN and outbound NOECN are there because on really slow links it is better to drop such a packet; on ingress, however, the packet has already traversed the bottleneck, so you have already paid the serialisation delay for that transfer and might as well keep the packet (also, I assume in that case the feedback loop via ECN is faster than if the packet is dropped).
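In /etc/config/sqm terms that would mean setting, in both queue sections (a sketch, matching the configs posted earlier in the thread):

```
option egress_ecn 'ECN'
# ingress_ecn is then moot, as the ingress shaper is disabled via option download '0'
```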

So these are slightly different in scope:
IGNORE: just instructs sqm to use the besteffort shaper model, which has only one priority tier and does not look at the packets' DSCP fields at all; use this if you want to preserve the DSCP fields but do not intend to act upon them.
SQUASH: actually re-maps the DSCP field in all traversing packets to ZERO, which basically denotes best effort and is the old default. Use this option if you do not want stray DSCP markings from the internet to enter your network (some/many wifi adapters map from DSCPs to ACs, and that might or might not be what you want). If you also want to clean your outgoing packets of your internal DSCP markings (to not leak information about your setup), you could also enable squash manually on egress.

So to summarize: sqm-scripts defaults to using 3 priority tiers; if you do not want this, add besteffort to the field named:
"Advanced option string to pass to the egress queueing disciplines; no error checking, use very carefully." (henceforth AdvancedEgressOptions)
But note piece_of_cake basically just does that: it sets both ingress and egress to besteffort.
If you also want to clean DSCP marks on an egress interface, add wash to the AdvancedEgressOptions.
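For the transparent-bridge setup discussed in this thread, the egress option strings could then look like this (a sketch; combine the keywords as needed):

```
# LAN-facing egress (handles downloads): single tier, wash incoming DSCPs
option eqdisc_opts 'dual-dsthost besteffort wash'
# WAN-facing egress (handles uploads): single tier, keep internal marks
option eqdisc_opts 'dual-srchost besteffort'
```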

I am not sure whether that is not really to be expected: the background tier, after all, yields almost all bandwidth under saturating loads (like rrul or rrul_cs8), so it will also have a relatively larger serialisation delay; the other two classes should have sufficient bandwidth to not show this issue. I guess if you use diffserv8 instead of besteffort in the AdvancedEgressOptions you might see more differences... From sch_cake.c, to get a feeling for which CSs map to which priority tier (or "tin" in cake parlance):
/* Pruned list of traffic classes for typical applications:
*
* Network Control (CS6, CS7)
* Minimum Latency (EF, VA, CS5, CS4)
* Interactive Shell (CS2, TOS1)
* Low Latency Transactions (AF2x, TOS4)
* Video Streaming (AF4x, AF3x, CS3)
* Bog Standard (CS0 etc.)
* High Throughput (AF1x, TOS2)
* Background Traffic (CS1)
*
* Total 8 traffic classes.
*/

That link does not seem to work for me, sorry.

Potentially, but "honoring" might be the wrong concept here (any ISP actually using DSCP in its own network should make sure no stray markings enter that network at all; but many ISPs seem to use VLAN priority (as switches can honor that) or might be using MPLS, and hence will simply not look into the IP header for most of their network).

I need some time to read, understand and run new tests; meanwhile, here is the correct link: https://we.tl/AbUQVFfi6S

Best regards
Lorenzo