LibreQoS Backend Architecture and Queueing Design
This page explains how LibreQoS backend systems fit together at runtime:
Data-path packet handling (XDP/eBPF -> tc -> queue tree)
Queue hierarchy design (mq + HTB + leaf qdiscs)
AQM behavior (fq_codel, CAKE) and why benefits can appear below strict line rate
Control-path updates (Scheduler, lqosd, Bakery, incremental vs full reload)
Practical design boundaries for operators
For a full queueing deep-dive, see HTB + fq_codel + CAKE: Detailed Queueing Behavior.
Source Context
This page incorporates details from our devblog posts.
1) Backend Mental Model
LibreQoS has two cooperating planes:
Data plane:
classify packets quickly
map packets to queue classes
enforce fairness and latency behavior at line-rate
Control plane:
compute desired state from network.json and ShapedDevices.csv
apply the smallest safe set of queue changes
avoid unnecessary reload churn
In production terms:
XDP/eBPF and lookup maps decide packet identity and CPU path.
Linux traffic control (tc) enforces queueing policy.
Bakery manages update deltas and reload boundaries.
Queue mode (shape vs observe) controls whether the subscriber shaping tree is active or intentionally removed for baseline measurement.
2) Runtime Invariants
These invariants are useful for reasoning about whether backend behavior is healthy.
| Invariant | Why it matters | Symptom when broken | First check |
|---|---|---|---|
| Every shaped circuit maps to a valid hierarchy parent | No parent means no effective queue placement | Subscribers appear unshaped or bypass intended limits | Validate parent relationships in |
| Multi-queue root assumptions match NIC/runtime reality | CPU distribution depends on queue model consistency | One core saturated while others idle, unstable shaping at load | Verify NIC queue model and |
| Data-plane mapping is stable between XDP and | Mis-mapped packets cause wrong queue assignment | Unexpected class counters, mis-accounted traffic | Compare expected class IDs vs observed |
| Control-plane changes stay within incremental-safe bounds | Reduces disruption from full-tree rebuilds | Frequent packet-impacting reload windows | Review change patterns: structural vs speed/mapping updates |
| Queue count stays within CPU/RAM budget | Leaf qdisc scale directly impacts resources | Memory growth, reload slowdowns, jitter under churn | Track queue count, RAM headroom, and update cadence |
| Queue mode and dataplane mappings are sequenced consistently | Packets must not be steered toward removed queue state | Brief outages, stale class targets, mis-accounted traffic | Check queue mode, IP mapping lifecycle, and live |
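
The first invariant (every shaped circuit maps to a valid hierarchy parent) lends itself to an offline check. The sketch below assumes the hierarchy is a nested mapping with a "children" key per node and that each circuit record carries a parent name; the field names circuit_id and parent and the sample data are hypothetical and not taken from LibreQoS itself.

```python
def collect_node_names(tree):
    """Return every node name in a nested {name: {"children": {...}}} tree."""
    names = set()
    for name, node in tree.items():
        names.add(name)
        names |= collect_node_names(node.get("children", {}))
    return names


def circuits_with_missing_parent(tree, circuits):
    """List circuit IDs whose parent node is absent from the hierarchy."""
    known = collect_node_names(tree)
    return [c["circuit_id"] for c in circuits if c["parent"] not in known]


# Hypothetical sample data, not a real deployment.
network = {"Site_1": {"children": {"AP_A": {"children": {}}}}}
circuits = [
    {"circuit_id": "c1", "parent": "AP_A"},
    {"circuit_id": "c2", "parent": "AP_Z"},  # no such node in the tree
]
print(circuits_with_missing_parent(network, circuits))
```

Running a check like this before applying a change is cheaper than discovering an unshaped subscriber from class counters afterwards.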
4) End-to-End Packet and Control Path
flowchart LR
subgraph DP[Data Plane]
A[Ingress Packet] --> B[XDP parse: VLAN/PPPoE/IP/ports]
B --> C[Flow cache and/or LPM mapping]
C --> D[CPU steering via cpumap]
D --> E[Metadata handoff to tc classifier]
E --> F[tc class selection]
F --> G[mq root]
G --> H[HTB hierarchy]
H --> I[Leaf qdisc: CAKE or fq_codel]
I --> J[Egress]
end
subgraph CP[Control Plane]
K[Scheduler inputs\nnetwork.json + ShapedDevices.csv] --> L[Desired state + command buffer]
L --> M[lqosd command bus]
M --> N[Bakery diff engine]
N --> O[Incremental tc updates]
N --> P[Controlled full reload]
end
N -. updates queue state .-> H
5) Data Plane Design
5.1 XDP and classification pipeline
LibreQoS performs early packet work in XDP where possible:
Parse packet headers once.
Resolve identity (flow/cache/LPM path).
Attach mapping metadata for downstream stages.
A key optimization direction has been reducing repeated lookups:
use hot-cache hits for active addresses/flows
fall back to LPM when needed
avoid duplicate work between XDP and tc when metadata can be passed forward
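
The hot-cache-then-LPM pattern can be illustrated in user-space Python. This is a minimal sketch, not the eBPF implementation: the Classifier class, the prefix table, and the class IDs are hypothetical, and a linear scan over prefixes sorted by length stands in for a real LPM trie.

```python
import ipaddress


class Classifier:
    def __init__(self, prefixes):
        # prefixes: {"100.64.1.0/24": class_id}; longest prefixes are tried first.
        self.table = sorted(
            ((ipaddress.ip_network(p), cid) for p, cid in prefixes.items()),
            key=lambda item: item[0].prefixlen,
            reverse=True,
        )
        self.hot = {}          # exact address -> class_id (hot cache)
        self.lpm_lookups = 0   # how often the slow path ran

    def classify(self, addr):
        if addr in self.hot:           # fast path: exact-match cache hit
            return self.hot[addr]
        self.lpm_lookups += 1          # slow path: longest-prefix match
        ip = ipaddress.ip_address(addr)
        for net, cid in self.table:
            if ip in net:
                self.hot[addr] = cid   # remember the result for later packets
                return cid
        return None                    # no mapping: misses are not cached here


c = Classifier({"100.64.0.0/16": 10, "100.64.1.0/24": 20})
print(c.classify("100.64.1.5"), c.classify("100.64.1.5"), c.lpm_lookups)
```

The second lookup for the same address is served from the cache, which is the effect the optimization direction above is after.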
5.2 Why cpumap is central
cpumap is used to spread work across cores so shaping does not bottleneck on a single queue path. This is a major part of scaling from “works” to “works at ISP traffic levels”.
5.3 Cache, generation, and lock-pressure reduction
The high-level progression described in development notes/devblogs:
reduce full map wipes
move toward generation/epoch-style stale handling
reduce lock-heavy maintenance on hot paths
Operationally, this helps stabilize latency and CPU under frequent updates.
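
Generation-style stale handling replaces "wipe the whole map" with "bump a counter and ignore old entries". The sketch below shows the idea in plain Python; GenerationCache is a hypothetical name and the real maps live in eBPF, so treat this as an illustration of the technique only.

```python
class GenerationCache:
    def __init__(self):
        self.generation = 0
        self.entries = {}  # key -> (value, generation when written)

    def put(self, key, value):
        self.entries[key] = (value, self.generation)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None or hit[1] != self.generation:
            return None  # missing, or written under an older generation
        return hit[0]

    def invalidate_all(self):
        # Constant-time invalidation: nothing is deleted, old entries
        # simply stop matching and are overwritten on the next put().
        self.generation += 1


cache = GenerationCache()
cache.put("100.64.1.5", 20)
cache.invalidate_all()
print(cache.get("100.64.1.5"))  # stale entry is ignored
```

Because invalidation never walks the map, it avoids the long lock hold that a full wipe would need on a hot path.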
5.4 Mapping state is part of the data-path contract
Queue classes and IP mappings are related but not identical backend state.
The shaping tree defines where packets can land in tc.
IP mappings define which circuit/class packets are currently steered toward.
During ordinary shape -> shape updates, LibreQoS tries to preserve stable mappings and stable queue handles where safe.
During disruptive transitions, especially observe <-> shape, LibreQoS can intentionally clear and later republish mappings so packets are not pointed at queue state that no longer exists.
This sequencing matters as much as the queue commands themselves. A healthy backend design must reason about queue-tree changes and mapping changes together.
6) Queue Hierarchy: mq -> HTB -> leaf qdisc
LibreQoS queueing is intentionally layered:
mq root for multi-queue distribution
HTB for hierarchical rate envelopes
leaf qdisc (CAKE or fq_codel) for fairness/AQM behavior inside each envelope
flowchart TD
A[mq root qdisc] --> B[HTB parent class CPU/RXQ 0]
A --> C[HTB parent class CPU/RXQ 1]
B --> D[HTB topology class: Site/AP/POP]
D --> E[HTB circuit class: Subscriber/Circuit]
E --> F[Leaf qdisc: CAKE or fq_codel]
C --> G[HTB topology class: Site/AP/POP]
G --> H[HTB circuit class: Subscriber/Circuit]
H --> I[Leaf qdisc: CAKE or fq_codel]
F --> J[Shaped egress packets]
I --> J
6.1 HTB internals that matter in production
Important mechanics:
Tokens: each packet consumes tokens based on size.
Refill timing: token refill follows kernel timing (jiffies).
quantum: bytes served before scheduler rotates class focus.
r2q: influences derived quantum defaults and behavior.
Why operators care:
very small or very large quantum values can affect fairness smoothness
parent/child shaping relationships matter more than single-class tuning folklore
HTB is the rate envelope; leaf qdiscs are not a drop-in replacement for HTB policy
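
To make the quantum/r2q relationship concrete: when no explicit quantum is set, HTB derives one from the class rate divided by r2q. The sketch below assumes the commonly documented default of r2q = 10 and a clamp range of 1000 to 200000 bytes; both values are assumptions here and should be checked against your kernel and tc version. The function name is hypothetical.

```python
def derived_quantum(rate_bits_per_sec, r2q=10):
    """Approximate HTB's default quantum: class rate in bytes/s divided by r2q."""
    quantum = (rate_bits_per_sec // 8) // r2q
    # Assumed clamp range; values outside it are adjusted rather than used as-is.
    return max(1000, min(quantum, 200_000))


for mbps in (1, 25, 100, 1000):
    print(mbps, "Mbit/s ->", derived_quantum(mbps * 1_000_000), "bytes")
```

Under these assumptions most subscriber-plan rates end up at the upper clamp, which is one reason parent/child relationships tend to matter more than per-class quantum tuning.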
7) AQM in LibreQoS: fq_codel and CAKE
7.1 Responsibility split
Practical split:
HTB: hierarchical bandwidth policy and limits
fq_codel/CAKE: queue fairness and delay control within that policy
7.2 Why AQM still helps even below line-rate saturation
Even when a link is not saturated, fq_codel and CAKE still affect packet behavior. The drop logic (CoDel/BLUE/COBALT) usually stays idle, but the fair-queueing scheduler continues to interleave flows, preventing bursts from one flow from monopolizing the serialization point of the link. This keeps latency-sensitive traffic responsive.
The per-packet sequence looks roughly like this:
A packet is enqueued for sending (from any source). It goes into TC and is matched to the SQM qdisc.
A flow key is generated (and the tin determined if using CAKE with diffserv). The enqueue time is recorded so we know how long the packet has been waiting.
With both fq_codel and CAKE, the packet is now sitting in a queue specific to that flow (how specific depends on configuration and hashing).
Now dequeue happens – the interface indicates that it can accept more packets. When the link is not saturated, this tends to happen quickly.
fq_codel and CAKE then schedule packets between flows (conceptually round-robin using deficit scheduling; CAKE also applies tin priority).
At dequeue time, the sojourn time (time spent in the queue) is evaluated. In congested conditions this can trigger the CoDel or BLUE/COBALT drop logic; when the link is not saturated, those mechanisms are rarely active.
The flow queueing itself still matters, though.
Packets are drawn from multiple flow queues in a fair order before reaching the device queue. At the physical layer the link ultimately serializes traffic one bit at a time, so the order packets reach that serialization point still affects latency behavior.
Even without sustained congestion:
Responsiveness of well-behaved flows remains stable. Short control packets (DNS, SSH, TCP ACKs, etc.) are less likely to get stuck behind bursts from another flow.
Burstiness is reduced before packets hit the serialization point. Instead of one flow dumping a large burst, flows are interleaved by the scheduler.
So even when the AQM drop logic is mostly idle, the fair-queueing part of SQM is still doing useful work by controlling how packets reach the wire.
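
The interleaving effect can be seen in a small deficit round-robin model. This is a conceptual sketch only: the function name, flow names, and quantum value are hypothetical, and real fq_codel/CAKE add sparse-flow handling, hashing, and (for CAKE) tins on top of this.

```python
from collections import deque


def drr_order(flows, quantum=1514):
    """Return the dequeue order for per-flow queues under deficit round robin.

    flows: {flow_name: [packet_size_bytes, ...]} in arrival order per flow.
    """
    queues = {name: deque(pkts) for name, pkts in flows.items()}
    deficit = {name: 0 for name in queues}
    order = []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                continue
            deficit[name] += quantum           # each visit earns one quantum
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size          # spend credit on the packet
                order.append((name, size))
            if not q:
                deficit[name] = 0              # empty flows do not bank credit
    return order


# A bulk flow queued four full-size packets before one small DNS packet arrived.
print(drr_order({"bulk": [1500, 1500, 1500, 1500], "dns": [80]}))
```

In a single FIFO the DNS packet would leave fifth; with per-flow queues it leaves second, even though nothing was dropped and the link was never saturated.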
7.3 CAKE vs fq_codel in LibreQoS terms
General pattern:
Prefer CAKE when mixed-traffic smoothness and default behavior are the priority.
Prefer fq_codel when queue-count/resource pressure is dominant and observed QoE remains acceptable.
Re-test after major topology or queue-count changes.
Resource reality:
both are flow-aware and keep state
CAKE can have higher memory/CPU footprint in large queue populations
7.4 When below-line-rate gains may be limited
Lower latency under mixed load is common, but not guaranteed in every scenario.
Expect smaller gains when:
The bottleneck is outside the controlled queue path.
Traffic is sparse with little real queue contention.
Upstream/downstream shaping is applied in only one direction while the pain point is the opposite direction.
Hardware constraints force a queue design that cannot maintain enough isolation at peak moments.
Operator takeaway:
Treat AQM gains as a result of queue dynamics and contention control, then validate empirically on your own traffic mix.
8) Bakery and Reload Behavior
Bakery exists to avoid unnecessary queue rebuilds and reduce reload penalties.
High-level flow:
Build desired state.
Diff desired vs active state.
Apply smallest safe delta.
Trigger full reload when a change is outside live-mutation support, or when runtime verification/drift detection marks incremental topology mutation unsafe.
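
The "diff desired vs active" step can be sketched as a comparison of two per-circuit maps. The record layout (parent, rate_mbps, ips) and the change-kind labels below are hypothetical; the sketch only produces the delta and leaves the incremental-vs-reload decision to a later stage.

```python
# Hypothetical mapping from a changed field to a change kind.
FIELD_KINDS = {"parent": "parent_move", "rate_mbps": "speed", "ips": "mapping"}


def diff_state(active, desired):
    """Return a sorted list of (circuit_id, change_kind) between two states."""
    changes = []
    for cid in desired.keys() - active.keys():
        changes.append((cid, "add"))
    for cid in active.keys() - desired.keys():
        changes.append((cid, "remove"))
    for cid in desired.keys() & active.keys():
        for field, kind in FIELD_KINDS.items():
            if active[cid][field] != desired[cid][field]:
                changes.append((cid, kind))
    return sorted(changes)


active = {
    "c1": {"parent": "AP_A", "rate_mbps": 100, "ips": ["100.64.0.2"]},
    "c2": {"parent": "AP_A", "rate_mbps": 50, "ips": ["100.64.0.3"]},
}
desired = {
    "c1": {"parent": "AP_A", "rate_mbps": 200, "ips": ["100.64.0.2"]},
    "c3": {"parent": "AP_A", "rate_mbps": 25, "ips": ["100.64.0.4"]},
}
print(diff_state(active, desired))
```

An empty result corresponds to the no-op case: identical desired and active state should produce no queue commands at all.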
8.1 Lazy queueing and expiration
Key controls:
lazy_queues: defer creating parts of the hierarchy until active use.
lazy_expire_seconds: remove inactive queue state after timeout.
Practical effect:
reduced memory overhead for dormant endpoints
lower churn for large but partially active subscriber populations
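
The lazy-create and expire behavior can be modeled with a last-seen timestamp per circuit. This is a minimal sketch of the idea behind lazy_queues and lazy_expire_seconds, not Bakery's implementation; the LazyQueues class and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class LazyQueues:
    def __init__(self, expire_seconds):
        self.expire_seconds = expire_seconds
        self.last_seen = {}  # circuit_id -> timestamp of last activity

    def on_traffic(self, circuit_id, now):
        """Record activity; return True if queue state had to be created."""
        created = circuit_id not in self.last_seen
        self.last_seen[circuit_id] = now
        return created

    def expire(self, now):
        """Remove and return circuits idle for longer than expire_seconds."""
        idle = [c for c, t in self.last_seen.items()
                if now - t > self.expire_seconds]
        for c in idle:
            del self.last_seen[c]
        return idle


lq = LazyQueues(expire_seconds=600)
lq.on_traffic("c1", now=0)
lq.on_traffic("c2", now=500)
print(lq.expire(now=700))  # only c1 has been idle for more than 600 s
```

Only circuits that actually carry traffic hold queue state, so memory follows the active population rather than the provisioned one.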
8.2 Incremental vs reload boundary
| Change type | Usually incremental-safe | Often requires full reload | Why |
|---|---|---|---|
| Circuit IP-only change | Yes | No | Mapping updates can usually be applied without rebuilding the queue tree |
| Circuit SQM-only change | Yes | No | Leaf qdisc kind/parameter changes can usually be applied live |
| Circuit/site speed change (subset) | Yes | Sometimes | Depends on structural impact, queue-count pressure, and available class handles |
| Ordinary circuit parent move | Yes | Sometimes | Bakery uses staged live migration for common active parent/class moves, including qdisc-handle rotation and final-state verification, but it still escalates to reload if the migration cannot be applied or verified safely |
| TreeGuard runtime node virtualization (supported subtree/top-level rebalance path) | Yes | Sometimes | Bakery can apply supported runtime virtualization live, but deferred cleanup, live-state verification failures, or accumulated dirty runtime subtrees can mark |
| Bulk all-circuit changes | Sometimes | Often | Scale and transaction/cardinality limits still matter, even with better incremental behavior |
| Site add/remove or broader structural topology change | Rarely | Yes | HTB subtree mutation constraints remain much stricter at site/topology level than at per-circuit level |
| Add/remove circuits | Yes (small/moderate) | Sometimes | Handle availability, tree size, and diff correctness boundaries |
This table reflects Bakery design behavior and Linux tc mutation constraints discussed in the devblog material.
flowchart TD
A[Config or runtime change arrives] --> B[Build desired or runtime target state]
B --> C{Any effective state change?}
C -->|No| D[No-op]
C -->|Yes| E{Supported live mutation path?}
E -->|No| F[Controlled full reload]
E -->|Yes| G[Apply staged incremental/runtime mutation]
G --> H{Live verification and cleanup safe?}
H -->|Yes| I[Keep incremental state authoritative]
H -->|No| J[Mark reload required and freeze further incremental topology mutation]
J --> F
F --> K[Re-establish single authoritative queue model]
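
The decision flow in the diagram can be written as a small state machine. In this sketch the Planner class, the Outcome names, and the boolean inputs are hypothetical; in particular, verification really happens after a staged mutation is applied, whereas here its result is passed in to keep the logic visible.

```python
from enum import Enum


class Outcome(Enum):
    NO_OP = "no-op"
    INCREMENTAL = "incremental"
    FULL_RELOAD = "full reload"


class Planner:
    def __init__(self):
        self.reload_required = False  # set once incremental state is untrusted

    def decide(self, changed, live_path_supported, verify_ok):
        if not changed:
            return Outcome.NO_OP
        if self.reload_required or not live_path_supported:
            return Outcome.FULL_RELOAD
        if not verify_ok:
            # Freeze further incremental topology mutation until a reload runs.
            self.reload_required = True
            return Outcome.FULL_RELOAD
        return Outcome.INCREMENTAL

    def full_reload_done(self):
        # A completed full reload re-establishes one authoritative model.
        self.reload_required = False


p = Planner()
print(p.decide(changed=True, live_path_supported=True, verify_ok=False))
print(p.decide(changed=True, live_path_supported=True, verify_ok=True))
```

The second call still returns a full reload: once verification has failed, later changes are not applied incrementally until full_reload_done() has run.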
8.3 Queue modes and transition semantics
LibreQoS currently has two explicit queue modes:
shape: normal shaping mode
  root mq present
  HTB hierarchy present
  leaf qdiscs present
  per-circuit IP mappings present
observe: true-baseline mode
  root mq retained
  subscriber shaping tree removed
  per-circuit IP mappings cleared before teardown
  mappings republished after returning to shape
Important operator boundary:
observe is intentionally honest, not hitless
switching observe <-> shape can briefly interrupt traffic because the shaping tree really is removed and later rebuilt
this is different from ordinary shape -> shape updates, where LibreQoS tries to preserve stable handles and queue placement where possible
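
The ordering rule for mode transitions (mappings cleared before teardown, republished only after the tree exists again) can be checked with a small simulation. The step names and both functions are hypothetical; the sketch assumes a mapping is "dangling" whenever mappings are published while no shaping tree exists.

```python
def transition_steps(current, target):
    """Return the ordered steps for a queue-mode change (sketch only)."""
    if current == target:
        return []
    if (current, target) == ("shape", "observe"):
        return ["clear_ip_mappings", "remove_shaping_tree"]
    if (current, target) == ("observe", "shape"):
        return ["build_shaping_tree", "publish_ip_mappings"]
    raise ValueError(f"unknown transition {current} -> {target}")


def has_dangling_window(tree, mappings, steps):
    """True if, after any step, mappings exist while the tree does not."""
    for step in steps:
        if step == "clear_ip_mappings":
            mappings = False
        elif step == "publish_ip_mappings":
            mappings = True
        elif step == "remove_shaping_tree":
            tree = False
        elif step == "build_shaping_tree":
            tree = True
        if mappings and not tree:
            return True
    return False


print(has_dangling_window(True, True, transition_steps("shape", "observe")))
print(has_dangling_window(True, True, ["remove_shaping_tree", "clear_ip_mappings"]))
```

The reversed order in the second call leaves a window where packets would be steered toward queue state that no longer exists, which is the failure the sequencing is designed to avoid.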
8.4 Retained-root full reload behavior
Current Bakery full reloads try to avoid unnecessary root churn.
Verify live kernel tc state, not just planned state.
If the root mq is healthy and matches the expected layout, retain it.
Prune child qdiscs beneath the retained root and verify the subtree is clean.
Rebuild the shaping tree beneath that retained root.
Fall back to root recovery only when retained-root reuse is unsafe or verification fails.
This retained-root strategy reduces avoidable root-level churn, while still preferring explicit recovery when live state is ambiguous.
8.5 Reload boundary quick rules
Prefer frequent small mapping/speed deltas over large structural churn.
Batch topology surgery into planned windows.
Expect higher risk when many circuits and many structure-affecting changes happen together.
Build operations cadence around incremental-safe updates by default.
8.6 Full-reload safety guards
Current Bakery full reloads apply two conservative safety checks before and during large queue rebuilds:
A qdisc preflight estimates planned qdiscs per interface and also separates infrastructure, cake, and fq_codel leaf qdiscs.
That same preflight applies a conservative memory forecast and hard-blocks clearly unsafe full reloads before tc -batch starts.
During chunked full reload apply, Bakery re-checks host memory at chunk boundaries and aborts the remaining apply if available memory drops below its safety floor.
These guards are intentionally biased toward false positives on large reloads so the system fails early with diagnostics instead of spiraling into an OOM event.
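
The two guards can be sketched as a forecast check and a chunk-boundary check. Everything numeric below is a made-up placeholder: the per-qdisc byte costs, the memory figures, and the function names are hypothetical and do not reflect Bakery's actual estimates or thresholds.

```python
def preflight_ok(planned, per_qdisc_bytes, available_bytes, floor_bytes):
    """Forecast memory for planned qdiscs and compare against a safety floor.

    planned: {"infra": n, "cake": n, "fq_codel": n} qdisc counts.
    """
    forecast = sum(count * per_qdisc_bytes[kind] for kind, count in planned.items())
    return available_bytes - forecast >= floor_bytes


def apply_chunked(chunks, read_available_bytes, floor_bytes, apply):
    """Apply chunks in order; stop as soon as memory falls below the floor."""
    applied = 0
    for chunk in chunks:
        if read_available_bytes() < floor_bytes:
            break  # abort the remaining apply rather than risk an OOM event
        apply(chunk)
        applied += 1
    return applied


# Placeholder costs for illustration only; measure real values on your host.
COSTS = {"infra": 4_096, "cake": 2_000_000, "fq_codel": 100_000}
print(preflight_ok({"infra": 10, "cake": 1000, "fq_codel": 0},
                   COSTS, available_bytes=4_000_000_000, floor_bytes=1_000_000_000))
```

A conservative forecast will sometimes block a reload that would have succeeded; that trade-off matches the stated bias toward false positives.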
8.7 Runtime safety model
Bakery’s newer runtime-safety direction is intentionally narrow:
Reconcile enough live state to decide whether deferred cleanup is safe.
Detect material drift between Bakery’s intended state and live kernel tc state.
Stop trusting incremental/runtime mutation once drift is real.
Escalate to a controlled full reload as the recovery path.
This is intentionally not a broad self-healing reconciler. LibreQoS is biased toward:
lightweight cleanup gating for expected lag
explicit reload required escalation on material live-state drift
one controlled full reload to re-establish a single authoritative queue model
That design keeps failure handling easier to reason about than trying to incrementally repair arbitrary split-brain queue state.
8.8 Explicit qdisc-handle management
Bakery now treats leaf qdisc handles as persistent runtime state rather than disposable auto-allocation details.
Circuit leaf qdiscs are assigned explicit handle majors and persisted across applies.
Handle assignments rotate when a live mutation changes the effective leaf qdisc kind or qdisc parent.
Full reload planning reserves live handle majors so rebuilds do not collide with surviving kernel state.
Parent-changed live migration is rejected if Bakery detects stale-handle reuse that would make final state ambiguous.
This handle model is one of the mechanisms that makes common live circuit migration safer than earlier Bakery generations.
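
A persistent handle allocator along these lines might look like the sketch below. The HandleAllocator class, its method names, and the major range are hypothetical; it only illustrates three of the ideas above: stable assignment across applies, rotation to a fresh major on kind/parent change, and reserving majors that are still live in the kernel.

```python
class HandleAllocator:
    def __init__(self, reserved=(), first=0x10, last=0xFFFE):
        self.in_use = set(reserved)  # majors still present in live kernel state
        self.assigned = {}           # circuit_id -> current handle major
        self.next_major = first
        self.last = last

    def _fresh(self):
        while self.next_major in self.in_use:
            self.next_major += 1     # skip majors that could collide
        if self.next_major > self.last:
            raise RuntimeError("no free qdisc handle majors")
        major = self.next_major
        self.in_use.add(major)
        self.next_major += 1
        return major

    def handle_for(self, circuit_id):
        """Return the persisted major, allocating one on first use."""
        if circuit_id not in self.assigned:
            self.assigned[circuit_id] = self._fresh()
        return self.assigned[circuit_id]

    def rotate(self, circuit_id):
        """Assign a new major; the old one stays reserved until released."""
        self.assigned[circuit_id] = self._fresh()
        return self.assigned[circuit_id]

    def release(self, major):
        """Call only after cleanup confirms the old qdisc is gone."""
        self.in_use.discard(major)


alloc = HandleAllocator(reserved={0x10, 0x11})
print(hex(alloc.handle_for("c1")), hex(alloc.rotate("c1")))
```

Keeping the old major reserved after rotation is what prevents a new qdisc from reusing a handle that the kernel still associates with stale state. This sketch never reuses released majors, which is simpler than a real allocator would need to be.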
8.9 Runtime virtualization limits and operator expectations
Current runtime virtualization support is intentionally constrained.
Non-top-level runtime virtualization is limited to same-queue / same-major-domain subtree paths.
Top-level runtime virtualization uses a separate rebalance/promote path and only applies when Bakery can derive a deterministic split.
Runtime operations may remain in AppliedAwaitingCleanup while deferred prune work completes.
Runtime operations can become Dirty; repeated dirty subtree states escalate to reload required rather than attempting broad self-healing.
Operator takeaway:
treat runtime virtualization as a narrow live-mutation feature with verification gates
treat reload required as the authoritative signal that Bakery no longer trusts incremental topology mutation
9) Design Boundaries for Operators
9.1 Observability boundaries
| Signal | Strong for | Weak for |
|---|---|---|
| Queue counters and shaping metrics | Trend diagnosis, congestion behavior, policy validation | Exact per-packet causal proof |
| CAKE/fq_codel drops/marks | Detecting persistent queue pressure and policy effects | Full end-to-end application blame assignment |
| CPU/RAM and command timing | Capacity and reload risk planning | Isolating every microburst source |
9.2 Metric sample semantics and clock domains
Backend metrics do not all come from the same sampling source or clock edge.
Some values are sampled from queue/kernel state.
Some values are sampled from flow telemetry.
Some rollups are built from canonical raw samples and only later rendered as percentages.
Practical implication:
do not casually combine unrelated numerator/denominator sources and assume they describe the same exact second
prefer transporting canonical samples/counts and deriving percentages at presentation time
treat “same field name” and “same clock domain” as separate questions
This matters especially for retransmit, packet, and rate-derived health metrics.
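
A short example shows why percentages should be derived from canonical counts at presentation time. The sample values and the function name are hypothetical; each tuple is assumed to come from the same source and the same sampling window.

```python
def retransmit_pct(samples):
    """Derive a percentage from summed raw counts.

    samples: [(retransmits, packets), ...] from one source and clock domain.
    """
    retrans = sum(r for r, _ in samples)
    packets = sum(p for _, p in samples)
    return 100.0 * retrans / packets if packets else 0.0


samples = [(5, 10), (5, 990)]  # one tiny window, one busy window

# Averaging per-window percentages over-weights the tiny window.
naive = sum(100.0 * r / p for r, p in samples) / len(samples)

print(round(naive, 2), round(retransmit_pct(samples), 2))
```

The averaged-percentage figure comes out about 25 times higher than the count-based one, although both are computed from the same ten retransmits in a thousand packets.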
9.3 Capacity risk factors and mitigations
| Risk factor | Typical symptom | Mitigation |
|---|---|---|
| Very high queue counts with CAKE everywhere | RAM growth and scheduler overhead | Use |
| Frequent full-tree updates | Brief packet disruption windows | Increase incremental-safe update usage; batch structural changes, and let Bakery keep ordinary circuit moves incremental where possible |
| Incomplete parent mapping in hierarchy | Subscribers unexpectedly unshaped | Validate parent relationships in |
| Single-queue/weak NIC virtualization behavior | Poor spread and unstable shaping | Ensure multi-queue NIC path and verify queue mapping assumptions |
10) Symptom-to-Cause Troubleshooting Matrix
| Symptom | Common backend cause | First checks | Typical corrective direction |
|---|---|---|---|
| Latency spikes but interface throughput is not fully pegged | Microburst queue buildup, poor flow isolation, or direction mismatch | Compare latency vs queue/drop trends; verify both directions are shaped | Tune leaf qdisc strategy and verify directional shaping design |
| One CPU runs hot while others are underused | Queue steering imbalance or weak multi-queue path | Inspect CPU utilization and per-class counters by queue branch | Fix queue mapping assumptions and verify |
| Subscribers intermittently appear unshaped | Parent/hierarchy mapping mismatch | Validate parent node references and resulting class creation | Correct hierarchy mappings, then apply and verify class presence |
| Frequent short disruption during updates | Too many full-reload-triggering changes or runtime drift escalation | Classify recent changes as structural vs incremental, and check for | Re-batch operations to favor incremental-safe deltas and investigate live-state drift |
| RAM growth during scale-up | Too many active leaf qdiscs or aggressive CAKE footprint | Measure queue count and memory trends over update windows | Use lazy queue creation/expiry and consider selective fq_codel use |
| Dashboard traffic appears higher than expected user throughput | Counter scope differs from post-drop forwarded traffic | Compare dashboard metrics with | Align runbooks to metric semantics before escalating |
11) Change Validation Workflow
Use this lightweight workflow for backend-impacting changes.
11.1 Pre-change
Classify change type: mapping/speed/structure.
Estimate impact scope: number of affected circuits/classes.
Capture baseline:
latency trend
drop/mark behavior
CPU and RAM headroom
tc class/qdisc snapshot
11.2 During change
Watch control-plane behavior:
incremental apply vs full reload occurrence
command/runtime warnings or errors
Watch data-plane signals:
queue growth anomalies
directional latency drift
per-class counter discontinuities
11.3 Post-change
Re-check the same baseline signals.
Confirm hierarchy/class presence for changed circuits.
Verify subscriber-facing latency and throughput expectations.
If degraded, rollback or reduce change scope and re-apply in smaller batches.
For observe <-> shape transitions, add two specific checks:
confirm the queue mode actually matches the intended state
confirm per-circuit mappings were cleared or republished at the right phase of the transition
11.4 Minimal command checklist
Adjust device names to your environment.
tc -s qdisc show dev <ifname>
tc -s class show dev <ifname>
journalctl -u lqosd --since "15 min ago"
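
For before/after comparisons it can help to reduce the qdisc listing to a count per kind. The sketch below assumes an iproute2 build whose tc supports JSON output via -j and that each qdisc object carries a "kind" field; the function names are hypothetical and the sample string is hand-written, not captured from a real host.

```python
import json
import subprocess
from collections import Counter


def count_qdisc_kinds(tc_json_text):
    """Count qdiscs by kind from the JSON text of a tc qdisc listing."""
    return Counter(q.get("kind", "unknown") for q in json.loads(tc_json_text))


def snapshot(ifname):
    """Run tc for one interface and summarize it (needs tc with JSON support)."""
    out = subprocess.run(
        ["tc", "-s", "-j", "qdisc", "show", "dev", ifname],
        check=True, capture_output=True, text=True,
    ).stdout
    return count_qdisc_kinds(out)


# Hand-written sample in the assumed shape, so the parser can be tried offline.
sample = '[{"kind": "mq"}, {"kind": "htb"}, {"kind": "cake"}, {"kind": "cake"}]'
print(count_qdisc_kinds(sample))
```

Saving the counts from snapshot() before and after a change gives a quick signal for unexpected leaf qdisc growth or a missing tree.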
12) Practical Tuning Sequence
Recommended order:
Validate topology hierarchy and parent mappings first.
Confirm queue counts and memory headroom.
Validate mq/multi-core spread behavior.
Choose CAKE vs fq_codel by observed QoE and resource budget.
Tune update cadence to favor incremental-safe changes.
Re-test after major speed-plan, topology, or integration-cadence changes.
13) Glossary
XDP: earliest high-performance packet hook in Linux.
eBPF: in-kernel programmable packet processing.
LPM: longest-prefix-match lookup for identity mapping.
cpumap: XDP map for steering processing to CPUs.
tc: Linux traffic-control subsystem.
qdisc: queue discipline object in tc.
mq: multi-queue root structure.
HTB: hierarchical token bucket scheduler/shaper.
fq_codel: fair queueing + CoDel delay control.
CAKE: integrated shaper/fairness/AQM qdisc.
Bakery: LibreQoS state-diff and incremental update subsystem.
epoch/generation: state-aging approach used to reduce lock-heavy global clears.