LibreQoS Backend Architecture and Queueing Design

This page explains how LibreQoS backend systems fit together at runtime:

Data-path packet handling (XDP/eBPF -> tc -> queue tree)
Queue hierarchy design (mq + HTB + leaf qdiscs)
AQM behavior (fq_codel, CAKE) and why benefits can appear below strict line rate
Control-path updates (Scheduler, lqosd, Bakery, incremental vs full reload)
Practical design boundaries for operators

For a full queueing deep-dive, see HTB + fq_codel + CAKE: Detailed Queueing Behavior.

Source Context

This page incorporates details from our devblog posts.

1) Backend Mental Model

LibreQoS has two cooperating planes:

Data plane:
- classify packets quickly
- map packets to queue classes
- enforce fairness and latency behavior at line rate
Control plane:
- compute desired state from network.json and ShapedDevices.csv
- apply the smallest safe set of queue changes
- avoid unnecessary reload churn

In production terms:

XDP/eBPF and lookup maps decide packet identity and CPU path.
Linux traffic control (tc) enforces queueing policy.
Bakery manages update deltas and reload boundaries.
Queue mode (shape vs observe) controls whether the subscriber shaping tree is active or intentionally removed for baseline measurement.

2) Runtime Invariants

These invariants are useful for reasoning about whether backend behavior is healthy.

Invariant	Why it matters	Symptom when broken	First check
Every shaped circuit maps to a valid hierarchy parent	No parent means no effective queue placement	Subscribers appear unshaped or bypass intended limits	Validate parent relationships in `network.json` and device input
Multi-queue root assumptions match NIC/runtime reality	CPU distribution depends on queue model consistency	One core saturated while others idle, unstable shaping at load	Verify NIC queue model and `mq`/class layout with `tc` output
Data-plane mapping is stable between XDP and `tc`	Mis-mapped packets cause wrong queue assignment	Unexpected class counters, mis-accounted traffic	Compare expected class IDs vs observed `tc -s` class counters
Control-plane changes stay within incremental-safe bounds	Reduces disruption from full-tree rebuilds	Frequent packet-impacting reload windows	Review change patterns: structural vs speed/mapping updates
Queue count stays within CPU/RAM budget	Leaf qdisc scale directly impacts resources	Memory growth, reload slowdowns, jitter under churn	Track queue count, RAM headroom, and update cadence
Queue mode and dataplane mappings are sequenced consistently	Packets must not be steered toward removed queue state	Brief outages, stale class targets, mis-accounted traffic	Check queue mode, IP mapping lifecycle, and live `tc` state together
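The first invariant can be checked offline before an apply. The following is a minimal sketch, assuming `network.json` nests child nodes under a `children` key and `ShapedDevices.csv` has `Circuit ID` and `Parent Node` columns. Both layouts are assumptions about your input files, and `find_orphans` is a hypothetical helper, not a LibreQoS function.

```python
import csv
import io
import json


def node_names(tree):
    """Collect every node name in a nested network.json-style dict."""
    names = set()
    for name, body in tree.items():
        names.add(name)
        # Assumption: child nodes live under a "children" key.
        names |= node_names(body.get("children", {}))
    return names


def find_orphans(network_json_text, shaped_devices_text):
    """Return (circuit_id, parent) pairs whose parent is not in the tree."""
    known = node_names(json.loads(network_json_text))
    orphans = []
    for row in csv.DictReader(io.StringIO(shaped_devices_text)):
        parent = row["Parent Node"].strip()
        # An empty parent is treated as "attach at the root" in this sketch.
        if parent and parent not in known:
            orphans.append((row["Circuit ID"], parent))
    return orphans


# Tiny illustrative inputs; real files carry many more fields.
NETWORK = '{"Site_1": {"children": {"AP_A": {}}}}'
DEVICES = "Circuit ID,Parent Node\n1001,AP_A\n1002,AP_Z\n1003,\n"
print(find_orphans(NETWORK, DEVICES))  # [('1002', 'AP_Z')]
```

Running a check like this before a change makes the "Subscribers appear unshaped" symptom easier to rule out early.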

3) Runtime Authority and Configuration Model

LibreQoS has both on-disk config files and a long-running runtime daemon, but they are not the same authority at every moment.

lqosd is the runtime control-plane authority while it is running.
UI/config API updates go through lqosd, which updates in-memory state and then drives apply/reload behavior.
Manual edits to /etc/lqos.conf are operator-managed source inputs, but they do not automatically become active runtime state until the daemon reload path consumes them.

Operational takeaway:

treat the UI/config API path as the authoritative live-update workflow
treat direct file edits as a separate operator action that still needs a runtime reload/apply boundary
do not assume “file changed” and “runtime desired state changed” are equivalent at the same instant

4) End-to-End Packet and Control Path

flowchart LR
    subgraph DP[Data Plane]
      A[Ingress Packet] --> B[XDP parse: VLAN/PPPoE/IP/ports]
      B --> C[Flow cache and/or LPM mapping]
      C --> D[CPU steering via cpumap]
      D --> E[Metadata handoff to tc classifier]
      E --> F[tc class selection]
      F --> G[mq root]
      G --> H[HTB hierarchy]
      H --> I[Leaf qdisc: CAKE or fq_codel]
      I --> J[Egress]
    end

    subgraph CP[Control Plane]
      K[Scheduler inputs\nnetwork.json + ShapedDevices.csv] --> L[Desired state + command buffer]
      L --> M[lqosd command bus]
      M --> N[Bakery diff engine]
      N --> O[Incremental tc updates]
      N --> P[Controlled full reload]
    end

    N -. updates queue state .-> H

5) Data Plane Design

5.1 XDP and classification pipeline

LibreQoS performs early packet work in XDP where possible:

Parse packet headers once.
Resolve identity (flow/cache/LPM path).
Attach mapping metadata for downstream stages.

A key optimization direction has been reducing repeated lookups:

use hot-cache hits for active addresses/flows
fall back to LPM when needed
avoid duplicate work between XDP and tc when metadata can be passed forward
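The cache-then-LPM lookup order can be illustrated in userspace. This is a sketch only: the real data path uses eBPF maps (an LPM trie rather than the linear scan shown here), and the `Classifier` class, prefixes, and class IDs are hypothetical.

```python
import ipaddress


class Classifier:
    """Hot-cache lookup with longest-prefix-match fallback (userspace sketch)."""

    def __init__(self, prefixes):
        # prefixes: {"100.64.0.0/16": class_id}; values are illustrative only.
        self.prefixes = [
            (ipaddress.ip_network(cidr), class_id)
            for cidr, class_id in prefixes.items()
        ]
        # Longest prefixes first, so the first match is the most specific one.
        self.prefixes.sort(key=lambda item: item[0].prefixlen, reverse=True)
        self.hot_cache = {}
        self.lpm_lookups = 0

    def classify(self, ip_text):
        hit = self.hot_cache.get(ip_text)
        if hit is not None:
            return hit  # fast path: no prefix walk needed
        self.lpm_lookups += 1
        addr = ipaddress.ip_address(ip_text)
        for network, class_id in self.prefixes:
            if addr in network:
                self.hot_cache[ip_text] = class_id
                return class_id
        return None  # unmapped address


c = Classifier({"100.64.0.0/16": "1:10", "100.64.5.0/24": "1:20"})
print(c.classify("100.64.5.9"))  # most specific prefix wins
print(c.classify("100.64.5.9"))  # second call is served from the hot cache
print(c.lpm_lookups)             # only one slow-path lookup was needed
```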

5.2 Why `cpumap` is central

cpumap is used to spread work across cores so shaping does not bottleneck on a single queue path. This is a major part of scaling from “works” to “works at ISP traffic levels”.

5.3 Cache, generation, and lock-pressure reduction

The high-level progression described in development notes/devblogs:

reduce full map wipes
move toward generation/epoch-style stale handling
reduce lock-heavy maintenance on hot paths

Operationally, this helps stabilize latency and CPU under frequent updates.
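The generation/epoch idea can be shown with a small cache whose entries carry the generation they were written in. This is a conceptual sketch, not the LibreQoS map layout; `GenerationCache` is a hypothetical name.

```python
class GenerationCache:
    """Cache whose entries go stale when the generation counter advances."""

    def __init__(self):
        self.generation = 0
        self.entries = {}  # key -> (generation, value)

    def put(self, key, value):
        self.entries[key] = (self.generation, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or entry[0] != self.generation:
            return None  # missing or stale: caller falls back to the slow path
        return entry[1]

    def invalidate_all(self):
        # Constant-time invalidation: nothing is wiped. Stale entries are
        # simply ignored and overwritten as traffic repopulates them.
        self.generation += 1


cache = GenerationCache()
cache.put("100.64.5.9", "1:20")
print(cache.get("100.64.5.9"))  # 1:20
cache.invalidate_all()
print(cache.get("100.64.5.9"))  # None, although the entry was never deleted
```

The design choice is that invalidation cost no longer scales with map size, which is what removes the full-wipe pause from the hot path.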

5.4 Mapping state is part of the data-path contract

Queue classes and IP mappings are related but not identical backend state.

The shaping tree defines where packets can land in tc.
IP mappings define which circuit/class packets are currently steered toward.
During ordinary shape -> shape updates, LibreQoS tries to preserve stable mappings and stable queue handles where safe.
During disruptive transitions, especially observe <-> shape, LibreQoS can intentionally clear and later republish mappings so packets are not pointed at queue state that no longer exists.

This sequencing matters as much as the queue commands themselves. A healthy backend design must reason about queue-tree changes and mapping changes together.

6) Queue Hierarchy: `mq` -> `HTB` -> leaf qdisc

LibreQoS queueing is intentionally layered:

mq root for multi-queue distribution
HTB for hierarchical rate envelopes
leaf qdisc (CAKE or fq_codel) for fairness/AQM behavior inside each envelope

flowchart TD
    A[mq root qdisc] --> B[HTB parent class CPU/RXQ 0]
    A --> C[HTB parent class CPU/RXQ 1]

    B --> D[HTB topology class: Site/AP/POP]
    D --> E[HTB circuit class: Subscriber/Circuit]
    E --> F[Leaf qdisc: CAKE or fq_codel]

    C --> G[HTB topology class: Site/AP/POP]
    G --> H[HTB circuit class: Subscriber/Circuit]
    H --> I[Leaf qdisc: CAKE or fq_codel]

    F --> J[Shaped egress packets]
    I --> J

6.1 HTB internals that matter in production

Important mechanics:

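Tokens: each packet consumes tokens in proportion to its size; a class with no tokens left must borrow from its parent or wait.
Refill timing: tokens are replenished from elapsed kernel time (jiffies on older kernels, high-resolution timestamps on current ones), so pacing granularity depends on the kernel's timer resolution.
quantum: bytes a class may send in one turn before the scheduler moves on to a sibling class.
r2q: the divisor HTB uses to derive a class's default quantum from its rate (quantum = rate / r2q) when no quantum is set explicitly.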

Why operators care:

very small or very large quantum values can make sharing between sibling classes uneven or bursty
parent/child shaping relationships matter more than single-class tuning folklore
HTB is the rate envelope; leaf qdiscs are not a drop-in replacement for HTB policy
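The token and quantum mechanics can be made concrete with a simplified model. This is a sketch under stated assumptions: real HTB also models `ceil` and borrowing between classes, which are omitted here, and the 1000 and 200000 byte clamps on a derived quantum should be treated as an assumption to verify against your kernel version.

```python
def derived_quantum(rate_bps, r2q=10):
    """Default quantum in bytes when none is configured: rate / r2q, clamped."""
    quantum = (rate_bps // 8) // r2q  # rate is converted from bits to bytes
    return max(1000, min(quantum, 200_000))


class TokenBucket:
    """Single-rate token bucket; tokens are counted in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def try_send(self, now, size):
        # Refill from elapsed time, never exceeding the burst allowance.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # out of tokens: the packet has to wait


print(derived_quantum(10_000_000))     # 125000
print(derived_quantum(50_000))         # 1000 (clamped up)
print(derived_quantum(1_000_000_000))  # 200000 (clamped down)

bucket = TokenBucket(rate_bps=8_000_000, burst_bytes=3000)
print([bucket.try_send(0.0, 1500) for _ in range(3)])  # [True, True, False]
print(bucket.try_send(0.002, 1500))                    # True after a 2 ms refill
```

The clamped results show why very low and very high plan rates benefit from an explicit quantum rather than relying on the r2q default.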

7) AQM in LibreQoS: `fq_codel` and `CAKE`

7.1 Responsibility split

Practical split:

HTB: hierarchical bandwidth policy and limits
fq_codel/CAKE: queue fairness and delay control within that policy

7.2 Why AQM still helps even below line-rate saturation

Even when a link is not saturated, fq_codel and CAKE still affect packet behavior. The drop logic (CoDel/BLUE/COBALT) usually stays idle, but the fair-queueing scheduler continues to interleave flows, preventing bursts from one flow from monopolizing the serialization point of the link. This keeps latency-sensitive traffic responsive.

Step by step, the packet path looks like this:

A packet is enqueued for sending (from any source). It enters tc and is matched to its SQM leaf qdisc.

A flow key is generated (and, when CAKE runs in a diffserv mode, the tin is selected). The enqueue time is recorded so the qdisc knows how long the packet has been waiting.

With both fq_codel and CAKE, the packet is now sitting in a queue specific to that flow (how specific depends on configuration and hashing).

Now dequeue happens – the interface indicates that it can accept more packets. When the link is not saturated, this tends to happen quickly.

fq_codel and CAKE then schedule packets between flows (conceptually round-robin using deficit scheduling; CAKE also applies tin priority).

At dequeue time, the sojourn time (time spent in the queue) is evaluated. In congested conditions this can trigger the CoDel or BLUE/COBALT drop logic. When the link is not saturated, those mechanisms are rarely active.

The flow queueing itself still matters, though.

Packets are drawn from multiple flow queues in a fair order before reaching the device queue. At the physical layer the link ultimately serializes traffic one bit at a time, so the order packets reach that serialization point still affects latency behavior.

Even without sustained congestion:

Responsiveness of well-behaved flows remains stable. Short control packets (DNS, SSH, TCP ACKs, etc.) are less likely to get stuck behind bursts from another flow.
Burstiness is reduced before packets hit the serialization point. Instead of one flow dumping a large burst, flows are interleaved by the scheduler.

So even when the AQM drop logic is mostly idle, the fair-queueing part of SQM is still doing useful work by controlling how packets reach the wire.
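The interleaving effect can be reproduced with a small deficit round-robin model. This is a simplified sketch: fq_codel and CAKE add details not modeled here (such as sparse-flow handling, tins, and drop logic), and the flow names and packet sizes are made up for illustration.

```python
from collections import deque


def drr_dequeue_order(flows, quantum=1514):
    """flows: {flow_id: [packet sizes]}; returns [(flow_id, size)] in send order."""
    queues = {fid: deque(sizes) for fid, sizes in flows.items()}
    deficits = {fid: 0 for fid in queues}
    active = deque(fid for fid in queues if queues[fid])
    order = []
    while active:
        fid = active.popleft()
        deficits[fid] += quantum  # each visit earns one quantum of credit
        queue = queues[fid]
        while queue and queue[0] <= deficits[fid]:
            size = queue.popleft()
            deficits[fid] -= size
            order.append((fid, size))
        if queue:
            active.append(fid)  # still backlogged: go to the back of the line
        else:
            deficits[fid] = 0   # an emptied flow does not bank credit
    return order


order = drr_dequeue_order({"bulk": [1500] * 4, "dns": [80], "ssh": [100, 100]})
print([fid for fid, _ in order])
# ['bulk', 'dns', 'ssh', 'ssh', 'bulk', 'bulk', 'bulk']
```

In a single FIFO the DNS packet would sit behind all four bulk packets; with per-flow queues it leaves after the first one, even though nothing was dropped.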

7.3 CAKE vs fq_codel in LibreQoS terms

General pattern:

Prefer CAKE when mixed-traffic smoothness and sensible behavior without per-queue tuning are the priority.
Prefer fq_codel when queue-count/resource pressure is dominant and observed QoE remains acceptable.
Re-test after major topology or queue-count changes.

Resource reality:

both are flow-aware and keep state
CAKE can have higher memory/CPU footprint in large queue populations

7.4 When below-line-rate gains may be limited

Lower latency under mixed load is common, but not guaranteed in every scenario.

Expect smaller gains when:

The bottleneck is outside the controlled queue path.
Traffic is sparse with little real queue contention.
Upstream/downstream shaping is applied in only one direction while the pain point is the opposite direction.
Hardware constraints force a queue design that cannot maintain enough isolation at peak moments.

Operator takeaway:

Treat AQM gains as a result of queue dynamics and contention control, then validate empirically on your own traffic mix.

8) Bakery and Reload Behavior

Bakery exists to avoid unnecessary queue rebuilds and reduce reload penalties.

High-level flow:

Build desired state.
Diff desired vs active state.
Apply smallest safe delta.
Trigger full reload when a change is outside live-mutation support, or when runtime verification/drift detection marks incremental topology mutation unsafe.
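The diff step can be sketched as a comparison of two state maps. This is an illustration of the idea only: the `diff_state` function and the circuit record fields (`parent`, `rate_mbps`, `sqm`) are hypothetical and do not reflect Bakery's internal data model.

```python
def diff_state(active, desired):
    """Compare {circuit_id: {"parent", "rate_mbps", "sqm"}} maps into a plan."""
    plan = {"add": [], "remove": [], "update": [], "move": []}
    for cid in sorted(desired.keys() - active.keys()):
        plan["add"].append(cid)
    for cid in sorted(active.keys() - desired.keys()):
        plan["remove"].append(cid)
    for cid in sorted(active.keys() & desired.keys()):
        old, new = active[cid], desired[cid]
        if old == new:
            continue  # unchanged circuits produce no commands at all
        if old["parent"] != new["parent"]:
            plan["move"].append(cid)    # structural: needs staged migration
        else:
            plan["update"].append(cid)  # speed or SQM change only
    return plan


active = {
    "c1": {"parent": "AP_A", "rate_mbps": 100, "sqm": "cake"},
    "c2": {"parent": "AP_A", "rate_mbps": 50, "sqm": "cake"},
    "c3": {"parent": "AP_B", "rate_mbps": 25, "sqm": "fq_codel"},
}
desired = {
    "c1": {"parent": "AP_A", "rate_mbps": 200, "sqm": "cake"},
    "c2": {"parent": "AP_B", "rate_mbps": 50, "sqm": "cake"},
    "c4": {"parent": "AP_B", "rate_mbps": 25, "sqm": "cake"},
}
print(diff_state(active, desired))
# {'add': ['c4'], 'remove': ['c3'], 'update': ['c1'], 'move': ['c2']}
```

Separating moves from plain updates mirrors the distinction the document draws between structural changes and speed/mapping changes.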

8.1 Lazy queueing and expiration

Key controls:

lazy_queues: defer creating parts of the hierarchy until active use.
lazy_expire_seconds: remove inactive queue state after timeout.

Practical effect:

reduced memory overhead for dormant endpoints
lower churn for large but partially active subscriber populations
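The create-on-use and expire-on-idle behavior can be modeled with a timestamp map. This is a sketch of the concept only; the `LazyQueues` class is hypothetical, and the exact semantics of `lazy_queues` and `lazy_expire_seconds` in LibreQoS may differ.

```python
class LazyQueues:
    """Tracks which circuits currently have queue state (conceptual model)."""

    def __init__(self, expire_seconds):
        self.expire_seconds = expire_seconds
        self.last_seen = {}  # circuit_id -> timestamp of last activity

    def on_traffic(self, circuit_id, now):
        created = circuit_id not in self.last_seen
        self.last_seen[circuit_id] = now
        return created  # True means the queue would be created at this moment

    def expire(self, now):
        stale = [
            cid for cid, seen in self.last_seen.items()
            if now - seen >= self.expire_seconds
        ]
        for cid in stale:
            del self.last_seen[cid]
        return stale  # queues that would be torn down


q = LazyQueues(expire_seconds=600)
print(q.on_traffic("c1", now=0))    # True: first traffic creates the queue
print(q.on_traffic("c1", now=10))   # False: queue already exists
print(q.on_traffic("c2", now=500))  # True
print(q.expire(now=700))            # ['c1']: idle for 690 s
```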

8.2 Incremental vs reload boundary

Change type	Usually incremental-safe	Often requires full reload	Why
Circuit IP-only change	Yes	No	Mapping updates can usually be applied without rebuilding the queue tree
Circuit SQM-only change	Yes	No	Leaf qdisc kind/parameter changes can usually be applied live
Circuit/site speed change (subset)	Yes	Sometimes	Depends on structural impact, queue-count pressure, and available class handles
Ordinary circuit parent move	Yes	Sometimes	Bakery uses staged live migration for common active parent/class moves, including qdisc-handle rotation and final-state verification, but it still escalates to reload if the migration cannot be applied or verified safely
TreeGuard runtime node virtualization (supported subtree/top-level rebalance path)	Yes	Sometimes	Bakery can apply supported runtime virtualization live, but deferred cleanup, live-state verification failures, or accumulated dirty runtime subtrees can mark `reload required` and freeze further incremental topology mutation until a full reload
Bulk all-circuit changes	Sometimes	Often	Scale and transaction/cardinality limits still matter, even with better incremental behavior
Site add/remove or broader structural topology change	Rarely	Yes	HTB subtree mutation constraints remain much stricter at site/topology level than at per-circuit level
Add/remove circuits	Yes (small/moderate)	Sometimes	Handle availability, tree size, and diff correctness boundaries

This table reflects Bakery design behavior and Linux tc mutation constraints discussed in the devblog material.

flowchart TD
    A[Config or runtime change arrives] --> B[Build desired or runtime target state]
    B --> C{Any effective state change?}
    C -->|No| D[No-op]
    C -->|Yes| E{Supported live mutation path?}
    E -->|No| F[Controlled full reload]
    E -->|Yes| G[Apply staged incremental/runtime mutation]
    G --> H{Live verification and cleanup safe?}
    H -->|Yes| I[Keep incremental state authoritative]
    H -->|No| J[Mark reload required and freeze further incremental topology mutation]
    J --> F
    F --> K[Re-establish single authoritative queue model]
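The decision flow above can be restated as a small function. This is a hedged paraphrase of the flowchart, not Bakery code; `Action` and `choose_action` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    NOOP = "no-op"
    INCREMENTAL = "incremental"
    FULL_RELOAD = "full reload"


def choose_action(changed, live_path_supported, verify):
    """verify: callable returning True when live verification and cleanup pass."""
    if not changed:
        return Action.NOOP
    if not live_path_supported:
        return Action.FULL_RELOAD
    # The staged mutation is applied first, then checked against live state.
    if verify():
        return Action.INCREMENTAL
    # Verification failed: stop trusting incremental mutation and reload.
    return Action.FULL_RELOAD


print(choose_action(False, True, lambda: True))   # Action.NOOP
print(choose_action(True, True, lambda: True))    # Action.INCREMENTAL
print(choose_action(True, True, lambda: False))   # Action.FULL_RELOAD
print(choose_action(True, False, lambda: True))   # Action.FULL_RELOAD
```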

8.3 Queue modes and transition semantics

LibreQoS currently has two explicit queue modes:

shape
- normal shaping mode
- root mq present
- HTB hierarchy present
- leaf qdiscs present
- per-circuit IP mappings present
observe
- true-baseline mode
- root mq retained
- subscriber shaping tree removed
- per-circuit IP mappings cleared before teardown
- mappings republished after returning to shape

Important operator boundary:

observe is intentionally a true baseline, not a hitless toggle
switching observe <-> shape can briefly interrupt traffic because the shaping tree really is removed and later rebuilt
this is different from ordinary shape -> shape updates, where LibreQoS tries to preserve stable handles and queue placement where possible
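The ordering constraint for mode switches can be written down explicitly. This is a sketch of the sequencing described above; the step names are hypothetical labels, not LibreQoS commands.

```python
def transition_steps(current, target):
    """Return the ordered steps for a queue-mode change (labels are illustrative)."""
    if current == target:
        return []
    if (current, target) == ("shape", "observe"):
        # Mappings go first so no packet is steered at a class being removed.
        return ["clear_ip_mappings", "remove_shaping_tree"]
    if (current, target) == ("observe", "shape"):
        # The tree must exist before any mapping points into it.
        return ["build_shaping_tree", "publish_ip_mappings"]
    raise ValueError(f"unknown transition: {current} -> {target}")


print(transition_steps("shape", "observe"))
print(transition_steps("observe", "shape"))
```

In both directions the mapping change sits on the side of the tree change where packets can never reference missing queue state.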

8.4 Retained-root full reload behavior

Current Bakery full reloads try to avoid unnecessary root churn.

Verify live kernel tc state, not just planned state.
If the root mq is healthy and matches the expected layout, retain it.
Prune child qdiscs beneath the retained root and verify the subtree is clean.
Rebuild the shaping tree beneath that retained root.
Fall back to root recovery only when retained-root reuse is unsafe or verification fails.

This retained-root strategy reduces avoidable root-level churn, while still preferring explicit recovery when live state is ambiguous.

8.5 Reload boundary quick rules

Prefer frequent small mapping/speed deltas over large structural churn.
Batch topology surgery into planned windows.
Expect higher risk when many circuits and many structure-affecting changes happen together.
Build operations cadence around incremental-safe updates by default.

8.6 Full-reload safety guards

Current Bakery full reloads apply two conservative safety checks before and during large queue rebuilds:

A qdisc preflight estimates planned qdiscs per interface, counting infrastructure qdiscs, cake leaves, and fq_codel leaves separately.
That same preflight applies a conservative memory forecast and hard-blocks clearly unsafe full reloads before tc -batch starts.
During chunked full reload apply, Bakery re-checks host memory at chunk boundaries and aborts the remaining apply if available memory drops below its safety floor.
These guards are intentionally biased toward false positives on large reloads so the system fails early with diagnostics instead of spiraling into an OOM event.
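The two guards can be sketched as a forecast before apply and a re-check between chunks. This is an illustration only: the function names are hypothetical, and the per-qdisc costs in the example are placeholder numbers chosen for easy arithmetic, not measured values.

```python
def preflight(planned, cost_bytes, available_bytes, safety_floor_bytes):
    """planned: {"cake": count, ...}. Returns (ok, estimated_bytes)."""
    estimate = sum(count * cost_bytes[kind] for kind, count in planned.items())
    ok = available_bytes - estimate >= safety_floor_bytes
    return ok, estimate


def apply_in_chunks(commands, chunk_size, read_available, safety_floor_bytes):
    """Apply chunk by chunk; abort the rest if memory drops below the floor."""
    applied = 0
    for start in range(0, len(commands), chunk_size):
        if read_available() < safety_floor_bytes:
            return applied, False  # stop early and leave diagnostics to the caller
        applied += len(commands[start:start + chunk_size])
    return applied, True


# Placeholder costs: substitute numbers measured on your own hosts.
costs = {"cake": 100, "fq_codel": 10}
print(preflight({"cake": 10, "fq_codel": 5}, costs, 2000, 500))  # (True, 1050)
print(preflight({"cake": 10, "fq_codel": 5}, costs, 1200, 500))  # (False, 1050)

readings = iter([500, 300, 50])  # simulated available memory per chunk boundary
print(apply_in_chunks(list(range(10)), 4, lambda: next(readings), 100))  # (8, False)
```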

8.7 Runtime safety model

Bakery’s newer runtime-safety direction is intentionally narrow:

Reconcile enough live state to decide whether deferred cleanup is safe.
Detect material drift between Bakery’s intended state and live kernel tc state.
Stop trusting incremental/runtime mutation once drift is real.
Escalate to a controlled full reload as the recovery path.

This is intentionally not a broad self-healing reconciler. LibreQoS is biased toward:

lightweight cleanup gating for expected lag
explicit reload required escalation on material live-state drift
one controlled full reload to re-establish a single authoritative queue model

That design keeps failure handling easier to reason about than trying to incrementally repair arbitrary split-brain queue state.

8.8 Explicit qdisc-handle management

Bakery now treats leaf qdisc handles as persistent runtime state rather than disposable auto-allocation details.

Circuit leaf qdiscs are assigned explicit handle majors and persisted across applies.
Handle assignments rotate when a live mutation changes the effective leaf qdisc kind or qdisc parent.
Full reload planning reserves live handle majors so rebuilds do not collide with surviving kernel state.
Parent-changed live migration is rejected if Bakery detects stale-handle reuse that would make final state ambiguous.

This handle model is one of the mechanisms that makes common live circuit migration safer than earlier Bakery generations.
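Persistent handle assignment with reservation of live majors can be modeled as follows. This is a conceptual sketch, not Bakery's allocator; `HandleAllocator` and its starting range are hypothetical.

```python
class HandleAllocator:
    """Assigns qdisc handle majors that persist across applies (sketch)."""

    def __init__(self, first=0x100, last=0xFFFE):
        self.first = first
        self.last = last
        self.assigned = {}     # circuit_id -> major
        self.reserved = set()  # majors still present in live kernel state

    def reserve_live(self, majors):
        self.reserved |= set(majors)

    def _next_free(self):
        in_use = self.reserved | set(self.assigned.values())
        for major in range(self.first, self.last + 1):
            if major not in in_use:
                return major
        raise RuntimeError("no free qdisc handle majors")

    def handle_for(self, circuit_id):
        # Stable across calls: an existing assignment is never re-rolled.
        if circuit_id not in self.assigned:
            self.assigned[circuit_id] = self._next_free()
        return self.assigned[circuit_id]

    def rotate(self, circuit_id):
        # The old major stays reserved so it cannot be reused while the
        # old qdisc may still exist in the kernel.
        old = self.assigned.pop(circuit_id, None)
        if old is not None:
            self.reserved.add(old)
        return self.handle_for(circuit_id)


alloc = HandleAllocator()
alloc.reserve_live([0x100])
print(hex(alloc.handle_for("c1")))  # 0x101
print(hex(alloc.handle_for("c1")))  # 0x101 again
print(hex(alloc.rotate("c1")))      # 0x102
```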

8.9 Runtime virtualization limits and operator expectations

Current runtime virtualization support is intentionally constrained.

Non-top-level runtime virtualization is limited to same-queue / same-major-domain subtree paths.
Top-level runtime virtualization uses a separate rebalance/promote path and only applies when Bakery can derive a deterministic split.
Runtime operations may remain in AppliedAwaitingCleanup while deferred prune work completes.
Runtime operations can become Dirty; repeated dirty subtree states escalate to reload required rather than attempting broad self-healing.

Operator takeaway:

treat runtime virtualization as a narrow live-mutation feature with verification gates
treat reload required as the authoritative signal that Bakery no longer trusts incremental topology mutation

9) Design Boundaries for Operators

9.1 Observability boundaries

Signal	Strong for	Weak for
Queue counters and shaping metrics	Trend diagnosis, congestion behavior, policy validation	Exact per-packet causal proof
CAKE/fq_codel drops/marks	Detecting persistent queue pressure and policy effects	Full end-to-end application blame assignment
CPU/RAM and command timing	Capacity and reload risk planning	Isolating every microburst source

9.2 Metric sample semantics and clock domains

Backend metrics do not all come from the same sampling source or clock edge.

Some values are sampled from queue/kernel state.
Some values are sampled from flow telemetry.
Some rollups are built from canonical raw samples and only later rendered as percentages.

Practical implication:

do not casually combine unrelated numerator/denominator sources and assume they describe the same exact second
prefer transporting canonical samples/counts and deriving percentages at presentation time
treat “same field name” and “same clock domain” as separate questions

This matters especially for retransmit, packet, and rate-derived health metrics.
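A short example shows why carrying counts and deriving the percentage late is safer than averaging percentages. The sample values are invented for illustration, and both function names are hypothetical.

```python
def retransmit_pct(samples):
    """samples: [(retransmits, packets)]; percentage derived from summed counts."""
    retrans = sum(r for r, _ in samples)
    packets = sum(p for _, p in samples)
    return 100.0 * retrans / packets if packets else 0.0


def mean_of_pcts(samples):
    """Average of per-sample percentages: misleading when sample sizes differ."""
    pcts = [100.0 * r / p for r, p in samples if p]
    return sum(pcts) / len(pcts) if pcts else 0.0


# One nearly idle interval (10 packets) and one busy interval (990 packets).
samples = [(5, 10), (5, 990)]
print(retransmit_pct(samples))          # 1.0
print(round(mean_of_pcts(samples), 2))  # 25.25
```

The same ten retransmits read as 1% or about 25% depending on where the division happens, which is the reason to transport raw counts.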

9.3 Capacity risk factors and mitigations

Risk factor	Typical symptom	Mitigation
Very high queue counts with CAKE everywhere	RAM growth and scheduler overhead	Use `lazy_queues`, expiry, selective fq_codel where appropriate
Frequent full-tree updates	Brief packet disruption windows	Increase incremental-safe update usage; batch structural changes, and let Bakery keep ordinary circuit moves incremental where possible
Incomplete parent mapping in hierarchy	Subscribers unexpectedly unshaped	Validate parent relationships in `network.json` and input data
Single-queue/weak NIC virtualization behavior	Poor spread and unstable shaping	Ensure multi-queue NIC path and verify queue mapping assumptions

10) Symptom-to-Cause Troubleshooting Matrix

Symptom	Common backend cause	First checks	Typical corrective direction
Latency spikes but interface throughput is not fully pegged	Microburst queue buildup, poor flow isolation, or direction mismatch	Compare latency vs queue/drop trends; verify both directions are shaped	Tune leaf qdisc strategy and verify directional shaping design
One CPU runs hot while others are underused	Queue steering imbalance or weak multi-queue path	Inspect CPU utilization and per-class counters by queue branch	Fix queue mapping assumptions and verify `mq`/class structure
Subscribers intermittently appear unshaped	Parent/hierarchy mapping mismatch	Validate parent node references and resulting class creation	Correct hierarchy mappings, then apply and verify class presence
Frequent short disruption during updates	Too many full-reload-triggering changes or runtime drift escalation	Classify recent changes as structural vs incremental, and check for `reload required` events	Re-batch operations to favor incremental-safe deltas and investigate live-state drift
RAM growth during scale-up	Too many active leaf qdiscs or aggressive CAKE footprint	Measure queue count and memory trends over update windows	Use lazy queue creation/expiry and consider selective fq_codel use
Dashboard traffic appears higher than expected user throughput	Counter scope differs from post-drop forwarded traffic	Compare dashboard metrics with `tc` drop/mark context	Align runbooks to metric semantics before escalating

11) Change Validation Workflow

Use this lightweight workflow for backend-impacting changes.

11.1 Pre-change

Classify change type: mapping/speed/structure.
Estimate impact scope: number of affected circuits/classes.
Capture baseline:
- latency trend
- drop/mark behavior
- CPU and RAM headroom
- tc class/qdisc snapshot

11.2 During change

Watch control-plane behavior:
- incremental apply vs full reload occurrence
- command/runtime warnings or errors
Watch data-plane signals:
- queue growth anomalies
- directional latency drift
- per-class counter discontinuities

11.3 Post-change

Re-check the same baseline signals.
Confirm hierarchy/class presence for changed circuits.
Verify subscriber-facing latency and throughput expectations.
If degraded, roll back or reduce change scope and re-apply in smaller batches.

For observe <-> shape transitions, add two specific checks:

confirm the queue mode actually matches the intended state
confirm per-circuit mappings were cleared or republished at the right phase of the transition

11.4 Minimal command checklist

Adjust device names to your environment.

tc -s qdisc show dev <ifname>
tc -s class show dev <ifname>
journalctl -u lqosd --since "15 min ago"

12) Practical Tuning Sequence

Recommended order:

Validate topology hierarchy and parent mappings first.
Confirm queue counts and memory headroom.
Validate mq/multi-core spread behavior.
Choose CAKE vs fq_codel by observed QoE and resource budget.
Tune update cadence to favor incremental-safe changes.
Re-test after major speed-plan, topology, or integration-cadence changes.

13) Glossary

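XDP (eXpress Data Path): high-performance hook that runs eBPF programs at the earliest point of the Linux receive path.
eBPF: sandboxed in-kernel programs, used here for packet parsing and classification.
LPM: longest-prefix-match lookup for identity mapping.
cpumap: eBPF map type that lets XDP redirect packets to a chosen CPU for further processing.
tc: Linux traffic-control subsystem.
qdisc: queue discipline object in tc.
mq: multi-queue root qdisc with one child slot per hardware transmit queue.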
HTB: hierarchical token bucket scheduler/shaper.
fq_codel: fair queueing + CoDel delay control.
CAKE: integrated shaper/fairness/AQM qdisc.
Bakery: LibreQoS state-diff and incremental update subsystem.
epoch/generation: state-aging approach used to reduce lock-heavy global clears.