Performance Jitter

Onload User Guide (UG1586)

Document ID
UG1586
Release Date
2023-07-31
Revision
1.2 English

On any system, reducing or eliminating jitter is key to achieving optimum performance. However, the causes of jitter that lead to poor performance can be difficult to identify and difficult to remedy. The following section identifies some key points that should be considered.

  • A first step towards reducing jitter should be to apply the configuration settings specified in the X2 Low Latency Quickstart - these include disabling the irqbalance service, adjusting interrupt moderation settings, and preventing CPU cores from switching to power-saving modes.
  • Use isolcpus to isolate the CPU cores that the application - or at least its critical threads - will use, and prevent OS housekeeping tasks and other non-critical tasks from running on these cores.
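    As an illustrative sketch (the core numbers and application name below are placeholders, not recommendations):

```shell
# Kernel boot parameter (e.g. appended to GRUB_CMDLINE_LINUX, then reboot):
# reserve cores 2-5 so the scheduler does not place ordinary tasks there.
#   isolcpus=2-5

# Pin the Onload-accelerated application onto the isolated cores.
taskset -c 2-5 onload ./my_app
```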
  • Run an application thread on one core and direct the interrupts for that thread to a separate core - but on the same physical CPU package. Even when spinning, interrupts can still occur - for example, if the application fails to call into the Onload stack for extended periods because it is busy doing other work.
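    A hedged sketch of steering interrupts by hand - the interface name (eth2), IRQ number (129), and core numbers here are placeholders that will differ on every system:

```shell
# Find the IRQs used by the network interface ('eth2' is a placeholder).
grep eth2 /proc/interrupts

# Steer one of those IRQs ('129' is a placeholder) to core 3:
# a different core from the application thread, but on the same package.
echo 3 > /proc/irq/129/smp_affinity_list
```

    Note that the irqbalance service must be disabled first, or it may redistribute the interrupt again.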
  • Ideally each spinning thread will be allocated a separate core so that, in the event that it blocks or is de-scheduled, it will not prevent other important threads from doing work. A common cause of jitter is more than one spinning thread sharing the same CPU core. Jitter spikes might indicate that one thread is being held off the CPU core by another thread.

    You can detect this using the scheduling statistics from cat /proc/<pid>/sched for the application threads. The nr_involuntary_switches counter records the number of times the thread was descheduled - for example, because an interrupt handler or another task ran on the same CPU core.
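    For example, the counter can be read with a small script - shown here against the current shell's own pid as a stand-in; substitute the pid of each application thread (per-thread files are under /proc/<pid>/task/<tid>/sched):

```shell
#!/bin/sh
# Read the involuntary context-switch count for a process.
# $$ (this shell's own pid) stands in for the application thread's pid.
pid=$$
grep nr_involuntary_switches /proc/$pid/sched
```

    A value that keeps climbing while the thread is supposed to be spinning suggests another task is competing for the same core.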

  • When EF_STACK_LOCK_BUZZ=1, threads will spin for the EF_BUZZ_USEC period while they wait to acquire the stack lock. Lock buzzing can lead to unfairness between threads competing for a lock, and so result in resource starvation for one of them. Occurrences of this are counted in the 'stack_lock_buzz' counter. EF_STACK_LOCK_BUZZ is enabled by default when EF_POLL_USEC (spinning) is enabled.
  • If a multi-threaded application is doing lots of socket operations, stack lock contention will lead to send/receive performance jitter. In such cases, performance can be improved by giving each contending thread its own stack. This can be managed with EF_STACK_PER_THREAD, which creates a separate Onload stack for the sockets created by each thread. For an example see Minimizing Lock Contention.

    If separate stacks are not an option then it might be beneficial to reduce the EF_BUZZ_USEC period or to disable stack lock buzzing altogether.
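    As an illustrative configuration sketch (the application name and the numeric values are placeholders, not tuned recommendations), the two approaches look like:

```shell
# Option 1: give each thread its own Onload stack, removing the contention.
EF_POLL_USEC=100000 EF_STACK_PER_THREAD=1 onload ./my_app

# Option 2: keep a shared stack but shorten, or disable, lock buzzing.
EF_POLL_USEC=100000 EF_BUZZ_USEC=10 onload ./my_app       # shorter buzz period
EF_POLL_USEC=100000 EF_STACK_LOCK_BUZZ=0 onload ./my_app  # buzzing disabled
```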

  • It is always important that threads that need to communicate with each other are running on the same CPU package so that these threads can share a memory cache.

    See Onload Deployment on NUMA Systems for more information.

  • Jitter can also be introduced when some sockets are accelerated and others are not. Onload ensures that accelerated sockets are given priority over non-accelerated sockets. The resulting delay is only in the region of a few microseconds - not milliseconds - and the penalty always falls on the non-accelerated sockets. The environment variables EF_POLL_FAST_USEC and EF_POLL_NONBLOCK_FAST_USEC can be configured to manage the extent of this priority.
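    A sketch of tuning this trade-off - the values shown are placeholders to illustrate the mechanism, not recommended settings; consult the Onload parameter reference for the exact semantics and defaults of each variable:

```shell
# Placeholder values: adjust how long Onload favors polling accelerated
# sockets before non-accelerated sockets are also serviced.
EF_POLL_FAST_USEC=64 EF_POLL_NONBLOCK_FAST_USEC=128 onload ./my_app
```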
  • If traffic is sparse, spinning will deliver the same latency benefits, but the user should ensure that the spin timeout period, configured using the EF_POLL_USEC variable, is sufficiently long that the thread is still spinning when traffic arrives.

    See Spinning, Polling and Interrupts for more information.
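    For instance (the value and application name are placeholders), a spin timeout longer than the largest expected gap between packets keeps the thread spinning across sparse traffic:

```shell
# Placeholder value: spin for up to 1 second (1,000,000 microseconds)
# between events, so sparse traffic still finds the thread spinning.
EF_POLL_USEC=1000000 onload ./my_app
```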

  • When applications only need to send and receive occasionally, it might be beneficial to implement a keepalive (heartbeat) mechanism between peers. This has the effect of retaining the process data in the CPU memory cache. Because of these cache effects, calling send or receive after a delay can take measurably longer than the same call made in a tight loop.
  • Some adapters support warming the send path without actually transmitting data. This can similarly retain data in cache and so reduce jitter.
  • On some servers BIOS settings such as power and utilization monitoring can cause unnecessary jitter by performing monitoring tasks on all CPU cores. The user should check the BIOS and decide if periodic tasks (and the related SMIs) can be disabled.
  • The sysjitter utility can be used to identify and measure jitter on all cores of an idle system - refer to Sysjitter for details.
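    A hedged usage sketch - the option syntax shown here is illustrative, so check the help output of the sysjitter version you have installed for the exact flags:

```shell
# Run on an otherwise idle system, with one measurement thread per core.
# Illustrative invocation: measure for 10 seconds, counting interruptions
# longer than the 200 ns threshold.
./sysjitter --runtime 10 200
```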