Eliminating Drops

Onload User Guide (UG1586)

Document ID: UG1586
Release Date: 2023-07-31
Revision: 1.2 English

Any packet loss degrades network performance. The effect is especially pronounced for reliable data transfer protocols that are built on top of unicast or multicast UDP sockets.

First, check whether packets have been dropped by the network adapter before reaching the Onload stack. Use ethtool to collect statistics directly from the network adapter:

# ethtool -S enps0f0 | grep -E 'drop|discard'
Table 1. Ethtool Drop Counters
Counter                         Description
rx_noskb_drops                  Number of packets dropped when there are no further socket buffers to use.
port_rx_nodesc_drops            Number of packets dropped when there are no further descriptors in the rx ring buffer to receive them.
port_rx_dp_di_dropped_packets   Number of packets dropped because filters indicate the packets should be dropped. This can happen when packets do not match any filter or the matched filter indicates the packet should be dropped.
port_rx_dp_q_disabled_packets   Number of packets sent to a queue which does not exist. A small number might be observed following initialization or teardown; a larger or incrementing number might indicate a mismatch between the size of a VI set and the actual number of VIs.
port_rx_pm_discard_bb_overflow  Number of packets discarded due to packet memory buffer overflow.
port_rx_pm_discard_vfifo_full   Number of packets dropped because of a lack of main packet memory on the adapter to receive the packet into.
port_rx_pm_discard_mapping      Number of packets dropped because they have an 802.1p priority level configured to be dropped.

# ethtool -S enps0f0 | grep drop
     rx_noskb_drops: 0
     port_rx_nodesc_drops: 0
     port_rx_dp_di_dropped_packets: 681618610
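
When monitoring these counters over time, it can help to parse the ethtool output and compare two samples, because a counter that keeps incrementing matters more than a large but static value. The following is a minimal sketch, not part of Onload: the function names parse_drop_counters and counter_deltas are hypothetical, and the sample text simply reuses the output shown above. It assumes the usual "name: value" layout of ethtool -S output.

```python
def parse_drop_counters(ethtool_output):
    """Return {counter_name: value} for counters whose name mentions drop or discard."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "name: value" line
        name, value = name.strip(), value.strip()
        # Header lines such as "NIC statistics:" have no numeric value and are skipped.
        if ("drop" in name or "discard" in name) and value.isdigit():
            counters[name] = int(value)
    return counters


def counter_deltas(before, after):
    """Return only the counters that increased between two samples."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Sample text copied from the example output above.
sample = """\
     rx_noskb_drops: 0
     port_rx_nodesc_drops: 0
     port_rx_dp_di_dropped_packets: 681618610
"""

counters = parse_drop_counters(sample)
nonzero = {name: value for name, value in counters.items() if value}
print(nonzero)  # {'port_rx_dp_di_dropped_packets': 681618610}
```

In practice the text would come from running ethtool -S on the interface twice, a few seconds apart, and passing both parsed samples to counter_deltas.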

Solution

The most common cause of drops is the application being descheduled. You can detect this using the scheduling statistics for the application, available from cat /proc/<pid>/sched. The nr_involuntary_switches counter records the number of times the process was descheduled, for example because of an interrupt handler or another task running on the same CPU core. Ensure that the application CPU cores are isolated to avoid descheduling. If it is not possible to isolate the cores, consider switching to interrupt mode.
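
The descheduling check can be scripted. The sketch below extracts nr_involuntary_switches from the text of /proc/<pid>/sched; the helper names and the sample text are illustrative assumptions, and the sched file might not be present on every kernel build. Sampling the value twice while the application is under load shows whether it is still being descheduled.

```python
import re


def involuntary_switches(sched_text):
    """Extract nr_involuntary_switches from the text of /proc/<pid>/sched.

    Returns None if the field is not present.
    """
    match = re.search(
        r"^nr_involuntary_switches\s*:\s*(\d+)\s*$", sched_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def read_involuntary_switches(pid):
    """Read the counter for a live process (Linux only)."""
    with open(f"/proc/{pid}/sched") as f:
        return involuntary_switches(f.read())


# Illustrative sample in the layout the kernel typically uses; values are made up.
sample_sched = """\
my_app (4242, #threads: 1)
-------------------------------------------------------------------
nr_switches                                  :                  120
nr_voluntary_switches                        :                  113
nr_involuntary_switches                      :                    7
"""

print(involuntary_switches(sample_sched))  # 7
```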

If packet loss is observed at the network level due to a lack of receive buffering, try increasing the size of the receive descriptor queue via EF_RXQ_SIZE. If packet drops are observed at the socket level, consult the application documentation. It might also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF). Setting the EF_EVS_PER_POLL variable to a higher value can also improve efficiency. Refer to Parameter Reference for descriptions of these variables.
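
These variables are set in the environment of the accelerated application. As a rough sketch of how a launcher script might apply them, the following builds an environment with the three variables named above. The helper build_onload_env, the application path ./my_app, and the numeric values are all assumptions chosen for illustration; check the Parameter Reference for the valid range of each variable before using them.

```python
import os


def build_onload_env(base_env, rxq_size=None, udp_rcvbuf=None, evs_per_poll=None):
    """Return a copy of base_env with the requested Onload tuning variables set."""
    env = dict(base_env)
    settings = {
        "EF_RXQ_SIZE": rxq_size,          # receive descriptor queue size
        "EF_UDP_RCVBUF": udp_rcvbuf,      # UDP socket receive buffer size
        "EF_EVS_PER_POLL": evs_per_poll,  # events handled per poll
    }
    for name, value in settings.items():
        if value is not None:
            env[name] = str(value)  # environment values must be strings
    return env


# Illustrative values only; they are not recommendations.
env = build_onload_env(
    os.environ, rxq_size=4096, udp_rcvbuf=8 * 1024 * 1024, evs_per_poll=128
)
command = ["onload", "./my_app"]  # ./my_app is a hypothetical application

# To launch the application with these settings:
#   import subprocess
#   subprocess.run(command, env=env, check=True)
```

Changing one variable at a time and re-checking the drop counters after each run makes it easier to see which setting had an effect.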