Pre-Test Configuration

The following configuration options are applicable to RHEL7 systems.

First, set configuration options that decrease latency for Onload-accelerated applications. On both machines:

  1. Add the following options to the kernel command line. On RHEL7, append them to the GRUB_CMDLINE_LINUX line in /etc/default/grub, then regenerate the GRUB configuration (see the sketch below):
    isolcpus=<comma separated cpu list> nohz=off iommu=off intel_iommu=off mce=ignore_ce nmi_watchdog=0
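
    For example, a minimal sketch on a GRUB2-based (BIOS boot) RHEL7 system, assuming cores 2 and 3 are to be isolated; on UEFI systems the generated file is /boot/efi/EFI/redhat/grub.cfg instead:

    # sed -i '/^GRUB_CMDLINE_LINUX=/ s/"$/ isolcpus=2,3 nohz=off iommu=off intel_iommu=off mce=ignore_ce nmi_watchdog=0"/' /etc/default/grub
    # grub2-mkconfig -o /boot/grub2/grub.cfg

    Reboot for the new command line to take effect.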
  2. Stop the following services on the server:
    systemctl stop cpupower
    systemctl stop cpuspeed
    systemctl stop cpufreqd
    systemctl stop powerd
    systemctl stop irqbalance
    systemctl stop firewalld
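
    Not every one of these services exists on all RHEL7 installations; a unit that is not present simply fails to stop. For example, a bash loop that stops whichever of them are present (error output suppressed):

    # for svc in cpupower cpuspeed cpufreqd powerd irqbalance firewalld
    > do
    > systemctl stop $svc 2>/dev/null
    > done

    To prevent a service from restarting after a reboot, also run systemctl disable for it.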
  3. Allocate huge pages. For example, to configure 1024 huge pages:
    # sysctl -w vm.nr_hugepages=1024

    To make this change persistent, update /etc/sysctl.conf. For example:

    # echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf

    For more information refer to Allocating Huge Pages.
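
    To verify the allocation, check the huge page counters in /proc/meminfo:

    # grep HugePages /proc/meminfo

    Among the counters reported, HugePages_Total should match the requested value, and HugePages_Free shows how many pages are still unused.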

  4. Consider the selection of the NUMA node, as this affects latency on a NUMA-aware system. Refer to Onload Deployment on NUMA Systems.
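    For illustration only: the NUMA node local to the adapter can be read from sysfs, and the application bound to that node with numactl (assuming the numactl package is installed; the application name is a placeholder):

    # cat /sys/class/net/<interface>/device/numa_node
    0
    # numactl --cpunodebind=0 --membind=0 <application>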
  5. Disable interrupt moderation.
    # ethtool -C <interface> rx-usecs 0 adaptive-rx off
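
    The new settings can be confirmed with:

    # ethtool -c <interface>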
  6. Enable PIO in the Onload environment.

    EF_PIO=1
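
    For example, to run an application under Onload with PIO enabled:

    # EF_PIO=1 onload <application>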

Now perform the following configuration to improve latency when running without Onload.

Note: These configuration changes have minimal effect on the performance of Onload.
  1. Set interrupt affinity so that interrupts and the application run on different CPU cores but on the same processor package.
    1. Use the following command to identify the interrupts used by the receive queues created for an interface:
      # grep <interface> /proc/interrupts

      The output lists the IRQs. For example:

      34:    ...   PCI-MSI-edge      p2p1-0
      35:    ...   PCI-MSI-edge      p2p1-1
      36:    ...   PCI-MSI-edge      p2p1-2
      37:    ...   PCI-MSI-edge      p2p1-3
      38:    ...   PCI-MSI-edge      p2p1-ptp
    2. Direct the listed IRQs to unused CPU cores that are on the same processor package as the application. For example, to direct IRQs 34-38 to CPU core 2 (where cores are numbered from 0 upwards), using bash:
      # for irq in {34..38}
      > do
      > echo 04 > /proc/irq/$irq/smp_affinity
      > done
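
      The value written to smp_affinity is a hexadecimal bitmask of CPU cores, so the mask for core n is 1 << n. For example, to compute the mask for core 2:

      # printf '%x\n' $((1 << 2))
      4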
  2. Set an appropriate tuned profile:
    • The tuned network-latency profile produces better kernel latency results than the default profile:
      # tuned-adm profile network-latency
    • If available, the cpu-partitioning profile includes the network-latency profile, but also makes it easy to isolate cores that can be dedicated to interrupt handling or to an application. For example, to isolate cores 1-3:
      # echo "isolated_cores=1-3" \
           > /etc/tuned/cpu-partitioning-variables.conf
      # tuned-adm profile cpu-partitioning
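
    In either case, the active profile can be confirmed with:

      # tuned-adm active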
  3. Enable the kernel “busy poll” feature, which reduces interrupt-driven wakeups by polling the socket receive queue. The following values are recommended:
    # sysctl -w net.core.busy_poll=50 net.core.busy_read=50
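
    As with huge pages, these settings can be made persistent by updating /etc/sysctl.conf. For example:

    # echo "net.core.busy_poll = 50" >> /etc/sysctl.conf
    # echo "net.core.busy_read = 50" >> /etc/sysctl.conf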