Thursday, January 08, 2004

Using Device Polling and More to Improve Packet Capture

I just read a fascinating paper by Luca Deri, author of Ntop, about "Improving Passive Packet Capture: Beyond Device Polling" (.pdf). Luca claims that out of the box, Windows 2000 performs better as a traffic collection platform under high loads (~80 Kpps), capturing 68% of traffic compared to 34% for FreeBSD and 0.2% for Linux kernel 2.4.x. Linux's performance improves to 1% if the mmap libpcap version is used, and up to 4% if a Netfilter-based loadable kernel module is used. These percentages sound off to me. Luca explains the results:

"An explanation for the poor performance figures is something called interrupt livelock. Device drivers instrument network cards to generate an interrupt whenever the card needs attention (e.g. for informing the operating system that there is an incoming packet to handle). In case of high traffic rate, the operating system spends most of its time handling interrupts leaving little time for other tasks. A solution to this problem is something called device polling."

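A toy model makes the livelock idea concrete. The sketch below is not from Luca's paper: the function names and the per-packet costs (10 microseconds per interrupt, 4 per polled packet, 5 for the capture application) are made-up assumptions, and `user_frac` is only loosely modeled on the idea of reserving a share of the CPU for userland.

```python
CPU_US_PER_SEC = 1_000_000  # CPU time available each second, in microseconds


def interrupt_driven(offered_pps, irq_cost_us, app_cost_us):
    """Packets per second that reach the capture application when every
    arriving packet raises its own interrupt (hypothetical model)."""
    # Interrupts preempt everything else, so their cost is paid first.
    received = min(offered_pps, CPU_US_PER_SEC // irq_cost_us)
    cpu_left = CPU_US_PER_SEC - received * irq_cost_us
    # The application only gets whatever CPU time the interrupts left over.
    return min(received, cpu_left // app_cost_us)


def polled(offered_pps, poll_cost_us, app_cost_us, user_frac=50):
    """Same question when the kernel polls the card and userland is
    guaranteed user_frac percent of the CPU (hypothetical model)."""
    user_budget = CPU_US_PER_SEC * user_frac // 100
    kernel_budget = CPU_US_PER_SEC - user_budget
    # The kernel can only pull in as many packets as its budget allows.
    received = min(offered_pps, kernel_budget // poll_cost_us)
    # The application is limited by its own reserved budget.
    return min(received, user_budget // app_cost_us)


if __name__ == "__main__":
    for pps in (20_000, 80_000, 150_000):
        print(pps, interrupt_driven(pps, 10, 5), polled(pps, 4, 5))
```

With these invented costs, the interrupt-driven model delivers everything at 20 Kpps, half at 80 Kpps, and nothing once interrupts alone consume the whole CPU. The polled model levels off at a fixed ceiling instead of collapsing. The real numbers depend on hardware and drivers; only the shape of the curve is the point.
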
FreeBSD has had device polling available in the kernel since FreeBSD 4.5 REL, but you need to recompile the kernel and use a polling-aware NIC. From /usr/src/sys/i386/conf/LINT on my 4.9 REL box:

# DEVICE_POLLING adds support for mixed interrupt-polling handling
# of network device drivers, which has significant benefits in terms
# of robustness to overloads and responsivity, as well as permitting
# accurate scheduling of the CPU time between kernel network processing
# and other activities. The drawback is a moderate (up to 1/HZ seconds)
# potential increase in response times.
# It is strongly recommended to use HZ=1000 or 2000 with DEVICE_POLLING
# to achieve smoother behaviour.
# Additionally, you can enable/disable polling at runtime with the
# sysctl variable kern.polling.enable (defaults off), and select
# the CPU fraction reserved to userland with the sysctl variable
# kern.polling.user_frac (default 50, range 0..100).
# Only the "dc" "fxp" and "sis" devices support this mode of operation at
# the time of this writing.


However, the polling(4) man page says other cards are supported, including the important em Gigabit Ethernet driver.

Polling requires explicit modifications to the device drivers. As of
this writing, the dc(4), em(4), fxp(4), rl(4), and sis(4) devices are
supported, with other in the works. The modifications are rather
straightforward, consisting in the extraction of the inner part of the
interrupt service routine and writing a callback function, *_poll(),
which is invoked to probe the device for events and process them. See
the conditionally compiled sections of the devices mentioned above for
more details.
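
The extraction the man page describes can be sketched in a few lines. This is a Python toy rather than driver code, and `ToyNic`, `_rx_one()`, `intr()`, and `poll()` are hypothetical names chosen for illustration; real FreeBSD drivers implement their `*_poll()` callbacks in C. The idea it tries to show is that the interrupt path and the polling path share the same inner routine, but the polling path is bounded by a packet count.

```python
from collections import deque


class ToyNic:
    """Hypothetical NIC model: 'hardware' appends packets to rx_queue."""

    def __init__(self):
        self.rx_queue = deque()  # packets waiting on the card
        self.delivered = []      # packets handed up the stack

    def _rx_one(self):
        # The inner part of the interrupt service routine: handle one packet.
        self.delivered.append(self.rx_queue.popleft())

    def intr(self):
        # Interrupt path: drain the card completely, however long it takes.
        handled = 0
        while self.rx_queue:
            self._rx_one()
            handled += 1
        return handled

    def poll(self, count):
        # Polling path: reuse the same inner routine, but stop after
        # `count` packets so other work gets a turn on the CPU.
        handled = 0
        while self.rx_queue and handled < count:
            self._rx_one()
            handled += 1
        return handled
```

Calling `poll(4)` on a card holding ten packets handles four and leaves six queued for the next tick, whereas `intr()` would not return until all ten were processed.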

According to Luca, the new Linux 2.6 kernel supports polling as well. He re-ran his tests against Linux kernel 2.6 and FreeBSD 4.8 (polling enabled). Linux with standard libpcap now captured 5.6% of traffic while FreeBSD captured 99.9%. (That's my boy!) When capture was implemented in a kernel module instead, Linux reached 99.5%, nearly matching FreeBSD.

Not happy with this outcome, Luca modified the Linux Gigabit driver and wrote patches to create a ring-buffer based version of libpcap. He found this works much better and plans to port it to FreeBSD as well.

Update: I asked the community to provide opinions on this paper. The netbsd-tech-kern list hotly debated this paper in this thread.

Update: On 20 Feb 04 Luca released a significantly modified version of the paper at the same URL. The big issue for FreeBSD seems to be poor performance with larger packet sizes.